Although I’m a big fan of the tidyverse philosophy of handling and wrangling data, one must admit that base R has some quite powerful functions of its own. One of these is subset(), which returns the rows (and optionally the columns) of a data frame that satisfy the conditions you specify. Let’s dive into an example using the simple mtcars data:
head(mtcars) # A quick look at the mtcars data
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Suppose we want to extract all the vehicles that have an mpg greater than 20:
subset(mtcars, mpg > 20) # the first argument is the dataframe
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
Notice how convenient subset() is: the condition is evaluated inside the data frame, so we can refer to a column by its bare name (mpg) instead of qualifying it with the dollar sign (mtcars$mpg).
Let’s take a slightly more complex example. We will extract all vehicles that have an mpg greater than 30 and a cyl equal to 4:
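To make the comparison concrete, here is a minimal sketch of the same filter written with plain bracket indexing. The object names by_brackets and by_subset are only illustrative. One caveat worth knowing: subset() drops rows where the condition evaluates to NA, whereas bracket indexing keeps them as NA rows; mtcars has no missing values, so the two approaches agree here.

```r
# Bracket indexing: the column has to be qualified with mtcars$
by_brackets <- mtcars[mtcars$mpg > 20, ]

# subset(): the condition is evaluated inside the data frame
by_subset <- subset(mtcars, mpg > 20)

# Both approaches should select the same rows and columns
all.equal(by_brackets, by_subset)
```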
subset(mtcars, mpg > 30 & cyl == 4) # & is the logical AND operator
## mpg cyl disp hp drat wt qsec vs am gear carb
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
As you can see, the subset function works smoothly with the R logical expressions.
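The other logical operators work the same way. The short sketch below uses | (OR), %in% (membership) and ! (negation); the thresholds and object names are arbitrary choices made for illustration and do not come from the examples above.

```r
# | is logical OR: cars that are either very economical or very powerful
economical_or_powerful <- subset(mtcars, mpg > 30 | hp > 300)

# %in% tests membership: cars with 4 or 6 cylinders
four_or_six <- subset(mtcars, cyl %in% c(4, 6))

# ! negates a condition: every car that does not have 8 cylinders
not_eight <- subset(mtcars, !(cyl == 8))

# The last two conditions describe the same set of cars
identical(rownames(four_or_six), rownames(not_eight))
```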
Finally, there is another important argument worth covering. Suppose we want to keep only a specific column, say the wt variable. For the rows that match the condition, we can extract this column using the select argument:
subset(mtcars, mpg > 30 & cyl == 4, select = wt)
## wt
## Fiat 128 2.200
## Honda Civic 1.615
## Toyota Corolla 1.835
## Lotus Europa 1.513
In the same way, we can extract several columns:
subset(mtcars, mpg > 30 & cyl == 4, select = c(wt, disp, am))
## wt disp am
## Fiat 128 2.200 78.7 1
## Honda Civic 1.615 75.7 1
## Toyota Corolla 1.835 71.1 1
## Lotus Europa 1.513 95.1 1
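The select argument also accepts column ranges and negative selections, as described in the subset() help page. Here is a small sketch reusing the same filter; the object names are only illustrative. Note also that a single selected column still comes back as a data frame rather than a plain vector.

```r
# A range of adjacent columns, from mpg to disp
first_three <- subset(mtcars, mpg > 30 & cyl == 4, select = mpg:disp)

# Every column except wt and qsec
without_two <- subset(mtcars, mpg > 30 & cyl == 4, select = -c(wt, qsec))

# A single selected column is still returned as a data frame
only_wt <- subset(mtcars, mpg > 30 & cyl == 4, select = wt)

names(first_three)
names(without_two)
is.data.frame(only_wt)
```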