Working with R Built-in Data Sets

In my tutorials, I prefer to work with the mtcars data set (DST) because I find it pretty straightforward. Nonetheless, one may be tempted to try some functions on a specific kind of DST. The great thing about R is that it comes with a variety of built-in DSTs. Yes, you just have to choose!

R Base Data Sets

Base DSTs are the data sets that ship with base R, in the datasets package, so they do not depend on any package you installed yourself. In a fresh R session, you can display the names of all base R DSTs with the following simple commands:

data <- data() 

R_base <- data$results[, "Item"] # Names of all base R data sets

head(R_base)
## [1] "AirPassengers"          "BJsales"               
## [3] "BJsales.lead (BJsales)" "BOD"                   
## [5] "CO2"                    "ChickWeight"
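
Note that data() called without arguments lists the data sets of every package currently attached, so the result can contain more than base R if other packages are loaded. A minimal sketch, assuming the base DSTs are the ones shipped in the datasets package, restricts the listing explicitly. The name base_only is only illustrative.

```r
# List only the data sets shipped with the 'datasets' package,
# regardless of which other packages are attached.
base_only <- data(package = "datasets")$results

# The result is a character matrix with the columns
# "Package", "LibPath", "Item" and "Title".
colnames(base_only)

# Select columns by name rather than by position.
head(base_only[, c("Item", "Title")])
```

Selecting columns by name keeps the code readable and avoids relying on their order.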

We can also print the description of each DST.

R_base_description <- data$results[, "Title"] # Matching descriptions

head(R_base_description)
## [1] "Monthly Airline Passenger Numbers 1949-1960"   
## [2] "Sales Data with Leading Indicator"             
## [3] "Sales Data with Leading Indicator"             
## [4] "Biochemical Oxygen Demand"                     
## [5] "Carbon Dioxide Uptake in Grass Plants"         
## [6] "Weight versus age of chicks on different diets"

We can also combine both in a single data frame:

base_base <- cbind(R_base, R_base_description)

base_base <- as.data.frame(base_base)

head(base_base)
##                   R_base                             R_base_description
## 1          AirPassengers    Monthly Airline Passenger Numbers 1949-1960
## 2                BJsales              Sales Data with Leading Indicator
## 3 BJsales.lead (BJsales)              Sales Data with Leading Indicator
## 4                    BOD                      Biochemical Oxygen Demand
## 5                    CO2          Carbon Dioxide Uptake in Grass Plants
## 6            ChickWeight Weight versus age of chicks on different diets

Getting All the Available DSTs

Some packages, available on CRAN or GitHub, come with one or more data sets. For example, if you install the famous dplyr, you’ll get the cool starwars DST for free. To list all the DSTs provided by the packages you’ve installed, execute the following command:

data(package = .packages(all.available = TRUE))
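
The full listing can be long, so a per-package summary may be easier to read. The sketch below counts the data sets found in each installed package. The names all_ds and ds_per_package are illustrative, and suppressWarnings() is used on the assumption that this call can emit warnings you do not need to see here.

```r
# Collect the listing for every installed package.
# suppressWarnings() silences warnings this call can emit while scanning.
all_ds <- suppressWarnings(
  data(package = .packages(all.available = TRUE))$results
)

# Number of data sets per package, largest first.
ds_per_package <- sort(table(all_ds[, "Package"]), decreasing = TRUE)
head(ds_per_package)
```

The counts depend entirely on what is installed on your machine, so your output will differ from anyone else's.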

Imagine you’re a big fan of carbon (why not) and you’re too lazy (like me) to open your browser. The following commands list all the DSTs, from base R and from your installed packages, whose description contains the word “carbon”.

ALL_packages <- data(package = .packages(all.available = TRUE))

ALL_packages <- ALL_packages$results # Matrix with columns Package, LibPath, Item, Title

ALL_packages[grep("carbon", ALL_packages[, "Title"], ignore.case = TRUE), ]
##      Package    LibPath                              Item          
## [1,] "datasets" "C:/Program Files/R/R-3.6.1/library" "CO2"         
## [2,] "abd"      "C:/Program Files/R/R-3.6.1/library" "AlgaeCO2"    
## [3,] "agridat"  "C:/Program Files/R/R-3.6.1/library" "waynick.soil"
## [4,] "boot"     "C:/Program Files/R/R-3.6.1/library" "co.transfer" 
## [5,] "fields"   "C:/Program Files/R/R-3.6.1/library" "WorldBankCO2"
##      Title                                                              
## [1,] "Carbon Dioxide Uptake in Grass Plants"                            
## [2,] "Carbon Dioxide and Growth Rate in Algae"                          
## [3,] "Soil nitrogen and carbon in two fields"                           
## [4,] "Carbon Monoxide Transfer"                                         
## [5,] "Carbon emissions and demographic covariables by country for 1999."
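
If you run this kind of search often, it can be wrapped in a small function. The sketch below is one possible way to do it. find_datasets() is a hypothetical helper, not part of R or of any package, and its arguments are my own choice.

```r
# Hypothetical helper: search data set titles across installed packages.
find_datasets <- function(pattern,
                          packages = .packages(all.available = TRUE)) {
  listing <- suppressWarnings(data(package = packages)$results)
  hits <- grepl(pattern, listing[, "Title"], ignore.case = TRUE)
  # drop = FALSE keeps a matrix even when only one row matches.
  as.data.frame(
    listing[hits, c("Package", "Item", "Title"), drop = FALSE],
    stringsAsFactors = FALSE
  )
}

# Search every installed package.
find_datasets("carbon")

# Or restrict the search to a single package.
find_datasets("carbon", packages = "datasets")
```

Returning a data frame without the LibPath column keeps the printed result narrow enough to read on one screen.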

Finally, let’s say you have found your soulmate, and it’s the Carbon Dioxide and Growth Rate in Algae DST. To load it into your environment, use:

library(abd) # First load the corresponding package

data("AlgaeCO2") # Load the AlgaeCO2 data set

AlgaeCO2 
##     treatment growthrate
## 1  normal CO2       2.31
## 2  normal CO2       1.95
## 3  normal CO2       1.86
## 4  normal CO2       1.59
## 5  normal CO2       1.55
## 6  normal CO2       1.30
## 7  normal CO2       1.07
## 8    high CO2       2.37
## 9    high CO2       1.89
## 10   high CO2       1.55
## 11   high CO2       1.49
## 12   high CO2       1.26
## 13   high CO2       1.20
## 14   high CO2       0.98
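
A DST can also be loaded without attaching its package, by passing the package argument to data(). The sketch below uses CO2 from the datasets package so that it runs on any R installation. The environment name sandbox is illustrative, and the AlgaeCO2 part only runs if abd happens to be installed.

```r
# Load a data set into a dedicated environment without calling library().
# CO2 from 'datasets' is used so the sketch runs on any R installation.
sandbox <- new.env()
data("CO2", package = "datasets", envir = sandbox)

ls(sandbox)       # Names of the objects that were loaded
str(sandbox$CO2)  # Structure of the loaded data frame

# For a package that may not be installed, check before loading.
if (requireNamespace("abd", quietly = TRUE)) {
  data("AlgaeCO2", package = "abd", envir = sandbox)
}
```

Loading into a separate environment avoids overwriting objects of the same name in your workspace.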
Mohamed El Fodil Ihaddaden
Ph.D candidate in Economics.

My research interests include Performance Management, Data Envelopment Analysis and Artificial Intelligence applied to Economics.