R Base Gems

agrep()


The countries vector below lists some African countries. The last three values contain mistakes (Algerria, Morocoo and algeri). Real-life data is often imperfect like this. The agrep() function handles this situation by searching for approximate (fuzzy) matches of a pattern instead of exact ones. Suppose we want to extract the elements of the countries vector that approximately match the word Algeria:

countries <- c("Algeria", "Morocco", "Tunisia", "Mali", "Tchad", "Kenya", "Algerria", "Morocoo", "algeri")

indexes <- agrep(pattern = "Algeria", x = countries, ignore.case = TRUE)

countries[indexes]
## [1] "Algeria"  "Algerria" "algeri"
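One possible next step is to clean the vector by overwriting each approximate match with its canonical spelling. The sketch below is our own illustration, not a standard recipe: the `canonical` and `cleaned` names are hypothetical. It also uses the documented `value = TRUE` argument of agrep(), which returns the matching elements rather than their positions. With the default `max.distance = 0.1`, a seven-letter pattern such as "Algeria" tolerates one edit, so noisier names may need a larger tolerance.

```r
countries <- c("Algeria", "Morocco", "Tunisia", "Mali", "Tchad", "Kenya",
               "Algerria", "Morocoo", "algeri")

# value = TRUE returns the matching elements rather than their positions
agrep(pattern = "Algeria", x = countries, ignore.case = TRUE, value = TRUE)

# Hypothetical cleaning step: replace every approximate match by its canonical spelling
canonical <- c("Algeria", "Morocco")
cleaned <- countries
for (name in canonical) {
  hits <- agrep(pattern = name, x = countries, ignore.case = TRUE)
  cleaned[hits] <- name
}
cleaned
```

This approach only works when the canonical names are known in advance and are different enough that a single misspelling cannot match two of them.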

abbreviate()


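The abbreviate() function offers a rougher way to spot the misspelled values. It shortens each string to a minimum length, so variants of the same word often collapse to the same abbreviation: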

# Convert the words to lower case
countries_lower <- tolower(countries)

abbreviate(
  names.arg = countries_lower,
  minlength = 3,
  strict = TRUE, # do not lengthen abbreviations to make them unique (duplicates allowed)
  named = FALSE  # return an unnamed character vector
)
## [1] "alg" "mrc" "tns" "mal" "tch" "kny" "alg" "mrc" "alg"
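Because the misspelled variants share an abbreviation, the abbreviations can serve as rough grouping keys for a manual review. The following is a minimal sketch that assumes the same countries vector; `keys` and `groups` are our own names. Unrelated words can also collapse to the same abbreviation, so check the groups before merging anything.

```r
countries <- c("Algeria", "Morocco", "Tunisia", "Mali", "Tchad", "Kenya",
               "Algerria", "Morocoo", "algeri")

# Same call as above: 3-letter keys, duplicates allowed, unnamed result
keys <- abbreviate(names.arg = tolower(countries), minlength = 3,
                   strict = TRUE, named = FALSE)

# Group the original spellings by their abbreviation
groups <- split(countries, keys)
groups$alg
# "Algeria" "Algerria" "algeri"

# Number of spellings behind each key
table(keys)
```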

table()


table() is a well-known function that counts how many times each distinct value appears in a vector. With useNA = "no" (the default), missing values are left out:

countries <- c("Algeria", "Algeria", "Mali", "Kenya", "Mali", "Mali", "Senegal", "Uganda", "Senegal", "Morocco", "Senegal", "Senegal", "Senegal", NA, NA, NA, NA, NA, NA)

table(countries, useNA = "no")
## countries
## Algeria   Kenya    Mali Morocco Senegal  Uganda 
##       2       1       3       1       5       1

Setting useNA to "always" adds a count of the NAs. Use "ifany" instead to show the NA column only when missing values are present:

table(countries, useNA = "always")
## countries
## Algeria   Kenya    Mali Morocco Senegal  Uganda    <NA> 
##       2       1       3       1       5       1       6

To sort the values by frequency:

my_tab <- table(countries, useNA = "no")

sort(x = my_tab, decreasing = TRUE)
## countries
## Senegal    Mali Algeria   Kenya Morocco  Uganda 
##       5       3       2       1       1       1
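To turn the counts into shares, base R's prop.table() divides each count by the total. Here is a short sketch that reuses the countries vector above. With the default useNA = "no", the six NAs are excluded from the denominator, so each share is relative to the 13 non-missing values.

```r
countries <- c("Algeria", "Algeria", "Mali", "Kenya", "Mali", "Mali", "Senegal",
               "Uganda", "Senegal", "Morocco", "Senegal", "Senegal", "Senegal",
               NA, NA, NA, NA, NA, NA)

my_tab <- table(countries)      # useNA = "no" is the default
props <- prop.table(my_tab)     # each count divided by the total (13)
round(sort(props, decreasing = TRUE), 2)

# Name of the most frequent value
names(which.max(my_tab))        # "Senegal"
```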

You can quickly visualize the distribution of the countries vector:

sort_tab <- sort(x = my_tab, decreasing = TRUE)

barplot(sort_tab, ylab = "Counts", col = "steelblue")

jitter()


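jitter() adds a small amount of random (uniform) noise to a numeric vector. This is mostly useful in plots, to separate points that would otherwise overlap. Because the noise is random, your output will differ from the one below.

# Compare with the original values in mtcars$mpg
jitter(mtcars$mpg)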
##  [1] 20.99445 21.00335 22.79127 21.39489 18.71063 18.10950 14.28911 24.40835
##  [9] 22.79397 19.21747 17.81193 16.41927 17.28185 15.20250 10.41295 10.41621
## [17] 14.68476 32.41836 30.39589 33.88208 21.50608 15.51768 15.19893 13.31650
## [25] 19.19906 27.30775 26.00450 30.38734 15.81134 19.70238 15.00235 21.39310
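To make the result reproducible, fix the seed with set.seed() before calling jitter(). The sketch below also passes an explicit `amount`, which bounds the noise to [-amount, amount]. The seed 123 and `amount = 0.5` are arbitrary choices for illustration. When `amount` is left at its default (NULL), jitter() uses `factor` times a fifth of the smallest gap between distinct values, which is why the perturbations above are so small.

```r
x <- mtcars$mpg

set.seed(123)
a <- jitter(x, amount = 0.5)
set.seed(123)
b <- jitter(x, amount = 0.5)

identical(a, b)          # TRUE: same seed, same noise
max(abs(a - x)) <= 0.5   # TRUE: the noise never exceeds amount
```

A common use is jittering a discrete variable such as mtcars$cyl before plotting it, so that overlapping points become visible.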

comment()


The comment() function attaches a note to an object, stored as its "comment" attribute. Unlike other attributes, the comment is not displayed when the object is printed.

comment(mtcars) <- "This data frame has no NAs, go ahead !"

comment(mtcars)
## [1] "This data frame has no NAs, go ahead !"

The attributes() function also returns the comment, along with the object's other attributes:

attributes(mtcars)
## $names
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"
## 
## $row.names
##  [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"         
##  [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
##  [7] "Duster 360"          "Merc 240D"           "Merc 230"           
## [10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
## [13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
## [16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
## [19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
## [22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
## [25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
## [28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
## [31] "Maserati Bora"       "Volvo 142E"         
## 
## $class
## [1] "data.frame"
## 
## $comment
## [1] "This data frame has no NAs, go ahead !"
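The following is a small sketch that checks both claims: print() does not show the comment, and the comment can be removed again by assigning NULL. The `scores` data frame and its comment are hypothetical, made up only for this illustration.

```r
# Hypothetical data frame used only for this illustration
scores <- data.frame(id = 1:3, score = c(10, 12, 9))
comment(scores) <- "Scores from the pilot session"

# The comment is stored as an attribute but is not shown by print()
printed <- capture.output(print(scores))
any(grepl("pilot", printed))   # FALSE

comment(scores)                # "Scores from the pilot session"

# Assigning NULL removes the comment
comment(scores) <- NULL
is.null(comment(scores))       # TRUE
```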

make.unique()


make.unique() makes the elements of a character vector unique by appending a sequence number to each repeated value, leaving the first occurrence unchanged. The sep argument sets the separator placed before the number:

countries <- c("Algeria", "Morocco", "Algeria", "Algeria", "Morocco", "Tunisia", "Morocco", "Tunisia")

make.unique(names = countries, sep = " -_- ")
## [1] "Algeria"       "Morocco"       "Algeria -_- 1" "Algeria -_- 2"
## [5] "Morocco -_- 1" "Tunisia"       "Morocco -_- 2" "Tunisia -_- 1"
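With its default separator ".", the same function is handy for repairing duplicated column names. Here is a short sketch; the `cols` vector is made up for the example.

```r
cols <- c("id", "score", "score", "date", "score")

# Default sep = "."
make.unique(cols)
# "id" "score" "score.1" "date" "score.2"
```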

startsWith() and endsWith()


startsWith() and endsWith() test whether each element of a character vector starts or ends with a given string, and return a logical vector:

countries <- c("Armenia", "Argentina", "Antalya", "Adelaide", "Abidjan")

startsWith(x = countries,
           prefix = "Ar")
## [1]  TRUE  TRUE FALSE FALSE FALSE

endsWith(x = countries,
         suffix = "an")
## [1] FALSE FALSE FALSE FALSE  TRUE
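Because the result is a logical vector, it can be used directly to subset. The sketch below also compares it with the equivalent regular expression. Both functions accept only character vectors, so a factor has to be converted with as.character() first. The `places` and `f` names are our own.

```r
places <- c("Armenia", "Argentina", "Antalya", "Adelaide", "Abidjan")

# Keep only the elements ending with "a"
places[endsWith(places, "a")]
# "Armenia" "Argentina" "Antalya"

# Equivalent regular-expression version
identical(endsWith(places, "a"), grepl("a$", places))  # TRUE

# Factors must be converted to character first
f <- factor(places)
startsWith(as.character(f), "Ar")
```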

quarters.Date()


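quarters.Date() is the Date method of the generic quarters(): it converts each date to its calendar quarter (Q1, Q2, Q3 or Q4). Convert the strings with as.Date() first, then call quarters(), which dispatches to quarters.Date():

my_dates <- as.Date(c("2020-01-01", "2005-03-25", "2010-04-02", "2020-12-10", "2011-08-15"))

quarters(my_dates)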
## [1] "Q1" "Q1" "Q2" "Q4" "Q3"
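Because quarters() drops the year, dates from different years end up in the same quarter. The sketch below shows one way to build unambiguous year-quarter labels with format(). The "YYYY-Qn" label format is our own choice.

```r
my_dates <- as.Date(c("2020-01-01", "2005-03-25", "2010-04-02", "2020-12-10", "2011-08-15"))

# Combine the year with the quarter to get unambiguous period labels
year_quarter <- paste(format(my_dates, "%Y"), quarters(my_dates), sep = "-")
year_quarter
# "2020-Q1" "2005-Q1" "2010-Q2" "2020-Q4" "2011-Q3"

# Count observations per quarter, ignoring the year
table(quarters(my_dates))
```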
Mohamed El Fodil Ihaddaden
Ph.D. candidate in Economics.

My research interests include Performance Management, Efficiency Analysis and Experimental Economics.
