agrep()
The countries vector below lists some African countries. The last three values contain mistakes (Algerria, Morocoo and algeri), which is common in real-world data. The agrep() function handles this situation through approximate (fuzzy) pattern matching. Suppose we want to extract the elements of the countries vector that approximately match the word Algeria:
countries <- c("Algeria", "Morocco", "Tunisia", "Mali", "Tchad", "Kenya", "Algerria", "Morocoo", "algeri")
indexes <- agrep(pattern = "Algeria", x = countries, ignore.case = TRUE)
countries[indexes]
## [1] "Algeria" "Algerria" "algeri"
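To see why these three elements match, it can help to look at the edit distances directly. The sketch below uses adist() from the utils package, which computes the generalized Levenshtein distance between whole strings. agrep() looks for approximate substrings instead, so the two can disagree on longer inputs, and the threshold of one edit is an assumption chosen for this example.

```r
countries <- c("Algeria", "Morocco", "Tunisia", "Mali", "Tchad", "Kenya",
               "Algerria", "Morocoo", "algeri")

# Case-insensitive edit distance between the pattern and every element
distances <- as.vector(adist("Algeria", countries, ignore.case = TRUE))
names(distances) <- countries
distances

# Keep the elements that are at most one edit away (assumed threshold)
close_matches <- countries[distances <= 1]
close_matches
## [1] "Algeria"  "Algerria" "algeri"
```

Printing distances shows how far each misspelling is from the pattern, which is a quick way to choose a sensible max.distance for agrep().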
abbreviate()
The same problem can also be approached with the abbreviate() function, which maps the variant spellings to a common abbreviation:
# Convert the names to lower case
countries_lower <- tolower(countries)
abbreviate(
  names.arg = countries_lower,
  minlength = 3,
  strict = TRUE, # Enforce minlength strictly; duplicated abbreviations are allowed
  named = FALSE
)
## [1] "alg" "mrc" "tns" "mal" "tch" "kny" "alg" "mrc" "alg"
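One way to use these abbreviations is as grouping keys. The sketch below, which reuses the vector from the agrep() example, splits the original spellings by their abbreviation. Treat it as an illustration only: with strict = TRUE, unrelated names can end up sharing an abbreviation, so the groups should be checked by eye.

```r
countries <- c("Algeria", "Morocco", "Tunisia", "Mali", "Tchad", "Kenya",
               "Algerria", "Morocoo", "algeri")

abbr <- abbreviate(tolower(countries), minlength = 3, strict = TRUE, named = FALSE)

# Group the original spellings by their abbreviation
groups <- split(countries, abbr)
groups$alg
## [1] "Algeria"  "Algerria" "algeri"
groups$mrc
## [1] "Morocco" "Morocoo"
```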
table()
table() counts how many times each value appears in a vector.
countries <- c("Algeria", "Algeria", "Mali", "Kenya", "Mali", "Mali", "Senegal", "Uganda", "Senegal", "Morocco", "Senegal", "Senegal", "Senegal", NA, NA, NA, NA, NA, NA)
table(countries, useNA = "no")
## countries
## Algeria   Kenya    Mali Morocco Senegal  Uganda
##       2       1       3       1       5       1
Set the useNA argument to "always" to include the count of NAs:
table(countries, useNA = "always")
## countries
## Algeria   Kenya    Mali Morocco Senegal  Uganda    <NA>
##       2       1       3       1       5       1       6
To sort the values by frequency:
my_tab <- table(countries, useNA = "no")
sort(x = my_tab, decreasing = TRUE)
## countries
## Senegal    Mali Algeria   Kenya Morocco  Uganda
##       5       3       2       1       1       1
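Counts are often easier to compare as proportions. As a small sketch, prop.table() divides each count by the total, here computed over the non-missing values only:

```r
countries <- c("Algeria", "Algeria", "Mali", "Kenya", "Mali", "Mali", "Senegal",
               "Uganda", "Senegal", "Morocco", "Senegal", "Senegal", "Senegal",
               NA, NA, NA, NA, NA, NA)
my_tab <- table(countries, useNA = "no")

# Share of each country among the 13 non-missing values
shares <- prop.table(my_tab)
round(sort(shares, decreasing = TRUE), 2)
```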
You can quickly visualize the distribution of the countries vector:
sort_tab <- sort(x = my_tab, decreasing = TRUE)
barplot(sort_tab, ylab = "Counts", col = "steelblue")
search()
search() is a handy function that lists the packages and environments attached to your current session.
search()
## [1] ".GlobalEnv" "package:stats" "package:graphics"
## [4] "package:grDevices" "package:utils" "package:datasets"
## [7] "package:methods" "Autoloads" "package:base"
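A quick way to see search() in action is to compare its result before and after attaching a package. The sketch below uses the tools package only because it ships with R; any installed package would do.

```r
before <- search()
library(tools)  # 'tools' ships with R, so no installation is needed
after <- search()

# Entries that were added to the search path by library()
setdiff(after, before)

# Check whether a given package is attached
"package:tools" %in% search()
## [1] TRUE
```

If tools was already attached in your session, setdiff() returns an empty character vector.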
jitter()
jitter() adds a small amount of random noise to a numeric vector:
# Compare with mtcars$mpg to see the difference
jitter(mtcars$mpg)
## [1] 20.99445 21.00335 22.79127 21.39489 18.71063 18.10950 14.28911 24.40835
## [9] 22.79397 19.21747 17.81193 16.41927 17.28185 15.20250 10.41295 10.41621
## [17] 14.68476 32.41836 30.39589 33.88208 21.50608 15.51768 15.19893 13.31650
## [25] 19.19906 27.30775 26.00450 30.38734 15.81134 19.70238 15.00235 21.39310
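Because the noise is random, the values above change on every run. The sketch below fixes a seed (the value 42 is arbitrary) and sets the amount argument explicitly, so that the noise is drawn uniformly between -0.5 and 0.5:

```r
set.seed(42)  # arbitrary seed, only to make the noise reproducible
mpg <- mtcars$mpg

# Noise drawn uniformly from [-0.5, 0.5]
noisy <- jitter(mpg, amount = 0.5)

# No value moves by more than the chosen amount
max(abs(noisy - mpg)) <= 0.5
## [1] TRUE
```

A common use is to spread out overlapping points in a scatter plot, for example plot(jitter(mtcars$cyl), mtcars$mpg).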
comment()
The comment() function is useful when you want to attach a comment to a specific object. The comment is not displayed when the object is printed.
comment(mtcars) <- "This data frame has no NAs, go ahead !"
comment(mtcars)
## [1] "This data frame has no NAs, go ahead !"
The attributes() function also retrieves the comment:
attributes(mtcars)
## $names
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
##
## $row.names
## [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
## [4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant"
## [7] "Duster 360" "Merc 240D" "Merc 230"
## [10] "Merc 280" "Merc 280C" "Merc 450SE"
## [13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood"
## [16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128"
## [19] "Honda Civic" "Toyota Corolla" "Toyota Corona"
## [22] "Dodge Challenger" "AMC Javelin" "Camaro Z28"
## [25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2"
## [28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino"
## [31] "Maserati Bora" "Volvo 142E"
##
## $class
## [1] "data.frame"
##
## $comment
## [1] "This data frame has no NAs, go ahead !"
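The same mechanism works for any R object, and a comment can be removed by assigning NULL. A minimal sketch on a made-up vector (the values and the comment text are hypothetical):

```r
scores <- c(12, 15, 9)  # hypothetical values
comment(scores) <- "Units: points out of 20"

print(scores)    # the comment is not shown
comment(scores)  # the comment is retrieved explicitly

# Assigning NULL removes the comment
comment(scores) <- NULL
is.null(comment(scores))
## [1] TRUE
```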
make.unique()
make.unique() appends a sequence number to duplicated elements so that every element of a character vector becomes unique:
countries <- c("Algeria", "Morocco", "Algeria", "Algeria", "Morocco", "Tunisia", "Morocco", "Tunisia")
make.unique(names = countries, sep = " -_- ")
## [1] "Algeria" "Morocco" "Algeria -_- 1" "Algeria -_- 2"
## [5] "Morocco -_- 1" "Tunisia" "Morocco -_- 2" "Tunisia -_- 1"
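A typical use is cleaning up duplicated column names. The sketch below uses hypothetical names and the default separator, which is a dot:

```r
col_names <- c("id", "value", "value", "id", "value")  # hypothetical names

unique_names <- make.unique(col_names)  # default separator is "."
unique_names
## [1] "id"      "value"   "value.1" "id.1"    "value.2"

# anyDuplicated() returns 0 when no duplicates remain
anyDuplicated(unique_names)
## [1] 0
```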
startsWith() and endsWith()
startsWith() and endsWith() test whether each element of a character vector starts or ends with a given string:
countries <- c("Armenia", "Argentina", "Antalya", "Adelaide", "Abidjan")
startsWith(x = countries, prefix = "Ar")
## [1] TRUE TRUE FALSE FALSE FALSE
countries <- c("Armenia", "Argentina", "Antalya", "Adelaide", "Abidjan")
endsWith(x = countries, suffix = "an")
## [1] FALSE FALSE FALSE FALSE TRUE
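Since both functions return logical vectors, they can be combined and used to subset. The sketch below, on a vector named places for this example, also shows that the comparison is case-sensitive:

```r
places <- c("Armenia", "Argentina", "Antalya", "Adelaide", "Abidjan")

# Keep the names that start with "A" and end with "a"
a_to_a <- places[startsWith(places, "A") & endsWith(places, "a")]
a_to_a
## [1] "Armenia"   "Argentina" "Antalya"

# Matching is case-sensitive: "ar" does not match "Ar"
any(startsWith(places, "ar"))
## [1] FALSE

# Lower-case the input first for a case-insensitive test
lower_match <- startsWith(tolower(places), "ar")
lower_match
## [1]  TRUE  TRUE FALSE FALSE FALSE
```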
quarters.Date()
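quarters.Date() is the Date method behind the quarters() generic. It converts a date to its corresponding quarter (Q1, Q2, Q3 or Q4). In practice, convert the strings with as.Date() and call quarters(), which dispatches to the method:
my_dates <- as.Date(c("2020-01-01", "2005-03-25", "2010-04-02", "2020-12-10", "2011-08-15"))
quarters(my_dates)
## [1] "Q1" "Q1" "Q2" "Q4" "Q3"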
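The quarter alone loses the year, so it is often combined with format() to build a year-quarter label. A small sketch on the same dates (the label layout is just one possible choice):

```r
my_dates <- as.Date(c("2020-01-01", "2005-03-25", "2010-04-02",
                      "2020-12-10", "2011-08-15"))

# Build labels such as "2020 Q1" from the year and the quarter
year_quarter <- paste(format(my_dates, "%Y"), quarters(my_dates))
year_quarter
## [1] "2020 Q1" "2005 Q1" "2010 Q2" "2020 Q4" "2011 Q3"
```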