Intro to the case_when function

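case_when() is a dplyr function that returns a value according to predefined conditions. It checks the conditions in order and, for each element, returns the value paired with the first condition that is true, which makes it a vectorized alternative to nested if/else statements. It’s a very powerful function, though not as well known as it deserves. In our example, we’ll use the mtcars dataset (as usual).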

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Suppose that, in the context of a study, we want to sort the cars into two categories, climate friendly and climate unfriendly, according to whether their horsepower (hp) is below the median. We can create a new column that incorporates this information using the case_when() and mutate() functions:

library(dplyr) # also provides the %>% pipe, so loading magrittr is not needed

# first, we calculate the median (named hp_median so it does not share a name with the median() function)
hp_median <- median(mtcars$hp)

new_df <- mtcars %>% mutate(climat_categ = case_when(
    hp < hp_median ~ "friendly",
    hp >= hp_median ~ "unfriendly"
    ))

# store the new column as a factor
new_df$climat_categ <- as.factor(new_df$climat_categ)

new_df %>% select(hp, climat_categ)  # a quick check
##     hp climat_categ
## 1  110     friendly
## 2  110     friendly
## 3   93     friendly
## 4  110     friendly
## 5  175   unfriendly
## 6  105     friendly
## 7  245   unfriendly
## 8   62     friendly
## 9   95     friendly
## 10 123   unfriendly
## 11 123   unfriendly
## 12 180   unfriendly
## 13 180   unfriendly
## 14 180   unfriendly
## 15 205   unfriendly
## 16 215   unfriendly
## 17 230   unfriendly
## 18  66     friendly
## 19  52     friendly
## 20  65     friendly
## 21  97     friendly
## 22 150   unfriendly
## 23 150   unfriendly
## 24 245   unfriendly
## 25 175   unfriendly
## 26  66     friendly
## 27  91     friendly
## 28 113     friendly
## 29 264   unfriendly
## 30 175   unfriendly
## 31 335   unfriendly
## 32 109     friendly
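
The two conditions above cover every non-missing hp value, but it helps to know what case_when() does with values that match no condition. The following is a minimal sketch, not part of the original analysis: hp_values is a made-up vector, and the cutoff 123 is hard-coded purely for illustration. Conditions are checked in order, an element that matches nothing becomes NA, and a final TRUE condition can serve as a catch-all.

```r
library(dplyr)

# Hypothetical values, chosen only to illustrate the matching rules
hp_values <- c(52, 123, 335, NA)

# Without a catch-all, elements that match no condition become NA
no_fallback <- case_when(
  hp_values < 123 ~ "friendly"
)

# With a trailing TRUE, everything left over (including NA) gets a label
with_fallback <- case_when(
  hp_values < 123  ~ "friendly",
  hp_values >= 123 ~ "unfriendly",
  TRUE             ~ "unknown"
)

no_fallback
with_fallback
```

Here the missing value ends up labelled "unknown" because neither comparison is TRUE for NA, so only the catch-all applies to it.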

Let’s plot the number of cars in each ‘climat_categ’ category.

library(ggplot2)

ggplot(new_df, aes(x = climat_categ)) +
  geom_bar(stat = "count", fill = "darkolivegreen1", color = "blue") +
  theme_classic() +
  labs(title = "Count distribution of eco and non-eco friendly cars", x = "")
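
To read the exact numbers behind the bars, the same counts can be computed as a table. This is a small sketch that rebuilds the category column from mtcars so it runs on its own; the name category_counts is only illustrative, and count() is the dplyr function that tallies rows per group.

```r
library(dplyr)

# Rebuild the category column and count the cars in each category
category_counts <- mtcars %>%
  mutate(climat_categ = case_when(
    hp < median(hp)  ~ "friendly",
    hp >= median(hp) ~ "unfriendly"
  )) %>%
  count(climat_categ)

category_counts
```

The two counts should add up to the 32 cars in mtcars and match the heights of the bars in the plot.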

Mohamed El Fodil Ihaddaden
Ph.D candidate in Economics.

My research interests include Performance Management, Data Envelopment Analysis and Artificial Intelligence applied to Economics.