An Introduction to the ggplot2 package.

The ggplot2 package follows the tidy approach to programming promoted mainly by Hadley Wickham. It is part of the tidyverse, a collection that includes several other packages built on the same principles. In this tutorial, we’ll present some basic ggplot2 functionality using the mtcars dataset.

First of all, you need to install the ggplot2 package, which can be done either with

install.packages("tidyverse")

or

install.packages("ggplot2")
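If you are not sure whether the package is already installed, a small guard avoids reinstalling it on every run. This is only a sketch: requireNamespace() is a base R function, and the variable name pkg is chosen here purely for illustration.

```r
# Install ggplot2 only when it is not already available
pkg <- "ggplot2"  # illustrative variable name
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Attach the package for the current session
library(ggplot2)
```

The same pattern should work for "tidyverse" or any other package name.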

Plotting with ggplot

The mtcars object is a data frame included in R that records fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). Let us take a quick look at the mtcars dataset.

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
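Beyond head(), a few base R calls give a quick feel for the size and content of the data. The snippet below is a minimal sketch using only base R functions; the choice of the mpg and wt columns for the summary is ours, since these are the two variables used in the plots of this tutorial.

```r
# mtcars ships with base R (datasets package), so no library() call is needed
dim(mtcars)     # 32 rows (cars) and 11 columns (variables)
names(mtcars)   # the column names

# Numeric summary of the two variables used in the scatterplots
summary(mtcars[, c("mpg", "wt")])
```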

We want to plot the relationship between the miles per gallon (mpg) and the weight (wt) of our car models. To do that in ggplot2, one needs to identify three principal elements (or two, depending on the desired graph): the data frame from which the information will be extracted, the x-axis variable and the y-axis variable.

library(ggplot2) # or library(tidyverse)

ggplot(data = mtcars, mapping = aes(x = mpg, y = wt))

In the example above, we used the function ggplot with two arguments, data and mapping, to structure our graph. The mapping argument determines which variable goes on which axis. On its own, this call only draws an empty set of axes. To plot our values, we need to add a layer to the code with the + operator. Suppose we want a scatterplot:

ggplot(data = mtcars, mapping = aes(x = mpg, y = wt)) +
  geom_point()   # geom_point() refers to scatterplot
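One way to see that a plot is built layer by layer is to store the intermediate objects and count their layers. This is a sketch for illustration only: the object names base_plot and scatter are ours, and inspecting the layers element relies on the internal structure of ggplot objects, which may vary between ggplot2 versions.

```r
library(ggplot2)

# Axes only: data and mapping are declared, but nothing is drawn yet
base_plot <- ggplot(data = mtcars, mapping = aes(x = mpg, y = wt))

# Adding geom_point() with + appends one layer to the plot
scatter <- base_plot + geom_point()

length(base_plot$layers)  # number of layers before adding geom_point()
length(scatter$layers)    # number of layers after adding geom_point()
```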

To modify the size or the color of the displayed points, we just pass the following arguments to the geom_point() function:

ggplot(data = mtcars, mapping = aes(x = mpg, y = wt)) +
  geom_point(size = 3, color = "red")   # size and color are set inside geom_point()
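The colour names accepted by the color argument can also be listed from within R. The sketch below uses colors(), a base R function from the grDevices package; the variable name all_colours is only illustrative.

```r
# colors() returns the names of all built-in R colours
all_colours <- colors()

length(all_colours)        # how many named colours R knows
"red" %in% all_colours     # TRUE: "red" is a valid colour name

# All colour names containing "pink"
grep("pink", all_colours, value = TRUE)
```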

For an exhaustive list of R colours, just check this website. Now, if we want to modify the theme of our plot, we proceed as follows:

ggplot(data = mtcars, mapping = aes(x = mpg, y = wt)) +
  geom_point(size = 3, color = "red") + 
  theme_bw()  # just type theme_ and wait for autocompletion to list several choices
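To compare a few of the built-in themes, it can help to store the plot once and add a different theme each time. This is a minimal sketch: theme_minimal(), theme_classic() and labs() are ggplot2 functions, while the object name p and the label texts are our own choices (wt is recorded in units of 1000 lbs in the mtcars documentation).

```r
library(ggplot2)

# Store the unthemed scatterplot once
p <- ggplot(data = mtcars, mapping = aes(x = mpg, y = wt)) +
  geom_point(size = 3, color = "red")

p + theme_minimal()   # minimalistic theme with no background annotations
p + theme_classic()   # axis lines and no gridlines

# Themes combine with other additions, such as readable axis labels
p + theme_bw() +
  labs(x = "Miles per gallon", y = "Weight (1000 lbs)")
```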

Finally, it is possible to assign a plot to a variable:

plot1 <- ggplot(data = mtcars, mapping = aes(x = mpg, y = wt)) +
  geom_point(size = 3, color = "red") + 
  theme_bw()  
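Once a plot is stored in a variable, it can be displayed again or written to a file. The sketch below uses ggsave() from ggplot2; the output path, the dimensions and the resolution are assumptions chosen for illustration.

```r
library(ggplot2)

plot1 <- ggplot(data = mtcars, mapping = aes(x = mpg, y = wt)) +
  geom_point(size = 3, color = "red") +
  theme_bw()

# Display the stored plot
print(plot1)

# Save it as a PNG file (hypothetical path in the session's temporary folder)
out_file <- file.path(tempdir(), "plot1.png")
ggsave(out_file, plot = plot1, width = 6, height = 4, dpi = 150)
```

By default the width and height are given in inches.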

Plotting Interactively with Plotly

Plotly is an interesting package that allows us to create interactive web graphics from ggplot2 plots. To convert our scatterplot above into an interactive one, just use the function ggplotly:

library(plotly)  # Load the package after installing it

ggplotly(plot1)

Several tools, such as zooming and downloading the plot as an image, appear in the top-right corner of the plot.
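The interactive plot can also be customised and saved as a standalone web page. This is a hedged sketch: it assumes the plotly and htmlwidgets packages are installed (htmlwidgets is normally installed alongside plotly), and the object name interactive_plot and the output path are hypothetical.

```r
library(ggplot2)
library(plotly)

plot1 <- ggplot(data = mtcars, mapping = aes(x = mpg, y = wt)) +
  geom_point(size = 3, color = "red") +
  theme_bw()

# Show only the x and y values when hovering over a point
interactive_plot <- ggplotly(plot1, tooltip = c("x", "y"))

# Write the widget to an HTML file (hypothetical path)
out_html <- file.path(tempdir(), "plot1.html")
htmlwidgets::saveWidget(interactive_plot, out_html, selfcontained = FALSE)
```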

Plotting Distributions with Histograms

Histograms are suitable for plotting the distribution of a continuous variable. If you want to make a histogram, use geom_histogram:

ggplot(data = mtcars, mapping = aes(x = mpg)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

We can change the width of the bins using the binwidth parameter:

ggplot(data = mtcars, mapping = aes(x = mpg)) +
  geom_histogram(binwidth = 3)

By default, the data is grouped into 30 bins. To change the number of bins, we use the parameter bins:

ggplot(data = mtcars, mapping = aes(x = mpg)) +
  geom_histogram(bins = 6)
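To check how the observations were actually binned, the computed values behind a histogram can be extracted with layer_data(), a ggplot2 function. This is a small sketch; the object names p_bins and bin_table are illustrative.

```r
library(ggplot2)

p_bins <- ggplot(data = mtcars, mapping = aes(x = mpg)) +
  geom_histogram(bins = 6)

# One row per bin, with its limits and the number of cars it contains
bin_table <- layer_data(p_bins)
bin_table[, c("xmin", "xmax", "count")]

# Every car falls in exactly one bin, so the counts add up to the number of rows
sum(bin_table$count) == nrow(mtcars)
```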

To modify the color of the histogram, just proceed as follows:

ggplot(data = mtcars, mapping = aes(x = mpg)) +
  geom_histogram(fill = "pink", color = "yellow") # fill colours the bars, color their outlines

We can also use the plotly package to plot an interactive histogram.

histogram <- ggplot(data = mtcars, mapping = aes(x = mpg)) +
  geom_histogram(fill = "pink", color = "yellow") 

ggplotly(histogram)

Filling Histograms with Categories

Suppose we are interested in plotting the distribution of the miles per gallon (mpg) variable, but this time we want to differentiate between automatic and manual cars (am: 0 = automatic, 1 = manual). First of all, we must convert the am column into a factor variable:

mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual")) # We provide labels 

ggplot(data = mtcars, mapping = aes(x = mpg, fill = am)) +
  geom_histogram() 
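It is worth checking that the labels were attached to the right codes, and noting that the bars of the two groups are stacked on top of each other by default. The sketch below works on a copy of the data (the name mtcars_f is ours) and passes the levels explicitly so that the mapping 0 = Automatic, 1 = Manual is visible in the code. The position and alpha values in the second part are one possible choice for overlaying the groups instead of stacking them.

```r
library(ggplot2)

# Work on a copy so the original mtcars stays untouched
mtcars_f <- mtcars
mtcars_f$am <- factor(mtcars_f$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))

levels(mtcars_f$am)   # "Automatic" "Manual"
table(mtcars_f$am)    # 19 automatic and 13 manual cars

# Overlay the two distributions with transparency instead of stacking them
ggplot(data = mtcars_f, mapping = aes(x = mpg, fill = am)) +
  geom_histogram(bins = 10, position = "identity", alpha = 0.5)
```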

From the above histogram, we observe that manual vehicles tend to have a higher mpg.
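The visual impression can be backed up with a quick numerical check. The following sketch uses only base R and computes the average mpg per transmission type; the variable name mean_mpg is illustrative. It works whether or not am has already been converted to a factor.

```r
# Average miles per gallon for each value of am
mean_mpg <- tapply(mtcars$mpg, mtcars$am, mean)

# Roughly 17.1 for automatic cars and 24.4 for manual cars
round(mean_mpg, 1)
```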

Splitting Plots with Facets

We showed above how to plot a distribution according to a specific category. Alternatively, one may generate two distinct panels using facet_grid():

ggplot(data = mtcars, mapping = aes(x = mpg)) +
  geom_histogram(fill = "yellow", color = "pink") +
  facet_grid(~am) +
  theme_light()
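The panels do not have to sit side by side. As a sketch of two alternatives, the formula am ~ . in facet_grid() arranges the panels in rows, and facet_wrap() with ncol = 1 gives a similar stacked layout. The object names mtcars_f and p_hist and the number of bins are our own choices.

```r
library(ggplot2)

# Copy of the data with am as a labelled factor
mtcars_f <- mtcars
mtcars_f$am <- factor(mtcars_f$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))

p_hist <- ggplot(data = mtcars_f, mapping = aes(x = mpg)) +
  geom_histogram(bins = 10, fill = "yellow", color = "pink") +
  theme_light()

p_hist + facet_grid(am ~ .)           # one row per transmission type
p_hist + facet_wrap(~ am, ncol = 1)   # similar layout with facet_wrap()
```

Stacking the panels keeps the x-axis aligned, which makes the two distributions easier to compare.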

Mohamed El Fodil Ihaddaden
Ph.D candidate in Economics.

My research interests include Performance Management, Data Envelopment Analysis and Artificial Intelligence applied to Economics.