Three ways to get a summary of your data

Summary

To get a first idea of a data frame's summary statistics, there is of course the well-known summary() function:

summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
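
As a small aside, summary() adapts its output to the type of each column. The sketch below assumes we recode am as a factor; the labels follow the mtcars help page (0 = automatic, 1 = manual), and the name cars is only an illustrative copy of the data.

```r
# Work on a copy so the built-in mtcars stays untouched
cars <- mtcars

# Recode the transmission column as a factor (0 = automatic, 1 = manual)
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# The numeric column gets min/quartiles/mean/max, the factor gets level counts
summary(cars[, c("mpg", "am")])
```

For the factor column, summary() reports how many cars fall in each level instead of quartiles and a mean.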

However, there are other interesting functions that provide more structured and exhaustive information.

describe

The psych package has a function called describe() that provides several statistics that are not available within the summary() function.
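
To make a few of those extra statistics concrete, here is a rough base R sketch for the mpg column only. It assumes that describe() uses its default 10% trimming and that se is the standard deviation divided by the square root of n; the name extra_stats is just for illustration.

```r
x <- mtcars$mpg

extra_stats <- c(
  sd      = sd(x),                    # standard deviation
  trimmed = mean(x, trim = 0.1),      # mean after dropping 10% at each end
  mad     = mad(x),                   # median absolute deviation (scaled by 1.4826)
  se      = sd(x) / sqrt(length(x))   # standard error of the mean
)

round(extra_stats, 2)
```

Under those assumptions, the rounded values should line up with the sd, trimmed, mad and se columns of the mpg row in the describe() output shown next.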

library(psych)

describe(mtcars)
##      vars  n   mean     sd median trimmed    mad   min    max  range  skew
## mpg     1 32  20.09   6.03  19.20   19.70   5.41 10.40  33.90  23.50  0.61
## cyl     2 32   6.19   1.79   6.00    6.23   2.97  4.00   8.00   4.00 -0.17
## disp    3 32 230.72 123.94 196.30  222.52 140.48 71.10 472.00 400.90  0.38
## hp      4 32 146.69  68.56 123.00  141.19  77.10 52.00 335.00 283.00  0.73
## drat    5 32   3.60   0.53   3.70    3.58   0.70  2.76   4.93   2.17  0.27
## wt      6 32   3.22   0.98   3.33    3.15   0.77  1.51   5.42   3.91  0.42
## qsec    7 32  17.85   1.79  17.71   17.83   1.42 14.50  22.90   8.40  0.37
## vs      8 32   0.44   0.50   0.00    0.42   0.00  0.00   1.00   1.00  0.24
## am      9 32   0.41   0.50   0.00    0.38   0.00  0.00   1.00   1.00  0.36
## gear   10 32   3.69   0.74   4.00    3.62   1.48  3.00   5.00   2.00  0.53
## carb   11 32   2.81   1.62   2.00    2.65   1.48  1.00   8.00   7.00  1.05
##      kurtosis    se
## mpg     -0.37  1.07
## cyl     -1.76  0.32
## disp    -1.21 21.91
## hp      -0.14 12.12
## drat    -0.71  0.09
## wt      -0.02  0.17
## qsec     0.34  0.32
## vs      -2.00  0.09
## am      -1.92  0.09
## gear    -1.07  0.13
## carb     1.26  0.29

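describe() puts a star next to the name of any non-numeric variable, such as a factor: these variables are converted to numeric codes before the statistics are computed, so measuring, for example, their mean or standard deviation doesn't make sense. In mtcars every column, am included, is stored as numeric, which is why no star appears in the output above, even though am is really a categorical variable.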
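
To see the star in practice, one option is to turn am into a factor before calling describe(). This is only a sketch: the copy named cars and the factor labels are assumptions made for the example, not part of the original data.

```r
library(psych)

# Copy mtcars and store the transmission type as a factor
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Describe two numeric columns and the factor
res <- describe(cars[, c("mpg", "hp", "am")])

# The row for the factor should now be labelled with a trailing star
rownames(res)
```

The statistics on the starred row are computed on the integer codes of the factor levels, so they should be read with caution.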

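For this situation, describe() has an omit argument that leaves out non-numeric variables instead of converting them. Since mtcars contains only numeric columns, the output below is the same as before.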

describe(mtcars, omit = TRUE)
##      vars  n   mean     sd median trimmed    mad   min    max  range  skew
## mpg     1 32  20.09   6.03  19.20   19.70   5.41 10.40  33.90  23.50  0.61
## cyl     2 32   6.19   1.79   6.00    6.23   2.97  4.00   8.00   4.00 -0.17
## disp    3 32 230.72 123.94 196.30  222.52 140.48 71.10 472.00 400.90  0.38
## hp      4 32 146.69  68.56 123.00  141.19  77.10 52.00 335.00 283.00  0.73
## drat    5 32   3.60   0.53   3.70    3.58   0.70  2.76   4.93   2.17  0.27
## wt      6 32   3.22   0.98   3.33    3.15   0.77  1.51   5.42   3.91  0.42
## qsec    7 32  17.85   1.79  17.71   17.83   1.42 14.50  22.90   8.40  0.37
## vs      8 32   0.44   0.50   0.00    0.42   0.00  0.00   1.00   1.00  0.24
## am      9 32   0.41   0.50   0.00    0.38   0.00  0.00   1.00   1.00  0.36
## gear   10 32   3.69   0.74   4.00    3.62   1.48  3.00   5.00   2.00  0.53
## carb   11 32   2.81   1.62   2.00    2.65   1.48  1.00   8.00   7.00  1.05
##      kurtosis    se
## mpg     -0.37  1.07
## cyl     -1.76  0.32
## disp    -1.21 21.91
## hp      -0.14 12.12
## drat    -0.71  0.09
## wt      -0.02  0.17
## qsec     0.34  0.32
## vs      -2.00  0.09
## am      -1.92  0.09
## gear    -1.07  0.13
## carb     1.26  0.29
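
The effect of omit is easier to see on a dataset that really contains a non-numeric column. The sketch below uses the built-in iris data, which has four numeric columns and one factor (Species); the object names are illustrative.

```r
library(psych)

# Species is a factor: by default it is converted to codes and starred
with_factor <- describe(iris)

# With omit = TRUE the non-numeric column is left out of the result
without_factor <- describe(iris, omit = TRUE)

rownames(with_factor)
rownames(without_factor)
```

Comparing the two sets of row names shows which variables were dropped.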

skim

Finally, we present the skim() function from the skimr package, which provides some statistics and a tiny inline histogram for each numeric variable (the histograms may not render correctly on every website or console, so just try it out yourself).

library(skimr)
## 
## Attaching package: 'skimr'
## The following object is masked from 'package:stats':
## 
##     filter
skim(mtcars)
## Skim summary statistics
##  n obs: 32 
##  n variables: 11 
## 
## -- Variable type:numeric --------------------------
##  variable missing complete  n   mean     sd    p0    p25    p50    p75
##        am       0       32 32   0.41   0.5   0      0      0      1   
##      carb       0       32 32   2.81   1.62  1      2      2      4   
##       cyl       0       32 32   6.19   1.79  4      4      6      8   
##      disp       0       32 32 230.72 123.94 71.1  120.83 196.3  326   
##      drat       0       32 32   3.6    0.53  2.76   3.08   3.7    3.92
##      gear       0       32 32   3.69   0.74  3      3      4      4   
##        hp       0       32 32 146.69  68.56 52     96.5  123    180   
##       mpg       0       32 32  20.09   6.03 10.4   15.43  19.2   22.8 
##      qsec       0       32 32  17.85   1.79 14.5   16.89  17.71  18.9 
##        vs       0       32 32   0.44   0.5   0      0      0      1   
##        wt       0       32 32   3.22   0.98  1.51   2.58   3.33   3.61
##    p100     hist
##    1    ▇▁▁▁▁▁▁▆
##    8    ▆▇▂▇▁▁▁▁
##    8    ▆▁▁▃▁▁▁▇
##  472    ▇▆▁▂▅▃▁▂
##    4.93 ▃▇▁▅▇▂▁▁
##    5    ▇▁▁▆▁▁▁▂
##  335    ▃▇▃▅▂▃▁▁
##   33.9  ▃▇▇▇▃▂▂▂
##   22.9  ▃▂▇▆▃▃▁▁
##    1    ▇▁▁▁▁▁▁▆
##    5.42 ▃▃▃▇▆▁▁▂
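
skim() can also be restricted to a few columns by passing their unquoted names after the data frame. The following is a minimal sketch; the printed layout depends on the installed skimr version and may differ from the output above, and the name skimmed is just for illustration.

```r
library(skimr)

# Skim only the columns of interest (unquoted names, as in dplyr::select)
skimmed <- skim(mtcars, mpg, hp)

# Print the summary for the two selected columns
skimmed

# The result is a data frame, so it can be stored or processed further
is.data.frame(skimmed)
```

Because the result is a regular data frame underneath, it can be filtered or exported like any other table.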

I’m sure there are many other super cool functions out there. Please DM me at @IhaddadenFodil if I’ve missed something.

Mohamed El Fodil Ihaddaden
Ph.D candidate in Economics.

My research interests include Performance Management, Data Envelopment Analysis and Artificial Intelligence applied to Economics.
