Introduction to the tidyverse
The tidyverse is a collection of packages that aims to make R more consistent and closer in feel to programming languages designed by computer scientists rather than statisticians.
You can think of it as a more modern version of R.
Base R or tidyverse?
“Base R” refers to the use of the standard R library. The expression is often used in contrast to the tidyverse.
There are many things that you can do with either base R or the tidyverse. Because the syntaxes are quite different, it almost feels like using two different languages, and people tend to favour one or the other.
Which one you should use is really up to you.
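To give a feel for how different the two syntaxes are, here is a small sketch that performs the same task both ways. The task itself (mean fuel consumption per number of gears, for cars with more than four cylinders in the built-in `mtcars` dataset) is only an illustration chosen for this example, and the native pipe `|>` assumes R 4.1 or later.

```r
# Illustrative task: mean mpg per gear count for cars with more than 4 cylinders.
# mtcars ships with R, so no data file is needed.

# Base R: subset with brackets, then aggregate with a formula
big_cars <- mtcars[mtcars$cyl > 4, ]
base_result <- aggregate(mpg ~ gear, data = big_cars, FUN = mean)

# Tidyverse (dplyr): chain verbs with the pipe (|> requires R >= 4.1)
library(dplyr)
tidy_result <- mtcars |>
  filter(cyl > 4) |>
  group_by(gear) |>
  summarise(mpg = mean(mpg))

base_result
tidy_result
```

Both versions should give the same numbers; the base R one returns a data frame and the dplyr one a tibble.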
Base R | Tidyverse |
---|---|
Preferred by old-schoolers | Increasingly becoming the norm with newer R users |
More stable | More consistent syntax and behaviour |
Doesn’t require installing and loading packages | More and more resources and documentation available |
In truth, even though the tidyverse has many detractors amongst old R users, it is increasingly becoming the norm.
A glimpse of the tidyverse
The best introduction to the tidyverse is probably the book R for Data Science by Hadley Wickham and Garrett Grolemund.
Posit (the company behind the tidyverse, formerly known as RStudio Inc.) has developed a series of useful cheatsheets. Below are links to the ones you are most likely to use as you get started with R.
Data import
The first thing you often need to do is to import your data into R. This is done with readr.
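As a minimal sketch of data import with readr, the example below first writes a tiny CSV to a temporary file so that it is self-contained, then reads it back with `read_csv()`. The column names `id` and `name` are made up for the illustration; with your own data you would pass the path to your file instead.

```r
library(readr)

# Create a small CSV in a temporary location (stand-in for a real data file)
path <- tempfile(fileext = ".csv")
write_csv(data.frame(id = 1:3, name = c("a", "b", "c")), path)

# Read it back, stating the expected type of each column explicitly
dat <- read_csv(
  path,
  col_types = cols(id = col_integer(), name = col_character())
)

dat
```

Spelling out `col_types` is optional (readr guesses types otherwise), but it makes the import reproducible and flags unexpected values early.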
Data transformation
You then often need to transform your data into the right format. This is done with the packages dplyr and tidyr.
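The sketch below shows one typical transformation, assuming a small made-up table of counts per site and year: tidyr's `pivot_longer()` reshapes it from wide to long, and dplyr's `mutate()` and `filter()` then clean and subset it. The native pipe `|>` assumes R 4.1 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one column per year
wide <- tibble(
  site  = c("A", "B"),
  y2020 = c(10, 20),
  y2021 = c(15, 25)
)

long <- wide |>
  # tidyr: one row per site-year combination
  pivot_longer(cols = c(y2020, y2021), names_to = "year", values_to = "count") |>
  # dplyr: turn "y2020" into the integer 2020
  mutate(year = as.integer(sub("y", "", year))) |>
  # dplyr: keep only rows with a count above 10
  filter(count > 10)

long
```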
Visualization
Visualization in the tidyverse is done with the ggplot2 package which we will explore in the next section.
Working with factors
The package forcats offers the tidyverse approach to working with factors.
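A common reason to reach for forcats is to control the order of factor levels, for example before plotting. The following is a small sketch with a made-up vector of sizes:

```r
library(forcats)

size <- factor(c("small", "large", "medium", "small", "small", "large"))

# By default, levels are sorted alphabetically
levels(size)

# Order levels by how often they occur (most frequent first)
by_freq <- fct_infreq(size)
levels(by_freq)

# Or impose a meaningful order by hand
manual <- fct_relevel(size, "small", "medium", "large")
levels(manual)
```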
Working with strings
stringr is for strings.
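As a brief sketch, here are a few stringr functions applied to a made-up vector of species names. All stringr functions start with `str_` and take the string as their first argument, which is part of what makes the package consistent.

```r
library(stringr)

species <- c("Homo sapiens", "Pan troglodytes", "Mus musculus")

# Which names start with "P"?
starts_with_p <- str_detect(species, "^P")

# Extract the first word (the genus)
genus <- str_extract(species, "^\\w+")

# Replace the space with an underscore
snake <- str_replace(species, " ", "_")

starts_with_p
genus
snake
```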
Working with dates
lubridate will help you deal with dates.
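The sketch below, using an arbitrary example date, shows the kind of tasks lubridate simplifies: parsing dates written in different orders, extracting components, and doing date arithmetic.

```r
library(lubridate)

# Parse dates according to the order of their components
d1 <- ymd("2024-01-31")
d2 <- dmy("31/01/2024")   # same day, written differently

# Extract components
year(d1)
month(d1)

# Add a fixed number of days
plus_30_days <- d1 + days(30)

# Add a calendar month, rolling back to the last valid day if needed
plus_1_month <- d1 %m+% months(1)

plus_30_days
plus_1_month
```

Since 2024 is a leap year, adding one month with `%m+%` lands on 29 February rather than producing an invalid date.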
Functional programming
Finally, purrr is the tidyverse equivalent to the apply family of functions in base R: a way to apply a function to each element of a list or vector.
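To illustrate the parallel with the apply functions, this sketch runs the same computation on a made-up list with base R's `sapply()` and with purrr's `map_dbl()`:

```r
library(purrr)

x <- list(a = 1:3, b = 4:6, c = 7:9)

# Base R: the type of the result depends on what the function returns
base_means <- sapply(x, mean)

# purrr: map_dbl() always returns a numeric vector, or fails with an error
purrr_means <- map_dbl(x, mean)

# Any function can be mapped, including one written on the spot
sums_of_squares <- map_dbl(x, function(v) sum(v^2))

base_means
purrr_means
sums_of_squares
```

The suffix in `map_dbl()`, `map_chr()`, `map_lgl()` and so on states the output type up front, which is one way purrr aims to be more predictable than `sapply()`.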