In this section, we’ll load external data using several packages from the tidyverse. In particular, we’ll cover:

  1. Loading Excel spreadsheets with readxl
  2. Loading flat text files, such as comma-separated (CSV) data
  3. Loading data files from SAS/SPSS with haven

Some things to keep in mind

For the most part, we will be loading in our data as data.frames or tbls with these packages.

Data files often begin with a few lines of explanatory text, so the header row is not always on the first line of the file.
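As a minimal sketch of how a file with leading explanatory text might be handled, the example below writes a small CSV to a temporary file and then skips the extra lines when reading it. The file contents and the name movies_small are made up for illustration. The skip argument is part of readr's read_csv(), and read_excel() accepts a skip argument as well.

```r
library(readr)

# Hypothetical file: two lines of explanatory text, then the real header.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Movie ratings export",
  "Generated for illustration only",
  "title,year,rating",
  "Movie A,1999,7.5",
  "Movie B,2004,6.1"
), path)

# skip = 2 ignores the first two lines, so the third line
# supplies the column names.
movies_small <- read_csv(path, skip = 2)
movies_small
```

Opening the file in a text editor first is usually the quickest way to work out how many lines to skip.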

Important note: The most common problem when reading data from a file is not correctly telling R where to look for it. RStudio Projects were created to help with this.

Look just below the Console tab in the bottom-left pane of RStudio. There you will find the “current working directory,” which is where R is working from at the moment. You can access any file relative to this directory without specifying the full path to it. We’ll see examples of this in a bit. If you want to read in a file from outside the current working directory, you need to tell R exactly where to find it. More on this below.
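The working directory can also be checked from the console. The sketch below uses only base R functions. The relative path data/movies.xlsx is assumed to match the layout used in this section, and file.exists() will return FALSE if your project is organized differently.

```r
# Print the current working directory (the same path RStudio shows
# below the Console tab).
getwd()

# Build a path relative to the working directory.
rel_path <- file.path("data", "movies.xlsx")

# Check whether R can actually see a file at that location.
file.exists(rel_path)

# Show the full (absolute) path that the relative path resolves to.
normalizePath(rel_path, mustWork = FALSE)
```

If file.exists() returns FALSE, comparing the output of getwd() with the location shown in the Files tab is a reasonable first step.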

Directories and file management are often really tricky for beginners. If you struggle with this, that’s OK. It will get easier with practice. Pay careful attention to the error messages and learn to love the Files tab in the bottom right of RStudio. It will help you track down where your files are located relative to the current working directory.

readxl package

The data we will use here comes from the movies data frame in the ggplot2movies package.

The readxl package reads both the newer .xlsx format (Excel 2007 and later) and the older .xls format. To read in Excel files, use the read_excel() function:

library(readxl)
movies_from_excel <- read_excel("data/movies.xlsx", sheet = 1)
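When a workbook has several sheets, it can help to list them before choosing one. The sketch below uses the example workbook that ships with readxl (located with readxl_example()) rather than movies.xlsx, so it should run without any local data. The object names are chosen for illustration only.

```r
library(readxl)

# Locate an example workbook bundled with the readxl package.
example_path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook.
sheets <- excel_sheets(example_path)
sheets

# Read a sheet by name instead of by position.
mtcars_from_excel <- read_excel(example_path, sheet = "mtcars")
head(mtcars_from_excel)
```

Referring to a sheet by name is often safer than using its position, since the order of sheets in a workbook can change.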