In this section, we’ll load in external data using several built-in packages to the tidyverse. In particular, we’ll cover:
- Loading excel spreadsheets with readxl
- Loading in flat text file data, comma separated.
- Loading datafiles from SAS/SPSS with haven
Some things to keep in mind
For the most part, we will be loading in our data as data.frames or tbls with these packages.
One thing to keep in mind is that oftentimes, files have a bit of explanatory text and the header is not always going to be on the first line in our files.
Important note: The most common problem encountered when reading in data from a file is not reading specifying where to tell RStudio to look for the file. RStudio Projects were created to help with this process.
Look down to just below the words Console in the tabs in the bottom left of RStudio. Here you will find the “current working directory.” This is where R is working from at the moment. You can assess all files relative to this directory without doing anything extra at all in terms of specifying the path to get to the file. We’ll see examples of this in a bit. If you want to read in files from somewhere outside of this current working directory, you’d need to tell R exactly where to find the file. More on this too below.
Directories and file management is often really tricky for beginners. If you struggle with this, that’s OK. It will get better as you get more practice. Pay careful attention to the error messages and learn to love the Files tab in the bottom right of RStudio. This will help you track down where your files are located relative to the current working directory.
 
readxl package
The data we will use here comes from the movies data frame in the ggplot2movies package.
The readxl package handles loading both 2008 (xlsx) and earlier excel files (xls). To read in Excel files, use the read_excel() function:
library(readxl)
movies_from_excel <- read_excel("data/movies.xlsx", sheet = 1)
movies_from_excel
 
readr package
Flat files, such as comma-separated-value (.csv), and other types of simple rectangular data.
- read_csv: Reading in comma separated value files
 - library(readr)
movies_from_csv <- read_csv("data/movies.csv")
 - ## Parsed with column specification:
## cols(
##   .default = col_double(),
##   title = col_character(),
##   year = col_integer(),
##   length = col_integer(),
##   budget = col_integer(),
##   votes = col_integer(),
##   mpaa = col_character(),
##   Action = col_integer(),
##   Animation = col_integer(),
##   Comedy = col_integer(),
##   Drama = col_integer(),
##   Documentary = col_integer(),
##   Romance = col_integer(),
##   Short = col_integer()
## )
 - ## See spec(...) for full column specifications.
 - movies_from_csv
 - 
 
- read_delim: Your general purpose flat file reader. For example, if you have a tab delimited file, you can specify- delim= “” (“” is how you specify tabs).
 - library(readr)
movies <- read_delim("data/movies.csv", delim = ",")
 - ## Parsed with column specification:
## cols(
##   .default = col_double(),
##   title = col_character(),
##   year = col_integer(),
##   length = col_integer(),
##   budget = col_integer(),
##   votes = col_integer(),
##   mpaa = col_character(),
##   Action = col_integer(),
##   Animation = col_integer(),
##   Comedy = col_integer(),
##   Drama = col_integer(),
##   Documentary = col_integer(),
##   Romance = col_integer(),
##   Short = col_integer()
## )
 - ## See spec(...) for full column specifications.
 - movies
 - 
 
Be sure to check the help for all of these (via ?read_delim) to see all of the other options available. Some that come up frequently in our own experience are
- na- If your missing values are coded differently than- NA, you can set the value here.
- skip- if there’s explanatory text before the header, use this argument to skip these lines. For example, if your header starts at line 4, use- skip = 3
- col_types- this is where you have more control on the data types for each column.
 
haven package
Reading in “external” data from inferior, I mean, other software packages like SPSS (.sav), Stata (.dta), SAS, etc. Check the haven package website for assistance.
 
Your turn
Exercise 2.1
Load the periodic table data stored in the data folder. (Hint: you’ll have to use the haven package - look for a read_ function that can read the .sav file format.)
# Write and check your answer here.
 
 
What you learned in this section
Read in data from files into data.frames from
- Excel files using the read_excel()function in thereadxlpackage,
- CSV files using the read_csv()andread_delim()functions in thereadrpackage
- data from proprietary statistical software using the read_spss(),read_stata(), andread_sas()functions in thehavenpackage
 
LS0tCnRpdGxlOiAiUGFydCAyIC0gTG9hZGluZyBkYXRhIHdpdGggYHJlYWR4bGAsIGBoYXZlbmAgYW5kIGByZWFkcmAiCmF1dGhvcjogIkNoZXN0ZXIgSXNtYXkgYW5kIFRlZCBMYWRlcmFzIgpvdXRwdXQ6IAogIGh0bWxfZG9jdW1lbnQ6CiAgICBjb2RlX2Rvd25sb2FkOiB0cnVlCiAgICBjb2RlX2ZvbGRpbmc6IGhpZGUKICAgIGRmX3ByaW50OiBwYWdlZAotLS0KCkluIHRoaXMgc2VjdGlvbiwgd2UnbGwgbG9hZCBpbiBleHRlcm5hbCBkYXRhIHVzaW5nIHNldmVyYWwgYnVpbHQtaW4gcGFja2FnZXMgdG8gdGhlIGB0aWR5dmVyc2VgLiBJbiBwYXJ0aWN1bGFyLCB3ZSdsbCBjb3ZlcjoKCjEuIExvYWRpbmcgZXhjZWwgc3ByZWFkc2hlZXRzIHdpdGggYHJlYWR4bGAKMi4gTG9hZGluZyBpbiBmbGF0IHRleHQgZmlsZSBkYXRhLCBjb21tYSBzZXBhcmF0ZWQuCjMuIExvYWRpbmcgZGF0YWZpbGVzIGZyb20gU0FTL1NQU1Mgd2l0aCBgaGF2ZW5gCgoKIyMgU29tZSB0aGluZ3MgdG8ga2VlcCBpbiBtaW5kCgpGb3IgdGhlIG1vc3QgcGFydCwgd2Ugd2lsbCBiZSBsb2FkaW5nIGluIG91ciBkYXRhIGFzIGBkYXRhLmZyYW1lYHMgb3IgYHRibGBzIHdpdGggdGhlc2UgcGFja2FnZXMuIAoKT25lIHRoaW5nIHRvIGtlZXAgaW4gbWluZCBpcyB0aGF0IG9mdGVudGltZXMsIGZpbGVzIGhhdmUgYSBiaXQgb2YgZXhwbGFuYXRvcnkgdGV4dCBhbmQgdGhlIGhlYWRlciBpcyBub3QgYWx3YXlzIGdvaW5nIHRvIGJlIG9uIHRoZSBmaXJzdCBsaW5lIGluIG91ciBmaWxlcy4KCioqSW1wb3J0YW50IG5vdGUqKjogVGhlIG1vc3QgY29tbW9uIHByb2JsZW0gZW5jb3VudGVyZWQgd2hlbiByZWFkaW5nIGluIGRhdGEgZnJvbSBhIGZpbGUgaXMgbm90IHJlYWRpbmcgc3BlY2lmeWluZyB3aGVyZSB0byB0ZWxsIFJTdHVkaW8gdG8gbG9vayBmb3IgdGhlIGZpbGUuIFJTdHVkaW8gUHJvamVjdHMgd2VyZSBjcmVhdGVkIHRvIGhlbHAgd2l0aCB0aGlzIHByb2Nlc3MuCgpMb29rIGRvd24gdG8ganVzdCBiZWxvdyB0aGUgd29yZHMgKipDb25zb2xlKiogaW4gdGhlIHRhYnMgaW4gdGhlIGJvdHRvbSBsZWZ0IG9mIFJTdHVkaW8uIEhlcmUgeW91IHdpbGwgZmluZCB0aGUgImN1cnJlbnQgd29ya2luZyBkaXJlY3RvcnkuIiBUaGlzIGlzIHdoZXJlIFIgaXMgd29ya2luZyBmcm9tIGF0IHRoZSBtb21lbnQuIFlvdSBjYW4gYXNzZXNzIGFsbCBmaWxlcyByZWxhdGl2ZSB0byB0aGlzIGRpcmVjdG9yeSB3aXRob3V0IGRvaW5nIGFueXRoaW5nIGV4dHJhIGF0IGFsbCBpbiB0ZXJtcyBvZiBzcGVjaWZ5aW5nIHRoZSBwYXRoIHRvIGdldCB0byB0aGUgZmlsZS4gV2UnbGwgc2VlIGV4YW1wbGVzIG9mIHRoaXMgaW4gYSBiaXQuIElmIHlvdSB3YW50IHRvIHJlYWQgaW4gZmlsZXMgZnJvbSBzb21ld2hlcmUgb3V0c2lkZSBvZiB0aGlzIGN1cnJlbnQgd29ya2luZyBkaXJlY3RvcnksIHlvdSdkIG5lZWQgdG8gdGVsbCBSIGV4YWN0bHkgd2hlcmUgdG8gZmluZCB0aGUgZmlsZS4gTW9yZSBvbiB0aGlzIHRvbyBiZWxvdy4KCkRpcmVjdG9yaWVzIGFuZCBmaWxlIG1hbmFnZW1lbnQgaXMgb2Z0ZW4gcmVhbGx5IHRyaWNreSBmb3IgYmVnaW5uZXJzLiBJZiB5b3Ugc3RydWdnbGUgd2l0aCB0aGlzLCB0aGF0J3MgT0suIEl0IHdpbGwgZ2V0IGJldHRlciBhcyB5b3UgZ2V0IG1vcmUgcHJhY3RpY2UuIFBheSBjYXJlZnVsIGF0dGVudGlvbiB0byB0aGUgZXJyb3IgbWVzc2FnZXMgYW5kIGxlYXJuIHRvIGxvdmUgdGhlICoqRmlsZXMqKiB0YWIgaW4gdGhlIGJvdHRvbSByaWdodCBvZiBSU3R1ZGlvLiBUaGlzIHdpbGwgaGVscCB5b3UgdHJhY2sgZG93biB3aGVyZSB5b3VyIGZpbGVzIGFyZSBsb2NhdGVkIHJlbGF0aXZlIHRvIHRoZSBjdXJyZW50IHdvcmtpbmcgZGlyZWN0b3J5LgoKIyMgYHJlYWR4bGAgcGFja2FnZQoKVGhlIGRhdGEgd2Ugd2lsbCB1c2UgaGVyZSBjb21lcyBmcm9tIHRoZSBgbW92aWVzYCBkYXRhIGZyYW1lIGluIHRoZSBgZ2dwbG90Mm1vdmllc2AgcGFja2FnZS4KClRoZSBgcmVhZHhsYCBwYWNrYWdlIGhhbmRsZXMgbG9hZGluZyBib3RoIDIwMDggKHhsc3gpIGFuZCBlYXJsaWVyIGV4Y2VsIGZpbGVzICh4bHMpLiBUbyByZWFkIGluIEV4Y2VsIGZpbGVzLCB1c2UgdGhlIGByZWFkX2V4Y2VsKClgIGZ1bmN0aW9uOgoKYGBge3J9CmxpYnJhcnkocmVhZHhsKQptb3ZpZXNfZnJvbV9leGNlbCA8LSByZWFkX2V4Y2VsKCJkYXRhL21vdmllcy54bHN4Iiwgc2hlZXQgPSAxKQptb3ZpZXNfZnJvbV9leGNlbApgYGAKCiMjIGByZWFkcmAgcGFja2FnZQoKRmxhdCBmaWxlcywgc3VjaCBhcyBjb21tYS1zZXBhcmF0ZWQtdmFsdWUgKC5jc3YpLCBhbmQgb3RoZXIgdHlwZXMgb2Ygc2ltcGxlIHJlY3Rhbmd1bGFyIGRhdGEuCgotIGByZWFkX2NzdmA6IFJlYWRpbmcgaW4gY29tbWEgc2VwYXJhdGVkIHZhbHVlIGZpbGVzCgogICAgYGBge3J9CiAgICBsaWJyYXJ5KHJlYWRyKQogICAgbW92aWVzX2Zyb21fY3N2IDwtIHJlYWRfY3N2KCJkYXRhL21vdmllcy5jc3YiKQogICAgbW92aWVzX2Zyb21fY3N2CiAgICBgYGAKCi0gYHJlYWRfZGVsaW1gOiBZb3VyIGdlbmVyYWwgcHVycG9zZSBmbGF0IGZpbGUgcmVhZGVyLiBGb3IgZXhhbXBsZSwgaWYgeW91IGhhdmUgYSAqdGFiIGRlbGltaXRlZCBmaWxlKiwgeW91IGNhbiBzcGVjaWZ5IGBkZWxpbWAgPSAiXHQiICgiXHQiIGlzIGhvdyB5b3Ugc3BlY2lmeSB0YWJzKS4gCgogICAgYGBge3J9CiAgICBsaWJyYXJ5KHJlYWRyKQogICAgbW92aWVzIDwtIHJlYWRfZGVsaW0oImRhdGEvbW92aWVzLmNzdiIsIGRlbGltID0gIiwiKQogICAgbW92aWVzCiAgICBgYGAKCkJlIHN1cmUgdG8gY2hlY2sgdGhlIGhlbHAgZm9yIGFsbCBvZiB0aGVzZSAodmlhIGA/cmVhZF9kZWxpbWApIHRvIHNlZSBhbGwgb2YgdGhlIG90aGVyIG9wdGlvbnMgYXZhaWxhYmxlLiBTb21lIHRoYXQgY29tZSB1cCBmcmVxdWVudGx5IGluIG91ciBvd24gZXhwZXJpZW5jZSBhcmUgCgotIGBuYWAgLSBJZiB5b3VyIG1pc3NpbmcgdmFsdWVzIGFyZSBjb2RlZCBkaWZmZXJlbnRseSB0aGFuIGBOQWAsIHlvdSBjYW4gc2V0IHRoZSB2YWx1ZSBoZXJlLgotIGBza2lwYCAtIGlmIHRoZXJlJ3MgZXhwbGFuYXRvcnkgdGV4dCBiZWZvcmUgdGhlIGhlYWRlciwgdXNlIHRoaXMgYXJndW1lbnQgdG8gc2tpcCB0aGVzZSBsaW5lcy4gRm9yIGV4YW1wbGUsIGlmIHlvdXIgaGVhZGVyIHN0YXJ0cyBhdCBsaW5lIDQsIHVzZSBgc2tpcCA9IDNgIAotIGBjb2xfdHlwZXNgIC0gdGhpcyBpcyB3aGVyZSB5b3UgaGF2ZSBtb3JlIGNvbnRyb2wgb24gdGhlIGRhdGEgdHlwZXMgZm9yIGVhY2ggY29sdW1uLgoKIyMgYGhhdmVuYCBwYWNrYWdlCgpSZWFkaW5nIGluICJleHRlcm5hbCIgZGF0YSBmcm9tIH5+aW5mZXJpb3J+fiwgSSBtZWFuLCBvdGhlciBzb2Z0d2FyZSBwYWNrYWdlcyBsaWtlIFNQU1MgKGAuc2F2YCksIFN0YXRhIChgLmR0YWApLCBTQVMsIGV0Yy4gQ2hlY2sgdGhlIFtoYXZlbiBwYWNrYWdlIHdlYnNpdGVdKGh0dHA6Ly9oYXZlbi50aWR5dmVyc2Uub3JnLykgZm9yIGFzc2lzdGFuY2UuCgojIyBZb3VyIHR1cm4KCiMjIyBFeGVyY2lzZSAyLjEKCkxvYWQgdGhlIHBlcmlvZGljIHRhYmxlIGRhdGEgc3RvcmVkIGluIHRoZSBgZGF0YWAgZm9sZGVyLiAoSGludDogeW91J2xsIGhhdmUgdG8gdXNlIHRoZSBgaGF2ZW5gIHBhY2thZ2UgLSBsb29rIGZvciBhIGByZWFkX2AgZnVuY3Rpb24gdGhhdCBjYW4gcmVhZCB0aGUgYC5zYXZgIGZpbGUgZm9ybWF0LikKCmBgYHtyfQojIFdyaXRlIGFuZCBjaGVjayB5b3VyIGFuc3dlciBoZXJlLgpgYGAKCiMjIFdoYXQgeW91IGxlYXJuZWQgaW4gdGhpcyBzZWN0aW9uCgpSZWFkIGluIGRhdGEgZnJvbSBmaWxlcyBpbnRvIGBkYXRhLmZyYW1lYHMgZnJvbQoKLSBFeGNlbCBmaWxlcyB1c2luZyB0aGUgYHJlYWRfZXhjZWwoKWAgZnVuY3Rpb24gaW4gdGhlIGByZWFkeGxgIHBhY2thZ2UsCi0gQ1NWIGZpbGVzIHVzaW5nIHRoZSBgcmVhZF9jc3YoKWAgYW5kIGByZWFkX2RlbGltKClgIGZ1bmN0aW9ucyBpbiB0aGUgYHJlYWRyYCBwYWNrYWdlCi0gZGF0YSBmcm9tIHByb3ByaWV0YXJ5IHN0YXRpc3RpY2FsIHNvZnR3YXJlIHVzaW5nIHRoZSBgcmVhZF9zcHNzKClgLCBgcmVhZF9zdGF0YSgpYCwgYW5kIGByZWFkX3NhcygpYCBmdW5jdGlvbnMgaW4gdGhlIGBoYXZlbmAgcGFja2FnZQoKCg==