In this section, we’ll load in external data using several built-in packages to the tidyverse. In particular, we’ll cover:

  1. Loading excel spreadsheets with readxl
  2. Loading in flat text file data, comma separated.
  3. Loading datafiles from SAS/SPSS with haven

Some things to keep in mind

For the most part, we will be loading in our data as data.frames or tbls with these packages.

One thing to keep in mind is that oftentimes, files have a bit of explanatory text and the header is not always going to be on the first line in our files.

Important note: The most common problem encountered when reading in data from a file is not reading specifying where to tell RStudio to look for the file. RStudio Projects were created to help with this process.

Look down to just below the words Console in the tabs in the bottom left of RStudio. Here you will find the “current working directory.” This is where R is working from at the moment. You can assess all files relative to this directory without doing anything extra at all in terms of specifying the path to get to the file. We’ll see examples of this in a bit. If you want to read in files from somewhere outside of this current working directory, you’d need to tell R exactly where to find the file. More on this too below.

Directories and file management is often really tricky for beginners. If you struggle with this, that’s OK. It will get better as you get more practice. Pay careful attention to the error messages and learn to love the Files tab in the bottom right of RStudio. This will help you track down where your files are located relative to the current working directory.

readxl package

The data we will use here comes from the movies data frame in the ggplot2movies package.

The readxl package handles loading both 2008 (xlsx) and earlier excel files (xls). To read in Excel files, use the read_excel() function:

library(readxl)
movies_from_excel <- read_excel("data/movies.xlsx", sheet = 1)
movies_from_excel

readr package

Flat files, such as comma-separated-value (.csv), and other types of simple rectangular data.

Be sure to check the help for all of these (via ?read_delim) to see all of the other options available. Some that come up frequently in our own experience are

haven package

Reading in “external” data from inferior, I mean, other software packages like SPSS (.sav), Stata (.dta), SAS, etc. Check the haven package website for assistance.

Your turn

Exercise 2.1

Load the periodic table data stored in the data folder. (Hint: you’ll have to use the haven package - look for a read_ function that can read the .sav file format.)

# Write and check your answer here.

What you learned in this section

Read in data from files into data.frames from

LS0tCnRpdGxlOiAiUGFydCAyIC0gTG9hZGluZyBkYXRhIHdpdGggYHJlYWR4bGAsIGBoYXZlbmAgYW5kIGByZWFkcmAiCmF1dGhvcjogIkNoZXN0ZXIgSXNtYXkgYW5kIFRlZCBMYWRlcmFzIgpvdXRwdXQ6IAogIGh0bWxfZG9jdW1lbnQ6CiAgICBjb2RlX2Rvd25sb2FkOiB0cnVlCiAgICBjb2RlX2ZvbGRpbmc6IGhpZGUKICAgIGRmX3ByaW50OiBwYWdlZAotLS0KCkluIHRoaXMgc2VjdGlvbiwgd2UnbGwgbG9hZCBpbiBleHRlcm5hbCBkYXRhIHVzaW5nIHNldmVyYWwgYnVpbHQtaW4gcGFja2FnZXMgdG8gdGhlIGB0aWR5dmVyc2VgLiBJbiBwYXJ0aWN1bGFyLCB3ZSdsbCBjb3ZlcjoKCjEuIExvYWRpbmcgZXhjZWwgc3ByZWFkc2hlZXRzIHdpdGggYHJlYWR4bGAKMi4gTG9hZGluZyBpbiBmbGF0IHRleHQgZmlsZSBkYXRhLCBjb21tYSBzZXBhcmF0ZWQuCjMuIExvYWRpbmcgZGF0YWZpbGVzIGZyb20gU0FTL1NQU1Mgd2l0aCBgaGF2ZW5gCgoKIyMgU29tZSB0aGluZ3MgdG8ga2VlcCBpbiBtaW5kCgpGb3IgdGhlIG1vc3QgcGFydCwgd2Ugd2lsbCBiZSBsb2FkaW5nIGluIG91ciBkYXRhIGFzIGBkYXRhLmZyYW1lYHMgb3IgYHRibGBzIHdpdGggdGhlc2UgcGFja2FnZXMuIAoKT25lIHRoaW5nIHRvIGtlZXAgaW4gbWluZCBpcyB0aGF0IG9mdGVudGltZXMsIGZpbGVzIGhhdmUgYSBiaXQgb2YgZXhwbGFuYXRvcnkgdGV4dCBhbmQgdGhlIGhlYWRlciBpcyBub3QgYWx3YXlzIGdvaW5nIHRvIGJlIG9uIHRoZSBmaXJzdCBsaW5lIGluIG91ciBmaWxlcy4KCioqSW1wb3J0YW50IG5vdGUqKjogVGhlIG1vc3QgY29tbW9uIHByb2JsZW0gZW5jb3VudGVyZWQgd2hlbiByZWFkaW5nIGluIGRhdGEgZnJvbSBhIGZpbGUgaXMgbm90IHJlYWRpbmcgc3BlY2lmeWluZyB3aGVyZSB0byB0ZWxsIFJTdHVkaW8gdG8gbG9vayBmb3IgdGhlIGZpbGUuIFJTdHVkaW8gUHJvamVjdHMgd2VyZSBjcmVhdGVkIHRvIGhlbHAgd2l0aCB0aGlzIHByb2Nlc3MuCgpMb29rIGRvd24gdG8ganVzdCBiZWxvdyB0aGUgd29yZHMgKipDb25zb2xlKiogaW4gdGhlIHRhYnMgaW4gdGhlIGJvdHRvbSBsZWZ0IG9mIFJTdHVkaW8uIEhlcmUgeW91IHdpbGwgZmluZCB0aGUgImN1cnJlbnQgd29ya2luZyBkaXJlY3RvcnkuIiBUaGlzIGlzIHdoZXJlIFIgaXMgd29ya2luZyBmcm9tIGF0IHRoZSBtb21lbnQuIFlvdSBjYW4gYXNzZXNzIGFsbCBmaWxlcyByZWxhdGl2ZSB0byB0aGlzIGRpcmVjdG9yeSB3aXRob3V0IGRvaW5nIGFueXRoaW5nIGV4dHJhIGF0IGFsbCBpbiB0ZXJtcyBvZiBzcGVjaWZ5aW5nIHRoZSBwYXRoIHRvIGdldCB0byB0aGUgZmlsZS4gV2UnbGwgc2VlIGV4YW1wbGVzIG9mIHRoaXMgaW4gYSBiaXQuIElmIHlvdSB3YW50IHRvIHJlYWQgaW4gZmlsZXMgZnJvbSBzb21ld2hlcmUgb3V0c2lkZSBvZiB0aGlzIGN1cnJlbnQgd29ya2luZyBkaXJlY3RvcnksIHlvdSdkIG5lZWQgdG8gdGVsbCBSIGV4YWN0bHkgd2hlcmUgdG8gZmluZCB0aGUgZmlsZS4gTW9yZSBvbiB0aGlzIHRvbyBiZWxvdy4KCkRpcmVjdG9yaWVzIGFuZCBmaWxlIG1hbmFnZW1lbnQgaXMgb2Z0ZW4gcmVhbGx5IHRyaWNreSBmb3IgYmVnaW5uZXJzLiBJZiB5b3Ugc3RydWdnbGUgd2l0aCB0aGlzLCB0aGF0J3MgT0suIEl0IHdpbGwgZ2V0IGJldHRlciBhcyB5b3UgZ2V0IG1vcmUgcHJhY3RpY2UuIFBheSBjYXJlZnVsIGF0dGVudGlvbiB0byB0aGUgZXJyb3IgbWVzc2FnZXMgYW5kIGxlYXJuIHRvIGxvdmUgdGhlICoqRmlsZXMqKiB0YWIgaW4gdGhlIGJvdHRvbSByaWdodCBvZiBSU3R1ZGlvLiBUaGlzIHdpbGwgaGVscCB5b3UgdHJhY2sgZG93biB3aGVyZSB5b3VyIGZpbGVzIGFyZSBsb2NhdGVkIHJlbGF0aXZlIHRvIHRoZSBjdXJyZW50IHdvcmtpbmcgZGlyZWN0b3J5LgoKIyMgYHJlYWR4bGAgcGFja2FnZQoKVGhlIGRhdGEgd2Ugd2lsbCB1c2UgaGVyZSBjb21lcyBmcm9tIHRoZSBgbW92aWVzYCBkYXRhIGZyYW1lIGluIHRoZSBgZ2dwbG90Mm1vdmllc2AgcGFja2FnZS4KClRoZSBgcmVhZHhsYCBwYWNrYWdlIGhhbmRsZXMgbG9hZGluZyBib3RoIDIwMDggKHhsc3gpIGFuZCBlYXJsaWVyIGV4Y2VsIGZpbGVzICh4bHMpLiBUbyByZWFkIGluIEV4Y2VsIGZpbGVzLCB1c2UgdGhlIGByZWFkX2V4Y2VsKClgIGZ1bmN0aW9uOgoKYGBge3J9CmxpYnJhcnkocmVhZHhsKQptb3ZpZXNfZnJvbV9leGNlbCA8LSByZWFkX2V4Y2VsKCJkYXRhL21vdmllcy54bHN4Iiwgc2hlZXQgPSAxKQptb3ZpZXNfZnJvbV9leGNlbApgYGAKCiMjIGByZWFkcmAgcGFja2FnZQoKRmxhdCBmaWxlcywgc3VjaCBhcyBjb21tYS1zZXBhcmF0ZWQtdmFsdWUgKC5jc3YpLCBhbmQgb3RoZXIgdHlwZXMgb2Ygc2ltcGxlIHJlY3Rhbmd1bGFyIGRhdGEuCgotIGByZWFkX2NzdmA6IFJlYWRpbmcgaW4gY29tbWEgc2VwYXJhdGVkIHZhbHVlIGZpbGVzCgogICAgYGBge3J9CiAgICBsaWJyYXJ5KHJlYWRyKQogICAgbW92aWVzX2Zyb21fY3N2IDwtIHJlYWRfY3N2KCJkYXRhL21vdmllcy5jc3YiKQogICAgbW92aWVzX2Zyb21fY3N2CiAgICBgYGAKCi0gYHJlYWRfZGVsaW1gOiBZb3VyIGdlbmVyYWwgcHVycG9zZSBmbGF0IGZpbGUgcmVhZGVyLiBGb3IgZXhhbXBsZSwgaWYgeW91IGhhdmUgYSAqdGFiIGRlbGltaXRlZCBmaWxlKiwgeW91IGNhbiBzcGVjaWZ5IGBkZWxpbWAgPSAiXHQiICgiXHQiIGlzIGhvdyB5b3Ugc3BlY2lmeSB0YWJzKS4gCgogICAgYGBge3J9CiAgICBsaWJyYXJ5KHJlYWRyKQogICAgbW92aWVzIDwtIHJlYWRfZGVsaW0oImRhdGEvbW92aWVzLmNzdiIsIGRlbGltID0gIiwiKQogICAgbW92aWVzCiAgICBgYGAKCkJlIHN1cmUgdG8gY2hlY2sgdGhlIGhlbHAgZm9yIGFsbCBvZiB0aGVzZSAodmlhIGA/cmVhZF9kZWxpbWApIHRvIHNlZSBhbGwgb2YgdGhlIG90aGVyIG9wdGlvbnMgYXZhaWxhYmxlLiBTb21lIHRoYXQgY29tZSB1cCBmcmVxdWVudGx5IGluIG91ciBvd24gZXhwZXJpZW5jZSBhcmUgCgotIGBuYWAgLSBJZiB5b3VyIG1pc3NpbmcgdmFsdWVzIGFyZSBjb2RlZCBkaWZmZXJlbnRseSB0aGFuIGBOQWAsIHlvdSBjYW4gc2V0IHRoZSB2YWx1ZSBoZXJlLgotIGBza2lwYCAtIGlmIHRoZXJlJ3MgZXhwbGFuYXRvcnkgdGV4dCBiZWZvcmUgdGhlIGhlYWRlciwgdXNlIHRoaXMgYXJndW1lbnQgdG8gc2tpcCB0aGVzZSBsaW5lcy4gRm9yIGV4YW1wbGUsIGlmIHlvdXIgaGVhZGVyIHN0YXJ0cyBhdCBsaW5lIDQsIHVzZSBgc2tpcCA9IDNgIAotIGBjb2xfdHlwZXNgIC0gdGhpcyBpcyB3aGVyZSB5b3UgaGF2ZSBtb3JlIGNvbnRyb2wgb24gdGhlIGRhdGEgdHlwZXMgZm9yIGVhY2ggY29sdW1uLgoKIyMgYGhhdmVuYCBwYWNrYWdlCgpSZWFkaW5nIGluICJleHRlcm5hbCIgZGF0YSBmcm9tIH5+aW5mZXJpb3J+fiwgSSBtZWFuLCBvdGhlciBzb2Z0d2FyZSBwYWNrYWdlcyBsaWtlIFNQU1MgKGAuc2F2YCksIFN0YXRhIChgLmR0YWApLCBTQVMsIGV0Yy4gQ2hlY2sgdGhlIFtoYXZlbiBwYWNrYWdlIHdlYnNpdGVdKGh0dHA6Ly9oYXZlbi50aWR5dmVyc2Uub3JnLykgZm9yIGFzc2lzdGFuY2UuCgojIyBZb3VyIHR1cm4KCiMjIyBFeGVyY2lzZSAyLjEKCkxvYWQgdGhlIHBlcmlvZGljIHRhYmxlIGRhdGEgc3RvcmVkIGluIHRoZSBgZGF0YWAgZm9sZGVyLiAoSGludDogeW91J2xsIGhhdmUgdG8gdXNlIHRoZSBgaGF2ZW5gIHBhY2thZ2UgLSBsb29rIGZvciBhIGByZWFkX2AgZnVuY3Rpb24gdGhhdCBjYW4gcmVhZCB0aGUgYC5zYXZgIGZpbGUgZm9ybWF0LikKCmBgYHtyfQojIFdyaXRlIGFuZCBjaGVjayB5b3VyIGFuc3dlciBoZXJlLgpgYGAKCiMjIFdoYXQgeW91IGxlYXJuZWQgaW4gdGhpcyBzZWN0aW9uCgpSZWFkIGluIGRhdGEgZnJvbSBmaWxlcyBpbnRvIGBkYXRhLmZyYW1lYHMgZnJvbQoKLSBFeGNlbCBmaWxlcyB1c2luZyB0aGUgYHJlYWRfZXhjZWwoKWAgZnVuY3Rpb24gaW4gdGhlIGByZWFkeGxgIHBhY2thZ2UsCi0gQ1NWIGZpbGVzIHVzaW5nIHRoZSBgcmVhZF9jc3YoKWAgYW5kIGByZWFkX2RlbGltKClgIGZ1bmN0aW9ucyBpbiB0aGUgYHJlYWRyYCBwYWNrYWdlCi0gZGF0YSBmcm9tIHByb3ByaWV0YXJ5IHN0YXRpc3RpY2FsIHNvZnR3YXJlIHVzaW5nIHRoZSBgcmVhZF9zcHNzKClgLCBgcmVhZF9zdGF0YSgpYCwgYW5kIGByZWFkX3NhcygpYCBmdW5jdGlvbnMgaW4gdGhlIGBoYXZlbmAgcGFja2FnZQoKCg==