Lesson 7: Import diverse data files and structures

 

Functions for Lesson 7
read_csv, read_delim, write_csv
 

Packages for Lesson 7
readr
 

Agenda

Use the readr package to easily read in different data file types.

Cheat sheet for the readr package.
 

 

Do First

Using RMarkdown, recreate the PDF below from the smaller NYC Airbnb dataset, applying the conditions listed after the code. Download the final PDF file here.

# smaller csv file (16 cols)
url <- "http://data.insideairbnb.com/united-states/ny/new-york-city/2021-04-07/data/listings.csv.gz"
nyc <- readr::read_csv(url)
nyc <- nyc[nyc$id < 1e+06, ]  # get smaller subset of data
  • Accommodation costing less than $200 per night, with a stay of between 5 and 15 nights, and excluding Staten Island.
  • Show only the data structure of the above subsetted data as a code output. No need to show the code for how you subsetted the data (see the PDF).
  • Show the plotting code along with the plot.
  • Use the below yaml for your Rmd file.

---
title: "Dissecting property availability in New York City using Airbnb open data"
author: "<your name here>"
urlcolor: blue
params:
  source: "http://insideairbnb.com/new-york-city/"
output:
  pdf_document:
    toc: yes
    toc_depth: 2
---
      
  • Add the code below at the beginning of your Rmd file to load the required packages while suppressing the package loading messages and warnings. Use a {r, echo=T, eval=T, message=F} header for the code chunk.
pacman::p_load(ggthemes, ggplot2, readr, dplyr)
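
The Do First URL ends in .csv.gz. readr readers decompress gzip files based on the file extension, and the writers compress when the target name ends in .gz. The block below is a minimal local sketch of that behaviour; the `listings` data frame and the `gz_file` name are made up for illustration, and `show_col_types = FALSE` assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical stand-in for a listings table
listings <- data.frame(id = c(10, 20, 30), price = c(150, 220, 90))

gz_file <- tempfile(fileext = ".csv.gz")
write_csv(listings, gz_file)   # compressed because the name ends in .gz

# read_csv() decompresses the file transparently
back <- read_csv(gz_file, show_col_types = FALSE)

# Base-R subsetting, in the same style as the id filter above
small <- back[back$id < 25, ]
nrow(small)
```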

 


Reading in different data file types

# create small example files to read back in
# (readr 1.4.0 and later; older versions name the second argument `path`)
write_file(x = "a,b,c\n1,2,3\n4,5,NA", file = "file.csv")
write_file(x = "a;b;c\n1;2;3\n4;5;NA", file = "file2.csv")
write_file(x = "a|b|c\n1|2|3\n4|5|NA", file = "file.txt")
write_file(x = "a b c\n1 2 3\n4 5 NA", file = "file.fwf")
write_file(x = "a\tb\tc\n1\t2\t3\n4\t5\tNA", file = "file.tsv")

read_csv("file.csv")  # read a comma-separated file
read_csv2("file2.csv")  # read a file that uses ';' as the separator and ',' as the decimal mark
read_delim("file.txt", delim = "|")  # read a flat text file and specify the delimiter
read_tsv("file.tsv")  # read a tab-separated file
read_table("file.fwf")  # read data separated by white space, i.e. a table
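
As a small sketch of how these readers differ, the block below writes a pipe-delimited and a semicolon-delimited sample to temporary files and reads them back. The file contents and the names `pipe_file` and `semi_file` are invented for illustration, and `show_col_types = FALSE` assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical sample files, written to temporary paths
pipe_file <- tempfile(fileext = ".txt")
semi_file <- tempfile(fileext = ".csv")
write_file("a|b|c\n1|2|3\n4|5|NA", pipe_file)
write_file("a;b;c\n1,5;2;3\n4,5;5;NA", semi_file)

# read_delim() needs the delimiter spelled out
pipe_df <- read_delim(pipe_file, delim = "|", show_col_types = FALSE)

# read_csv2() expects ';' between fields and ',' as the decimal mark
semi_df <- read_csv2(semi_file, show_col_types = FALSE)

dim(pipe_df)         # number of rows and columns
is.na(pipe_df$c[2])  # the string "NA" is read as a missing value
semi_df$a            # "1,5" and "4,5" are parsed as decimal numbers
```

Passing the wrong reader to a file usually does not raise an error; it tends to produce a single wide column instead, so checking `dim()` after import is a cheap sanity test.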

 

Saving your data from R

Save x, an R object, to file, a file path. In readr 1.4.0 and later this argument is named file; older versions call it path.

# Comma delimited file
write_csv(x, file, na = "NA", append = FALSE, col_names = !append)

# File with arbitrary delimiter
write_delim(x, file, delim = " ", na = "NA", append = FALSE, col_names = !append)

# CSV for excel
write_excel_csv(x, file, na = "NA", append = FALSE, col_names = !append)

# String to file
write_file(x, file, append = FALSE)

# String vector to file, one element per line
write_lines(x, file, na = "NA", append = FALSE)

# Object to RDS file
write_rds(x, file, compress = c("none", "gz", "bz2", "xz"), ...)

# Tab delimited files
write_tsv(x, file, na = "NA", append = FALSE, col_names = !append)
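
To make the difference between a text format and RDS concrete, here is a hedged round-trip sketch. The `scores` data frame and the temporary file names are hypothetical, and `show_col_types = FALSE` assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical data frame used only for this sketch
scores <- data.frame(
  name  = c("ann", "bob", "cat"),
  score = c(90.5, NA, 72),
  stringsAsFactors = FALSE
)

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

# The destination is passed by position, so the code does not depend on
# whether the argument is called `file` or `path`
write_csv(scores, csv_file, na = "")   # missing values become empty fields
write_rds(scores, rds_file)

from_csv <- read_csv(csv_file, show_col_types = FALSE)
from_rds <- read_rds(rds_file)

read_lines(csv_file)          # the raw text that write_csv() produced
is.na(from_csv$score[2])      # the empty field comes back as NA
identical(from_rds, scores)   # RDS restores the original object
```

A CSV file is readable by other tools but column types are guessed again on import, whereas an RDS file restores the R object as it was saved.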