Week 12: Data Wrangling

class: title-slide, middle, inverse

.leftcol30[
<center>
<img src="https://github.com/emse-p4a-gwu/emse-p4a-gwu.github.io/raw/master/images/p4a_hex_sticker.png" width=250>
</center>
]
.rightcol70[
# Week 12: .fancy[Data Wrangling]

### EMSE 4574: Intro to Programming for Analytics
### John Paul Helveston
### November 17, 2020
]

---

class: inverse, middle

# Week 12: .fancy[Data Wrangling]

## 1. Selecting & filtering
## 2. Sequences with pipes
## 3. Creating new variables
## 4. Grouped operations

---

class: inverse, middle

# Week 12: .fancy[Data Wrangling]

## 1. .orange[Selecting & filtering]
## 2. Sequences with pipes
## 3. Creating new variables
## 4. Grouped operations

---
# Before we start

Make sure you have the "tidyverse" installed

```r
install.packages('tidyverse')
```

(this is at the top of the notes.R file)

Remember: you only need to install packages once!

---
class: center

### The tidyverse: `stringr` + `dplyr` + `readr` +  `ggplot2` + ...

<center>
<img src="images/horst_monsters_tidyverse.jpeg" width="950">
</center>Art by [Allison Horst](https://www.allisonhorst.com/)

---
class: center

## Today: better data wrangling with **dplyr**

<center>
<img src="images/horst_monsters_data_wrangling.png" width="600">
</center>Art by [Allison Horst](https://www.allisonhorst.com/)

---
class: center

# 80% of the job is data wrangling

.noborder[<center>
<img src="images/data-wrangle.png" width="800">
</center>]

---
# The main `dplyr` verbs

- `select()`: subset columns
- `filter()`: subset rows on conditions
- `arrange()`: sort data frame
- `mutate()`: create new columns by using information from other columns
- `group_by()`: group data to perform grouped operations
- `summarize()`: create summary statistics (usually on grouped data)
- `count()`: count discrete rows

---
# This week's British Band: **The Spice Girls**

```r
spicegirls <- tibble(
    firstName   = c("Melanie", "Melanie", "Emma", "Geri", "Victoria"),
    lastName    = c("Brown", "Chisholm", "Bunton", "Halliwell", "Beckham"),
    spice       = c("Scary", "Sporty", "Baby", "Ginger", "Posh"),
    yearOfBirth = c(1975, 1974, 1976, 1972, 1974),
    deceased    = c(FALSE, FALSE, FALSE, FALSE, FALSE)
)
spicegirls
```

```
## # A tibble: 5 x 5
##   firstName lastName  spice  yearOfBirth deceased
##   <chr>     <chr>     <chr>        <dbl> <lgl>   
## 1 Melanie   Brown     Scary         1975 FALSE   
## 2 Melanie   Chisholm  Sporty        1974 FALSE   
## 3 Emma      Bunton    Baby          1976 FALSE   
## 4 Geri      Halliwell Ginger        1972 FALSE   
## 5 Victoria  Beckham   Posh          1974 FALSE
```

---
# Select columns with `select()`

<br>
<center>
<img src="images/rstudio-cheatsheet-select.png" width="900">
</center>

---
# Select columns with `select()`

Example: Select the columns `firstName` & `lastName`

--
&zwj;**Base R**:

```r
spicegirls[c('firstName', 'lastName')]
```

```
## # A tibble: 5 x 2
##   firstName lastName 
##   <chr>     <chr>    
## 1 Melanie   Brown    
## 2 Melanie   Chisholm 
## 3 Emma      Bunton   
## 4 Geri      Halliwell
## 5 Victoria  Beckham
```

---
# Select columns with `select()`

Example: Select the columns `firstName` & `lastName`

**dplyr**: (note that you don't need `""` around names)

```r
select(spicegirls, firstName, lastName)
```

```
## # A tibble: 5 x 2
##   firstName lastName 
##   <chr>     <chr>    
## 1 Melanie   Brown    
## 2 Melanie   Chisholm 
## 3 Emma      Bunton   
## 4 Geri      Halliwell
## 5 Victoria  Beckham
```

---
# Select columns with `select()`

Use the `-` sign to drop columns:

```r
select(spicegirls, -firstName, -lastName)
```

```
## # A tibble: 5 x 3
##   spice  yearOfBirth deceased
##   <chr>        <dbl> <lgl>   
## 1 Scary         1975 FALSE   
## 2 Sporty        1974 FALSE   
## 3 Baby          1976 FALSE   
## 4 Ginger        1972 FALSE   
## 5 Posh          1974 FALSE
```

---
# Select columns with `select()`

Select columns based on name criteria:

- `ends_with()` = Select columns that end with a character string
- `contains()` = Select columns that contain a character string
- `matches()` = Select columns that match a regular expression
- `one_of()` = Select column names that are from a group of names

---
# Select columns with `select()`

Select only the "name" columns

```r
select(spicegirls, ends_with('name'))
```

```
## # A tibble: 5 x 2
##   firstName lastName 
##   <chr>     <chr>    
## 1 Melanie   Brown    
## 2 Melanie   Chisholm 
## 3 Emma      Bunton   
## 4 Geri      Halliwell
## 5 Victoria  Beckham
```

---
# Select rows with `filter()`

<br>
<center>
<img src="images/rstudio-cheatsheet-filter.png" width="900">
</center>

---
# Select rows with `filter()`

Example: Filter the band members born after 1974

```r
## # A tibble: 5 x 5
##   firstName lastName  spice  yearOfBirth deceased
##   <chr>     <chr>     <chr>        <dbl> <lgl>
*## 1 Melanie   Brown     Scary         1975 FALSE
## 2 Melanie   Chisholm  Sporty        1974 FALSE
*## 3 Emma      Bunton    Baby          1976 FALSE
## 4 Geri      Halliwell Ginger        1972 FALSE
## 5 Victoria  Beckham   Posh          1974 FALSE
```

---
# Select rows with `filter()`

Example: Filter the band members born after 1974

**Base R**:

```r
spicegirls[spicegirls$yearOfBirth > 1974,]
```

```
## # A tibble: 2 x 5
##   firstName lastName spice yearOfBirth deceased
##   <chr>     <chr>    <chr>       <dbl> <lgl>   
## 1 Melanie   Brown    Scary        1975 FALSE   
## 2 Emma      Bunton   Baby         1976 FALSE
```

---
# Select rows with `filter()`

Example: Filter the band members born after 1974

**dplyr**:

```r
filter(spicegirls, yearOfBirth > 1974)
```

```
## # A tibble: 2 x 5
##   firstName lastName spice yearOfBirth deceased
##   <chr>     <chr>    <chr>       <dbl> <lgl>   
## 1 Melanie   Brown    Scary        1975 FALSE   
## 2 Emma      Bunton   Baby         1976 FALSE
```

---
# Select rows with `filter()`

Example: Filter the band members named "Melanie"
--

```r
filter(spicegirls, firstName == "Melanie")
```

```
## # A tibble: 2 x 5
##   firstName lastName spice  yearOfBirth deceased
##   <chr>     <chr>    <chr>        <dbl> <lgl>   
## 1 Melanie   Brown    Scary         1975 FALSE   
## 2 Melanie   Chisholm Sporty        1974 FALSE
```

---
class: inverse

<div class="countdown" id="timer_5fc80fb5" style="top:0;right:0;font-size:2em;" data-warnwhen="30">
<code class="countdown-time"><span class="countdown-digits minutes">10</span><span class="countdown-digits colon">:</span><span class="countdown-digits seconds">00</span></code>
</div>

## Think pair share: wildlife impacts data

.font90[
1) Create the data frame object `df` by using `here()` and `read_csv()` to load the `wildlife_impacts.csv` file in the `data` folder.

2) Use the `df` object and the `select()` and `filter()` functions to answer the following questions:

- Create a new data frame, `df_birds`, that contains only the variables (columns) about the species of bird.
- Create a new data frame, `dc`, that contains only the observations (rows) from DC airports.
- Create a new data frame, `dc_birds_known`, that contains only the observations (rows) from DC airports and those where the species of bird is known.
- How many _known_ unique species of birds have been involved in accidents at DC airports?
]

---

class: inverse, middle

# Week 12: .fancy[Data Wrangling]

## 1. Selecting & filtering
## 2. .orange[Sequences with pipes]
## 3. Creating new variables
## 4. Grouped operations

---
# Create sequences of operations with "pipes"

--
.leftcol[
<br>
<center>
<img src="images/magrittePipe.jpg" width="400">
</center>
[The Treachery of Images](https://en.wikipedia.org/wiki/The_Treachery_of_Images), René Magritte
]
--
<center>
.rightcol[.noborder[
<img src="images/magrittr.png" width="350">
</center>
]]

---
# Think of `%>%` as the words "...and then..."

--
**Without Pipes** (read from inside-out):

```r
leave_house(get_dressed(get_out_of_bed(wake_up(me))))
```

--
**With Pipes**:

```r
me %>%
    wake_up %>%
    get_out_of_bed %>%
    get_dressed %>%
    leave_house
```

---
# Sequence operations with pipes: `%>%`

--
&zwj;Example:
1. Filter the band members born after 1974
2. Select only the columns `firstName` & `lastName`

--
**Without Pipes**:

```r
select(filter(spicegirls, yearOfBirth > 1974), firstName, lastName)
```

```
## # A tibble: 2 x 2
##   firstName lastName
##   <chr>     <chr>   
## 1 Melanie   Brown   
## 2 Emma      Bunton
```

---
# Sequence operations with pipes: `%>%`

&zwj;Example:
1. Filter the band members born after 1974
2. Select only the columns `firstName` & `lastName`

**With Pipes**:

```r
spicegirls %>%
    filter(yearOfBirth > 1974) %>%
    select(firstName, lastName)
```

```
## # A tibble: 2 x 2
##   firstName lastName
##   <chr>     <chr>   
## 1 Melanie   Brown   
## 2 Emma      Bunton
```

---
# Think of the words "...and then..."

**Without Pipes**:

```r
select(filter(spicegirls, yearOfBirth > 1974), firstName, lastName)
```

**With Pipes**: Note that you don't need to repeat the dataframe name

```r
spicegirls %>%
    filter(yearOfBirth > 1974) %>%
    select(firstName, lastName)
```

---
# Sort rows with `arrange()`

--
Sort the data frame by year of birth:

```r
spicegirls %>%
    arrange(yearOfBirth)
```

```
## # A tibble: 5 x 5
##   firstName lastName  spice  yearOfBirth deceased
##   <chr>     <chr>     <chr>        <dbl> <lgl>   
## 1 Geri      Halliwell Ginger        1972 FALSE   
## 2 Melanie   Chisholm  Sporty        1974 FALSE   
## 3 Victoria  Beckham   Posh          1974 FALSE   
## 4 Melanie   Brown     Scary         1975 FALSE   
## 5 Emma      Bunton    Baby          1976 FALSE
```

---
# Sort rows with `arrange()`

Use the `desc()` function to sort in descending order:

```r
spicegirls %>%
    arrange(desc(yearOfBirth))
```

```
## # A tibble: 5 x 5
##   firstName lastName  spice  yearOfBirth deceased
##   <chr>     <chr>     <chr>        <dbl> <lgl>   
## 1 Emma      Bunton    Baby          1976 FALSE   
## 2 Melanie   Brown     Scary         1975 FALSE   
## 3 Melanie   Chisholm  Sporty        1974 FALSE   
## 4 Victoria  Beckham   Posh          1974 FALSE   
## 5 Geri      Halliwell Ginger        1972 FALSE
```

---
# Sort rows with `arrange()`

Example of filtering, arranging, and selecting:

```r
spicegirls %>%
    filter(yearOfBirth < 1975) %>%
    arrange(desc(yearOfBirth)) %>%
    select(ends_with('name'))
```

```
## # A tibble: 3 x 2
##   firstName lastName 
##   <chr>     <chr>    
## 1 Melanie   Chisholm 
## 2 Victoria  Beckham  
## 3 Geri      Halliwell
```

---
class: inverse

<div class="countdown" id="timer_5fc81060" style="top:0;right:0;font-size:2em;" data-warnwhen="30">
<code class="countdown-time"><span class="countdown-digits minutes">10</span><span class="countdown-digits colon">:</span><span class="countdown-digits seconds">00</span></code>
</div>

## Think pair share

.font90[
1) Create the data frame object `df` by using `here()` and `read_csv()` to load the `wildlife_impacts.csv` file in the `data` folder.

2) Use the `df` object and `select()`, `filter()`, and `%>%` to answer the following questions:

- Create a new data frame, `dc_dawn`, that contains only the observations (rows) from DC airports that occurred at dawn.
- Create a new data frame, `dc_dawn_birds`, that contains only the observations (rows) from DC airports that occurred at dawn and only the variables (columns) about the species of bird.
- Create a new data frame, `dc_dawn_birds_known`, that contains only the observations (rows) from DC airports that occurred at dawn and only the variables (columns) about the KNOWN species of bird.
- How many _known_ unique species of birds have been involved in accidents at DC airports at dawn?
]

---
class: inverse, center

# .fancy[Break]

<div class="countdown" id="timer_5fc80ff0" style="top:1;right:0;bottom:0;left:0;margin:5%;font-size:8em;" data-warnwhen="30">
<code class="countdown-time"><span class="countdown-digits minutes">05</span><span class="countdown-digits colon">:</span><span class="countdown-digits seconds">00</span></code>
</div>

---

class: inverse, middle

# Week 12: .fancy[Data Wrangling]

## 1. Selecting & filtering
## 2. Sequences with pipes
## 3. .orange[Creating new variables]
## 4. Grouped operations

---
class: center

## Create new variables with `mutate()`

<br>
<center>
<img src="images/rstudio-cheatsheet-mutate.png" width="900">
</center>

---
class: center

## Create new variables with `mutate()`

<center>
<img src="images/horst_monsters_mutate.jpg" width="500">
</center>Art by [Allison Horst](https://www.allisonhorst.com/)

---
# Create new variables with `mutate()`

Example: Use the `yearOfBirth` variable to compute the age of each band member

--
**Base R**:

```r
spicegirls$age <- 2020 - spicegirls$yearOfBirth
```

--
**dplyr**:

```r
spicegirls %>%
    mutate(age = 2020 - yearOfBirth)
```

```
## # A tibble: 5 x 6
##   firstName lastName  spice  yearOfBirth deceased   age
##   <chr>     <chr>     <chr>        <dbl> <lgl>    <dbl>
## 1 Melanie   Brown     Scary         1975 FALSE       45
## 2 Melanie   Chisholm  Sporty        1974 FALSE       46
## 3 Emma      Bunton    Baby          1976 FALSE       44
## 4 Geri      Halliwell Ginger        1972 FALSE       48
## 5 Victoria  Beckham   Posh          1974 FALSE       46
```

---
# You can _immediately_ use new variables

```r
spicegirls %>%
    select(firstName, lastName, yearOfBirth) %>%
    mutate(
        age = 2020 - yearOfBirth,
*       meanAge  = mean(age)) # Immediately using the "age" variable
```

```
## # A tibble: 5 x 5
##   firstName lastName  yearOfBirth   age meanAge
##   <chr>     <chr>           <dbl> <dbl>   <dbl>
## 1 Melanie   Brown            1975    45    45.8
## 2 Melanie   Chisholm         1974    46    45.8
## 3 Emma      Bunton           1976    44    45.8
## 4 Geri      Halliwell        1972    48    45.8
## 5 Victoria  Beckham          1974    46    45.8
```

---
# Handle if/else conditions with `ifelse()`

--
### `ifelse(<condition>, <if TRUE>, <else>)`

--

```r
spicegirls %>%
    mutate(
        yobEvenOdd = ifelse(yearOfBirth %% 2 == 0, 'even', 'odd'))
```

```
## # A tibble: 5 x 6
##   firstName lastName  spice  yearOfBirth deceased yobEvenOdd
##   <chr>     <chr>     <chr>        <dbl> <lgl>    <chr>     
## 1 Melanie   Brown     Scary         1975 FALSE    odd       
## 2 Melanie   Chisholm  Sporty        1974 FALSE    even      
## 3 Emma      Bunton    Baby          1976 FALSE    even      
## 4 Geri      Halliwell Ginger        1972 FALSE    even      
## 5 Victoria  Beckham   Posh          1974 FALSE    even
```

---
class: inverse

<div class="countdown" id="timer_5fc81124" style="top:0;right:0;font-size:2em;" data-warnwhen="30">
<code class="countdown-time"><span class="countdown-digits minutes">10</span><span class="countdown-digits colon">:</span><span class="countdown-digits seconds">00</span></code>
</div>

## Think pair share

.font90[
1) Create the data frame object `df` by using `here()` and `read_csv()` to load the `wildlife_impacts.csv` file in the `data` folder.

2) Use the `df` object with `%>%` and `mutate()` to create the following new variables:

- `height_miles`: The `height` variable converted to miles (Hint: there are 5,280 feet in a mile).
- `cost_mil`: Is `TRUE` if the repair costs was greater or equal to $1 million, `FALSE` otherwise.
- `season`: One of four seasons based on the `incident_month` variable:

- `spring`: March, April, May
  - `summer`: June, July, August
  - `fall`: September, October, November
  - `winter`: December, January, February
]

---

class: inverse, middle

# Week 12: .fancy[Data Wrangling]

## 1. Selecting & filtering
## 2. Sequences with pipes
## 3. Creating new variables
## 4. .orange[Grouped operations]

---
# Split-apply-combine with `group_by`

<br>
### 1. **Split** the data into groups
### 2. **Apply** some analysis to each group
### 3. **Combine** the results

---
class: center

# Split-apply-combine with `group_by`

<br>
<center>
<img src="images/rstudio-cheatsheet-group_by.png" width="900">
</center>

---
## Split-apply-combine with `group_by`

```r
bands
```

```
## # A tibble: 9 x 5
##   firstName lastName  yearOfBirth deceased band      
##   <chr>     <chr>           <dbl> <lgl>    <chr>     
## 1 Melanie   Brown            1975 FALSE    spicegirls
## 2 Melanie   Chisholm         1974 FALSE    spicegirls
## 3 Emma      Bunton           1976 FALSE    spicegirls
## 4 Geri      Halliwell        1972 FALSE    spicegirls
## 5 Victoria  Beckham          1974 FALSE    spicegirls
## 6 John      Lennon           1940 TRUE     beatles   
## 7 Paul      McCartney        1942 FALSE    beatles   
## 8 Ringo     Starr            1940 FALSE    beatles   
## 9 George    Harrison         1943 TRUE     beatles
```

---
## Split-apply-combine with `group_by`

Compute the mean band member age for **each band**

```r
bands %>%
    mutate(
        age = 2020 - yearOfBirth,
*       mean_age = mean(age)) # This is the mean across both bands
```

```
## # A tibble: 9 x 7
##   firstName lastName  yearOfBirth deceased band         age mean_age
##   <chr>     <chr>           <dbl> <lgl>    <chr>      <dbl>    <dbl>
## 1 Melanie   Brown            1975 FALSE    spicegirls    45     60.4
## 2 Melanie   Chisholm         1974 FALSE    spicegirls    46     60.4
## 3 Emma      Bunton           1976 FALSE    spicegirls    44     60.4
## 4 Geri      Halliwell        1972 FALSE    spicegirls    48     60.4
## 5 Victoria  Beckham          1974 FALSE    spicegirls    46     60.4
## 6 John      Lennon           1940 TRUE     beatles       80     60.4
## 7 Paul      McCartney        1942 FALSE    beatles       78     60.4
## 8 Ringo     Starr            1940 FALSE    beatles       80     60.4
## 9 George    Harrison         1943 TRUE     beatles       77     60.4
```

---
## Split-apply-combine with `group_by`

Compute the mean band member age for each band

```r
bands %>%
    mutate(age = 2020 - yearOfBirth) %>%
*   group_by(band) %>% # Everything after this will be done each band
    mutate(mean_age = mean(age))
```

```
## # A tibble: 9 x 7
## # Groups:   band [2]
##   firstName lastName  yearOfBirth deceased band         age mean_age
##   <chr>     <chr>           <dbl> <lgl>    <chr>      <dbl>    <dbl>
## 1 Melanie   Brown            1975 FALSE    spicegirls    45     45.8
## 2 Melanie   Chisholm         1974 FALSE    spicegirls    46     45.8
## 3 Emma      Bunton           1976 FALSE    spicegirls    44     45.8
## 4 Geri      Halliwell        1972 FALSE    spicegirls    48     45.8
## 5 Victoria  Beckham          1974 FALSE    spicegirls    46     45.8
## 6 John      Lennon           1940 TRUE     beatles       80     78.8
## 7 Paul      McCartney        1942 FALSE    beatles       78     78.8
## 8 Ringo     Starr            1940 FALSE    beatles       80     78.8
## 9 George    Harrison         1943 TRUE     beatles       77     78.8
```

---
class: center

# Summarize data frames with `summarise()`

<br>
<center>
<img src="images/rstudio-cheatsheet-summarise.png" width="900">
</center>

---
## Summarize data frames with `summarise()`

Compute the mean band member age for **each band**

```r
bands %>%
    mutate(age = 2020 - yearOfBirth) %>%
    group_by(band) %>%
*   summarise(mean_age = mean(age)) # Drops all variables except for group
```

```
## # A tibble: 2 x 2
##   band       mean_age
##   <chr>         <dbl>
## 1 beatles        78.8
## 2 spicegirls     45.8
```

---
## Summarize data frames with `summarise()`

Compute the mean, min, and max band member age for **each band**

```r
bands %>%
    mutate(age = 2020 - yearOfBirth) %>%
    group_by(band) %>%
    summarise(
        mean_age = mean(age),
        min_age = min(age),
        max_age = max(age))
```

```
## # A tibble: 2 x 4
##   band       mean_age min_age max_age
##   <chr>         <dbl>   <dbl>   <dbl>
## 1 beatles        78.8      77      80
## 2 spicegirls     45.8      44      48
```

---
## Computing counts of observations with `n()`

How many members are in each band?

```r
bands %>%
    mutate(age = 2020 - yearOfBirth) %>%
    group_by(band) %>%
    summarise(
        mean_age = mean(age),
        min_age = min(age),
        max_age = max(age),
*       numMembers = n())
```

```
## # A tibble: 2 x 5
##   band       mean_age min_age max_age numMembers
##   <chr>         <dbl>   <dbl>   <dbl>      <int>
## 1 beatles        78.8      77      80          4
## 2 spicegirls     45.8      44      48          5
```

---
## If you only want a quick count, use `count()`

These do the same thing:

.leftcol[

```r
bands %>%
    group_by(band) %>%
    summarise(n = n())
```

```
## # A tibble: 2 x 2
##   band           n
##   <chr>      <int>
## 1 beatles        4
## 2 spicegirls     5
```
]
.rightcol[

```r
bands %>%
    count(band)
```

```
## # A tibble: 2 x 2
##   band           n
##   <chr>      <int>
## 1 beatles        4
## 2 spicegirls     5
```
]

---
## If you only want a quick count, use `count()`

You can count multiple combinations

```r
bands %>%
    mutate(nameStartsWithG = str_detect(firstName, '^G')) %>%
    count(band, nameStartsWithG)
```

```
## # A tibble: 4 x 3
##   band       nameStartsWithG     n
##   <chr>      <lgl>           <int>
## 1 beatles    FALSE               3
## 2 beatles    TRUE                1
## 3 spicegirls FALSE               4
## 4 spicegirls TRUE                1
```

---
class: inverse

<div class="countdown" id="timer_5fc80f62" style="top:0;right:0;font-size:2em;" data-warnwhen="30">
<code class="countdown-time"><span class="countdown-digits minutes">10</span><span class="countdown-digits colon">:</span><span class="countdown-digits seconds">00</span></code>
</div>

## Think pair share

.font90[
1) Create the data frame object `df` by using `here()` and `read_csv()` to load the `wildlife_impacts.csv` file in the `data` folder.

2) Use the `df` object and `group_by()`, `summarise()`, `count()`, and `%>%` to answer the following questions:

- Create a summary data frame that contains the mean `height` for each different time of day.
- Create a summary data frame that contains the maximum `cost_repairs_infl_adj` for each year.
- Which _month_ has had the greatest number of reported incidents?
- Which _year_ has had the greatest number of reported incidents?
]

---
# Exporting data

```r
ageSummary <- bands %>%
    mutate(age = 2020 - yearOfBirth) %>%
    group_by(band) %>%
    summarise(
        mean_age = mean(age),
        min_age = min(age),
        max_age = max(age),
        numMembers = n())
ageSummary
```

```
## # A tibble: 2 x 5
##   band       mean_age min_age max_age numMembers
##   <chr>         <dbl>   <dbl>   <dbl>      <int>
## 1 beatles        78.8      77      80          4
## 2 spicegirls     45.8      44      48          5
```

---
# Exporting data: `here()` + `write_csv()`

Save the `ageSummary` data frame in your "data" folder:

--
1) Create a path to where you want to save the data

```r
library(here)
savePath <- here('data', 'ageSummary.csv')
```
--
2) Export the data

```r
library(readr)
write_csv(ageSummary, savePath)
```

---
# HW 10

--
Make sure you install the package `nycflights13`

```r
install.packages('nycflights13')
```

--
This package includes **5 data frames**:

```r
airlines
airports
flights
planes
weather
```