Data Wrangling

]

# Week 9: .fancy[Data Wrangling]

### <svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:white;overflow:visible;position:relative;"><path d="M243.4 2.6l-224 96c-14 6-21.8 21-18.7 35.8S16.8 160 32 160v8c0 13.3 10.7 24 24 24H456c13.3 0 24-10.7 24-24v-8c15.2 0 28.3-10.7 31.3-25.6s-4.8-29.9-18.7-35.8l-224-96c-8.1-3.4-17.2-3.4-25.2 0zM128 224H64V420.3c-.6 .3-1.2 .7-1.8 1.1l-48 32c-11.7 7.8-17 22.4-12.9 35.9S17.9 512 32 512H480c14.1 0 26.5-9.2 30.6-22.7s-1.1-28.1-12.9-35.9l-48-32c-.6-.4-1.2-.7-1.8-1.1V224H384V416H344V224H280V416H232V224H168V416H128V224zm128-96c-17.7 0-32-14.3-32-32s14.3-32 32-32s32 14.3 32 32s-14.3 32-32 32z"/></svg> EMSE 4571: Intro to Programming for Analytics
### <svg aria-hidden="true" role="img" viewBox="0 0 448 512" style="height:1em;width:0.88em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:white;overflow:visible;position:relative;"><path d="M272 304h-96C78.8 304 0 382.8 0 480c0 17.67 14.33 32 32 32h384c17.67 0 32-14.33 32-32C448 382.8 369.2 304 272 304zM48.99 464C56.89 400.9 110.8 352 176 352h96c65.16 0 119.1 48.95 127 112H48.99zM224 256c70.69 0 128-57.31 128-128c0-70.69-57.31-128-128-128S96 57.31 96 128C96 198.7 153.3 256 224 256zM224 48c44.11 0 80 35.89 80 80c0 44.11-35.89 80-80 80S144 172.1 144 128C144 83.89 179.9 48 224 48z"/></svg> John Paul Helveston
### <svg aria-hidden="true" role="img" viewBox="0 0 448 512" style="height:1em;width:0.88em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:white;overflow:visible;position:relative;"><path d="M152 64H296V24C296 10.75 306.7 0 320 0C333.3 0 344 10.75 344 24V64H384C419.3 64 448 92.65 448 128V448C448 483.3 419.3 512 384 512H64C28.65 512 0 483.3 0 448V128C0 92.65 28.65 64 64 64H104V24C104 10.75 114.7 0 128 0C141.3 0 152 10.75 152 24V64zM48 448C48 456.8 55.16 464 64 464H384C392.8 464 400 456.8 400 448V192H48V448z"/></svg> March 30, 2023

]

---

# Quiz 5

## Write your name on the quiz!

## Rules:

- Work alone; no outside help of any kind is allowed.
- No calculators, no notes, no books, no computers, no phones.

]

]

---

# Week 9: .fancy[Data Wrangling]

### 1. Selecting & filtering
### 2. Sequences with pipes

### BREAK

### 3. Creating new variables
### 4. Grouped operations

---

# Week 9: .fancy[Data Wrangling]

### 1. .orange[Selecting & filtering]
### 2. Sequences with pipes

### BREAK

### 3. Creating new variables
### 4. Grouped operations

---

# Before we start

Make sure you have the "tidyverse" installed

```r
install.packages('tidyverse')
```

(this is at the top of the `practice.R` file)

Remember: you only need to install packages once!

---

### The tidyverse: `stringr` + `dplyr` + `readr` +  `ggplot2` + ...

<center>
<img src="images/horst_monsters_tidyverse.jpeg" width="950">
</center>Art by [Allison Horst](https://www.allisonhorst.com/)

---

## Today: better data wrangling with **dplyr**

<center>
<img src="images/horst_monsters_data_wrangling.png" width="600">
</center>Art by [Allison Horst](https://www.allisonhorst.com/)

---

# 80% of the job is data wrangling

---

# The main `dplyr` "verbs"

- `select()`: subset columns
- `filter()`: subset rows on conditions
- `arrange()`: sort data frame
- `mutate()`: create new columns by using information from other columns
- `group_by()`: group data to perform grouped operations
- `summarize()`: create summary statistics (usually on grouped data)
- `count()`: count discrete rows

---

# This week's British Band: **The Spice Girls**

```r
spicegirls <- tibble(
    firstName   = c("Melanie", "Melanie", "Emma", "Geri", "Victoria"),
    lastName    = c("Brown", "Chisholm", "Bunton", "Halliwell", "Beckham"),
    spice       = c("Scary", "Sporty", "Baby", "Ginger", "Posh"),
    yearOfBirth = c(1975, 1974, 1976, 1972, 1974),
    deceased    = c(FALSE, FALSE, FALSE, FALSE, FALSE)
)
spicegirls
```

```
#> # A tibble: 5 × 5
#>   firstName lastName  spice  yearOfBirth deceased
#>   <chr>     <chr>     <chr>        <dbl> <lgl>   
#> 1 Melanie   Brown     Scary         1975 FALSE   
#> 2 Melanie   Chisholm  Sporty        1974 FALSE   
#> 3 Emma      Bunton    Baby          1976 FALSE   
#> 4 Geri      Halliwell Ginger        1972 FALSE   
#> 5 Victoria  Beckham   Posh          1974 FALSE
```

---

# Select columns with `select()`

---

# Select columns with `select()`

Example: Select the columns `firstName` & `lastName`

**Base R**:

```r
spicegirls[c('firstName', 'lastName')]
```

```
#> # A tibble: 5 × 2
#>   firstName lastName 
#>   <chr>     <chr>    
#> 1 Melanie   Brown    
#> 2 Melanie   Chisholm 
#> 3 Emma      Bunton   
#> 4 Geri      Halliwell
#> 5 Victoria  Beckham
```

---

# Select columns with `select()`

Example: Select the columns `firstName` & `lastName`

**dplyr**: (note that you don't need `""` around names)

```r
select(spicegirls, firstName, lastName)
```

---

# Select columns with `select()`

Use the `-` sign to drop columns:

```r
select(spicegirls, -firstName, -lastName)
```

```
#> # A tibble: 5 × 3
#>   spice  yearOfBirth deceased
#>   <chr>        <dbl> <lgl>   
#> 1 Scary         1975 FALSE   
#> 2 Sporty        1974 FALSE   
#> 3 Baby          1976 FALSE   
#> 4 Ginger        1972 FALSE   
#> 5 Posh          1974 FALSE
```

---

# Select columns with `select()`

Select columns based on name criteria:

- `ends_with()` = Select columns that end with a character string
- `contains()` = Select columns that contain a character string
- `matches()` = Select columns that match a regular expression
- `one_of()` = Select column names that are from a group of names

---

# Select columns with `select()`

Select only the "name" columns

```r
select(spicegirls, ends_with('name'))
```

---

# Select rows with `filter()`

---

# Select rows with `filter()`

Example: Filter the band members born after 1974

```r
## # A tibble: 5 x 5
##   firstName lastName  spice  yearOfBirth deceased
##   <chr>     <chr>     <chr>        <dbl> <lgl>
*## 1 Melanie   Brown     Scary         1975 FALSE
## 2 Melanie   Chisholm  Sporty        1974 FALSE
*## 3 Emma      Bunton    Baby          1976 FALSE
## 4 Geri      Halliwell Ginger        1972 FALSE
## 5 Victoria  Beckham   Posh          1974 FALSE
```

---

# Select rows with `filter()`

Example: Filter the band members born after 1974

**Base R**:

```r
spicegirls[spicegirls$yearOfBirth > 1974,]
```

```
#> # A tibble: 2 × 5
#>   firstName lastName spice yearOfBirth deceased
#>   <chr>     <chr>    <chr>       <dbl> <lgl>   
#> 1 Melanie   Brown    Scary        1975 FALSE   
#> 2 Emma      Bunton   Baby         1976 FALSE
```

---

# Select rows with `filter()`

Example: Filter the band members born after 1974

**dplyr**:

```r
filter(spicegirls, yearOfBirth > 1974)
```

---

# Select rows with `filter()`

Example: Filter the band members born after 1974 **& are named "Melanie"**

**dplyr**:

```r
filter(spicegirls, yearOfBirth > 1974 & firstName == "Melanie")
```

```
#> # A tibble: 1 × 5
#>   firstName lastName spice yearOfBirth deceased
#>   <chr>     <chr>    <chr>       <dbl> <lgl>   
#> 1 Melanie   Brown    Scary        1975 FALSE
```

---

# .center[Logic operators for `filter()`]

<br>

Description | Example
------------|------------
Values greater than 1 | `value > 1`
Values greater than or equal to 1 | `value >= 1`
Values less than 1 | `value < 1`
Values less than or equal to 1 | `value <= 1`
Values equal to 1 | `value == 1`
Values not equal to 1 | `value != 1`
Values in the set c(1, 4) | `value %in% c(1, 4)`

---

# Removing missing values

Drop all rows where `variable` is `NA`

```r
filter(data, !is.na(variable))
```

---

## Your turn: wildlife impacts data

1) Create the data frame object `df` by using `here()` and `read_csv()` to load the `wildlife_impacts.csv` file in the `data` folder.

2) Use the `df` object and the `select()` and `filter()` functions to answer the following questions:

- Create a new data frame, `df_birds`, that contains only the variables (columns) about the species of bird.
- Create a new data frame, `dc`, that contains only the observations (rows) from DC airports.
- Create a new data frame, `dc_birds_known`, that contains only the observations (rows) from DC airports and those where the species of bird is known.
- How many _known_ unique species of birds have been involved in accidents at DC airports?

]

---

# Week 9: .fancy[Data Wrangling]

### 1. Selecting & filtering
### 2. .orange[Sequences with pipes]

### BREAK

### 3. Creating new variables
### 4. Grouped operations

---

# Create sequences of operations with "pipes"

<br>
<center>
<img src="images/magrittePipe.jpg" width="400">
</center>
[The Treachery of Images](https://en.wikipedia.org/wiki/The_Treachery_of_Images), René Magritte

]

<center>
<img src="images/magrittr.png" width="350">
<a href = "https://magrittr.tidyverse.org/">magrittr package</a>
</center>

]

---

# Think of `%>%` as the words "...and then..."

**Without Pipes** (read from inside-out):

```r
leave_house(get_dressed(get_out_of_bed(wake_up(me))))
```

**With Pipes**:

```r
me %>%
    wake_up() %>%
    get_out_of_bed() %>%
    get_dressed() %>%
    leave_house()
```

---

# Sequence operations with pipes: `%>%`

1. Filter the band members born after 1974
2. Select only the columns `firstName` & `lastName`

**Without Pipes**:

```r
select(filter(spicegirls, yearOfBirth > 1974), firstName, lastName)
```

```
#> # A tibble: 2 × 2
#>   firstName lastName
#>   <chr>     <chr>   
#> 1 Melanie   Brown   
#> 2 Emma      Bunton
```

---

# Sequence operations with pipes: `%>%`

1. Filter the band members born after 1974
2. Select only the columns `firstName` & `lastName`

**With Pipes**:

```r
spicegirls %>%
    filter(yearOfBirth > 1974) %>%
    select(firstName, lastName)
```

```
#> # A tibble: 2 × 2
#>   firstName lastName
#>   <chr>     <chr>   
#> 1 Melanie   Brown   
#> 2 Emma      Bunton
```

---

# Think of the words "...and then..."

**Without Pipes**:

```r
select(filter(spicegirls, yearOfBirth > 1974), firstName, lastName)
```

**With Pipes**: Note that you don't need to repeat the dataframe name

```r
spicegirls %>%
    filter(yearOfBirth > 1974) %>%
    select(firstName, lastName)
```

---

# Sort rows with `arrange()`

Sort the data frame by year of birth:

```r
spicegirls %>%
    arrange(yearOfBirth)
```

```
#> # A tibble: 5 × 5
#>   firstName lastName  spice  yearOfBirth deceased
#>   <chr>     <chr>     <chr>        <dbl> <lgl>   
#> 1 Geri      Halliwell Ginger        1972 FALSE   
#> 2 Melanie   Chisholm  Sporty        1974 FALSE   
#> 3 Victoria  Beckham   Posh          1974 FALSE   
#> 4 Melanie   Brown     Scary         1975 FALSE   
#> 5 Emma      Bunton    Baby          1976 FALSE
```

---

# Sort rows with `arrange()`

Use the `desc()` function to sort in descending order:

```r
spicegirls %>%
    arrange(desc(yearOfBirth))
```

```
#> # A tibble: 5 × 5
#>   firstName lastName  spice  yearOfBirth deceased
#>   <chr>     <chr>     <chr>        <dbl> <lgl>   
#> 1 Emma      Bunton    Baby          1976 FALSE   
#> 2 Melanie   Brown     Scary         1975 FALSE   
#> 3 Melanie   Chisholm  Sporty        1974 FALSE   
#> 4 Victoria  Beckham   Posh          1974 FALSE   
#> 5 Geri      Halliwell Ginger        1972 FALSE
```

---

# Sort rows with `arrange()`

Example of filtering, arranging, and selecting:

```r
spicegirls %>%
    filter(yearOfBirth < 1975) %>%
    arrange(desc(yearOfBirth)) %>%
    select(ends_with('name'))
```

```
#> # A tibble: 3 × 2
#>   firstName lastName 
#>   <chr>     <chr>    
#> 1 Melanie   Chisholm 
#> 2 Victoria  Beckham  
#> 3 Geri      Halliwell
```

---

## Your turn

1) Create the data frame object `df` by using `here()` and `read_csv()` to load the `wildlife_impacts.csv` file in the `data` folder.

2) Use the `df` object and `select()`, `filter()`, and `%>%` to answer the following questions:

- Create a new data frame, `dc_dawn`, that contains only the observations (rows) from DC airports that occurred at dawn.
- Create a new data frame, `dc_dawn_birds`, that contains only the observations (rows) from DC airports that occurred at dawn and only the variables (columns) about the species of bird.
- Create a new data frame, `dc_dawn_birds_known`, that contains only the observations (rows) from DC airports that occurred at dawn and only the variables (columns) about the KNOWN species of bird.
- How many _known_ unique species of birds have been involved in accidents at DC airports at dawn?

]

---

# .fancy[Break]

---

# Week 9: .fancy[Data Wrangling]

### 1. Selecting & filtering
### 2. Sequences with pipes

### BREAK

### 3. .orange[Creating new variables]
### 4. Grouped operations

---

## Create new variables with `mutate()`

---

<center>
<img src="images/horst_monsters_mutate.jpg" width="700">
</center>Art by [Allison Horst](https://www.allisonhorst.com/)

---

# Create new variables with `mutate()`

Example: Use the `yearOfBirth` variable to compute the age of each band member

**Base R**:

```r
spicegirls$age <- 2022 - spicegirls$yearOfBirth
```

**dplyr**:

```r
spicegirls %>%
    mutate(age = 2022 - yearOfBirth)
```

```
#> # A tibble: 5 × 6
#>   firstName lastName  spice  yearOfBirth deceased   age
#>   <chr>     <chr>     <chr>        <dbl> <lgl>    <dbl>
#> 1 Melanie   Brown     Scary         1975 FALSE       47
#> 2 Melanie   Chisholm  Sporty        1974 FALSE       48
#> 3 Emma      Bunton    Baby          1976 FALSE       46
#> 4 Geri      Halliwell Ginger        1972 FALSE       50
#> 5 Victoria  Beckham   Posh          1974 FALSE       48
```

---

# You can _immediately_ use new variables

```r
spicegirls %>%
    mutate(
        age = 2022 - yearOfBirth,
*       meanAge  = mean(age)) # Immediately using the "age" variable
```

```
#> # A tibble: 5 × 7
#>   firstName lastName  spice  yearOfBirth deceased   age meanAge
#>   <chr>     <chr>     <chr>        <dbl> <lgl>    <dbl>   <dbl>
#> 1 Melanie   Brown     Scary         1975 FALSE       47    47.8
#> 2 Melanie   Chisholm  Sporty        1974 FALSE       48    47.8
#> 3 Emma      Bunton    Baby          1976 FALSE       46    47.8
#> 4 Geri      Halliwell Ginger        1972 FALSE       50    47.8
#> 5 Victoria  Beckham   Posh          1974 FALSE       48    47.8
```

---

# .center[Handling if/else conditions]

### .center[`ifelse(<condition>, <if TRUE>, <else>)`]

```r
spicegirls %>%
    mutate(
        yobAfter74 = ifelse(yearOfBirth > 1974, "yes", "no"))
```

```
#> # A tibble: 5 × 6
#>   firstName lastName  spice  yearOfBirth deceased yobAfter74
#>   <chr>     <chr>     <chr>        <dbl> <lgl>    <chr>     
#> 1 Melanie   Brown     Scary         1975 FALSE    yes       
#> 2 Melanie   Chisholm  Sporty        1974 FALSE    no        
#> 3 Emma      Bunton    Baby          1976 FALSE    yes       
#> 4 Geri      Halliwell Ginger        1972 FALSE    no        
#> 5 Victoria  Beckham   Posh          1974 FALSE    no
```

---

## Your turn

1) Create the data frame object `df` by using `here()` and `read_csv()` to load the `wildlife_impacts.csv` file in the `data` folder.

2) Use the `df` object with `%>%` and `mutate()` to create the following new variables:

- `height_miles`: The `height` variable converted to miles (Hint: there are 5,280 feet in a mile).
- `cost_mil`: Is `TRUE` if the repair costs was greater or equal to $1 million, `FALSE` otherwise.
- `season`: One of four seasons based on the `incident_month` variable:

- `spring`: March, April, May
  - `summer`: June, July, August
  - `fall`: September, October, November
  - `winter`: December, January, February

]

---

# Week 9: .fancy[Data Wrangling]

### 1. Selecting & filtering
### 2. Sequences with pipes

### BREAK

### 3. Creating new variables
### 4. .orange[Grouped operations]

---

# Split-apply-combine with `group_by`

<br>

### 1. **Split** the data into groups
### 2. **Apply** some analysis to each group
### 3. **Combine** the results

---

# Split-apply-combine with `group_by`

---

## Split-apply-combine with `group_by`

```r
bands
```

```
#> # A tibble: 9 × 5
#>   firstName lastName  yearOfBirth deceased band      
#>   <chr>     <chr>           <dbl> <lgl>    <chr>     
#> 1 Melanie   Brown            1975 FALSE    spicegirls
#> 2 Melanie   Chisholm         1974 FALSE    spicegirls
#> 3 Emma      Bunton           1976 FALSE    spicegirls
#> 4 Geri      Halliwell        1972 FALSE    spicegirls
#> 5 Victoria  Beckham          1974 FALSE    spicegirls
#> 6 John      Lennon           1940 TRUE     beatles   
#> 7 Paul      McCartney        1942 FALSE    beatles   
#> 8 Ringo     Starr            1940 FALSE    beatles   
#> 9 George    Harrison         1943 TRUE     beatles
```

---

## Split-apply-combine with `group_by`

Compute the mean band member age for **each band**

```r
bands %>%
    mutate(
        age = 2020 - yearOfBirth,
*       mean_age = mean(age)) # This is the mean across both bands
```

```
#> # A tibble: 9 × 7
#>   firstName lastName  yearOfBirth deceased band         age mean_age
#>   <chr>     <chr>           <dbl> <lgl>    <chr>      <dbl>    <dbl>
#> 1 Melanie   Brown            1975 FALSE    spicegirls    45     60.4
#> 2 Melanie   Chisholm         1974 FALSE    spicegirls    46     60.4
#> 3 Emma      Bunton           1976 FALSE    spicegirls    44     60.4
#> 4 Geri      Halliwell        1972 FALSE    spicegirls    48     60.4
#> 5 Victoria  Beckham          1974 FALSE    spicegirls    46     60.4
#> 6 John      Lennon           1940 TRUE     beatles       80     60.4
#> 7 Paul      McCartney        1942 FALSE    beatles       78     60.4
#> 8 Ringo     Starr            1940 FALSE    beatles       80     60.4
#> 9 George    Harrison         1943 TRUE     beatles       77     60.4
```

---

## Split-apply-combine with `group_by`

Compute the mean band member age for each band

```r
bands %>%
    mutate(age = 2020 - yearOfBirth) %>%
*   group_by(band) %>% # Everything after this will be done each band
    mutate(mean_age = mean(age))
```

```
#> # A tibble: 9 × 7
#> # Groups:   band [2]
#>   firstName lastName  yearOfBirth deceased band         age mean_age
#>   <chr>     <chr>           <dbl> <lgl>    <chr>      <dbl>    <dbl>
#> 1 Melanie   Brown            1975 FALSE    spicegirls    45     45.8
#> 2 Melanie   Chisholm         1974 FALSE    spicegirls    46     45.8
#> 3 Emma      Bunton           1976 FALSE    spicegirls    44     45.8
#> 4 Geri      Halliwell        1972 FALSE    spicegirls    48     45.8
#> 5 Victoria  Beckham          1974 FALSE    spicegirls    46     45.8
#> 6 John      Lennon           1940 TRUE     beatles       80     78.8
#> 7 Paul      McCartney        1942 FALSE    beatles       78     78.8
#> 8 Ringo     Starr            1940 FALSE    beatles       80     78.8
#> 9 George    Harrison         1943 TRUE     beatles       77     78.8
```

---

# Summarize data frames with `summarise()`

---

## Summarize data frames with `summarise()`

Compute the mean band member age for **each band**

```r
bands %>%
    mutate(age = 2020 - yearOfBirth) %>%
    group_by(band) %>%
*   summarise(mean_age = mean(age)) # Drops all variables except for group
```

```
#> # A tibble: 2 × 2
#>   band       mean_age
#>   <chr>         <dbl>
#> 1 beatles        78.8
#> 2 spicegirls     45.8
```

---

## Summarize data frames with `summarise()`

Compute the mean, min, and max band member age for **each band**

```r
bands %>%
    mutate(age = 2020 - yearOfBirth) %>%
    group_by(band) %>%
    summarise(
        mean_age = mean(age),
        min_age = min(age),
        max_age = max(age))
```

```
#> # A tibble: 2 × 4
#>   band       mean_age min_age max_age
#>   <chr>         <dbl>   <dbl>   <dbl>
#> 1 beatles        78.8      77      80
#> 2 spicegirls     45.8      44      48
```

---

## Computing counts of observations with `n()`

How many members are in each band?

```r
bands %>%
    mutate(age = 2020 - yearOfBirth) %>%
    group_by(band) %>%
    summarise(
        mean_age = mean(age),
        min_age = min(age),
        max_age = max(age),
*       numMembers = n())
```

```
#> # A tibble: 2 × 5
#>   band       mean_age min_age max_age numMembers
#>   <chr>         <dbl>   <dbl>   <dbl>      <int>
#> 1 beatles        78.8      77      80          4
#> 2 spicegirls     45.8      44      48          5
```

---

## If you only want a quick count, use `count()`

These do the same thing:

```r
bands %>%
    group_by(band) %>%
    summarise(n = n())
```

```
#> # A tibble: 2 × 2
#>   band           n
#>   <chr>      <int>
#> 1 beatles        4
#> 2 spicegirls     5
```

]

```r
bands %>%
    count(band)
```

```
#> # A tibble: 2 × 2
#>   band           n
#>   <chr>      <int>
#> 1 beatles        4
#> 2 spicegirls     5
```

]

---

## If you only want a quick count, use `count()`

You can count multiple combinations

```r
bands %>%
    mutate(nameStartsWithG = str_detect(firstName, '^G')) %>%
    count(band, nameStartsWithG)
```

```
#> # A tibble: 4 × 3
#>   band       nameStartsWithG     n
#>   <chr>      <lgl>           <int>
#> 1 beatles    FALSE               3
#> 2 beatles    TRUE                1
#> 3 spicegirls FALSE               4
#> 4 spicegirls TRUE                1
```

---

## Your turn

1) Create the data frame object `df` by using `here()` and `read_csv()` to load the `wildlife_impacts.csv` file in the `data` folder.

2) Use the `df` object and `group_by()`, `summarise()`, `count()`, and `%>%` to answer the following questions:

- Create a summary data frame that contains the mean `height` for each different time of day.
- Create a summary data frame that contains the maximum `cost_repairs_infl_adj` for each year.
- Which _month_ has had the greatest number of reported incidents?
- Which _year_ has had the greatest number of reported incidents?

]

---

# Exporting data

```r
ageSummary <- bands %>%
    mutate(age = 2020 - yearOfBirth) %>%
    group_by(band) %>%
    summarise(
        mean_age = mean(age),
        min_age = min(age),
        max_age = max(age),
        numMembers = n())
ageSummary
```

---

# Exporting data: `here()` + `write_csv()`

Save the `ageSummary` data frame in your "data" folder:

1) Create a path to where you want to save the data

```r
library(here)
savePath <- here('data', 'ageSummary.csv')
```

2) Export the data

```r
library(readr)
write_csv(ageSummary, savePath)
```

---

# HW 9

Make sure you install the package `nycflights13`

```r
install.packages('nycflights13')
```

This package includes **5 data frames**:

```r
airlines
airports
flights
planes
weather
```