Week 11: Data Frames

.leftcol30[
<center>
<img src="https://github.com/emse-p4a-gwu/emse-p4a-gwu.github.io/raw/master/images/p4a_hex_sticker.png" width=250>
</center>
]
.rightcol70[
# Week 11: .fancy[Data Frames]

### EMSE 4574: Intro to Programming for Analytics
### John Paul Helveston
### November 10, 2020
]

---

# Quiz 5

.leftcol[
- ### Go to the `#classroom` channel in Slack for the link
- ### Open up RStudio before you start - you'll probably want to use it.
]
.rightcol[
<center>
<img src="images/quiz_doge.png" width="400">
</center>
]

---
# Before we start

Make sure you have these packages installed and loaded:

```r
# Install once
install.packages(c("stringr", "dplyr", "ggplot2", "readr", "here"))

# Load every session
library(stringr)
library(dplyr)
library(ggplot2)
library(readr)
library(here)
```

(they're at the top of the notes.R file)

Remember: you only need to install them once!

---
.leftcol[
## "The purpose of computing is insight, not numbers"
### - [Richard Hamming](https://en.wikipedia.org/wiki/Richard_Hamming)
]
.rightcol[
<img src="images/Richard_Hamming.jpg" width="400">
]

---
class: inverse, middle

# Week 11: .fancy[Data Frames]

## 1. Basics
## 2. Slicing
## 3. External data

---

# Week 11: .fancy[Data Frames]

## 1. .orange[Basics]
## 2. Slicing
## 3. External data

---
# The data frame...in Excel

---
# The data frame...in R

```r
beatles <- tibble(
    firstName   = c("John", "Paul", "Ringo", "George"),
    lastName    = c("Lennon", "McCartney", "Starr", "Harrison"),
    instrument  = c("guitar", "bass", "drums", "guitar"),
    yearOfBirth = c(1940, 1942, 1940, 1943),
    deceased    = c(TRUE, FALSE, FALSE, TRUE)
)
beatles
```

```
## # A tibble: 4 x 5
##   firstName lastName  instrument yearOfBirth deceased
##   <chr>     <chr>     <chr>            <dbl> <lgl>   
## 1 John      Lennon    guitar            1940 TRUE    
## 2 Paul      McCartney bass              1942 FALSE   
## 3 Ringo     Starr     drums             1940 FALSE   
## 4 George    Harrison  guitar            1943 TRUE
```

---
# The data frame...in RStudio

```r
View(beatles)
```
<img src="images/dataframe.png" width="700">

---
## **Columns**: _Vectors_ of values (must be same data type)

```r
beatles
```

---
## **Rows**: Information about individual observations

```r
beatles[1, ] # Select the first row
```

```
## # A tibble: 1 x 5
##   firstName lastName instrument yearOfBirth deceased
##   <chr>     <chr>    <chr>            <dbl> <lgl>   
## 1 John      Lennon   guitar            1940 TRUE
```

---
## Make a data frame with `data.frame()`

```r
beatles <- data.frame(
    firstName   = c("John", "Paul", "Ringo", "George"),
    lastName    = c("Lennon", "McCartney", "Starr", "Harrison"),
    instrument  = c("guitar", "bass", "drums", "guitar"),
    yearOfBirth = c(1940, 1942, 1940, 1943),
    deceased    = c(TRUE, FALSE, FALSE, TRUE)
)
```
--

```r
beatles
```

```
##   firstName  lastName instrument yearOfBirth deceased
## 1      John    Lennon     guitar        1940     TRUE
## 2      Paul McCartney       bass        1942    FALSE
## 3     Ringo     Starr      drums        1940    FALSE
## 4    George  Harrison     guitar        1943     TRUE
```

---
## Make a data frame with `tibble()`

```r
library(dplyr)
```

```r
beatles
```

---
## Why I use `tibble()` instead of `data.frame()`

--
1. A tibble prints its **dimensions** and each column's **data type**.

--
2. A tibble will only print the first few rows of data when you enter the object name
Example: `faithful` vs. `as_tibble(faithful)`

--
3. Columns of class `character` are _never_ converted into factors, which `data.frame()` did by default before R 4.0.0 (don't worry about this for now...just know that tibbles make life easier when dealing with character type columns).

**Note**: I use the term **"data frame"** to refer to both `tibble()` and `data.frame()` objects
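
A quick way to see the first two differences, sketched with the built-in `faithful` data set (272 rows, 2 columns). The object name `faithful_tbl` is just an illustrative choice:

```r
library(dplyr)

# `faithful` ships with R as a plain data.frame
class(faithful)

# Convert it to a tibble; the data are unchanged, only the class differs
faithful_tbl <- as_tibble(faithful)
class(faithful_tbl)

# Printing a tibble shows its dimensions, column types, and only the first 10 rows
faithful_tbl
```

Typing `faithful` on its own prints all 272 rows, which is why the tibble version is easier to read in the console.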

---
## Data frame vectors must have the same length

```r
beatles <- tibble(
*   firstName   = c("John", "Paul", "Ringo", "George", "Bob"), # Added "Bob"
    lastName    = c("Lennon", "McCartney", "Starr", "Harrison"),
    instrument  = c("guitar", "bass", "drums", "guitar"),
    yearOfBirth = c(1940, 1942, 1940, 1943),
    deceased    = c(TRUE, FALSE, FALSE, TRUE)
)
```

```
## Error: Tibble columns must have compatible sizes.
## * Size 5: Existing data.
## * Size 4: Column `lastName`.
## ℹ Only values of size one are recycled.
```
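
The last line of that error message points at the one exception: a single value is recycled to fill the whole column. A minimal sketch (the `band` column and the `beatles_band` name are illustrative, not part of the class data):

```r
library(dplyr)

beatles_band <- tibble(
    firstName = c("John", "Paul", "Ringo", "George"),
    band      = "The Beatles" # Length 1, so it is repeated for all 4 rows
)
beatles_band
```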

---
## Use `NA` for missing values

```r
beatles <- tibble(
    firstName   = c("John", "Paul", "Ringo", "George", "Bob"), # Added "Bob"
*   lastName    = c("Lennon", "McCartney", "Starr", "Harrison", NA),
*   instrument  = c("guitar", "bass", "drums", "guitar", NA),
*   yearOfBirth = c(1940, 1942, 1940, 1943, NA),
*   deceased    = c(TRUE, FALSE, FALSE, TRUE, NA)
)
```
--

```r
beatles
```

```
## # A tibble: 5 x 5
##   firstName lastName  instrument yearOfBirth deceased
##   <chr>     <chr>     <chr>            <dbl> <lgl>   
## 1 John      Lennon    guitar            1940 TRUE    
## 2 Paul      McCartney bass              1942 FALSE   
## 3 Ringo     Starr     drums             1940 FALSE   
## 4 George    Harrison  guitar            1943 TRUE    
## 5 Bob       <NA>      <NA>                NA NA
```
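
Missing values propagate through most calculations, so it helps to know how to detect and skip them. A small sketch on a stand-alone vector (`yearOfBirth` here is a plain vector copied from the column above):

```r
yearOfBirth <- c(1940, 1942, 1940, 1943, NA)

is.na(yearOfBirth)               # TRUE only for the missing value
sum(is.na(yearOfBirth))          # Count the missing values
mean(yearOfBirth)                # NA: one missing value makes the result missing
mean(yearOfBirth, na.rm = TRUE)  # Drop NAs before computing the mean
```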

---
# Dimensions: `nrow()`, `ncol()`, & `dim()`

```r
nrow(beatles) # Number of rows
```

```
## [1] 5
```

```r
ncol(beatles) # Number of columns
```

```
## [1] 5
```

```r
dim(beatles)  # Number of rows and columns
```

```
## [1] 5 5
```

---
## Use `names()` to see which variables a data frame has

Get the names of columns:

```r
names(beatles)
```

```
## [1] "firstName"   "lastName"    "instrument"  "yearOfBirth" "deceased"
```

```r
colnames(beatles)
```

```
## [1] "firstName"   "lastName"    "instrument"  "yearOfBirth" "deceased"
```
--
Get the names of rows (rarely needed):

```r
rownames(beatles)
```

```
## [1] "1" "2" "3" "4" "5"
```

---
# Changing the column names

Change the column names with `names()` or `colnames()`:

```r
names(beatles) <- c('one', 'two', 'three', 'four', 'five')
beatles
```

```
## # A tibble: 5 x 5
##   one    two       three   four five 
##   <chr>  <chr>     <chr>  <dbl> <lgl>
## 1 John   Lennon    guitar  1940 TRUE 
## 2 Paul   McCartney bass    1942 FALSE
## 3 Ringo  Starr     drums   1940 FALSE
## 4 George Harrison  guitar  1943 TRUE 
## 5 Bob    <NA>      <NA>      NA NA
```

---
# Changing the column names

Make all the column names upper-case:

```r
colnames(beatles) <- stringr::str_to_upper(colnames(beatles))
beatles
```

```
## # A tibble: 5 x 5
##   FIRSTNAME LASTNAME  INSTRUMENT YEAROFBIRTH DECEASED
##   <chr>     <chr>     <chr>            <dbl> <lgl>   
## 1 John      Lennon    guitar            1940 TRUE    
## 2 Paul      McCartney bass              1942 FALSE   
## 3 Ringo     Starr     drums             1940 FALSE   
## 4 George    Harrison  guitar            1943 TRUE    
## 5 Bob       <NA>      <NA>                NA NA
```
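
To rename just one column rather than all of them, you can index into `names()`. A sketch using base R only; the `band` data frame and the new name `birthYear` are illustrative choices:

```r
band <- data.frame(
    firstName   = c("John", "Paul"),
    yearOfBirth = c(1940, 1942)
)

# Replace only the name that matches "yearOfBirth"
names(band)[names(band) == "yearOfBirth"] <- "birthYear"
names(band)
```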

---
## Combine data frames by columns using `bind_cols()`

Note: `bind_cols()` is from the **dplyr** package

```r
names <- tibble(
    firstName = c("John", "Paul", "Ringo", "George"),
    lastName  = c("Lennon", "McCartney", "Starr", "Harrison"))

instruments <- tibble(
    instrument = c("guitar", "bass", "drums", "guitar"))
```
--

```r
bind_cols(names, instruments)
```

```
## # A tibble: 4 x 3
##   firstName lastName  instrument
##   <chr>     <chr>     <chr>     
## 1 John      Lennon    guitar    
## 2 Paul      McCartney bass      
## 3 Ringo     Starr     drums     
## 4 George    Harrison  guitar
```

---
## Combine data frames by rows using `bind_rows()`

Note: `bind_rows()` is from the **dplyr** package

```r
members1 <- tibble(
    firstName = c("John", "Paul"),
    lastName  = c("Lennon", "McCartney"))

members2 <- tibble(
    firstName = c("Ringo", "George"),
    lastName  = c("Starr", "Harrison"))
```
--

```r
bind_rows(members1, members2)
```

```
## # A tibble: 4 x 2
##   firstName lastName 
##   <chr>     <chr>    
## 1 John      Lennon   
## 2 Paul      McCartney
## 3 Ringo     Starr    
## 4 George    Harrison
```

---
## Note: `bind_rows()` matches columns by **name**, so the names must be the same:

```r
*colnames(members2) <- c("firstName", "LastName")
bind_rows(members1, members2)
```

```
## # A tibble: 4 x 3
##   firstName lastName  LastName
##   <chr>     <chr>     <chr>   
## 1 John      Lennon    <NA>    
## 2 Paul      McCartney <NA>    
## 3 Ringo     <NA>      Starr   
## 4 George    <NA>      Harrison
```
Note how `<NA>`s were created
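
One way to avoid this, sketched under the assumption that both data frames hold the same variables in the same order, is to copy the column names from one to the other before binding:

```r
library(dplyr)

members1 <- tibble(
    firstName = c("John", "Paul"),
    lastName  = c("Lennon", "McCartney"))

members2 <- tibble(
    firstName = c("Ringo", "George"),
    LastName  = c("Starr", "Harrison")) # Note the capital "L"

# Make the names match, then bind
names(members2) <- names(members1)
band <- bind_rows(members1, members2)
band
```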

---

## Quick practice

Answer these questions using the `animals_farm` and `animals_pet` data frames:

1. Write code to find how many _rows_ are in the `animals_farm` data frame.
2. Write code to find how many _columns_ are in the `animals_pet` data frame.
3. Create a new data frame, `animals`, by combining `animals_farm` and `animals_pet`.
4. Change the column names of `animals` to title case.

---

# Week 11: .fancy[Data Frames]

## 1. Basics
## 2. .orange[Slicing]
## 3. External data

---
## &zwj; Access data frame columns using the `$` symbol

```r
beatles$firstName
```

```
## [1] "John"   "Paul"   "Ringo"  "George"
```
--

```r
beatles$lastName
```

```
## [1] "Lennon"    "McCartney" "Starr"     "Harrison"
```

---
# Creating new variables with the `$` symbol

--
Add the hometown of the band members:

```r
beatles$hometown <- 'Liverpool'
beatles
```

```
## # A tibble: 4 x 6
##   firstName lastName  instrument yearOfBirth deceased hometown 
##   <chr>     <chr>     <chr>            <dbl> <lgl>    <chr>    
## 1 John      Lennon    guitar            1940 TRUE     Liverpool
## 2 Paul      McCartney bass              1942 FALSE    Liverpool
## 3 Ringo     Starr     drums             1940 FALSE    Liverpool
## 4 George    Harrison  guitar            1943 TRUE     Liverpool
```

---
# Creating new variables with the `$` symbol

--
Add a new `alive` variable:

```r
beatles$alive <- c(FALSE, TRUE, TRUE, FALSE)
beatles
```

```
## # A tibble: 4 x 7
##   firstName lastName  instrument yearOfBirth deceased hometown  alive
##   <chr>     <chr>     <chr>            <dbl> <lgl>    <chr>     <lgl>
## 1 John      Lennon    guitar            1940 TRUE     Liverpool FALSE
## 2 Paul      McCartney bass              1942 FALSE    Liverpool TRUE 
## 3 Ringo     Starr     drums             1940 FALSE    Liverpool TRUE 
## 4 George    Harrison  guitar            1943 TRUE     Liverpool FALSE
```

---
## You can compute new variables from current ones

--
Compute and add the age of the band members:

```r
beatles$age <- 2020 - beatles$yearOfBirth
beatles
```

```
## # A tibble: 4 x 8
##   firstName lastName  instrument yearOfBirth deceased hometown  alive
##   <chr>     <chr>     <chr>            <dbl> <lgl>    <chr>     <lgl>
## 1 John      Lennon    guitar            1940 TRUE     Liverpool FALSE
## 2 Paul      McCartney bass              1942 FALSE    Liverpool TRUE 
## 3 Ringo     Starr     drums             1940 FALSE    Liverpool TRUE 
## 4 George    Harrison  guitar            1943 TRUE     Liverpool FALSE
##     age
##   <dbl>
## 1    80
## 2    78
## 3    80
## 4    77
```
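
Hard-coding `2020` means the result goes stale next year. A sketch that reads the current year from the system clock instead (`band` and `currentYear` are illustrative names):

```r
band <- data.frame(
    firstName   = c("John", "Paul", "Ringo", "George"),
    yearOfBirth = c(1940, 1942, 1940, 1943)
)

# Extract the current year as a number
currentYear <- as.numeric(format(Sys.Date(), "%Y"))

# Years since birth; vectorized, so one value is computed per row
band$age <- currentYear - band$yearOfBirth
band
```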

---
## Access elements by index: `DF[row, column]`

General form for indexing elements:

```r
DF[row, column]
```
--
.leftcol[
Select the element in row 1, column 2:

```r
beatles[1, 2]
```

```
## # A tibble: 1 x 1
##   lastName
##   <chr>   
## 1 Lennon
```
]
--
.rightcol[
Select the elements in rows 1 & 2 and columns 2 & 3:

```r
beatles[c(1, 2), c(2, 3)]
```

```
## # A tibble: 2 x 2
##   lastName  instrument
##   <chr>     <chr>     
## 1 Lennon    guitar    
## 2 McCartney bass
```
]

---
## Leave row or column "blank" to select all

```r
beatles[c(1, 2),] # Selects all COLUMNS for rows 1 & 2
```

```
## # A tibble: 2 x 5
##   firstName lastName  instrument yearOfBirth deceased
##   <chr>     <chr>     <chr>            <dbl> <lgl>   
## 1 John      Lennon    guitar            1940 TRUE    
## 2 Paul      McCartney bass              1942 FALSE
```
--

```r
beatles[,c(1, 2)] # Selects all ROWS for columns 1 & 2
```

```
## # A tibble: 4 x 2
##   firstName lastName 
##   <chr>     <chr>    
## 1 John      Lennon   
## 2 Paul      McCartney
## 3 Ringo     Starr    
## 4 George    Harrison
```

---
## Negative indices exclude row / column

```r
beatles[-1, ] # Select all ROWS except the first
```

```
## # A tibble: 3 x 5
##   firstName lastName  instrument yearOfBirth deceased
##   <chr>     <chr>     <chr>            <dbl> <lgl>   
## 1 Paul      McCartney bass              1942 FALSE   
## 2 Ringo     Starr     drums             1940 FALSE   
## 3 George    Harrison  guitar            1943 TRUE
```
--

```r
beatles[,-1] # Select all COLUMNS except the first
```

```
## # A tibble: 4 x 4
##   lastName  instrument yearOfBirth deceased
##   <chr>     <chr>            <dbl> <lgl>   
## 1 Lennon    guitar            1940 TRUE    
## 2 McCartney bass              1942 FALSE   
## 3 Starr     drums             1940 FALSE   
## 4 Harrison  guitar            1943 TRUE
```
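
Negative indices also work with vectors, so several rows or columns can be dropped at once. A short base R sketch on an illustrative `band` data frame:

```r
band <- data.frame(
    firstName  = c("John", "Paul", "Ringo", "George"),
    lastName   = c("Lennon", "McCartney", "Starr", "Harrison"),
    instrument = c("guitar", "bass", "drums", "guitar")
)

# Drop rows 1 & 2, keep all columns
lastTwo <- band[-c(1, 2), ]
lastTwo

# Drop rows 1 & 2 and also drop column 3
lastTwoNames <- band[-c(1, 2), -3]
lastTwoNames
```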

---
# You can select columns by their names

Note: you don't need the comma to select an entire column

--
.leftcol[
One column

```r
beatles['firstName']
```

```
## # A tibble: 4 x 1
##   firstName
##   <chr>    
## 1 John     
## 2 Paul     
## 3 Ringo    
## 4 George
```
]
--
.rightcol[
Multiple columns

```r
beatles[c('firstName', 'lastName')]
```

```
## # A tibble: 4 x 2
##   firstName lastName 
##   <chr>     <chr>    
## 1 John      Lennon   
## 2 Paul      McCartney
## 3 Ringo     Starr    
## 4 George    Harrison
```
]
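
Selecting by name with single brackets keeps the result as a data frame, while `$` and double brackets `[[ ]]` pull the column out as a vector. A sketch of the difference (the small `band` tibble is illustrative):

```r
library(dplyr)

band <- tibble(
    firstName = c("John", "Paul", "Ringo", "George"),
    lastName  = c("Lennon", "McCartney", "Starr", "Harrison")
)

asTibble  <- band['firstName']    # Still a tibble with 1 column
asVector1 <- band[['firstName']]  # A character vector
asVector2 <- band$firstName       # The same character vector

class(asTibble)
class(asVector1)
identical(asVector1, asVector2)
```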

---
## Use logical indices to _filter_ rows

--
**Which Beatles members are still alive?**<br>Create a logical vector using the `deceased` column:

```r
beatles$deceased == FALSE
```

```
## [1] FALSE  TRUE  TRUE FALSE
```
--
Insert this logical vector in the ROW position of `beatles[,]`:

```r
beatles[beatles$deceased == FALSE,]
```

```
## # A tibble: 2 x 5
##   firstName lastName  instrument yearOfBirth deceased
##   <chr>     <chr>     <chr>            <dbl> <lgl>   
## 1 Paul      McCartney bass              1942 FALSE   
## 2 Ringo     Starr     drums             1940 FALSE
```
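
Logical vectors can be combined with `&` (and) or `|` (or) to filter on more than one condition. A sketch using an illustrative `band` data frame; the cutoff year 1942 is arbitrary:

```r
band <- data.frame(
    firstName   = c("John", "Paul", "Ringo", "George"),
    yearOfBirth = c(1940, 1942, 1940, 1943),
    deceased    = c(TRUE, FALSE, FALSE, TRUE)
)

# Build one logical vector per condition
isAlive    <- band$deceased == FALSE
bornBefore <- band$yearOfBirth < 1942

# Keep only rows where BOTH conditions are TRUE
result <- band[isAlive & bornBefore, ]
result
```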

---
class: inverse

## Think-Pair-Share

Answer these questions using the `beatles` data frame:

1. Create a new column, `playsGuitar`, which is `TRUE` if the band member plays the guitar and `FALSE` otherwise.
2. Filter the data frame to select only the rows for the band members who have four-letter first names.
3. Create a new column, `fullName`, which contains the band member's first and last name separated by a space (e.g. `"John Lennon"`)

---
class: inverse, center

# .fancy[Break]

---

# Week 11: .fancy[Data Frames]

## 1. Basics
## 2. Slicing
## 3. .orange[External data]

---
# Getting data into R

<br>
## 1. Load external packages
## 2. Read in external files (usually .csv files)

---
## Getting the data from an R package

```r
library(ggplot2)
```
--

```r
data(package = "ggplot2")
```

```
Data sets in package ‘ggplot2’:

diamonds                Prices of over 50,000 round cut diamonds
economics               US economic time series
economics_long          US economic time series
faithfuld               2d density estimate of Old Faithful data
luv_colours             'colors()' in Luv space
midwest                 Midwest demographics
mpg                     Fuel economy data from 1999 to 2008 for 38
                        popular models of cars
msleep                  An updated and expanded version of the mammals
                        sleep dataset
presidential            Terms of 11 presidents from Eisenhower to Obama
seals                   Vector field of seal movements
txhousing               Housing sales in TX
```

---
# Find out about package data sets with `?`

```r
?msleep
```

```
msleep {ggplot2}

An updated and expanded version of the mammals sleep dataset

Description

This is an updated and expanded version of the mammals sleep dataset. Updated sleep times and weights were taken from V. M. Savage and G. B. West. A quantitative, theoretical framework for understanding mammalian sleep. Proceedings of the National Academy of Sciences, 104 (3):1051-1056, 2007.
```

---
# Previewing data frames: `msleep`

--
Look at the data in a "spreadsheet"-like way:

```r
View(msleep)
```
This is "read-only" so you can't corrupt the data 😄

---
# My favorite quick summary: `glimpse()`

Preview each variable with `str()` or `glimpse()`

```r
glimpse(msleep)
```
.code80[

```
## Rows: 83
## Columns: 11
## $ name         <chr> "Cheetah", "Owl monkey", "Mountain beaver", "Grea…
## $ genus        <chr> "Acinonyx", "Aotus", "Aplodontia", "Blarina", "Bo…
## $ vore         <chr> "carni", "omni", "herbi", "omni", "herbi", "herbi…
## $ order        <chr> "Carnivora", "Primates", "Rodentia", "Soricomorph…
## $ conservation <chr> "lc", NA, "nt", "lc", "domesticated", NA, "vu", N…
## $ sleep_total  <dbl> 12.1, 17.0, 14.4, 14.9, 4.0, 14.4, 8.7, 7.0, 10.1…
## $ sleep_rem    <dbl> NA, 1.8, 2.4, 2.3, 0.7, 2.2, 1.4, NA, 2.9, NA, 0.…
## $ sleep_cycle  <dbl> NA, NA, NA, 0.1333333, 0.6666667, 0.7666667, 0.38…
## $ awake        <dbl> 11.9, 7.0, 9.6, 9.1, 20.0, 9.6, 15.3, 17.0, 13.9,…
## $ brainwt      <dbl> NA, 0.01550, NA, 0.00029, 0.42300, NA, NA, NA, 0.…
## $ bodywt       <dbl> 50.000, 0.480, 1.350, 0.019, 600.000, 3.850, 20.4…
```
]

---
## Also very useful for quick checks: `head()` and `tail()`

.leftcol[
View the **first** 6 rows with `head()`

```r
head(msleep)
```
.code40[

```
## # A tibble: 6 x 11
##   name                       genus      vore  order        conservation
##   <chr>                      <chr>      <chr> <chr>        <chr>       
## 1 Cheetah                    Acinonyx   carni Carnivora    lc          
## 2 Owl monkey                 Aotus      omni  Primates     <NA>        
## 3 Mountain beaver            Aplodontia herbi Rodentia     nt          
## 4 Greater short-tailed shrew Blarina    omni  Soricomorpha lc          
## 5 Cow                        Bos        herbi Artiodactyla domesticated
## 6 Three-toed sloth           Bradypus   herbi Pilosa       <NA>        
##   sleep_total sleep_rem sleep_cycle awake  brainwt  bodywt
##         <dbl>     <dbl>       <dbl> <dbl>    <dbl>   <dbl>
## 1        12.1      NA        NA      11.9 NA        50    
## 2        17         1.8      NA       7    0.0155    0.48 
## 3        14.4       2.4      NA       9.6 NA         1.35 
## 4        14.9       2.3       0.133   9.1  0.00029   0.019
## 5         4         0.7       0.667  20    0.423   600    
## 6        14.4       2.2       0.767   9.6 NA         3.85
```
]]
.rightcol[
View the **last** 6 rows with `tail()`

```r
tail(msleep)
```
.code40[

```
## # A tibble: 6 x 11
##   name                 genus    vore  order        conservation
##   <chr>                <chr>    <chr> <chr>        <chr>       
## 1 Tenrec               Tenrec   omni  Afrosoricida <NA>        
## 2 Tree shrew           Tupaia   omni  Scandentia   <NA>        
## 3 Bottle-nosed dolphin Tursiops carni Cetacea      <NA>        
## 4 Genet                Genetta  carni Carnivora    <NA>        
## 5 Arctic fox           Vulpes   carni Carnivora    <NA>        
## 6 Red fox              Vulpes   carni Carnivora    <NA>        
##   sleep_total sleep_rem sleep_cycle awake brainwt  bodywt
##         <dbl>     <dbl>       <dbl> <dbl>   <dbl>   <dbl>
## 1        15.6       2.3      NA       8.4  0.0026   0.9  
## 2         8.9       2.6       0.233  15.1  0.0025   0.104
## 3         5.2      NA        NA      18.8 NA      173.   
## 4         6.3       1.3      NA      17.7  0.0175   2    
## 5        12.5      NA        NA      11.5  0.0445   3.38 
## 6         9.8       2.4       0.35   14.2  0.0504   4.23
```
]]
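
Both functions accept an `n` argument if six rows is too many or too few. A sketch, assuming **ggplot2** is installed so that `msleep` is available (`first3` and `last10` are illustrative names):

```r
library(ggplot2)

first3 <- head(msleep, n = 3)   # First 3 rows
last10 <- tail(msleep, n = 10)  # Last 10 rows

nrow(first3)
nrow(last10)
```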

---
# Importing an external data file

<br>
Note the `data.csv` file in your `data` folder.

- **DO NOT** double-click it!
- **DO NOT** open it in Excel!

PSA: Excel can **corrupt** your data

---
# Steps to importing external data files

--
## 1. Create a path to the data

```r
library(here)
*pathToData <- here('data', 'data.csv')
pathToData
```

```
## [1] "/Users/jhelvy/gh/0gw/P4A/2020-Fall/class/11-data-frames/data/data.csv"
```
--
## 2. Import the data

```r
library(readr)
*df <- read_csv(pathToData)
```

---
## PSA: Use the **here** package to make file paths

Called with no arguments, `here()` returns the path to your project **root**, which is your _working directory_ <br>(this is where your `.Rproj` file lives!)

```r
here()
```

```
## [1] "/Users/jhelvy/gh/0gw/P4A/2020-Fall/class/11-data-frames"
```
--
Called with arguments, `here()` builds the path to files _inside_ your working directory

```r
pathToData <- here('data', 'data.csv')
pathToData
```

```
## [1] "/Users/jhelvy/gh/0gw/P4A/2020-Fall/class/11-data-frames/data/data.csv"
```
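
Before reading a file, it can be worth confirming that the path actually points at something. A sketch, assuming the **here** package is installed and that your project has a `data/data.csv` file as in the class folder:

```r
library(here)

pathToData <- here('data', 'data.csv')

# file.exists() returns TRUE or FALSE without trying to read the file
if (file.exists(pathToData)) {
    message("Found the data file at: ", pathToData)
} else {
    message("No file at: ", pathToData, " (is the project root what you expect?)")
}
```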

---
# Avoid hard-coding file paths!

### (they can break on different computers)

```r
pathToData <- 'data/data.csv'
pathToData
```

```
## [1] "data/data.csv"
```
# 💩💩💩

---
class: center

.leftcol40[.left[
## PSA:<br>Use the **here** package to make file paths
]]
.rightcol60[
<center><br>
<img src="images/horst_monsters_here.png">
</center>Art by [Allison Horst](https://www.allisonhorst.com/)
]

---
# Back to reading in data

```r
pathToData <- here('data', 'data.csv')
*df <- read_csv(pathToData)
```

**Important**: Note the use of `read_csv()` instead of `read.csv()`

I recommend `read_csv()` (from **readr**): it returns a tibble, keeps character columns as characters, and is usually faster than `read.csv()`
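
To see the difference for yourself, here is a self-contained sketch that writes a tiny CSV to a temporary file and reads it both ways. The file contents and the names `tmp`, `dfReadr`, and `dfBase` are made up for illustration:

```r
library(readr)

# Write a small CSV to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("firstName,yearOfBirth", "John,1940", "Paul,1942"), tmp)

# readr: returns a tibble; column types are stated explicitly here
dfReadr <- read_csv(tmp, col_types = cols(
    firstName   = col_character(),
    yearOfBirth = col_double()
))

# Base R: returns a plain data.frame
dfBase <- read.csv(tmp)

class(dfReadr)
class(dfBase)
```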

---
class: inverse

## Think-Pair-Share

.font90[
1) Use the `here()` and `read_csv()` functions to load the `data.csv` file that is in the `data` folder. Name the data frame object `df`.

2) Use the `df` object to answer the following questions:

- How many rows and columns are in the data frame?
- What type of data is each column?
- Preview the different columns - what do you think this data is about? What might one row represent?
- How many unique airports are in the data frame?
- What are the earliest and latest observations in the data frame?
- What are the lowest and highest costs of any one repair in the data frame?
]

---
class: center
## Next week: better data wrangling with **dplyr**

<center>
<img src="images/horst_monsters_data_wrangling.png" width="600">
</center>Art by [Allison Horst](https://www.allisonhorst.com/)

---
# Select rows with `filter()`

Example: Filter rows to find which Beatles members are still alive.

--
Base R:

```r
beatles[beatles$deceased == FALSE,]
```

--
&zwj;**dplyr**:

```r
filter(beatles, deceased == FALSE)
```

---
# In 2 weeks: plotting with **ggplot2**

.leftcol[
## Translate _data_...

```
## # A tibble: 11 x 2
##     brainwt   bodywt
##       <dbl>    <dbl>
##  1 0.001       0.06 
##  2 0.0066      1    
##  3 0.000140    0.005
##  4 0.0108      3.5  
##  5 0.0123      2.95 
##  6 0.0063      1.7  
##  7 4.60     2547    
##  8 0.000300    0.023
##  9 0.655     521    
## 10 0.419     187    
## 11 0.0035      0.77
```
]
.rightcol[
## ...into _information_
<img src="slides-11-data-frames_files/figure-html/unnamed-chunk-76-1.png" width="468" />
]

---
# A note about HW9

- You have what you need to start now.
- It will be _much_ easier if you use the **dplyr** functions (i.e. read ahead).