class: title-slide, middle, inverse

.leftcol30[
<center>
<img src="https://github.com/emse-p4a-gwu/emse-p4a-gwu.github.io/raw/master/images/p4a_hex_sticker.png" width=250>
</center>
]

.rightcol70[
# Week 13: .fancy[Data Visualization]
### EMSE 4574: Intro to Programming for Analytics
### John Paul Helveston
### November 24, 2020
]

---
class: inverse

# Quiz 6

Timer: 05:00

.leftcol[
- ### Go to `#classroom` channel in Slack for link
- ### Open up RStudio before you start - you'll probably want to use it.
]

.rightcol[
<center>
<img src="images/quiz_doge.png" width="400">
</center>
]

---

# Before we start

Make sure you have the "tidyverse" installed and loaded, and import these two data frames

```r
library(tidyverse)
library(here)

birds <- read_csv(here('data', 'wildlife_impacts.csv'))
bears <- read_csv(here('data', 'bear_killings.csv'))
```

(this is at the top of the notes.R file)

---

# The Challenger disaster

On January 28, 1986 the space shuttle Challenger exploded

.leftcol[
<img src="images/challenger_crew.jpg">
]

.rightcol[
<img src="images/explosion.jpg">
]

---

# The Challenger disaster

NASA Engineers had the data on temperature & o-ring failure

.leftcol60[
<img src="images/oring_data.png" width=600>
]

.rightcol40[
<img src="images/orings.png">
]

---
class: center

## What NASA was shown

.leftcol60[
<img src="images/rockets_chart.png" width=600>
]

.rightcol40[.left[
<br><br><br><br><br><br><br><br>
Tufte, Edward R. (1997) _Visual Explanations: Images and Quantities, Evidence and Narrative_, Graphics Press, Cheshire, Connecticut.]]

---
class: center

# What NASA _should_ have been shown

<img src="images/tufte_plot.png" width=1000>

.left[Tufte, Edward R. (1997) _Visual Explanations: Images and Quantities,<br> Evidence and Narrative_, Graphics Press, Cheshire, Connecticut.]

---
class: inverse, middle

# Week 13: .fancy[Data Visualization]

## 1. Plotting with Base R
## 2. Plotting with **ggplot2**
## 3. Tweaking your ggplot

---
class: inverse, middle

# Week 13: .fancy[Data Visualization]

## 1. .orange[Plotting with Base R]
## 2. Plotting with **ggplot2**
## 3. Tweaking your ggplot

---

# Today's data:

## Bear attacks in North America

Explore the `bears` data frame:

```r
glimpse(bears)
head(bears)
```

---
class: center

## Two basic plots in R

.leftcol[
### Scatterplots
<img src="slides-13-data-visualization_files/figure-html/base_scatter-1.png" width="504" />
]

.rightcol[
### Histograms
<img src="slides-13-data-visualization_files/figure-html/base_hist-1.png" width="504" />
]

---

# Scatterplots with `plot()`

### Plot relationship between two variables

--

.leftcol[
General syntax:

```r
plot(x = x_vector, y = y_vector)
```
]

---

# Scatterplots with `plot()`

### Plot relationship between two variables

.leftcol[
General syntax:

```r
plot(x = x_vector, y = y_vector)
```

Example:

```r
var1 <- seq(1, 5)
var2 <- 2*var1
plot(x = var1, y = var2)
```
]

.rightcol[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-12-1.png" width="432" />
]

---

# Scatterplots with `plot()`

### `x` and `y` must have the same length!

--

```r
var2 <- var2[-1]
```

--

```r
length(var1) == length(var2)
```

```
## [1] FALSE
```

--

```r
plot(x = var1, y = var2)
```

```
## Error in xy.coords(x, y, xlabel, ylabel, log): 'x' and 'y' lengths differ
```

---

# Scatterplots with `plot()`

### Plotting variables from a data frame:

--

.leftcol[
Plot `year` vs. `age`:

```r
plot(x = bears$year, y = bears$age)
```
]

--

.rightcol[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-16-1.png" width="504" />
]

---

# Making `plot()` pretty

.leftcol[.code80[
```r
plot(x = bears$year,
     y = bears$age,
     col = 'darkblue', # Point color
     pch = 19, # Point shape
     main = "Age of victims over time",
     xlab = "Year",
     ylab = "Age of victim")
```
]]

.rightcol[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-17-1.png" width="504" />
]

---
class: inverse

Timer: 10:00

## Think pair share: `plot()`

Does the annual number of bird impacts appear to be changing over time?

Make a plot using the `birds` data frame to justify your answer

Hint: You may need to create a summary data frame to answer this question!

**Bonus points**: Make your plot pretty

---

# Histograms with `hist()`

### Plot the _distribution_ of a single variable

.leftcol[
General syntax:

```r
hist(x = x_vector)
```
]

---

# Histograms with `hist()`

### Plot the _distribution_ of a single variable

.leftcol[
General syntax:

```r
hist(x = x_vector)
```

Example:

```r
hist(bears$month)
```
]

.rightcol[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-21-1.png" width="432" />
]

---

# Making `hist()` pretty

.leftcol[.code80[
```r
hist(x = bears$month,
     breaks = 12,
     col = 'darkred',
     main = "Bear killings by month",
     xlab = "Month",
     ylab = "Count")
```
]]

.rightcol[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-22-1.png" width="432" />
]

---
class: inverse

Timer: 10:00

## Think pair share: `hist()`

Make plots using the `birds` data frame to answer these questions

- Which months have the highest and lowest number of bird impacts in the dataset?
- Which aircraft experience more impacts: 2-engine, 3-engine, or 4-engine?
- At what height do most impacts occur?

**Bonus points**: Make your plots pretty

---
class: inverse, middle

# Week 13: .fancy[Data Visualization]

## 1. Plotting with Base R
## 2. .orange[Plotting with **ggplot2**]
## 3. Tweaking your ggplot

---
class: center

## Advanced figures with `ggplot2`

<center>
<img src="images/horst_monsters_ggplot2.png" width=600>
</center>

Art by [Allison Horst](https://www.allisonhorst.com/)

---

.leftcol[
<img src="images/making_a_ggplot.jpeg" width=600>
]

.rightcol[
# "Grammar of Graphics"

Concept developed by Leland Wilkinson (1999)

**ggplot2** package developed by Hadley Wickham (2005)
]

---

# Making plot layers with ggplot2

<br>

### 1. The data (we'll use `bears`)
### 2. The aesthetic mapping (what goes on the axes?)
### 3. The geometries (points? bars? etc.)

---

# Layer 1: The data

The `ggplot()` function initializes the plot with whatever data you're using

.leftcol[
```r
ggplot(data = bears)
```
]

.rightcol[.blackborder[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-24-1.png" width="504" />
]]

---

# Layer 2: The aesthetic mapping

The `aes()` function determines which variables will be _mapped_ to the geometries<br>(e.g. the axes)

.leftcol[
```r
ggplot(data = bears, aes(x = year, y = age))
```
]

.rightcol[.blackborder[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-25-1.png" width="504" />
]]

---

# Layer 3: The geometries

Use `+` to add geometries (e.g. points)

.leftcol[
```r
ggplot(data = bears, aes(x = year, y = age)) +
* geom_point()
```
]

.rightcol[.blackborder[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-26-1.png" width="504" />
]]

---

# Other common geometries

- `geom_point()`: scatter plots
- `geom_line()`: lines connecting data points
- `geom_col()`: bar charts
- `geom_boxplot()`: boxes for boxplots

---

# Scatterplots with `geom_point()`

Add points:

.leftcol[
```r
ggplot(data = bears, aes(x = year, y = age)) +
* geom_point()
```
]

.rightcol[.blackborder[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-27-1.png" width="504" />
]]

---

# Scatterplots with `geom_point()`

Change the color of all points:

.leftcol[
```r
ggplot(data = bears, aes(x = year, y = age)) +
* geom_point(color = 'blue')
```
]

.rightcol[.blackborder[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-28-1.png" width="504" />
]]

---

# Scatterplots with `geom_point()`

Map the point color to a **variable**:

.leftcol[
```r
ggplot(data = bears, aes(x = year, y = age)) +
* geom_point(aes(color = gender))
```

Note that `color = gender` is _inside_ `aes()`
]

.rightcol[.blackborder[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-29-1.png" width="504" />
]]

---

# Scatterplots with `geom_point()`

Adjust labels with `labs()` layer:

.leftcol[
```r
ggplot(data = bears, aes(x = year, y = age)) +
  geom_point(aes(color = gender)) +
* labs(x = "Year",
*      y = "Age",
*      title = "Bear victim age over time",
*      color = "Gender")
```
]

.rightcol[.blackborder[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-30-1.png" width="504" />
]]

---
class: inverse

Timer: 10:00

## Think pair share: `geom_point()`

Use the `birds` data frame to create the following plots

.leftcol[
<img src="slides-13-data-visualization_files/figure-html/ggpoint_p1-1.png" width="504" />
]

.rightcol[
<img src="slides-13-data-visualization_files/figure-html/ggpoint_p2-1.png" width="504" />
]

---
class: inverse, center

# .fancy[Break]

Timer: 05:00

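---

## Recap sketch: summarize, then layer

The layering steps covered so far (data, aesthetic mapping, geometry, labels) can be sketched on a small made-up data frame. The `sightings` data frame and its column names are hypothetical and exist only for illustration; they are not part of the `bears` or `birds` files.

```r
library(tidyverse)

# Hypothetical toy data (NOT the course's bears or birds data)
sightings <- tibble(
  year    = c(2001, 2001, 2002, 2002, 2002, 2003),
  species = c("hawk", "owl", "hawk", "hawk", "owl", "owl")
)

# Step 1: summarize to one row per year and species
sightings_summary <- sightings %>%
  count(year, species)

# Step 2: data + aesthetic mapping + geometry + labels
sightings_plot <- ggplot(data = sightings_summary, aes(x = year, y = n)) +
  geom_point(aes(color = species)) + # color mapped to a variable, so it goes inside aes()
  labs(x = "Year",
       y = "Number of sightings",
       title = "Toy example: sightings per year",
       color = "Species")

sightings_plot
```

The same pattern should carry over to the course data by swapping in the real data frame and column names.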
---

## Make bar charts with `geom_col()`

With bar charts, you'll often need to create summary variables to plot

--

.leftcol[
Step 1: Summarize the data

```r
bear_months <- bears %>%
* count(month)
```

Step 2: Make the plot

```r
ggplot(bear_months) +
* geom_col(aes(x = month, y = n))
```
]

.rightcol[
Example: count of attacks by month

<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-33-1.png" width="504" style="display: block; margin: auto;" />
]

---

## Make bar charts with `geom_col()`

Alternative approach: piping directly into ggplot

.leftcol[
```r
bears %>%
* count(month) %>% # Pipe into ggplot
  ggplot() +
  geom_col(aes(x = month, y = n))
```
]

.rightcol[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-34-1.png" width="504" style="display: block; margin: auto;" />
]

---

## Be careful with `geom_col()` vs. `geom_bar()`

.leftcol[
### .center[`geom_col()`]

Map both `x` and `y`

```r
bears %>%
* count(month) %>%
  ggplot() +
* geom_col(aes(x = month, y = n))
```

<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-35-1.png" width="288" />
]

.rightcol[
### .center[`geom_bar()`]

Only map `x` (`y` is computed)

```r
bears %>%
  ggplot() +
* geom_bar(aes(x = month))
```

<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-36-1.png" width="288" />
]

---

## Make bar charts with `geom_col()`

.leftcol[
Another example:<br>Mean age of victim in each year

```r
bears %>%
  filter(!is.na(age)) %>%
  group_by(year) %>%
* summarise(meanAge = mean(age)) %>%
  ggplot() +
* geom_col(aes(x = year, y = meanAge))
```
]

.rightcol[.blackborder[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-37-1.png" width="504" />
]]

---

### Change bar width: `width`
### Change bar color: `fill`
### Change bar outline: `color`

.leftcol[
```r
bears %>%
  count(month) %>%
  ggplot() +
  geom_col(aes(x = month, y = n),
*          width = 0.7,
*          fill = "blue",
*          color = "red")
```
]

.rightcol[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-38-1.png" width="504" style="display: block; margin: auto;" />
]

---

### Map the `fill` to `bearType`

.leftcol[.code70[
```r
bears %>%
* count(month, bearType) %>%
  ggplot() +
  geom_col(aes(x = month, y = n,
*              fill = bearType))
```
]

Note that I had to summarize the count by both `month` and `bearType`

.code70[
```r
bears %>%
  count(month, bearType)
```
]

.code60[
```
## # A tibble: 27 x 3
##    month bearType     n
##    <dbl> <chr>    <int>
##  1     1 Brown        1
##  2     1 Polar        2
##  3     2 Brown        1
##  4     3 Brown        1
##  5     4 Black        1
##  6     4 Brown        3
##  7     5 Black       15
##  8     5 Brown        2
##  9     5 Polar        1
## 10     6 Black       10
## # … with 17 more rows
```
]]

.rightcol[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-41-1.png" width="504" style="display: block; margin: auto;" />
]

---

# "Factors" = Categorical variables

By default, R makes numeric variables _continuous_

.leftcol[
```r
bears %>%
  count(month) %>%
  ggplot() +
  geom_col(aes(x = month, y = n))
```

**The variable `month` is a _number_**
]

.rightcol[.blackborder[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-43-1.png" width="504" style="display: block; margin: auto;" />
]]

---

# "Factors" = Categorical variables

You can make a continuous variable _categorical_ using `as.factor()`

.leftcol[.code80[
```r
bears %>%
  count(month) %>%
  ggplot() +
* geom_col(aes(x = as.factor(month), y = n))
```
]

**The variable `month` is a _factor_**
]

.rightcol[.blackborder[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-44-1.png" width="504" />
]]

---
class: inverse

Timer: 15:00

## Think pair share: `geom_col()`

Use the `bears` and `birds` data frames to create the following plots

.leftcol[
<img src="slides-13-data-visualization_files/figure-html/ggbar_p1-1.png" width="504" />
]

.rightcol[
<img src="slides-13-data-visualization_files/figure-html/ggbar_p2-1.png" width="504" />
]

---
class: inverse, middle

# Week 13: .fancy[Data Visualization]

## 1. Plotting with Base R
## 2. Plotting with **ggplot2**
## 3. .orange[Tweaking your ggplot]

---

# Working with themes

Themes change _global_ features of your plot, like the background color, grid lines, etc.

--

.leftcol[
```r
ggplot(data = bears, aes(x = year, y = age)) +
  geom_point()
```
]

.rightcol[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-46-1.png" width="504" />
]

---

# Working with themes

Themes change _global_ features of your plot, like the background color, grid lines, etc.

.leftcol[
```r
ggplot(data = bears, aes(x = year, y = age)) +
  geom_point() +
* theme_bw()
```
]

.rightcol[
<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-47-1.png" width="504" />
]

---

### Common themes

.leftcol[
`theme_bw()`

```r
ggplot(data = bears, aes(x = year, y = age)) +
  geom_point() +
* theme_bw()
```

<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-48-1.png" width="432" />
]

.rightcol[
`theme_minimal()`

```r
ggplot(data = bears, aes(x = year, y = age)) +
  geom_point() +
* theme_minimal()
```

<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-49-1.png" width="432" />
]

---

### Common themes

.leftcol[
`theme_classic()`

```r
ggplot(data = bears, aes(x = year, y = age)) +
  geom_point() +
* theme_classic()
```

<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-50-1.png" width="432" />
]

.rightcol[
`theme_void()`

```r
ggplot(data = bears, aes(x = year, y = age)) +
  geom_point() +
* theme_void()
```

<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-51-1.png" width="432" />
]

---

### Other themes: [**hrbrthemes**](https://github.com/hrbrmstr/hrbrthemes)

.leftcol[
```r
*library(hrbrthemes)

ggplot(data = bears, aes(x = year, y = age)) +
  geom_point() +
* theme_ipsum()
```

<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-52-1.png" width="432" />
]

.rightcol[
```r
*library(hrbrthemes)

ggplot(data = bears, aes(x = year, y = age)) +
  geom_point() +
* theme_ft_rc()
```

<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-53-1.png" width="432" />
]

---

### Other themes: **ggthemes**

.leftcol[
```r
*library(ggthemes)

ggplot(data = bears, aes(x = year, y = age)) +
  geom_point() +
* theme_economist()
```

<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-54-1.png" width="432" />
]

.rightcol[
```r
*library(ggthemes)

ggplot(data = bears, aes(x = year, y = age)) +
  geom_point() +
* theme_economist_white()
```

<img src="slides-13-data-visualization_files/figure-html/unnamed-chunk-55-1.png" width="432" />
]

---

# Save figures with `ggsave()`

--

First, assign the plot to an object name:

```r
scatterPlot <- ggplot(data = bears) +
  geom_point(aes(x = year, y = age))
```

--

Then use `ggsave()` to save the plot:

```r
ggsave(filename = here('plots', 'scatterPlot.png'),
       plot = scatterPlot,
       width = 6, # inches
       height = 4)
```

---
class: inverse

# Extra practice 1

Use the `mtcars` data frame to create the following plots

.cols3[
<img src="slides-13-data-visualization_files/figure-html/mtcars_1-1.png" width="324" />
]

.cols3[
<img src="slides-13-data-visualization_files/figure-html/mtcars_2-1.png" width="324" />
]

.cols3[
<img src="slides-13-data-visualization_files/figure-html/mtcars_3-1.png" width="324" />
]

---
class: inverse

# Extra practice 2

Use the `mpg` data frame to create the following plot

<img src="slides-13-data-visualization_files/figure-html/mtcars_4-1.png" width="576" />
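
---

## Sketch: theme plus `ggsave()`

A minimal end-to-end sketch of the theme and `ggsave()` steps from this section, assuming only that **ggplot2** is installed. The `toy` data frame is made up for illustration, and the plot is written to a temporary file via `tempfile()` so the sketch does not depend on a `plots` folder already existing in your project.

```r
library(ggplot2)

# Hypothetical toy data, only for illustration
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))

# Build the plot and assign it to an object name
toy_plot <- ggplot(data = toy, aes(x = x, y = y)) +
  geom_point(color = "darkblue") +
  theme_minimal() # Theme changes global features like the background

# Temporary output path (an assumption made for this sketch)
out_file <- tempfile(fileext = ".png")

ggsave(filename = out_file,
       plot = toy_plot,
       width = 6, # inches
       height = 4)

# Check that the file was written
file.exists(out_file)
```

In a real project you would likely replace `out_file` with a path built by `here()`, as in the `ggsave()` slide, after making sure the target folder exists.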