Week 10: Data Visualization

EMSE 4571 / 6571: Intro to Programming for Analytics

John Paul Helveston

April 03, 2025

1 / 49

Quiz 8

10:00

Write your name on the quiz!

Rules:

  • Work alone; no outside help of any kind is allowed.
  • No calculators, no notes, no books, no computers, no phones.
2 / 49

Week 10: Data Visualization

1. Plotting with Base R

2. Plotting with ggplot2: Part 1

BREAK

3. Plotting with ggplot2: Part 2

4. Tweaking your ggplot

3 / 49

Today's data:

Bear attacks in North America

Explore the bears data frame:

glimpse(bears)
head(bears)
5 / 49

Two basic plots in R

Scatterplots

Histograms

6 / 49

Scatterplots with plot()

Plot relationship between two variables

General syntax:

plot(x = x_vector, y = y_vector)

Example:

var1 <- bears$year
var2 <- bears$age
plot(x = var1, y = var2)

8 / 49

Scatterplots with plot()

x and y must have the same length!

var2 <- var2[-1]
length(var1) == length(var2)
#> [1] FALSE
plot(x = var1, y = var2)
#> Error in xy.coords(x, y, xlabel, ylabel, log): 'x' and 'y' lengths differ
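One way to avoid this error is to check the lengths before plotting, or to take both vectors from the same data frame, where columns always have matching lengths. The sketch below uses small made-up vectors (not the actual bears data) to illustrate the check:

```r
# Hypothetical values standing in for bears$year and bears$age
year <- c(2001, 2005, 2010, 2014)
age  <- c(34, 52, 27, 45)

# Dropping one element makes the lengths differ, as on the slide
age_short <- age[-1]
lengths_match <- length(year) == length(age_short)

# Columns stored in the same data frame always have the same length
toy <- data.frame(year = year, age = age)
same_frame_ok <- length(toy$year) == length(toy$age)

# Only plot when the check passes
if (same_frame_ok) {
  plot(x = toy$year, y = toy$age)
}
```

Checking `length()` first gives a clearer signal than waiting for `plot()` to fail.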
9 / 49

Making plot() pretty

plot(
  x = bears$year,
  y = bears$age,
  col = 'darkblue', # Point color
  pch = 19, # Point shape
  main = "Age of victims over time",
  xlab = "Year",
  ylab = "Age of victim"
)

10 / 49
10:00

Your turn: plot()

Does the annual number of bird impacts appear to be changing over time?

Make a plot using the birds data frame to justify your answer.


Hint: You may need to create a summary data frame to answer this question!

Bonus: Make your plot pretty!

11 / 49

Histograms with hist()

Plot the distribution of a single variable

General syntax:

hist(x = x_vector)

Example:

hist(bears$month)

13 / 49

Making hist() pretty

hist(
  x = bears$month,
  breaks = 12,
  col = 'darkred',
  main = "Bear killings by month",
  xlab = "Month",
  ylab = "Count"
)
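
A note on `breaks`: when given a single number, `hist()` treats it only as a suggestion and picks "pretty" break points, so you may not get exactly 12 bars. For integer-valued data such as months, one option is to pass explicit break points instead. The sketch below uses made-up month values, not the real bears data:

```r
# Hypothetical month values (1-12) standing in for bears$month
months <- c(5, 6, 6, 7, 7, 7, 8, 8, 9, 10, 1, 12)

# Explicit break points centered on each integer give one bar per month
h <- hist(
  x = months,
  breaks = seq(0.5, 12.5, by = 1),
  col = 'darkred',
  main = "Counts by month (toy data)",
  xlab = "Month",
  ylab = "Count"
)

# hist() invisibly returns the bin counts, which you can inspect
h$counts
```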

14 / 49
10:00

Your turn: hist()

Make plots using the birds data frame to answer these questions:

  1. Which months have the highest and lowest number of bird impacts in the dataset?
  2. Which aircraft experience more impacts: 2-engine, 3-engine, or 4-engine?
  3. At what height do most impacts occur?

Bonus: Make your plots pretty!

15 / 49

Week 10: Data Visualization

1. Plotting with Base R

2. Plotting with ggplot2: Part 1

BREAK

3. Plotting with ggplot2: Part 2

4. Tweaking your ggplot

16 / 49

Better figures with ggplot2

Art by Allison Horst
17 / 49

"Grammar of Graphics"

Concept developed by Leland Wilkinson (1999)

ggplot2 package developed by Hadley Wickham (2005)

18 / 49

Making plot layers with ggplot2


1. The data (we'll use bears)

2. The aesthetic mapping (what goes on the axes?)

3. The geometries (points? bars? etc.)

19 / 49

Layer 1: The data

The ggplot() function initializes the plot with whatever data you're using

ggplot(data = bears)

20 / 49

Layer 2: The aesthetic mapping

The aes() function determines which variables will be mapped to the geometries (e.g. the axes)

ggplot(
  data = bears,
  mapping = aes(x = year, y = age))

21 / 49

Layer 3: The geometries

Use + to add geometries (e.g. points)

ggplot(
  data = bears,
  mapping = aes(x = year, y = age)) +
  geom_point()

22 / 49

Other common geometries

  • geom_point(): scatter plots
  • geom_line(): lines connecting data points
  • geom_col(): bar charts
  • geom_boxplot(): boxes for boxplots
23 / 49

Scatterplots with geom_point()

Add points:

ggplot(
  data = bears,
  mapping = aes(x = year, y = age)) +
  geom_point()

24 / 49

Scatterplots with geom_point()

Change the color of all points:

ggplot(
  data = bears,
  mapping = aes(x = year, y = age)) +
  geom_point(color = 'blue')

25 / 49

Scatterplots with geom_point()

Map the point color to a variable:

ggplot(
  data = bears,
  mapping = aes(x = year, y = age)) +
  geom_point(aes(color = gender))

Note that color = gender is inside aes()
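
To see the difference between setting a color and mapping a color, here is a small sketch using a made-up data frame (the `toy` data and its values are assumptions for illustration, not the bears data):

```r
library(ggplot2)

# Hypothetical data with the same column names used on the slides
toy <- data.frame(
  year   = c(2000, 2005, 2010, 2015),
  age    = c(30, 45, 22, 60),
  gender = c("female", "male", "male", "female")
)

# Outside aes(): one fixed color applied to every point
p_fixed <- ggplot(data = toy, mapping = aes(x = year, y = age)) +
  geom_point(color = 'blue')

# Inside aes(): color depends on the value of a variable,
# and ggplot2 adds a legend automatically
p_mapped <- ggplot(data = toy, mapping = aes(x = year, y = age)) +
  geom_point(aes(color = gender))
```

Print `p_fixed` and `p_mapped` to compare: only the second one gets a legend.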

26 / 49

Scatterplots with geom_point()

Adjust labels with labs() layer:

ggplot(
  data = bears,
  mapping = aes(x = year, y = age)) +
  geom_point(aes(color = gender)) +
  labs(
    x = "Year",
    y = "Age",
    title = "Bear victim age over time",
    color = "Gender"
  )

27 / 49
10:00

Your turn: geom_point()

Use the birds data frame to create the following plots

28 / 49

Break

05:00
29 / 49

Week 10: Data Visualization

1. Plotting with Base R

2. Plotting with ggplot2: Part 1

BREAK

3. Plotting with ggplot2: Part 2

4. Tweaking your ggplot

30 / 49

Make bar charts with geom_col()

With bar charts, you'll often need to create summary variables to plot

Step 1: Summarize the data

bear_months <- bears %>%
  count(month)

Step 2: Make the plot

ggplot(data = bear_months) +
  geom_col(aes(x = month, y = n))

Example: count of attacks by month

31 / 49

Make bar charts with geom_col()

Alternative approach: piping directly into ggplot

bears %>%
  count(month) %>% # Pipe into ggplot
  ggplot() +
  geom_col(aes(x = month, y = n))

32 / 49

Be careful with geom_col() vs. geom_bar()

geom_col()

Map both x and y

bears %>%
  count(month) %>%
  ggplot() +
  geom_col(aes(x = month, y = n))

geom_bar()

Only map x (y is computed)

bears %>%
  ggplot() +
  geom_bar(aes(x = month))
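
The two approaches should produce the same bar heights. A minimal sketch with a made-up data frame (the `toy` values are assumptions, not the bears data) shows this side by side:

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: one row per event, with the month it occurred in
toy <- data.frame(month = c(5, 5, 6, 7, 7, 7))

# geom_col(): you count first, then map both x and y
p_col <- toy %>%
  count(month) %>%
  ggplot() +
  geom_col(aes(x = month, y = n))

# geom_bar(): you map only x, and ggplot2 counts the rows for you
p_bar <- toy %>%
  ggplot() +
  geom_bar(aes(x = month))
```

Use `geom_col()` when you already have a summary value to plot (a count, a mean, etc.), and `geom_bar()` when you just want row counts.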

33 / 49

Make bar charts with geom_col()

Another example:
Mean age of victim in each year

bears %>%
  filter(!is.na(age)) %>%
  group_by(year) %>%
  summarise(meanAge = mean(age)) %>%
  ggplot() +
  geom_col(aes(x = year, y = meanAge))
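
The `filter(!is.na(age))` step matters because `mean()` returns `NA` whenever any value in the group is missing. The sketch below illustrates this with a small made-up data frame (the `toy` values are assumptions for illustration):

```r
library(dplyr)

# Hypothetical data with one missing age in the year 2000
toy <- data.frame(
  year = c(2000, 2000, 2001, 2001),
  age  = c(30, NA, 40, 50)
)

# Without filtering: the mean for 2000 is NA
with_na <- toy %>%
  group_by(year) %>%
  summarise(meanAge = mean(age))

# With filtering: missing ages are dropped before computing the mean
no_na <- toy %>%
  filter(!is.na(age)) %>%
  group_by(year) %>%
  summarise(meanAge = mean(age))
```

An alternative with the same effect here is `mean(age, na.rm = TRUE)` inside `summarise()`.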

34 / 49

Change bar width: width

Change bar color: fill

Change bar outline: color

bears %>%
  count(month) %>%
  ggplot() +
  geom_col(
    mapping = aes(x = month, y = n),
    width = 0.7,
    fill = "blue",
    color = "red"
  )

35 / 49

Map the fill to bearType

bears %>%
  count(month, bearType) %>%
  ggplot() +
  geom_col(
    mapping = aes(
      x = month, y = n, fill = bearType)
  )

Note that I had to summarize the count by both month and bearType

bears %>%
  count(month, bearType)
#> # A tibble: 27 × 3
#>    month bearType     n
#>    <dbl> <chr>    <int>
#>  1     1 Brown        1
#>  2     1 Polar        2
#>  3     2 Brown        1
#>  4     3 Brown        1
#>  5     4 Black        1
#>  6     4 Brown        3
#>  7     5 Black       15
#>  8     5 Brown        2
#>  9     5 Polar        1
#> 10     6 Black       10
#> # ℹ 17 more rows

36 / 49

"Factors" = Categorical variables

By default, ggplot2 treats numeric variables as continuous

bears %>%
  count(month) %>%
  ggplot() +
  geom_col(aes(x = month, y = n))

The variable month is a number

37 / 49

"Factors" = Categorical variables

You can make a continuous variable categorical using as.factor()

bears %>%
  count(month) %>%
  ggplot() +
  geom_col(
    mapping = aes(
      x = as.factor(month),
      y = n)
  )

The variable month is a factor
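
To see what `as.factor()` does to a numeric vector, here is a short base R sketch with made-up month values (not taken from the bears data):

```r
# Hypothetical numeric month values
month <- c(5, 6, 6, 12)
class(month) # "numeric"

# as.factor() turns each distinct value into a category (a "level")
month_f <- as.factor(month)
class(month_f)  # "factor"
levels(month_f) # "5" "6" "12"

# factor() with explicit levels keeps all 12 months as categories,
# even those that never appear in the data
month_all <- factor(month, levels = 1:12)
nlevels(month_all) # 12
```

With a factor on the x axis, ggplot2 draws one evenly spaced position per level instead of a continuous numeric scale.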

38 / 49
15:00

Your turn: geom_col()

Use the bears and birds data frames to create the following plots

39 / 49

Week 10: Data Visualization

1. Plotting with Base R

2. Plotting with ggplot2: Part 1

BREAK

3. Plotting with ggplot2: Part 2

4. Tweaking your ggplot

40 / 49

Working with themes

Themes change global features of your plot, like the background color, grid lines, etc.

ggplot(
  data = bears,
  mapping = aes(x = year, y = age)) +
  geom_point()

41 / 49

Working with themes

Themes change global features of your plot, like the background color, grid lines, etc.

ggplot(
  data = bears,
  mapping = aes(x = year, y = age)) +
  geom_point() +
  theme_bw()

42 / 49

Common themes

theme_bw()

ggplot(
  data = bears,
  mapping = aes(x = year, y = age)) +
  geom_point() +
  theme_bw()

theme_minimal()

ggplot(
  data = bears,
  mapping = aes(x = year, y = age)) +
  geom_point() +
  theme_minimal()

43 / 49

Common themes

theme_classic()

ggplot(
  data = bears,
  mapping = aes(x = year, y = age)) +
  geom_point() +
  theme_classic()

theme_void()

ggplot(
  data = bears,
  mapping = aes(x = year, y = age)) +
  geom_point() +
  theme_void()

44 / 49

Other themes: hrbrthemes

library(hrbrthemes)
ggplot(
  data = bears,
  mapping = aes(x = year, y = age)) +
  geom_point() +
  theme_ipsum()

library(hrbrthemes)
ggplot(
  data = bears,
  mapping = aes(x = year, y = age)) +
  geom_point() +
  theme_ft_rc()

45 / 49

Other themes: ggthemes

library(ggthemes)
ggplot(
  data = bears,
  mapping = aes(x = year, y = age)) +
  geom_point() +
  theme_economist()

library(ggthemes)
ggplot(
  data = bears,
  mapping = aes(x = year, y = age)) +
  geom_point() +
  theme_economist_white()

46 / 49

Save figures with ggsave()

First, assign the plot to an object name:

scatterPlot <- ggplot(data = bears) +
  geom_point(aes(x = year, y = age))

Then use ggsave() to save the plot:

library(here) # Provides here() for building file paths

ggsave(
  filename = here('plots', 'scatterPlot.png'),
  plot = scatterPlot,
  width = 6, # inches
  height = 4
)
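
If you want to try `ggsave()` without a project folder set up, the sketch below saves a plot of made-up data to a temporary directory. The `toy` data and the `out_file` path are assumptions for illustration; in your own project you would typically keep the `here('plots', ...)` path shown above.

```r
library(ggplot2)

# Hypothetical data with the same column names used on the slides
toy <- data.frame(
  year = c(2000, 2005, 2010, 2015),
  age  = c(30, 45, 22, 60)
)

# Assign the plot to an object name
scatterPlot <- ggplot(data = toy) +
  geom_point(aes(x = year, y = age))

# Build a path inside R's temporary directory for this session
out_file <- file.path(tempdir(), 'scatterPlot.png')

# Save the plot; width and height are in inches by default
ggsave(
  filename = out_file,
  plot = scatterPlot,
  width = 6,
  height = 4
)

# Confirm the file was written
file.exists(out_file)
```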
47 / 49

Extra practice 1

Use the mtcars data frame to create the following plots

48 / 49

Extra practice 2

Use the mpg data frame to create the following plot

49 / 49
