class: middle, inverse .leftcol30[ <center> <img src="https://github.com/emse-p4a-gwu/emse-p4a-gwu.github.io/raw/master/images/p4a_hex_sticker.png" width=250> </center> ] .rightcol70[ # Week 13: .fancy[Monte Carlo Methods] ###
EMSE 4571: Intro to Programming for Analytics ###
John Paul Helveston ###
April 21, 2022 ] --- class: center # Monte Carlo, Monaco .leftcol[.circleborder[ <img src="images/monaco_big.png" width="100%" style="display: block; margin: auto;" /> ]] .rightcol[.blackborder[ <img src="images/monaco_zoom.png" width="80%" style="display: block; margin: auto;" /> ]] --- class: center # "Monte Carlo" is associated with 3 things -- .cols3[ ### .center[Gambling] <img src="images/monte_carlo_slots.gif" width="60%" style="display: block; margin: auto;" /> ] -- .cols3[ ### .center[Racing] <img src="images/monte_carlo_f1.gif" width="80%" style="display: block; margin: auto;" /> ] -- .cols3[ ### .center[Simulation] <img src="images/monte_carlo_pi.gif" width="80%" style="display: block; margin: auto;" /> ] --- class: inverse, middle # Week 13: .fancy[Monte Carlo Methods] ### 1. Monte Carlo Simulation ### BREAK ### 2. Monte Carlo Integration --- class: inverse, middle # Week 13: .fancy[Monte Carlo Methods] ### 1. .orange[Monte Carlo Simulation] ### BREAK ### 2. Monte Carlo Integration --- # Monte Carlo Simulation: _Computing Probability_ -- ### General process: - Run a series of trials. - In each trial, simulate an event (e.g. a coin toss, a dice roll, etc.). - Count the number of "successful" trials <br><br> -- ### `\(\frac{\text{# Successful Trials}}{\text{# Total Trials}}\)` = Observed Odds `\(\simeq\)` Expected Odds -- ### **Law of large numbers**:<br>As _N_ increases, Observed Odds
Expected Odds --- # How would you measure if a coin is "fair"? -- Run a series of trials and record outcome: "heads" or "tails" -- ```r coin <- c("heads", "tails") N <- 10000 *tosses <- sample(x = coin, size = N, replace = TRUE) head(tosses) # Preview first few tosses ``` ``` #> [1] "tails" "tails" "tails" "tails" "heads" "heads" ``` -- Probability of getting "heads": ```r sum(tosses == "heads") / N ``` ``` #> [1] 0.4973 ``` --- # Tossing an unfair coin -- Set the `prob` argument to a 40-60 coin -- ```r coin <- c("heads", "tails") N <- 10000 *tosses <- sample(x = coin, size = N, replace = TRUE, prob = c(0.4, 0.6)) head(tosses) # Preview first few tosses ``` ``` #> [1] "heads" "heads" "heads" "tails" "tails" "tails" ``` -- Probability of getting "heads": ```r sum(tosses == "heads") / N ``` ``` #> [1] 0.39 ``` --- # A more complex simulation: _dice rolling_ ### What is the probability of rolling a 6-sided dice 3 times<br>and getting the sequence 1, 3, 5? -- .leftcol65[.code80[ ```r library(tidyverse) dice <- c(1, 2, 3, 4, 5, 6) N <- 10000 rolls <- tibble( roll1 = sample(x = dice, size = N, replace = T), roll2 = sample(x = dice, size = N, replace = T), roll3 = sample(x = dice, size = N, replace = T) ) ``` ]] -- .rightcol35[ ```r head(rolls) ``` ``` #> # A tibble: 6 × 3 #> roll1 roll2 roll3 #> <dbl> <dbl> <dbl> #> 1 1 1 4 #> 2 3 5 1 #> 3 6 2 2 #> 4 4 1 3 #> 5 6 4 3 #> 6 1 1 2 ``` ] --- # A more complex simulation: _dice rolling_ Simulated probability of getting sequence 1, 3, 5: ```r successes <- rolls %>% filter(roll1 == 1 & roll2 == 3 & roll3 == 5) nrow(successes) / N ``` ``` #> [1] 0.0049 ``` -- _Actual_ probability of getting sequence 1, 3, 5: ```r (1/6)^3 ``` ``` #> [1] 0.00462963 ``` --- name: tps1 class: inverse
15
:
00
## Your Turn: Coins & Dice .leftcol70[ Using the `sample()` function, conduct a monte carlo simulation to estimate the answers to these questions: - If you flipped a coin 3 times in a row, what is the probability that you'll get three "tails" in a row? - If you rolled 2 dice, what is the probability that you'll get "snake-eyes" (two 1's)? - If you rolled 2 dice, what is the probability that you'll get an outcome that sums to 8? ] --- .font150[When `replace = FALSE`] Sometimes events cannot be independently simulated <br> -- What are the odds that 3 cards drawn from a 52-card deck will sum to 13? - Aces = 1 - Jack = 10 - Queen = 10 - King = 10 --- .font150[When `replace = FALSE`] Sometimes events cannot be independently simulated <br> ```r deck <- rep(c(seq(1, 10), 10, 10, 10), 4) # Rep because there are 4 suits length(deck) ``` ``` #> [1] 52 ``` -- Draw 3 cards from the deck _without replacement_: ```r *cards <- sample(x = deck, size = 3, replace = FALSE) cards ``` ``` #> [1] 2 10 2 ``` --- .font150[When `replace = FALSE`] <br> **Note**: You can't draw more than 52 cards _without replacement_: ```r cards <- sample(x = deck, size = 53, replace = FALSE) ``` ``` #> Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE' ``` --- .font150[When `replace = FALSE`] What are the odds that 3 cards drawn from a 52-card deck will sum to 13? <br> Repeat the 3-card draw _N_ times: ```r N <- 100000 count <- 0 for (i in 1:N) { * cards <- sample(x = deck, size = 3, replace = FALSE) if (sum(cards) == 13) { count <- count + 1 } } count / N # Compute the probability ``` ``` #> [1] 0.03743 ``` --- name: tps2 class: inverse
15
:
00
## Your Turn: Cards Use the `sample()` function and a monte carlo simulation to estimate the answers to these questions: .leftcol[ - What are the odds that four cards drawn from a 52-card deck will have the same suit? - What are the odds that five cards drawn from a 52-card deck will sum to a prime number? - Aces = 1 - Jack = 10 - Queen = 10 - King = 10 ] .rightcol[ **Hint**: use `isPrime()` to help: ```r isPrime <- function(n) { if (n == 2) { return(TRUE) } for (i in seq(2, n-1)) { if (n %% i == 0) { return(FALSE) } } return(TRUE) } ``` ] --- class: inverse, center # .fancy[Break]
05
:
00
--- class: inverse, middle # Week 13: .fancy[Monte Carlo Methods] ### 1. Monte Carlo Simulation ### BREAK ### 2. .orange[Monte Carlo Integration] --- .fira[.font170[Discrete vs. continuous random numbers]] -- .leftcol[ ### .center[Discrete] `sample()`<br>Takes random samples from vector `x` ```r sample_discrete <- sample( x = c("heads", "tails"), size = 5, replace = TRUE ) sample_discrete ``` ``` #> [1] "heads" "heads" "tails" "heads" "heads" ``` ] -- .rightcol[ ### .center[Continuous] `runif()`<br>Takes random samples between bounds ```r sample_continuous <- runif( n = 5, min = 0, max = 1 ) sample_continuous ``` ``` #> [1] 0.31565281 0.88368891 0.65443806 0.14829714 0.07009013 ``` ] --- class: center # Monte Carlo Integration Integration = compute the area "under the curve" -- .leftcol[.left[ Find the area of `\(y = x^2\)` between `\(4 < x < 8\)` <img src="figs/quadratic-area-1.png" width="432" style="display: block; margin: auto;" /> ]] -- .rightcol[ `\(\frac{\text{Area Under Curve}}{\text{Area of Rectangle}} = \frac{\text{# Points Under Curve}}{\text{# Total Points}}\)` <br> <img src="figs/unnamed-chunk-28-1.png" width="576" style="display: block; margin: auto;" /> ] --- class: center .fira[.font150[Monte Carlo Integration]] `\(\frac{\text{Area Under Curve}}{\text{Area of Rectangle}} = \frac{\text{# Points Under Curve}}{\text{# Total Points}}\)` <img src="figs/unnamed-chunk-29-1.png" width="576" style="display: block; margin: auto;" /> `\(\text{Area Under Curve} = \text{Area of Rectangle} \left( \frac{\text{# Points Under Curve}}{\text{# Total Points}} \right)\)` --- .center[.fira[.font150[Monte Carlo Integration]]] .leftcol60[ **Step 1**: Compute area of rectangle ```r area_rectangle <- (8 - 4) * (8^2 - 0) area_rectangle ``` ``` #> [1] 256 ``` ] .rightcol40[ <img src="figs/unnamed-chunk-31-1.png" width="576" style="display: block; margin: auto;" /> ] --- .center[.fira[.font150[Monte Carlo Integration]]] .leftcol60[ **Step 2**: Simulate points ```r N <- 100000 points <- tibble( * x = runif(N, min = 4, max = 8), * y = runif(N, min = 0, max = 8^2)) %>% mutate(belowCurve = y < x^2) head(points) ``` ``` #> # A tibble: 6 × 3 #> x y belowCurve #> <dbl> <dbl> <lgl> #> 1 4.98 9.65 TRUE #> 2 5.02 36.7 FALSE #> 3 4.32 6.86 TRUE #> 4 7.46 28.9 TRUE #> 5 5.88 55.7 FALSE #> 6 5.59 42.7 FALSE ``` ] .rightcol40[ <img src="figs/unnamed-chunk-33-1.png" width="576" style="display: block; margin: auto;" /> ] --- .center[.fira[.font150[Monte Carlo Integration]]] .leftcol70[ **Step 3**: Compute area under curve ```r N <- 100000 points <- tibble( x = runif(N, min = 4, max = 8), y = runif(N, min = 0, max = 8^2)) %>% mutate(belowCurve = y < x^2) *points_ratio <- sum(points$belowCurve) / N points_ratio ``` ``` #> [1] 0.58296 ``` ```r *area_under_curve <- area_rectangle * points_ratio area_under_curve ``` ``` #> [1] 149.2378 ``` ] .rightcol30[ <img src="figs/unnamed-chunk-35-1.png" width="576" style="display: block; margin: auto;" /> ] --- ### How did we do? Simulated area under curve: ```r area_under_curve ``` ``` #> [1] 149.2378 ``` -- Actual area under curve: `\(\int_{4}^{8} x^2 \mathrm{dx} = \left ( \frac{x^3}{3} \right ) \Big|_4^8 = \frac{8^3}{3} - \frac{4^3}{3} = 149.33\bar{3}\)` -- % Error: ```r true_area <- ((8^3 / 3) - (4^3 / 3)) 100*((area_under_curve - true_area) / true_area) ``` ``` #> [1] -0.064 ``` --- # Monte Carlo `\(\pi\)` -- .leftcol[ <img src="images/pi.png" width="90%" /> ] -- .rightcol45[ Area of a circle: `\(\quad\quad A_{circle} = \pi r^2\)` Area of square containing circle: `\(\quad\quad A_{square} = 4r^2\)` ] --- # Monte Carlo `\(\pi\)` .leftcol[ <img src="images/pi.png" width="90%" /> ] .rightcol45[ Area of a circle: `\(\quad\quad A_{circle} = \pi r^2\)` Area of square containing circle: `\(\quad\quad A_{square} = 4r^2\)` Ratio of areas = `\(\pi / 4\)`: `\(\quad\quad \frac{A_{circle}}{A_{square}} = \dfrac{\pi r^2}{4r^2} = \dfrac{\pi}{4}\)` ] --- # Monte Carlo `\(\pi\)` .leftcol[ <img src="images/pi.png" width="90%" /> ] .rightcol45[ Area of a circle: `\(\quad\quad A_{circle} = \pi r^2\)` Area of square containing circle: `\(\quad\quad A_{square} = 4r^2\)` Ratio of areas = `\(\pi / 4\)`: `\(\quad\quad \frac{A_{circle}}{A_{square}} = \dfrac{\pi r^2}{4r^2} = \dfrac{\pi}{4}\)` `\(\pi = 4 \left( \frac{A_{circle}}{A_{square}} \right)\)` ] --- name: tps3 class: inverse
15
:
00
## Your Turn: Estimate `\(\pi\)` .leftcol30[ <img src="figs/cirle-points-1.png" width="360" /> `\(\pi = 4 \left( \frac{A_{circle}}{A_{square}} \right)\)` ] .rightcol70[.font90[ 1. Create a tibble with variables `x` and `y` that each contain 10,000 random points between -1 and 1, representing the (x, y) coordinates to a random point inside a square of side length 2 centered at `(x, y) = (0, 0)`. **Hint**: use `runif()` 2. Create a new column, `radius`, that is equal to the distance to each `(x, y)` point from the center of the square. 3. Create a new column, `pointInCircle`, that is `TRUE` if the point lies _within_ the circle inscribed in the square, and `FALSE` otherwise. 4. Create the scatterplot on the left (don't worry about the precise colors, dimensions, etc.). 5. Estimate `\(\pi\)` by multiplying 4 times the ratio of points inside the circle to the total number of points ]] --- class: center, middle, inverse <img src="images/monte_hall.jpg" width="80%" style="display: block; margin: auto;" /> --- name: tps4 class: inverse
15
:
00
## Your Turn: Monte Hall Problem .leftcol40[ <img src="images/monte_hall.jpg" width="80%" /> 1. You choose door 1, 2, or 3 2. One door is removed 3. Should you swap doors? ] .rightcol60[.font90[ In this simulation, the prize is always behind door #1: - If you choose door #1, you must KEEP it to win. - If you choose door #2 or #3, you must SWAP to win. 1) Create the tibble, `choices`, with two variables: - `door` contains the first door chosen (`1`, `2`, or `3`) - `swap` contains a logical (`TRUE` or `FALSE`) for whether the contestant swaps doors. **Hint**: use `sample()` 2) Create a new tibble, `wins`, which contains only the rows from `choices` that resulted in a win. 3) Compute the percentage of times the contestant won after swapping doors. ]] --- # Reminders -- ### 1) Please fill the GW course feedback (see slack announcement) -- ### 2) I'll hold a final review next Thursday -- ### 3) Final is Thursday, May 5, 12:45pm-2:45pm