class: title-slide, middle, inverse .leftcol30[ <center> <img src="https://github.com/emse-p4a-gwu/emse-p4a-gwu.github.io/raw/master/images/p4a_hex_sticker.png" width=250> </center> ] .rightcol70[ # Week 15: .fancy[Monte Carlo Methods] ### EMSE 4574: Intro to Programming for Analytics ### John Paul Helveston ### December 08, 2020 ] --- class: center # Monte Carlo, Monaco .leftcol[.circleborder[ <img src="images/monaco_big.png" width="100%" style="display: block; margin: auto;" /> ]] .rightcol[.blackborder[ <img src="images/monaco_zoom.png" width="80%" style="display: block; margin: auto;" /> ]] --- class: center # "Monte Carlo" is associated with 3 things -- .cols3[ ### Gambling <img src="images/monte_carlo_slots.gif" width="60%" style="display: block; margin: auto;" /> ] -- .cols3[ ### Racing <img src="images/monte_carlo_f1.gif" width="80%" style="display: block; margin: auto;" /> ] -- .cols3[ ### Simulation <img src="images/monte_carlo_pi.gif" width="80%" style="display: block; margin: auto;" /> ] --- class: inverse, middle # Week 15: .fancy[Monte Carlo Methods] ## 1. Monte Carlo Simulation ## 2. Monte Carlo Integration --- class: inverse, middle # Week 15: .fancy[Monte Carlo Methods] ## 1. .orange[Monte Carlo Simulation] ## 2. Monte Carlo Integration --- # Monte Carlo Simulation: _Computing Probability_ -- ### General process: - Run a series of trials. - In each trial, simulate an event (e.g. a coin toss, a dice roll, etc.). - Count the number of "successful" trials <br><br> -- ### `\(\frac{\text{# Successful Trials}}{\text{# Total Trials}}\)` = Observed Odds `\(\simeq\)` Expected Odds -- ### **Law of large numbers**:<br>As _N_ increases, Observed Odds <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 448 512"><path d="M224.3 273l-136 136c-9.4 9.4-24.6 9.4-33.9 0l-22.6-22.6c-9.4-9.4-9.4-24.6 0-33.9l96.4-96.4-96.4-96.4c-9.4-9.4-9.4-24.6 0-33.9L54.3 103c9.4-9.4 24.6-9.4 33.9 0l136 136c9.5 9.4 9.5 24.6.1 34zm192-34l-136-136c-9.4-9.4-24.6-9.4-33.9 0l-22.6 22.6c-9.4 9.4-9.4 24.6 0 33.9l96.4 96.4-96.4 96.4c-9.4 9.4-9.4 24.6 0 33.9l22.6 22.6c9.4 9.4 24.6 9.4 33.9 0l136-136c9.4-9.2 9.4-24.4 0-33.8z"/></svg> Expected Odds --- # How would you measure if a coin is "fair"? -- Run a series of trials and record outcome: "heads" or "tails" -- ```r coin <- c("heads", "tails") N <- 10000 *tosses <- sample(x = coin, size = N, replace = TRUE) head(tosses) # Preview first few tosses ``` ``` #> [1] "tails" "tails" "tails" "heads" "heads" "tails" ``` -- Probability of getting "heads": ```r sum(tosses == "heads") / N ``` ``` #> [1] 0.5053 ``` --- # Tossing an unfair coin -- Set the `prob` argument to a 40-60 coin -- ```r coin <- c("heads", "tails") N <- 10000 *tosses <- sample(x = coin, size = N, replace = TRUE, prob = c(0.4, 0.6)) head(tosses) # Preview first few tosses ``` ``` #> [1] "heads" "tails" "tails" "tails" "tails" "tails" ``` -- Probability of getting "heads": ```r sum(tosses == "heads") / N ``` ``` #> [1] 0.3964 ``` --- # A more complex simulation: _dice rolling_ ### What is the probability of rolling a 6-sided dice 3 times<br>and getting the sequence 1, 3, 5? -- .leftcol65[.code80[ ```r library(tidyverse) dice <- c(1, 2, 3, 4, 5, 6) N <- 10000 rolls <- tibble( roll1 = sample(x = dice, size = N, replace = T), roll2 = sample(x = dice, size = N, replace = T), roll3 = sample(x = dice, size = N, replace = T) ) ``` ]] -- .rightcol35[ ```r dim(rolls) ``` ``` #> [1] 10000 3 ``` ```r head(rolls) ``` ``` #> # A tibble: 6 x 3 #> roll1 roll2 roll3 #> <dbl> <dbl> <dbl> #> 1 2 4 1 #> 2 5 4 3 #> 3 4 6 5 #> 4 3 3 4 #> 5 5 5 4 #> 6 1 2 5 ``` ] --- # A more complex simulation: _dice rolling_ Simulated probability of getting sequence 1, 3, 5: ```r successes <- rolls %>% filter(roll1 == 1, roll2 == 3, roll3 == 5) nrow(successes) / N ``` ``` #> [1] 0.0037 ``` -- _Actual_ probability of getting sequence 1, 3, 5: ```r (1/6)^3 ``` ``` #> [1] 0.00462963 ``` --- name: tps1 class: inverse
15
:
00
## Think pair share: Coins & Dice .leftcol70[ Use the `sample()` function and a monte carlo simulation to estimate the answers to these questions: - If you flipped a coin 3 times in a row, what is the probability that you'll get three "tails" in a row? - If you rolled 2 dice, what is the probability that you'll get "snake-eyes" (two 1's)? - If you rolled 2 dice, what is the probability that you'll get an outcome that sums to 8? ] --- .font150[When `replace = FALSE`] Sometimes events cannot be independently simulated <br> -- What are the odds that 3 cards drawn from a 52-card deck will sum to 13?<br>(Aces = 1, Jack, Queen, King = 10) <br> -- ```r deck <- rep(c(seq(1, 10), 10, 10, 10), 4) # Rep because there are 4 suits length(deck) ``` ``` #> [1] 52 ``` -- Draw 3 cards from the deck _without replacement_: ```r *cards <- sample(x = deck, size = 3, replace = FALSE) cards ``` ``` #> [1] 6 2 4 ``` --- .font150[When `replace = FALSE`] <br> **Note**: You can't draw more than 52 cards _without replacement_: ```r cards <- sample(x = deck, size = 53, replace = FALSE) ``` ``` #> Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE' ``` --- .font150[When `replace = FALSE`] What are the odds that 3 cards drawn from a 52-card deck will sum to 13?<br>(Aces = 1, Jack, Queen, King = 10) <br> Repeat the 3-card draw _N_ times: ```r N <- 100000 count <- 0 for (i in 1:N) { * cards <- sample(x = deck, size = 3, replace = FALSE) if (sum(cards) == 13) { count <- count + 1 } } count / N # Compute the probability ``` ``` #> [1] 0.03666 ``` --- name: tps2 class: inverse
15
:
00
## Think pair share: Cards Use the `sample()` function and a monte carlo simulation to estimate the answers to these questions: .leftcol[ - What are the odds that four cards drawn from a 52-card deck will have the same suit? - What are the odds that five cards drawn from a 52-card deck will sum to a prime number?<br>(Aces = 1, Jack, Queen, King = 10)<br>**Hint**: use `isPrime()` to help. ] .rightcol[ ```r isPrime <- function(n) { if (n == 2) { return(TRUE) } for (i in seq(2, n-1)) { if (n %% i == 0) { return(FALSE) } } return(TRUE) } ``` ] --- class: inverse, center # .fancy[Break]
05
:
00
--- class: inverse, middle # Week 15: .fancy[Monte Carlo Methods] ## 1. Monte Carlo Simulation ## 2. .orange[Monte Carlo Integration] --- .fira[.font170[Discrete vs. continuous random numbers]] -- .leftcol[ ### Discrete `sample()`<br>Takes random samples from vector `x` ```r *sample_discrete <- sample( x = c("heads", "tails"), size = 5, replace = TRUE ) sample_discrete ``` ``` #> [1] "tails" "heads" "tails" "heads" "heads" ``` ] -- .rightcol[ ### Continuous `runif()`<br>Takes random samples between bounds ```r *sample_continuous <- runif( n = 5, min = 0, max = 1 ) sample_continuous ``` ``` #> [1] 0.03944369 0.37356860 0.78492300 0.70443366 0.56196587 ``` ] --- class: center # Monte Carlo Integration Integration = compute the area "under the curve" -- .leftcol[.left[ Find the area of `\(y = x^2\)` between `\(4 < x < 8\)` <img src="figs/quadratic-area-1.png" width="432" style="display: block; margin: auto;" /> ]] -- .rightcol[ `\(\frac{\text{Area Under Curve}}{\text{Area of Rectangle}} = \frac{\text{# Points Under Curve}}{\text{# Total Points}}\)` <br> <img src="figs/unnamed-chunk-29-1.png" width="576" style="display: block; margin: auto;" /> ] --- class: center .fira[.font150[Monte Carlo Integration]] `\(\frac{\text{Area Under Curve}}{\text{Area of Rectangle}} = \frac{\text{# Points Under Curve}}{\text{# Total Points}}\)` <img src="figs/unnamed-chunk-30-1.png" width="576" style="display: block; margin: auto;" /> `\(\text{Area Under Curve} = \text{Area of Rectangle} \left( \frac{\text{# Points Under Curve}}{\text{# Total Points}} \right)\)` --- .center[.fira[.font150[Monte Carlo Integration]]] .leftcol60[ **Step 1**: Compute area of rectangle ```r area_rectangle <- (8 - 4) * (8^2 - 0) area_rectangle ``` ``` #> [1] 256 ``` ] .rightcol40[ <img src="figs/unnamed-chunk-32-1.png" width="576" style="display: block; margin: auto;" /> ] --- .center[.fira[.font150[Monte Carlo Integration]]] .leftcol60[ **Step 2**: Simulate points ```r N <- 100000 points <- tibble( * x = runif(N, min = 4, max = 8), * y = runif(N, min = 0, max = 8^2)) %>% mutate(belowCurve = y < x^2) head(points) ``` ``` #> # A tibble: 6 x 3 #> x y belowCurve #> <dbl> <dbl> <lgl> #> 1 7.36 37.2 TRUE #> 2 4.63 53.4 FALSE #> 3 6.10 58.1 FALSE #> 4 7.28 20.9 TRUE #> 5 6.58 30.1 TRUE #> 6 7.03 9.88 TRUE ``` ] .rightcol40[ <img src="figs/unnamed-chunk-34-1.png" width="576" style="display: block; margin: auto;" /> ] --- .center[.fira[.font150[Monte Carlo Integration]]] .leftcol70[ **Step 3**: Compute area under curve ```r N <- 100000 points <- tibble( x = runif(N, min = 4, max = 8), y = runif(N, min = 0, max = 8^2)) %>% mutate(belowCurve = y < x^2) *points_ratio <- sum(points$belowCurve) / N points_ratio ``` ``` #> [1] 0.57984 ``` ```r *area_under_curve <- area_rectangle * points_ratio area_under_curve ``` ``` #> [1] 148.439 ``` ] .rightcol30[ <img src="figs/unnamed-chunk-36-1.png" width="576" style="display: block; margin: auto;" /> ] --- ### How did we do? Simulated area under curve: ```r area_under_curve ``` ``` #> [1] 148.439 ``` -- Actual area under curve: `\(\int_{4}^{8} x^2 \mathrm{dx} = \left ( \frac{x^3}{3} \right ) \Big|_4^8 = \frac{8^3}{3} - \frac{4^3}{3} = 149.33\bar{3}\)` -- % Error: ```r true_area <- ((8^3 / 3) - (4^3 / 3)) 100*((area_under_curve - true_area) / true_area) ``` ``` #> [1] -0.5988571 ``` --- # Monte Carlo `\(\pi\)` -- .leftcol[ <img src="images/pi.png" width="90%" /> ] -- .rightcol45[ Area of a circle: `\(\quad\quad A_{circle} = \pi r^2\)` Area of square containing circle: `\(\quad\quad A_{square} = 4r^2\)` ] --- # Monte Carlo `\(\pi\)` .leftcol[ <img src="images/pi.png" width="90%" /> ] .rightcol45[ Area of a circle: `\(\quad\quad A_{circle} = \pi r^2\)` Area of square containing circle: `\(\quad\quad A_{square} = 4r^2\)` Ratio of areas = `\(\pi / 4\)`: `\(\quad\quad \frac{A_{circle}}{A_{square}} = \dfrac{\pi r^2}{4r^2} = \dfrac{\pi}{4}\)` ] --- # Monte Carlo `\(\pi\)` .leftcol[ <img src="images/pi.png" width="90%" /> ] .rightcol45[ Area of a circle: `\(\quad\quad A_{circle} = \pi r^2\)` Area of square containing circle: `\(\quad\quad A_{square} = 4r^2\)` Ratio of areas = `\(\pi / 4\)`: `\(\quad\quad \frac{A_{circle}}{A_{square}} = \dfrac{\pi r^2}{4r^2} = \dfrac{\pi}{4}\)` `\(\pi = 4 \left( \frac{A_{circle}}{A_{square}} \right)\)` ] --- name: tps3 class: inverse
15
:
00
## Think pair share: Estimate `\(\pi\)` .leftcol30[ <img src="figs/cirle-points-1.png" width="360" /> `\(\pi = 4 \left( \frac{A_{circle}}{A_{square}} \right)\)` ] .rightcol70[.font90[ 1. Create a tibble with variables `x` and `y` that each contain 10,000 random points between -1 and 1, representing the (x, y) coordinates to a random point inside a square of side length 2 centered at `(x, y) = (0, 0)`. **Hint**: use `runif()` 2. Create a new column, `radius`, that is equal to the distance to each `(x, y)` point from the center of the square. 3. Create a new column, `pointInCircle`, that is `TRUE` if the point lies _within_ the circle inscribed in the square, and `FALSE` otherwise. 4. Create the scatterplot on the left (don't worry about the precise colors, dimensions, etc.). 5. Estimate `\(\pi\)` by multiplying 4 times the ratio of points inside the circle to the total number of points ]] --- class: center, middle, inverse <img src="images/monte_hall.jpg" width="80%" style="display: block; margin: auto;" /> --- name: tps4 class: inverse
15
:
00
## Think pair share: Monte Hall Problem .leftcol40[ <img src="images/monte_hall.jpg" width="80%" /> 1. You choose door 1, 2, or 3 2. One door is removed 3. Should you swap doors? ] .rightcol60[.font90[ In this simulation, the prize is always behind door #1: - If you choose door #1, you must KEEP it to win. - If you choose door #2 or #3, you must SWAP to win. 1) Create the tibble, `choices`, with two variables: - `door` contains the first door chosen (`1`, `2`, or `3`) - `swap` contains a logical (`TRUE` or `FALSE`) for whether the contestant swaps doors. **Hint**: use `sample()` 2) Create a new tibble, `wins`, which contains only the rows from `choices` that resulted in a win. 3) Compute the percentage of times the contestant won after swapping doors. ]] --- # Reminders -- ### 1) Please fill the GW course feedback (see slack announcement) -- ### 2) I'll hold a final review sometime Thursday or Friday (check slack) -- ### 3) Final is Tuesday, December 15, 2020 12:45pm-2:45pm