Monte Carlo Methods

.leftcol30[
<center>
<img src="https://github.com/emse-p4a-gwu/emse-p4a-gwu.github.io/raw/master/images/p4a_hex_sticker.png" width=250>
</center>
]

### <svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:white;overflow:visible;position:relative;"><path d="M496 128v16a8 8 0 0 1-8 8h-24v12c0 6.627-5.373 12-12 12H60c-6.627 0-12-5.373-12-12v-12H24a8 8 0 0 1-8-8v-16a8 8 0 0 1 4.941-7.392l232-88a7.996 7.996 0 0 1 6.118 0l232 88A8 8 0 0 1 496 128zm-24 304H40c-13.255 0-24 10.745-24 24v16a8 8 0 0 0 8 8h464a8 8 0 0 0 8-8v-16c0-13.255-10.745-24-24-24zM96 192v192H60c-6.627 0-12 5.373-12 12v20h416v-20c0-6.627-5.373-12-12-12h-36V192h-64v192h-64V192h-64v192h-64V192H96z"/></svg> EMSE 4571: Intro to Programming for Analytics
### <svg aria-hidden="true" role="img" viewBox="0 0 448 512" style="height:1em;width:0.88em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:white;overflow:visible;position:relative;"><path d="M224 256c70.7 0 128-57.3 128-128S294.7 0 224 0 96 57.3 96 128s57.3 128 128 128zm89.6 32h-16.7c-22.2 10.2-46.9 16-72.9 16s-50.6-5.8-72.9-16h-16.7C60.2 288 0 348.2 0 422.4V464c0 26.5 21.5 48 48 48h352c26.5 0 48-21.5 48-48v-41.6c0-74.2-60.2-134.4-134.4-134.4z"/></svg> John Paul Helveston
### <svg aria-hidden="true" role="img" viewBox="0 0 448 512" style="height:1em;width:0.88em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:white;overflow:visible;position:relative;"><path d="M0 464c0 26.5 21.5 48 48 48h352c26.5 0 48-21.5 48-48V192H0v272zm320-196c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12h-40c-6.6 0-12-5.4-12-12v-40zm0 128c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12h-40c-6.6 0-12-5.4-12-12v-40zM192 268c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12h-40c-6.6 0-12-5.4-12-12v-40zm0 128c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12h-40c-6.6 0-12-5.4-12-12v-40zM64 268c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12H76c-6.6 0-12-5.4-12-12v-40zm0 128c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12H76c-6.6 0-12-5.4-12-12v-40zM400 64h-48V16c0-8.8-7.2-16-16-16h-32c-8.8 0-16 7.2-16 16v48H160V16c0-8.8-7.2-16-16-16h-32c-8.8 0-16 7.2-16 16v48H48C21.5 64 0 85.5 0 112v48h448v-48c0-26.5-21.5-48-48-48z"/></svg> April 21, 2022
]

---

# Monte Carlo, Monaco

]]

]]

---

# "Monte Carlo" is associated with 3 things

### .center[Gambling]

]

### .center[Racing]

]

### .center[Simulation]

]

---

# Week 13: .fancy[Monte Carlo Methods]

### 1. Monte Carlo Simulation

### BREAK

### 2. Monte Carlo Integration

---

# Week 13: .fancy[Monte Carlo Methods]

### 1. .orange[Monte Carlo Simulation]

### BREAK

### 2. Monte Carlo Integration

---

# Monte Carlo Simulation: _Computing Probability_

### General process:

- Run a series of trials.
- In each trial, simulate an event (e.g. a coin toss, a dice roll, etc.).
- Count the number of "successful" trials
<br><br>

### `$\frac{\text{# Successful Trials}}{\text{# Total Trials}}$` = Observed Odds `$\simeq$` Expected Odds

### **Law of large numbers**:<br>As _N_ increases, Observed Odds <svg aria-hidden="true" role="img" viewBox="0 0 448 512" style="height:1em;width:0.88em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M224.3 273l-136 136c-9.4 9.4-24.6 9.4-33.9 0l-22.6-22.6c-9.4-9.4-9.4-24.6 0-33.9l96.4-96.4-96.4-96.4c-9.4-9.4-9.4-24.6 0-33.9L54.3 103c9.4-9.4 24.6-9.4 33.9 0l136 136c9.5 9.4 9.5 24.6.1 34zm192-34l-136-136c-9.4-9.4-24.6-9.4-33.9 0l-22.6 22.6c-9.4 9.4-9.4 24.6 0 33.9l96.4 96.4-96.4 96.4c-9.4 9.4-9.4 24.6 0 33.9l22.6 22.6c9.4 9.4 24.6 9.4 33.9 0l136-136c9.4-9.2 9.4-24.4 0-33.8z"/></svg> Expected Odds

---

# How would you measure if a coin is "fair"?

Run a series of trials and record outcome: "heads" or "tails"

```r
coin <- c("heads", "tails")
N <- 10000
*tosses <- sample(x = coin, size = N, replace = TRUE)
head(tosses) # Preview first few tosses
```

```
#> [1] "tails" "tails" "tails" "tails" "heads" "heads"
```

Probability of getting "heads":

```r
sum(tosses == "heads") / N
```

```
#> [1] 0.4973
```

---

# Tossing an unfair coin

Set the `prob` argument to a 40-60 coin

```r
coin <- c("heads", "tails")
N <- 10000
*tosses <- sample(x = coin, size = N, replace = TRUE, prob = c(0.4, 0.6))
head(tosses) # Preview first few tosses
```

```
#> [1] "heads" "heads" "heads" "tails" "tails" "tails"
```

Probability of getting "heads":

```r
sum(tosses == "heads") / N
```

```
#> [1] 0.39
```

---

# A more complex simulation: _dice rolling_

### What is the probability of rolling a 6-sided dice 3 times<br>and getting the sequence 1, 3, 5?

```r
library(tidyverse)
dice <- c(1, 2, 3, 4, 5, 6)
N <- 10000
rolls <- tibble(
  roll1 = sample(x = dice, size = N, replace = T),
  roll2 = sample(x = dice, size = N, replace = T),
  roll3 = sample(x = dice, size = N, replace = T)
)
```

]]

```r
head(rolls)
```

```
#> # A tibble: 6 × 3
#>   roll1 roll2 roll3
#>   <dbl> <dbl> <dbl>
#> 1     1     1     4
#> 2     3     5     1
#> 3     6     2     2
#> 4     4     1     3
#> 5     6     4     3
#> 6     1     1     2
```

]

---

# A more complex simulation: _dice rolling_

Simulated probability of getting sequence 1, 3, 5:

```r
successes <- rolls %>%
  filter(roll1 == 1 & roll2 == 3 & roll3 == 5)

nrow(successes) / N
```

```
#> [1] 0.0049
```

_Actual_ probability of getting sequence 1, 3, 5:

```r
(1/6)^3
```

```
#> [1] 0.00462963
```

---

## Your Turn: Coins & Dice

Using the `sample()` function, conduct a monte carlo simulation to estimate the answers to these questions:

- If you flipped a coin 3 times in a row, what is the probability that you'll get three "tails" in a row?
- If you rolled 2 dice, what is the probability that you'll get "snake-eyes" (two 1's)?
- If you rolled 2 dice, what is the probability that you'll get an outcome that sums to 8?

]

---

Sometimes events cannot be independently simulated

<br>

What are the odds that 3 cards drawn from a 52-card deck will sum to 13?

- Aces = 1
- Jack = 10
- Queen = 10
- King = 10

---

Sometimes events cannot be independently simulated

<br>

```r
deck <- rep(c(seq(1, 10), 10, 10, 10), 4) # Rep because there are 4 suits
length(deck)
```

```
#> [1] 52
```

Draw 3 cards from the deck _without replacement_:

```r
*cards <- sample(x = deck, size = 3, replace = FALSE)
cards
```

```
#> [1]  2 10  2
```

---

<br>

**Note**: You can't draw more than 52 cards _without replacement_:

```r
cards <- sample(x = deck, size = 53, replace = FALSE)
```

```
#> Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
```

---

What are the odds that 3 cards drawn from a 52-card deck will sum to 13?
<br>

Repeat the 3-card draw _N_ times:

```r
N <- 100000
count <- 0

for (i in 1:N) {
*  cards <- sample(x = deck, size = 3, replace = FALSE)
   if (sum(cards) == 13) {
      count <- count + 1
   }
}

count / N # Compute the probability
```

```
#> [1] 0.03743
```

---

## Your Turn: Cards

Use the `sample()` function and a monte carlo simulation to estimate the answers to these questions:

- What are the odds that four cards drawn from a 52-card deck will have the same suit?
- What are the odds that five cards drawn from a 52-card deck will sum to a prime number?

- Aces = 1
- Jack = 10
- Queen = 10
- King = 10

]

**Hint**: use `isPrime()` to help:

```r
isPrime <- function(n) {
    if (n == 2) { return(TRUE) }
    for (i in seq(2, n-1)) {
        if (n %% i == 0) {
            return(FALSE)
        }
    }
    return(TRUE)
}
```

]

---

# .fancy[Break]

---

# Week 13: .fancy[Monte Carlo Methods]

### 1. Monte Carlo Simulation

### BREAK

### 2. .orange[Monte Carlo Integration]

---

### .center[Discrete]

`sample()`<br>Takes random samples from vector `x`

```r
sample_discrete <- sample(
  x       = c("heads", "tails"),
  size    = 5,
  replace = TRUE
)

sample_discrete
```

```
#> [1] "heads" "heads" "tails" "heads" "heads"
```
]

### .center[Continuous]

`runif()`<br>Takes random samples between bounds

```r
sample_continuous <- runif(
  n   = 5,
  min = 0,
  max = 1
)

sample_continuous
```

```
#> [1] 0.31565281 0.88368891 0.65443806 0.14829714 0.07009013
```

]

---

# Monte Carlo Integration

Integration = compute the area "under the curve"

Find the area of `$y = x^2$` between `$4 < x < 8$`

]]

`$\frac{\text{Area Under Curve}}{\text{Area of Rectangle}} = \frac{\text{# Points Under Curve}}{\text{# Total Points}}$`
<br>

]

---

`$\frac{\text{Area Under Curve}}{\text{Area of Rectangle}} = \frac{\text{# Points Under Curve}}{\text{# Total Points}}$`

`$\text{Area Under Curve} = \text{Area of Rectangle} \left( \frac{\text{# Points Under Curve}}{\text{# Total Points}} \right)$`

---

**Step 1**: Compute area of rectangle

```r
area_rectangle <- (8 - 4) * (8^2 - 0)
area_rectangle
```

```
#> [1] 256
```
]

]

---

**Step 2**: Simulate points

```r
N <- 100000
points <- tibble(
*   x = runif(N, min = 4, max = 8),
*   y = runif(N, min = 0, max = 8^2)) %>%
    mutate(belowCurve = y < x^2)

head(points)
```

```
#> # A tibble: 6 × 3
#>       x     y belowCurve
#>   <dbl> <dbl> <lgl>     
#> 1  4.98  9.65 TRUE      
#> 2  5.02 36.7  FALSE     
#> 3  4.32  6.86 TRUE      
#> 4  7.46 28.9  TRUE      
#> 5  5.88 55.7  FALSE     
#> 6  5.59 42.7  FALSE
```

]

]

---

**Step 3**: Compute area under curve

```r
N <- 100000
points <- tibble(
    x = runif(N, min = 4, max = 8),
    y = runif(N, min = 0, max = 8^2)) %>%
    mutate(belowCurve = y < x^2)

*points_ratio <- sum(points$belowCurve) / N
points_ratio
```

```
#> [1] 0.58296
```

```r
*area_under_curve <- area_rectangle * points_ratio
area_under_curve
```

```
#> [1] 149.2378
```
]

]

---

### How did we do?

Simulated area under curve:

```r
area_under_curve
```

```
#> [1] 149.2378
```

Actual area under curve:

`$\int_{4}^{8} x^2 \mathrm{dx} = \left ( \frac{x^3}{3} \right ) \Big|_4^8 = \frac{8^3}{3} - \frac{4^3}{3} = 149.33\bar{3}$`

% Error:

```r
true_area <- ((8^3 / 3) - (4^3 / 3))
100*((area_under_curve - true_area) / true_area)
```

```
#> [1] -0.064
```

---

# Monte Carlo `$\pi$`

]

Area of a circle:

`$\quad\quad A_{circle} = \pi r^2$`

Area of square containing circle:

`$\quad\quad A_{square} = 4r^2$`

]

---

# Monte Carlo `$\pi$`

]

Area of a circle:

`$\quad\quad A_{circle} = \pi r^2$`

Area of square containing circle:

`$\quad\quad A_{square} = 4r^2$`

Ratio of areas = `$\pi / 4$`:

`$\quad\quad \frac{A_{circle}}{A_{square}} = \dfrac{\pi r^2}{4r^2} = \dfrac{\pi}{4}$`

]

---

# Monte Carlo `$\pi$`

]

Area of a circle:

`$\quad\quad A_{circle} = \pi r^2$`

Area of square containing circle:

`$\quad\quad A_{square} = 4r^2$`

Ratio of areas = `$\pi / 4$`:

`$\quad\quad \frac{A_{circle}}{A_{square}} = \dfrac{\pi r^2}{4r^2} = \dfrac{\pi}{4}$`

`$\pi = 4 \left( \frac{A_{circle}}{A_{square}} \right)$`

]

---

## Your Turn: Estimate `$\pi$`

`$\pi = 4 \left( \frac{A_{circle}}{A_{square}} \right)$`

]
.rightcol70[.font90[

1. Create a tibble with variables `x` and `y` that each contain 10,000 random points between -1 and 1, representing the (x, y) coordinates to a random point inside a square of side length 2 centered at `(x, y) = (0, 0)`. **Hint**: use `runif()`

2. Create a new column, `radius`, that is equal to the distance to each `(x, y)` point from the center of the square.

3. Create a new column, `pointInCircle`, that is `TRUE` if the point lies _within_ the circle inscribed in the square, and `FALSE` otherwise.

4. Create the scatterplot on the left (don't worry about the precise colors, dimensions, etc.).

5. Estimate `$\pi$` by multiplying 4 times the ratio of points inside the circle to the total number of points

]]

---

---

## Your Turn: Monte Hall Problem

1. You choose door 1, 2, or 3
2. One door is removed
3. Should you swap doors?

]

In this simulation, the prize is always behind door #1:

- If you choose door #1, you must KEEP it to win.
- If you choose door #2 or #3, you must SWAP to win.

1) Create the tibble, `choices`, with two variables:
- `door` contains the first door chosen (`1`, `2`, or `3`)
- `swap` contains a logical (`TRUE` or `FALSE`) for whether the contestant swaps doors. **Hint**: use `sample()`

2) Create a new tibble, `wins`, which contains only the rows from `choices` that resulted in a win.

3) Compute the percentage of times the contestant won after swapping doors.

]]

---

# Reminders

### 1) Please fill the GW course feedback (see slack announcement)

### 2) I'll hold a final review next Thursday

### 3) Final is Thursday, May 5, 12:45pm-2:45pm