STA6349: Applied Bayesian Analysis
Imagine you find yourself standing at the Museum of Modern Art (MoMA) in New York City, captivated by the artwork in front of you.
While understanding that “modern” art doesn’t necessarily mean “new” art, a question still bubbles up: what are the chances that this modern artist is Gen X or even younger, i.e., born in 1965 or later?
Let \pi denote the proportion of artists represented in major U.S. modern art museums that are Gen X or younger.
The Beta(4,6) prior model for \pi reflects our vague prior assumption that major modern art museums disproportionately display artists born before 1965, i.e., that \pi most likely falls below 0.5.
After all, “modern art” dates back to the 1880s and it can take a while to attain such high recognition in the art world.
\begin{align*} Y|\pi &\sim \text{Bin}(100,\pi) \\ \pi &\sim \text{Beta}(4,6) \\ \pi | (Y = 14) &\sim \text{Beta}(18,92) \end{align*}
We can now use this posterior to perform a rigorous posterior analysis of \pi.
There are three common tasks in posterior analysis: estimation, hypothesis testing, and prediction. For example:
What’s our estimate of \pi?
Does our model support the claim that fewer than 20% of museum artists are Gen X or younger?
If we sample 20 more museum artists, how many do we predict will be Gen X or younger?
We can construct posterior credible intervals.
Let \theta be a continuous parameter with posterior pdf f(\theta|y).
A posterior credible interval (CI) provides a range of posterior plausible values of \theta, and thus a summary of both posterior central tendency and variability.
A middle 95% CI is constructed from the 2.5th and 97.5th posterior percentiles, \left( \theta_{0.025}, \theta_{0.975} \right), and there is a 95% posterior probability that \theta is in this range,
P\left[ \theta \in (\theta_{0.025}, \theta_{0.975})|Y=y \right] = \int_{\theta_{0.025}}^{\theta_{0.975}} f(\theta|y) d\theta = 0.95
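For our Beta(18, 92) posterior, a minimal sketch of this computation in R, using the qbeta() quantile function:

qbeta(c(0.025, 0.975), 18, 92)   # middle 95% CI, approximately (0.100, 0.239)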
There is no one right credible interval.
To evaluate the claim that fewer than 20% of museum artists are Gen X or younger, we need the posterior probability
P\left[ \pi < 0.2 | Y = 14 \right] = \int_0^{0.2} f(\pi|y = 14) d\pi
Evaluating this integral with the pbeta() function gives
P\left[ \pi < 0.2 | Y = 14 \right] = 0.849
For comparison, the corresponding prior probability is
P\left[ \pi < 0.2 \right] = \int_0^{0.2} f(\pi) d\pi
and the pbeta() function gives
P\left[ \pi < 0.2 \right] = 0.086
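A minimal sketch of both calculations in R, where pbeta() evaluates the Beta cdf:

pbeta(0.2, 18, 92)   # posterior probability, approximately 0.849
pbeta(0.2, 4, 6)     # prior probability, approximately 0.086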
Hypothesis | Prior probability | Posterior probability
H_0: \pi \ge 0.2 | P[H_0] = 0.914 | P[H_0 | Y = 14] = 0.151
H_1: \pi < 0.2 | P[H_1] = 0.086 | P[H_1 | Y = 14] = 0.849
\begin{align*} \text{posterior odds} &= \frac{P\left[ H_1 | Y = 14 \right]}{P\left[ H_0 | Y = 14 \right]} \\ &= \frac{0.849}{0.151} \\ &\approx 5.62 \end{align*}
\begin{align*} \text{prior odds} &= \frac{P\left[ H_1 \right]}{P\left[ H_0 \right]} \\ &= \frac{0.086}{0.914} \\ &\approx 0.093 \end{align*}
Bayes Factor
\text{Bayes Factor} = \frac{\text{posterior odds}}{\text{prior odds}} = \frac{P\left[H_1 | Y\right] / P\left[H_0 | Y\right]}{P\left[H_1\right] / P\left[H_0\right]}
We interpret the Bayes Factor by comparing it to 1:
BF = 1: The plausibility of H_1 did not change in light of the observed data.
BF > 1: The plausibility of H_1 increased in light of the observed data.
BF < 1: The plausibility of H_1 decreased in light of the observed data.
\begin{align*} \text{Bayes Factor} &= \frac{\text{posterior odds}}{\text{prior odds}} = \frac{P\left[H_1 | Y\right] / P\left[H_0 | Y\right]}{P\left[H_1\right] / P\left[H_0\right]} \\ &= \frac{5.62}{0.093} \\ &\approx 60.4 \end{align*}
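A minimal sketch of the full odds calculation in R:

post_prob  <- pbeta(0.2, 18, 92)              # P[H_1 | Y = 14], approximately 0.849
prior_prob <- pbeta(0.2, 4, 6)                # P[H_1], approximately 0.086
post_odds  <- post_prob / (1 - post_prob)     # posterior odds, approximately 5.62
prior_odds <- prior_prob / (1 - prior_prob)   # prior odds, approximately 0.093
post_odds / prior_odds                        # Bayes Factor, approximately 60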
It’s time to draw a conclusion!
With a posterior probability of 0.849 and a Bayes Factor of roughly 60, we have fairly convincing evidence in favor of the claim that fewer than 20% of artists at major modern art museums are Gen X or younger.
Compared to a simple accept/reject decision, this gives us more information: a holistic measure of the level of uncertainty about the claim.
What about a two-sided test? Consider the hypotheses
\begin{align*} H_0&: \ \pi = 0.3 \\ H_1&: \ \pi \ne 0.3 \end{align*}
Since \pi is continuous, the posterior probability of the point null is zero,
P\left[ \pi =0.3 | Y = 14 \right] = \int_{0.3}^{0.3} f(\pi|y = 14) d\pi = 0
and so the posterior odds are undefined:
\text{posterior odds} = \frac{P\left[ H_1 | Y = 14 \right]}{P\left[ H_0 | Y = 14 \right]} = \frac{1}{0}
Welp.
Let’s think about the 95% posterior credible interval for \pi: (0.10, 0.24).
Let’s reframe our hypotheses:
\begin{align*} H_0&: \ \pi \in (0.25, 0.35) \\ H_1&: \ \pi \not\in (0.25, 0.35) \end{align*}
Since the entire 95% credible interval lies outside (0.25, 0.35), we can now more rigorously claim belief in H_1.
This also allows us a way to construct our hypothesis test with posterior probability and Bayes Factor.
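For instance, a sketch of computing the posterior probability of the reframed H_0 in R:

# posterior probability that pi lies in (0.25, 0.35)
pbeta(0.35, 18, 92) - pbeta(0.25, 18, 92)   # small, consistent with favoring H_1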
In addition to estimating the posterior and using the posterior distribution for hypothesis testing, we may be interested in predicting the outcome in a new dataset.
Example: Suppose we get our hands on data for 20 more artworks displayed at the museum. Based on the posterior understanding of \pi that we’ve developed throughout this chapter, what number would you predict are done by artists that are Gen X or younger?
Logical response:
our posterior estimate of \pi was roughly 0.16
n = 20
n \hat{\pi} = 20(0.16) = 3.2
… can we have 3.2 artworks?
Two sources of variability in prediction:
Sampling variability: even for a fixed \pi, the number of Gen X artists in a random sample of n artists varies from sample to sample; in particular, n \hat{\pi} = 3.2 is not even a possible count.
Posterior variability in \pi: we know that 0.16 is not the only plausible posterior value of \pi.
95% credible interval for \pi: (0.10, 0.24)
What do we expect to see under each possible \pi?
Let’s look at combining the sampling variability in our new Y and posterior variability in \pi.
Let Y' = y' be the (unknown) number of the 20 new artworks that are by Gen X or younger artists.
Conditioned on \pi, the sampling variability in Y' can be modeled as
Y'|\pi \sim \text{Bin}(20, \pi)
f(y'|\pi) = P[Y' = y'|\pi] = {20\choose{y'}} \pi^{y'} (1-\pi)^{20-y'}
We can weight f(y'|\pi) by the posterior pdf, f(\pi|y=14):
f(y'|\pi) f(\pi|y=14)
This product captures the chance of observing Y' = y' Gen Xers for a given \pi while accounting for the posterior plausibility of that \pi.
Averaging this product over all possible \pi yields the posterior predictive pmf of Y':
f(y'|y) = \int f(y'|\pi) f(\pi|y) \ d \pi
The overall chance of observing Y' = y' weights the chance of observing this outcome under any possible \pi by the posterior plausibility of \pi.
Chance of observing this outcome under any \pi: f(y'|\pi).
Posterior plausibility of \pi: f(\pi|y).
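Assuming the Beta(18, 92) posterior from earlier, a minimal sketch of this posterior predictive calculation in R, using numerical integration rather than the closed-form Beta-Binomial result:

# posterior predictive pmf of Y': average the Bin(20, pi) pmf over the posterior
post_pred <- sapply(0:20, function(y_new) {
  integrate(function(p) dbinom(y_new, 20, p) * dbeta(p, 18, 92),
            lower = 0, upper = 1)$value
})
round(post_pred, 3)

We can also approximate the posterior itself via MCMC. The simulation call below (STEP 2) assumes a model string art_model was defined in a prior step; a sketch of that definition, mirroring our Beta-Binomial model:

# STEP 1: DEFINE the model
library(rstan)
art_model <- "
  data {
    int<lower = 0, upper = 100> Y;
  }
  parameters {
    real<lower = 0, upper = 1> pi;
  }
  model {
    Y ~ binomial(100, pi);
    pi ~ beta(4, 6);
  }
"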
# STEP 2: SIMULATE the posterior
art_sim <- stan(model_code = art_model, data = list(Y = 14), chains = 4, iter = 5000*2, seed = 84735)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 3e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.011 seconds (Warm-up)
Chain 1: 0.011 seconds (Sampling)
Chain 1: 0.022 seconds (Total)
Chain 1:
[Sampling output for Chains 2 through 4 is analogous and omitted.]
As we saw previously, the posterior was Beta(18, 92).
We will use the tidy() function from the broom.mixed package.
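A sketch of the call, assuming the fitted object art_sim from the simulation above:

library(broom.mixed)
tidy(art_sim, conf.int = TRUE, conf.level = 0.95)   # posterior median and middle 95% CI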
The approximate middle 95% CI for \pi is (0.100, 0.239).
Our approximation of the actual posterior median is 0.162.
We can use the mcmc_areas() function from the bayesplot package to get a corresponding graph of the posterior.
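A sketch of the plotting call, again assuming art_sim:

# approximate posterior density for pi with shaded middle 95% credible region
mcmc_areas(art_sim, pars = "pi", prob = 0.95)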
Unfortunately, tidy() does not give everything we may be interested in.
# extract the Markov chain values into a data frame (assuming art_sim from above)
art_chains_df <- as.data.frame(art_sim, pars = "lp__", include = FALSE)

art_chains_df %>%
  summarize(post_mean = mean(pi),
            post_median = median(pi),
            post_mode = sample_mode(pi),   # sample_mode() is from the bayesrules package
            lower_95 = quantile(pi, 0.025),
            upper_95 = quantile(pi, 0.975))
This gives us everything tidy() gives us (and then some!). Now that we have saved the Markov chain values, we can use them to answer questions about the data.
Recall, we were interested in testing the claim that fewer than 20% of major museum artists are Gen X or younger.
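A sketch of approximating this posterior probability directly from the chain values:

# proportion of chain values below 0.20 approximates P[pi < 0.2 | Y = 14]
art_chains_df %>%
  summarize(prob = mean(pi < 0.20))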
From this, we can see that MCMC gave us an accurate approximation of the exact posterior probability, 0.849.
This builds confidence that MCMC approximations are reliable in settings where the posterior is not available in closed form, i.e., for non-conjugate families.
Exercises: 8.4, 8.8, 8.9, 8.10, 8.14, 8.15, 8.16