STA6349: Applied Bayesian Analysis
Imagine you find yourself standing at the Museum of Modern Art (MoMA) in New York City, captivated by the artwork in front of you.
While understanding that “modern” art doesn’t necessarily mean “new” art, a question still bubbles up: what are the chances that this modern artist is Gen X or even younger, i.e., born in 1965 or later?
Let \pi denote the proportion of artists represented in major U.S. modern art museums that are Gen X or younger.
The Beta(4,6) prior model for \pi reflects our vague prior assumption that major modern art museums disproportionately display artists born before 1965, i.e., that \pi most likely falls below 0.5.
After all, “modern art” dates back to the 1880s and it can take a while to attain such high recognition in the art world.
\begin{align*} Y|\pi &\sim \text{Bin}(100,\pi) \\ \pi &\sim \text{Beta}(4,6) \\ \pi | (Y = 14) &\sim \text{Beta}(18,92) \end{align*}
We can now use this posterior to perform a rigorous posterior analysis of \pi.
There are three common tasks in posterior analysis: estimation, hypothesis testing, and prediction. For example:
What’s our estimate of \pi?
Does our model support the claim that fewer than 20% of museum artists are Gen X or younger?
If we sample 20 more museum artists, how many do we predict will be Gen X or younger?
We can construct posterior credible intervals.
Let \theta be a continuous parameter with posterior pdf f(\theta|y).
A posterior credible interval (CI) provides a range of posterior plausible values of \theta, and thus a summary of both posterior central tendency and variability.
A middle 95% CI is constructed from the 2.5th and 97.5th posterior percentiles, \left( \theta_{0.025}, \theta_{0.975} \right), and there is a 95% posterior probability that \theta is in this range,
P\left[ \theta \in (\theta_{0.025}, \theta_{0.975})|Y=y \right] = \int_{\theta_{0.025}}^{\theta_{0.975}} f(\theta|y) d\theta = 0.95
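For our Beta(18, 92) posterior, a minimal sketch of this computation in R, using the qbeta() quantile function:

qbeta(c(0.025, 0.975), 18, 92)   # middle 95% CI, approximately (0.100, 0.239)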
There is no one right credible interval.
To evaluate the claim that fewer than 20% of museum artists are Gen X or younger, we need the posterior probability
P\left[ \pi < 0.2 | Y = 14 \right] = \int_0^{0.2} f(\pi|y = 14) d\pi
Evaluating this integral with the pbeta() function gives
P\left[ \pi < 0.2 | Y = 14 \right] = 0.849
For comparison, the corresponding prior probability is
P\left[ \pi < 0.2 \right] = \int_0^{0.2} f(\pi) d\pi
and the pbeta() function gives
P\left[ \pi < 0.2 \right] = 0.086
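A minimal sketch of both calculations in R, where pbeta() evaluates the Beta cdf:

pbeta(0.2, 18, 92)   # posterior probability, approximately 0.849
pbeta(0.2, 4, 6)     # prior probability, approximately 0.086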
Hypothesis | Prior probability | Posterior probability
H_0: \pi \ge 0.2 | P[H_0] = 0.914 | P[H_0 | Y = 14] = 0.151
H_1: \pi < 0.2 | P[H_1] = 0.086 | P[H_1 | Y = 14] = 0.849
\begin{align*} \text{posterior odds} &= \frac{P\left[ H_1 | Y = 14 \right]}{P\left[ H_0 | Y = 14 \right]} \\ &= \frac{0.849}{0.151} \\ &\approx 5.62 \end{align*}
\begin{align*} \text{prior odds} &= \frac{P\left[ H_1 \right]}{P\left[ H_0 \right]} \\ &= \frac{0.086}{0.914} \\ &\approx 0.093 \end{align*}
Bayes Factor
\text{Bayes Factor} = \frac{\text{posterior odds}}{\text{prior odds}} = \frac{P\left[H_1 | Y\right] / P\left[H_0 | Y\right]}{P\left[H_1\right] / P\left[H_0\right]}
We interpret the Bayes Factor by comparing it to 1:
BF = 1: The plausibility of H_1 did not change in light of the observed data.
BF > 1: The plausibility of H_1 increased in light of the observed data.
BF < 1: The plausibility of H_1 decreased in light of the observed data.
\begin{align*} \text{Bayes Factor} &= \frac{\text{posterior odds}}{\text{prior odds}} = \frac{P\left[H_1 | Y\right] / P\left[H_0 | Y\right]}{P\left[H_1\right] / P\left[H_0\right]} \\ &= \frac{5.62}{0.093} \\ &\approx 60.4 \end{align*}
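A minimal sketch of the full odds calculation in R:

post_prob  <- pbeta(0.2, 18, 92)              # P[H_1 | Y = 14], approximately 0.849
prior_prob <- pbeta(0.2, 4, 6)                # P[H_1], approximately 0.086
post_odds  <- post_prob / (1 - post_prob)     # posterior odds, approximately 5.62
prior_odds <- prior_prob / (1 - prior_prob)   # prior odds, approximately 0.093
post_odds / prior_odds                        # Bayes Factor, approximately 60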
It’s time to draw a conclusion!
With a posterior probability of 0.849 and a Bayes Factor of roughly 60, we have fairly convincing evidence in favor of the claim that fewer than 20% of artists at major modern art museums are Gen X or younger.
Compared to a simple accept/reject decision, this gives us more information: a holistic measure of the level of uncertainty about the claim.
What about a two-sided test? Consider the hypotheses
\begin{align*} H_0&: \ \pi = 0.3 \\ H_1&: \ \pi \ne 0.3 \end{align*}
Since \pi is continuous, the posterior probability of the point null is zero,
P\left[ \pi =0.3 | Y = 14 \right] = \int_{0.3}^{0.3} f(\pi|y = 14) d\pi = 0
and so the posterior odds are undefined:
\text{posterior odds} = \frac{P\left[ H_1 | Y = 14 \right]}{P\left[ H_0 | Y = 14 \right]} = \frac{1}{0}
Welp.
Let’s think about the 95% posterior credible interval for \pi: (0.10, 0.24).
Let’s reframe our hypotheses:
\begin{align*} H_0&: \ \pi \in (0.25, 0.35) \\ H_1&: \ \pi \not\in (0.25, 0.35) \end{align*}
Since the entire 95% credible interval lies outside (0.25, 0.35), we can now more rigorously claim belief in H_1.
This also allows us a way to construct our hypothesis test with posterior probability and Bayes Factor.
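For instance, a sketch of computing the posterior probability of the reframed H_0 in R:

# posterior probability that pi lies in (0.25, 0.35)
pbeta(0.35, 18, 92) - pbeta(0.25, 18, 92)   # small, consistent with favoring H_1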
In addition to estimating the posterior and using the posterior distribution for hypothesis testing, we may be interested in predicting the outcome in a new dataset.
Example: Suppose we get our hands on data for 20 more artworks displayed at the museum. Based on the posterior understanding of \pi that we’ve developed throughout this chapter, what number would you predict are done by artists that are Gen X or younger?
Logical response:
our posterior estimate of \pi was roughly 0.16
n = 20
n \hat{\pi} = 20(0.16) = 3.2
… can we have 3.2 artworks?
Two sources of variability in prediction:
Sampling variability: even for a fixed \pi, the number of Gen X artists in a random sample of n artists varies from sample to sample; in particular, n \hat{\pi} = 3.2 is not even a possible count.
Posterior variability in \pi: we know that 0.16 is not the only plausible posterior value of \pi.
95% credible interval for \pi: (0.10, 0.24)
What do we expect to see under each possible \pi?
Let’s look at combining the sampling variability in our new Y and posterior variability in \pi.
Let Y' = y' be the (unknown) number of the 20 new artworks that are by Gen X or younger artists.
Conditioned on \pi, the sampling variability in Y' can be modeled as
Y'|\pi \sim \text{Bin}(20, \pi)
f(y'|\pi) = P[Y' = y'|\pi] = {20\choose{y'}} \pi^{y'} (1-\pi)^{20-y'}
We can weight f(y'|\pi) by the posterior pdf, f(\pi|y=14):
f(y'|\pi) f(\pi|y=14)
This product captures the chance of observing Y' = y' Gen Xers for a given \pi while accounting for the posterior plausibility of that \pi.
Averaging this product over all possible \pi yields the posterior predictive pmf of Y':
f(y'|y) = \int f(y'|\pi) f(\pi|y) \ d \pi
The overall chance of observing Y' = y' weights the chance of observing this outcome under any possible \pi by the posterior plausibility of \pi.
Chance of observing this outcome under any \pi: f(y'|\pi).
Posterior plausibility of \pi: f(\pi|y).
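Assuming the Beta(18, 92) posterior from earlier, a minimal sketch of this posterior predictive calculation in R, using numerical integration rather than the closed-form Beta-Binomial result:

# posterior predictive pmf of Y': average the Bin(20, pi) pmf over the posterior
post_pred <- sapply(0:20, function(y_new) {
  integrate(function(p) dbinom(y_new, 20, p) * dbeta(p, 18, 92),
            lower = 0, upper = 1)$value
})
round(post_pred, 3)

We can also approximate the posterior itself via MCMC. The simulation call below (STEP 2) assumes a model string art_model was defined in a prior step; a sketch of that definition, mirroring our Beta-Binomial model:

# STEP 1: DEFINE the model
library(rstan)
art_model <- "
  data {
    int<lower = 0, upper = 100> Y;
  }
  parameters {
    real<lower = 0, upper = 1> pi;
  }
  model {
    Y ~ binomial(100, pi);
    pi ~ beta(4, 6);
  }
"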
# STEP 2: SIMULATE the posterior
art_sim <- stan(model_code = art_model, data = list(Y = 14), chains = 4, iter = 5000*2, seed = 84735)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 3e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.011 seconds (Warm-up)
Chain 1: 0.011 seconds (Sampling)
Chain 1: 0.022 seconds (Total)
Chain 1:
[Sampling output for Chains 2 through 4 is analogous and omitted.]
As we saw previously, the posterior was Beta(18, 92).
We will use the tidy() function from the broom.mixed package.
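A sketch of the call, assuming the fitted object art_sim from the simulation above:

library(broom.mixed)
tidy(art_sim, conf.int = TRUE, conf.level = 0.95)   # posterior median and middle 95% CI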
The approximate middle 95% CI for \pi is (0.100, 0.239).
Our approximation of the actual posterior median is 0.162.
We can use the mcmc_areas() function from the bayesplot package to get a corresponding graph of the posterior.
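A sketch of the plotting call, again assuming art_sim:

# approximate posterior density for pi with shaded middle 95% credible region
mcmc_areas(art_sim, pars = "pi", prob = 0.95)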
Unfortunately, tidy() does not give everything we may be interested in.
# extract the Markov chain values into a data frame (assuming art_sim from above)
art_chains_df <- as.data.frame(art_sim, pars = "lp__", include = FALSE)

art_chains_df %>%
  summarize(post_mean = mean(pi),
            post_median = median(pi),
            post_mode = sample_mode(pi),   # sample_mode() is from the bayesrules package
            lower_95 = quantile(pi, 0.025),
            upper_95 = quantile(pi, 0.975))
This gives us everything tidy() gives us (and then some!). Now that we have saved the Markov chain values, we can use them to answer questions about the data.
Recall, we were interested in testing the claim that fewer than 20% of major museum artists are Gen X or younger.
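A sketch of approximating this posterior probability directly from the chain values:

# proportion of chain values below 0.20 approximates P[pi < 0.2 | Y = 14]
art_chains_df %>%
  summarize(prob = mean(pi < 0.20))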
From this, we can see that MCMC gave us an accurate approximation of the exact posterior probability, 0.849.
This builds confidence that MCMC approximations are reliable in settings where the posterior is not available in closed form, i.e., for non-conjugate families.
Exercises: 8.4, 8.8, 8.9, 8.10, 8.14, 8.15, 8.16