Posterior Analysis

Introduction

Now that we know how to find posterior distributions, we will discuss how to use them to make inference.
Here are the packages we need today:

library(bayesrules)
library(tidyverse)
library(rstan)
library(bayesplot)
library(broom.mixed)
library(janitor)
data("moma_sample")

Working Example

Imagine you find yourself standing at the Museum of Modern Art (MoMA) in New York City, captivated by the artwork in front of you.
- What are the chances that this modern artist is Gen X or even younger, i.e., born in 1965 or later?
Let \pi denote the proportion of artists represented in major U.S. modern art museums that are Gen X or younger.
- Let’s let the Beta(4,6) prior model for \pi reflect our prior assumption that major modern art museums disproportionately display artists born before 1965, i.e., \pi most likely falls below 0.5.
- Rationale: “modern art” dates back to the 1880s and it can take a while to gain recognition as an artist.

Working Example

The Beta(4,6) prior model for \pi:

Working Example

Let’s consider the following dataset:

head(moma_sample)

Working Example

Counting the number of Generation X and younger,

moma_sample %>% 
  group_by(genx) %>% 
  tally()

Working Example

Our modeling is as follows,

\begin{align*} Y|\pi &\sim \text{Bin}(100,\pi) \\ \pi &\sim \text{Beta}(4,6) \\ \pi | (Y = 14) &\sim \text{Beta(18,92)} \end{align*}

Working Example

Working Example

There are three common tasks in posterior analysis:
- estimation,
- hypothesis testing, and
- prediction.
For example,
- What’s our estimate of \pi?
- Does our model support the claim that fewer than 20% of museum artists are Gen X or younger?
- If we sample 20 more museum artists, how many do we predict will be Gen X or younger?

Posterior Estimation

We can construct posterior credible intervals.
- A posterior credible interval (CI) provides a range of posterior plausible values of \theta, and thus a summary of both posterior central tendency and variability.
- A middle 95% CI is constructed by the 2.5th and 97.5th posterior percentiles, \left( \theta_{0.025}, \theta_{0.975} \right), and there is a 95% posterior probability that \theta is in this range,

P\left[ \theta \in (\theta_{0.025}, \theta_{0.975})|Y=y \right] = \int_{\theta_{0.025}}^{\theta_{0.975}} f(\theta|y) d\theta = 0.95

Posterior Estimation

Recall the Beta(18, 92) posterior model for \pi, the proportion of modern art museum artists that are Gen X or younger.

qbeta(c(0.025, 0.975), 18, 92) # 95% CI

[1] 0.1009084 0.2379286

There is a 95% posterior probability that somewhere between 10% and 24% of museum artists are Gen X or younger.

Posterior Estimation

Recall the Beta(18, 92) posterior model for \pi, the proportion of modern art museum artists that are Gen X or younger.

qbeta(c(0.25, 0.75), 18, 92) # 50% CI

[1] 0.1388414 0.1862197

There is a 50% posterior probability that somewhere between 14% and 19% of museum artists are Gen X or younger.

Posterior Estimation

95% is a common choice, however, note that it is somewhat arbitrary and used because of decades of tradition.

There is no one right credible interval.
- It will just depend on the context of the analysis.

Posterior Hypothesis Testing

Suppose we read an article claiming that fewer than 20% of museum artists are Gen X or younger.

How plausible is it that \pi < 0.2?

P\left[ \pi < 0.2 | Y = 14 \right] = \int_0^{0.2} f(\pi|y = 14) d\pi

Posterior Hypothesis Testing

Posterior Probability

P\left[ \pi < 0.2 | Y = 14 \right] = \int_0^{0.2} f(\pi|y = 14) d\pi

We can find this probability by using the pbeta() function:

pbeta(0.20, 18, 92)

Posterior Hypothesis Testing

We can find this probability by using the pbeta() function:

pbeta(0.20, 18, 92)

[1] 0.8489856

Thus,

P\left[ \pi < 0.2 | Y = 14 \right] = 0.849

There is approximately an 84.9% posterior chance that Gen Xers account for fewer than 20% of modern art museum artists.

Posterior Hypothesis Testing

Prior Probability

P\left[ \pi < 0.2 \right] = \int_0^{0.2} f(\pi) d\pi

We can find this probability by using the pbeta() function:

pbeta(0.20, 4, 6)

Posterior Hypothesis Testing

We can find this probability by using the pbeta() function:

pbeta(0.20, 4, 6)

[1] 0.08564173

Thus,

P\left[ \pi < 0.2 \right] = 0.086

There is approximately an 8.6% prior chance that Gen Xers account for fewer than 20% of modern art museum artists.

Posterior Hypothesis Testing

We can create a table of information we will use to make inferences:

Hypotheses	Prior Probability	Posterior Probability
H_0: \pi \ge 0.2	P[H_0] = 0.914	P[H_0 \| Y= 14] = 0.151
H_1: \pi < 0.2	P[H_1] = 0.086	P[H_1 \| Y = 14] = 0.849

Posterior Hypothesis Testing

Let’s find the posterior odds in favor of H_1:

Hypotheses	Prior Probability	Posterior Probability
H_0: \pi \ge 0.2	P[H_0] = 0.914	P[H_0 \| Y= 14] = 0.151
H_1: \pi < 0.2	P[H_1] = 0.086	P[H_1 \| Y = 14] = 0.849

\begin{align*} \text{posterior odds} &= \frac{P\left[ H_1 | Y = 14 \right]}{P\left[ H_0 | Y = 14 \right]} \\ &= \frac{0.849}{0.151} \\ &\approx 5.62 \end{align*}

Our posterior assessment, after considering data collected, suggests that \pi is 5.62 times more likely to be below 0.2 rather than being above 0.2.

Posterior Hypothesis Testing

Let’s find the prior odds in favor of H_1:

Hypotheses	Prior Probability	Posterior Probability
H_0: \pi \ge 0.2	P[H_0] = 0.914	P[H_0 \| Y= 14] = 0.151
H_1: \pi < 0.2	P[H_1] = 0.086	P[H_1 \| Y = 14] = 0.849

\begin{align*} \text{prior odds} &= \frac{P\left[ H_1 \right]}{P\left[ H_0 \right]} \\ &= \frac{0.086}{0.914} \\ &\approx 0.093 \end{align*}

Our prior assessment, before considering the data collected, suggests that \pi is 0.093 times more likely to be below 0.2 rather than being above 0.2.

Posterior Hypothesis Testing

Bayes Factor
- When we are comparing two competing hypotheses, H_0 vs. H_1, the Bayes Factor is an odds ratio for H_1.
- i.e., the Bayes Factor as a measure of evidence in favor of H_1.

\text{Bayes Factor} = \frac{\text{posterior odds}}{\text{prior odds}} = \frac{P\left[H_1 | Y\right] / P\left[H_0 | Y\right]}{P\left[H_1\right] / P\left[H_0\right]}

This will be our alternative to the p-value.

Posterior Hypothesis Testing

Bayes Factor

\text{Bayes Factor} = \frac{\text{posterior odds}}{\text{prior odds}} = \frac{P\left[H_1 | Y\right] / P\left[H_0 | Y\right]}{P\left[H_1\right] / P\left[H_0\right]}

We will compare the BF to 1.
- BF = 1: The plausibility of H_1 did not change in light of the observed data.
- BF > 1: The plausibility of H_1 increased in light of the observed data.
  - The greater the Bayes Factor, the more convincing the evidence for H_1.
- BF < 1: The plausibilty of H_1 decreased in light of the observed data.

Posterior Hypothesis Testing

Let’s now find the Bayes Factor:

Hypotheses	Prior Probability	Posterior Probability
H_0: \pi \ge 0.2	P[H_0] = 0.914	P[H_0 \| Y= 14] = 0.151
H_1: \pi < 0.2	P[H_1] = 0.086	P[H_1 \| Y = 14] = 0.849

\begin{align*} \text{Bayes Factor} &= \frac{\text{posterior odds}}{\text{prior odds}} = \frac{P\left[H_1 | Y\right] / P\left[H_0 | Y\right]}{P\left[H_1\right] / P\left[H_0\right]} \\ &= \frac{5.62}{0.09} \\ &\approx 62.44 \end{align*}

Posterior Hypothesis Testing

Let’s now find the Bayes Factor:

Hypotheses	Prior Probability	Posterior Probability
H_0: \pi \ge 0.2	P[H_0] = 0.914	P[H_0 \| Y= 14] = 0.151
H_1: \pi < 0.2	P[H_1] = 0.086	P[H_1 \| Y = 14] = 0.849

prior_odds <- pbeta(0.20, 4, 6)/(1-pbeta(0.20, 4, 6))
post_odds <- pbeta(0.20, 18, 92)/(1-pbeta(0.20, 18, 92))
(BF <- post_odds/prior_odds)

[1] 60.02232

Posterior Hypothesis Testing

It’s time to draw a conclusion!
- Posterior probability: 0.85
- Bayes factor: 60
We have fairly convincing evidence in factor of the claim that fewer than 20% of artists at major modern art museums are Gen X or younger.
This gives us more information - we have a holistic measure of the level of uncertainty about the claim.
- This should help us inform our next steps.

Posterior Hypothesis Testing

We now want to test whether or not 30% of major museum artists are Gen X or younger.

\begin{align*} H_0&: \ \pi = 0.3 \\ H_1&: \ \pi \ne 0.3 \end{align*}

Why is this an issue?

Posterior Hypothesis Testing

We now want to test whether or not 30% of major museum artists are Gen X or younger.

\begin{align*} H_0&: \ \pi = 0.3 \\ H_1&: \ \pi \ne 0.3 \end{align*}

Why is this an issue? The posterior probability of a point hypothesis is always 0!

P\left[ \pi =0.3 | Y = 14 \right] = \int_{0.3}^{0.3} f(\pi|y = 14) d\pi = 0

Posterior Hypothesis Testing

The posterior probability of a point hypothesis is always 0!

P\left[ \pi =0.3 | Y = 14 \right] = \int_{0.3}^{0.3} f(\pi|y = 14) d\pi = 0

…. meaning:

\text{posterior odds} = \frac{P\left[ H_1 | Y = 14 \right]}{P\left[ H_0 | Y = 14 \right]} = \frac{1}{0}

Posterior Hypothesis Testing

Welp.
Let’s think about the 95% posterior credible interval for \pi: (0.10, 0.24).
- Do we think that 0.3 is a plausible value?

Posterior Hypothesis Testing

Let’s reframe our hypotheses:

\begin{align*} H_0&: \ \pi \in (0.25, 0.35) \\ H_1&: \ \pi \not\in (0.25, 0.35) \end{align*}

Now, we can more rigorously claim belief in H_1.
- The entire hypothesized range is above the 95% CI.
This also allows us a way to construct our hypothesis test with posterior probability and Bayes Factor.

Posterior Prediction

In addition to estimating the posterior and using the posterior distribution for hypothesis testing, we may be interested in predicting the outcome in a new dataset.
Suppose we get our hands on data for 20 more artworks displayed at the museum.
Based on the posterior understanding of \pi that we’ve developed throughout this chapter, what number would you predict are done by artists that are Gen X or younger?

Posterior Prediction

Suppose we get our hands on data for 20 more artworks displayed at the museum.
Based on the posterior understanding of \pi that we’ve developed throughout this chapter, what number would you predict are done by artists that are Gen X or younger?
- Logical response:
  - posterior guess was 16%
  - n = 20
  - n \hat{\pi} = 20(0.16) = 3.2

Posterior Prediction

Suppose we get our hands on data for 20 more artworks displayed at the museum. What number would you predict are done by artists that are Gen X or younger?
- Two sources of variability in prediction:
  - Sampling variability: we do not expect n \hat{\pi} to be an integer.
  - Posterior variability in \pi: we know that 0.16 is not the only plausible \pi.
    - 95% credible interval for \pi: (0.10, 0.24)
    - What do we expect to see under each possible \pi?

Posterior Prediction

Let’s look at combining the sampling variability in our new Y and posterior variability in \pi.
- Let Y' = y' be the (unknown) number of the 20 new artwork that are done by Gen X or younger artists.
  - y' \in \{0, 1, ..., 20\}.
- Conditioned on \pi, the sampling variability in Y' can be modeled

Y'|\pi \sim \text{Bin}(20, \pi)

f(y'|\pi) = P[Y' = y'|\pi] = {20\choose{y'}} \pi^{y'} (1-\pi)^{20-y'}

Posterior Prediction

We can weight f(y'|\pi) by the posterior pdf, f(\pi|y=14).
- This captures the chance of observing Y' = y' Gen Xers for a given \pi
- At the same time, this accounts for the posterior plausibility of \pi.

f(y'|\pi) f(\pi|y=14)

Posterior Prediction

This leads us to the posterior predictive model of Y'.

f(y'|y) = \int f(y'|\pi) f(\pi|y) \ d \pi

The overall chance of observing Y' = y' weights the chance of observing this outcome under any possible \pi by the posterior plausibility of \pi.
- Chance of observing this outcome under any \pi: f(y'|\pi).
- Posterior plausibility of \pi: f(\pi|y).

Posterior Prediction

# STEP 1: DEFINE the model
art_model <- "
  data {
    int<lower = 0, upper = 100> Y;
  }
  parameters {
    real<lower = 0, upper = 1> pi;
  }
  model {
    Y ~ binomial(100, pi);
    pi ~ beta(4, 6);
  }
"

Posterior Prediction

# STEP 2: SIMULATE the posterior
art_sim <- stan(model_code = art_model, 
                data = list(Y = 14),  
                chains = 4, 
                iter = 5000*2, 
                seed = 84735)


SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 3e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.013 seconds (Warm-up)
Chain 1:                0.013 seconds (Sampling)
Chain 1:                0.026 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 2: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 2: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 2: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 2: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 2: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 2: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 2: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 2: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 2: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 2: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 2: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.014 seconds (Warm-up)
Chain 2:                0.015 seconds (Sampling)
Chain 2:                0.029 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 0 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 3: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 3: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 3: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 3: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 3: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 3: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 3: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 3: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 3: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 3: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 3: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 0.013 seconds (Warm-up)
Chain 3:                0.014 seconds (Sampling)
Chain 3:                0.027 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 0 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 4: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 4: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 4: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 4: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 4: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 4: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 4: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 4: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 4: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 4: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 4: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 0.013 seconds (Warm-up)
Chain 4:                0.013 seconds (Sampling)
Chain 4:                0.026 seconds (Total)
Chain 4:

Posterior Prediction

# Parallel trace plot
mcmc_trace(art_sim, pars = "pi", size = 0.5) + xlab("iteration")

Posterior Prediction

# Density plot
mcmc_dens_overlay(art_sim, pars = "pi")

Posterior Prediction

# Autocorrelation plot
mcmc_acf(art_sim, pars = "pi")

Posterior Prediction

As we saw previously, the posterior was Beta(18, 92).
We will use the tidy() function from the broom.mixed package.

library(broom.mixed)
tidy(art_sim, conf.int = TRUE, conf.level = 0.95)

The approximate middle 95% CI for \pi is (0.100, 0.239).
Our approximation of the actual posterior median is 0.162.

Posterior Prediction

We can use the mcmc_areas() function from the bayesrules package to get a corresponding graph,

mcmc_areas(art_sim, pars = "pi", prob = 0.95)

Posterior Simulation

Unfortunately, tidy() does not give everything we may be interested in.
- We can save the Markov chain values to a dataset and analyze.

# Store the 4 chains in 1 data frame
art_chains_df <- as.data.frame(art_sim, pars = "lp__", include = FALSE)
dim(art_chains_df)

[1] 20000     1

head(art_chains_df, n=3)

Posterior Simulation

We can then summarize the Markov chain values,

art_chains_df %>% 
  summarize(post_mean = mean(pi), 
            post_median = median(pi),
            post_mode = sample_mode(pi),
            lower_95 = quantile(pi, 0.025),
            upper_95 = quantile(pi, 0.975))

We have reproduced/verified the results from tidy() (and then some!)

Posterior Simulation

Now that we have saved the Markov chain values, we can use them to answer questions about the data.
Recall, we were interested in testing the claim that fewer than 20% of major museum artists are Gen X.

art_chains_df %>% 
  mutate(exceeds = pi < 0.20) %>% 
  tabyl(exceeds)

By the posterior, there is an 84.6% chance that Gen X artist representation is under 20%.

Posterior Simulation

Let us compare the results between using conjugate family knowledge and MCMC.

From this, we can see that MCMC gave us an accurate approximation.
We should use this as “proof” that the approximations are “reliable” for non-conjugate families.
- Always look at diagnostics!

Homework / Practice

From the Bayes Rules! textbook:
- 8.4
- 8.8
- 8.9
- 8.10
- 8.14
- 8.15
- 8.16