Posterior Analysis

Introduction

  • Now that we know how to find posterior distributions, we will discuss how to use them to make inference.

  • Here are the packages we need today:

library(bayesrules)
library(tidyverse)
library(rstan)
library(bayesplot)
library(broom.mixed)
library(janitor)
data("moma_sample")

Working Example

  • Imagine you find yourself standing at the Museum of Modern Art (MoMA) in New York City, captivated by the artwork in front of you.
    • What are the chances that this modern artist is Gen X or even younger, i.e., born in 1965 or later?
  • Let \pi denote the proportion of artists represented in major U.S. modern art museums that are Gen X or younger.
    • Let’s let the Beta(4,6) prior model for \pi reflect our prior assumption that major modern art museums disproportionately display artists born before 1965, i.e., \pi most likely falls below 0.5.
    • Rationale: “modern art” dates back to the 1880s and it can take a while to gain recognition as an artist.

Working Example

  • The Beta(4,6) prior model for \pi:

Working Example

  • Let’s consider the following dataset:
head(moma_sample)

Working Example

  • Counting the number of Generation X and younger,
moma_sample %>% 
  group_by(genx) %>% 
  tally()

Working Example

  • Our modeling is as follows,

\begin{align*} Y|\pi &\sim \text{Bin}(100,\pi) \\ \pi &\sim \text{Beta}(4,6) \\ \pi | (Y = 14) &\sim \text{Beta(18,92)} \end{align*}

Working Example

Working Example

  • There are three common tasks in posterior analysis:
    • estimation,
    • hypothesis testing, and
    • prediction.
  • For example,
    • What’s our estimate of \pi?
    • Does our model support the claim that fewer than 20% of museum artists are Gen X or younger?
    • If we sample 20 more museum artists, how many do we predict will be Gen X or younger?

Posterior Estimation

  • We can construct posterior credible intervals.
    • A posterior credible interval (CI) provides a range of posterior plausible values of \theta, and thus a summary of both posterior central tendency and variability.
    • A middle 95% CI is constructed by the 2.5th and 97.5th posterior percentiles, \left( \theta_{0.025}, \theta_{0.975} \right), and there is a 95% posterior probability that \theta is in this range,

P\left[ \theta \in (\theta_{0.025}, \theta_{0.975})|Y=y \right] = \int_{\theta_{0.025}}^{\theta_{0.975}} f(\theta|y) d\theta = 0.95

Posterior Estimation

  • Recall the Beta(18, 92) posterior model for \pi, the proportion of modern art museum artists that are Gen X or younger.
qbeta(c(0.025, 0.975), 18, 92) # 95% CI
[1] 0.1009084 0.2379286
  • There is a 95% posterior probability that somewhere between 10% and 24% of museum artists are Gen X or younger.

Posterior Estimation

  • Recall the Beta(18, 92) posterior model for \pi, the proportion of modern art museum artists that are Gen X or younger.
qbeta(c(0.25, 0.75), 18, 92) # 50% CI
[1] 0.1388414 0.1862197
  • There is a 50% posterior probability that somewhere between 14% and 19% of museum artists are Gen X or younger.

Posterior Estimation

  • 95% is a common choice, however, note that it is somewhat arbitrary and used because of decades of tradition.
  • There is no one right credible interval.
    • It will just depend on the context of the analysis.

Posterior Hypothesis Testing

  • Suppose we read an article claiming that fewer than 20% of museum artists are Gen X or younger.
  • How plausible is it that \pi < 0.2?

P\left[ \pi < 0.2 | Y = 14 \right] = \int_0^{0.2} f(\pi|y = 14) d\pi

Posterior Hypothesis Testing

  • Posterior Probability

P\left[ \pi < 0.2 | Y = 14 \right] = \int_0^{0.2} f(\pi|y = 14) d\pi

  • We can find this probability by using the pbeta() function:
pbeta(0.20, 18, 92)

Posterior Hypothesis Testing

  • We can find this probability by using the pbeta() function:
pbeta(0.20, 18, 92)
[1] 0.8489856
  • Thus,

P\left[ \pi < 0.2 | Y = 14 \right] = 0.849

  • There is approximately an 84.9% posterior chance that Gen Xers account for fewer than 20% of modern art museum artists.

Posterior Hypothesis Testing

  • Prior Probability

P\left[ \pi < 0.2 \right] = \int_0^{0.2} f(\pi) d\pi

  • We can find this probability by using the pbeta() function:
pbeta(0.20, 4, 6)

Posterior Hypothesis Testing

  • We can find this probability by using the pbeta() function:
pbeta(0.20, 4, 6)
[1] 0.08564173
  • Thus,

P\left[ \pi < 0.2 \right] = 0.086

  • There is approximately an 8.6% prior chance that Gen Xers account for fewer than 20% of modern art museum artists.

Posterior Hypothesis Testing

  • We can create a table of information we will use to make inferences:
Hypotheses
Prior Probability
Posterior Probability
H_0: \pi \ge 0.2 P[H_0] = 0.914 P[H_0 | Y= 14] = 0.151
H_1: \pi < 0.2 P[H_1] = 0.086 P[H_1 | Y = 14] = 0.849

Posterior Hypothesis Testing

  • Let’s find the posterior odds in favor of H_1:
Hypotheses
Prior Probability
Posterior Probability
H_0: \pi \ge 0.2 P[H_0] = 0.914 P[H_0 | Y= 14] = 0.151
H_1: \pi < 0.2 P[H_1] = 0.086 P[H_1 | Y = 14] = 0.849

\begin{align*} \text{posterior odds} &= \frac{P\left[ H_1 | Y = 14 \right]}{P\left[ H_0 | Y = 14 \right]} \\ &= \frac{0.849}{0.151} \\ &\approx 5.62 \end{align*}

  • Our posterior assessment, after considering data collected, suggests that \pi is 5.62 times more likely to be below 0.2 rather than being above 0.2.

Posterior Hypothesis Testing

  • Let’s find the prior odds in favor of H_1:
Hypotheses
Prior Probability
Posterior Probability
H_0: \pi \ge 0.2 P[H_0] = 0.914 P[H_0 | Y= 14] = 0.151
H_1: \pi < 0.2 P[H_1] = 0.086 P[H_1 | Y = 14] = 0.849

\begin{align*} \text{prior odds} &= \frac{P\left[ H_1 \right]}{P\left[ H_0 \right]} \\ &= \frac{0.086}{0.914} \\ &\approx 0.093 \end{align*}

  • Our prior assessment, before considering the data collected, suggests that \pi is 0.093 times more likely to be below 0.2 rather than being above 0.2.

Posterior Hypothesis Testing

  • Bayes Factor
    • When we are comparing two competing hypotheses, H_0 vs. H_1, the Bayes Factor is an odds ratio for H_1.
    • i.e., the Bayes Factor as a measure of evidence in favor of H_1.

\text{Bayes Factor} = \frac{\text{posterior odds}}{\text{prior odds}} = \frac{P\left[H_1 | Y\right] / P\left[H_0 | Y\right]}{P\left[H_1\right] / P\left[H_0\right]}

  • This will be our alternative to the p-value.

Posterior Hypothesis Testing

  • Bayes Factor

\text{Bayes Factor} = \frac{\text{posterior odds}}{\text{prior odds}} = \frac{P\left[H_1 | Y\right] / P\left[H_0 | Y\right]}{P\left[H_1\right] / P\left[H_0\right]}

  • We will compare the BF to 1.
    • BF = 1: The plausibility of H_1 did not change in light of the observed data.
    • BF > 1: The plausibility of H_1 increased in light of the observed data.
      • The greater the Bayes Factor, the more convincing the evidence for H_1.
    • BF < 1: The plausibilty of H_1 decreased in light of the observed data.

Posterior Hypothesis Testing

  • Let’s now find the Bayes Factor:
Hypotheses
Prior Probability
Posterior Probability
H_0: \pi \ge 0.2 P[H_0] = 0.914 P[H_0 | Y= 14] = 0.151
H_1: \pi < 0.2 P[H_1] = 0.086 P[H_1 | Y = 14] = 0.849

\begin{align*} \text{Bayes Factor} &= \frac{\text{posterior odds}}{\text{prior odds}} = \frac{P\left[H_1 | Y\right] / P\left[H_0 | Y\right]}{P\left[H_1\right] / P\left[H_0\right]} \\ &= \frac{5.62}{0.09} \\ &\approx 62.44 \end{align*}

Posterior Hypothesis Testing

  • Let’s now find the Bayes Factor:
Hypotheses
Prior Probability
Posterior Probability
H_0: \pi \ge 0.2 P[H_0] = 0.914 P[H_0 | Y= 14] = 0.151
H_1: \pi < 0.2 P[H_1] = 0.086 P[H_1 | Y = 14] = 0.849


prior_odds <- pbeta(0.20, 4, 6)/(1-pbeta(0.20, 4, 6))
post_odds <- pbeta(0.20, 18, 92)/(1-pbeta(0.20, 18, 92))
(BF <- post_odds/prior_odds)
[1] 60.02232

Posterior Hypothesis Testing

  • It’s time to draw a conclusion!

    • Posterior probability: 0.85
    • Bayes factor: 60
  • We have fairly convincing evidence in factor of the claim that fewer than 20% of artists at major modern art museums are Gen X or younger.

  • This gives us more information - we have a holistic measure of the level of uncertainty about the claim.

    • This should help us inform our next steps.

Posterior Hypothesis Testing

  • We now want to test whether or not 30% of major museum artists are Gen X or younger.

\begin{align*} H_0&: \ \pi = 0.3 \\ H_1&: \ \pi \ne 0.3 \end{align*}

  • Why is this an issue?

Posterior Hypothesis Testing

  • We now want to test whether or not 30% of major museum artists are Gen X or younger.

\begin{align*} H_0&: \ \pi = 0.3 \\ H_1&: \ \pi \ne 0.3 \end{align*}

  • Why is this an issue? The posterior probability of a point hypothesis is always 0!

P\left[ \pi =0.3 | Y = 14 \right] = \int_{0.3}^{0.3} f(\pi|y = 14) d\pi = 0

Posterior Hypothesis Testing

  • The posterior probability of a point hypothesis is always 0!

P\left[ \pi =0.3 | Y = 14 \right] = \int_{0.3}^{0.3} f(\pi|y = 14) d\pi = 0

  • …. meaning:

\text{posterior odds} = \frac{P\left[ H_1 | Y = 14 \right]}{P\left[ H_0 | Y = 14 \right]} = \frac{1}{0}

Posterior Hypothesis Testing

  • Welp.

  • Let’s think about the 95% posterior credible interval for \pi: (0.10, 0.24).

    • Do we think that 0.3 is a plausible value?

Posterior Hypothesis Testing

  • Let’s reframe our hypotheses:

\begin{align*} H_0&: \ \pi \in (0.25, 0.35) \\ H_1&: \ \pi \not\in (0.25, 0.35) \end{align*}

  • Now, we can more rigorously claim belief in H_1.
    • The entire hypothesized range is above the 95% CI.
  • This also allows us a way to construct our hypothesis test with posterior probability and Bayes Factor.

Posterior Prediction

  • In addition to estimating the posterior and using the posterior distribution for hypothesis testing, we may be interested in predicting the outcome in a new dataset.

  • Suppose we get our hands on data for 20 more artworks displayed at the museum.

  • Based on the posterior understanding of \pi that we’ve developed throughout this chapter, what number would you predict are done by artists that are Gen X or younger?

Posterior Prediction

  • Suppose we get our hands on data for 20 more artworks displayed at the museum.

  • Based on the posterior understanding of \pi that we’ve developed throughout this chapter, what number would you predict are done by artists that are Gen X or younger?

    • Logical response:
      • posterior guess was 16%
      • n = 20
      • n \hat{\pi} = 20(0.16) = 3.2

Posterior Prediction

  • Suppose we get our hands on data for 20 more artworks displayed at the museum. What number would you predict are done by artists that are Gen X or younger?
    • Two sources of variability in prediction:
      • Sampling variability: we do not expect n \hat{\pi} to be an integer.
      • Posterior variability in \pi: we know that 0.16 is not the only plausible \pi.
        • 95% credible interval for \pi: (0.10, 0.24)
        • What do we expect to see under each possible \pi?

Posterior Prediction

  • Let’s look at combining the sampling variability in our new Y and posterior variability in \pi.
    • Let Y' = y' be the (unknown) number of the 20 new artwork that are done by Gen X or younger artists.
      • y' \in \{0, 1, ..., 20\}.
    • Conditioned on \pi, the sampling variability in Y' can be modeled

Y'|\pi \sim \text{Bin}(20, \pi)

f(y'|\pi) = P[Y' = y'|\pi] = {20\choose{y'}} \pi^{y'} (1-\pi)^{20-y'}

Posterior Prediction

  • We can weight f(y'|\pi) by the posterior pdf, f(\pi|y=14).
    • This captures the chance of observing Y' = y' Gen Xers for a given \pi
    • At the same time, this accounts for the posterior plausibility of \pi.

f(y'|\pi) f(\pi|y=14)

Posterior Prediction

  • This leads us to the posterior predictive model of Y'.

f(y'|y) = \int f(y'|\pi) f(\pi|y) \ d \pi

  • The overall chance of observing Y' = y' weights the chance of observing this outcome under any possible \pi by the posterior plausibility of \pi.
    • Chance of observing this outcome under any \pi: f(y'|\pi).
    • Posterior plausibility of \pi: f(\pi|y).

Posterior Prediction

# STEP 1: DEFINE the model
art_model <- "
  data {
    int<lower = 0, upper = 100> Y;
  }
  parameters {
    real<lower = 0, upper = 1> pi;
  }
  model {
    Y ~ binomial(100, pi);
    pi ~ beta(4, 6);
  }
"

Posterior Prediction

# STEP 2: SIMULATE the posterior
art_sim <- stan(model_code = art_model, 
                data = list(Y = 14),  
                chains = 4, 
                iter = 5000*2, 
                seed = 84735)

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 3e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.013 seconds (Warm-up)
Chain 1:                0.013 seconds (Sampling)
Chain 1:                0.026 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 2: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 2: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 2: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 2: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 2: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 2: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 2: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 2: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 2: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 2: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 2: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.014 seconds (Warm-up)
Chain 2:                0.015 seconds (Sampling)
Chain 2:                0.029 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 0 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 3: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 3: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 3: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 3: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 3: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 3: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 3: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 3: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 3: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 3: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 3: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 0.013 seconds (Warm-up)
Chain 3:                0.014 seconds (Sampling)
Chain 3:                0.027 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 0 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 4: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 4: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 4: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 4: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 4: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 4: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 4: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 4: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 4: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 4: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 4: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 0.013 seconds (Warm-up)
Chain 4:                0.013 seconds (Sampling)
Chain 4:                0.026 seconds (Total)
Chain 4: 

Posterior Prediction

# Parallel trace plot
mcmc_trace(art_sim, pars = "pi", size = 0.5) + xlab("iteration")

Posterior Prediction

# Density plot
mcmc_dens_overlay(art_sim, pars = "pi")

Posterior Prediction

# Autocorrelation plot
mcmc_acf(art_sim, pars = "pi")

Posterior Prediction

  • As we saw previously, the posterior was Beta(18, 92).

  • We will use the tidy() function from the broom.mixed package.

library(broom.mixed)
tidy(art_sim, conf.int = TRUE, conf.level = 0.95)
  • The approximate middle 95% CI for \pi is (0.100, 0.239).

  • Our approximation of the actual posterior median is 0.162.

Posterior Prediction

  • We can use the mcmc_areas() function from the bayesrules package to get a corresponding graph,
mcmc_areas(art_sim, pars = "pi", prob = 0.95)

Posterior Simulation

  • Unfortunately, tidy() does not give everything we may be interested in.
    • We can save the Markov chain values to a dataset and analyze.
# Store the 4 chains in 1 data frame
art_chains_df <- as.data.frame(art_sim, pars = "lp__", include = FALSE)
dim(art_chains_df)
[1] 20000     1
head(art_chains_df, n=3)

Posterior Simulation

  • We can then summarize the Markov chain values,
art_chains_df %>% 
  summarize(post_mean = mean(pi), 
            post_median = median(pi),
            post_mode = sample_mode(pi),
            lower_95 = quantile(pi, 0.025),
            upper_95 = quantile(pi, 0.975))
  • We have reproduced/verified the results from tidy() (and then some!)

Posterior Simulation

  • Now that we have saved the Markov chain values, we can use them to answer questions about the data.

  • Recall, we were interested in testing the claim that fewer than 20% of major museum artists are Gen X.

art_chains_df %>% 
  mutate(exceeds = pi < 0.20) %>% 
  tabyl(exceeds)
  • By the posterior, there is an 84.6% chance that Gen X artist representation is under 20%.

Posterior Simulation

  • Let us compare the results between using conjugate family knowledge and MCMC.
  • From this, we can see that MCMC gave us an accurate approximation.

  • We should use this as “proof” that the approximations are “reliable” for non-conjugate families.

    • Always look at diagnostics!

Homework / Practice

  • From the Bayes Rules! textbook:

    • 8.4
    • 8.8
    • 8.9
    • 8.10
    • 8.14
    • 8.15
    • 8.16