Balance and Sequentiality in Bayesian Analysis

Introduction

  • On Monday, we talked about the Beta-Binomial model for binary outcomes with an unknown probability of success, \pi.

  • We will now discuss sequentiality in Bayesian analyses.

  • Working example:

    • In Alison Bechdel’s 1985 comic strip The Rule, a character states that they only see a movie if it satisfies the following three rules (Bechdel 1986):
      • the movie has to have at least two women in it;
      • these two women talk to each other; and
      • they talk about something besides a man.
    • Thinking of movies you’ve watched, what percentage of all recent movies do you think pass the Bechdel test? Is it closer to 10%, 50%, 80%, or 100%?

Introduction: Example

  • Let \pi, a random value between 0 and 1, denote the unknown proportion of recent movies that pass the Bechdel test.

  • Three friends (the feminist, the clueless, and the optimist) have some prior ideas about \pi.

    • Reflecting upon movies that he has seen in the past, the feminist understands that the majority lack strong women characters.
    • The clueless doesn’t really recall the movies they’ve seen, and so is unsure whether passing the Bechdel test is common or uncommon.
    • Lastly, the optimist thinks that the Bechdel test is a really low bar for the representation of women in film, and thus assumes almost all movies pass the test.

Introduction: Example

  • Graph the following priors:
# plot_beta() comes from the bayesrules package
library(bayesrules)
plot_beta(alpha = 1, beta = 1)
plot_beta(alpha = 5, beta = 11)
plot_beta(alpha = 14, beta = 1)
  • Which prior belongs to each friend?
    • Reflecting upon movies that he has seen in the past, the feminist understands that the majority lack strong women characters.
    • The clueless doesn’t really recall the movies they’ve seen, and so is unsure whether passing the Bechdel test is common or uncommon.
    • Lastly, the optimist thinks that the Bechdel test is a really low bar for the representation of women in film, and thus assumes almost all movies pass the test.

Introduction: Example

  • Matching priors to friends:
    • The clueless: Beta(1, 1), the flat prior, reflecting no idea whether passing is common or uncommon.
    • The feminist: Beta(5, 11), which places most of its mass below 0.5, reflecting that the majority of movies lack strong women characters.
    • The optimist: Beta(14, 1), which piles up near 1, reflecting that almost all movies should clear such a low bar.

Introduction: Example

  • The analysts agree to review a sample of n recent movies and record Y, the number that pass the Bechdel test.
    • Because the outcome is yes/no, the binomial distribution is appropriate for the data distribution.
    • We aren’t sure what the population proportion, \pi, is, so we will not restrict it to a fixed value.
      • Because we know \pi \in [0, 1], the beta distribution is appropriate for the prior distribution.

\begin{align*} Y|\pi &\sim \text{Bin}(n, \pi) \\ \pi &\sim \text{Beta}(\alpha, \beta) \end{align*}

Introduction: Example

  • Because we know \pi \in [0, 1], the beta distribution is appropriate for the prior distribution.

\begin{align*} Y|\pi &\sim \text{Bin}(n, \pi) \\ \pi &\sim \text{Beta}(\alpha, \beta) \end{align*}

  • From the previous chapter, we know that this results in the following posterior distribution:

\pi | (Y=y) \sim \text{Beta}(\alpha+y, \beta+n-y)
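
  • A one-line sketch of why (combining the Beta prior kernel with the binomial likelihood kernel):

\begin{align*} f(\pi | y) \propto f(\pi) L(\pi | y) \propto \pi^{\alpha-1}(1-\pi)^{\beta-1} \cdot \pi^{y}(1-\pi)^{n-y} = \pi^{(\alpha+y)-1}(1-\pi)^{(\beta+n-y)-1} \end{align*}

  • This is the kernel of a Beta(\alpha+y, \beta+n-y) density.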

Introduction: Example

  • Wait!!
    • Everyone gets their own prior?
    • … is there a “correct” prior?
    • …… is the Bayesian world always this subjective?

Introduction: Example

  • More clearly defined questions that we can actually answer:
    • To what extent might different priors lead the analysts to three different posterior conclusions about the Bechdel test?
      • How might this depend upon the sample size and outcomes of the movie data they collect?
    • To what extent will the analysts’ posterior understandings evolve as they collect more and more data?
    • Will they ever come to agreement about the representation of women in film?!

Different Priors \to Different Posteriors

  • The differing prior means show disagreement about whether \pi is closer to 0 or 1.

  • The differing levels of prior variability show that the analysts have different degrees of certainty in their prior information.

Different Priors \to Different Posteriors

  • Informative prior: reflects specific information about the unknown variable with high certainty, i.e., low variability.

Different Priors \to Different Posteriors

  • Vague or diffuse prior: reflects little specific information about the unknown variable.
    • A flat prior, which assigns equal prior plausibility to all possible values of the variable, is a special case; for \pi, this is the Beta(1, 1), i.e., Uniform(0, 1), prior.
    • This is effectively saying “🤷.”

Different Priors \to Different Posteriors

  • Okay, great - we have different priors.
    • How do the different priors affect the posterior?
  • We have data from FiveThirtyEight, reporting results of the Bechdel test.
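
  • A minimal setup sketch, assuming the bechdel data and plotting helpers ship with the bayesrules package (the seed here is an assumption, chosen for reproducibility):
# bayesrules supplies the bechdel data, plot_beta(), and plot_beta_binomial();
# dplyr, janitor, and ggplot2 supply the wrangling and theming used below
library(bayesrules)
library(dplyr)
library(janitor)
library(ggplot2)
# Import the FiveThirtyEight Bechdel data and sample 20 movies
data(bechdel, package = "bayesrules")
set.seed(84735)  # assumed seed
bechdel20 <- bechdel %>% sample_n(20)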

Different Priors \to Different Posteriors

  • So how many pass the test in this sample?
bechdel20 %>% tabyl(binary) %>% adorn_totals("row")
  • In this sample, y = 9 of the n = 20 movies pass the test.

Different Priors \to Different Posteriors

  • Let’s look at the graphs of just the prior and likelihood.
plot_beta_binomial(alpha = 5, beta = 11, y = 9, n = 20, posterior = FALSE) + theme_bw()
plot_beta_binomial(alpha = 1, beta = 1, y = 9, n = 20, posterior = FALSE) + theme_bw()
plot_beta_binomial(alpha = 14, beta = 1, y = 9, n = 20, posterior = FALSE) + theme_bw()
  • Questions to think about:

    • Whose posterior do you anticipate will look the most like the scaled likelihood?
    • Whose do you anticipate will look the least like the scaled likelihood?

Different Priors \to Different Posteriors

  • Find the posterior distributions. (i.e., What are the updated parameters?)
Analyst        Prior         Posterior
the feminist   Beta(5, 11)   Beta(14, 22)
the clueless   Beta(1, 1)    Beta(10, 12)
the optimist   Beta(14, 1)   Beta(23, 12)
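  • A quick way to verify these conjugate updates in R; update_beta() is a hypothetical helper written for these slides, not a bayesrules function:
# Beta-Binomial conjugate update: posterior is Beta(alpha + y, beta + n - y)
update_beta <- function(alpha, beta, y, n) {
  c(alpha = alpha + y, beta = beta + n - y)
}
update_beta(5, 11, y = 9, n = 20)   # the feminist: Beta(14, 22)
update_beta(1, 1, y = 9, n = 20)    # the clueless: Beta(10, 12)
update_beta(14, 1, y = 9, n = 20)   # the optimist: Beta(23, 12)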
  • Let’s now explore what the posteriors look like.
plot_beta_binomial(alpha = 5, beta = 11, y = 9, n = 20) + theme_bw()
plot_beta_binomial(alpha = 1, beta = 1, y = 9, n = 20) + theme_bw()
plot_beta_binomial(alpha = 14, beta = 1, y = 9, n = 20) + theme_bw()

Different Priors \to Different Posteriors

  • In addition to priors affecting our posterior distributions… the data also affect them.

  • Let’s now consider three new analysts: they all share the optimistic Beta(14, 1) prior for \pi; however, they have access to different data.

    • Morteza reviews n = 13 movies from the year 1991, among which Y=6 (about 46%) pass the Bechdel test.
    • Nadide reviews n = 63 movies from the year 2001, among which Y=29 (about 46%) pass the Bechdel test.
    • Ursula reviews n = 99 movies from the year 2013, among which Y=46 (about 46%) pass the Bechdel test.
  • How will the different data affect the posterior distributions?

Different Priors \to Different Posteriors

  • How will the different data affect the posterior distributions?
plot_beta_binomial(alpha = 14, beta = 1, y = 6, n = 13) + theme_bw()
plot_beta_binomial(alpha = 14, beta = 1, y = 29, n = 63) + theme_bw()
plot_beta_binomial(alpha = 14, beta = 1, y = 46, n = 99) + theme_bw()
  • Which posterior is the most in sync with their data?

  • Which posterior is the least in sync with their data?

Different Priors \to Different Posteriors

  • Find the posterior distributions. (i.e., What are the updated parameters?)

    • Recall that all use the Beta(14, 1) prior.
Analyst   Data           Posterior
Morteza   Y=6 of n=13    Beta(20, 8)
Nadide    Y=29 of n=63   Beta(43, 35)
Ursula    Y=46 of n=99   Beta(60, 54)
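  • These can be double-checked with bayesrules’ summarize_beta_binomial(), which summarizes both the prior and the posterior Beta models:
summarize_beta_binomial(alpha = 14, beta = 1, y = 6, n = 13)   # Morteza
summarize_beta_binomial(alpha = 14, beta = 1, y = 29, n = 63)  # Nadide
summarize_beta_binomial(alpha = 14, beta = 1, y = 46, n = 99)  # Ursula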
  • Let’s also explore what the posteriors look like.
plot_beta_binomial(alpha = 14, beta = 1, y = 6, n = 13) + theme_bw() 
plot_beta_binomial(alpha = 14, beta = 1, y = 29, n = 63) + theme_bw()
plot_beta_binomial(alpha = 14, beta = 1, y = 46, n = 99) + theme_bw()


Different Priors \to Different Posteriors

  • What did we observe?
    • As n \to \infty, variance in the likelihood \to 0.
      • In Morteza’s small sample of 13 movies, the likelihood function is wide.
      • In Ursula’s larger sample size of 99 movies, the likelihood function is narrower.
    • We see that the narrower the likelihood, the more influence the data holds over the posterior.

Striking a Balance

  • Overall message: no matter the strength of, or the discrepancies among, their prior understandings of \pi, analysts will come to a common posterior understanding in light of strong data.

Striking a Balance

  • The posterior can either favor the data or the prior.
    • The rate at which the posterior balance tips in favor of the data depends upon the prior.
  • Left to right on the graph, the sample size increases from n=13 to n=99 movies, while preserving the proportion that pass (\approx 0.46).
    • The likelihood’s insistence and the data’s influence over the posterior increase with sample size.
    • This also means that the influence of our prior understanding diminishes as we gather new data.
  • Top to bottom on the graph, priors move from informative (Beta(14,1)) to vague (Beta(1,1)).
    • Naturally, the more informative the prior, the greater its influence on the posterior.
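
  • This balance can be made precise: the Beta-Binomial posterior mean is a weighted average of the prior mean and the sample proportion, with weights set by the “prior sample size” \alpha + \beta and the data sample size n:

\begin{align*} E(\pi | Y=y) = \frac{\alpha+y}{\alpha+\beta+n} = \frac{\alpha+\beta}{\alpha+\beta+n} \cdot \frac{\alpha}{\alpha+\beta} + \frac{n}{\alpha+\beta+n} \cdot \frac{y}{n} \end{align*}

  • As n grows, the weight on the data proportion y/n approaches 1 and the prior’s weight approaches 0.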

Introduction: Sequentiality

  • Let’s now turn our thinking to the next question: okay, we’ve updated our beliefs… but now we have new data!

  • The evolution in our posterior understanding happens incrementally, as we accumulate new data.

    • Scientists’ understanding of climate change has evolved over the span of decades as they gain new information.
    • Presidential candidates’ understanding of their chances of winning an election evolve over months as new poll results become available.

Introduction: Sequentiality

  • Let’s revisit Milgram’s behavioral study of obedience from Chapter 3. Recall that \pi represents the proportion of people who will obey authority, even if it means bringing harm to others.

  • Prior to Milgram’s experiments, our fictional psychologist expected that few people would obey authority in the face of harming another: \pi \sim \text{Beta}(1,10).

  • Now, suppose that the psychologist collected the data incrementally, day by day, over a three-day period.

  • Find the following posterior distributions, each building off the last:

    • Day 0: \text{Beta}(1,10).
    • Day 1: Y=1 out of n=10.
    • Day 2: Y=17 out of n=20.
    • Day 3: Y=8 out of n=10.

Introduction: Sequentiality

  • Find the following posterior distributions, each building off the last:

    • Day 0: \text{Beta}(1,10).
    • Day 1: Y=1 out of n=10: \text{Beta}(1,10) \to \text{Beta}(2, 19).
    • Day 2: Y=17 out of n=20: \text{Beta}(2, 19) \to \text{Beta}(19, 22).
    • Day 3: Y=8 out of n=10: \text{Beta}(19, 22) \to \text{Beta}(27, 24).
  • Recall from Chapter 3 that our posterior was \text{Beta}(27,24)!

Sequential Bayesian Analysis or Bayesian Learning

  • In a sequential Bayesian analysis, a posterior model is updated incrementally as more data come in.
    • With each new piece of data, the previous posterior model, which reflects our understanding prior to observing that data, becomes the new prior model.
  • This is why we love Bayesian statistics!
    • We evolve our thinking as new data come in.
  • These types of sequential analyses also uphold two fundamental properties:
    1. The final posterior model is data order invariant.
    2. The final posterior depends only upon the cumulative data.

Sequential Bayesian Analysis or Bayesian Learning

  • In order:
    • Day 0: \text{Beta}(1,10).
    • Day 1: Y=1 out of n=10: \text{Beta}(1,10) \to \text{Beta}(2, 19).
    • Day 2: Y=17 out of n=20: \text{Beta}(2, 19) \to \text{Beta}(19, 22).
    • Day 3: Y=8 out of n=10: \text{Beta}(19, 22) \to \text{Beta}(27, 24).
  • Out of order:
    • Day 0: \text{Beta}(1,10).
    • Day 3: Y=8 out of n=10: \text{Beta}(1,10) \to \text{Beta}(9, 12).
    • Day 2: Y=17 out of n=20: \text{Beta}(9, 12) \to \text{Beta}(26, 15).
    • Day 1: Y=1 out of n=10: \text{Beta}(26, 15) \to \text{Beta}(27, 24).
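  • A sketch of both schedules in R (update_beta() is a hypothetical helper; with a conjugate Beta prior, each day’s posterior becomes the next day’s prior):
# Conjugate update: Beta(a, b) prior plus y successes in n trials
update_beta <- function(prior, y, n) c(prior[1] + y, prior[2] + n - y)
days <- list(c(1, 10), c(17, 20), c(8, 10))  # (y, n) for Days 1-3

post <- c(1, 10)                       # Day 0 prior: Beta(1, 10)
for (d in days) post <- update_beta(post, d[1], d[2])
post                                   # 27 24, i.e., Beta(27, 24)

post <- c(1, 10)
for (d in rev(days)) post <- update_beta(post, d[1], d[2])
post                                   # 27 24 again: order invariant

# One batch update with the cumulative data (Y=26 of n=40) also agrees
update_beta(c(1, 10), y = 26, n = 40)  # 27 24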

Example: Mario Kart

  • In Mario Kart 8 Deluxe, item boxes give different items depending on race position. To reduce the “position bias,” only item boxes opened while the racer was in mid-pack (positions 4–10) were recorded.

  • You want to estimate the probability that an item box yields a Red Shell. When playing the Special Cup, 31 Red Shells were seen in 114 boxes opened by mid-pack racers.

  • Find the posterior distribution under two priors:

    1. Flat/uninformative prior, Beta(1,1).
    2. Beta(\alpha, \beta) of your choosing.
  • How different are the posterior distributions?
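  • Under the flat prior, the update is Beta(1 + 31, 1 + 114 - 31) = Beta(32, 84); a quick check with the bayesrules helpers (swap in your own \alpha and \beta for the second prior):
plot_beta_binomial(alpha = 1, beta = 1, y = 31, n = 114) + theme_bw()
summarize_beta_binomial(alpha = 1, beta = 1, y = 31, n = 114)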

Example: Mario Kart

  • Suppose that we know the individual data points for the Special Cup:

    • Cloudtop Cruise: 6 of 28 boxes had a Red Shell.
    • Bone-Dry Dunes: 8 of 27 boxes had a Red Shell.
    • Bowser’s Castle: 12 of 29 boxes had a Red Shell.
    • Rainbow Road: 5 of 30 boxes had a Red Shell.
  • Prove sequentiality to yourself; a sketch follows this list.

    • Analyze the data race-by-race in this order: Cloudtop Cruise, Bone-Dry Dunes, Bowser’s Castle, and Rainbow Road.

    • Analyze the data race-by-race in this order: Rainbow Road, Bowser’s Castle, Cloudtop Cruise, and Bone-Dry Dunes.
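
  • A minimal sketch, assuming the flat Beta(1, 1) prior from the previous slide; both orders land on Beta(1 + 31, 1 + 114 - 31) = Beta(32, 84):
update_beta <- function(prior, y, n) c(prior[1] + y, prior[2] + n - y)
races <- list(cloudtop = c(6, 28), bone_dry = c(8, 27),
              bowsers = c(12, 29), rainbow = c(5, 30))  # (y, n) per race

post <- c(1, 1)
for (r in races) post <- update_beta(post, r[1], r[2])
post                                   # 32 84, i.e., Beta(32, 84)

post <- c(1, 1)
for (r in races[c("rainbow", "bowsers", "cloudtop", "bone_dry")]) {
  post <- update_beta(post, r[1], r[2])
}
post                                   # 32 84 again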

Wrap Up

  • Today we have discussed balance and sequentiality.

  • Remember that the order of data inclusion does not matter: we will end up with the same posterior.

  • We have seen that prior specification “matters,” though similar priors lead to similar posteriors, and the prior’s influence fades as the data accumulate.

  • Next week:

    • Gamma-Poisson
    • Normal-Normal
    • What to do with the posterior distribution.

Homework / Practice

  • From the Bayes Rules! textbook:

    • 4.3
    • 4.4
    • 4.6
    • 4.9
    • 4.15
    • 4.16
    • 4.17
    • 4.18
    • 4.19