STA6349: Applied Bayesian Analysis
On Monday, we talked about the Beta-Binomial model for binary outcomes with an unknown probability of success, \pi.
We will now discuss sequentiality in Bayesian analyses.
Working example:
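In Alison Bechdel’s 1985 comic strip The Rule, a character states that they only see a movie if it satisfies the following three rules (Bechdel 1986): the movie has at least two women in it; these two women talk to each other; and they talk about something besides a man.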
These criteria constitute the Bechdel test for the representation of women in film.
Thinking of movies you’ve watched, what percentage of all recent movies do you think pass the Bechdel test? Is it closer to 10%, 50%, 80%, or 100%?
Let \pi, a random value between 0 and 1, denote the unknown proportion of recent movies that pass the Bechdel test.
Three friends - the feminist, the clueless, and the optimist - have some prior ideas about \pi.
Reflecting upon movies that he has seen in the past, the feminist understands that the majority lack strong women characters.
The clueless doesn’t really recall the movies they’ve seen, and so is unsure whether passing the Bechdel test is common or uncommon.
Lastly, the optimist thinks that the Bechdel test is a really low bar for the representation of women in film, and thus assumes almost all movies pass the test.
Ultimately, the three friends have three different prior models of \pi.
Which one is which?
The analysts agree to review a sample of n recent movies and record Y, the number that pass the Bechdel test.
Because each movie either passes or fails the test, the binomial distribution is an appropriate model for the data, Y.
We aren’t sure what the population proportion, \pi, is, so rather than fixing it at a single value we give it a Beta prior.
\begin{align*} Y|\pi &\sim \text{Bin}(n, \pi) \\ \pi &\sim \text{Beta}(\alpha, \beta) \end{align*}
Because the Beta prior is conjugate to the binomial data model, the posterior is also a Beta:
\pi | (Y=y) \sim \text{Beta}(\alpha+y, \beta+n-y)
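The update rule above is simple enough to sketch in a few lines of Python. The function name `update_beta` and the Beta(2, 2) example are hypothetical and chosen only for illustration; they are not part of the course materials.

```python
def update_beta(alpha, beta, y, n):
    """Return the Beta posterior parameters after observing y successes in n trials.

    Assumes a Beta(alpha, beta) prior and a Binomial(n, pi) data model.
    """
    if not 0 <= y <= n:
        raise ValueError("y must be between 0 and n")
    # Successes are added to alpha; failures (n - y) are added to beta.
    return alpha + y, beta + n - y


# Hypothetical illustration: a Beta(2, 2) prior with y = 3 successes in n = 10 trials.
print(update_beta(2, 2, 3, 10))  # (5, 9)
```

Nothing about the prior's shape needs to be recomputed: the data only shift the two parameters.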
Wait!!
Everyone gets their own prior?
… is there a “correct” prior?
…… is the Bayesian world always this subjective?
More clearly defined questions that we can actually answer:
To what extent might different priors lead the analysts to three different posterior conclusions about the Bechdel test?
To what extent will the analysts’ posterior understandings evolve as they collect more and more data?
Will they ever come to agreement about the representation of women in film?!
The differing levels of prior variability show that the analysts have different degrees of certainty in their prior information.
Vague or diffuse prior: reflects little specific information about the unknown variable.
A flat prior, which assigns equal prior plausibility to all possible values of the variable, is a special case.
This is effectively saying “🤷.”
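One way to make "degree of certainty" concrete is to compare prior standard deviations. The sketch below does this for the three priors that appear in the table of priors and posteriors, Beta(5, 11), Beta(1, 1), and Beta(14, 1); the helper name `beta_sd` is an assumption made for this example.

```python
from math import sqrt


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return sqrt(variance)


# Priors taken from the analysts' table in these notes.
priors = {
    "the feminist": (5, 11),
    "the clueless": (1, 1),
    "the optimist": (14, 1),
}

prior_sds = {name: beta_sd(a, b) for name, (a, b) in priors.items()}

for name, sd in prior_sds.items():
    print(f"{name}: prior sd = {sd:.3f}")
```

The flat Beta(1, 1) prior has the largest standard deviation of the three, and the optimist's Beta(14, 1) has the smallest, which matches the idea that the clueless analyst is the least certain and the optimist the most.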
Okay, great - we have different priors.
We have data from FiveThirtyEight, reporting results of the Bechdel test.
Questions to think about:
Analyst | Prior | Posterior |
---|---|---|
the feminist | Beta(5, 11) | Beta(14, 22) |
the clueless | Beta(1, 1) | Beta(10, 12) |
the optimist | Beta(14, 1) | Beta(23, 12) |
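
The posteriors in this table can be reproduced with a short sketch. Each posterior adds 9 to the prior's first parameter and 11 to its second, so the table is consistent with a shared sample in which y = 9 of n = 20 movies pass; those counts are inferred from the table rather than stated in these notes.

```python
# Priors from the table; the data (y = 9 of n = 20) are inferred from the
# difference between each prior and its posterior, not quoted from the notes.
priors = {
    "the feminist": (5, 11),
    "the clueless": (1, 1),
    "the optimist": (14, 1),
}
y, n = 9, 20


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


posteriors = {}
for name, (a, b) in priors.items():
    # Conjugate update: Beta(a, b) -> Beta(a + y, b + n - y).
    post = (a + y, b + n - y)
    posteriors[name] = post
    print(
        f"{name}: prior mean {beta_mean(a, b):.3f} -> "
        f"posterior Beta{post}, mean {beta_mean(*post):.3f}"
    )
```

The three posterior means (roughly 0.39, 0.45, and 0.66) are closer together than the prior means, but with only 20 movies the analysts still disagree noticeably.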
The prior is not the only thing that shapes the posterior; the data do too.
Let’s now consider three new analysts: they all share the optimistic Beta(14, 1) prior for \pi, but they have access to different data.
Morteza reviews n = 13 movies from the year 1991, among which Y=6 (about 46%) pass the Bechdel test.
Nadide reviews n = 63 movies from the year 2001, among which Y=29 (about 46%) pass the Bechdel test.
Ursula reviews n = 99 movies from the year 2013, among which Y=46 (about 46%) pass the Bechdel test.
How will the different data affect the posterior distributions?
Find the posterior distributions. (i.e., What are the updated parameters?)
Analyst | Data | Posterior |
---|---|---|
Morteza | Y=6 of n=13 | Beta(20, 8) |
Nadide | Y=29 of n=63 | Beta(43, 35) |
Ursula | Y=46 of n=99 | Beta(60, 54) |
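
As a check on the arithmetic, the sketch below applies the update \alpha + y, \beta + n - y to the shared Beta(14, 1) prior for each analyst's data and also reports the posterior mean and standard deviation. The helper name `beta_mean_sd` is an assumption made for this example.

```python
from math import sqrt

# Shared optimistic prior and each analyst's data (y passes out of n movies).
alpha, beta = 14, 1
data = {
    "Morteza": (6, 13),
    "Nadide": (29, 63),
    "Ursula": (46, 99),
}


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(variance)


summaries = {}
for name, (y, n) in data.items():
    a_post, b_post = alpha + y, beta + n - y
    mean, sd = beta_mean_sd(a_post, b_post)
    summaries[name] = (a_post, b_post, mean, sd)
    print(f"{name}: Beta({a_post}, {b_post}), mean = {mean:.3f}, sd = {sd:.3f}")
```

Although all three samples have about the same proportion passing, the posterior mean moves from about 0.71 for Morteza toward about 0.53 for Ursula, and the posterior standard deviation shrinks as n grows.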
What did we observe?
As n \to \infty, variance in the likelihood \to 0.
In Morteza’s small sample of 13 movies, the likelihood function is wide.
In Ursula’s larger sample size of 99 movies, the likelihood function is narrower.
We see that the narrower the likelihood, the more influence the data holds over the posterior.
The posterior can favor either the data or the prior.
Left to right on the graph, the sample size increases from n=13 to n=99 movies, while preserving the proportion that pass (\approx 0.46).
The likelihood’s insistence and the data’s influence over the posterior increase with sample size.
This also means that the influence of our prior understanding diminishes as we gather new data.
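One way to see this is to rewrite the posterior mean of the Beta-Binomial model as a weighted average of the prior mean, \alpha/(\alpha+\beta), and the sample proportion, y/n. This is a standard identity that follows directly from the Beta(\alpha+y, \beta+n-y) posterior:

\begin{align*} E(\pi | Y=y) &= \frac{\alpha+y}{\alpha+\beta+n} \\ &= \frac{\alpha+\beta}{\alpha+\beta+n} \cdot \frac{\alpha}{\alpha+\beta} + \frac{n}{\alpha+\beta+n} \cdot \frac{y}{n} \end{align*}

As n grows with \alpha and \beta held fixed, the weight on the prior mean, (\alpha+\beta)/(\alpha+\beta+n), goes to 0 and the weight on the sample proportion, n/(\alpha+\beta+n), goes to 1.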
Top to bottom on the graph, priors move from informative (Beta(14,1)) to vague (Beta(1,1)).
Work with your breakout room to further explore different priors and data.
When we reconvene, a representative from each group will present findings.
Today we have covered the first half of Chapter 4.
We discussed how the prior and likelihood affect the posterior.
When we come back next week, we will discuss (and practice) sequential analysis.
4.3
4.4
4.6
4.9