Beta-Binomial Model

Introduction: Beta-Binomial Model

  • Last week, we learned how to think like a Bayesian.
    • Today, we will formalize the model we muddled through last time.
  • This is called the Beta-Binomial model.
    • The Beta distribution is the prior.
    • The Binomial distribution is the data distribution (or the likelihood).
    • The posterior also follows a Beta distribution.
  • Conjugate family: the prior and posterior follow the same named distribution, but with different (updated) parameters.

Example Set Up

  • Consider the following scenario.
    • “Michelle” has decided to run for president and you’re her campaign manager for the state of Florida.
    • As such, you’ve conducted 30 different polls throughout the election season.
    • Though Michelle’s support has hovered around 45%, she polled at around 35% on her dreariest days and around 55% on her best days on the campaign trail.

Example Set Up

  • Past polls provide prior information about \pi, the proportion of Floridians that currently support Michelle.

    • In fact, we can reorganize this information into a formal prior probability model of \pi.
  • In a previous problem, we assumed that \pi could only be 0.2, 0.5, or 0.8, the corresponding chances of which were defined by a discrete probability model.

    • However, in the reality of Michelle’s election support, \pi \in [0, 1].
  • We can reflect this reality and conduct a Bayesian analysis by constructing a continuous prior probability model of \pi.

Example Set Up

  • A reasonable prior is represented by the curve on the right.

    • Notice that this curve preserves the overall information and variability in the past polls, i.e., Michelle’s support, \pi, can be anywhere between 0 and 1, but is most likely around 0.45.

Example Set Up

  • Incorporating this more nuanced, continuous view of Michelle’s support, \pi, will require some new tools.
    • No matter if our parameter \pi is continuous or discrete, the posterior model of \pi will combine insights from the prior and data.
    • \pi isn’t the only variable of interest that lives on [0,1].
  • Maybe we’re interested in modeling the proportion of people that use public transit, the proportion of trains that are delayed, the proportion of people that prefer cats to dogs, etc.
    • The Beta-Binomial model provides the tools we need to study the proportion of interest, \pi, in each of these settings.

Beta Prior

  • In building the Bayesian model of Michelle’s election support among Floridians, \pi, we begin with the prior.

    • Our continuous prior probability model of \pi is specified by a probability density function (pdf).
  • What values can \pi take and which are more plausible than others?

Beta Prior

  • Let \pi be a random variable, where \pi \in [0, 1].

  • The variability in \pi may be captured by a Beta model with shape hyperparameters \alpha > 0 and \beta > 0,

    • hyperparameter: a parameter used in a prior model.

\pi \sim \text{Beta}(\alpha, \beta),
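  • For reference, the Beta(\alpha, \beta) pdf (it will reappear when we derive the posterior):

f(\pi) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \pi^{\alpha-1}(1-\pi)^{\beta-1}, \quad \pi \in [0, 1]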

Beta Prior: Shapes

  • Let’s explore the shape of the Beta:
plot_beta(1, 5) + theme_bw() + ggtitle("Beta(1, 5)")

Beta Prior: Shapes

  • Let’s explore the shape of the Beta:
plot_beta(1, 2) + theme_bw() + ggtitle("Beta(1, 2)")

Beta Prior: Shapes

  • Let’s explore the shape of the Beta:
plot_beta(3, 7) + theme_bw() + ggtitle("Beta(3, 7)")

Beta Prior: Shapes

  • Let’s explore the shape of the Beta:
plot_beta(1, 1) + theme_bw() + ggtitle("Beta(1, 1)")
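The same shapes can be drawn without the bayesrules plot_beta() helper by plotting dbeta() directly; a minimal sketch, assuming ggplot2 is installed:

```r
library(ggplot2)

# Draw the Beta(1, 1) density directly from dbeta()
ggplot(data.frame(pi = c(0, 1)), aes(x = pi)) +
  stat_function(fun = dbeta, args = list(shape1 = 1, shape2 = 1)) +
  labs(x = expression(pi), y = "Density") +
  ggtitle("Beta(1, 1)") +
  theme_bw()
```

Swapping shape1 and shape2 reproduces any of the Beta shapes above.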

Beta Prior: Shapes

  • Your turn!

  • How would you describe the typical behavior of a:

    • Beta(\alpha, \beta) variable, \pi, when \alpha=\beta?
    • Beta(\alpha, \beta) variable, \pi, when \alpha>\beta?
    • Beta(\alpha, \beta) variable, \pi, when \alpha<\beta?
  • For which model is there greater variability in the plausible values of \pi, Beta(20, 20) or Beta(5, 5)?

Beta Prior: Shapes

  • How would you describe the typical behavior of a Beta(\alpha, \beta) variable, \pi, when \alpha=\beta?

Beta Prior: Shapes

  • How would you describe the typical behavior of a Beta(\alpha, \beta) variable, \pi, when \alpha>\beta?

Beta Prior: Shapes

  • How would you describe the typical behavior of a Beta(\alpha, \beta) variable, \pi, when \alpha<\beta?

Beta Prior: Shapes

  • For which model is there greater variability in the plausible values of \pi, Beta(20, 20) or Beta(5, 5)?
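A numerical check, using the Beta variance formula \text{Var}[\pi] = \alpha\beta / [(\alpha+\beta)^2(\alpha+\beta+1)] (a base-R sketch):

```r
# Variance of a Beta(a, b) random variable
beta_var <- function(a, b) a * b / ((a + b)^2 * (a + b + 1))

beta_var(20, 20)  # ~0.0061
beta_var(5, 5)    # ~0.0227 -> Beta(5, 5) has the greater variability
```

Both priors center at 0.5, but the smaller hyperparameters spread the plausible values of \pi more widely.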

Tuning the Beta Prior

  • We can tune the shape hyperparameters (\alpha and \beta) to reflect our prior information about Michelle’s election support, \pi.

  • In our example, we saw that she polled between 35 and 55 percentage points, with an average of 45 percentage points.

    • We want our Beta(\alpha, \beta) prior to reflect similar patterns, so we should pick \alpha and \beta such that \pi tends to be around 0.45.

E[\pi] = \frac{\alpha}{\alpha+\beta} \approx 0.45

  • Solving this equation for \alpha, we find

\alpha \approx \frac{9}{11} \beta
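A quick check that any pair satisfying \alpha = (9/11)\beta has the desired mean (base-R sketch):

```r
# Mean of a Beta(a, b) random variable
beta_mean <- function(a, b) a / (a + b)

beta_mean(9, 11)   # 0.45
beta_mean(45, 55)  # 0.45 -- same mean; the larger pair is more concentrated
```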

Tuning the Beta Prior

  • Your turn!

    • Graph the following and determine which is best for the example.
plot_beta(9, 11) + theme_bw()
plot_beta(27, 33) + theme_bw()
plot_beta(45, 55) + theme_bw()
  • Recall, this is what we are going for:

Tuning the Beta Prior

plot_beta(9, 11) + theme_bw() + ggtitle("Beta(9, 11)")

Tuning the Beta Prior

plot_beta(27, 33) + theme_bw() + ggtitle("Beta(27, 33)")

Tuning the Beta Prior

plot_beta(45, 55) + theme_bw() + ggtitle("Beta(45, 55)")

Tuning the Beta Prior

  • Now that we have a prior, we “know” some things.

\pi \sim \text{Beta}(45, 55)

  • From the properties of the beta distribution,

\begin{align*} E[\pi] &= \frac{\alpha}{\alpha + \beta} = \frac{45}{45+55} = 0.45 \\ \text{Var}[\pi] &= \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} = \frac{(45)(55)}{(45+55)^2(45+55+1)} \approx 0.0025 \end{align*}

Binomial Data Model

  • A new poll of n = 50 Floridians recorded Y, the number that support Michelle.
    • The results depend upon \pi (as \pi increases, Y tends to increase).
  • To model the dependence of Y on \pi, we assume
    • voters answer the poll independently of one another;
    • the probability that any polled voter supports your candidate Michelle is \pi
  • Under these assumptions, Y|\pi \sim \text{Bin}(50, \pi), with conditional pmf f(y|\pi) defined for y \in \{0, 1, \ldots, 50\}:

f(y|\pi) = P[Y = y|\pi] = {50 \choose y} \pi^y (1-\pi)^{50-y}

Binomial Data Model

  • The conditional pmf, f(y|\pi), gives us answers to a hypothetical question:

    • If Michelle’s support were given some value of \pi, then how many of the 50 polled voters (Y=y) might we expect to support her?
  • Let’s look at this graphically:

library(tidyverse)

sample_size <- 50   # n = 50 polled voters
pi_value <- 0.45    # a hypothetical value of pi (here, the prior mean)

binom_prob <- tibble(n_success = 0:sample_size,
                     prob = dbinom(n_success, size = sample_size, prob = pi_value))

binom_prob %>%
  ggplot(aes(x = n_success, y = prob)) +
  geom_col(width = 0.2) +
  labs(x = "Number of Successes",
       y = "Probability") +
  theme_bw()

Binomial Data Model

  • It is observed that Y=30 of the n=50 polled voters support Michelle.

  • We now want to find the likelihood function – remember that we treat Y=30 as the observed data and \pi as unknown,

\begin{align*} f(y|\pi) &= {50 \choose y} \pi^y (1-\pi)^{50-y} \\ L(\pi|y=30) &= {50 \choose 30} \pi^{30} (1-\pi)^{20} \end{align*}

  • This is valid for \pi \in [0, 1].

Binomial Data Model

  • What is the likelihood of 30/50 voters supporting Michelle?
dbinom(30, 50, pi)  # for a chosen value of pi in [0, 1] (note: bare pi in R is 3.14!)
  • You try this for \pi = \{0.25, 0.50, 0.75\}.
dbinom(30, 50, 0.25)
dbinom(30, 50, 0.5)
dbinom(30, 50, 0.75)

Binomial Data Model

  • What is the likelihood of 30/50 voters supporting Michelle?
dbinom(30, 50, 0.25)
[1] 1.29633e-07
dbinom(30, 50, 0.5)
[1] 0.04185915
dbinom(30, 50, 0.75)
[1] 0.007654701

Binomial Data Model

  • Challenge!

  • Create a graph showing what happens to the likelihood for different values of \pi.

    • i.e., have \pi on the x-axis and likelihood on the y-axis.
  • To get you started,

graph <- tibble(pi = seq(0, 1, 0.001)) %>%
  mutate(likelihood = dbinom(30, 50, pi))

Binomial Data Model

  • Create a graph showing what happens to the likelihood for different values of \pi.

  • Where is the maximum?
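One way to finish the starter code from the previous slide (a sketch, assuming the tidyverse is loaded):

```r
library(tidyverse)

# Likelihood of y = 30 successes in n = 50 trials, as a function of pi
graph <- tibble(pi = seq(0, 1, 0.001)) %>%
  mutate(likelihood = dbinom(30, 50, pi))

graph %>%
  ggplot(aes(x = pi, y = likelihood)) +
  geom_line() +
  geom_vline(xintercept = 30 / 50, linetype = "dashed") +  # MLE at pi = 0.6
  labs(x = expression(pi), y = "Likelihood") +
  theme_bw()
```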

Binomial Data Model

  • Where is the maximum?

    • The likelihood L(\pi|y=30) peaks at the maximum likelihood estimate, \hat{\pi} = 30/50 = 0.6.

The Beta Posterior Model

  • Looking at just the prior and the data distributions,

  • The prior is a bit more pessimistic about Michelle’s election support than the data obtained from the latest poll.

The Beta Posterior Model

  • Now including the posterior,

  • We can see that the posterior model of \pi is continuous on [0, 1].

  • The shape of the posterior appears to also have a Beta(\alpha, \beta) model.

    • The shape parameters (\alpha and \beta) have been updated.

The Beta Posterior Model

  • If we were to collect more information about Michelle’s support, we would use the current posterior as the new prior, then update our posterior.

    • How do we know what the updated parameters are?
summarize_beta_binomial(alpha = 45, beta = 55, y = 30, n = 50)
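summarize_beta_binomial() (from the bayesrules package) reports the prior and posterior side by side; the same update can be done by hand with the conjugacy rule, sketched here:

```r
# Conjugate update: Beta(alpha, beta) prior + y successes in n Binomial trials
alpha <- 45; beta <- 55   # prior hyperparameters
y <- 30; n <- 50          # observed poll data

alpha_post <- alpha + y       # 75
beta_post  <- beta + n - y    # 75

alpha_post / (alpha_post + beta_post)  # posterior mean: 0.5
```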

The Beta Posterior Model

  • We used Michelle’s election support to understand the Beta-Binomial model.

  • Let’s now generalize it for any appropriate situation.

\begin{align*} Y|\pi &\sim \text{Bin}(n, \pi) \\ \pi &\sim \text{Beta}(\alpha, \beta) \\ \pi | (Y=y) &\sim \text{Beta}(\alpha+y, \beta+n-y) \end{align*}

  • We can see that the posterior distribution reveals the influence of the prior (\alpha and \beta) and data (y and n).

The Beta Posterior Model

  • Under this updated distribution,

\pi | (Y=y) \sim \text{Beta}(\alpha+y, \beta+n-y)

  • we have updated moments:

\begin{align*} E[\pi | Y = y] &= \frac{\alpha + y}{\alpha + \beta + n} \\ \text{Var}[\pi|Y=y] &= \frac{(\alpha+y)(\beta+n-y)}{(\alpha+\beta+n)^2(\alpha+\beta+n+1)} \end{align*}
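Plugging the election example (\alpha = 45, \beta = 55, y = 30, n = 50) into these posterior moment formulas:

```r
alpha <- 45; beta <- 55; y <- 30; n <- 50

post_mean <- (alpha + y) / (alpha + beta + n)
post_var  <- (alpha + y) * (beta + n - y) /
  ((alpha + beta + n)^2 * (alpha + beta + n + 1))

post_mean  # 0.5
post_var   # ~0.00166 -- smaller than the prior variance of ~0.0025
```

The data pulled the mean up from 0.45 toward the observed 0.6, and the extra information shrank the variance.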

The Beta Posterior Model

  • Let’s pause and think about this from a theoretical standpoint.

  • The Beta distribution is a conjugate prior for the Binomial likelihood.

    • Conjugate prior: the posterior is from the same model family as the prior.
  • Recall the Beta prior, f(\pi),

f(\pi) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \pi^{\alpha-1}(1-\pi)^{\beta-1}

  • and the likelihood function, L(\pi|y),

L(\pi|y) = {n \choose y} \pi^y (1-\pi)^{n-y}

The Beta Posterior Model

  • We can put the prior and likelihood together to create the posterior,

\begin{align*} f(\pi|y) &\propto f(\pi)L(\pi|y) \\ &= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \pi^{\alpha-1}(1-\pi)^{\beta-1} \times {n \choose y} \pi^y (1-\pi)^{n-y} \\ &\propto \pi^{(\alpha+y)-1} (1-\pi)^{(\beta+n-y)-1} \end{align*}

  • This has the same structure as a Beta(\alpha+y, \beta+n-y) kernel; normalizing gives the posterior pdf,

f(\pi|y) = \frac{\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+y) \Gamma(\beta+n-y)} \pi^{(\alpha+y)-1} (1-\pi)^{(\beta+n-y)-1}

Beta-Binomial: Example

  • In Mario Kart 8 Deluxe, item boxes give different items depending on race position. To reduce the “position bias,” only item boxes opened while the racer was in mid-pack (positions 4–10) were recorded.

  • You want to estimate the probability that an item box yields a Red Shell. In the Special Cup, 31 Red Shells were seen in 114 boxes opened by mid-pack racers.

  • Find the posterior distribution under two priors:

    1. Flat/uninformative prior, Beta(1,1).
    2. Beta(\alpha, \beta) of your choosing.
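As a check for part 1 (a sketch; part 2 depends on your chosen prior), the flat Beta(1, 1) prior updates by conjugacy to Beta(1 + 31, 1 + 114 - 31):

```r
# Flat prior Beta(1, 1) with y = 31 Red Shells in n = 114 boxes
y <- 31; n <- 114
alpha_post <- 1 + y        # 32
beta_post  <- 1 + n - y    # 84

alpha_post / (alpha_post + beta_post)  # posterior mean ~0.276
```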

Wrap Up: Beta-Binomial Model

  • We have built the Beta-Binomial model for \pi, an unknown proportion.

\begin{equation*} \begin{aligned} Y|\pi &\sim \text{Bin}(n,\pi) \\ \pi &\sim \text{Beta}(\alpha,\beta) \end{aligned} \quad \Rightarrow \quad \pi | (Y=y) \sim \text{Beta}(\alpha+y, \beta+n-y) \end{equation*}

  • The prior model, f(\pi), is given by Beta(\alpha,\beta).

  • The data model, f(Y|\pi), is given by Bin(n,\pi).

  • The likelihood function, L(\pi|y), is obtained by plugging y into the Binomial pmf.

  • The posterior model is a Beta distribution with updated parameters \alpha+y and \beta+n-y.

Homework / Practice