July 10, 2025
Thursday
On Tuesday, we learned how to think like a Bayesian.
Today, we will formalize the model we muddled through last time.
This is called the Beta-Binomial model.
Conjugate family: the prior and the posterior follow the same named distribution, just with different (updated) parameters.
Past polls provide prior information about \pi, the proportion of Floridians that currently support Michelle.
In a previous problem, we assumed that \pi could only be 0.2, 0.5, or 0.8, the corresponding chances of which were defined by a discrete probability model.
We can reflect this reality and conduct a Bayesian analysis by constructing a continuous prior probability model of \pi.
A reasonable prior is represented by the curve on the right.
In building the Bayesian election model of Michelle’s election support among Floridians, \pi, we begin with the prior.
What values can \pi take and which are more plausible than others?
Let \pi be a random variable, where \pi \in [0, 1].
The variability in \pi may be captured by a Beta model with shape hyperparameters \alpha > 0 and \beta > 0,
\pi \sim \text{Beta}(\alpha, \beta),
Your turn!
How would you describe the typical behavior of a Beta(20, 20) variable? Of a Beta(5, 5) variable?
For which model is there greater variability in the plausible values of \pi, Beta(20, 20) or Beta(5, 5)?
We can tune the shape hyperparameters (\alpha and \beta) to reflect our prior information about Michelle’s election support, \pi.
In our example, we saw that she polled between 25 and 65 percentage points, with an average of 45 percentage points.
E[\pi] = \frac{\alpha}{\alpha+\beta} \approx 0.45
\alpha \approx \frac{9}{11} \beta
Your turn!
\pi \sim \text{Beta}(45, 55)
\begin{align*} E[\pi] &= \frac{\alpha}{\alpha + \beta} = \frac{45}{45+55} = 0.45 \\ \text{Var}[\pi] &= \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} = \frac{(45)(55)}{(45+55)^2(45+55+1)} \approx 0.0025 \end{align*}
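As a quick check, here is a base-R sketch (not code from the original slides) that plots the Beta(45, 55) prior and verifies the two numbers above:

```r
# Beta(45, 55) prior for pi: plot the density and check its mean and variance
alpha <- 45
beta  <- 55

curve(dbeta(x, alpha, beta), from = 0, to = 1,
      xlab = expression(pi), ylab = "density", main = "Beta(45, 55) prior")

alpha / (alpha + beta)                                    # 0.45
(alpha * beta) / ((alpha + beta)^2 * (alpha + beta + 1))  # about 0.0025
```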
f(y|\pi) = P[Y = y|\pi] = {50 \choose y} \pi^y (1-\pi)^{50-y}
The conditional pmf, f(y|\pi), answers a hypothetical question: if Michelle’s support were a given value of \pi, how many of the n = 50 polled voters would we expect to support her?
Let’s look at this graphically:
It is observed that Y=30 of the n=50 polled voters support Michelle.
We now want to find the likelihood function – remember that we treat Y=30 as the observed data and \pi as unknown,
\begin{align*} f(y|\pi) &= {50 \choose y} \pi^y (1-\pi)^{50-y} \\ L(\pi|y=30) &= {50 \choose 30} \pi^{30} (1-\pi)^{20} \end{align*}
Challenge!
Create a graph showing what happens to the likelihood for different values of \pi.
To get you started, one possible approach is sketched below.
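This sketch uses base R’s dbinom() (not necessarily the starter code from class) to evaluate the likelihood on a grid of \pi values and plot it:

```r
# Likelihood of pi given y = 30 supporters among n = 50 polled voters
pi_grid    <- seq(0, 1, by = 0.001)
likelihood <- dbinom(x = 30, size = 50, prob = pi_grid)

# The curve peaks near pi = 30/50 = 0.6
plot(pi_grid, likelihood, type = "l",
     xlab = expression(pi), ylab = "likelihood")
```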
We can see that the posterior model of \pi is continuous and \in [0, 1].
The posterior also appears to follow a Beta model, with updated parameters.
If we were to collect more information about Michelle’s support, we would use the current posterior as the new prior, then update our posterior.
We used Michelle’s election support to understand the Beta-Binomial model.
Let’s now generalize it for any appropriate situation.
\begin{align*} Y|\pi &\sim \text{Bin}(n, \pi) \\ \pi &\sim \text{Beta}(\alpha, \beta) \\ \pi | (Y=y) &\sim \text{Beta}(\alpha+y, \beta+n-y) \end{align*}
\pi | (Y=y) \sim \text{Beta}(\alpha+y, \beta+n-y)
\begin{align*} E[\pi | Y = y] &= \frac{\alpha + y}{\alpha + \beta + n} \\ \text{Var}[\pi|Y=y] &= \frac{(\alpha+y)(\beta+n-y)}{(\alpha+\beta+n)^2(\alpha+\beta+n+1)} \end{align*}
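For the election example (y = 30 supporters out of n = 50, with the Beta(45, 55) prior), these formulas give \pi | (Y = 30) \sim \text{Beta}(75, 75). A minimal sketch of checking this in R; the helper summarize_beta_binomial() and its argument names are assumptions here, by analogy with the bayesrules summary functions used later in these notes:

```r
library(bayesrules)

# Beta(45, 55) prior updated with y = 30 successes in n = 50 trials
# Posterior: Beta(45 + 30, 55 + 20) = Beta(75, 75)
summarize_beta_binomial(alpha = 45, beta = 55, y = 30, n = 50)

# By-hand check of the posterior mean and variance
(45 + 30) / (45 + 55 + 50)                                        # 0.5
((45 + 30) * (55 + 20)) / ((45 + 55 + 50)^2 * (45 + 55 + 50 + 1)) # about 0.0017
```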
Let’s pause and think about this from a theoretical standpoint.
The Beta distribution is a conjugate prior for the Binomial likelihood.
Recall the Beta prior, f(\pi), and the Binomial likelihood, L(\pi|y):
f(\pi) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \pi^{\alpha-1}(1-\pi)^{\beta-1}
L(\pi|y) = {n \choose y} \pi^y (1-\pi)^{n-y}
\begin{align*} f(\pi|y) &\propto f(\pi)L(\pi|y) \\ &= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \pi^{\alpha-1}(1-\pi)^{\beta-1} \times {n \choose y} \pi^y (1-\pi)^{n-y} \\ &\propto \pi^{(\alpha+y)-1} (1-\pi)^{(\beta+n-y)-1} \end{align*}
f(\pi|y) = \frac{\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+y) \Gamma(\beta+n-y)} \pi^{(\alpha+y)-1} (1-\pi)^{(\beta+n-y)-1}
\begin{equation*} \begin{aligned} Y|\pi &\sim \text{Bin}(n,\pi) \\ \pi &\sim \text{Beta}(\alpha,\beta) & \end{aligned} \Rightarrow \begin{aligned} && \pi | (Y=y) &\sim \text{Beta}(\alpha+y, \beta+n-y) \\ \end{aligned} \end{equation*}
The prior model, f(\pi), is given by Beta(\alpha,\beta).
The data model, f(Y|\pi), is given by Bin(n,\pi).
The likelihood function, L(\pi|y), is obtained by plugging y into the Binomial pmf.
The posterior model is a Beta distribution with updated parameters \alpha+y and \beta+n-y.
Recall that the Beta-Binomial model is from a conjugate family (i.e., the posterior is from the same model family as the prior).
Now, we will learn about the Gamma-Poisson, another conjugate family.
Suppose we are now interested in modeling the number of spam calls we receive.
We take a guess and say that the most likely value of \lambda is around 5.
Why can’t we use the Beta distribution as our prior distribution?
Why can’t we use the binomial distribution as our data distribution?
Y \in \{0, 1, 2, ...\}
Y is the number of independent events that occur in a fixed amount of time or space.
\lambda > 0 is the rate at which these events occur.
Mathematically,
Y | \lambda \sim \text{Pois}(\lambda),
f(y|\lambda) = \frac{\lambda^y e^{-\lambda}}{y!}, \ \ \ y \in \{0,1, 2, ... \}
\lambda \sim \text{Gamma}(s, r)
f(\lambda) = \frac{r^s}{\Gamma(s)} \lambda^{s-1} e^{-r\lambda}
Let’s now tune our prior.
We are assuming \lambda \approx 5, somewhere between 2 and 7.
We know the mean of the gamma distribution,
E(\lambda) = \frac{s}{r} \approx 5 \to 5r \approx s
We will use the plot_gamma() function to figure out what values of s and r we need.
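For example, here is a minimal sketch comparing a few candidate priors whose mean s/r is 5 (assuming plot_gamma() takes shape and rate arguments, in that order):

```r
library(bayesrules)

# Candidate priors with mean s / r = 5; larger s and r give a more concentrated prior
plot_gamma(shape = 5,  rate = 1)   # mean 5, sd about 2.2
plot_gamma(shape = 10, rate = 2)   # mean 5, sd about 1.6
plot_gamma(shape = 25, rate = 5)   # mean 5, sd 1
```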
Y_i|\lambda \sim \text{Pois}(\lambda)
f(y_i|\lambda) = \frac{\lambda^{y_i} e^{-\lambda}}{y_i!}
\begin{align*} f\left(\overset{\to}{y}|\lambda\right) &= \prod_{i=1}^n f(y_i|\lambda) \\ &= f(y_1|\lambda) \times f(y_2|\lambda) \times ... \times f(y_n|\lambda) \\ &= \frac{\lambda^{y_1}e^{-\lambda}}{y_1!} \times \frac{\lambda^{y_2}e^{-\lambda}}{y_2!} \times ... \times \frac{\lambda^{y_n}e^{-\lambda}}{y_n!} \\ &= \frac{\left( \lambda^{y_1} \lambda^{y_2} \cdots \lambda^{y_n} \right) \left( e^{-\lambda} e^{-\lambda} \cdots e^{-\lambda}\right)}{y_1! y_2! \cdots y_n!} \\ &= \frac{\lambda^{\sum y_i}e^{-n\lambda}}{\prod_{i=1}^n y_i !} \end{align*}
f\left(\overset{\to}{y}|\lambda\right) = \frac{\lambda^{\sum y_i}e^{-n\lambda}}{\prod_{i=1}^n y_i !}
\begin{align*} L\left(\lambda|\overset{\to}{y}\right) &= \frac{\lambda^{\sum y_i}e^{-n\lambda}}{\prod_{i=1}^n y_i !} \\ & \propto \lambda^{\sum y_i} e^{-n\lambda} \end{align*}
Let \lambda > 0 be an unknown rate parameter and (Y_1, Y_2, ... , Y_n) be an independent sample from the Poisson distribution.
The Gamma-Poisson Bayesian model is as follows:
\begin{align*} Y_i | \lambda &\overset{ind}\sim \text{Pois}(\lambda) \\ \lambda &\sim \text{Gamma}(s, r) \\ \lambda | \overset{\to}y &\sim \text{Gamma}\left( s + \sum y_i, r + n \right) \end{align*}
Suppose we use Gamma(10, 2) as the prior for \lambda, the daily rate of calls.
On four separate days in the second week of August (i.e., independent days), we received \overset{\to}y = (6, 2, 2, 1) calls.
We will use the plot_poisson_likelihood() function. Its lambda_upper_bound argument limits the x axis – recall that \lambda \in (0, \infty)! The default value of lambda_upper_bound is 10.
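Here is a minimal sketch of the call, treating the argument names described above as given:

```r
library(bayesrules)

# Likelihood of lambda given the four observed daily call counts
plot_poisson_likelihood(y = c(6, 2, 2, 1), lambda_upper_bound = 10)
```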
We know our prior distribution is Gamma(10, 2), and the observed data \overset{\to}y = (6, 2, 2, 1) give \sum y_i = 11 over n = 4 days (\bar{y} = 2.75).
Thus, the posterior is as follows,
\begin{align*} \lambda | \overset{\to}y &\sim \text{Gamma}\left( s + \sum y_i, r + n \right) \\ &\sim \text{Gamma}\left(10 + 11, 2 + 4 \right) \\ &\sim \text{Gamma}\left(21, 6 \right) \end{align*}
The posterior again follows a Gamma model, now with updated parameters s = 21 and r = 6.
We will use the plot_gamma_poisson() function:
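A minimal sketch of the call; the argument names shape, rate, sum_y, and n are assumptions, mirroring the summarize_gamma_poisson() helper used below:

```r
library(bayesrules)

# Gamma(10, 2) prior with observed sum of counts 11 over n = 4 days
# Overlays the prior, the (scaled) likelihood, and the Gamma(21, 6) posterior
plot_gamma_poisson(shape = 10, rate = 2, sum_y = 11, n = 4)
```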
Your turn! What would be different if we had used one of the other priors?
Recall the other priors we considered. We will use the summarize_gamma_poisson() function to summarize the distribution:
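A minimal sketch of the call for the Gamma(10, 2) prior and the observed counts (same assumed argument names as above):

```r
library(bayesrules)

# Prior and posterior summaries (mean, mode, variance, sd) for the spam-call example
summarize_gamma_poisson(shape = 10, rate = 2, sum_y = 11, n = 4)
```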
\begin{equation*} \begin{aligned} Y_i|\lambda &\sim \text{Pois}(\lambda) \\ \lambda &\sim \text{Gamma}(s, r) & \end{aligned} \Rightarrow \begin{aligned} && \lambda | \overset{\to}y &\sim \text{Gamma}\left(s + \sum y_i, r + n\right) \\ \end{aligned} \end{equation*}
The prior model, f(\lambda), is given by Gamma(s, r).
The data model, f(Y|\lambda), is given by Poi(\lambda).
The posterior model is a Gamma distribution with updated parameters s+\sum y_i and r + n.
As scientists learn more about brain health, the dangers of concussions are gaining greater attention.
We are interested in \mu, the average volume (in cm^3) of a specific part of the brain: the hippocampus.
Wikipedia tells us that among the general population of human adults, each half of the hippocampus has a volume between 3.0 and 3.5 cm^3.
We will take a sample of n=25 participants and update our belief.
f(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left\{ \frac{-(y-\mu)^2}{2\sigma^2} \right\}
Y_i | \mu \sim N(\mu, \sigma^2)
f(\overset{\to}y | \mu) = \prod_{i=1}^n f(y_i | \mu) = \prod_{i=1}^n \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left\{ \frac{-(y_i-\mu)^2}{2\sigma^2} \right\}
L(\mu|\overset{\to}y) \propto \prod_{i=1}^n \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left\{ \frac{-(y_i-\mu)^2}{2\sigma^2} \right\} \propto \exp \left\{ \frac{- \sum_{i=1}^n(y_i-\mu)^2}{2\sigma^2} \right\}
Y_i | \mu \sim N(\mu, \sigma^2)
\mu \sim N(\theta, \tau^2),
f(\mu) = \frac{1}{\sqrt{2 \pi \tau^2}} \exp \left\{ \frac{-(\mu - \theta)^2}{2 \tau^2} \right\}
We can tune the hyperparameters \theta and \tau to reflect our understanding and uncertainty about the average hippocampal volume (\mu) among people with a history of concussions.
Wikipedia showed us that total hippocampal volumes (both halves combined) tend to be between 6 and 7 cm^3 \to \theta = 6.5.
Setting the prior standard deviation to \tau = 0.4, we can check the implied range of plausible values of \mu:
\theta \pm 2 \times \tau
(6.5 \pm 2 \times 0.4) = (5.7, 7.3)
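A quick base-R sketch (not from the original slides) of this plausibility check for the N(6.5, 0.4^2) prior:

```r
# Prior for mu: Normal with mean theta = 6.5 and sd tau = 0.4
theta <- 6.5
tau   <- 0.4

# Roughly 95% of the prior mass lies within theta +/- 2 * tau
theta + c(-2, 2) * tau   # 5.7 and 7.3

# Visual check of the prior density
curve(dnorm(x, mean = theta, sd = tau), from = 5, to = 8,
      xlab = expression(mu), ylab = "density")
```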
Let \mu \in \mathbb{R} be an unknown mean parameter and (Y_1, Y_2, ..., Y_n) be an independent N(\mu, \sigma^2) sample where \sigma is assumed to be known.
The Normal-Normal Bayesian model is as follows:
\begin{align*} Y_i | \mu &\overset{\text{iid}} \sim N(\mu, \sigma^2) \\ \mu &\sim N(\theta, \tau^2) \\ \mu | \overset{\to}y &\sim N\left( \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}, \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} \right) \end{align*}
\mu | \overset{\to}y \sim N\left( \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}, \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} \right)
Notice what happens to the two weights on the prior mean and the sample mean, and to the posterior variance, as the sample size n \to \infty:
\begin{align*} \frac{\sigma^2}{n\tau^2 + \sigma^2} &\to 0 \\ \frac{n\tau^2}{n\tau^2 + \sigma^2} &\to 1 \\ \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} &\to 0 \end{align*}
The posterior mean places less weight on the prior mean and more weight on the sample mean \bar{y}.
The posterior certainty about \mu increases and becomes more in sync with the data.
Let us now apply this to our example.
We have our prior model, \mu \sim N(6.5, 0.4^2).
Let’s look at the football dataset in the bayesrules package.
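A minimal sketch of pulling out these volumes; the group label "fb_concuss" and the volume column name are assumptions about the dataset’s structure:

```r
library(bayesrules)

# Hippocampal volumes for the n = 25 participants with a history of concussions
# (group label and column names are assumed here)
data("football")
concussion_subjects <- subset(football, group == "fb_concuss")

nrow(concussion_subjects)         # 25
mean(concussion_subjects$volume)  # about 5.735
```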
Taking the known standard deviation to be \sigma = 0.5 and plugging in the observed sample mean \bar{y} = 5.735 from the n = 25 participants, the likelihood is
L(\mu|\overset{\to}y) \propto \exp \left\{ \frac{-(5.735 - \mu)^2}{2(0.5^2/25)} \right\}
\mu | \overset{\to}y \sim N\left( \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}, \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} \right)
\begin{align*} \mu | \overset{\to}y &\sim N\left( \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}, \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} \right) \\ &\sim N\left( 6.5 \cdot \frac{0.5^2}{25 \cdot 0.4^2 + 0.5^2} + 5.735 \cdot \frac{25 \cdot 0.4^2}{25 \cdot 0.4^2 + 0.5^2}, \frac{0.4^2 \cdot 0.5^2}{25 \cdot 0.4^2 + 0.5^2} \right) \\ &\sim N(6.5 \cdot 0.0588 + 5.735 \cdot 0.9412, 0.097^2) \\ &\sim N(5.78, 0.097^2) \end{align*}
We will use the summarize_normal_normal() function to summarize the distribution:
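A minimal sketch of the call for our prior and data; the argument names mean, sd, sigma, y_bar, and n are assumptions, mirroring the Gamma-Poisson summary helper above:

```r
library(bayesrules)

# Prior N(6.5, 0.4^2), known sigma = 0.5, observed y_bar = 5.735 from n = 25 participants
summarize_normal_normal(mean = 6.5, sd = 0.4, sigma = 0.5, y_bar = 5.735, n = 25)
```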
\begin{equation*} \begin{aligned} Y_i | \mu &\overset{\text{iid}} \sim N(\mu, \sigma^2) \\ \mu &\sim N(\theta, \tau^2) & \end{aligned} \Rightarrow \begin{aligned} && \mu | \overset{\to}y &\sim N\left( \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}, \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} \right) \\ \end{aligned} \end{equation*}
The prior model, f(\mu), is given by N(\theta,\tau^2).
The data model, f(Y|\mu), is given by N(\mu, \sigma^2).
The posterior model is a Normal distribution with updated mean \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2} and variance \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2}.
Today we have learned about conjugate families.
While we are not forced to analyze our data using conjugate families, our lives are much easier when we can use the known relationships.
Note that while we did not do this with our posteriors in this lecture, we can now move forward with drawing conclusions from the posterior distributions.
STA6349 - Applied Bayesian Analysis - Summer 2025