Gamma-Poisson and Normal-Normal Models

STA6349: Applied Bayesian Analysis
Spring 2025

Introduction: Gamma-Poisson

  • Recall the Beta-Binomial from our previous lecture,

    • y \sim \text{Bin}(n, \pi) (data distribution)
    • \pi \sim \text{Beta}(\alpha, \beta) (prior distribution)
    • \pi|y \sim \text{Beta}(\alpha+y, \beta+n-y) (posterior distribution)
  • Beta-Binomial is from a conjugate family (i.e., the posterior is from the same model family as the prior).

  • Today, we will learn about other conjugate families, the Gamma-Poisson and the Normal-Normal.

Poisson Data Model

  • Suppose we are now interested in modeling the number of spam calls we receive.
    • This means that we are modeling the rate, \lambda.
  • We take a guess and say that the value of \lambda that is most likely is around 5,
    • … but reasonably ranges between 2 and 7 calls per day.
  • Why can’t we use the Beta distribution as our prior distribution?
    • \lambda is the mean of a count \to \lambda \in \mathbb{R}^+ \to \lambda is not limited to [0, 1] \to broken assumption for Beta distribution.
  • Why can’t we use the binomial distribution as our data distribution?
    • Y_i is a count with no fixed upper bound \to Y_i \in \{0, 1, 2, ...\} \to there is no fixed number of trials n \to broken assumption for the Binomial distribution.

Poisson Data Model

  • We will use the Poisson distribution to model the number of spam calls – Y \in \{0, 1, 2, ...\}.
    • Y is the number of independent events that occur in a fixed amount of time or space.
    • \lambda > 0 is the rate at which these events occur.
  • Mathematically,

Y | \lambda \sim \text{Pois}(\lambda),

  • with pmf,

f(y|\lambda) = \frac{\lambda^y e^{-\lambda}}{y!}, \ \ \ y \in \{0,1, 2, ... \}
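  • A quick numerical check: R’s built-in dpois() evaluates this pmf (the values below are purely illustrative):

lambda <- 5                                # assumed rate of calls per day
y <- 0:10                                  # a few possible counts
dpois(y, lambda)                           # built-in Poisson pmf
lambda^y * exp(-lambda) / factorial(y)     # same values, from the formula above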

Poisson Data Model

  • \lambda defines both the mean and the variance: E(Y) = \text{Var}(Y) = \lambda.
    • The shape of the Poisson pmf depends on \lambda.

Poisson Data Model

  • We will be taking samples from different days.
    • We assume that the daily number of calls may differ from day to day.
    • On each day i,

Y_i|\lambda \sim \text{Pois}(\lambda)

  • This has a unique pmf for each day (i),

f(y_i|\lambda) = \frac{\lambda^{y_i} e^{-\lambda}}{y_i!}

  • But really, we are interested in the joint information in our sample of n observations.
    • The joint pmf gives us this information.

Poisson Data Model

  • The joint pmf for the Poisson,

\begin{align*} f\left(\overset{\to}{y}|\lambda\right) &= \prod_{i=1}^n f(y_i|\lambda) \\ &= f(y_1|\lambda) \times f(y_2|\lambda) \times ... \times f(y_n|\lambda) \\ &= \frac{\lambda^{y_1}e^{-\lambda}}{y_1!} \times \frac{\lambda^{y_2}e^{-\lambda}}{y_2!} \times ... \times \frac{\lambda^{y_n}e^{-\lambda}}{y_n!} \\ &= \frac{\left( \lambda^{y_1} \lambda^{y_2} \cdots \lambda^{y_n} \right) \left( e^{-\lambda} e^{-\lambda} \cdots e^{-\lambda}\right)}{y_1! y_2! \cdots y_n!} \\ &= \frac{\lambda^{\sum y_i}e^{-n\lambda}}{\prod_{i=1}^n y_i !} \end{align*}

Poisson Data Model

  • If the joint pmf for the Poisson is

f\left(\overset{\to}{y}|\lambda\right) = \frac{\lambda^{\sum y_i}e^{-n\lambda}}{\prod_{i=1}^n y_i !}

  • then the likelihood function for \lambda > 0 is

\begin{align*} L\left(\lambda|\overset{\to}{y}\right) &= \frac{\lambda^{\sum y_i}e^{-n\lambda}}{\prod_{i=1}^n y_i !} \\ & \propto \lambda^{\sum y_i} e^{-n\lambda} \end{align*}

  • Please see page 102 in the textbook for full derivations.

Gamma Prior

  • If \lambda is a continuous random variable that can take on any positive value (\lambda > 0), then the variability may be modeled with the Gamma distribution with
    • shape hyperparameter s>0
    • rate hyperparameter r>0.
  • Thus,

\lambda \sim \text{Gamma}(s, r)

  • and the Gamma pdf is given by

f(\lambda) = \frac{r^s}{\Gamma(s)} \lambda^{s-1} e^{-r\lambda}
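  • In R, dgamma() uses this same shape/rate parameterization; a brief sketch with illustrative hyperparameters:

s <- 10; r <- 2                           # illustrative shape and rate
lambda <- seq(0.01, 10, by = 0.01)        # grid of positive lambda values
dgamma(lambda[1:5], shape = s, rate = r)  # Gamma(s, r) pdf at the first few grid points
s / r       # prior mean, E(lambda) = s/r = 5
s / r^2     # prior variance, Var(lambda) = s/r^2 = 2.5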

Gamma Prior

  • …then the variability may be modeled with the Gamma distribution with shape s>0 and rate r>0. [Figure: Gamma pdfs across several shape and rate values.]

Gamma Prior

  • Let’s now tune our prior.

  • We are assuming \lambda \approx 5, somewhere between 2 and 7.

  • We know the mean of the gamma distribution,

E(\lambda) = \frac{s}{r} \approx 5 \to 5r \approx s

  • Your turn! Use the plot_gamma() function to figure out what value of s and r we need.
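  • One possible starting point, since E(\lambda) = s/r \approx 5 (these hyperparameter values are a guess to check, not the answer):

plot_gamma(shape = 10, rate = 2)                  # prior mean 10/2 = 5
qgamma(c(0.025, 0.975), shape = 10, rate = 2)     # middle 95% of the prior: roughly (2.4, 8.5)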

Gamma Prior

  • Looking at different values:
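  • For instance, these candidates all have prior mean 5 but different spreads (the specific pairs are illustrative):

plot_gamma(shape = 5, rate = 1)     # mean 5, sd = sqrt(5)/1  ~ 2.24
plot_gamma(shape = 10, rate = 2)    # mean 5, sd = sqrt(10)/2 ~ 1.58
plot_gamma(shape = 15, rate = 3)    # mean 5, sd = sqrt(15)/3 ~ 1.29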

Gamma-Poisson Conjugacy

  • Let \lambda > 0 be an unknown rate parameter and (Y_1, Y_2, ... , Y_n) be an independent sample from the Poisson distribution.

  • The Gamma-Poisson Bayesian model is as follows:

\begin{align*} Y_i | \lambda &\overset{ind}\sim \text{Pois}(\lambda) \\ \lambda &\sim \text{Gamma}(s, r) \\ \lambda | \overset{\to}y &\sim \text{Gamma}\left( s + \sum y_i, r + n \right) \end{align*}

  • The proof can be seen in section 5.2.4 of the textbook.
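  • A sketch of the kernel argument: multiplying the Gamma prior kernel by the Poisson likelihood,

\begin{align*} f(\lambda | \overset{\to}y) &\propto f(\lambda) L(\lambda|\overset{\to}y) \propto \lambda^{s-1} e^{-r\lambda} \cdot \lambda^{\sum y_i} e^{-n\lambda} = \lambda^{\left(s + \sum y_i\right) - 1} e^{-(r + n)\lambda}, \end{align*}

  • which is exactly the kernel of a \text{Gamma}\left(s + \sum y_i, r + n\right) density.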

Gamma-Poisson Conjugacy

  • Suppose we use Gamma(10, 2) as the prior for \lambda, the daily rate of calls.

  • On four separate days in the second week of August (i.e., independent days), we received \overset{\to}y = (6, 2, 2, 1) calls.

  • Your turn! Use the plot_poisson_likelihood() function:

plot_poisson_likelihood(y = c(6, 2, 2, 1), lambda_upper_bound = 10)
  • Notes:
    • lambda_upper_bound limits the x axis – recall that \lambda \in (0, \infty)!
    • lambda_upper_bound’s default value is 10.

Gamma-Poisson Conjugacy

plot_poisson_likelihood(y = c(6, 2, 2, 1), lambda_upper_bound = 10) + theme_bw()

Gamma-Poisson Conjugacy

  • We can see that the likelihood peaks around 2.75, the sample mean.

  • We can verify this directly:

mean(c(6, 2, 2, 1))
[1] 2.75
  • Our prior distribution is Gamma(10, 2), and our data yield \sum y_i = 11 over n = 4 days.

  • Thus, the posterior is as follows,

\begin{align*} \lambda | \overset{\to}y &\sim \text{Gamma}\left( s + \sum y_i, r + n \right) \\ &\sim \text{Gamma}\left(10 + 11, 2 + 4 \right) \\ &\sim \text{Gamma}\left(21, 6 \right) \end{align*}
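  • We can check this numerically with summarize_gamma_poisson() from bayesrules (the posterior mean should come out to 21/6 = 3.5):

summarize_gamma_poisson(shape = 10, rate = 2, sum_y = 11, n = 4)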

Gamma-Poisson Conjugacy

  • Your turn! Use the plot_gamma_poisson() function:
plot_gamma_poisson(shape = 10, rate = 2, sum_y = 11, n = 4) + theme_bw()

Gamma-Poisson Conjugacy

  • Your turn! What would be different had we used one of the other priors?

Gamma-Poisson Conjugacy

  • Your turn! What would be different had we used Gamma(15, 3) as our prior?
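  • A sketch of one comparison: Gamma(15, 3) has the same prior mean (5) but is more concentrated; the posterior becomes Gamma(15 + 11, 3 + 4) = Gamma(26, 7), with mean 26/7 \approx 3.71 instead of 21/6 = 3.5.

plot_gamma_poisson(shape = 15, rate = 3, sum_y = 11, n = 4) + theme_bw()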

Introduction: Normal-Normal

  • So far, we have learned two conjugate families:
    • Beta-Binomial (binary outcomes)
      • y \sim \text{Bin}(n, \pi) (data distribution)
      • \pi \sim \text{Beta}(\alpha, \beta) (prior distribution)
      • \pi|y \sim \text{Beta}(\alpha+y, \beta+n-y) (posterior distribution)
    • Gamma-Poisson (count outcomes)
      • Y_i | \lambda \overset{ind}\sim \text{Pois}(\lambda) (data distribution)
      • \lambda \sim \text{Gamma}(s, r) (prior distribution)
      • \lambda | \overset{\to}y \sim \text{Gamma}\left( s + \sum y_i, r + n \right) (posterior distribution)
  • Now, we will learn about another conjugate family, the Normal-Normal, for continuous outcomes.

Introduction

  • As scientists learn more about brain health, the dangers of concussions are gaining greater attention.

  • We are interested in \mu, the average volume (cm³) of a specific part of the brain: the hippocampus.

  • Wikipedia tells us that among the general population of human adults, each half of the hippocampus has volume between 3.0 and 3.5 cm³.

    • Total hippocampal volume of both sides of the brain is between 6 and 7 cm³.
    • Let’s assume that the mean hippocampal volume among people with a history of concussions is also somewhere between 6 and 7 cm³.
  • We will take a sample of n=25 participants and update our belief.

The Normal Model

  • Let Y \in \mathbb{R} be a continuous random variable.
    • The variability in Y may be represented with a Normal model with mean parameter \mu \in \mathbb{R} and standard deviation parameter \sigma > 0.
  • The Normal model’s pdf is as follows,

f(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left\{ \frac{-(y-\mu)^2}{2\sigma^2} \right\}

The Normal Model

  • Use the plot_normal() function to plot the following (a sketch of possible calls follows this list):
    • Vary \mu:
      • N(-1, 1)
      • N(0, 1)
      • N(1, 1)
    • Vary \sigma:
      • N(0, 1)
      • N(0, 2)
      • N(0, 3)
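  • A minimal sketch of those calls, assuming plot_normal(mean, sd) from the bayesrules package:

plot_normal(mean = -1, sd = 1)    # vary mu: N(-1, 1)
plot_normal(mean = 0, sd = 1)     #          N(0, 1)
plot_normal(mean = 1, sd = 1)     #          N(1, 1)
plot_normal(mean = 0, sd = 2)     # vary sigma: N(0, 2)
plot_normal(mean = 0, sd = 3)     #             N(0, 3)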

The Normal Model

  • If we vary \mu,

The Normal Model

  • If we vary \sigma,

The Normal Model

  • Our data model is as follows,

Y_i | \mu \sim N(\mu, \sigma^2)

  • The joint pdf is as follows,

f(\overset{\to}y | \mu) = \prod_{i=1}^n f(y_i | \mu) = \prod_{i=1}^n \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left\{ \frac{-(y_i-\mu)^2}{2\sigma^2} \right\}

  • Meaning the likelihood is as follows,

L(\mu|\overset{\to}y) \propto \prod_{i=1}^n \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left\{ \frac{-(y_i-\mu)^2}{2\sigma^2} \right\} \propto \exp \left\{ \frac{- \sum_{i=1}^n(y_i-\mu)^2}{2\sigma^2} \right\}

The Normal Model

  • Our data model is as follows,

Y_i | \mu \sim N(\mu, \sigma^2)

  • Returning to our brain analysis, we will assume that the hippocampal volumes of our n = 25 subjects have a normal distribution with mean \mu and standard deviation \sigma.
    • Right now, we are only interested in \mu, so we assume \sigma = 0.5 cm³.
    • This choice suggests that most people’s hippocampal volumes fall within 2 \sigma = 1 cm³ of the mean \mu.

Normal Prior

  • We know that with Y_i | \mu \sim N(\mu, \sigma^2), \mu \in \mathbb{R}.
    • We think a normal prior for \mu is reasonable.
  • Thus, we assume that \mu has a normal distribution around some mean, \theta, with standard deviation, \tau.

\mu \sim N(\theta, \tau^2),

  • meaning that \mu has prior pdf

f(\mu) = \frac{1}{\sqrt{2 \pi \tau^2}} \exp \left\{ \frac{-(\mu - \theta)^2}{2 \tau^2} \right\}

Normal Prior

  • We can tune the hyperparameters \theta and \tau to reflect our understanding and uncertainty about the average hippocampal volume (\mu) among people with a history of concussions.

  • Wikipedia showed us that hippocampal volumes tend to be between 6 and 7 cm³ \to \theta=6.5.

  • Once we set the standard deviation \tau, we can check the plausible range of values of \mu:

    • Follow up: why 2? Because roughly 95% of a normal distribution falls within 2 standard deviations of its mean.

\theta \pm 2 \times \tau

  • If we assume \tau=0.4,

(6.5 \pm 2 \times 0.4) = (5.7, 7.3)

Normal Prior

  • Thus, our tuned prior is \mu \sim N(6.5, 0.4^2)

  • This range incorporates our uncertainty: it is wider than the Wikipedia range.

Normal-Normal Conjugacy

  • Let \mu \in \mathbb{R} be an unknown mean parameter and (Y_1, Y_2, ..., Y_n) be an independent N(\mu, \sigma^2) sample where \sigma is assumed to be known.

  • The Normal-Normal Bayesian model is as follows:

\begin{align*} Y_i | \mu &\overset{\text{iid}} \sim N(\mu, \sigma^2) \\ \mu &\sim N(\theta, \tau^2) \\ \mu | \overset{\to}y &\sim N\left( \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}, \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} \right) \end{align*}

Normal-Normal Conjugacy

  • Let’s think about our posterior and some implications,

\mu | \overset{\to}y \sim N\left( \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}, \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} \right)

  • What happens as n increases?

Normal-Normal Conjugacy

  • Let’s think about our posterior and some implications,

\mu | \overset{\to}y \sim N\left( \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}, \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} \right)

\begin{align*} \frac{\sigma^2}{n\tau^2 + \sigma^2} &\to 0 \\ \frac{n\tau^2}{n\tau^2 + \sigma^2} &\to 1 \\ \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} &\to 0 \end{align*}

  • As n increases, the posterior mean places less weight on the prior mean \theta and more weight on the sample mean \bar{y}.

  • The posterior variance shrinks toward 0: our certainty about \mu increases, and the posterior increasingly reflects the data.

Example

  • Let us now apply this to our example.

  • We have our prior model, \mu \sim N(6.5, 0.4^2).

  • Let’s look at the football dataset in the bayesrules package.

library(bayesrules)   # football data, plotting/summary helpers
library(tidyverse)    # %>%, filter(), ggplot()
data(football)
concussion_subjects <- football %>% 
  filter(group == "fb_concuss")
  • What is the average hippocampal volume?
mean(concussion_subjects$volume)
[1] 5.7346

Example

  • We can also plot the density!
concussion_subjects %>% ggplot(aes(x = volume)) + geom_density() + theme_bw()

Example

  • Now, we can plug in the information we have (n = 25, \bar{y} = 5.735, \sigma = 0.5) into our likelihood,

L(\mu|\overset{\to}y) \propto \exp \left\{ \frac{-(5.735 - \mu)^2}{2(0.5^2/25)} \right\}
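  • The reduction to a function of \bar{y} alone uses the decomposition \sum_{i=1}^n (y_i - \mu)^2 = \sum_{i=1}^n (y_i - \bar{y})^2 + n(\bar{y} - \mu)^2; the first term does not involve \mu, so it is absorbed into the proportionality constant:

L(\mu|\overset{\to}y) \propto \exp \left\{ \frac{-n(\bar{y}-\mu)^2}{2\sigma^2} \right\} = \exp \left\{ \frac{-(\bar{y}-\mu)^2}{2\sigma^2/n} \right\}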

Example

  • We are now ready to put together our posterior:
    • Data distribution, Y_i | \mu \overset{\text{iid}} \sim N(\mu, \sigma^2)
    • Prior distribution, \mu \sim N(\theta, \tau^2)
    • Posterior distribution, \mu | \overset{\to}y \sim N\left( \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}, \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} \right)
  • Given our information (\theta=6.5, \tau=0.4, n=25, \bar{y}=5.735, \sigma=0.5), our posterior is

\begin{align*} \mu | \overset{\to}y &\sim N\left( \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}, \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} \right) \\ &\sim N\left( 6.5 \frac{0.5^2}{25 \cdot 0.4^2 + 0.5^2} + 5.735 \frac{25 \cdot 0.4^2}{25 \cdot 0.4^2 + 0.5^2}, \frac{0.4^2 \cdot 0.5^2}{25 \cdot 0.4^2 + 0.5^2} \right) \\ &\sim N(6.5 \cdot 0.0588 + 5.735 \cdot 0.9412, 0.097^2) \\ &\sim N(5.78, 0.097^2) \end{align*}
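  • A quick base-R check of this arithmetic:

theta <- 6.5; tau2 <- 0.4^2                 # prior mean and variance
sigma2 <- 0.5^2; n <- 25; ybar <- 5.735     # data information
w <- n * tau2 / (n * tau2 + sigma2)         # weight on the sample mean, ~0.9412
theta * (1 - w) + ybar * w                  # posterior mean, ~5.78
sqrt(tau2 * sigma2 / (n * tau2 + sigma2))   # posterior sd, ~0.097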

Example

  • Looking at the posterior, we can see the weights

\begin{align*} \mu | \overset{\to}y &\sim N\left( 6.5 \frac{0.5^2}{25 \cdot 0.4^2 + 0.5^2} + 5.735 \frac{25 \cdot 0.4^2}{25 \cdot 0.4^2 + 0.5^2}, \frac{0.4^2 \cdot 0.5^2}{25 \cdot 0.4^2 + 0.5^2} \right) \\ &\sim N(6.5 \cdot 0.0588 + 5.735 \cdot 0.9412, 0.097^2) \end{align*}

  • About 94% of the weight falls on the data mean \bar{y} and about 6% on the prior mean \theta.

Example

  • We can plot the distribution,
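  • A sketch using bayesrules’ plot_normal_normal(), whose arguments mirror summarize_normal_normal() on the next slide:

plot_normal_normal(mean = 6.5, sd = 0.4, sigma = 0.5, y_bar = 5.735, n = 25) + theme_bw()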

Example

  • We can summarize the distribution,
summarize_normal_normal(mean = 6.5, sd = 0.4, sigma = 0.5, y_bar = 5.735, n = 25) 
      model mean mode         var         sd
1     prior 6.50 6.50 0.160000000 0.40000000
2 posterior 5.78 5.78 0.009411765 0.09701425

Homework

  • 5.3
  • 5.5
  • 5.6
  • 5.9
  • 5.10