Normal-Normal Model

Introduction: Normal-Normal Model

Before today, we have learned two conjugate families:
- Beta-Binomial (binary outcomes)
  - Y | \pi \sim \text{Bin}(n, \pi) (data distribution)
  - \pi \sim \text{Beta}(\alpha, \beta) (prior distribution)
  - \pi|y \sim \text{Beta}(\alpha+y, \beta+n-y) (posterior distribution)
- Gamma-Poisson (count outcomes)
  - Y_i | \lambda \overset{ind}\sim \text{Pois}(\lambda) (data distribution)
  - \lambda \sim \text{Gamma}(s, r) (prior distribution)
  - \lambda | \overset{\to}y \sim \text{Gamma}\left( s + \sum y_i, r + n \right) (posterior distribution)
Now, we will learn about another conjugate family, the Normal-Normal, for continuous outcomes.

Example Set Up

As scientists learn more about brain health, the dangers of concussions are gaining greater attention.
We are interested in \mu, the average volume (cm³) of a specific part of the brain: the hippocampus.
Wikipedia tells us that among the general population of human adults, each half of the hippocampus has volume between 3.0 and 3.5 cm³.
- Total hippocampal volume of both sides of the brain is between 6 and 7 cm³.
- Let’s assume that the mean hippocampal volume among people with a history of concussions is also somewhere between 6 and 7 cm³.
We will take a sample of n=25 participants and update our belief.

The Normal Model

Let Y \in \mathbb{R} be a continuous random variable.
- The variability in Y may be represented with a Normal model with mean parameter \mu \in \mathbb{R} and standard deviation parameter \sigma > 0.
The Normal model’s pdf is as follows,

f(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left\{ \frac{-(y-\mu)^2}{2\sigma^2} \right\}
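To connect the formula to software, here is a minimal R sketch that codes the pdf directly and checks it against base R's dnorm(). The helper name normal_pdf() is our own, not from any package, and the values \mu = 6.5 and \sigma = 0.5 are only illustrative.

```r
# normal_pdf() is a hypothetical helper that transcribes the Normal pdf formula
normal_pdf <- function(y, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(y - mu)^2 / (2 * sigma^2))
}

# Compare with base R at a few points; note that dnorm() takes the sd, not the variance
y <- c(5.5, 6.0, 6.5, 7.0)
manual <- normal_pdf(y, mu = 6.5, sigma = 0.5)
builtin <- dnorm(y, mean = 6.5, sd = 0.5)
print(cbind(y, manual, builtin))

# A valid pdf integrates to 1; 13 sd on either side of the mean covers essentially all the mass
total <- integrate(function(t) normal_pdf(t, mu = 6.5, sigma = 0.5), lower = 0, upper = 13)$value
print(total)
```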

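If we vary \mu with \sigma fixed, the density keeps its bell shape and shifts left or right so that its peak sits at y = \mu.
If we vary \sigma with \mu fixed, the density stays centered at \mu but becomes wider and flatter as \sigma grows, and narrower and taller as \sigma shrinks.
[Figures: Normal pdfs for several values of \mu, then for several values of \sigma]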

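How the two parameters move the curve can be checked numerically with base R's dnorm(): changing \mu moves the location of the peak, while changing \sigma changes its height, which equals 1/(\sigma\sqrt{2\pi}). The particular values of \mu and \sigma below are illustrative choices, not taken from the data.

```r
# Evaluate N(mu, sigma^2) densities on a grid of y values
y_grid <- seq(4, 9, by = 0.01)

# Varying mu (sigma fixed at 0.5): the peak moves to y = mu
peak_location <- sapply(c(5.5, 6.5, 7.5), function(m) {
  y_grid[which.max(dnorm(y_grid, mean = m, sd = 0.5))]
})

# Varying sigma (mu fixed at 6.5): the peak height 1 / (sigma * sqrt(2 * pi)) falls as sigma grows
sigmas <- c(0.25, 0.5, 1)
peak_height <- sapply(sigmas, function(s) dnorm(6.5, mean = 6.5, sd = s))

print(peak_location)
print(peak_height)
```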
Our data model is as follows,

Y_i | \mu \overset{\text{iid}}\sim N(\mu, \sigma^2), \quad i = 1, \dots, n

The joint pdf is as follows,

f(\overset{\to}y | \mu) = \prod_{i=1}^n f(y_i | \mu) = \prod_{i=1}^n \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left\{ \frac{-(y_i-\mu)^2}{2\sigma^2} \right\}

Viewed as a function of \mu (with \sigma known), this gives the likelihood

L(\mu|\overset{\to}y) = \prod_{i=1}^n \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left\{ \frac{-(y_i-\mu)^2}{2\sigma^2} \right\} \propto \exp \left\{ \frac{- \sum_{i=1}^n(y_i-\mu)^2}{2\sigma^2} \right\},

where the dropped factor (2\pi\sigma^2)^{-n/2} does not depend on \mu.

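The constant factor \prod_{i=1}^n (2\pi\sigma^2)^{-1/2} = (2\pi\sigma^2)^{-n/2} does not involve \mu, so dropping it leaves the shape of the likelihood unchanged. A sketch on simulated data (25 rnorm() draws, not the study data) shows the full log-likelihood and the log-kernel differing by the same constant at every \mu.

```r
set.seed(1)
sigma <- 0.5
n <- 25
y <- rnorm(n, mean = 6, sd = sigma)  # simulated values, for illustration only

mu_grid <- c(5.5, 6.0, 6.5)

# Full log-likelihood: sum of log Normal densities
log_lik_full <- sapply(mu_grid, function(mu) sum(dnorm(y, mean = mu, sd = sigma, log = TRUE)))

# Log of the kernel exp{-sum((y - mu)^2) / (2 sigma^2)}
log_kernel <- sapply(mu_grid, function(mu) -sum((y - mu)^2) / (2 * sigma^2))

# Same offset, -n/2 * log(2 * pi * sigma^2), for every mu
print(log_lik_full - log_kernel)
```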
Returning to our brain analysis, we will assume that the hippocampal volumes of our n = 25 subjects have a normal distribution with mean \mu and standard deviation \sigma.
- For now we are only interested in \mu, so we treat \sigma as known and set \sigma = 0.5 cm³.
- This choice implies that most people (roughly 95%) have hippocampal volumes within 2\sigma = 1 cm³ of the mean \mu.

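The "most people" in the bullet above can be made precise with base R's pnorm(): under N(\mu, 0.5^2), the share of volumes within 1 cm³ of \mu is about 95%, whatever the value of \mu.

```r
sigma <- 0.5
# P(|Y - mu| <= 2 * sigma) is the same for any mu, so use mu = 0
p_within_2sd <- pnorm(2 * sigma, mean = 0, sd = sigma) - pnorm(-2 * sigma, mean = 0, sd = sigma)
print(p_within_2sd)  # roughly 0.95
```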
Normal Prior

Under the model Y_i | \mu \sim N(\mu, \sigma^2), the mean \mu can take any value in \mathbb{R}.
- A Normal prior for \mu, which is also supported on all of \mathbb{R}, is therefore a reasonable choice.
Thus, we assume that \mu has a normal distribution around some mean, \theta, with standard deviation, \tau.

\mu \sim N(\theta, \tau^2),

meaning that \mu has prior pdf

f(\mu) = \frac{1}{\sqrt{2 \pi \tau^2}} \exp \left\{ \frac{-(\mu - \theta)^2}{2 \tau^2} \right\}

Tuning the Normal Prior

We can tune the hyperparameters \theta and \tau to reflect our understanding and uncertainty about the average hippocampal volume (\mu) among people with a history of concussions.
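Wikipedia showed us that hippocampal volumes tend to be between 6 and 7 cm³, so we center the prior at the midpoint, \theta = 6.5.
Once we choose the standard deviation \tau, we can check the resulting range of plausible values of \mu:

\theta \pm 2 \times \tau

- Follow-up: why 2? For any Normal distribution, about 95% of the probability lies within 2 standard deviations of the mean.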

If we assume \tau=0.4,

(6.5 \pm 2 \times 0.4) = (5.7, 7.3)

Thus, our tuned prior is \mu \sim N(6.5, 0.4^2)

This range incorporates our uncertainty: at (5.7, 7.3), it is wider than the Wikipedia range of 6 to 7 cm³.

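A quick check of the tuned prior in base R: the \theta \pm 2\tau range, the exact central 95% interval from qnorm(), and how much prior probability N(6.5, 0.4^2) places on the Wikipedia range of 6 to 7 cm³ (about 79%, so the prior leaves deliberate room outside that range).

```r
theta <- 6.5  # prior mean
tau <- 0.4    # prior sd

# Rule-of-thumb plausible range: theta +/- 2 tau
rule_of_thumb <- theta + c(-2, 2) * tau
print(rule_of_thumb)

# Exact central 95% interval of N(theta, tau^2); it uses about 1.96 instead of 2
exact_95 <- qnorm(c(0.025, 0.975), mean = theta, sd = tau)
print(exact_95)

# Prior probability that mu falls in the narrower Wikipedia range (6, 7)
p_wiki <- pnorm(7, mean = theta, sd = tau) - pnorm(6, mean = theta, sd = tau)
print(p_wiki)
```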
Normal-Normal Conjugacy

Let \mu \in \mathbb{R} be an unknown mean parameter and (Y_1, Y_2, ..., Y_n) be an independent N(\mu, \sigma^2) sample where \sigma is assumed to be known.
The Normal-Normal Bayesian model is as follows:

\begin{align*} Y_i | \mu &\overset{\text{iid}} \sim N(\mu, \sigma^2) \\ \mu &\sim N(\theta, \tau^2) \\ \mu | \overset{\to}y &\sim N\left( \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}, \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} \right) \end{align*}
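Where the posterior comes from: a derivation sketch that multiplies the prior by the likelihood, uses the identity \sum_{i}(y_i-\mu)^2 = \sum_{i}(y_i-\bar{y})^2 + n(\bar{y}-\mu)^2 to drop factors that do not involve \mu, and then completes the square in \mu.

```latex
\begin{align*}
f(\mu \mid \overset{\to}y) &\propto f(\mu)\, L(\mu \mid \overset{\to}y)
  \propto \exp\left\{ \frac{-(\mu-\theta)^2}{2\tau^2} \right\}
          \exp\left\{ \frac{-n(\bar{y}-\mu)^2}{2\sigma^2} \right\} \\
&\propto \exp\left\{ -\frac{1}{2}\left[ \mu^2 \left( \frac{1}{\tau^2} + \frac{n}{\sigma^2} \right)
          - 2\mu \left( \frac{\theta}{\tau^2} + \frac{n\bar{y}}{\sigma^2} \right) \right] \right\} \\
&\propto \exp\left\{ -\frac{(\mu - m)^2}{2v} \right\},
\qquad
v = \left( \frac{1}{\tau^2} + \frac{n}{\sigma^2} \right)^{-1} = \frac{\tau^2\sigma^2}{n\tau^2 + \sigma^2},
\quad
m = v\left( \frac{\theta}{\tau^2} + \frac{n\bar{y}}{\sigma^2} \right)
  = \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}
\end{align*}
```

The last line is the kernel of N(m, v), which matches the posterior stated above. Writing v^{-1} = 1/\tau^2 + n/\sigma^2 also shows that the posterior precision is the prior precision plus the data precision.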

Normal-Normal Conjugacy

Let’s think about our posterior and some implications,

\mu | \overset{\to}y \sim N\left( \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}, \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} \right)

What happens as n increases?

\begin{align*} \frac{\sigma^2}{n\tau^2 + \sigma^2} &\to 0 \\ \frac{n\tau^2}{n\tau^2 + \sigma^2} &\to 1 \\ \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} &\to 0 \end{align*}

The posterior mean places less weight on the prior mean and more weight on the sample mean \bar{y}.
The posterior variance shrinks toward 0, so our certainty about \mu increases and the posterior concentrates around the data mean \bar{y}.

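The limits can be seen numerically by holding the example's \theta = 6.5, \tau = 0.4, \sigma = 0.5 fixed and letting n grow. This R sketch only evaluates the three expressions from the posterior for a few sample sizes.

```r
theta <- 6.5; tau <- 0.4; sigma <- 0.5  # example prior and data sd
n <- c(1, 5, 25, 100, 1000)

prior_weight <- sigma^2 / (n * tau^2 + sigma^2)   # weight on theta
data_weight <- n * tau^2 / (n * tau^2 + sigma^2)  # weight on y_bar
post_sd <- sqrt(tau^2 * sigma^2 / (n * tau^2 + sigma^2))

print(round(data.frame(n, prior_weight, data_weight, post_sd), 4))
```

At n = 25 this reproduces the 0.0588 / 0.9412 split used later for the concussion data.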
The Normal Posterior Model

Let us now apply this to our example.
We have our prior model, \mu \sim N(6.5, 0.4^2).
Let’s look at the football dataset in the bayesrules package.

library(bayesrules)  # provides the football data
library(dplyr)       # provides %>% and filter()
data(football)
concussion_subjects <- football %>% 
  filter(group == "fb_concuss")

What is the average hippocampal volume?

mean(concussion_subjects$volume)

[1] 5.7346

The Normal Posterior Model

We can also plot the density!

library(ggplot2)  # provides ggplot(), geom_density(), theme_bw()
concussion_subjects %>% ggplot(aes(x = volume)) + geom_density() + theme_bw()

The Normal Posterior Model

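Because \sum_{i=1}^n (y_i-\mu)^2 = \sum_{i=1}^n (y_i-\bar{y})^2 + n(\bar{y}-\mu)^2, and the first term on the right does not involve \mu, the likelihood depends on the data only through \bar{y}. Plugging in the information we have (n = 25, \bar{y} = 5.735, \sigma = 0.5) gives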

L(\mu|\overset{\to}y) \propto \exp \left\{ \frac{-(5.735 - \mu)^2}{2(0.5^2/25)} \right\}
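The reduction to \bar{y} can be checked numerically. This sketch uses simulated volumes (rnorm() draws, not the football data) and compares the log-kernel written with all observations against the version written with \bar{y} alone; the two differ by a constant that does not depend on \mu.

```r
set.seed(2)
sigma <- 0.5
n <- 25
y <- rnorm(n, mean = 5.7, sd = sigma)  # simulated stand-in, not the football volumes
y_bar <- mean(y)

mu_grid <- c(5.5, 5.75, 6.0)

# Log-kernel written with the individual observations
full <- sapply(mu_grid, function(mu) -sum((y - mu)^2) / (2 * sigma^2))

# Log-kernel written with y_bar only, with variance sigma^2 / n
reduced <- -(y_bar - mu_grid)^2 / (2 * sigma^2 / n)

# The gap is the same for every mu: -sum((y - y_bar)^2) / (2 * sigma^2)
print(full - reduced)
```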

The Normal Posterior Model

We are now ready to put together our posterior:
- Data distribution, Y_i | \mu \overset{\text{iid}} \sim N(\mu, \sigma^2)
- Prior distribution, \mu \sim N(\theta, \tau^2)
- Posterior distribution, \mu | \overset{\to}y \sim N\left( \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}, \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} \right)
Given our information (\theta=6.5, \tau=0.4, n=25, \bar{y}=5.735, \sigma=0.5), our posterior is

\begin{align*} \mu | \overset{\to}y &\sim N\left( \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}, \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} \right) \\ &\sim N\left( 6.5 \frac{0.5^2}{25 \cdot 0.4^2 + 0.5^2} + 5.735 \frac{25 \cdot 0.4^2}{25 \cdot 0.4^2 + 0.5^2}, \frac{0.4^2 \cdot 0.5^2}{25 \cdot 0.4^2 + 0.5^2} \right) \\ &\sim N(6.5 \cdot 0.0588 + 5.735 \cdot 0.9412, 0.097^2) \\ &\sim N(5.78, 0.097^2) \end{align*}

Looking at the posterior mean, we can see the weights:
- about 94% (0.9412) on the data mean \bar{y} and about 6% (0.0588) on the prior mean \theta.

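The arithmetic can be reproduced with a few lines of base R. normal_normal_posterior() is a hypothetical helper written for this illustration (it is not part of bayesrules); it simply codes the Normal-Normal posterior formula with \sigma treated as known.

```r
# Hypothetical helper: Normal-Normal posterior for mu when sigma is known
normal_normal_posterior <- function(theta, tau, sigma, y_bar, n) {
  denom <- n * tau^2 + sigma^2
  prior_weight <- sigma^2 / denom   # weight on the prior mean theta
  data_weight <- n * tau^2 / denom  # weight on the sample mean y_bar
  list(
    prior_weight = prior_weight,
    data_weight = data_weight,
    mean = theta * prior_weight + y_bar * data_weight,
    sd = sqrt(tau^2 * sigma^2 / denom)
  )
}

post <- normal_normal_posterior(theta = 6.5, tau = 0.4, sigma = 0.5, y_bar = 5.735, n = 25)
str(post)

# Equivalent precision form: 1 / posterior variance = 1 / tau^2 + n / sigma^2
print(c(1 / post$sd^2, 1 / 0.4^2 + 25 / 0.5^2))
```

Using the unrounded sample mean 5.7346 instead of 5.735 moves the posterior mean by less than 0.001.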
The Normal Posterior Model

[Figure: the prior pdf of \mu plotted with the (scaled) likelihood, then the same plot with the posterior pdf added]

The Normal Posterior Model

We can use the summarize_normal_normal() function from bayesrules to summarize the prior and posterior models,

summarize_normal_normal(mean = 6.5, sd = 0.4, sigma = 0.5, y_bar = 5.735, n = 25)

Wrap Up: Normal-Normal Model

We have built the Normal-Normal model for \mu, an unknown mean.

\begin{equation*} \begin{aligned} Y_i | \mu &\overset{\text{iid}} \sim N(\mu, \sigma^2) \\ \mu &\sim N(\theta, \tau^2) \end{aligned} \quad \Rightarrow \quad \mu | \overset{\to}y \sim N\left( \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}, \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2} \right) \end{equation*}

The prior model, f(\mu), is given by N(\theta,\tau^2).
The data model, f(y_i|\mu), is given by N(\mu, \sigma^2).
The posterior model is a Normal distribution with updated parameters
- mean = \theta \frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y} \frac{n\tau^2}{n\tau^2 + \sigma^2}
- variance = \frac{\tau^2 \sigma^2}{n \tau^2 + \sigma^2}

Wrap Up

This week we have learned two more conjugate families:
- Gamma-Poisson: count outcomes
- Normal-Normal: continuous outcomes
While we are not forced to analyze our data using conjugate families, our lives are much easier when we can use the known relationships.
Now that we know how to specify posterior distributions, we can move on to drawing conclusions from them:
- Probabilities
- Inference

Homework / Practice

From the Bayes Rules! textbook:
- 5.9
- 5.10