Regression in the Bayesian Framework

July 22, 2025
Tuesday

Introduction

  • Until today, we were effectively performing one-sample tests of means, proportions, and counts.

  • Now, we will focus on incorporating just a single predictor into our analysis.

    • Overall question: what is the relationship between Y (outcome) and X (predictor)?
  • We now switch to thinking about analysis in terms of regression.

Working Example

  • Capital Bikeshare is a bike sharing service in the Washington, D.C. area. To best serve its registered members, the company must understand the demand for its service. We will analyze the number of rides taken on a random sample of n days, (Y_1, Y_2, ..., Y_n).

  • Because Y_i is a count variable, you might assume that ridership calls for the Poisson distribution. However, past bike riding seasons have exhibited bell-shaped daily ridership with variability that far exceeds the typical ridership.

    • (i.e., the Poisson assumption of \mu = \sigma does not hold here)
  • We will instead assume that the number of rides varies normally around some typical ridership, \mu, with standard deviation, \sigma.

Y_i|\mu,\sigma \overset{\text{ind}}{\sim} N(\mu, \sigma^2)

Working Example

  • In our example, Y is the number of rides and X is the temperature.

  • Our specific goal will be to model the relationship between ridership and temperature:

    • Does ridership tend to increase on warmer days?
    • If so, by how much?
    • How strong is this relationship?
  • It is reasonable to assume a positive relationship between temperature and number of rides.

    • As it warms up outside, folks are more likely to pursue outdoor activities, including biking.
  • For learning purposes, let’s focus on the model with a single predictor (y ~ x in R).

Building the Regression Model

  • Suppose we have n data pairs,

\{ (Y_1, X_1), (Y_2, X_2), ..., (Y_n, X_n) \}

  • where Y_i is the number of rides and X_i is the high temperature (°F) on day i.

  • Assuming that the relationship is linear, we can model

\mu_i = \beta_0 + \beta_1 X_i,

  • where \beta_0 and \beta_1 are model coefficients.

Building the Regression Model

  • What do we mean by “model coefficients?”

\mu_i = \beta_0 + \beta_1 X_i,

  • \beta_0 is the baseline for where our model crosses the y-axis, i.e., when X_i=0.
    • Is this meaningful when we are talking about the average ridership on a 0°F day in DC?
  • \beta_1 is the slope, or average change, in the outcome (Y) for a one unit increase in the predictor (X).
    • Interpretation: for a [1 unit of predictor] increase in [the predictor], [the outcome] [increases or decreases] by [abs(\beta_1)].
    • In our example, suppose \beta_1=4.5. For a 1°F increase in temperature, ridership increases by 4.5 rides.

Building the Regression Model

  • We are now interested in the model

Y_i|\beta_0,\beta_1,\sigma\overset{\text{ind}}{\sim} N(\mu_i, \sigma^2) \text{ with } \mu_i = \beta_0 + \beta_1 X_i

  • Note that \mu_i is the local mean (for a specific value of X).
    • In our data, \mu_i is the mean ridership for day i.
  • The global mean (regardless of the value of X) is given by \mu.
    • In our data, \mu is the mean ridership, regardless of day.
  • Under this model, \sigma is now measuring the variability from the local mean.

Building the Regression Model

  • We know the assumptions for linear regression in the frequentist framework.

  • In Bayesian Normal regression,

    • The observations are independent.
    • The mean of Y can be written as a linear function of X, \mu = \beta_0 + \beta_1 X.
    • For any value of X, Y varies normally around \mu with constant standard deviation \sigma.

Building the Regression Model

  • First, we will assume that our prior models of \beta_0, \beta_1, and \sigma are independent.

  • It is common to use a normal prior for \beta_0 and \beta_1.

\begin{align*} \beta_0 &\sim N(m_0, s_0^2) \\ \beta_1 &\sim N(m_1, s_1^2) \end{align*}

  • Then, a common default is an exponential prior for \sigma; both the exponential distribution and \sigma are restricted to positive values.

\sigma \sim \text{Exp}(l)

  • Note that E[\sigma] = 1/l and \text{sd}[\sigma] = 1/l.
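A quick simulation check of these two facts, using base R (the rate l = 0.0008 anticipates the prior tuned below):

```r
# Check that an Exponential(l) prior has mean and sd both equal to 1/l
set.seed(84735)
l <- 0.0008
draws <- rexp(1e6, rate = l)
mean(draws)  # close to 1/l = 1250
sd(draws)    # also close to 1250
```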

Building the Regression Model

  • Thus, our regression model has the following formulation:

\begin{align*} &\text{data}: & Y_i|\beta_0, \beta_1, \sigma & \overset{\text{ind}}{\sim} N(\mu_i, \sigma^2) \text{ with } \mu_i = \beta_0 + \beta_1 X_i \\ &\text{priors:} &\beta_0 & \sim N(m_0, s_0^2) \\ & & \beta_1 &\sim N(m_1, s_1^2) \\ & & & \sigma \sim \text{Exp}(l) \end{align*}

Tuning the Prior Models

  • Based on the past bikeshare analyses, we have the following centered prior understandings. What should our priors be?
  1. On an average temperature day for DC (65°F–70°F), there are typically around 5000 riders, but this could vary between 3000 and 7000 riders.

  2. For every one degree increase in temperature, ridership typically increases by 100 rides, but this could vary between 20 and 180 rides.

  3. At any given temperature, the daily ridership will tend to vary with a moderate standard deviation of 1250 rides.

Tuning the Prior Models

  1. On an average temperature day for DC (65°F–70°F), there are typically around 5000 riders, but this could vary between 3000 and 7000 riders.

\beta_{0\text{c}} \sim N(5000, 1000^2)

  2. For every one degree increase in temperature, ridership typically increases by 100 rides, but this could vary between 20 and 180 rides.

\beta_{1\text{c}} \sim N(100, 40^2)

  3. At any given temperature, the daily ridership will tend to vary with a moderate standard deviation of 1250 rides.
    • Recall, E[\sigma] = 1/l = 1250, so l = 1/1250 = 0.0008.

\sigma \sim \text{Exp}(0.0008)
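The standard deviations above follow from reading each stated range as roughly mean ± 2 standard deviations; a sketch of the arithmetic:

```r
# "Between 3000 and 7000 riders" read as 5000 +/- 2 sd  =>  sd = 1000
b0_sd <- (7000 - 3000) / 4
# "Between 20 and 180 rides" read as 100 +/- 2 sd  =>  sd = 40
b1_sd <- (180 - 20) / 4
# Exponential rate from the stated typical deviation of 1250 rides
l <- 1 / 1250
c(b0_sd, b1_sd, l)
```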

Tuning the Prior Models

\beta_{0\text{c}} \sim N(5000, 1000^2) \ \ \ \beta_{1\text{c}} \sim N(100, 40^2) \ \ \ \sigma \sim \text{Exp}(0.0008)

Simulating the Posterior

  • Let’s now update what we know using the bikes data in the bayesrules package.

  • Looking at the basic relationship between the number of rides and the temperature as it feels outside:

Simulating the Posterior

  • Note: I am skipping the derivation / the true math behind how to find the posterior in this situation. (It involves multiple integrals!)

  • We will use the stan_glm() function from rstanarm – it contains pre-defined Bayesian regression models.

    • stan_glm() also applies to the wider family of generalized linear models (e.g., logistic and Poisson/negative binomial regression).
library(rstanarm)
bike_model <- stan_glm(rides ~ temp_feel, # data model
                       data = bikes, # dataset
                       family = gaussian, # distribution to apply
                       prior_intercept = normal(5000, 1000), # b0_c
                       prior = normal(100, 40), # b1_c
                       prior_aux = exponential(0.0008), # sigma
                       chains = 4, # 4 chains
                       iter = 5000*2, # 10000 iterations - throw out first 5000
                       seed = 84735) # starting place in RNG

Simulating the Posterior

bike_model <- stan_glm(rides ~ temp_feel, # data model
                       data = bikes, # dataset
                       family = gaussian, # distribution to apply
                       prior_intercept = normal(5000, 1000), # b0_c
                       prior = normal(100, 40), # b1_c
                       prior_aux = exponential(0.0008), # sigma
                       chains = 4, # 4 chains
                       iter = 5000*2, # 10000 iterations - throw out first 5000
                       seed = 84735) # starting place in RNG
  • stan_glm() contains three types of information:
    • Data information: The first three arguments specify the structure of our data.
      • We want to model ridership by temperature (rides ~ temp_feel) using data = bikes and assuming a Normal data model, aka family = gaussian.

Simulating the Posterior

bike_model <- stan_glm(rides ~ temp_feel, # data model
                       data = bikes, # dataset
                       family = gaussian, # distribution to apply
                       prior_intercept = normal(5000, 1000), # b0_c
                       prior = normal(100, 40), # b1_c
                       prior_aux = exponential(0.0008), # sigma
                       chains = 4, # 4 chains
                       iter = 5000*2, # 10000 iterations - throw out first 5000
                       seed = 84735) # starting place in RNG
  • stan_glm() contains three types of information:
    • Prior information: The prior_intercept, prior, and prior_aux arguments give the priors for \beta_{0\text{c}}, \beta_{1\text{c}}, \sigma.

Simulating the Posterior

bike_model <- stan_glm(rides ~ temp_feel, # data model
                       data = bikes, # dataset
                       family = gaussian, # distribution to apply
                       prior_intercept = normal(5000, 1000), # b0_c
                       prior = normal(100, 40), # b1_c
                       prior_aux = exponential(0.0008), # sigma
                       chains = 4, # 4 chains
                       iter = 5000*2, # 10000 iterations - throw out first 5000
                       seed = 84735) # starting place in RNG
  • stan_glm() contains three types of information:
    • Markov chain information: The chains, iter, and seed arguments specify the number of Markov chains to run, the length or number of iterations for each chain, and the starting place of the RNG.

Simulating the Posterior

  • Wait, how does this work when we are looking at three model parameters?

  • We will have three vectors – one for each model parameter.

\begin{align*} <\beta_0^{(1)}, & \ \beta_0^{(2)}, ..., \beta_0^{(5000)}> \\ <\beta_1^{(1)}, & \ \beta_1^{(2)}, ..., \beta_1^{(5000)} > \\ <\sigma^{(1)}, & \ \sigma^{(2)}, ..., \sigma^{(5000)} > \\ \end{align*}

Simulation Diagnostics

  • Run the simulation code (from a previous slide).

  • Then, run diagnostics:

neff_ratio(bike_model)
rhat(bike_model)
mcmc_trace(bike_model, size = 0.1)
mcmc_dens_overlay(bike_model)

Simulation Diagnostics

  • Diagnostics:
neff_ratio(bike_model)
(Intercept)   temp_feel       sigma 
    1.03285     1.03505     0.96585 
rhat(bike_model)
(Intercept)   temp_feel       sigma 
  0.9998873   0.9999032   0.9999642 
  • Quick diagnostics indicate that the resulting chains are trustworthy.

  • The effective sample size ratios are slightly above 1 and the R-hat values are very close to 1.
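For intuition, a crude split-chain R-hat can be computed by hand on simulated, well-mixed chains; this is a simplified version of the statistic that rhat() reports:

```r
# Simplified R-hat: compare between-chain and within-chain variance
set.seed(84735)
n <- 5000
chains <- matrix(rnorm(4 * n), ncol = 4)   # 4 simulated, well-mixed chains
W <- mean(apply(chains, 2, var))           # within-chain variance
B <- n * var(colMeans(chains))             # between-chain variance
rhat <- sqrt(((n - 1) / n * W + B / n) / W)
rhat  # approximately 1 when the chains agree
```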

Simulation Diagnostics

  • Diagnostics:

  • We can see that the chains are stable, mixing quickly, and behaving much like an independent sample.

Simulation Diagnostics

  • Diagnostics:

  • The density plot lets us visualize and examine the posterior models for each of our regression parameters, \beta_0, \beta_1, \sigma.

Interpreting the Posterior

  • Okay… what does this mean, though?
library(broom.mixed) # provides the tidy() method for stan_glm() fits
tidy(bike_model, effects = c("fixed", "aux"), conf.int = TRUE, conf.level = 0.80)
  • Thus, the posterior median relationship is

y = -2194.24 + 82.16x

  • For a 1 degree increase in temperature, we expect ridership to increase by about 82 rides, with 80% credible interval (75.7, 88.7).
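For instance, the posterior median line predicts the mean ridership at any given temperature (plain arithmetic; 75°F is an arbitrary illustrative value):

```r
# Predicted mean ridership at 75 degrees F from the posterior median line
b0 <- -2194.24
b1 <- 82.16
b0 + b1 * 75  # about 3968 rides
```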

Interpreting the Posterior

  • We can look at alternatives by drawing from the simulated data in bike_model.
    • The add_fitted_draws() function is from the tidybayes package.
bikes %>%
  add_fitted_draws(bike_model, n = 50) %>%
  ggplot(aes(x = temp_feel, y = rides)) +
    geom_line(aes(y = .value, group = .draw), alpha = 0.15) + 
    geom_point(data = bikes, size = 0.05) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()

Interpreting the Posterior

  • We can look at alternatives by drawing from the simulated data in bike_model.

  • We see that the plausible models are not super variable.
    • This means we’re more confident about the relationship we’re observing.

Interpreting the Posterior

  • How does this compare against the frequentist version of regression?
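One way to compare is to fit ordinary least squares with lm(); since the bikes data may not be loaded here, this sketch uses simulated ridership data with an assumed true slope of 80:

```r
# Frequentist analogue: OLS on simulated ridership data (illustrative only)
set.seed(84735)
temp  <- runif(500, 40, 95)
rides <- rnorm(500, mean = -2000 + 80 * temp, sd = 1250)
freq_model <- lm(rides ~ temp)
coef(freq_model)  # point estimates play the role of the posterior medians
```

With the weakly informative priors used here, the OLS estimates typically land close to the Bayesian posterior medians.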

Working Example

  • We will examine Australian weather from the weather_WU data in the bayesrules package.
    • This data contains 100 days of weather data for each of two Australian cities: Uluru and Wollongong.
data(weather_WU)
head(weather_WU)

Working Example

  • Let’s keep only the variables on afternoon temperatures (temp3pm) and a subset of possible predictors that we’d have access to in the morning:
weather_WU <- weather_WU %>% 
  select(location, windspeed9am, humidity9am, pressure9am, temp9am, temp3pm)
head(weather_WU)

Working Example

  • We begin our analysis with the familiar: a simple Normal regression model of temp3pm with one quantitative predictor, the morning temperature temp9am, both measured in degrees Celsius.
ggplot(weather_WU, aes(x = temp9am, y = temp3pm)) +
  geom_point(size = 0.2) + 
  theme_bw()

Working Example

  • Let’s model the 3 pm temperature as a function of the 9 am temperature on a given day i.
    • Outcome: Y_i = 3 pm temp
    • Predictor: X_{i1} = 9 am temp
  • Then, we can model it using the Bayesian normal regression model,

\begin{align*} Y_i | \beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N(\mu_i, \sigma^2), \text{ with } \mu_i = \beta_0 + \beta_1 X_{i1} \\ \beta_{0c} &\sim N(25, 5^2) \\ \beta_1 &\sim N(0,3.1^2) \\ \sigma & \sim \text{Exp}(0.13) \end{align*}

  • Note that we are using the centered intercept, as 0°C mornings are rare in Australia.

Working Example

  • Simulating this model,
weather_model_1 <- stan_glm(
  temp3pm ~ temp9am, 
  data = weather_WU, family = gaussian,
  prior_intercept = normal(25, 5),
  prior = normal(0, 2.5, autoscale = TRUE), 
  prior_aux = exponential(1, autoscale = TRUE),
  chains = 4, iter = 5000*2, seed = 84735)

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 1.2e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.077 seconds (Warm-up)
Chain 1:                0.123 seconds (Sampling)
Chain 1:                0.2 seconds (Total)
Chain 1: 


Working Example

  • Note that we asked stan_glm() to autoscale our priors. What did it change them to?
prior_summary(weather_model_1)
Priors for model 'weather_model_1' 
------
Intercept (after predictors centered)
 ~ normal(location = 25, scale = 5)

Coefficients
  Specified prior:
    ~ normal(location = 0, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 0, scale = 3.1)

Auxiliary (sigma)
  Specified prior:
    ~ exponential(rate = 1)
  Adjusted prior:
    ~ exponential(rate = 0.13)
------
See help('prior_summary.stanreg') for more details

Working Example

mcmc_trace(weather_model_1, size = 0.1)

Working Example

mcmc_dens_overlay(weather_model_1)

Working Example

posterior_interval(weather_model_1, prob = 0.80)
                  10%      90%
(Intercept) 2.9498083 5.448752
temp9am     0.9802648 1.102423
sigma       3.8739305 4.409474
  • 80% credible interval for \beta_1: (0.98, 1.10)

  • 80% credible interval for \sigma: (3.87, 4.41)

Working Example

pp_check(weather_model_1)

Categorical Predictors

  • What if we look at the data by location?
ggplot(weather_WU, aes(x = temp3pm, fill = location)) + 
  geom_density(alpha = 0.5) + theme_bw()

  • We should probably look at location as a predictor…

Categorical Predictors

  • Let’s let X_{i2} be an indicator for the location,

X_{i2} = \begin{cases} 1 & \text{Wollongong} \\ 0 & \text{otherwise (i.e., Uluru).} \end{cases}

  • We are treating “not-Wollongong” as our reference group – in this case, it is Uluru.

\begin{array}{rl} \text{data:} & Y_i \mid \beta_0, \beta_1, \sigma \overset{\text{ind}}{\sim} N(\mu_i, \sigma^2) \quad \text{with} \quad \mu_i = \beta_0 + \beta_1 X_{i2} \\ \text{priors:} & \beta_{0c} \sim N(25, 5^2) \\ & \beta_1 \sim N(0, 38^2) \\ & \sigma \sim \text{Exp}(0.13). \end{array}
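In R, this 0/1 indicator is built automatically when the location factor enters the formula; a base-R sketch of the encoding:

```r
# How R encodes the location factor as the indicator X2
location <- factor(c("Uluru", "Wollongong", "Uluru"))
model.matrix(~ location)
# the locationWollongong column is 1 on Wollongong days, 0 on Uluru days
```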

Categorical Predictors

  • Let’s think about our model.

y = \beta_0 + \beta_1 x_2

  • What do our coefficients mean?
    • \beta_0 is the typical 3 pm temperature in Uluru (x_2=0).
    • \beta_1 is the typical difference in 3 pm temperature in Wollongong (x_2=1) as compared to Uluru (x_2=0).
    • \sigma represents the standard deviation of 3 pm temperatures within each city (assumed to be the same in Wollongong and Uluru).

Categorical Predictors

  • Let’s think about our model.

y = \beta_0 + \beta_1 x_2

  • \beta_1 is the typical difference in 3 pm temperature in Wollongong (x_2=1) as compared to Uluru (x_2=0).

  • When we use a binary predictor, this results in two models, effectively.

    • One when x_2=0 (Uluru) and one when x_2=1 (Wollongong).

\begin{align*} y = \beta_0 + \beta_1 x_2 \to \text{(U)} \ y &= \beta_0 \\ \text{(W)} \ y&= \beta_0+\beta_1 \end{align*}
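Plugging in hypothetical coefficients (values invented for illustration, not fit output) makes the two implied group means concrete:

```r
# Hypothetical coefficients -- illustration only, not estimated values
b0 <- 29.7    # typical 3 pm temperature in Uluru (x2 = 0)
b1 <- -10.3   # Wollongong-minus-Uluru difference
uluru_mean      <- b0        # model when x2 = 0
wollongong_mean <- b0 + b1   # model when x2 = 1
c(uluru_mean, wollongong_mean)
```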

Categorical Predictors

  • Let’s simulate our posterior using weakly informative priors,
weather_model_2 <- stan_glm(
  temp3pm ~ location,
  data = weather_WU, family = gaussian,
  prior_intercept = normal(25, 5),
  prior = normal(0, 2.5, autoscale = TRUE), 
  prior_aux = exponential(1, autoscale = TRUE),
  chains = 4, iter = 5000*2, seed = 84735)

Categorical Predictors

mcmc_trace(weather_model_2, size = 0.1)

Categorical Predictors

mcmc_dens_overlay(weather_model_2)

Categorical Predictors

  • Looking at the posterior summary,
tidy(weather_model_2, effects = c("fixed", "aux"),
     conf.int = TRUE, conf.level = 0.80) %>% 
  select(-std.error)

Categorical Predictors

  • We can also look at the temperatures by location,

Multiple Predictors

  • What if we want to include multiple predictors?
    • Notice in the code, our model now has multiple predictors (temp9am and location).
    • Here, we are simulating the prior (prior_PD = TRUE); this will allow us to graphically examine what we are claiming with the priors.
weather_model_3_prior <- stan_glm(
  temp3pm ~ temp9am + location,
  data = weather_WU, family = gaussian, 
  prior_intercept = normal(25, 5),
  prior = normal(0, 2.5, autoscale = TRUE), 
  prior_aux = exponential(1, autoscale = TRUE),
  chains = 4, iter = 5000*2, seed = 84735,
  prior_PD = TRUE)

Multiple Predictors

  • From the simulated priors,
    • We can look at different sets of 3 p.m. temperature data (left graph).

Multiple Predictors

  • From the simulated priors,
    • We can also look at our prior assumptions about the relationship between 3 p.m. and 9 a.m. temperature at each location (right graph).
      • Our prior is vague; we’re not sure of the relationship!

Multiple Predictors

  • Instead of starting the stan_glm() syntax from scratch, we can update() the weather_model_3_prior by setting prior_PD = FALSE:
weather_model_3 <- update(weather_model_3_prior, prior_PD = FALSE)

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 1.2e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.088 seconds (Warm-up)
Chain 1:                0.144 seconds (Sampling)
Chain 1:                0.232 seconds (Total)
Chain 1: 


Multiple Predictors

  • The simulation results in 20,000 posterior plausible relationships between 3 p.m. temperature and the two predictors, 9 a.m. temperature and location.

  • You try the following code:

weather_WU %>%
  add_fitted_draws(weather_model_3, n = 100) %>%
  ggplot(aes(x = temp9am, y = temp3pm, color = location)) +
    geom_line(aes(y = .value, group = paste(location, .draw)), alpha = .1) +
    geom_point(data = weather_WU, size = 0.1) +
  theme_bw()

Multiple Predictors

  • 3 p.m. temperature is positively associated with 9 a.m. temperature and tends to be higher in Uluru than in Wollongong.

  • Further, relative to the prior simulated relationships in Figure 11.9, these posterior relationships are very consistent.

Multiple Predictors

  • Looking at the posterior summary statistics,
tidy(weather_model_3, effects = c("fixed", "aux"),
     conf.int = TRUE, conf.level = 0.80) %>% 
  select(-std.error)

Multiple Predictors

  • You try! Run the following code to look at our posterior predictive models.
# Simulate a set of predictions
set.seed(84735)
temp3pm_prediction <- posterior_predict(
  weather_model_3,
  newdata = data.frame(temp9am = c(10, 10), 
                       location = c("Uluru", "Wollongong")))

# Plot the posterior predictive models
mcmc_areas(temp3pm_prediction) +
  ggplot2::scale_y_discrete(labels = c("Uluru", "Wollongong")) + 
  xlab("temp3pm")

Multiple Predictors

  • Roughly speaking, we can anticipate 3 p.m. temperatures between 15 and 25 degrees in Uluru, and cooler temperatures between 8 and 18 in Wollongong.

Poisson Regression

# Simulate the prior distribution
equality_model_prior <- stan_glm(laws ~ percent_urban + historical, 
                                 data = equality, 
                                 family = poisson,
                                 prior_intercept = normal(2, 0.5),
                                 prior = normal(0, 2.5, autoscale = TRUE), 
                                 chains = 4, iter = 5000*2, seed = 84735, 
                                 prior_PD = TRUE)

# Update to simulate the posterior distribution
equality_model <- update(equality_model_prior, prior_PD = FALSE)
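Poisson regression models the log of the mean count, so the normal(2, 0.5) intercept prior can be read back on the count scale; a sketch of that translation:

```r
# The normal(2, 0.5) intercept prior, translated to the count scale
exp(2)                    # centers the typical count near exp(2) ~ 7.4 laws
exp(2 + c(-2, 2) * 0.5)   # rough +/- 2 sd prior range: about 2.7 to 20 laws
```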

Negative Binomial Regression

# Simulate the prior distribution
books_negbin_sim <- stan_glm(
  books ~ age + wise_unwise, 
  data = pulse, family = neg_binomial_2,
  prior_intercept = normal(0, 2.5, autoscale = TRUE),
  prior = normal(0, 2.5, autoscale = TRUE), 
  prior_aux = exponential(1, autoscale = TRUE),
  chains = 4, iter = 5000*2, seed = 84735,
  prior_PD = TRUE)

# Update to simulate the posterior distribution
books_negbin_model <- update(books_negbin_sim, prior_PD = FALSE)

Logistic Regression

# Simulate the prior distribution
rain_model_prior <- stan_glm(raintomorrow ~ humidity9am,
                             data = weather, family = binomial,
                             prior_intercept = normal(-1.4, 0.7),
                             prior = normal(0.07, 0.035),
                             chains = 4, iter = 5000*2, seed = 84735,
                             prior_PD = TRUE)

# Update to simulate the posterior distribution
rain_model_1 <- update(rain_model_prior, prior_PD = FALSE)
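Logistic regression works on the log-odds scale, so the normal(-1.4, 0.7) intercept prior can be translated to the probability scale with plogis(); a sketch:

```r
# The normal(-1.4, 0.7) intercept prior on the probability scale
plogis(-1.4)                   # centers the chance of rain near 20%
plogis(-1.4 + c(-2, 2) * 0.7)  # rough +/- 2 sd range on the probability scale
```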

Wrap Up

  • Today, we have learned about regression in the Bayesian framework.

  • Thursday: Assignment 4.

  • Next Tuesday: Project 2 - in class work session.

  • Next Thursday: Project 2 presentations (LIVE).

Homework

  • 9.9, 9.10, 9.11, 9.12

  • 9.16, 9.17, 9.18

  • 12.5, 12.6, 12.7, 12.8

  • 13.10, 13.11