July 22, 2025
Tuesday
Before today, we were effectively performing one-sample tests of means, proportions, and counts.
Now we will focus on incorporating a single predictor into our analysis; that is, we switch to thinking about analysis in terms of regression.
Capital Bikeshare is a bike sharing service in the Washington, D.C. area. To best serve its registered members, the company must understand the demand for its service. We will analyze the number of rides taken on a random sample of n days, (Y_1, Y_2, ..., Y_n).
Because Y_i is a count variable, you might assume that ridership should be modeled with a Poisson distribution. However, past bike riding seasons have exhibited bell-shaped daily ridership with variability that far exceeds the typical ridership, which rules out the Poisson model (whose variance equals its mean).
We will instead assume that the number of rides varies normally around some typical ridership, \mu, with standard deviation, \sigma.
Y_i|\mu,\sigma \overset{\text{ind}}{\sim} N(\mu, \sigma^2)
In our example, Y is the number of rides and X is the temperature.
Our specific goal will be to model the relationship between ridership and temperature.
It is reasonable to assume a positive relationship between temperature and the number of rides.
For learning purposes, let's focus on the model with a single predictor (y ~ x in R).
\{ (Y_1, X_1), (Y_2, X_2), ..., (Y_n, X_n) \}
where Y_i is the number of rides and X_i is the high temperature (°F) on day i.
Assuming that the relationship is linear, we can model
\mu_i = \beta_0 + \beta_1 X_i,
Y_i|\beta_0,\beta_1,\sigma\overset{\text{ind}}{\sim} N(\mu_i, \sigma^2) \text{ with } \mu_i = \beta_0 + \beta_1 X_i
We know the assumptions for linear regression in the frequentist framework. In Bayesian Normal regression, we must also specify prior models for the parameters.
First, we will assume that our prior models of \beta_0, \beta_1, and \sigma are independent.
It is common to use a Normal prior for \beta_0 and \beta_1, and an Exponential prior for \sigma:
\begin{align*} \beta_0 &\sim N(m_0, s_0^2) \\ \beta_1 &\sim N(m_1, s_1^2) \end{align*}
\sigma \sim \text{Exp}(l)
\begin{align*} \text{data:} \quad & Y_i|\beta_0, \beta_1, \sigma \overset{\text{ind}}{\sim} N(\mu_i, \sigma^2) \text{ with } \mu_i = \beta_0 + \beta_1 X_i \\ \text{priors:} \quad & \beta_0 \sim N(m_0, s_0^2) \\ & \beta_1 \sim N(m_1, s_1^2) \\ & \sigma \sim \text{Exp}(l) \end{align*}
On an average temperature day for DC (65°F to 70°F), there are typically around 5000 riders, but this could vary between 3000 and 7000 riders.
For every one degree increase in temperature, ridership typically increases by 100 rides, but this could vary between 20 and 180 rides.
At any given temperature, the daily ridership will tend to vary with a moderate standard deviation of 1250 rides.
\beta_{0\text{c}} \sim N(5000, 1000^2)
\beta_{1\text{c}} \sim N(100, 40^2)
\sigma \sim \text{Exp}(0.0008)
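Note that the rate 0.0008 is 1/1250, so the Exponential prior gives \sigma a prior mean of 1250 rides, matching our stated understanding. To eyeball these priors, here is a minimal sketch using the plot_normal() and plot_gamma() helpers from the bayesrules package (an Exp(l) distribution is a Gamma(1, l)):

library(bayesrules)
plot_normal(mean = 5000, sd = 1000)    # prior for beta_0c: typical ridership
plot_normal(mean = 100, sd = 40)       # prior for beta_1c: rides per degree
plot_gamma(shape = 1, rate = 0.0008)   # prior for sigma: Exp(0.0008), mean 1250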
Let's now update what we know using the bikes data in the bayesrules package.
Let's look at the basic relationship between the number of rides and the temperature (as it feels outside).
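A minimal ggplot2 sketch of that scatterplot (rides vs. temp_feel from the bikes data):

library(ggplot2)
ggplot(bikes, aes(x = temp_feel, y = rides)) +
  geom_point(size = 0.5) +
  geom_smooth(method = "lm", se = FALSE)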
Note: I am skipping the derivation (the true math behind finding the posterior in this situation involves multiple integrals!).
We will use the stan_glm() function from the rstanarm package; it contains pre-defined Bayesian regression models. stan_glm() also applies to the wider family of generalized linear models (e.g., logistic and Poisson/negative-binomial regression).

library(rstanarm)
bike_model <- stan_glm(rides ~ temp_feel, # data model
data = bikes, # dataset
family = gaussian, # distribution to apply
prior_intercept = normal(5000, 1000), # b0_c
prior = normal(100, 40), # b1_c
prior_aux = exponential(0.0008), # sigma
chains = 4, # 4 chains
iter = 5000*2, # 10000 iterations - throw out first 5000
seed = 84735) # starting place in RNG
The code in stan_glm() contains three types of information:

1. Information about the data model: we specify the model structure (rides ~ temp_feel), the dataset (data = bikes), and a Normal data model (family = gaussian).
2. Information about the priors: the prior_intercept, prior, and prior_aux arguments give the priors for \beta_{0\text{c}}, \beta_{1\text{c}}, and \sigma.
3. Information about the Markov chain simulation: the chains, iter, and seed arguments specify the number of Markov chains to run, the number of iterations per chain, and the starting place of the RNG.

Wait, how does this work when we are looking at three model parameters?
We will have three vectors – one for each model parameter.
\begin{align*} <\beta_0^{(1)}, & \ \beta_0^{(2)}, ..., \beta_0^{(5000)}> \\ <\beta_1^{(1)}, & \ \beta_1^{(2)}, ..., \beta_1^{(5000)} > \\ <\sigma^{(1)}, & \ \sigma^{(2)}, ..., \sigma^{(5000)} > \\ \end{align*}
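These draws can be inspected directly; a quick sketch (as.data.frame() on a stan_glm() fit returns one column per parameter):

bike_model_df <- as.data.frame(bike_model)
head(bike_model_df)   # columns: (Intercept), temp_feel, sigma
dim(bike_model_df)    # 20000 rows = 4 chains x 5000 kept iterations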
Run the simulation code (from a previous slide).
Then, run diagnostics:
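For example, a sketch of the numeric and visual diagnostics (neff_ratio(), rhat(), and mcmc_trace() are bayesplot helpers that accept stan_glm() fits):

library(bayesplot)
neff_ratio(bike_model)               # effective sample size ratios
rhat(bike_model)                     # R-hat convergence diagnostics
mcmc_trace(bike_model, size = 0.1)   # trace plots of all chains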
Effective sample size ratios:
(Intercept)   temp_feel       sigma
    1.03285     1.03505     0.96585

R-hat values:
(Intercept)   temp_feel       sigma
  0.9998873   0.9999032   0.9999642
Quick diagnostics indicate that the resulting chains are trustworthy.
The effective sample size ratios are slightly above 1 and the R-hat values are very close to 1.
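To obtain the posterior median model reported below, one option is tidy() from the broom.mixed package (a sketch; the 80% level is an illustrative choice):

library(broom.mixed)
tidy(bike_model, effects = c("fixed", "aux"),
     conf.int = TRUE, conf.level = 0.80)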
The posterior median model is
y = -2194.24 + 82.16x
(To visualize posterior plausible model lines around this median, the add_fitted_draws() function from the tidybayes package can be applied to bike_model.)

Next, consider the weather_WU data in the bayesrules package, which contains daily weather records for two Australian cities, Uluru and Wollongong. It includes the 3 p.m. temperature (temp3pm) and a subset of possible predictors that we'd have access to in the morning. We will model temp3pm with one quantitative predictor, the morning temperature temp9am, both measured in degrees Celsius.

\begin{align*} Y_i | \beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N(\mu_i, \sigma^2), \text{ with } \mu_i = \beta_0 + \beta_1 X_{i1} \\ \beta_{0c} &\sim N(25, 5^2) \\ \beta_1 &\sim N(0, 3.1^2) \\ \sigma & \sim \text{Exp}(0.13) \end{align*}
weather_model_1 <- stan_glm(
temp3pm ~ temp9am,
data = weather_WU, family = gaussian,
prior_intercept = normal(25, 5),
prior = normal(0, 2.5, autoscale = TRUE),
prior_aux = exponential(1, autoscale = TRUE),
chains = 4, iter = 5000*2, seed = 84735)
(Stan prints sampling progress for each of the 4 chains: 10,000 iterations each, the first 5,000 as warmup, finishing in roughly 0.2 seconds per chain.)
We asked stan_glm() to autoscale our priors. What did it change them to? We can check with prior_summary(weather_model_1):

Priors for model 'weather_model_1'
------
Intercept (after predictors centered)
~ normal(location = 25, scale = 5)
Coefficients
Specified prior:
~ normal(location = 0, scale = 2.5)
Adjusted prior:
~ normal(location = 0, scale = 3.1)
Auxiliary (sigma)
Specified prior:
~ exponential(rate = 1)
Adjusted prior:
~ exponential(rate = 0.13)
------
See help('prior_summary.stanreg') for more details
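The 80% posterior credible intervals below can be computed with rstanarm's posterior_interval(); a minimal sketch:

posterior_interval(weather_model_1, prob = 0.80)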
10% 90%
(Intercept) 2.9498083 5.448752
temp9am 0.9802648 1.102423
sigma 3.8739305 4.409474
80% credible interval for \beta_1: (0.98, 1.10)
80% credible interval for \sigma: (3.87, 4.41)
X_{i2} = \begin{cases} 1 & \text{Wollongong} \\ 0 & \text{otherwise (i.e., Uluru).} \end{cases}
\begin{array}{rl} \text{data:} & Y_i \mid \beta_0, \beta_1, \sigma \overset{\text{ind}}{\sim} N(\mu_i, \sigma^2) \quad \text{with} \quad \mu_i = \beta_0 + \beta_1 X_{i2} \\ \text{priors:} & \beta_{0c} \sim N(25, 5^2) \\ & \beta_1 \sim N(0, 38^2) \\ & \sigma \sim \text{Exp}(0.13). \end{array}
y = \beta_0 + \beta_1 x_2
\beta_1 is the typical difference in 3 p.m. temperature in Wollongong (x_2 = 1) as compared to Uluru (x_2 = 0).
When we use a binary predictor, this effectively results in two model lines:
\begin{align*} y = \beta_0 + \beta_1 x_2 \to \text{(U)} \ y &= \beta_0 \\ \text{(W)} \ y&= \beta_0+\beta_1 \end{align*}
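A sketch of fitting this binary-predictor model, reusing the earlier priors (the name weather_model_2 is an assumption; with autoscale = TRUE, the N(0, 2.5^2) prior on \beta_1 is adjusted to roughly the N(0, 38^2) shown above):

weather_model_2 <- stan_glm(
  temp3pm ~ location,
  data = weather_WU, family = gaussian,
  prior_intercept = normal(25, 5),
  prior = normal(0, 2.5, autoscale = TRUE),
  prior_aux = exponential(1, autoscale = TRUE),
  chains = 4, iter = 5000*2, seed = 84735)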
Finally, let's model temp3pm using two predictors: the morning temperature (temp9am) and location. Rather than writing the stan_glm() syntax from scratch, we can update() the weather_model_3_prior by setting prior_PD = FALSE:
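A sketch of how weather_model_3_prior could have been defined, following the same pattern as weather_model_1 (the specific priors here are an assumption):

weather_model_3_prior <- stan_glm(
  temp3pm ~ temp9am + location,
  data = weather_WU, family = gaussian,
  prior_intercept = normal(25, 5),
  prior = normal(0, 2.5, autoscale = TRUE),
  prior_aux = exponential(1, autoscale = TRUE),
  chains = 4, iter = 5000*2, seed = 84735,
  prior_PD = TRUE)

# Update to simulate the posterior distribution
weather_model_3 <- update(weather_model_3_prior, prior_PD = FALSE)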
(As before, Stan prints sampling progress for each of the 4 chains: 10,000 iterations each, half of them warmup, taking roughly 0.2 seconds per chain.)
The simulation results in 20,000 posterior plausible relationships of 3 p.m. temperature with 9 a.m. temperature and location.
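A sketch of visualizing a subsample of these posterior model lines with add_fitted_draws() from tidybayes (100 draws shown for readability):

library(tidybayes)
library(dplyr)
library(ggplot2)
weather_WU %>%
  add_fitted_draws(weather_model_3, n = 100) %>%
  ggplot(aes(x = temp9am, y = temp3pm, color = location)) +
  geom_line(aes(y = .value, group = paste(location, .draw)), alpha = 0.1) +
  geom_point(data = weather_WU, size = 0.5)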
From plots like these, we see that 3 p.m. temperature is positively associated with 9 a.m. temperature and tends to be higher in Uluru than in Wollongong. Further, relative to the prior simulated relationships in Figure 11.9, these posterior relationships are very consistent. Next, try the following prediction code:
# Simulate a set of predictions
set.seed(84735)
temp3pm_prediction <- posterior_predict(
weather_model_3,
newdata = data.frame(temp9am = c(10, 10),
location = c("Uluru", "Wollongong")))
# Plot the posterior predictive models
mcmc_areas(temp3pm_prediction) +
ggplot2::scale_y_discrete(labels = c("Uluru", "Wollongong")) +
xlab("temp3pm")
As noted earlier, stan_glm() extends to the wider family of generalized linear models. A Poisson regression example:

# Simulate the prior distribution
equality_model_prior <- stan_glm(laws ~ percent_urban + historical,
data = equality,
family = poisson,
prior_intercept = normal(2, 0.5),
prior = normal(0, 2.5, autoscale = TRUE),
chains = 4, iter = 5000*2, seed = 84735,
prior_PD = TRUE)
# Update to simulate the posterior distribution
equality_model <- update(equality_model_prior, prior_PD = FALSE)
A negative binomial regression example:

# Simulate the prior distribution
books_negbin_sim <- stan_glm(
  books ~ age + wise_unwise,
  data = pulse, family = neg_binomial_2,
  prior_intercept = normal(0, 2.5, autoscale = TRUE),
  prior = normal(0, 2.5, autoscale = TRUE),
  prior_aux = exponential(1, autoscale = TRUE),
  chains = 4, iter = 5000*2, seed = 84735,
  prior_PD = TRUE)

# Update to simulate the posterior distribution
books_negbin_model <- update(books_negbin_sim, prior_PD = FALSE)
A logistic regression example:

# Simulate the prior distribution
rain_model_prior <- stan_glm(raintomorrow ~ humidity9am,
data = weather, family = binomial,
prior_intercept = normal(-1.4, 0.7),
prior = normal(0.07, 0.035),
chains = 4, iter = 5000*2, seed = 84735,
prior_PD = TRUE)
# Update to simulate the posterior distribution
rain_model_1 <- update(rain_model_prior, prior_PD = FALSE)
Today, we have learned about regression in the Bayesian framework.
Thursday: Assignment 4.
Next Tuesday: Project 2 - in class work session.
Next Thursday: Project 2 presentations (LIVE).
Practice exercises:
9.9, 9.10, 9.11, 9.12
9.16, 9.17, 9.18
12.5, 12.6, 12.7, 12.8
13.10, 13.11
STA6349 - Applied Bayesian Analysis - Summer 2025