Until now, we have effectively been performing one-sample tests of means, proportions, and counts.
Today, we switch to thinking about analysis in terms of regression, focusing on how to incorporate a single predictor.
Capital Bikeshare is a bike sharing service in the Washington, D.C. area. To best serve its registered members, the company must understand the demand for its service. We will analyze the number of rides taken on a random sample of n days, (Y_1, Y_2, ..., Y_n).
Because Y_i is a count variable, you might assume that ridership calls for the Poisson distribution. However, past bike riding seasons have exhibited bell-shaped daily ridership with variability that far exceeds the typical ridership, violating the Poisson property that the variance equals the mean.
We will instead assume that the number of rides varies normally around some typical ridership, \mu, with standard deviation, \sigma.
Y_i|\mu,\sigma \overset{\text{ind}}{\sim} N(\mu, \sigma^2)
Our specific goal will be to model the relationship between ridership and temperature: Y is the number of rides and X is the temperature.
It is reasonable to assume a positive relationship between temperature and number of rides.
For learning purposes, let’s focus on the model with a single predictor (y ~ x in R).
\{ (Y_1, X_1), (Y_2, X_2), ..., (Y_n, X_n) \}
where Y_i is the number of rides and X_i is the high temperature (°F) on day i.
Assuming that the relationship is linear, we can model
\mu_i = \beta_0 + \beta_1 X_i,
Y_i|\beta_0,\beta_1,\sigma\overset{\text{ind}}{\sim} N(\mu_i, \sigma^2) \text{ with } \mu_i = \beta_0 + \beta_1 X_i
We know the assumptions for linear regression in the frequentist framework. In Bayesian Normal regression, we must also specify prior models for the unknown parameters \beta_0, \beta_1, and \sigma.
First, we will assume that our prior models of \beta_0, \beta_1, and \sigma are independent.
It is common to use a Normal prior for \beta_0 and \beta_1,
\begin{align*} \beta_0 &\sim N(m_0, s_0^2) \\ \beta_1 &\sim N(m_1, s_1^2) \end{align*}
and an Exponential prior for the standard deviation:
\sigma \sim \text{Exp}(l)
\begin{align*} &\text{data}: & Y_i|\beta_0, \beta_1, \sigma & \overset{\text{ind}}{\sim} N(\mu_i, \sigma^2) \text{ with } \mu_i = \beta_0 + \beta_1 X_i \\ &\text{priors:} &\beta_0 & \sim N(m_0, s_0^2) \\ & & \beta_1 &\sim N(m_1, s_1^2) \\ & & & \sigma \sim \text{Exp}(l) \end{align*}
On an average temperature day for DC (65°F-70°F), there are typically around 5000 riders, but this could vary between 3000 and 7000 riders.
For every one degree increase in temperature, ridership typically increases by 100 rides, but this could vary between 20 and 180 rides.
At any given temperature, the daily ridership will tend to vary with a moderate standard deviation of 1250 rides.
\beta_{0\text{c}} \sim N(5000, 1000^2)
\beta_{1\text{c}} \sim N(100, 40^2)
\sigma \sim \text{Exp}(0.0008)
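One way to arrive at these hyperparameters (an assumption on our part; other tunings are possible): treat each stated range as roughly the center \pm 2 standard deviations, and match the mean of the Exponential prior, 1/l, to the anticipated standard deviation:

\begin{align*} 5000 \pm 2s_0 = (3000, 7000) &\Rightarrow s_0 = 1000 \\ 100 \pm 2s_1 = (20, 180) &\Rightarrow s_1 = 40 \\ E(\sigma) = 1/l = 1250 &\Rightarrow l = 1/1250 = 0.0008 \end{align*}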
Let’s now update what we know using the bikes data in the bayesrules package.
Looking at the basic relationship between the number of rides and the temperature (as it feels outside), we see a positive, roughly linear association.
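A quick way to examine that relationship (a minimal sketch, assuming the bayesrules and ggplot2 packages are installed):

library(bayesrules)   # provides the bikes data
library(ggplot2)
data(bikes)

# Scatterplot of daily ridership against felt temperature
ggplot(bikes, aes(x = temp_feel, y = rides)) +
  geom_point()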
Note: I am skipping the derivation / the true math behind how to find the posterior in this situation. (It involves multiple integrals!)
We will use the stan_glm() function from rstanarm – it contains pre-defined Bayesian regression models.
stan_glm() also applies to the wider family of generalized linear models (e.g., logistic and Poisson/negative binomial regression).

library(rstanarm)
bike_model <- stan_glm(rides ~ temp_feel,                 # data model
                       data = bikes,                      # dataset
                       family = gaussian,                 # distribution to apply
                       prior_intercept = normal(5000, 1000),  # b0_c
                       prior = normal(100, 40),           # b1_c
                       prior_aux = exponential(0.0008),   # sigma
                       chains = 4,                        # 4 chains
                       iter = 5000*2,                     # 10000 iterations per chain; first 5000 are warmup
                       seed = 84735)                      # starting place in RNG

The call to stan_glm() contains three types of information:

The data model: the formula rides ~ temp_feel, using data = bikes and assuming a Normal data model, aka family = gaussian.

The priors: the prior_intercept, prior, and prior_aux arguments give the priors for \beta_{0\text{c}}, \beta_{1\text{c}}, and \sigma.

The Markov chain settings: the chains, iter, and seed arguments specify the number of Markov chains to run, the length or number of iterations for each chain, and the starting place of the RNG.

Wait, how does this work when we are looking at three model parameters?
We will have three vectors – one for each model parameter.
\begin{align*} <\beta_0^{(1)}, & \ \beta_0^{(2)}, ..., \beta_0^{(5000)}> \\ <\beta_1^{(1)}, & \ \beta_1^{(2)}, ..., \beta_1^{(5000)} > \\ <\sigma^{(1)}, & \ \sigma^{(2)}, ..., \sigma^{(5000)} > \\ \end{align*}
Run the simulation code (from a previous slide).
Then, run diagnostics:
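The effective sample size ratios below can be computed with neff_ratio(), a bayesplot helper that works on stan_glm() fits (call shown as a sketch):

neff_ratio(bike_model)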
(Intercept) temp_feel sigma
1.03285 1.03505 0.96585
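and the R-hat values with the companion rhat() helper:

rhat(bike_model)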
(Intercept) temp_feel sigma
0.9998873 0.9999032 0.9999642
Quick diagnostics indicate that the resulting chains are trustworthy.
The effective sample size ratios are slightly above 1 and the R-hat values are very close to 1.
The posterior median model from bike_model is

y = -2194.24 + 82.16x

To visualize posterior plausible model lines, we can use the add_fitted_draws() function from the tidybayes package.
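A minimal sketch of that visualization (the choice of 50 draws is ours; note that newer tidybayes releases supersede add_fitted_draws() with add_epred_draws()):

library(tidybayes)
library(dplyr)
library(ggplot2)

# Overlay 50 posterior plausible mean lines on the observed data
bikes %>%
  add_fitted_draws(bike_model, n = 50) %>%
  ggplot(aes(x = temp_feel, y = rides)) +
  geom_line(aes(y = .value, group = .draw), alpha = 0.15) +  # one line per posterior draw
  geom_point(data = bikes, size = 0.5)                       # observed data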
Next, let's consider the weather_WU data in the bayesrules package. It records the 3 p.m. temperature (temp3pm) and a subset of possible predictors that we'd have access to in the morning. We begin by modeling temp3pm with one quantitative predictor, the morning temperature temp9am, both measured in degrees Celsius.

\begin{align*} Y_i | \beta_0, \beta_1, \sigma &\overset{\text{ind}}{\sim} N(\mu_i, \sigma^2), \text{ with } \mu_i = \beta_0 + \beta_1 X_{i1} \\ \beta_{0c} &\sim N(25, 5^2) \\ \beta_1 &\sim N(0, 3.1^2) \\ \sigma &\sim \text{Exp}(0.13) \end{align*}
weather_model_1 <- stan_glm(
  temp3pm ~ temp9am,                             # data model
  data = weather_WU, family = gaussian,
  prior_intercept = normal(25, 5),               # b0_c
  prior = normal(0, 2.5, autoscale = TRUE),      # b1; stan_glm() rescales to the data
  prior_aux = exponential(1, autoscale = TRUE),  # sigma; stan_glm() rescales to the data
  chains = 4, iter = 5000*2, seed = 84735)
SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 1.3e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.13 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.066 seconds (Warm-up)
Chain 1: 0.119 seconds (Sampling)
Chain 1: 0.185 seconds (Total)
Chain 1:
(Analogous sampling output for Chains 2-4 omitted.)
We asked stan_glm() to autoscale our priors. What did it change them to? We can check with prior_summary(weather_model_1):

Priors for model 'weather_model_1'
------
Intercept (after predictors centered)
~ normal(location = 25, scale = 5)
Coefficients
Specified prior:
~ normal(location = 0, scale = 2.5)
Adjusted prior:
~ normal(location = 0, scale = 3.1)
Auxiliary (sigma)
Specified prior:
~ exponential(rate = 1)
Adjusted prior:
~ exponential(rate = 0.13)
------
See help('prior_summary.stanreg') for more details
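The 10th-to-90th percentile quantiles below (i.e., 80% credible intervals) can be pulled with rstanarm's posterior_interval(); a minimal call, assuming the fitted weather_model_1:

posterior_interval(weather_model_1, prob = 0.80)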
10% 90%
(Intercept) 2.9498083 5.448752
temp9am 0.9802648 1.102423
sigma 3.8739305 4.409474
80% credible interval for \beta_1: (0.98, 1.10)
80% credible interval for \sigma: (3.87, 4.41)
X_{i2} = \begin{cases} 1 & \text{Wollongong} \\ 0 & \text{otherwise (i.e., Uluru).} \end{cases}
\begin{array}{rl} \text{data:} & Y_i \mid \beta_0, \beta_1, \sigma \overset{\text{ind}}{\sim} N(\mu_i, \sigma^2) \quad \text{with} \quad \mu_i = \beta_0 + \beta_1 X_{i2} \\ \text{priors:} & \beta_{0c} \sim N(25, 5^2) \\ & \beta_1 \sim N(0, 38^2) \\ & \sigma \sim \text{Exp}(0.13). \end{array}
y = \beta_0 + \beta_1 x_2
\beta_1 is the typical difference in 3 p.m. temperature in Wollongong (x_2=1) as compared to Uluru (x_2=0).
Using a binary predictor effectively results in two models:
\begin{align*} y = \beta_0 + \beta_1 x_2 \to \text{(U)} \ y &= \beta_0 \\ \text{(W)} \ y&= \beta_0+\beta_1 \end{align*}
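Fitting this model could mirror the earlier call; a sketch under our assumptions (the object name weather_model_2 is ours, and we let stan_glm() autoscale the weakly informative priors, which reportedly yields the N(0, 38^2) prior in the model specification above):

weather_model_2 <- stan_glm(
  temp3pm ~ location,                            # binary predictor
  data = weather_WU, family = gaussian,
  prior_intercept = normal(25, 5),
  prior = normal(0, 2.5, autoscale = TRUE),
  prior_aux = exponential(1, autoscale = TRUE),
  chains = 4, iter = 5000*2, seed = 84735)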
Now let's build a model with two predictors (temp9am and location). Rather than writing the stan_glm() syntax from scratch, we can update() the weather_model_3_prior by setting prior_PD = FALSE.
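A minimal sketch of that update, assuming weather_model_3_prior was already simulated with prior_PD = TRUE:

weather_model_3 <- update(weather_model_3_prior, prior_PD = FALSE)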
SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 1.3e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.13 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.08 seconds (Warm-up)
Chain 1: 0.136 seconds (Sampling)
Chain 1: 0.216 seconds (Total)
Chain 1:
(Analogous sampling output for Chains 2-4 omitted.)
The simulation results in 20,000 posterior plausible relationships of 3 p.m. temperature with 9 a.m. temperature and location.
You try the following code. From the results, we see that 3 p.m. temperature is positively associated with 9 a.m. temperature and tends to be higher in Uluru than in Wollongong. Further, relative to the prior simulated relationships in Figure 11.9, these posterior relationships are very consistent.
# Simulate a set of predictions
set.seed(84735)
temp3pm_prediction <- posterior_predict(
weather_model_3,
newdata = data.frame(temp9am = c(10, 10),
location = c("Uluru", "Wollongong")))
# Plot the posterior predictive models
mcmc_areas(temp3pm_prediction) +
ggplot2::scale_y_discrete(labels = c("Uluru", "Wollongong")) +
  xlab("temp3pm")

Today, we have learned about regression (under the Normal distribution) in the Bayesian framework.
Wednesday: Evaluating the Model
Project stuff:
Next week:
From the Bayes Rules! textbook:
STA6349 - Applied Bayesian Analysis - Fall 2025