STA6349: Applied Bayesian Analysis
Spring 2025
Before today, we were effectively performing one-sample tests of means, proportions, and counts.
Now, we will focus on incorporating just a single predictor into our analysis.
We now switch to thinking about analysis in terms of regression.
Capital Bikeshare is a bike sharing service in the Washington, D.C. area. To best serve its registered members, the company must understand the demand for its service. We will analyze the number of rides taken on a random sample of n days, (Y_1, Y_2, ..., Y_n).
Because Y_i is a count variable, you might assume that ridership should be modeled with a Poisson distribution. However, past bike riding seasons have exhibited bell-shaped daily ridership, with variability in ridership that far exceeds the typical ridership; this violates the Poisson property that the mean and variance are equal.
We will instead assume that the number of rides varies normally around some typical ridership, \mu, with standard deviation, \sigma.
Y_i|\mu,\sigma \overset{\text{ind}}{\sim} N(\mu, \sigma^2)
In our example, Y is the number of rides and X is the temperature. Our specific goal will be to model the relationship between ridership and temperature; it is reasonable to assume a positive relationship between temperature and the number of rides.
Today we will focus on the model with a single predictor (y ~ x in R).
\{ (Y_1, X_1), (Y_2, X_2), ..., (Y_n, X_n) \}
where Y_i is the number of rides and X_i is the high temperature (°F) on day i.
Assuming that the relationship is linear, we can model
\mu_i = \beta_0 + \beta_1 X_i,
Y_i|\beta_0,\beta_1,\sigma\overset{\text{ind}}{\sim} N(\mu_i, \sigma^2) \text{ with } \mu_i = \beta_0 + \beta_1 X_i
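To make the data model concrete, here is a small simulation sketch. The particular values of \beta_0, \beta_1, and \sigma below are made up for illustration only, not estimates.

# Simulate one hypothetical season from the data model:
# Y_i | beta_0, beta_1, sigma ~ N(beta_0 + beta_1 * X_i, sigma^2)
set.seed(84735)
n      <- 100
temp   <- runif(n, 45, 90)      # hypothetical high temperatures (°F)
beta_0 <- -2000                 # made-up intercept
beta_1 <- 80                    # made-up slope (rides per degree)
sigma  <- 1200                  # made-up day-to-day standard deviation
rides  <- rnorm(n, mean = beta_0 + beta_1 * temp, sd = sigma)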
We know the assumptions for linear regression in the frequentist framework. In Bayesian Normal regression, we know we must set priors for our parameters; we will fit the model with rstanarm.
First, we will assume that our prior models of \beta_0, \beta_1, and \sigma are independent.
It is common to use a normal prior for \beta_0 and \beta_1.
\begin{align*} \beta_0 &\sim N(m_0, s_0^2) \\ \beta_1 &\sim N(m_1, s_1^2) \end{align*}
and an exponential prior for \sigma:
\sigma \sim \text{Exp}(l)
\begin{align*} &\text{data}: & Y_i|\beta_0, \beta_1, \sigma & \overset{\text{ind}}{\sim} N(\mu_i, \sigma^2) \text{ with } \mu_i = \beta_0 + \beta_1 X_i \\ &\text{priors:} &\beta_0 & \sim N(m_0, s_0^2) \\ & & \beta_1 &\sim N(m_1, s_1^2) \\ & & & \sigma \sim \text{Exp}(l) \end{align*}
For our bikeshare analysis, we will use the following priors, where the subscript c denotes the centered intercept and slope (rstanarm specifies these priors after centering the predictor):
\beta_{0\text{c}} \sim N(5000, 1000^2)
\beta_{1\text{c}} \sim N(100, 40^2)
\sigma \sim \text{Exp}(0.0008)
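As a quick check of what these priors imply, we can compute their central 95% intervals in base R (a sketch; note that Exp(0.0008) has prior mean 1/0.0008 = 1250 rides):

# Central 95% prior intervals for each parameter
qnorm(c(0.025, 0.975), mean = 5000, sd = 1000)  # beta_0c: typical ridership
qnorm(c(0.025, 0.975), mean = 100,  sd = 40)    # beta_1c: extra rides per degree
qexp(c(0.025, 0.975), rate = 0.0008)            # sigma: day-to-day variability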
Let’s now update what we know using the bikes data in the bayesrules package.
Let’s look at the basic relationship between the number of rides and the temperature (as it feels outside).
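A sketch of that scatterplot, assuming the bikes data with the rides and temp_feel columns used in the model call below:

library(bayesrules)   # bikes data
library(ggplot2)

data(bikes)
ggplot(bikes, aes(x = temp_feel, y = rides)) +
  geom_point(alpha = 0.5) +
  labs(x = "Temperature (as it feels outside, °F)", y = "Number of rides")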
Note: I am skipping the derivation / the true math behind how to find the posterior in this situation. (It involves multiple integrals!)
We will use the stan_glm() function from rstanarm – it contains pre-defined Bayesian regression models. stan_glm() also applies to the wider family of GzLMs (e.g., logistic and Poisson/negative binomial regression).
library(rstanarm)
bike_model <- stan_glm(rides ~ temp_feel, # data model
data = bikes, # dataset
family = gaussian, # distribution to apply
prior_intercept = normal(5000, 1000), # b0_c
prior = normal(100, 40), # b1_c
prior_aux = exponential(0.0008), # sigma
chains = 4, # 4 chains
iter = 5000*2, # 10000 iterations - throw out first 5000
seed = 84735) # starting place in RNG
The call to stan_glm() contains three types of information:
- data information: the model formula (rides ~ temp_feel), using data = bikes and assuming a Normal data model, aka family = gaussian.
- prior information: the prior_intercept, prior, and prior_aux arguments give the priors for \beta_{0\text{c}}, \beta_{1\text{c}}, and \sigma.
- Markov chain information: the chains, iter, and seed arguments specify the number of Markov chains to run, the length (number of iterations) of each chain, and the starting place of the RNG.

Wait, how does this work when we are looking at three model parameters?
We will have three vectors – one for each model parameter.
\begin{align*} <\beta_0^{(1)}, & \ \beta_0^{(2)}, ..., \beta_0^{(5000)}> \\ <\beta_1^{(1)}, & \ \beta_1^{(2)}, ..., \beta_1^{(5000)} > \\ <\sigma^{(1)}, & \ \sigma^{(2)}, ..., \sigma^{(5000)} > \\ \end{align*}
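One way to see these vectors is to pull the combined posterior draws out of the fitted object; a minimal sketch (as.data.frame() on an rstanarm fit returns one row per post-warmup draw):

# One column per parameter, one row per kept draw
bike_draws <- as.data.frame(bike_model)
head(bike_draws, 3)
dim(bike_draws)   # 20000 x 3: 4 chains x 5000 kept iterations each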
library(rstanarm)
bike_model <- stan_glm(rides ~ temp_feel, # data model
data = bikes, # dataset
family = gaussian, # distribution to apply
prior_intercept = normal(5000, 1000), # b0_c
prior = normal(100, 40), # b1_c
prior_aux = exponential(0.0008), # sigma
chains = 4, # 4 chains
iter = 5000*2, # 10000 iterations - throw out first 5000
seed = 84735) # starting place in RNG
SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 3.6e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.36 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.126 seconds (Warm-up)
Chain 1: 0.196 seconds (Sampling)
Chain 1: 0.322 seconds (Total)
Chain 1:
SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 5e-06 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.05 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 2: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 2: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 2: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 2: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 2: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 2: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 2: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 2: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 2: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 2: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 2: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 0.152 seconds (Warm-up)
Chain 2: 0.197 seconds (Sampling)
Chain 2: 0.349 seconds (Total)
Chain 2:
SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 7e-06 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 3: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 3: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 3: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 3: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 3: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 3: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 3: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 3: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 3: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 3: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 3: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 3:
Chain 3: Elapsed Time: 0.123 seconds (Warm-up)
Chain 3: 0.191 seconds (Sampling)
Chain 3: 0.314 seconds (Total)
Chain 3:
SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 6e-06 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4:
Chain 4:
Chain 4: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 4: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 4: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 4: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 4: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 4: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 4: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 4: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 4: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 4: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 4: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 4: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 4:
Chain 4: Elapsed Time: 0.11 seconds (Warm-up)
Chain 4: 0.197 seconds (Sampling)
Chain 4: 0.307 seconds (Total)
Chain 4:
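The diagnostics below (effective sample size ratios, then R-hat values) can be obtained with the neff_ratio() and rhat() functions, which have methods for rstanarm fits:

# Effective sample size ratio for each parameter
neff_ratio(bike_model, pars = c("(Intercept)", "temp_feel", "sigma"))

# R-hat: values close to 1 indicate the chains have mixed well
rhat(bike_model, pars = c("(Intercept)", "temp_feel", "sigma"))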
Effective sample size ratios:
(Intercept)   temp_feel       sigma
    1.03285     1.03505     0.96585

R-hat values:
(Intercept)   temp_feel       sigma
  0.9998873   0.9999032   0.9999642
Quick diagnostics indicate that the resulting chains are trustworthy.
The effective sample size ratios are slightly above 1 and the R-hat values are very close to 1.
From the posterior simulation results stored in bike_model, the estimated (posterior median) model is
y = -2194.24 + 82.16x.
The add_fitted_draws() function from the tidybayes package lets us plot posterior plausible model lines simulated from bike_model.
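A sketch of that visualization, assuming the tidybayes interface named above (in newer tidybayes versions, add_fitted_draws() and its n argument have been renamed add_epred_draws() and ndraws):

library(tidybayes)
library(dplyr)
library(ggplot2)

# Overlay 50 posterior plausible model lines on the data
bikes %>%
  add_fitted_draws(bike_model, n = 50) %>%
  ggplot(aes(x = temp_feel, y = rides)) +
  geom_line(aes(y = .value, group = .draw), alpha = 0.15, color = "blue") +
  geom_point(data = bikes, size = 0.5)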
Today, we have learned basic linear regression in the Bayesian framework.
What we have learned so far about simulation and diagnostics still holds true.
Monday next week:
Wednesday next week:
Monday & Wednesday before finals:
9.9, 9.10, 9.11, 9.12
9.16, 9.17, 9.18