We have learned how to think like a Bayesian, and we have learned three conjugate families.
Once we have a posterior model, we must be able to apply the results.
f(\theta|y) = \frac{f(\theta) L(\theta|y)}{f(y)} \propto f(\theta)L(\theta|y)
f(y) = \int_{\theta_1} \int_{\theta_2} \cdots \int_{\theta_k} f(\theta) L(\theta|y) \, d\theta_k \cdots d\theta_2 \, d\theta_1
When this normalizing constant is intractable, we instead simulate a sample
\left \{ \theta^{(1)}, \theta^{(2)}, ..., \theta^{(N)} \right \}
These \theta^{(i)} will have properties that reflect those of the posterior model for \theta.
To make these ideas concrete, we will apply these simulation techniques to Beta-Binomial and Gamma-Poisson models.
Note: we will use the following packages that may be new to you:
janitor
rstan
bayesplot
If you are using the server provided by HMCSE, they have been installed for you.
If you are working at home, please check to see if you have the libraries, then install if you do not.
install.packages(c("janitor", "rstan", "bayesplot"))

Suppose there is an image that you can’t view in its entirety.
We can see snippets along a grid that sweeps from left to right across the image.
The finer the grid, the clearer the image; if the grid is fine enough, the result is a good approximation.
This is the general idea behind Bayesian grid approximation.
Our target image is the posterior pdf, f(\theta|y).
We take a simulated sample
\left\{ \theta^{(1)}, \theta^{(2)}, ..., \theta^{(N)} \right\}
from a discretized approximation of the posterior pdf, f(\theta|y).
Consider a Beta-Binomial example:
\begin{align*} Y|\pi &\sim \text{Bin}(10, \pi) \\ \pi &\sim \text{Beta}(2, 2) \end{align*}
Having observed Y = 9 successes in 10 trials, conjugacy gives the exact posterior:
\pi | (Y = 9) \sim \text{Beta}(11, 3)
Instead of using the posterior we know, let’s approximate it using grid approximation.
First step: define a discrete grid of possible \theta values, as sketched below.
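A minimal sketch of this step in R (the grid size of 6 and the name grid_data are assumptions; pi_grid holds the grid values):

# STEP 1: define a coarse grid of possible pi values (grid size is an assumption)
grid_data <- data.frame(pi_grid = seq(from = 0, to = 1, length.out = 6))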
Second step: evaluate the prior pdf, f(\theta), and the likelihood function, L(\theta|y), at each \theta grid value. We will use dbeta() and dbinom() to evaluate the \text{Beta}(2,2) prior and \text{Bin}(10, \pi) likelihood with Y=9 at each \pi in pi_grid, as in the sketch below.
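Continuing the sketch:

# STEP 2: evaluate the Beta(2, 2) prior and the Bin(10, pi)
# likelihood with Y = 9 at each pi in pi_grid
grid_data$prior <- dbeta(grid_data$pi_grid, 2, 2)
grid_data$likelihood <- dbinom(9, 10, grid_data$pi_grid)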
Third step: obtain a discrete approximation of the posterior pdf, f(\theta|y), by calculating the product f(\theta) L(\theta|y) at each \theta grid value and normalizing the products to sum to 1 across all \theta; see the sketch below.
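Completing the sketch:

# STEP 3: product of prior and likelihood, normalized to sum to 1
grid_data$unnormalized <- grid_data$prior * grid_data$likelihood
grid_data$posterior <- grid_data$unnormalized / sum(grid_data$unnormalized)

# sanity check: the discrete approximation sums to 1
sum(grid_data$posterior)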
We now have a glimpse into the actual posterior pdf.
The more possible \theta values we use, the better we can “see” the resulting posterior. What happens if we try the following: n=50, n=100, n=500, n=1000?
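With the discrete approximation in hand, we can draw the simulated sample \left\{ \theta^{(1)}, ..., \theta^{(N)} \right\} described earlier; a minimal sketch (the sample size, seed, and name post_sample are assumptions):

# sample grid values with probabilities given by the discrete posterior
set.seed(84735)
post_sample <- sample(grid_data$pi_grid, size = 10000,
                      replace = TRUE, prob = grid_data$posterior)

# compare the sample to the known Beta(11, 3) posterior
hist(post_sample, freq = FALSE, breaks = 25)
curve(dbeta(x, 11, 3), add = TRUE)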
Markov chain Monte Carlo (MCMC) is an application of Markov chains to simulate probability models.
MCMC samples are not taken directly from the posterior pdf, f(\theta | y)… and they are not independent.
Suppose we have an N-length MCMC sample, \left\{ \theta^{(1)}, \theta^{(2)}, \theta^{(3)}, ..., \theta^{(N)} \right\}
Each value is drawn from a distribution that depends on the previous value,
f\left( \theta^{(i+1)} | \theta^{(i)}, y \right),
and this conditional pdf is not the posterior pdf itself:
f\left( \theta^{(i+1)} | \theta^{(i)}, y \right) \ne f\left(\theta^{(i+1)}|y \right)
We will build our MCMC simulation with rstan. Specifying a model in rstan notation requires three blocks:
data: in our example, Y is the observed number of successes in 10 trials. We must tell rstan that Y is an integer between 0 and 10.
parameters: in our example, our model depends on \pi. We must tell rstan that \pi can be any real number between 0 and 1.
model: in our example, we need to specify Y \sim \text{Bin}(10, \pi) and \pi \sim \text{Beta}(2,2).
We then simulate the posterior with rstan's stan() function, whose key arguments are listed below (a sketch of the full model definition follows the list):
model_code: the character string defining the model (in our case, bb_model).
data: a list of the observed data.
chains: how many parallel Markov chains to run.
iter: the desired number of iterations, or length of each Markov chain.
seed: used to set the seed of the RNG.
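A sketch of the model definition consistent with the description above (the source refers to this string as bb_model):

# STEP 1: define the model
bb_model <- "
  data {
    int<lower = 0, upper = 10> Y;
  }
  parameters {
    real<lower = 0, upper = 1> pi;
  }
  model {
    Y ~ binomial(10, pi);
    pi ~ beta(2, 2);
  }
"

# STEP 2: simulate the posterior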
bb_sim <- stan(model_code = bb_model, data = list(Y = 9),
chains = 4, iter = 5000*2, seed = 84735)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 4e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.013 seconds (Warm-up)
Chain 1: 0.014 seconds (Sampling)
Chain 1: 0.027 seconds (Total)
Chain 1:
(Output for Chains 2–4 is analogous and omitted.)
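The chains are stored in a stanfit object; a sketch of how the four-row preview below can be extracted (assuming it came from a call like this):

head(as.array(bb_sim, pars = "pi"), 4)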
, , parameters = pi

          chains
iterations   chain:1   chain:2   chain:3   chain:4
      [1,] 0.8278040 0.7523560 0.6386998 0.9627572
      [2,] 0.9087546 0.8922386 0.6254761 0.9413518
      [3,] 0.6143859 0.8682356 0.7222360 0.9489624
      [4,] 0.8462156 0.8792812 0.8100192 0.9413812
Remember, these are not a random sample from the posterior!
They are also not independent!
Each chain forms a dependent Markov chain of length 5,000: \left\{ \pi^{(1)}, \pi^{(2)}, ..., \pi^{(5000)}\right\}
We can use trace plots (mcmc_trace() from the bayesplot package) to see what the values did longitudinally, and we can visualize the approximate posterior with the mcmc_hist() and mcmc_dens() functions; a sketch follows.
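A minimal sketch of these visualizations (assuming bb_sim from above):

library(bayesplot)

# trace plot: pi across iterations, one line per chain
mcmc_trace(bb_sim, pars = "pi")

# histogram and density of the combined post-warmup values
mcmc_hist(bb_sim, pars = "pi")
mcmc_dens(bb_sim, pars = "pi")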
Let’s now consider a smaller simulation, where n=50 (recall, overall n=100, but half is for burn-in). Run the following code:
bb_sim_short <- stan(model_code = bb_model, data = list(Y = 9),
chains = 4, iter = 50*2, seed = 84735)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 1e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.01 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: WARNING: There aren't enough warmup iterations to fit the
Chain 1: three stages of adaptation as currently configured.
Chain 1: Reducing each adaptation stage to 15%/75%/10% of
Chain 1: the given number of warmup iterations:
Chain 1: init_buffer = 7
Chain 1: adapt_window = 38
Chain 1: term_buffer = 5
Chain 1:
Chain 1: Iteration: 1 / 100 [ 1%] (Warmup)
Chain 1: Iteration: 10 / 100 [ 10%] (Warmup)
Chain 1: Iteration: 20 / 100 [ 20%] (Warmup)
Chain 1: Iteration: 30 / 100 [ 30%] (Warmup)
Chain 1: Iteration: 40 / 100 [ 40%] (Warmup)
Chain 1: Iteration: 50 / 100 [ 50%] (Warmup)
Chain 1: Iteration: 51 / 100 [ 51%] (Sampling)
Chain 1: Iteration: 60 / 100 [ 60%] (Sampling)
Chain 1: Iteration: 70 / 100 [ 70%] (Sampling)
Chain 1: Iteration: 80 / 100 [ 80%] (Sampling)
Chain 1: Iteration: 90 / 100 [ 90%] (Sampling)
Chain 1: Iteration: 100 / 100 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0 seconds (Warm-up)
Chain 1: 0 seconds (Sampling)
Chain 1: 0 seconds (Total)
Chain 1:
(Output for Chains 2–4, including the same warmup warning, is analogous and omitted.)
The more a dependent Markov chain behaves like an independent sample, the smaller the error in the resulting posterior approximation.
Effective sample size (N_{\text{eff}}): the number of independent sample values it would take to produce an equivalently accurate posterior approximation.
Effective sample size ratio:
\frac{N_{\text{eff}}}{N}
We will use the neff_ratio() function to find this ratio.
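neff_ratio() comes from the bayesplot package; a sketch of the call (the source does not label which simulations produced the two ratios printed below):

neff_ratio(bb_sim, pars = "pi")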
In our example data,
[1] 0.3462257
[1] 0.4950171
Autocorrelation allows us to evaluate whether our Markov chain sufficiently mimics the behavior of an independent sample.
Autocorrelation: the correlation between Markov chain values a fixed number of steps (the lag) apart; for example, the lag-1 autocorrelation measures the correlation between consecutive values.
Strong autocorrelation or dependence is a bad thing.
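We can check this visually with an autocorrelation plot; a minimal sketch using bayesplot's mcmc_acf():

# autocorrelation of the pi chain at increasing lags
mcmc_acf(bb_sim, pars = "pi")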
No obvious patterns in the trace plot; dependence is relatively weak.
Autocorrelation plot quickly drops off and is effectively 0 by lag 5.
Confirmation that our Markov chain is mixing quickly.
This is an “unhealthy” Markov chain.
Trace plot shows strong trends, indicating autocorrelation in the Markov chain values.
Slow decrease in autocorrelation plot indicates that the dependence between chain values does not quickly fade away.
To reduce the autocorrelation, we can thin our original simulation, bb_sim, to every tenth value using the thin argument in stan():

thinned_sim <- stan(model_code = bb_model, data = list(Y = 9),
chains = 4, iter = 5000*2, seed = 84735, thin = 10)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 2e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.02 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.011 seconds (Warm-up)
Chain 1: 0.013 seconds (Sampling)
Chain 1: 0.024 seconds (Total)
Chain 1:
(Output for Chains 2–4 is analogous and omitted.)
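We can then recheck the diagnostics on the thinned simulation; a sketch reusing the functions above:

mcmc_acf(thinned_sim, pars = "pi")
neff_ratio(thinned_sim, pars = "pi")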
A second diagnostic, \hat{R}, compares the variability in sampled \theta values across all chains combined to the variability within each individual chain:
\hat{R} \approx \sqrt{\frac{\text{var}_{\text{combined}}}{\text{var}_{\text{within}}}}
We will use the rhat() function from the bayesplot package to find \hat{R}. We can see that our simulation is stable: a value of \hat{R} close to 1 indicates stability.
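A sketch of the call:

rhat(bb_sim, pars = "pi")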
If we were to find \hat{R} for the other (obviously bad) simulation, it would be 5.35 😱
This material is adapted from the Bayes Rules! textbook.
STA6349 - Applied Bayesian Analysis - Fall 2025