STA6349: Applied Bayesian Analysis
We have learned how to think like a Bayesian:
We have learned three conjugate families:
Once we have a posterior model, we must be able to apply the results.
f(\theta|y) = \frac{f(\theta) L(\theta|y)}{f(y)} \propto f(\theta)L(\theta|y)
f(y) = \int_{\theta_1} \int_{\theta_2} \cdot \cdot \cdot \int_{\theta_k} f(\theta) L(\theta|y) d\theta_k \cdot \cdot \cdot d\theta_2 d\theta_1
We are going to explore two simulation techniques:
grid approximation
Markov chain Monte Carlo (MCMC)
Either method will produce a sample of N values for \theta.
\left \{ \theta^{(1)}, \theta^{(2)}, ..., \theta^{(N)} \right \}
These \theta^{(i)} values will have properties that reflect those of the posterior model for \theta.
To help us, we will apply these simulation techniques to Beta-Binomial and Gamma-Poisson models.
Note that these models do not require simulation! We know their posteriors!
That’s why we are starting there – we can link the concepts to what we know. :)
Note: we will use the following packages that may be new to you:
janitor
rstan
bayesplot
If you are using the server provided by HMCSE, they have been installed for you.
If you are working at home, please check to see if you have the libraries, then install if you do not.
install.packages(c("janitor", "rstan", "bayesplot"))
Suppose there is an image that you can’t view in its entirety.
We can see snippets along a grid that sweeps from left to right across the image.
The finer the grid, the clearer the image; if the grid is fine enough, the result is a good approximation.
This is the general idea behind Bayesian grid approximation.
Our target image is the posterior pdf, f(\theta|y).
It is not necessary to evaluate f(\theta|y) at every possible \theta for us to understand its structure.
Instead, we evaluate f(\theta|y) at a finite, discrete grid of possible \theta values.
Then, we take random samples from this discretized pdf to approximate the full posterior pdf.
Grid approximation produces a sample of N independent \theta values, \left\{ \theta^{(1)}, \theta^{(2)}, ..., \theta^{(N)} \right\}, from a discretized approximation of the posterior pdf, f(\theta|y).
Algorithm:
Define a discrete grid of possible \theta values.
Evaluate the prior pdf, f(\theta), and the likelihood function, L(\theta|y), at each \theta grid value.
Obtain a discrete approximation of the posterior pdf, f(\theta|y), by:
(a) Calculating the product f(\theta) L(\theta|y) at each \theta grid value,
(b) Normalizing the products from (a) to sum to 1 across all \theta.
Randomly sample N \theta grid values with respect to their corresponding normalized posterior probabilities.
\begin{align*} Y|\pi &\sim \text{Bin}(10, \pi) \\ \pi &\sim \text{Beta}(2, 2) \end{align*}
Note that
Y is the number of successes in 10 independent trials.
Every trial has probability of success, \pi.
Our prior understanding about \pi is captured by a \text{Beta}(2,2) model.
If we observe Y = 9 successes, we know that the updated posterior model for \pi is \text{Beta}(11, 3).
Y + \alpha = 9 + 2 = 11
n - Y + \beta = 10 - 9 + 2 = 3
Instead of using the posterior we know, let’s approximate it using grid approximation; a sketch of these steps in R appears after this walkthrough.
First step: define a discrete grid of possible \theta values.
Second step: evaluate the prior pdf, f(\theta), and the likelihood function, L(\theta|y), at each \theta grid value. We use dbeta() and dbinom() to evaluate the \text{Beta}(2,2) prior and the \text{Bin}(10, \pi) likelihood with Y = 9 at each \pi in pi_grid.
Third step: obtain a discrete approximation of the posterior pdf, f(\theta|y), by calculating the product f(\theta) L(\theta|y) at each \theta grid value and normalizing the products to sum to 1 across all \theta.
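For reference, here is a minimal base-R sketch of the four steps for the \text{Beta}(2,2) prior with Y = 9 successes in n = 10 trials. The grid size of 100 and all object names other than pi_grid are illustrative assumptions; the slides name only pi_grid, dbeta(), and dbinom().

# Step 1: define a discrete grid of possible pi values
pi_grid <- seq(from = 0, to = 1, length = 100)

# Step 2: evaluate the Beta(2, 2) prior and the Bin(10, pi) likelihood (Y = 9)
prior <- dbeta(pi_grid, 2, 2)
likelihood <- dbinom(9, size = 10, prob = pi_grid)

# Step 3: take the product and normalize it to sum to 1 across the grid
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)

# Step 4: sample grid values in proportion to their posterior probabilities
set.seed(84735)
post_sample <- sample(pi_grid, size = 10000, replace = TRUE, prob = posterior)

A histogram of post_sample should resemble the known \text{Beta}(11, 3) posterior; a finer grid (e.g., 500 or 1,000 values) sharpens the approximation.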
We now have a glimpse into the actual posterior pdf.
The more possible \theta values we include in the grid, the better we can “see” the resulting posterior.
What happens if we try the following grid sizes: n=50, n=100, n=500, n=1000?
Markov chain Monte Carlo (MCMC) is an application of Markov chains to simulate probability models.
MCMC samples are not taken directly from the posterior pdf, f(\theta | y)… and they are not independent.
Suppose we have an N-length MCMC sample, \left\{ \theta^{(1)}, \theta^{(2)}, \theta^{(3)}, ..., \theta^{(N)} \right\}
\theta^{(2)} is drawn from a model that depends on \theta^{(1)}.
\theta^{(3)} is drawn from a model that depends on \theta^{(2)}.
etc.
f\left( \theta^{(i+1)} | \theta^{(i)}, y \right)
f\left( \theta^{(i+1)} | \theta^{(i)}, y \right) \ne f\left(\theta^{(i+1)}|y \right)
We will use rstan to:
define the Bayesian model structure in rstan notation
simulate the posterior
Again, we will use the Beta-Binomial model from earlier.
We define the model in three blocks:
data: in our example, Y is the observed number of successes in 10 trials. We tell rstan that Y is an integer between 0 and 10.
parameters: in our example, our model depends on \pi. We tell rstan that \pi can be any real number between 0 and 1.
model: in our example, we need to specify Y \sim \text{Bin}(10, \pi) and \pi \sim \text{Beta}(2,2).
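The later code refers to a character string, bb_model, that holds this definition. The exact string is not shown in the slides; here is a plausible sketch written in standard Stan syntax, following the data, parameters, and model descriptions above:

# STEP 1: define the model (a sketch; the exact bb_model string is assumed)
bb_model <- "
  data {
    int<lower = 0, upper = 10> Y;
  }
  parameters {
    real<lower = 0, upper = 1> pi;
  }
  model {
    Y ~ binomial(10, pi);
    pi ~ beta(2, 2);
  }
"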
Then, when we go to simulate, we first put in the model information:
model_code: the character string defining the model (in our case, bb_model).
data: a list of the observed data.
Then, we put in the Markov chain information:
chains: how many parallel Markov chains to run.
iter: the desired number of iterations, or length of each Markov chain. Half are thrown out as “burn-in” samples.
seed: used to set the seed of the RNG.
# STEP 2: simulate the posterior
bb_sim <- stan(model_code = bb_model, data = list(Y = 9),
chains = 4, iter = 5000*2, seed = 84735)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 4e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.011 seconds (Warm-up)
Chain 1: 0.012 seconds (Sampling)
Chain 1: 0.023 seconds (Total)
Chain 1:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 2: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 2: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 2: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 2: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 2: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 2: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 2: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 2: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 2: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 2: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 2: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 0.011 seconds (Warm-up)
Chain 2: 0.012 seconds (Sampling)
Chain 2: 0.023 seconds (Total)
Chain 2:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 0 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 3: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 3: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 3: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 3: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 3: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 3: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 3: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 3: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 3: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 3: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 3: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 3:
Chain 3: Elapsed Time: 0.011 seconds (Warm-up)
Chain 3: 0.011 seconds (Sampling)
Chain 3: 0.022 seconds (Total)
Chain 3:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 0 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4:
Chain 4:
Chain 4: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 4: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 4: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 4: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 4: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 4: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 4: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 4: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 4: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 4: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 4: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 4: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 4:
Chain 4: Elapsed Time: 0.011 seconds (Warm-up)
Chain 4: 0.012 seconds (Sampling)
Chain 4: 0.023 seconds (Total)
Chain 4:
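The glimpse of simulated \pi values shown below can be produced with a call along these lines (the use of as.array() with head() is an assumption; the slides show only the output):

# first four iterations of pi from each of the four chains
head(as.array(bb_sim, pars = "pi"), n = 4)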
, , parameters = pi
chains
iterations chain:1 chain:2 chain:3 chain:4
[1,] 0.8278040 0.7523560 0.6386998 0.9627572
[2,] 0.9087546 0.8922386 0.6254761 0.9413518
[3,] 0.6143859 0.8682356 0.7222360 0.9489624
[4,] 0.8462156 0.8792812 0.8100192 0.9413812
Remember, these are not a random sample from the posterior!
They are also not independent!
Each chain forms a dependent Markov chain of length 5,000: \left\{ \pi^{(1)}, \pi^{(2)}, ..., \pi^{(5000)}\right\}
We can use the mcmc_trace() function from the bayesplot package to see what the chain values did longitudinally; a sketch is below.
We can visualize the resulting posterior approximation with the mcmc_hist() and mcmc_dens() functions.
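A minimal sketch of these calls, assuming the bb_sim object from above (the pars argument is an assumption):

library(bayesplot)
# trace plot: the chain values plotted against iteration number
mcmc_trace(bb_sim, pars = "pi")
# histogram and density estimate of the posterior approximation
mcmc_hist(bb_sim, pars = "pi")
mcmc_dens(bb_sim, pars = "pi")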
Simulations are not perfect…
What does a good Markov chain look like?
How can we tell if the Markov chain sample produces a reasonable approximation of the posterior?
How big should our Markov chain sample size be?
Unfortunately there is no one answer here.
Let’s now discuss diagnostic tools.
Trace plots
Parallel chains
Effective sample size
Autocorrelation
\hat{R}
Chain A has not stabilized after 5000 iterations.
It has not “found” or does not know how to explore the range of posterior plausible \pi values.
The downward trend also hints against independent noise.
We say that Chain A is mixing slowly.
Realistically, we are only going to do simulations when we can’t specify the posterior and must approximate it.
If we see bad trace plots:
Check the model (… or your code). Are the assumed prior and data models appropriate?
Run the chain for more iterations. Sometimes we just need a longer run to iron out issues.
bb_sim_short <- stan(model_code = bb_model, data = list(Y = 9),
chains = 4, iter = 50*2, seed = 84735)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 2e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.02 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: WARNING: There aren't enough warmup iterations to fit the
Chain 1: three stages of adaptation as currently configured.
Chain 1: Reducing each adaptation stage to 15%/75%/10% of
Chain 1: the given number of warmup iterations:
Chain 1: init_buffer = 7
Chain 1: adapt_window = 38
Chain 1: term_buffer = 5
Chain 1:
Chain 1: Iteration: 1 / 100 [ 1%] (Warmup)
Chain 1: Iteration: 10 / 100 [ 10%] (Warmup)
Chain 1: Iteration: 20 / 100 [ 20%] (Warmup)
Chain 1: Iteration: 30 / 100 [ 30%] (Warmup)
Chain 1: Iteration: 40 / 100 [ 40%] (Warmup)
Chain 1: Iteration: 50 / 100 [ 50%] (Warmup)
Chain 1: Iteration: 51 / 100 [ 51%] (Sampling)
Chain 1: Iteration: 60 / 100 [ 60%] (Sampling)
Chain 1: Iteration: 70 / 100 [ 70%] (Sampling)
Chain 1: Iteration: 80 / 100 [ 80%] (Sampling)
Chain 1: Iteration: 90 / 100 [ 90%] (Sampling)
Chain 1: Iteration: 100 / 100 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0 seconds (Warm-up)
Chain 1: 0 seconds (Sampling)
Chain 1: 0 seconds (Total)
Chain 1:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: WARNING: There aren't enough warmup iterations to fit the
Chain 2: three stages of adaptation as currently configured.
Chain 2: Reducing each adaptation stage to 15%/75%/10% of
Chain 2: the given number of warmup iterations:
Chain 2: init_buffer = 7
Chain 2: adapt_window = 38
Chain 2: term_buffer = 5
Chain 2:
Chain 2: Iteration: 1 / 100 [ 1%] (Warmup)
Chain 2: Iteration: 10 / 100 [ 10%] (Warmup)
Chain 2: Iteration: 20 / 100 [ 20%] (Warmup)
Chain 2: Iteration: 30 / 100 [ 30%] (Warmup)
Chain 2: Iteration: 40 / 100 [ 40%] (Warmup)
Chain 2: Iteration: 50 / 100 [ 50%] (Warmup)
Chain 2: Iteration: 51 / 100 [ 51%] (Sampling)
Chain 2: Iteration: 60 / 100 [ 60%] (Sampling)
Chain 2: Iteration: 70 / 100 [ 70%] (Sampling)
Chain 2: Iteration: 80 / 100 [ 80%] (Sampling)
Chain 2: Iteration: 90 / 100 [ 90%] (Sampling)
Chain 2: Iteration: 100 / 100 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 0 seconds (Warm-up)
Chain 2: 0 seconds (Sampling)
Chain 2: 0 seconds (Total)
Chain 2:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 0 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: WARNING: There aren't enough warmup iterations to fit the
Chain 3: three stages of adaptation as currently configured.
Chain 3: Reducing each adaptation stage to 15%/75%/10% of
Chain 3: the given number of warmup iterations:
Chain 3: init_buffer = 7
Chain 3: adapt_window = 38
Chain 3: term_buffer = 5
Chain 3:
Chain 3: Iteration: 1 / 100 [ 1%] (Warmup)
Chain 3: Iteration: 10 / 100 [ 10%] (Warmup)
Chain 3: Iteration: 20 / 100 [ 20%] (Warmup)
Chain 3: Iteration: 30 / 100 [ 30%] (Warmup)
Chain 3: Iteration: 40 / 100 [ 40%] (Warmup)
Chain 3: Iteration: 50 / 100 [ 50%] (Warmup)
Chain 3: Iteration: 51 / 100 [ 51%] (Sampling)
Chain 3: Iteration: 60 / 100 [ 60%] (Sampling)
Chain 3: Iteration: 70 / 100 [ 70%] (Sampling)
Chain 3: Iteration: 80 / 100 [ 80%] (Sampling)
Chain 3: Iteration: 90 / 100 [ 90%] (Sampling)
Chain 3: Iteration: 100 / 100 [100%] (Sampling)
Chain 3:
Chain 3: Elapsed Time: 0 seconds (Warm-up)
Chain 3: 0 seconds (Sampling)
Chain 3: 0 seconds (Total)
Chain 3:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 0 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4:
Chain 4:
Chain 4: WARNING: There aren't enough warmup iterations to fit the
Chain 4: three stages of adaptation as currently configured.
Chain 4: Reducing each adaptation stage to 15%/75%/10% of
Chain 4: the given number of warmup iterations:
Chain 4: init_buffer = 7
Chain 4: adapt_window = 38
Chain 4: term_buffer = 5
Chain 4:
Chain 4: Iteration: 1 / 100 [ 1%] (Warmup)
Chain 4: Iteration: 10 / 100 [ 10%] (Warmup)
Chain 4: Iteration: 20 / 100 [ 20%] (Warmup)
Chain 4: Iteration: 30 / 100 [ 30%] (Warmup)
Chain 4: Iteration: 40 / 100 [ 40%] (Warmup)
Chain 4: Iteration: 50 / 100 [ 50%] (Warmup)
Chain 4: Iteration: 51 / 100 [ 51%] (Sampling)
Chain 4: Iteration: 60 / 100 [ 60%] (Sampling)
Chain 4: Iteration: 70 / 100 [ 70%] (Sampling)
Chain 4: Iteration: 80 / 100 [ 80%] (Sampling)
Chain 4: Iteration: 90 / 100 [ 90%] (Sampling)
Chain 4: Iteration: 100 / 100 [100%] (Sampling)
Chain 4:
Chain 4: Elapsed Time: 0 seconds (Warm-up)
Chain 4: 0 seconds (Sampling)
Chain 4: 0 seconds (Total)
Chain 4:
The more a dependent Markov chain behaves like an independent sample, the smaller the error in the resulting posterior approximation.
Effective sample size (N_{\text{eff}}): the number of independent sample values it would take to produce an equivalently accurate posterior approximation.
Effective sample size ratio:
\frac{N_{\text{eff}}}{N}
We will use the neff_ratio() function from the bayesplot package to find this ratio; a sketch is below.
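A minimal sketch of the call, assuming the bb_sim fit from above (the pars argument is an assumption):

library(bayesplot)
# effective sample size ratio for pi
neff_ratio(bb_sim, pars = "pi")
# a similar call can be made for other fits, e.g., bb_sim_short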
In our example data,
[1] 0.3462257
[1] 0.4950171
Autocorrelation allows us to evaluate if our Markov chain sufficiently mimics the behavior of an independent sample.
Autocorrelation:
Lag 1 autocorrelation measures the correlation between pairs of Markov chain values that are one “step” apart (e.g., \pi^{(i)} and \pi^{(i-1)}; e.g., \pi^{(4)} and \pi^{(3)}).
Lag 2 autocorrelation measures the correlation between pairs of Markov chain values that are two “steps” apart (e.g., \pi^{(i)} and \pi^{(i-2)}; e.g., \pi^{(4)} and \pi^{(2)}).
Lag k autocorrelation measures the correlation between pairs of Markov chain values that are k “steps” apart (e.g., \pi^{(i)} and \pi^{(i-k)}; e.g., \pi^{(4)} and \pi^{(4-k)}).
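The autocorrelation plots discussed below can be drawn with bayesplot’s mcmc_acf(); a minimal sketch, assuming bb_sim from above (the pars argument is an assumption):

library(bayesplot)
# autocorrelation of the pi chain values at increasing lags
mcmc_acf(bb_sim, pars = "pi")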
Strong autocorrelation or dependence is a bad thing.
It goes hand in hand with small effective sample size ratios.
These provide a warning sign that our resulting posterior approximations might be unreliable.
No obvious patterns in the trace plot; dependence is relatively weak.
Autocorrelation plot quickly drops off and is effectively 0 by lag 5.
Confirmation that our Markov chain is mixing quickly.
i.e., quickly moving around the range of posterior plausible \pi values
i.e., at least mimicking an independent sample.
This is an “unhealthy” Markov chain.
The trace plot shows strong trends, which indicate autocorrelation in the Markov chain values.
Slow decrease in autocorrelation plot indicates that the dependence between chain values does not quickly fade away.
Fast mixing chains:
The chains move “quickly” around the range of posterior plausible values
The autocorrelation among the chain values drops off quickly.
The effective sample size ratio is reasonably large.
Slow mixing chains:
The chains move “slowly” around the range of posterior plausible values.
The autocorrelation among the chain values drops off very slowly.
The effective sample size ratio is small.
What do we do if we have a slow mixing chain?
Increase the chain size :)
Thin the Markov chain :|
We can thin our simulation, bb_sim, to every tenth value using the thin argument in stan().
thinned_sim <- stan(model_code = bb_model, data = list(Y = 9),
chains = 4, iter = 5000*2, seed = 84735, thin = 10)
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 2e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.02 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.009 seconds (Warm-up)
Chain 1: 0.01 seconds (Sampling)
Chain 1: 0.019 seconds (Total)
Chain 1:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 2: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 2: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 2: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 2: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 2: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 2: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 2: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 2: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 2: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 2: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 2: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 0.009 seconds (Warm-up)
Chain 2: 0.011 seconds (Sampling)
Chain 2: 0.02 seconds (Total)
Chain 2:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 0 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 3: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 3: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 3: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 3: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 3: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 3: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 3: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 3: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 3: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 3: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 3: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 3:
Chain 3: Elapsed Time: 0.009 seconds (Warm-up)
Chain 3: 0.009 seconds (Sampling)
Chain 3: 0.018 seconds (Total)
Chain 3:
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 0 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4:
Chain 4:
Chain 4: Iteration: 1 / 10000 [ 0%] (Warmup)
Chain 4: Iteration: 1000 / 10000 [ 10%] (Warmup)
Chain 4: Iteration: 2000 / 10000 [ 20%] (Warmup)
Chain 4: Iteration: 3000 / 10000 [ 30%] (Warmup)
Chain 4: Iteration: 4000 / 10000 [ 40%] (Warmup)
Chain 4: Iteration: 5000 / 10000 [ 50%] (Warmup)
Chain 4: Iteration: 5001 / 10000 [ 50%] (Sampling)
Chain 4: Iteration: 6000 / 10000 [ 60%] (Sampling)
Chain 4: Iteration: 7000 / 10000 [ 70%] (Sampling)
Chain 4: Iteration: 8000 / 10000 [ 80%] (Sampling)
Chain 4: Iteration: 9000 / 10000 [ 90%] (Sampling)
Chain 4: Iteration: 10000 / 10000 [100%] (Sampling)
Chain 4:
Chain 4: Elapsed Time: 0.009 seconds (Warm-up)
Chain 4: 0.01 seconds (Sampling)
Chain 4: 0.019 seconds (Total)
Chain 4:
Warning!
The benefits of reduced autocorrelation do not necessarily outweigh the loss of chain values.
i.e., 5,000 Markov chain values with stronger autocorrelation may be a better posterior approximation than 500 chain values with weaker autocorrelation.
The effectiveness depends on the algorithm used to construct the Markov chain.
\hat{R} \approx \sqrt{\frac{\text{var}_{\text{combined}}}{\text{var}_{\text{within}}}}
where
\text{var}_{\text{combined}} is the variability in \theta across all chains combined.
\text{var}_{\text{within}} is the typical variability within any individual chain.
\hat{R} compares the variability in sampled \theta values across all chains combined with the variability within each individual chain.
Ideally, \hat{R} \approx 1, showing stability across chains.
\hat{R} > 1 indicates instability, with the variability in the combined chains larger than the variability within each individual chain.
\hat{R} > 1.05 raises red flags about the stability of the simulation.
We will use the rhat() function from the bayesplot package to find \hat{R}; a sketch is below.
We can see that our simulation is stable.
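A minimal sketch of the call, assuming bb_sim from above (the pars argument is an assumption):

library(bayesplot)
# R-hat for pi, comparing variability across and within the four chains
rhat(bb_sim, pars = "pi")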
If we were to find \hat{R} for the other (obviously bad) simulation, it would be 5.35 😱
6.5
6.6
6.7
6.13
6.14
6.15
6.17