Poisson and Negative Binomial Regressions

Introduction

  • Continuous outcome: linear regression

\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + \cdots + \hat{\beta}_k x_{ik}

  • Binary outcome: logistic regression

\ln\left(\frac{\hat{\pi}_i}{1-\hat{\pi}_i}\right) = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + \cdots + \hat{\beta}_k x_{ik}

  • Count outcome: Poisson or negative binomial regression

\ln(\hat{y}_i) = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + \cdots + \hat{\beta}_k x_{ik}
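  • Because the count models use a log link, exponentiating both sides shows that each coefficient acts multiplicatively on the expected count: a one-unit increase in a predictor multiplies the expected count by e raised to that predictor's coefficient.

\hat{y}_i = e^{\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik}} = e^{\hat{\beta}_0} \left(e^{\hat{\beta}_1}\right)^{x_{i1}} \cdots \left(e^{\hat{\beta}_k}\right)^{x_{ik}}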

Poisson Regression

  • Generic code for Poisson regression:
posterior_model <- stan_glm(y ~ x1 + x2 + ... + xk, 
                            data = dataset_name, 
                            family = poisson,
                            prior_intercept = normal(0, 1, autoscale = TRUE),
                            prior = normal(0, 1, autoscale = TRUE), 
                            chains = 4, iter = 5000*2, seed = 84735, 
                            prior_PD = FALSE)
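  • The examples below assume a few packages are loaded (a minimal setup sketch): rstanarm for stan_glm() and the normal() priors, broom.mixed for tidy() on Stan fits, and the tidyverse for the pipe and ggplot().
library(rstanarm)     # stan_glm(), normal() priors, poisson / neg_binomial_2 families
library(broom.mixed)  # tidy() method for stanreg model objects
library(tidyverse)    # %>%, mutate(), ggplot(), etc.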

Example: Set Up

  • After a few successful rescue missions, Mickey Mouse and his friends noticed something strange: every time they set out on a new mission, the number of alert beacons that went off around the clubhouse varied. Sometimes the team moved silently, but other times the entire operation echoed with crashes, clanks, and Donald’s famous quacking. To understand what’s going on, the team started tracking three pieces of information for each mission:

    • alerts_triggered: the number of alert beacons that went off during the mission (possible values: 0, 1, 2, 3, …).
    • character: which character led the mission (Mickey, Minnie, Donald, or Goofy).
    • stealth_score: how carefully the team tried to move on a scale from 0 to 10, where higher values mean a quieter, more careful mission.

Example: Poisson

poisson_posterior <- stan_glm(alerts_triggered ~ character + stealth_score, 
                            data = friends, 
                            family = poisson,
                            prior_intercept = normal(0, 1, autoscale = TRUE),
                            prior = normal(0, 1, autoscale = TRUE), 
                            chains = 4, iter = 5000*2, seed = 84735, 
                            prior_PD = FALSE)

SAMPLING FOR MODEL 'count' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 5.1e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.51 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.15 seconds (Warm-up)
Chain 1:                0.172 seconds (Sampling)
Chain 1:                0.322 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'count' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 8e-06 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 2: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 2: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 2: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 2: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 2: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 2: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 2: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 2: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 2: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 2: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 2: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.147 seconds (Warm-up)
Chain 2:                0.171 seconds (Sampling)
Chain 2:                0.318 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'count' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 6e-06 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 3: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 3: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 3: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 3: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 3: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 3: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 3: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 3: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 3: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 3: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 3: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 0.15 seconds (Warm-up)
Chain 3:                0.17 seconds (Sampling)
Chain 3:                0.32 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'count' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 6e-06 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 4: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 4: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 4: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 4: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 4: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 4: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 4: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 4: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 4: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 4: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 4: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 0.144 seconds (Warm-up)
Chain 4:                0.181 seconds (Sampling)
Chain 4:                0.325 seconds (Total)
Chain 4: 

Example: Poisson

  • Examining the results,
tidy(poisson_posterior, conf.int = TRUE, exponentiate = FALSE)

\ln(\hat{y}) = 1.48 + 0.51 \text{ Donald} + 0.26 \text{ Goofy} - 0.58 \text{ Minnie} - 0.24 \text{ stealth}
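  • For example, for a Donald-led mission with a stealth score of 5 (an illustrative value), the model predicts about 2.2 alerts:

\ln(\hat{y}) = 1.48 + 0.51 - 0.24(5) = 0.79, \qquad \hat{y} = e^{0.79} \approx 2.2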

Example: Poisson

  • We are interested in the multiplicative effect on the expected count, so we exponentiate the coefficients and credible intervals:
tidy(poisson_posterior, conf.int = TRUE, exponentiate = TRUE)
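  • Equivalently, exponentiating the log-scale estimates by hand reproduces the IRRs reported in the table below (values from the fit above):
exp(0.51)    # ~ 1.67: Donald's IRR relative to Mickey
exp(-0.235)  # ~ 0.79: multiplicative change per one-unit increase in stealth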

Example: Poisson

  • How would I report the results in a table?
Predictor   IRR (95% CI)
Donald      1.67 (1.21, 2.30)
Goofy       1.30 (0.95, 1.80)
Minnie      0.56 (0.36, 0.85)
Stealth     0.79 (0.75, 0.84)

Example: Poisson

  • Incidence rate ratio (IRR): the multiplicative effect on the expected count for a one-unit increase in the predictor (for a categorical predictor, comparing that group to the reference level), holding all other predictors constant.

  • As compared to Mickey,

    • Donald is expected to trigger 1.67 times as many alerts (67% more).
    • Goofy is expected to trigger 1.30 times as many alerts (30% more).
    • Minnie is expected to trigger 0.56 times as many alerts (44% fewer).
  • As stealth score increases, the expected number of alerts triggered decreases.

    • For a one-unit increase in stealth score, the expected number of alerts triggered decreases by about 21% (see the quick conversion below).
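  • Converting an IRR to a percent change is just (IRR - 1) * 100; a quick check with the values above:
(0.79 - 1) * 100   # stealth: about a 21% decrease per one-unit increase
(1.67 - 1) * 100   # Donald: about 67% more alerts than Mickey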

Example: Poisson

  • To visualize our model, we first find predicted values,
# Expected alert counts by character, exponentiating the fitted linear predictor
friends <- friends %>%
  mutate(p_mickey = exp(1.48 - 0.235*stealth_score),
         p_donald = exp(1.48 + 0.51 - 0.235*stealth_score),
         p_goofy  = exp(1.48 + 0.26 - 0.235*stealth_score),
         p_minnie = exp(1.48 - 0.58 - 0.235*stealth_score))
  • Then, construct the ggplot,
friends %>% ggplot(aes(x = stealth_score, y = alerts_triggered)) +
  geom_point(size = 3) +
  geom_line(aes(y = p_mickey, color = "Mickey")) +
  geom_line(aes(y = p_donald, color = "Donald")) +
  geom_line(aes(y = p_goofy,  color = "Goofy")) +
  geom_line(aes(y = p_minnie, color = "Minnie")) +
  labs(x = "Stealth Score",
       y = "Expected Number of Alerts Triggered",
       color = "Character") +
  theme_bw()
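  • The same curves can be rebuilt without retyping rounded numbers by pulling point estimates (posterior medians) from the fit. A sketch, assuming character enters the model as a factor with Mickey as the reference level (so the coefficients are named characterDonald, characterGoofy, and characterMinnie):
b <- coef(poisson_posterior)   # named vector of posterior point estimates
friends <- friends %>%
  mutate(p_mickey = exp(b["(Intercept)"] + b["stealth_score"] * stealth_score),
         p_donald = p_mickey * exp(b["characterDonald"]),
         p_goofy  = p_mickey * exp(b["characterGoofy"]),
         p_minnie = p_mickey * exp(b["characterMinnie"]))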

Example: Poisson

Negative Binomial Regression

  • Generic code for negative binomial regression:
posterior_model <- stan_glm(y ~ x1 + x2 + ... + xk, 
                            data = dataset_name, 
                            family = neg_binomial_2,
                            prior_intercept = normal(0, 1, autoscale = TRUE),
                            prior = normal(0, 1, autoscale = TRUE), 
                            chains = 4, iter = 5000*2, seed = 84735, 
                            prior_PD = FALSE)

Example: Set Up

  • After weeks of studying the clubhouse rescue missions, Mickey and friends have noticed a problem: some missions are wildly unpredictable. When Donald Duck takes charge, the number of alert beacons that go off during a mission varies dramatically. On some days, his missions are surprisingly quiet. On others, Donald’s shouting and tripping set off nearly every beacon in the clubhouse!
friends2 %>% summarize(mean(alerts_triggered),
                       var(alerts_triggered))
  • The variance of alerts_triggered is much larger than its mean, violating the Poisson assumption that the mean and variance are equal; this overdispersion points toward a negative binomial model.
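  • A quick by-character summary makes the overdispersion visible (a sketch; a variance-to-mean ratio well above 1 is a red flag for the Poisson assumption):
friends2 %>%
  group_by(character) %>%
  summarize(mean_alerts = mean(alerts_triggered),
            var_alerts  = var(alerts_triggered),
            ratio       = var_alerts / mean_alerts)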

Example: Negative Binomial

negbin_posterior <- stan_glm(alerts_triggered ~ character + stealth_score, 
                             data = friends2, 
                             family = neg_binomial_2,
                             prior_intercept = normal(0, 1, autoscale = TRUE),
                             prior = normal(0, 1, autoscale = TRUE), 
                             chains = 4, iter = 5000*2, seed = 84735, 
                             prior_PD = FALSE)

SAMPLING FOR MODEL 'count' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 5.8e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.58 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.316 seconds (Warm-up)
Chain 1:                0.462 seconds (Sampling)
Chain 1:                0.778 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'count' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 1.1e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 2: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 2: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 2: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 2: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 2: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 2: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 2: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 2: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 2: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 2: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 2: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.323 seconds (Warm-up)
Chain 2:                0.384 seconds (Sampling)
Chain 2:                0.707 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'count' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 1.2e-05 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 3: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 3: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 3: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 3: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 3: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 3: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 3: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 3: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 3: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 3: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 3: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 0.32 seconds (Warm-up)
Chain 3:                0.418 seconds (Sampling)
Chain 3:                0.738 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'count' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 1.3e-05 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.13 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 4: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 4: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 4: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 4: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 4: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 4: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 4: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 4: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 4: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 4: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 4: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 0.306 seconds (Warm-up)
Chain 4:                0.367 seconds (Sampling)
Chain 4:                0.673 seconds (Total)
Chain 4: 

Example: Negative Binomial

  • Examining the results,
tidy(negbin_posterior, conf.int = TRUE, exponentiate = FALSE)

\ln(\hat{y}) = 1.39 - 0.10 \text{ Donald} + 0.08 \text{ Goofy} - 1.20 \text{ Minnie} - 0.15 \text{ stealth}

Example: Negative Binomial

  • We are interested in the multiplicative effect on the expected count, so we exponentiate the coefficients and credible intervals:
tidy(negbin_posterior, conf.int = TRUE, exponentiate = TRUE)
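  • As before, exponentiating the log-scale estimates by hand matches the IRRs reported in the table below:
exp(-1.20)  # ~ 0.30: Minnie's IRR relative to Mickey
exp(-0.15)  # ~ 0.86: multiplicative change per one-unit increase in stealth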

Example: Negative Binomial

  • How would I report the results in a table?
Predictor   IRR (95% CI)
Donald      0.91 (0.60, 1.37)
Goofy       1.08 (0.71, 1.63)
Minnie      0.30 (0.18, 0.48)
Stealth     0.86 (0.80, 0.93)

Example: Negative Binomial

  • Incidence rate ratio (IRR): the multiplicative effect on the expected count for a one-unit increase in the predictor (for a categorical predictor, comparing that group to the reference level), holding all other predictors constant.

  • As compared to Mickey,

    • Donald is expected to trigger 0.91 times as many alerts (9% fewer).
    • Goofy is expected to trigger 1.08 times as many alerts (8% more).
    • Minnie is expected to trigger 0.30 times as many alerts (70% fewer).
  • As stealth score increases, the expected number of alerts triggered decreases.

    • For a one-unit increase in stealth score, the expected number of alerts triggered decreases by about 14%.

Example: Negative Binomial

  • To visualize our model, we first find predicted values,

\ln(\hat{y}) = 1.39 - 0.10 \text{ Donald} + 0.08 \text{ Goofy} - 1.20 \text{ Minnie} - 0.15 \text{ stealth}

# Expected alert counts by character, exponentiating the fitted linear predictor
friends2 <- friends2 %>%
  mutate(p_mickey = exp(1.39 - 0.15*stealth_score),
         p_donald = exp(1.39 - 0.10 - 0.15*stealth_score),
         p_goofy  = exp(1.39 + 0.08 - 0.15*stealth_score),
         p_minnie = exp(1.39 - 1.20 - 0.15*stealth_score))
  • Then, construct the ggplot,
friends2 %>% ggplot(aes(x = stealth_score, y = alerts_triggered)) +
  geom_point(size = 3) +
  geom_line(aes(y = p_mickey, color = "Mickey")) +
  geom_line(aes(y = p_donald, color = "Donald")) +
  geom_line(aes(y = p_goofy,  color = "Goofy")) +
  geom_line(aes(y = p_minnie, color = "Minnie")) +
  labs(x = "Stealth Score",
       y = "Expected Number of Alerts Triggered",
       color = "Character") +
  theme_bw()

Example: Negative Binomial

Wrap Up

  • Poisson and negative binomial regressions are used to model count data.
    • Poisson: mean = variance.
    • Negative binomial: variance > mean (overdispersion).
  • Coefficients on the log scale are log incidence rate ratios (log-IRRs).
    • Exponentiating the coefficients (and interval endpoints) gives incidence rate ratios (IRRs).
  • Rest of lecture:
    • How to “read” subject-matter articles to extract information for analysis.
      • Worksheet due Tuesday next week.
  • Next week:
    • Monday: Assignment 3 (Regression Analysis)
    • Wednesday: Meet with EES collaborator (no formal class). Discuss methodology and their research objective(s).

Practice / Homework

  • From the Bayes Rules! textbook:

    • 12.5, 12.6, 12.7, 12.8, 12.9