Negative Binomial Regression

Introduction

  • We have previously discussed continuous outcomes:

    • y \sim N(\mu, \sigma^2)
    • y \sim \text{gamma}(\mu, \gamma)
    • y \sim \text{beta}(\alpha, \beta)
  • We have also discussed categorical outcomes:

    • y \sim \text{binomial}(n, \pi)
    • y \sim \text{multinomial}(n, [\pi_1, ..., \pi_c])
  • This week, we will consider count outcomes:

    • Poisson
    • Negative binomial

Negative Binomial Regression

  • We can model count outcomes using negative binomial regression,

\ln\left( y \right) = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k

  • This is similar to Poisson regression.

    • For interpretations, we are not interested in the natural log of the count.
    • We will again interpret in terms of incident rate ratios (IRR).

Negative Binomial Regression

  • Negative binomial regression is used specifically for count response variables.

    • Examples:

      • Number of votes received
      • Number of attempts before success
  • How is this different from Poisson regression?

Negative Binomial Regression

  • How is this different from Poisson regression?

    • … in some ways it is, and in some ways it isn’t.
  • Poisson regression assumes that the mean and variance of the count variable are equal (equidispersion).

    • When we have overdispersion (variance > mean), negative binomial regression can be a better fit.
  • Negative binomial regression includes an additional parameter, \theta, to account for overdispersion.

    • As \theta \to \infty, the negative binomial distribution approaches the Poisson distribution.
    • As \theta \to 0, the variance increases, allowing for more overdispersion.
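Under the \mu/\theta parameterization used by MASS::glm.nb(), the variance of the negative binomial is \mu + \mu^2/\theta. A quick sketch of how \theta controls the spread (the function name nb_variance is just for illustration):

```r
# Negative binomial variance under the mu/theta parameterization:
# Var(Y) = mu + mu^2 / theta
nb_variance <- function(mu, theta) mu + mu^2 / theta

mu <- 10
nb_variance(mu, theta = 1)     # 110 -- far more spread than the Poisson
nb_variance(mu, theta = 100)   # 11  -- close to the Poisson variance (mu = 10)
```

Note that the Poisson variance here would be exactly \mu = 10, so large \theta recovers (approximate) equidispersion.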

Negative Binomial Distribution

Negative binomial distribution with varying μ and θ

Poisson or Negative Binomial?

  • How do I determine which to use?

  • Easy check: mean and variance.

    • Literally, mean() and var().
mean(dataset_name$count_variable)
var(dataset_name$count_variable)
  • Another way to eyeball: histogram.

    • We will see heavily right-skewed data when the variance > mean.
hist(dataset_name$count_variable)
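To see what this check looks like in practice, here is a sketch with simulated data; rnbinom() generates overdispersed counts, and the variable name sim_counts is made up for illustration:

```r
set.seed(123)
# size plays the role of theta, so Var(Y) = 40 + 40^2/5 = 360 here
sim_counts <- rnbinom(500, mu = 40, size = 5)

mean(sim_counts)   # near 40
var(sim_counts)    # far larger than the mean -- overdispersion
hist(sim_counts)   # long right tail
```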

Poisson or Negative Binomial?

  • Wait – why do we care?

  • When data are overdispersed, the standard error for \beta_i is underestimated in Poisson regression.

    • The SE is in the denominator of the test statistic, so a smaller SE \to larger test statistic.
    • Larger test statistic \to smaller p-value.
    • Smaller p-value \to easier to reject the null hypothesis.
    • Rejecting when we should not is a Type I error.
  • When we apply the negative binomial instead, the overdispersion parameter remedies this issue.
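We can sketch this point with a small simulation (all names here are hypothetical): fit both models to the same overdispersed counts and compare the standard errors for the slope.

```r
set.seed(456)
x <- runif(300)
# Counts generated with extra-Poisson variation (size plays the role of theta)
y <- rnbinom(300, mu = exp(1 + 0.5 * x), size = 2)
d <- data.frame(x, y)

se_poi <- summary(glm(y ~ x, data = d, family = poisson))$coefficients["x", 2]
se_nb  <- summary(MASS::glm.nb(y ~ x, data = d))$coefficients["x", 2]

# The Poisson SE is noticeably smaller than the negative binomial SE
c(poisson = se_poi, neg_binomial = se_nb)
```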

Lecture Example Set Up

  • Pluto spends his days at Mickey’s park chasing squirrels and interacting with guests. Disney researchers are interested in understanding what factors influence the number of squirrels Pluto chases per hour.

  • For 300 observation periods, researchers recorded:

    • squirrels_chased: Number of squirrels chased during the observation period
    • temperature: Temperature (°F)
    • crowd: Park crowd level (in hundreds of guests)
    • mickey_present: Whether Mickey is present (Yes/No)
    • time_of_day: Time of day (Morning, Afternoon, Evening)

Lecture Example Set Up

  • Pulling in the data,
pluto <- read_csv("https://raw.githubusercontent.com/samanthaseals/SDSII/refs/heads/main/files/data/lectures/W6_pluto.csv")
pluto %>% head()

Example 1: Poisson Regression

  • Recall the first model from last lecture: the number of squirrels chased as a function of temperature, crowd, and their interaction.
m1_poi <- glm(squirrels_chased ~ temperature + crowd + temperature:crowd,
                 data = pluto,
                 family = poisson(link = "log"))
m1_poi %>% tidy()

Example 1: Poisson Regression

  • Let’s now examine the outcome,
mean(pluto$squirrels_chased)
[1] 41.36333
var(pluto$squirrels_chased)
[1] 345.6903
hist(pluto$squirrels_chased)
Histogram of squirrels chased
  • There is definitely overdispersion in this dataset.

Negative Binomial Regression (R)

  • To perform negative binomial regression, we need the glm.nb() function from the MASS package.

    • MASS is another package I will not load in, personally, because it masks dplyr::select(). Instead, I call the function directly from the package,
model <- MASS::glm.nb(y ~ x_1 + x_2 + ... + x_k,
                      data = dataset_name)
  • Again, we can request the incident rate ratios directly out of tidy(),
model %>% tidy(conf.int = TRUE, exponentiate = TRUE)
  • Then, everything else we have learned follows as normal.

Example 1: Negative Binomial Regression

  • Modeling using negative binomial instead,
m1_nb <- MASS::glm.nb(squirrels_chased ~ temperature + crowd + temperature:crowd,
                      data = pluto)
m1_nb %>% tidy()

Example 1: Comparison

  • Poisson: (tidy() output shown on slide)
  • Negative Binomial: (tidy() output shown on slide)

Example 1: Comparison

  • Note that the coefficients are not very different…
summary(m1_poi)$coefficients[, 1]
      (Intercept)       temperature             crowd temperature:crowd 
     0.1857624458      0.0363002481      0.0393306623     -0.0003112087 
summary(m1_nb)$coefficients[, 1]
      (Intercept)       temperature             crowd temperature:crowd 
     0.2049441484      0.0358999533      0.0387828971     -0.0003010588 
  • while the standard errors are different…
summary(m1_poi)$coefficients[, 2]
      (Intercept)       temperature             crowd temperature:crowd 
     3.277795e-01      4.313848e-03      6.113214e-03      8.055095e-05 
summary(m1_nb)$coefficients[, 2]
      (Intercept)       temperature             crowd temperature:crowd 
     0.6472187867      0.0086521525      0.0125807573      0.0001681197 

Example 1: Comparison

  • … and the p-values are, too.
summary(m1_poi)$coefficients[, 4]
      (Intercept)       temperature             crowd temperature:crowd 
     5.708977e-01      3.935022e-17      1.245243e-10      1.117731e-04 
summary(m1_nb)$coefficients[, 4]
      (Intercept)       temperature             crowd temperature:crowd 
     7.515065e-01      3.335648e-05      2.051211e-03      7.333480e-02 
  • Let’s pay particularly close attention to the interaction term…

Statistical Inference

  • Our approach to determining significance remains the same.

    • For continuous and binary predictors, we can use the Wald test (z-test from tidy()).

    • For omnibus tests, we can use the likelihood ratio test (LRT).

      • Significance of regression line (full/reduced LRT).
      • Significance of categorical predictors with c \ge 3 levels (car::Anova(type = 3) or full/reduced LRT).
      • Interaction terms with categorical predictors with c \ge 3 levels (car::Anova(type = 3) or full/reduced LRT).

Example 1: Significant Regression Line

  • Let’s determine if this is a significant regression line.
m1_full <- MASS::glm.nb(squirrels_chased ~ temperature + crowd + temperature:crowd, data = pluto)
m1_reduced <- MASS::glm.nb(squirrels_chased ~ 1, data = pluto)
anova(m1_reduced, m1_full, test = "Chisq")
  • Yes, this is a significant regression line (p < 0.001).

Example 1: Testing the Interaction

  • Let’s now determine if the interaction is significant,
m1_nb %>% car::Anova(type = 3)
  • No! The interaction between temperature and crowd is not significant (p = 0.075).

Example 1: Removing the Interaction

  • Rerunning our model without the interaction,
m1_final <- MASS::glm.nb(squirrels_chased ~ temperature + crowd, data = pluto)
m1_final %>% tidy()

\ln\left( y \right) = 1.32 + 0.02 \text{ temp} + 0.02 \text{ crowd}

Interpreting the Model

  • Recall the negative binomial regression model,

\ln\left( y \right) = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k

  • We are modeling the log of the count.

    • Interpreting \hat{\beta}_i is an additive effect on the log of the count.

    • Interpreting e^{\hat{\beta}_i} is a multiplicative effect on the count.

  • Thus, when interpreting the slope for x_i, we will look at the Incident Rate Ratio, e^{\hat{\beta}_i}.
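For a concrete sense of the arithmetic, exponentiating a slope converts it to an IRR; the value 0.02 below is just an illustrative coefficient:

```r
beta_hat <- 0.02
irr <- exp(beta_hat)

irr              # about 1.020: the multiplicative effect on the count
(irr - 1) * 100  # about 2.0: the percent increase per one-unit change in x_i
```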

Incident Rate Ratios (R)

  • Like in Poisson regression, we can request the incident rate ratios directly out of tidy(),
model %>% tidy(conf.int = TRUE, exponentiate = TRUE)

Example 1: Interpreting the Model

  • Finding our incident rate ratios,
m1_final %>% tidy(conf.int = TRUE, exponentiate = TRUE)
  • In table form,
Predictor     IRR (95% CI for IRR)    p-value
Temperature   1.021 (1.017, 1.026)    < 0.001
Crowd         1.017 (1.014, 1.019)    < 0.001

Example 1: Interpreting the Model

  • Let’s interpret the incident rate ratios.
Predictor     IRR (95% CI for IRR)    p-value
Temperature   1.021 (1.017, 1.026)    < 0.001
Crowd         1.017 (1.014, 1.019)    < 0.001
  • For every 1°F increase in temperature, the expected number of squirrels chased increases by a factor of 1.021, holding crowd constant. This is a 2.1% increase.

  • For every 100 additional guests in the park, the expected number of squirrels chased increases by a factor of 1.017, holding temperature constant. This is a 1.7% increase.

Wrap Up

  • In this lecture, we have introduced negative binomial regression.

    • Negative binomial regression is appropriate for overdispersed count outcomes.
  • We again saw the log link function.

    • We interpret coefficients as multiplicative effects on the count of the outcome.

    • These are called the incident rate ratios (IRR).

  • In the next lecture, we will review visualization of Poisson and negative binomial models.

  • Next week, we will bring all topics together and review choosing the appropriate regression model for your data.