Logistic Regression

Introduction

  • Last week, we focused on linear regression, which is appropriate for continuous outcomes that are approximately normally distributed.

\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + ... + \hat{\beta}_k x_{ik}

  • Interpretation of \hat{\beta}_0 (y-intercept): The expected value of y when all predictors are equal to zero.

  • Interpretation of \hat{\beta}_i (slopes): The expected change in y for a one-unit increase in x_i, holding all other predictors constant.

Introduction

  • Today we will cover binary logistic regression.

  • Categorical data: Logistic

\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right) = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + ... + \hat{\beta}_k x_{ik}

  • Count data: Poisson/negative binomial

\ln(\hat{y}) = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + ... + \hat{\beta}_k x_{ik}

Introduction

  • Without going into statistical theory, the left hand side of the equation is determined by the model’s link function (a short R sketch follows this list).
    • Normal: Identity link
      • We are modeling y directly
    • Logistic: Logit link
      • We are modeling the logit of \pi
    • Poisson/Negative Binomial: Log link
      • We are modeling the log of y
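  • As a quick illustration (using a hypothetical data frame dat with outcome y and predictors x1 and x2), the link is chosen through the family argument when fitting a GLM in R:

glm(y ~ x1 + x2, data = dat, family = gaussian(link = "identity"))  # identity link: model y directly
glm(y ~ x1 + x2, data = dat, family = binomial(link = "logit"))     # logit link: model the logit of pi
glm(y ~ x1 + x2, data = dat, family = poisson(link = "log"))        # log link: model the log of y (a count)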

Introduction

  • The left hand side dictates how we provide interpretations.
    • If we are not modeling y directly, we need to transform back to the original scale for interpretation purposes.
  • There are other link functions that can be used in generalized linear models, as long as the model converges.
    • Whichever link is used, the goal at interpretation time is to transform the left hand side back to a more interpretable scale.
  • Remember that if we have \ln() on the left hand side, we can exponentiate to get back to the original scale.
    • This means that our interpretations will be in terms of multiplicative changes rather than additive changes.
    • For each increase in x, we will multiply the left hand side by some factor rather than add some amount to it.
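  • As a quick worked example, a hypothetical coefficient of 0.5 on the log scale corresponds to a multiplicative factor of

\exp\{0.5\} \approx 1.65

  • so each one-unit increase in x multiplies the odds (or the expected count) by about 1.65 instead of adding 0.5.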

Example

  • Mickey Mouse and his friends have been organizing a series of rescue missions around the clubhouse. Each mission is led by one of four possible leaders (Mickey, Minnie, Donald, or Goofy) and the goal is to determine which factors make a mission most likely to succeed.

  • For each mission, the team records:

    • mission_success: whether the rescue mission succeeded (1) or failed (0),
    • character: the leader of the mission (Mickey, Minnie, Donald, or Goofy), and
    • stealth_score: a numerical measure (0–10) of how quietly and carefully the mission was executed; higher values indicate more stealthy, well-coordinated missions.
  • Let’s now set up a logistic regression to predict mission success as a function of who led the mission, the stealth score, and the interaction between the two. In R, our model will be

\text{mission\_success} \sim \text{character} + \text{stealth\_score} + \text{character}:\text{stealth\_score}

Example

  • Running the model,
# Bayesian logistic regression with rstanarm: binomial family (default logit link),
# weakly informative normal priors, 4 chains of 10,000 iterations (half used as warmup)
mickey_model <- stan_glm(mission_success ~ character + stealth_score + character:stealth_score,
                         data = mickey, 
                         family = binomial,
                         prior_intercept = normal(0, 1, autoscale = TRUE),
                         prior = normal(0, 1, autoscale = TRUE),
                         chains = 4, iter = 5000*2, seed = 8716,
                         prior_PD = FALSE)   # FALSE: sample from the posterior, not the prior

SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 5.1e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.51 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.377 seconds (Warm-up)
Chain 1:                0.369 seconds (Sampling)
Chain 1:                0.746 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 7e-06 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 2: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 2: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 2: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 2: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 2: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 2: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 2: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 2: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 2: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 2: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 2: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.377 seconds (Warm-up)
Chain 2:                0.395 seconds (Sampling)
Chain 2:                0.772 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 6e-06 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 3: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 3: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 3: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 3: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 3: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 3: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 3: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 3: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 3: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 3: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 3: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 0.37 seconds (Warm-up)
Chain 3:                0.408 seconds (Sampling)
Chain 3:                0.778 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 1.2e-05 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 4: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 4: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 4: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 4: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 4: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 4: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 4: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 4: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 4: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 4: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 4: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 0.374 seconds (Warm-up)
Chain 4:                0.343 seconds (Sampling)
Chain 4:                0.717 seconds (Total)
Chain 4: 

Example

  • Examining the results,
tidy(mickey_model, conf.int = TRUE)   # coefficient estimates with credible intervals

Example

  • This results in the model,

\begin{align*}
\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right) =\ & -0.193 - 1.005 \text{ Donald} + 0.228 \text{ Goofy} + 1.917 \text{ Minnie} \\
& + 0.296 \text{ stealth} + 0.185 \text{ Donald} \times \text{stealth} \\
& - 0.198 \text{ Goofy} \times \text{stealth} + 0.403 \text{ Minnie} \times \text{stealth}
\end{align*}

Example

  • Interpreting the coefficients:
    • Intercept (\hat{\beta}_0): The log-odds of mission success when Mickey leads (reference category) and the stealth score is 0.
    • Character coefficients (\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3): The change in log-odds of mission success when Donald, Goofy, or Minnie leads, respectively, compared to Mickey, holding stealth score constant.
    • Stealth score coefficient (\hat{\beta}_4): The change in log-odds of mission success for a one-unit increase in stealth score when Mickey leads.
    • Interaction coefficients (\hat{\beta}_5, \hat{\beta}_6, \hat{\beta}_7): The additional change in log-odds of mission success for a one-unit increase in stealth score when Donald, Goofy, or Minnie leads, respectively, compared to Mickey.

Example

  • What if we want to separate into individual models for each friend?
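  • Because Mickey is the reference category, each friend’s individual model comes from adding that friend’s main-effect coefficient to the intercept and that friend’s interaction coefficient to the stealth slope. For Donald, for example,

\begin{align*} \text{intercept: } & -0.193 - 1.005 = -1.198 \\ \text{stealth slope: } & \ \ \ 0.296 + 0.185 = 0.481 \end{align*}

  • The resulting intercepts and slopes shown on the next slides are the same ones used for the predicted values in the visualization section later.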

Example

  • Donald’s model:
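
\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right) = -1.198 + 0.481 \text{ stealth}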

Example

  • Goofy’s model:
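
\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right) = 0.035 + 0.101 \text{ stealth}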

Example

  • Minnie’s model:
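
\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right) = 1.724 + 0.426 \text{ stealth}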

Example

  • Mickey’s model:
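
\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right) = -0.193 + 0.296 \text{ stealth}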

Odds Ratios

  • Recall that we do not want to interpret in terms of the log odds (i.e., in terms of)

\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right)

  • Instead, we want to interpret in terms of the odds themselves (i.e., in terms of)

\frac{\hat{\pi}}{1-\hat{\pi}}

Odds Ratios

  • To convert a coefficient from the log-odds scale to an odds ratio, we exponentiate it.

\text{odds ratio}_i = \exp\{ \hat{\beta}_i \}

  • Note (1), if we are reporting odds ratios rather than \hat{\beta}_i, then:
    • An odds ratio of 1 \to no effect.
    • An odds ratio greater than 1 \to increase in odds.
    • An odds ratio less than 1 \to decrease in odds.
  • Note (2), if we are reporting odds ratios rather than \hat{\beta}_i, then we also want to report the intervals for the odds ratios and not the intervals for \hat{\beta}_i.
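  • In other words, the interval for the odds ratio comes from exponentiating the endpoints of the interval for \hat{\beta}_i,

\left( \exp\{\text{LB}_{\beta_i}\}, \ \exp\{\text{UB}_{\beta_i}\} \right)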

Example

  • We can request the ORs and their corresponding CIs directly out of tidy(),
tidy(mickey_model, conf.int = TRUE, exponentiate = TRUE)

Odds Ratios

  • But… we have an interaction in our model?

  • We can take the same approach as finding the individual slopes!

    • Take the bounds and add/subtract as necessary.
  • Remember that we need to add on the log-odds (\hat{\beta}_i) scale, not combine the converted odds ratios; we exponentiate after adding.

Example

  • Let’s look at our model results, but only at the CI bounds for the stealth score main effect and the character-by-stealth interactions; Mickey (the reference) uses the main-effect bounds alone, and each other friend combines those with their interaction bounds.
# keep only the interval bounds for terms involving stealth (main effect + interactions)
tidy(mickey_model, conf.int = TRUE) %>% 
  select(term, conf.low, conf.high) %>%
  filter(str_detect(term, "stealth")) 

Example

  • For Mickey,
    • LB: 0.000111
    • UB: 0.6076
  • For Donald,
    • LB: 0.000111 - 0.2135 = -0.2134
    • UB: 0.6076 + 0.5965 = 1.2041
  • For Goofy,
    • LB: 0.000111 - 0.5299 = -0.5298
    • UB: 0.6076 + 0.1372 = 0.7448
  • For Minnie,
    • LB: 0.000111 - 0.2653 = -0.2652
    • UB: 0.6076 + 0.5519 = 1.1595
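  • As a check, exponentiating the combined bounds gives the CI for the OR: for Donald, \exp\{-0.2134\} \approx 0.81 and \exp\{1.2041\} \approx 3.33, which matches the comparison table on the next slide.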

Example

  • Comparing side by side with CI for OR,
Friend | CI for \beta | CI for OR
-------|--------------------|----------
Mickey | (0.000111, 0.6076) | (1.00, 1.84)
Donald | (-0.2134, 1.204)   | (0.81, 3.33)
Goofy  | (-0.5298, 0.7448)  | (0.59, 2.11)
Minnie | (-0.2652, 1.1595)  | (0.77, 3.19)

Example

  • How would I report the results in a table?
Friend | OR (95% CI for OR)
-------|-------------------
Mickey | 1.23 (1.00, 1.84)
Donald | 1.62 (0.81, 3.33)
Goofy  | 1.11 (0.59, 2.11)
Minnie | 1.53 (0.77, 3.19)
  • For all characters, the point estimates suggest that the odds of mission success increase as stealth score increases.
  • The strength of this effect varies by character, but the 95% CI for every character includes 1, so we cannot rule out a null effect for any of them.

Visualizing the Model

  • We can put together a data visualization to help us understand what’s going on in our model.

  • In this example, we will put stealth score on the x-axis and draw a separate line for each friend.

    • i.e., I am graphing the individual models we saw earlier.
  • First, we have to find predicted values,

# predicted probability of success for each leader: inverse logit of that leader's
# linear predictor, p = exp(eta) / (1 + exp(eta))
mickey <- mickey %>%
  mutate(p_mickey = exp(-0.193 + 0.296*stealth_score) / (1+exp(-0.193 + 0.296*stealth_score)),
         p_minnie = exp(1.724 + 0.426*stealth_score) / (1+exp(1.724 + 0.426*stealth_score)),
         p_donald = exp(-1.198 + 0.481*stealth_score) / (1+exp(-1.198 + 0.481*stealth_score)),
         p_goofy  = exp(0.035 + 0.101*stealth_score) / (1+exp(0.035 + 0.101*stealth_score)))
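
  • Equivalently (a minimal sketch of the same calculation), R’s built-in inverse-logit function plogis() computes exp(x) / (1 + exp(x)) for us:

mickey <- mickey %>%
  mutate(p_mickey = plogis(-0.193 + 0.296*stealth_score))   # same value as the exp()/(1 + exp()) version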

Example

  • Then, we can graph using these predicted values
mickey %>% ggplot(aes(x = stealth_score, y = mission_success)) +
  geom_point(size = 3) +
  # map each line's color to the leader's name so the legend labels match the correct lines
  geom_line(aes(y = p_mickey, color = "Mickey")) +
  geom_line(aes(y = p_minnie, color = "Minnie")) +
  geom_line(aes(y = p_donald, color = "Donald")) +
  geom_line(aes(y = p_goofy,  color = "Goofy")) +
  labs(x = "Stealth Score",
       y = "Probability of Mission Success",
       color = "Leader") +
  scale_color_manual(values = c("Mickey" = "black", "Minnie" = "red",
                                "Donald" = "blue", "Goofy" = "green")) +
  theme_bw()

Example

Wrap Up

  • Binary logistic regression is used for binary outcomes (e.g., success/failure).

  • Recall we also have

    • Ordinal logistic regression \to for ordered categorical outcomes (e.g., low/medium/high).
    • Multinomial logistic regression \to for unordered categorical outcomes (e.g., red/blue/green).
  • Wednesday:

    • Poisson and Negative Binomial regression for count data.
    • How to “read” subject-matter articles to extract information for analysis.

Practice / Homework