Logistic Regression

Introduction

  • Last week, we focused on linear regression, which is appropriate for continuous outcomes that are approximately normally distributed.

\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + ... + \hat{\beta}_k x_{ik}

  • Interpretation of \hat{\beta}_0 (y-intercept): The expected value of y when all predictors are equal to zero.

  • Interpretation of \hat{\beta}_i (slopes): The expected change in y for a one-unit increase in x_i, holding all other predictors constant.

Introduction

  • Today we will cover binary logistic regression.

  • Categorical data: Logistic

\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right) = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + ... + \hat{\beta}_k x_{ik}

  • Count data: Poisson/negative binomial

\ln(\hat{y}) = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + ... + \hat{\beta}_k x_{ik}

Introduction

  • Without going into statistical theory, the left hand side of the equation is determined by the model’s link function (a short R sketch follows this list).
    • Normal: Identity link
      • We are modeling y directly
    • Logistic: Logit link
      • We are modeling the logit of \pi
    • Poisson/Negative Binomial: Log link
      • We are modeling the log of y
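  • As a quick illustration (using a hypothetical data frame dat with outcome y and predictors x1 and x2), the link is chosen through the family argument when fitting a GLM in R:

glm(y ~ x1 + x2, data = dat, family = gaussian(link = "identity"))  # identity link: model y directly
glm(y ~ x1 + x2, data = dat, family = binomial(link = "logit"))     # logit link: model the logit of pi
glm(y ~ x1 + x2, data = dat, family = poisson(link = "log"))        # log link: model the log of y (a count)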

Introduction

  • The left hand side dictates how we provide interpretations.
    • If we are not modeling y directly, we need to transform back to the original scale for interpretation purposes.
  • There are other link functions that can be used in generalized linear models, as long as the model converges.
    • Whichever link is used, the goal at interpretation time is to transform the left hand side back to a more interpretable scale.
  • Remember that if we have \ln() on the left hand side, we can exponentiate to get back to the original scale.
    • This means that our interpretations will be in terms of multiplicative changes rather than additive changes.
    • For each increase in x, we will multiply the left hand side by some factor rather than add some amount to it.
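  • As a quick worked example, a hypothetical coefficient of 0.5 on the log scale corresponds to a multiplicative factor of

\exp\{0.5\} \approx 1.65

  • so each one-unit increase in x multiplies the odds (or the expected count) by about 1.65 instead of adding 0.5.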

Example

  • Mickey Mouse and his friends have been organizing a series of rescue missions around the clubhouse. Each mission is led by one of four possible leaders (Mickey, Minnie, Donald, or Goofy) and the goal is to determine which factors make a mission most likely to succeed.

  • For each mission, the team records:

    • mission_success: whether the rescue mission succeeded (1) or failed (0),
    • character: the leader of the mission (Mickey, Minnie, Donald, or Goofy), and
    • stealth_score: a numerical measure (0–10) of how quietly and carefully the mission was executed; higher values indicate more stealthy, well-coordinated missions.
  • Let’s now set up a logistic regression to predict mission success as a function of who led the mission, the stealth score, and the interaction between the two. In R, our model will be

\text{mission\_success} \sim \text{character} + \text{stealth\_score} + \text{character}:\text{stealth\_score}

Example

  • Running the model,
# Bayesian logistic regression with rstanarm: binomial family (default logit link),
# weakly informative normal priors, 4 chains of 10,000 iterations (half used as warmup)
mickey_model <- stan_glm(mission_success ~ character + stealth_score + character:stealth_score,
                         data = mickey, 
                         family = binomial,
                         prior_intercept = normal(0, 1, autoscale = TRUE),
                         prior = normal(0, 1, autoscale = TRUE),
                         chains = 4, iter = 5000*2, seed = 8716,
                         prior_PD = FALSE)   # FALSE: sample from the posterior, not the prior

SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 5.1e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.51 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.377 seconds (Warm-up)
Chain 1:                0.369 seconds (Sampling)
Chain 1:                0.746 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 7e-06 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 2: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 2: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 2: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 2: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 2: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 2: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 2: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 2: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 2: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 2: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 2: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.377 seconds (Warm-up)
Chain 2:                0.395 seconds (Sampling)
Chain 2:                0.772 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 6e-06 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 3: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 3: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 3: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 3: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 3: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 3: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 3: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 3: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 3: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 3: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 3: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 0.37 seconds (Warm-up)
Chain 3:                0.408 seconds (Sampling)
Chain 3:                0.778 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 1.2e-05 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 4: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 4: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 4: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 4: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 4: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 4: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 4: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 4: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 4: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 4: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 4: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 0.374 seconds (Warm-up)
Chain 4:                0.343 seconds (Sampling)
Chain 4:                0.717 seconds (Total)
Chain 4: 

Example

  • Examining the results,
tidy(mickey_model, conf.int = TRUE)   # coefficient estimates with credible intervals

Example

  • This results in the model,

\begin{align*}
\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right) =\ & -0.193 - 1.005 \text{ Donald} + 0.228 \text{ Goofy} + 1.917 \text{ Minnie} \\
& + 0.296 \text{ stealth} + 0.185 \text{ Donald} \times \text{stealth} \\
& - 0.198 \text{ Goofy} \times \text{stealth} + 0.403 \text{ Minnie} \times \text{stealth}
\end{align*}

Example

  • Interpreting the coefficients:
    • Intercept (\hat{\beta}_0): The log-odds of mission success when Mickey leads (reference category) and the stealth score is 0.
    • Character coefficients (\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3): The change in log-odds of mission success when Donald, Goofy, or Minnie leads, respectively, compared to Mickey, holding stealth score constant.
    • Stealth score coefficient (\hat{\beta}_4): The change in log-odds of mission success for a one-unit increase in stealth score when Mickey leads.
    • Interaction coefficients (\hat{\beta}_5, \hat{\beta}_6, \hat{\beta}_7): The additional change in log-odds of mission success for a one-unit increase in stealth score when Donald, Goofy, or Minnie leads, respectively, compared to Mickey.

Example

  • What if we want to separate into individual models for each friend?
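  • Because Mickey is the reference category, each friend’s individual model comes from adding that friend’s main-effect coefficient to the intercept and that friend’s interaction coefficient to the stealth slope. For Donald, for example,

\begin{align*} \text{intercept: } & -0.193 - 1.005 = -1.198 \\ \text{stealth slope: } & \ \ \ 0.296 + 0.185 = 0.481 \end{align*}

  • The resulting intercepts and slopes shown on the next slides are the same ones used for the predicted values in the visualization section later.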

Example

  • Donald’s model:
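
\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right) = -1.198 + 0.481 \text{ stealth}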

Example

  • Goofy’s model:
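
\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right) = 0.035 + 0.101 \text{ stealth}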

Example

  • Minnie’s model:
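
\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right) = 1.724 + 0.426 \text{ stealth}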

Example

  • Mickey’s model:
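
\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right) = -0.193 + 0.296 \text{ stealth}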

Odds Ratios

  • Recall that we do not want to interpret in terms of the log odds (i.e., in terms of)

\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right)

  • Instead, we want to interpret in terms of the odds themselves (i.e., in terms of)

\frac{\hat{\pi}}{1-\hat{\pi}}

Odds Ratios

  • To convert a coefficient from the log-odds scale to an odds ratio, we exponentiate it.

\text{odds ratio}_i = \exp\{ \hat{\beta}_i \}

  • Note (1), if we are reporting odds ratios rather than \hat{\beta}_i, then:
    • An odds ratio of 1 \to no effect.
    • An odds ratio greater than 1 \to increase in odds.
    • An odds ratio less than 1 \to decrease in odds.
  • Note (2), if we are reporting odds ratios rather than \hat{\beta}_i, then we also want to report the intervals for the odds ratios and not the intervals for \hat{\beta}_i.
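  • In other words, the interval for the odds ratio comes from exponentiating the endpoints of the interval for \hat{\beta}_i,

\left( \exp\{\text{LB}_{\beta_i}\}, \ \exp\{\text{UB}_{\beta_i}\} \right)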

Example

  • We can request the ORs and their corresponding CIs directly out of tidy(),
tidy(mickey_model, conf.int = TRUE, exponentiate = TRUE)

Odds Ratios

  • But… we have an interaction in our model?

  • We can take the same approach as finding the individual slopes!

    • Take the bounds and add/subtract as necessary.
  • Remember that we need to add on the log-odds (\hat{\beta}_i) scale, not combine the converted odds ratios; we exponentiate after adding.

Example

  • Let’s look at our model results, but only at the CI bounds for the stealth score main effect and the character-by-stealth interactions; Mickey (the reference) uses the main-effect bounds alone, and each other friend combines those with their interaction bounds.
# keep only the interval bounds for terms involving stealth (main effect + interactions)
tidy(mickey_model, conf.int = TRUE) %>% 
  select(term, conf.low, conf.high) %>%
  filter(str_detect(term, "stealth")) 

Example

  • For Mickey,
    • LB: 0.000111
    • UB: 0.6076
  • For Donald,
    • LB: 0.000111 - 0.2135 = -0.2134
    • UB: 0.6076 + 0.5965 = 1.2041
  • For Goofy,
    • LB: 0.000111 - 0.5299 = -0.5298
    • UB: 0.6076 + 0.1372 = 0.7448
  • For Minnie,
    • LB: 0.000111 - 0.2653 = -0.2652
    • UB: 0.6076 + 0.5519 = 1.1595
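  • As a check, exponentiating the combined bounds gives the CI for the OR: for Donald, \exp\{-0.2134\} \approx 0.81 and \exp\{1.2041\} \approx 3.33, which matches the comparison table on the next slide.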

Example

  • Comparing side by side with CI for OR,
Friend | CI for \beta | CI for OR
-------|--------------------|----------
Mickey | (0.000111, 0.6076) | (1.00, 1.84)
Donald | (-0.2134, 1.204)   | (0.81, 3.33)
Goofy  | (-0.5298, 0.7448)  | (0.59, 2.11)
Minnie | (-0.2652, 1.1595)  | (0.77, 3.19)

Example

  • How would I report the results in a table?
Friend | OR (95% CI for OR)
-------|-------------------
Mickey | 1.23 (1.00, 1.84)
Donald | 1.62 (0.81, 3.33)
Goofy  | 1.11 (0.59, 2.11)
Minnie | 1.53 (0.77, 3.19)
  • For all characters, the point estimates suggest that the odds of mission success increase as stealth score increases.
  • The strength of this effect varies by character, but the 95% CI for every character includes 1, so we cannot rule out a null effect for any of them.

Visualizing the Model

  • We can put together a data visualization to help us understand what’s going on in our model.

  • In this example, we will put stealth score on the x-axis and draw a separate line for each friend.

    • i.e., I am graphing the individual models we saw earlier.
  • First, we have to find predicted values,

# predicted probability of success for each leader: inverse logit of that leader's
# linear predictor, p = exp(eta) / (1 + exp(eta))
mickey <- mickey %>%
  mutate(p_mickey = exp(-0.193 + 0.296*stealth_score) / (1+exp(-0.193 + 0.296*stealth_score)),
         p_minnie = exp(1.724 + 0.426*stealth_score) / (1+exp(1.724 + 0.426*stealth_score)),
         p_donald = exp(-1.198 + 0.481*stealth_score) / (1+exp(-1.198 + 0.481*stealth_score)),
         p_goofy  = exp(0.035 + 0.101*stealth_score) / (1+exp(0.035 + 0.101*stealth_score)))
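
  • Equivalently (a minimal sketch of the same calculation), R’s built-in inverse-logit function plogis() computes exp(x) / (1 + exp(x)) for us:

mickey <- mickey %>%
  mutate(p_mickey = plogis(-0.193 + 0.296*stealth_score))   # same value as the exp()/(1 + exp()) version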

Example

  • Then, we can graph using these predicted values
mickey %>% ggplot(aes(x = stealth_score, y = mission_success)) +
  geom_point(size = 3) +
  # map each line's color to the leader's name so the legend labels match the correct lines
  geom_line(aes(y = p_mickey, color = "Mickey")) +
  geom_line(aes(y = p_minnie, color = "Minnie")) +
  geom_line(aes(y = p_donald, color = "Donald")) +
  geom_line(aes(y = p_goofy,  color = "Goofy")) +
  labs(x = "Stealth Score",
       y = "Probability of Mission Success",
       color = "Leader") +
  scale_color_manual(values = c("Mickey" = "black", "Minnie" = "red",
                                "Donald" = "blue", "Goofy" = "green")) +
  theme_bw()

Example

Wrap Up

  • Binary logistic regression is used for binary outcomes (e.g., success/failure).

  • Recall we also have

    • Ordinal logistic regression \to for ordered categorical outcomes (e.g., low/medium/high).
    • Multinomial logistic regression \to for unordered categorical outcomes (e.g., red/blue/green).
  • Wednesday:

    • Poisson and Negative Binomial regression for count data.
    • How to “read” subject-matter articles to extract information for analysis.

Practice / Homework