Binary Logistic Regression

Introduction

  • We have previously discussed continuous outcomes:

    • y \sim N(\mu, \sigma^2)
    • y \sim \text{gamma}(\mu, \gamma)
    • y \sim \text{beta}(\alpha, \beta)
  • This week, we will consider categorical outcomes:

    • Binary
    • Multinomial

Binary Logistic Regression

  • We model binary outcomes using logistic regression,

\text{logit}(\pi) = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k,

  • where \pi = \text{P}[Y = 1] = the probability of the outcome/event.

    • Recall, \text{logit}() is the log-odds transformation. Here, \text{logit}(\pi) = \ln\left(\frac{\pi}{1-\pi}\right)
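  • In base R, the logit and its inverse are available directly as qlogis() and plogis():

```r
# qlogis() computes the logit (log-odds); plogis() is its inverse
pi <- 0.8
qlogis(pi)           # log(0.8 / 0.2) = log(4), about 1.39
plogis(qlogis(pi))   # back-transforms the log-odds to 0.8
```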

Binary Logistic Regression

Logistic curves with varying slopes

Binary Logistic Regression

  • Logistic regression is used specifically for binary response variables that take on two discrete values.

    • These are categorical outcomes typically coded as 0 and 1

    • Examples:

      • Customer churn (stayed vs. left)
      • Course outcome (pass vs. fail)
      • System status (operational vs. down)

Binary Logistic Regression

  • How is this different than beta regression?

    • Beta regression is used for continuous response variables that represent proportions or probabilities bounded between 0 and 1 (but not including the endpoints).

    • Examples:

      • Proportion of time spent on different activities
      • Percentage of correct answers on a test
      • Proportion of land covered by vegetation

Binomial Distribution

  • The Binomial distribution is a discrete probability distribution defined for the set \{0, ..., n\}.

    • The distribution gives the probability of observing a certain number of successes in n independent Bernoulli trials.
  • The Binomial distribution is characterized by \pi = P[Y = 1], the probability of success, and n, the number of trials (sample size).

    • As n increases, the distribution becomes more spread out.

    • Varying \pi changes the shape of the distribution.

      • When \pi is close to 0 or 1, the distribution is skewed.

      • When \pi is around 0.5, the distribution is more symmetric.
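  • These shape behaviors are easy to check with dbinom():

```r
n <- 10
# pi near 0: mass piles up near 0 (right-skewed)
round(dbinom(0:n, size = n, prob = 0.1), 3)
# pi = 0.5: symmetric around n/2
round(dbinom(0:n, size = n, prob = 0.5), 3)
```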

Binomial Distribution

  • Same n, varying \pi:
Plots of binomial distribution with n=10 and varying pi

Binomial Distribution

  • Same \pi, varying n:
Plots of binomial distribution with pi = 0.5 and varying n

Data Management

  • Binary outcomes are typically coded as 0/1.

    • 0 = event did not occur
    • 1 = event did occur
  • Some software packages allow us to get away with factor variables rather than numeric 0/1, but it is best practice to code binary outcomes as 0/1.

    • The code you will need will depend on how the original variable is coded.

    • Effectively, you will have something like

dataset_name <- dataset_name %>%
  mutate(y = if_else(original_variable == "event occurred", 1, 0))
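  • As a sketch, with a toy dataset (the variable names here are placeholders, not from the lecture data):

```r
library(dplyr)

# Toy data: a character variable we want recoded to numeric 0/1
dataset_name <- tibble(
  original_variable = c("event occurred", "no event", "event occurred")
)

# 1 = event occurred, 0 = event did not occur
dataset_name <- dataset_name %>%
  mutate(y = if_else(original_variable == "event occurred", 1, 0))

dataset_name$y  # 1 0 1
```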

Binary Logistic Regression (R)

  • We will use the glm() function to perform binary logistic regression,
m <- glm(y ~ x_1 + x_2 + ... + x_k,
         data = dataset_name,
         family = binomial(link = "logit"))

Lecture Example Set Up

  • Goofy and Max are gearing up for a huge audition. To make sure he’s ready, Max decides to test out a few different preparation plans leading up to the audition. Each plan mixes a different approach to practicing, resting, and getting through the day.

  • Some days Goofy insists on structure and early bedtimes, other days Max does things his own way. This leads to four plans:

    • Goofy-led plan
    • Max-led plan
    • Balanced plan
    • Chaotic plan

Lecture Example Set Up

  • Each day, Max tracks a few behaviors he thinks might affect how well things go:

    • Hours of practice he gets in
    • Hours of sleep the night before
    • Caffeine intake (mg)
    • Confidence level (0–10)
  • Max then records if he nailed the (practice) audition for that day.

Lecture Example Set Up

  • Pulling in the data,
max <- read_csv("https://raw.githubusercontent.com/samanthaseals/SDSII/refs/heads/main/files/data/lectures/W5_max.csv")
max %>% head()

Example 1: Binary Logistic Regression

  • Let’s model whether Max nailed the (practice) audition (nailed_audition) based on the plan followed, hours of practice, and hours of sleep.
m1 <- glm(nailed_audition ~ plan + practice_hours + sleep_hours,
          data = max,
          family = binomial(link = "logit"))
m1 %>% tidy()

Example 1: Binary Logistic Regression

  • Our resulting model is as follows,
m1 %>% coefficients()
   (Intercept)    planChaotic      planGoofy        planMax practice_hours 
    -4.1686058     -1.4096353     -0.6619264     -0.3138744      0.6478403 
   sleep_hours 
     0.4031619 

\ln \left( \frac{\hat{\pi}}{1-\hat{\pi}} \right) = -4.17 - 1.41 \text{ C} -0.66 \text{ G} - 0.31 \text{ M} + 0.65 \text{ prac} + 0.40 \text{ sleep}
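  • To turn the fitted log-odds into an estimated probability, apply the inverse logit. For instance, for a hypothetical balanced-plan day (C = G = M = 0) with 4 hours of practice and 7 hours of sleep:

```r
# Coefficients copied from the coefficients(m1) output above
log_odds <- -4.1686058 + 0.6478403 * 4 + 0.4031619 * 7
plogis(log_odds)  # inverse logit: estimated probability, ~0.78
```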

Interpreting the Model

  • Recall the binary logistic regression model,

\ln \left( \frac{\pi}{1-\pi} \right) = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k.

  • We are modeling the log odds, which are not intuitive with interpretations.

    • Interpreting \hat{\beta}_i, it is an additive effect on the log odds.
  • To be able to discuss the odds, we will “undo” the natural log by exponentiation.

    • When interpreting e^{\hat{\beta}_i}, it is a multiplicative effect on the odds.
  • Thus, when interpreting the slope for x_i, we will look at the odds ratio, e^{\hat{\beta}_i}.

Interpreting the Model

  • Why is it a multiplicative effect?

\begin{align*} \ln \left( \frac{\pi}{1-\pi} \right) &= \beta_0 + \beta_1 x_1 + ... + \beta_k x_k \\ \exp\left\{ \ln \left( \frac{\pi}{1-\pi} \right) \right\} &= \exp\left\{ \beta_0 + \beta_1 x_1 + ... + \beta_k x_k \right\} \\ \frac{\pi}{1-\pi} &= e^{\beta_0} e^{\beta_1 x_1} \cdots e^{\beta_k x_k} \end{align*}
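  • We can verify the multiplicative effect numerically. With hypothetical coefficients \beta_0 = -1 and \beta_1 = 0.5, increasing x_1 by one unit multiplies the odds by e^{\beta_1}:

```r
# Hypothetical coefficients (illustration only)
beta0 <- -1
beta1 <- 0.5
odds <- function(x) exp(beta0 + beta1 * x)  # pi / (1 - pi)

odds(3) / odds(2)  # ratio of odds for x = 3 vs. x = 2
exp(beta1)         # same value: exp(0.5), about 1.65
```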

Interpreting the Model

  • For continuous predictors:

    • For a 1 [unit of predictor] increase in [predictor name], the odds of [outcome] are multiplied by [e^{\hat{\beta}_i}].
    • For a 1 [unit of predictor] increase in [predictor name], the odds of [outcome] are [increased or decreased] by [100(e^{\hat{\beta}_i}-1)% or 100(1-e^{\hat{\beta}_i})%].
  • For categorical predictors:

    • As compared to [the reference group], the odds of [outcome] for [group that the slope belongs to] are multiplied by [e^{\hat{\beta}_i}].
    • As compared to [the reference group], the odds of [outcome] for [group that the slope belongs to] are [increased or decreased] by [100(e^{\hat{\beta}_i}-1)% or 100(1-e^{\hat{\beta}_i})%].

Odds Ratios (R)

  • We can request the odds ratios directly out of tidy(),
model %>% tidy(conf.int = TRUE, exponentiate = TRUE)
  • Note: if you have an interaction term in the model, you must stratify your interpretations by the levels of the interacting variable(s) before exponentiating.
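  • A base-R equivalent of tidy(conf.int = TRUE, exponentiate = TRUE), shown here on simulated data (not the lecture data):

```r
# Simulated binary outcome for illustration only
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(-0.5 + 0.8 * x))
m <- glm(y ~ x, family = binomial(link = "logit"))

exp(coef(m))      # odds ratios
exp(confint(m))   # CIs for the ORs; confint() on a glm profiles the likelihood
```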

Example 1: Interpreting the Model

  • Revisiting our results,
m1 %>% tidy(conf.int = TRUE, exponentiate = TRUE)

Example 1: Interpreting the Model

  • Interpreting our model,
round(exp(coefficients(m1)), 2)
   (Intercept)    planChaotic      planGoofy        planMax practice_hours 
          0.02           0.24           0.52           0.73           1.91 
   sleep_hours 
          1.50 
  • When compared to the balanced plan,

    • Chaotic days have odds of nailing the audition multiplied by 0.24. This is a decrease of 76%.
    • Goofy-led days have odds of nailing the audition multiplied by 0.52. This is a decrease of 48%.
    • Max-led days have odds of nailing the audition multiplied by 0.73. This is a decrease of 27%.

Example 1: Interpreting the Model

  • Interpreting our model,
round(exp(coefficients(m1)), 2)
   (Intercept)    planChaotic      planGoofy        planMax practice_hours 
          0.02           0.24           0.52           0.73           1.91 
   sleep_hours 
          1.50 
  • For a 1 hour increase in practice, the odds of nailing the audition are multiplied by 1.91. This is an increase of 91%.

  • For a 1 hour increase in sleep, the odds of nailing the audition are multiplied by 1.50. This is an increase of 50%.

Reporting Results

  • When reporting results from models that do not use the identity link (e.g., logistic regression), we go from reporting \hat{\beta}_i to reporting e^{\hat{\beta}_i}.
Predictor OR (95% CI for OR) p-value
variable 1 OR1 (LL1, UL1) p1
variable 2 OR2 (LL2, UL2) p2
variable k ORk (LLk, ULk) pk

Example 1: Reporting Results

  • In our example,
Predictor OR (95% CI for OR) p-value
Plan: Chaotic 0.24 (0.11, 0.53) < 0.001
Plan: Goofy 0.52 (0.26, 1.04) 0.030
Plan: Max 0.73 (0.36, 1.44) 0.302
Practice Hours 1.91 (1.54, 2.42) < 0.001
Sleep Hours 1.50 (1.23, 1.83) < 0.001

Statistical Inference

  • Our approach to determining significance remains the same.

    • For continuous and binary predictors, we can use the Wald test (z-test from tidy()).

    • For omnibus tests, we can use the likelihood ratio test (LRT).

      • Significance of regression line (full/reduced LRT).
      • Significance of categorical predictors with c \ge 3 levels (car::Anova(type = 3) or full/reduced LRT).
      • Interaction terms with categorical predictors with c \ge 3 levels (car::Anova(type = 3) or full/reduced LRT).
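  • A minimal full/reduced LRT sketch on simulated data (the variable names are illustrative, not the lecture data), testing a categorical predictor with 3 levels:

```r
# Simulated outcome with a continuous predictor x and a 3-level factor g
set.seed(7)
x <- rnorm(150)
g <- factor(sample(c("a", "b", "c"), 150, replace = TRUE))
y <- rbinom(150, 1, plogis(-0.2 + 0.9 * x))

full    <- glm(y ~ g + x, family = binomial(link = "logit"))
reduced <- glm(y ~ x,     family = binomial(link = "logit"))  # drops the factor
anova(reduced, full, test = "LRT")  # LRT for the categorical predictor g
```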

Example 1: Significance of Regression Line

  • In our example, we can test for significance of a regression line:
m1_full <- glm(nailed_audition ~ plan + practice_hours + sleep_hours, data = max, family = binomial(link = "logit"))
m1_reduced <- glm(nailed_audition ~ 1, data = max, family = binomial(link = "logit"))
anova(m1_reduced, m1_full, test = "LRT")
  • Yes, this is a significant regression line (p < 0.001).

Example 1: Significance of Predictors

  • In our model, we have a categorical predictor with 4 levels – car::Anova(type = 3) (or the full/reduced + anova() approach) is necessary.
m1 %>% car::Anova(type = 3)
  • Plan is a significant predictor (p = 0.002).

  • Practice hours is a significant predictor (p < 0.001).

  • Sleep hours is a significant predictor (p < 0.001).

Example 2: Binary Logistic Regression

  • Let’s now model whether Max nailed the audition (nailed_audition) based on the plan followed, hours of practice, hours of sleep, caffeine intake, and the interaction between hours of sleep and plan.
m2 <- glm(nailed_audition ~ plan + practice_hours + sleep_hours + caffeine_mg + sleep_hours:plan,
          data = max,
          family = binomial(link = "logit"))
m2 %>% tidy(conf.int = TRUE, exponentiate = TRUE)

Example 2: Significance of Regression Line

  • Let’s make sure this is a significant model,
m2_full <- glm(nailed_audition ~ plan + practice_hours + sleep_hours + caffeine_mg + sleep_hours:plan, data = max, family = binomial(link = "logit"))
m2_reduced <- glm(nailed_audition ~ 1, data = max, family = binomial(link = "logit"))
anova(m2_reduced, m2_full, test = "LRT")
  • Yes, this is a significant regression line (p < 0.001); at least one of the slopes is non-zero.

Example 2: Significance of Predictors

  • In our model, we have a categorical predictor with 4 levels – car::Anova(type = 3) (or the full/reduced + anova() approach) is necessary.
m2 %>% car::Anova(type = 3)
  • The interaction between plan and sleep hours is a significant predictor (p = 0.007).

Example 2: Significance of Predictors

  • In our model, we have a categorical predictor with 4 levels – the full/reduced + anova() approach (or car::Anova(type = 3)) is necessary.
m2_full <- glm(nailed_audition ~ plan + practice_hours + sleep_hours + caffeine_mg + sleep_hours:plan, data = max, family = binomial(link = "logit"))
m2_reduced <- glm(nailed_audition ~ plan + practice_hours + sleep_hours + caffeine_mg, data = max, family = binomial(link = "logit"))
anova(m2_reduced, m2_full, test = "LRT")
  • The interaction between plan and sleep hours is a significant predictor (p = 0.007).

Example 2: Significance of Predictors

  • The interaction between plan and sleep hours is a significant predictor (p = 0.007).
m2 %>% car::Anova(type = 3)
  • What does this actually mean?

    • The effect of sleep hours on the odds of nailing the audition depends on which plan is being followed.

      • i.e., \hat{\beta}_{\text{sleep hours}} is not constant across all plans.
    • We will stratify by plan when interpreting the effect of sleep hours.

Example 2: Odds Ratios

  • Using the original coefficients, we can stratify by plan:
coefficients(m2)
            (Intercept)             planChaotic               planGoofy 
          -2.6808466148           -2.5619519420           -6.3101012814 
                planMax          practice_hours             sleep_hours 
           0.7932279783            0.6615468372            0.2101498676 
            caffeine_mg planChaotic:sleep_hours   planGoofy:sleep_hours 
          -0.0009170096            0.1657909939            0.7892868764 
    planMax:sleep_hours 
          -0.1592494154 
  • Then, for each type of plan, the intercept and slope for sleep hours is as follows:
Plan Intercept Sleep Hours Slope Sleep Hours OR
Balanced -2.68 0.21 exp(0.21) = 1.23
Chaotic -2.68 - 2.56 = -5.24 0.21 + 0.17 = 0.38 exp(0.38) = 1.46
Goofy -2.68 - 6.31 = -8.99 0.21 + 0.79 = 1.00 exp(1.00) = 2.72
Max -2.68 + 0.79 = -1.89 0.21 - 0.16 = 0.05 exp(0.05) = 1.05
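  • The stratified slopes above can be reproduced directly from the printed coefficients (values copied, rounded, from the coefficients(m2) output):

```r
# Sleep-hours coefficients copied from the coefficients(m2) output above
b_sleep   <- 0.2101   # sleep_hours (balanced plan, the reference)
b_chaotic <- 0.1658   # planChaotic:sleep_hours
b_goofy   <- 0.7893   # planGoofy:sleep_hours
b_max     <- -0.1592  # planMax:sleep_hours

slopes <- c(Balanced = b_sleep,
            Chaotic  = b_sleep + b_chaotic,
            Goofy    = b_sleep + b_goofy,
            Max      = b_sleep + b_max)
round(exp(slopes), 2)  # stratified ORs: 1.23, 1.46, 2.72, 1.05
```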

Example 2: Interpreting the Model

  • Now let’s interpret the coefficients,
Plan Intercept Sleep Hours Slope Sleep Hours OR
Balanced -2.68 0.21 exp(0.21) = 1.23
Chaotic -2.68 - 2.56 = -5.24 0.21 + 0.17 = 0.38 exp(0.38) = 1.46
Goofy -2.68 - 6.31 = -8.99 0.21 + 0.79 = 1.00 exp(1.00) = 2.72
Max -2.68 + 0.79 = -1.89 0.21 - 0.16 = 0.05 exp(0.05) = 1.05
  • Interpreting the odds ratios for sleep hours:

    • Balanced plan: For a 1 hour increase in sleep, the odds of nailing the audition are multiplied by 1.23 (a 23% increase).

    • Chaotic plan: For a 1 hour increase in sleep, the odds of nailing the audition are multiplied by 1.46 (a 46% increase).

Example 2: Interpreting the Model

  • Now let’s interpret the coefficients,
Plan Intercept Sleep Hours Slope Sleep Hours OR
Balanced -2.68 0.21 exp(0.21) = 1.23
Chaotic -2.68 - 2.56 = -5.24 0.21 + 0.17 = 0.38 exp(0.38) = 1.46
Goofy -2.68 - 6.31 = -8.99 0.21 + 0.79 = 1.00 exp(1.00) = 2.72
Max -2.68 + 0.79 = -1.89 0.21 - 0.16 = 0.05 exp(0.05) = 1.05
  • Interpreting the odds ratios for sleep hours:

    • Goofy-led plan: For a 1 hour increase in sleep, the odds of nailing the audition are multiplied by 2.72 (a 172% increase).

    • Max-led plan: For a 1 hour increase in sleep, the odds of nailing the audition are multiplied by 1.05 (a 5% increase).

Example 2: Interpreting the Model

  • Reviewing what’s in the model,
round(exp(coefficients(m2)), 2)
            (Intercept)             planChaotic               planGoofy 
                   0.07                    0.08                    0.00 
                planMax          practice_hours             sleep_hours 
                   2.21                    1.94                    1.23 
            caffeine_mg planChaotic:sleep_hours   planGoofy:sleep_hours 
                   1.00                    1.18                    2.20 
    planMax:sleep_hours 
                   0.85 
  • Our interpretations have covered plan and sleep hours.

    • We can still interpret caffeine (mg) and practice time (hr) – they are not involved in an interaction term.
  • For a 1 mg increase in caffeine, the odds of nailing the audition are multiplied by 1.00 (essentially no change).

  • For a 1 hour increase in practice time, the odds of nailing the audition are multiplied by 1.94 (a 94% increase).

Examples 1 and 2: Model Fit

  • We can still use AIC and BIC to examine/compare model fit.

  • Let’s use the BIC to determine whether m1 or m2 fits the data better.

m1 %>% BIC()
[1] 454.8547
m2 %>% BIC()
[1] 465.7631
  • Interestingly, m1 fits the data better (lower BIC), despite the interaction being significant.

    • This suggests that the slight improvement in fit from m2 is not worth its added complexity.
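  • For reference, BIC() on a glm is just -2 times the log-likelihood plus a log(n) penalty per estimated parameter; a quick check on simulated data (not the lecture data):

```r
# Simulated binary outcome for illustration only
set.seed(42)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.3 * x))
m <- glm(y ~ x, family = binomial(link = "logit"))

k <- attr(logLik(m), "df")  # number of estimated parameters
manual_bic <- -2 * as.numeric(logLik(m)) + log(nobs(m)) * k
all.equal(manual_bic, BIC(m))  # TRUE
```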

Wrap Up

  • In this lecture, we have introduced binary logistic regression.

    • Binary logistic regression is appropriate for binary outcomes.
  • We again saw the logit link function.

    • We interpret coefficients as multiplicative effects on the odds of the event happening.
  • In the next lecture, we will review multinomial logistic regression.