Binary Logistic Regression

Introduction

  • We have previously discussed continuous outcomes:

    • y \sim N(\mu, \sigma^2)
    • y \sim \text{gamma}(\mu, \gamma)
    • y \sim \text{beta}(\alpha, \beta)
  • This week, we will consider categorical outcomes:

    • Binary
    • Multinomial

Binary Logistic Regression

  • We model binary outcomes using logistic regression,

\text{logit}(\pi) = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k,

  • where \pi = \text{P}[Y = 1] = the probability of the outcome/event.

    • Recall, \text{logit}() is the log-odds transformation. Here, \text{logit}(\pi) = \ln\left(\frac{\pi}{1-\pi}\right)
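  • In base R, the logit and its inverse are available directly as qlogis() and plogis():

```r
# qlogis() computes the logit (log-odds); plogis() is its inverse
pi <- 0.8
qlogis(pi)           # log(0.8 / 0.2) = log(4), about 1.39
plogis(qlogis(pi))   # back-transforms the log-odds to 0.8
```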

Binary Logistic Regression

Logistic curves with varying slopes

Binary Logistic Regression

  • Logistic regression is used specifically for binary response variables that take on two discrete values.

    • These are categorical outcomes typically coded as 0 and 1

    • Examples:

      • Customer churn (stayed vs. left)
      • Course outcome (pass vs. fail)
      • System status (operational vs. down)

Binary Logistic Regression

  • How is this different than beta regression?

    • Beta regression is used for continuous response variables that represent proportions or probabilities bounded between 0 and 1 (but not including the endpoints).

    • Examples:

      • Proportion of time spent on different activities
      • Percentage of correct answers on a test
      • Proportion of land covered by vegetation

Binomial Distribution

  • The Binomial distribution is a discrete probability distribution defined for the set \{0, ..., n\}.

    • The distribution gives the probability of observing a certain number of successes in n independent Bernoulli trials.
  • The Binomial distribution is characterized by \pi = P[Y = 1], the probability of success, and n, the number of trials (sample size).

    • As n increases, the distribution becomes more spread out.

    • Varying \pi changes the shape of the distribution.

      • When \pi is close to 0 or 1, the distribution is skewed.

      • When \pi is around 0.5, the distribution is more symmetric.
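  • These shape behaviors are easy to check with dbinom():

```r
n <- 10
# pi near 0: mass piles up near 0 (right-skewed)
round(dbinom(0:n, size = n, prob = 0.1), 3)
# pi = 0.5: symmetric around n/2
round(dbinom(0:n, size = n, prob = 0.5), 3)
```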

Binomial Distribution

  • Same n, varying \pi:
Plots of binomial distribution with n=10 and varying pi

Binomial Distribution

  • Same \pi, varying n:
Plots of binomial distribution with pi = 0.5 and varying n

Data Management

  • Binary outcomes are typically coded as 0/1.

    • 0 = event did not occur
    • 1 = event did occur
  • Some software packages allow us to get away with factor variables rather than numeric 0/1, but it is best practice to code binary outcomes as 0/1.

    • The code you will need will depend on how the original variable is coded.

    • Effectively, you will have something like

dataset_name <- dataset_name %>%
  mutate(y = if_else(original_variable == "event occurred", 1, 0))
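  • As a sketch, with a toy dataset (the variable names here are placeholders, not from the lecture data):

```r
library(dplyr)

# Toy data: a character variable we want recoded to numeric 0/1
dataset_name <- tibble(
  original_variable = c("event occurred", "no event", "event occurred")
)

# 1 = event occurred, 0 = event did not occur
dataset_name <- dataset_name %>%
  mutate(y = if_else(original_variable == "event occurred", 1, 0))

dataset_name$y  # 1 0 1
```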

Binary Logistic Regression (R)

  • We will use the glm() function to perform binary logistic regression,
m <- glm(y ~ x_1 + x_2 + ... + x_k,
         data = dataset_name,
         family = binomial(link = "logit"))

Lecture Example Set Up

  • Goofy and Max are gearing up for a huge audition. To make sure he’s ready, Max decides to test out a few different preparation plans leading up to the audition. Each plan mixes a different approach to practicing, resting, and getting through the day.

  • Some days Goofy insists on structure and early bedtimes, other days Max does things his own way. This leads to four plans:

    • Goofy-led plan
    • Max-led plan
    • Balanced plan
    • Chaotic plan

Lecture Example Set Up

  • Each day, Max tracks a few behaviors he thinks might affect how well things go:

    • Hours of practice he gets in
    • Hours of sleep the night before
    • Caffeine intake (mg)
    • Confidence level (0–10)
  • Max then records if he nailed the (practice) audition for that day.

Lecture Example Set Up

  • Pulling in the data,
max <- read_csv("https://raw.githubusercontent.com/samanthaseals/SDSII/refs/heads/main/files/data/lectures/W5_max.csv")
max %>% head()

Example 1: Binary Logistic Regression

  • Let’s model whether Max nailed the (practice) audition (nailed_audition) based on the plan followed, hours of practice, and hours of sleep.
m1 <- glm(nailed_audition ~ plan + practice_hours + sleep_hours,
          data = max,
          family = binomial(link = "logit"))
m1 %>% tidy()

Example 1: Binary Logistic Regression

  • Our resulting model is as follows,
m1 %>% coefficients()
   (Intercept)    planChaotic      planGoofy        planMax practice_hours 
    -4.1686058     -1.4096353     -0.6619264     -0.3138744      0.6478403 
   sleep_hours 
     0.4031619 

\ln \left( \frac{\hat{\pi}}{1-\hat{\pi}} \right) = -4.17 - 1.41 \text{ C} -0.66 \text{ G} - 0.31 \text{ M} + 0.65 \text{ prac} + 0.40 \text{ sleep}
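  • To turn the fitted log-odds into an estimated probability, apply the inverse logit. For instance, for a hypothetical balanced-plan day (C = G = M = 0) with 4 hours of practice and 7 hours of sleep:

```r
# Coefficients copied from the coefficients(m1) output above
log_odds <- -4.1686058 + 0.6478403 * 4 + 0.4031619 * 7
plogis(log_odds)  # inverse logit: estimated probability, ~0.78
```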

Interpreting the Model

  • Recall the binary logistic regression model,

\ln \left( \frac{\pi}{1-\pi} \right) = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k.

  • We are modeling the log odds, which are not intuitive with interpretations.

    • Interpreting \hat{\beta}_i, it is an additive effect on the log odds.
  • To be able to discuss the odds, we will “undo” the natural log by exponentiation.

    • When interpreting e^{\hat{\beta}_i}, it is a multiplicative effect on the odds.
  • Thus, when interpreting the slope for x_i, we will look at the odds ratio, e^{\hat{\beta}_i}.

Interpreting the Model

  • Why is it a multiplicative effect?

\begin{align*} \ln \left( \frac{\pi}{1-\pi} \right) &= \beta_0 + \beta_1 x_1 + ... + \beta_k x_k \\ \exp\left\{ \ln \left( \frac{\pi}{1-\pi} \right) \right\} &= \exp\left\{ \beta_0 + \beta_1 x_1 + ... + \beta_k x_k \right\} \\ \frac{\pi}{1-\pi} &= e^{\beta_0} e^{\beta_1 x_1} \cdots e^{\beta_k x_k} \end{align*}
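  • We can verify the multiplicative effect numerically. With hypothetical coefficients \beta_0 = -1 and \beta_1 = 0.5, increasing x_1 by one unit multiplies the odds by e^{\beta_1}:

```r
# Hypothetical coefficients (illustration only)
beta0 <- -1
beta1 <- 0.5
odds <- function(x) exp(beta0 + beta1 * x)  # pi / (1 - pi)

odds(3) / odds(2)  # ratio of odds for x = 3 vs. x = 2
exp(beta1)         # same value: exp(0.5), about 1.65
```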

Interpreting the Model

  • For continuous predictors:

    • For a 1 [unit of predictor] increase in [predictor name], the odds of [outcome] are multiplied by [e^{\hat{\beta}_i}].
    • For a 1 [unit of predictor] increase in [predictor name], the odds of [outcome] are [increased or decreased] by [100(e^{\hat{\beta}_i}-1)% or 100(1-e^{\hat{\beta}_i})%].
  • For categorical predictors:

    • As compared to [the reference group], the odds of [outcome] for [group that the slope belongs to] are multiplied by [e^{\hat{\beta}_i}].
    • As compared to [the reference group], the odds of [outcome] for [group that the slope belongs to] are [increased or decreased] by [100(e^{\hat{\beta}_i}-1)% or 100(1-e^{\hat{\beta}_i})%].

Odds Ratios (R)

  • We can request the odds ratios directly out of tidy(),
model %>% tidy(conf.int = TRUE, exponentiate = TRUE)
  • Note: if you have an interaction term in the model, you must stratify your interpretations by the levels of the interacting variable(s) before exponentiating.
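  • A base-R equivalent of tidy(conf.int = TRUE, exponentiate = TRUE), shown here on simulated data (not the lecture data):

```r
# Simulated binary outcome for illustration only
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(-0.5 + 0.8 * x))
m <- glm(y ~ x, family = binomial(link = "logit"))

exp(coef(m))      # odds ratios
exp(confint(m))   # CIs for the ORs; confint() on a glm profiles the likelihood
```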

Example 1: Interpreting the Model

  • Revisiting our results,
m1 %>% tidy(conf.int = TRUE, exponentiate = TRUE)

Example 1: Interpreting the Model

  • Interpreting our model,
round(exp(coefficients(m1)), 2)
   (Intercept)    planChaotic      planGoofy        planMax practice_hours 
          0.02           0.24           0.52           0.73           1.91 
   sleep_hours 
          1.50 
  • When compared to the balanced plan,

    • Chaotic days have odds of nailing the audition multiplied by 0.24. This is a decrease of 76%.
    • Goofy-led days have odds of nailing the audition multiplied by 0.52. This is a decrease of 48%.
    • Max-led days have odds of nailing the audition multiplied by 0.73. This is a decrease of 27%.

Example 1: Interpreting the Model

  • Interpreting our model,
round(exp(coefficients(m1)), 2)
   (Intercept)    planChaotic      planGoofy        planMax practice_hours 
          0.02           0.24           0.52           0.73           1.91 
   sleep_hours 
          1.50 
  • For a 1 hour increase in practice, the odds of nailing the audition are multiplied by 1.91. This is an increase of 91%.

  • For a 1 hour increase in sleep, the odds of nailing the audition are multiplied by 1.50. This is an increase of 50%.

Reporting Results

  • When reporting results from models that do not use the identity link (e.g., logistic regression), we go from reporting \hat{\beta}_i to reporting e^{\hat{\beta}_i}.
Predictor OR (95% CI for OR) p-value
variable 1 OR1 (LL1, UL1) p1
variable 2 OR2 (LL2, UL2) p2
variable k ORk (LLk, ULk) pk

Example 1: Reporting Results

  • In our example,
Predictor OR (95% CI for OR) p-value
Plan: Chaotic 0.24 (0.11, 0.53) < 0.001
Plan: Goofy 0.52 (0.26, 1.04) 0.030
Plan: Max 0.73 (0.36, 1.44) 0.302
Practice Hours 1.91 (1.54, 2.42) < 0.001
Sleep Hours 1.50 (1.23, 1.83) < 0.001

Statistical Inference

  • Our approach to determining significance remains the same.

    • For continuous and binary predictors, we can use the Wald test (z-test from tidy()).

    • For omnibus tests, we can use the likelihood ratio test (LRT).

      • Significance of regression line (full/reduced LRT).
      • Significance of categorical predictors with c \ge 3 levels (car::Anova(type = 3) or full/reduced LRT).
      • Interaction terms with categorical predictors with c \ge 3 levels (car::Anova(type = 3) or full/reduced LRT).
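  • A minimal full/reduced LRT sketch on simulated data (the variable names are illustrative, not the lecture data), testing a categorical predictor with 3 levels:

```r
# Simulated outcome with a continuous predictor x and a 3-level factor g
set.seed(7)
x <- rnorm(150)
g <- factor(sample(c("a", "b", "c"), 150, replace = TRUE))
y <- rbinom(150, 1, plogis(-0.2 + 0.9 * x))

full    <- glm(y ~ g + x, family = binomial(link = "logit"))
reduced <- glm(y ~ x,     family = binomial(link = "logit"))  # drops the factor
anova(reduced, full, test = "LRT")  # LRT for the categorical predictor g
```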

Example 1: Significance of Regression Line

  • In our example, we can test for significance of a regression line:
m1_full <- glm(nailed_audition ~ plan + practice_hours + sleep_hours, data = max, family = binomial(link = "logit"))
m1_reduced <- glm(nailed_audition ~ 1, data = max, family = binomial(link = "logit"))
anova(m1_reduced, m1_full, test = "LRT")
  • Yes, this is a significant regression line (p < 0.001).

Example 1: Significance of Predictors

  • In our model, we have a categorical predictor with 4 levels – car::Anova(type = 3) (or the full/reduced + anova() approach) is necessary.
m1 %>% car::Anova(type = 3)
  • Plan is a significant predictor (p = 0.002).

  • Practice hours is a significant predictor (p < 0.001).

  • Sleep hours is a significant predictor (p < 0.001).

Example 2: Binary Logistic Regression

  • Let’s now model whether Max nailed the audition (nailed_audition) based on the plan followed, hours of practice, hours of sleep, caffeine intake, and the interaction between hours of sleep and plan.
m2 <- glm(nailed_audition ~ plan + practice_hours + sleep_hours + caffeine_mg + sleep_hours:plan,
          data = max,
          family = binomial(link = "logit"))
m2 %>% tidy(conf.int = TRUE, exponentiate = TRUE)

Example 2: Significance of Regression Line

  • Let’s make sure this is a significant model,
m2_full <- glm(nailed_audition ~ plan + practice_hours + sleep_hours + caffeine_mg + sleep_hours:plan, data = max, family = binomial(link = "logit"))
m2_reduced <- glm(nailed_audition ~ 1, data = max, family = binomial(link = "logit"))
anova(m2_reduced, m2_full, test = "LRT")
  • Yes, this is a significant regression line (p < 0.001); at least one of the slopes is non-zero.

Example 2: Significance of Predictors

  • In our model, we have a categorical predictor with 4 levels – car::Anova(type = 3) (or the full/reduced + anova() approach) is necessary.
m2 %>% car::Anova(type = 3)
  • The interaction between plan and sleep hours is a significant predictor (p = 0.007).

Example 2: Significance of Predictors

  • In our model, we have a categorical predictor with 4 levels – the full/reduced + anova() approach (or car::Anova(type = 3)) is necessary.
m2_full <- glm(nailed_audition ~ plan + practice_hours + sleep_hours + caffeine_mg + sleep_hours:plan, data = max, family = binomial(link = "logit"))
m2_reduced <- glm(nailed_audition ~ plan + practice_hours + sleep_hours + caffeine_mg, data = max, family = binomial(link = "logit"))
anova(m2_reduced, m2_full, test = "LRT")
  • The interaction between plan and sleep hours is a significant predictor (p = 0.007).

Example 2: Significance of Predictors

  • The interaction between plan and sleep hours is a significant predictor (p = 0.007).
m2 %>% car::Anova(type = 3)
  • What does this actually mean?

    • The effect of sleep hours on the odds of nailing the audition depends on which plan is being followed.

      • i.e., \hat{\beta}_{\text{sleep hours}} is not constant across all plans.
    • We will stratify by plan when interpreting the effect of sleep hours.

Example 2: Odds Ratios

  • Using the original coefficients, we can stratify by plan:
coefficients(m2)
            (Intercept)             planChaotic               planGoofy 
          -2.6808466148           -2.5619519420           -6.3101012814 
                planMax          practice_hours             sleep_hours 
           0.7932279783            0.6615468372            0.2101498676 
            caffeine_mg planChaotic:sleep_hours   planGoofy:sleep_hours 
          -0.0009170096            0.1657909939            0.7892868764 
    planMax:sleep_hours 
          -0.1592494154 
  • Then, for each type of plan, the intercept and slope for sleep hours is as follows:
Plan Intercept Sleep Hours Slope Sleep Hours OR
Balanced -2.68 0.21 exp(0.21) = 1.23
Chaotic -2.68 - 2.56 = -5.24 0.21 + 0.17 = 0.38 exp(0.38) = 1.46
Goofy -2.68 - 6.31 = -8.99 0.21 + 0.79 = 1.00 exp(1.00) = 2.72
Max -2.68 + 0.79 = -1.89 0.21 - 0.16 = 0.05 exp(0.05) = 1.05
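  • The stratified slopes above can be reproduced directly from the printed coefficients (values copied, rounded, from the coefficients(m2) output):

```r
# Sleep-hours coefficients copied from the coefficients(m2) output above
b_sleep   <- 0.2101   # sleep_hours (balanced plan, the reference)
b_chaotic <- 0.1658   # planChaotic:sleep_hours
b_goofy   <- 0.7893   # planGoofy:sleep_hours
b_max     <- -0.1592  # planMax:sleep_hours

slopes <- c(Balanced = b_sleep,
            Chaotic  = b_sleep + b_chaotic,
            Goofy    = b_sleep + b_goofy,
            Max      = b_sleep + b_max)
round(exp(slopes), 2)  # stratified ORs: 1.23, 1.46, 2.72, 1.05
```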

Example 2: Interpreting the Model

  • Now let’s interpret the coefficients,
Plan Intercept Sleep Hours Slope Sleep Hours OR
Balanced -2.68 0.21 exp(0.21) = 1.23
Chaotic -2.68 - 2.56 = -5.24 0.21 + 0.17 = 0.38 exp(0.38) = 1.46
Goofy -2.68 - 6.31 = -8.99 0.21 + 0.79 = 1.00 exp(1.00) = 2.72
Max -2.68 + 0.79 = -1.89 0.21 - 0.16 = 0.05 exp(0.05) = 1.05
  • Interpreting the odds ratios for sleep hours:

    • Balanced plan: For a 1 hour increase in sleep, the odds of nailing the audition are multiplied by 1.23 (a 23% increase).

    • Chaotic plan: For a 1 hour increase in sleep, the odds of nailing the audition are multiplied by 1.46 (a 46% increase).

Example 2: Interpreting the Model

  • Now let’s interpret the coefficients,
Plan Intercept Sleep Hours Slope Sleep Hours OR
Balanced -2.68 0.21 exp(0.21) = 1.23
Chaotic -2.68 - 2.56 = -5.24 0.21 + 0.17 = 0.38 exp(0.38) = 1.46
Goofy -2.68 - 6.31 = -8.99 0.21 + 0.79 = 1.00 exp(1.00) = 2.72
Max -2.68 + 0.79 = -1.89 0.21 - 0.16 = 0.05 exp(0.05) = 1.05
  • Interpreting the odds ratios for sleep hours:

    • Goofy-led plan: For a 1 hour increase in sleep, the odds of nailing the audition are multiplied by 2.72 (a 172% increase).

    • Max-led plan: For a 1 hour increase in sleep, the odds of nailing the audition are multiplied by 1.05 (a 5% increase).

Example 2: Interpreting the Model

  • Reviewing what’s in the model,
round(exp(coefficients(m2)), 2)
            (Intercept)             planChaotic               planGoofy 
                   0.07                    0.08                    0.00 
                planMax          practice_hours             sleep_hours 
                   2.21                    1.94                    1.23 
            caffeine_mg planChaotic:sleep_hours   planGoofy:sleep_hours 
                   1.00                    1.18                    2.20 
    planMax:sleep_hours 
                   0.85 
  • Our interpretations have covered plan and sleep hours.

    • We can still interpret caffeine (mg) and practice time (hr) – they are not involved in an interaction term.
  • For a 1 mg increase in caffeine, the odds of nailing the audition are multiplied by 1.00 (essentially no change).

  • For a 1 hour increase in practice time, the odds of nailing the audition are multiplied by 1.94 (a 94% increase).

Examples 1 and 2: Model Fit

  • We can still use AIC and BIC to examine/compare model fit.

  • Let’s use the BIC to determine whether m1 or m2 fits the data better.

m1 %>% BIC()
[1] 454.8547
m2 %>% BIC()
[1] 465.7631
  • Interestingly, m1 fits the data better (lower BIC), despite the interaction being significant.

    • This suggests that the slight improvement in fit from m2 is not worth its added complexity.
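  • For reference, BIC() on a glm is just -2 times the log-likelihood plus a log(n) penalty per estimated parameter; a quick check on simulated data (not the lecture data):

```r
# Simulated binary outcome for illustration only
set.seed(42)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.3 * x))
m <- glm(y ~ x, family = binomial(link = "logit"))

k <- attr(logLik(m), "df")  # number of estimated parameters
manual_bic <- -2 * as.numeric(logLik(m)) + log(nobs(m)) * k
all.equal(manual_bic, BIC(m))  # TRUE
```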

Wrap Up

  • In this lecture, we have introduced binary logistic regression.

    • Binary logistic regression is appropriate for binary outcomes.
  • We again saw the logit link function.

    • We interpret coefficients as multiplicative effects on the odds of the event happening.
  • In the next lecture, we will review multinomial logistic regression.