We have previously discussed continuous outcomes:
This week, we will consider categorical outcomes:
\text{logit}(\pi) = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k,
where \pi = \text{P}[Y = 1] = the probability of the outcome/event.
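Solving for \pi shows how the linear predictor maps back to a probability:

\pi = \frac{e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k}}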
Logistic regression is used specifically for binary response variables, which take on exactly two discrete values.
These are categorical outcomes typically coded as 0 and 1.
Examples:
How is this different than beta regression?
Beta regression is used for continuous response variables that represent proportions or probabilities bounded between 0 and 1 (but not including the endpoints).
Examples:
The Binomial distribution is a discrete probability distribution defined for the set \{0, ..., n\}.
The Binomial distribution is characterized by \pi = P[Y = 1], the probability of success, and n, the number of trials (sample size).
As n increases, the distribution becomes more spread out.
Varying \pi changes the shape of the distribution.
When \pi is close to 0 or 1, the distribution is skewed.
When \pi is around 0.5, the distribution is more symmetric.
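As a quick illustration of these shape claims, we can evaluate the Binomial pmf in R (a sketch; the specific n and \pi values are just examples):

```r
# pmf of Y ~ Binomial(n = 10, pi) over the support {0, ..., 10}
round(dbinom(0:10, size = 10, prob = 0.5), 3)  # roughly symmetric around 5
round(dbinom(0:10, size = 10, prob = 0.9), 3)  # skewed, mass piled near 10
```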
Binary outcomes are typically coded as 0/1.
Some software allows us to get away with factor variables rather than numeric 0/1, but it is best practice to code binary outcomes as 0/1.
The code you will need will depend on how the original variable is coded.
Effectively, you will have something like
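For example (a sketch; the data frame and variable names here are hypothetical):

```r
# recode a two-level character/factor outcome to numeric 0/1
dat$nailed_audition <- ifelse(dat$result == "nailed it", 1, 0)
```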
We will use the glm() function to perform binary logistic regression.

Goofy and Max are gearing up for a huge audition. To make sure he's ready, Max decides to test out a few different preparation plans leading up to the audition. Each plan mixes a different approach to practicing, resting, and getting through the day.
Some days Goofy insists on structure and early bedtimes, other days Max does things his own way. This leads to four plans:
Each day, Max tracks a few behaviors he thinks might affect how well things go:
Max then records if he nailed the (practice) audition for that day.
We model whether Max nailed the (practice) audition (nailed_audition) based on the plan followed, hours of practice, and hours of sleep. The estimated coefficients:

```
   (Intercept)    planChaotic      planGoofy        planMax practice_hours 
    -4.1686058     -1.4096353     -0.6619264     -0.3138744      0.6478403 
   sleep_hours 
     0.4031619 
```
\ln \left( \frac{\hat{\pi}}{1-\hat{\pi}} \right) = -4.17 - 1.41 \text{ C} -0.66 \text{ G} - 0.31 \text{ M} + 0.65 \text{ prac} + 0.40 \text{ sleep}
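A fitted model like this can turn predictor values into predicted probabilities; a sketch assuming the fitted object m1_full and a hypothetical day (the reference plan level is assumed to be "Balanced"):

```r
new_day <- data.frame(plan = "Balanced", practice_hours = 3, sleep_hours = 7)
predict(m1_full, newdata = new_day, type = "response")  # predicted P[Y = 1]
```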
\ln \left( \frac{\pi}{1-\pi} \right) = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k,
We are modeling the log odds, which are not intuitive to interpret.
To be able to discuss the odds, we will “undo” the natural log by exponentiation.
Thus, when interpreting the slope for x_i, we will look at the odds ratio, e^{\hat{\beta}_i}.
\begin{align*} \ln \left( \frac{\pi}{1-\pi} \right) &= \beta_0 + \beta_1 x_1 + ... + \beta_k x_k \\ \exp\left\{ \ln \left( \frac{\pi}{1-\pi} \right) \right\} &= \exp\left\{ \beta_0 + \beta_1 x_1 + ... + \beta_k x_k \right\} \\ \frac{\pi}{1-\pi} &= e^{\beta_0} e^{\beta_1 x_1} \cdots e^{\beta_k x_k} \end{align*}
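This factored form is also why e^{\beta_i} is the odds ratio for a one-unit increase in x_i, holding the other predictors fixed:

\frac{\text{odds}(x_i + 1)}{\text{odds}(x_i)} = \frac{e^{\beta_0} \cdots e^{\beta_i (x_i + 1)} \cdots e^{\beta_k x_k}}{e^{\beta_0} \cdots e^{\beta_i x_i} \cdots e^{\beta_k x_k}} = e^{\beta_i}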
For continuous predictors:
For categorical predictors:
Exponentiating the estimated coefficients (e.g., via tidy() with exponentiate = TRUE) gives the odds ratios:

```
   (Intercept)    planChaotic      planGoofy        planMax practice_hours 
          0.02           0.24           0.52           0.73           1.91 
   sleep_hours 
          1.50 
```
When compared to the balanced plan,
the odds of nailing the audition are multiplied by 0.24 under the chaotic plan (a 76% decrease),
by 0.52 under the Goofy-led plan (a 48% decrease),
and by 0.73 under the Max-led plan (a 27% decrease).
For a 1 hour increase in practice, the odds of nailing the audition are multiplied by 1.91. This is an increase of 91%.
For a 1 hour increase in sleep, the odds of nailing the audition are multiplied by 1.50. This is an increase of 50%.
| Predictor | OR (95% CI for OR) | p-value |
|---|---|---|
| variable 1 | OR1 (LL1, UL1) | p1 |
| variable 2 | OR2 (LL2, UL2) | p2 |
| … | … | … |
| variable k | ORk (LLk, ULk) | pk |
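One way to fill in such a table (a sketch, assuming a fitted glm object like m1_full and the broom package):

```r
library(broom)
# odds ratios with 95% CIs and Wald p-values
tidy(m1_full, exponentiate = TRUE, conf.int = TRUE)
```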
| Predictor | OR (95% CI for OR) | p-value |
|---|---|---|
| Plan: Chaotic | 0.24 (0.11, 0.53) | < 0.001 |
| Plan: Goofy | 0.52 (0.26, 1.04) | 0.030 |
| Plan: Max | 0.73 (0.36, 1.44) | 0.302 |
| Practice Hours | 1.91 (1.54, 2.42) | < 0.001 |
| Sleep Hours | 1.50 (1.23, 1.83) | < 0.001 |
Our approach to determining significance remains the same.
For continuous and binary predictors, we can use the Wald test (z-test from tidy()).
For omnibus tests, we can use the likelihood ratio test (LRT).
Because plan is a multi-level categorical predictor, an omnibus test (car::Anova(type = 3) or the full/reduced + anova() approach) is necessary.

```r
m1_full <- glm(nailed_audition ~ plan + practice_hours + sleep_hours,
               data = max, family = binomial(link = "logit"))
m1_reduced <- glm(nailed_audition ~ 1,
                  data = max, family = binomial(link = "logit"))
anova(m1_reduced, m1_full, test = "LRT")
```

Plan is a significant predictor (p = 0.002).
Practice hours is a significant predictor (p < 0.001).
Sleep hours is a significant predictor (p < 0.001).
Now, suppose Max models the probability of nailing the audition (nailed_audition) based on the plan followed, hours of practice, hours of sleep, caffeine intake, and the interaction between hours of sleep and plan.

```r
m2_full <- glm(nailed_audition ~ plan + practice_hours + sleep_hours +
                 caffeine_mg + sleep_hours:plan,
               data = max, family = binomial(link = "logit"))
m2_reduced <- glm(nailed_audition ~ 1,
                  data = max, family = binomial(link = "logit"))
anova(m2_reduced, m2_full, test = "LRT")
```

To test the interaction term itself, the full/reduced + anova() approach (or car::Anova(type = 3)) is necessary:

```r
m2_reduced <- glm(nailed_audition ~ plan + practice_hours + sleep_hours +
                    caffeine_mg,
                  data = max, family = binomial(link = "logit"))
anova(m2_reduced, m2_full, test = "LRT")
```

What does this actually mean?
The effect of sleep hours on the odds of nailing the audition depends on which plan is being followed.
We will stratify by plan when interpreting the effect of sleep hours.
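The stratified slopes can be assembled from the coefficient vector; a sketch assuming the interaction model object m2_full:

```r
b <- coef(m2_full)
# plan-specific sleep_hours slope = main effect + that plan's interaction term
b["sleep_hours"] + b["planGoofy:sleep_hours"]       # slope under the Goofy plan
exp(b["sleep_hours"] + b["planGoofy:sleep_hours"])  # corresponding odds ratio
```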
```
            (Intercept)             planChaotic               planGoofy 
          -2.6808466148           -2.5619519420           -6.3101012814 
                planMax          practice_hours             sleep_hours 
           0.7932279783            0.6615468372            0.2101498676 
            caffeine_mg planChaotic:sleep_hours   planGoofy:sleep_hours 
          -0.0009170096            0.1657909939            0.7892868764 
    planMax:sleep_hours 
          -0.1592494154 
```
| Plan | Intercept | Sleep Hours Slope | Sleep Hours OR |
|---|---|---|---|
| Balanced | -2.68 | 0.21 | exp(0.21) = 1.23 |
| Chaotic | -2.68 - 2.56 = -5.24 | 0.21 + 0.17 = 0.38 | exp(0.38) = 1.46 |
| Goofy | -2.68 - 6.31 = -8.99 | 0.21 + 0.79 = 1.00 | exp(1.00) = 2.72 |
| Max | -2.68 + 0.79 = -1.89 | 0.21 - 0.16 = 0.05 | exp(0.05) = 1.05 |
Interpreting the odds ratios for sleep hours:
Balanced plan: For a 1 hour increase in sleep, the odds of nailing the audition are multiplied by 1.23 (a 23% increase).
Chaotic plan: For a 1 hour increase in sleep, the odds of nailing the audition are multiplied by 1.46 (a 46% increase).
Goofy-led plan: For a 1 hour increase in sleep, the odds of nailing the audition are multiplied by 2.72 (a 172% increase).
Max-led plan: For a 1 hour increase in sleep, the odds of nailing the audition are multiplied by 1.05 (a 5% increase).
```
            (Intercept)             planChaotic               planGoofy 
                   0.07                    0.08                    0.00 
                planMax          practice_hours             sleep_hours 
                   2.21                    1.94                    1.23 
            caffeine_mg planChaotic:sleep_hours   planGoofy:sleep_hours 
                   1.00                    1.18                    2.20 
    planMax:sleep_hours 
                   0.85 
```
Our interpretations have covered plan and sleep hours.
For a 1 mg increase in caffeine, the odds of nailing the audition are multiplied by 1.00 (a 0% increase).
For a 1 hour increase in practice time, the odds of nailing the audition are multiplied by 1.94 (a 94% increase).
We can still use AIC and BIC to examine/compare model fit.
Let’s use the BIC to determine whether m1 or m2 fits the data better.
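A sketch of the comparison, assuming the fitted objects m1_full and m2_full from earlier (lower BIC indicates better fit):

```r
BIC(m1_full, m2_full)
```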
Interestingly, m1 fits the data better, despite the interaction being significant.
The added complexity of m2 is not worth the slight increase in model fit.

In this lecture, we have introduced binary logistic regression.
We again saw the logit link function.
In the next lecture, we will review multinomial logistic regression.