We have previously discussed continuous outcomes:
This week, we will consider categorical outcomes:
\ln \left( \frac{\pi}{1-\pi} \right) = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k
\ln \left( \frac{\pi_j}{\pi_{\text{ref}}} \right) = \beta_{0j} + \beta_{1j} x_1 + ... + \beta_{kj} x_k
We are now creating c-1 models, where c is the number of outcome categories.
The general form of each model is:
\ln \left( \frac{\hat{\pi}_j}{\hat{\pi}_{\text{ref}}} \right) = \hat{\beta}_{0j} + \hat{\beta}_{1j} x_1 + ... + \hat{\beta}_{kj} x_k
We will use the multinom() function from the nnet package for modeling. Note that trace = FALSE only suppresses the printed optimization output; it does not change the model fit.
Fortunately for us, we already know the data management for this, as our outcome will be a factor variable. Recall,
When we include factor variables, R defaults to the “first” level (alphabetical order, unless otherwise specified) as the reference group.
There are (more than) two approaches we can take to “relevel” a variable.
Use the factor() function with the levels argument to set the order of the levels.
“Brute force” by defining a new character variable using if_else() statements and defining the levels as “1 - first category name”, “2 - second category name”, etc.
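A minimal sketch of the first approach, using hypothetical variable and level names (df, outcome, and the level labels are placeholders, not from the lecture data):

```r
library(dplyr)

# Hypothetical data: outcome stored as character
df <- tibble(outcome = c("GroupB", "Reference", "GroupA", "Reference"))

# Approach 1: factor() with the levels argument;
# the first level listed becomes the reference group
df <- df %>%
  mutate(outcome = factor(outcome, levels = c("Reference", "GroupA", "GroupB")))

levels(df$outcome)  # "Reference" now comes first, i.e., it is the reference
```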
When dealing with factor variables, ask yourself what the reference group should be.
Previously, I have mentioned that you can use as.factor() within the formula of a model to treat a variable as a factor… however, that approach does not carry over well to multinom(). Thus, we must make sure that any categorical variables are coded appropriately in the dataset before modeling.
During performance weekends, things don’t just go “well” or “poorly” — they can unfold in several very different ways. Depending on how Max prepares and which plan he follows with Goofy, the final outcome of the performance falls into one of four categories:
Each day, Max tracks his practice time, amount of sleep, caffeine intake, and the preparation plan he used.
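Before looking at results, here is a sketch of how such a model might be fit, assuming the data live in a data frame called goofy with performance_outcome releveled so that Solid is the reference (the object name m1 is a placeholder):

```r
library(nnet)

# Fit the c - 1 = 3 logit models simultaneously;
# trace = FALSE only suppresses the printed optimization trace
m1 <- multinom(performance_outcome ~ plan + practice_hours + sleep_hours,
               data = goofy, trace = FALSE)

summary(m1)
```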
… that’s a lot of output! o_o
Remember that we are fitting three separate models here:
             (Intercept)  planGoofy    planMax  planChaotic  practice_hours  sleep_hours
Disaster        6.587583  0.5249369   0.686156     1.749325      -0.5475893   -0.9967134
Messy           4.450541  1.2985642   2.933813     1.048919      -0.2963095   -0.8866107
Iconic          1.962543 -2.0882399  -2.534283    -1.738306       0.2184171   -0.1948100
\begin{align*} \ln \left( \frac{\pi_{\text{Disaster}}}{\pi_{\text{Solid}}} \right) &= 6.59 + 0.52\,\text{G} + 0.69\,\text{M} + 1.75\,\text{C} - 0.55\,\text{prac} - 1.00\,\text{sleep} \\ \ln \left( \frac{\pi_{\text{Messy}}}{\pi_{\text{Solid}}} \right) &= 4.45 + 1.30\,\text{G} + 2.93\,\text{M} + 1.05\,\text{C} - 0.30\,\text{prac} - 0.89\,\text{sleep} \\ \ln \left( \frac{\pi_{\text{Iconic}}}{\pi_{\text{Solid}}} \right) &= 1.96 - 2.09\,\text{G} - 2.53\,\text{M} - 1.74\,\text{C} + 0.22\,\text{prac} - 0.19\,\text{sleep} \end{align*}
\ln \left( \frac{\hat{\pi}_j}{\hat{\pi}_{\text{ref}}} \right) = \hat{\beta}_{0j} + \hat{\beta}_{1j} x_1 + ... + \hat{\beta}_{kj} x_k
We are modeling the log odds, just as in binary logistic regression, so for interpretation we will again look to the odds ratio, e^{\hat{\beta}_i}
\hat{\beta}_i represents an additive effect on the log odds.
e^{\hat{\beta}_i} represents a multiplicative effect on the odds.
We interpret odds ratios much like binary logistic regression.
\beta_i < 0 \to 0 < \text{OR}_i < 1 \to a decrease in odds
\beta_i = 0 \to \text{OR}_i = 1 \to no change in odds
\beta_i > 0 \to \text{OR}_i > 1 \to an increase in odds
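As a quick numeric check of these three cases, we can exponentiate a coefficient of each sign (values roughly matching the practice-time coefficients above):

```r
exp(-0.55)  # ~0.58: negative coefficient -> OR below 1, a decrease in odds
exp(0)      # exactly 1: zero coefficient -> no change in odds
exp(0.22)   # ~1.25: positive coefficient -> OR above 1, an increase in odds
```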
Continuous predictors: for each one-unit increase in [predictor], the odds of [response category j_{\text{resp}}], as compared to [response reference category], are multiplied by e^{\hat{\beta}_i}.
Categorical predictors:
As compared to [predictor reference category], the odds of [response category j_{\text{resp}}], as compared to [response reference category], for [predictor category j_{\text{pred}}] are multiplied by e^{\hat{\beta}_{j_{\text{pred}}}}.
\begin{align*} \ln \left( \frac{\pi_{\text{Messy}}}{\pi_{\text{Solid}}} \right) &= 6.59 + 0.52\,\text{G} + 0.69\,\text{M} + 1.75\,\text{C} - 0.55\,\text{prac} - 1.00\,\text{sleep} \\ \ln \left( \frac{\pi_{\text{Disaster}}}{\pi_{\text{Solid}}} \right) &= 4.45 + 1.29\,\text{G} + 2.93\,\text{M} + 1.05\,\text{C} - 0.30\,\text{prac} - 0.89\,\text{sleep} \\ \ln \left( \frac{\pi_{\text{Iconic}}}{\pi_{\text{Solid}}} \right) &= 1.96 - 2.09\,\text{G} - 2.53\,\text{M} - 1.74\,\text{C} + 0.22\,\text{prac} - 0.19\,\text{sleep} \\ \end{align*}
| Predictor | OR (Disaster) | OR (Messy) | OR (Iconic) |
|---|---|---|---|
| Plan: Goofy vs. Balanced | 1.69 | 3.66 | 0.12 |
| Plan: Max vs. Balanced | 1.99 | 18.80 | 0.08 |
| Plan: Chaotic vs. Balanced | 5.75 | 2.85 | 0.18 |
| Practice Time (hours) | 0.58 | 0.74 | 1.24 |
| Hours Slept | 0.37 | 0.41 | 0.82 |
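The odds ratios in this table can be reproduced by exponentiating the coefficient matrix; a sketch, assuming the fitted model object is named m1:

```r
# Each row is one submodel (Disaster, Messy, Iconic vs. Solid);
# exponentiating turns log-odds coefficients into odds ratios
round(exp(coef(m1)), 2)
```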
When looking at the odds ratios:
As compared to following a balanced plan, following:
the Goofy plan multiplies the odds of a disaster performance by 1.69,
the Max plan multiplies the odds of a disaster performance by 1.99, and
the chaotic plan multiplies the odds of a disaster performance by 5.75.
For each additional hour of practice, the odds of a disaster performance are multiplied by 0.58 (42% decrease).
For each additional hour of sleep, the odds of a disaster performance are multiplied by 0.37 (63% decrease).
We may also be interested in comparing odds ratios across the three models.
Examining practice time:
For each additional hour of practice, the odds of:
a disaster performance are multiplied by 0.58,
a messy performance are multiplied by 0.74, and
an iconic performance are multiplied by 1.24.
This underscores the importance of practice time for Max’s performance outcome!
As with other regression models, we will want to determine the significance of our model and predictors.
Significance of regression line: no change.
Use anova(reduced, full, test = "Chisq").
Significance of predictors: … which way?
Significance of predictors (omnibus): no change. Use car::Anova(type = "III") or anova(reduced, full, test = "Chisq").
Significance of predictors (model-specific omnibus): new. Use car::Anova(type = "III") on each model, or anova(reduced, full, test = "Chisq") on each model.
Significance of predictors (model-specific individual): no change. Use tidy().
m1_full <- multinom(performance_outcome ~ plan + practice_hours + sleep_hours,
                    data = goofy, trace = FALSE)
m1_reduced <- multinom(performance_outcome ~ 1, data = goofy, trace = FALSE)

anova(m1_reduced, m1_full, test = "Chisq")

But how do we determine if individual predictors are significant within each model (\pi_{\text{group}}/\pi_{\text{ref}} \sim x_i)?
We can use tidy(). If we want to get into the model-specific categorical predictor significance, we have a “rough” approach:
goofy %>%
  filter(performance_outcome %in% c("Iconic", "Solid")) %>%
  multinom(performance_outcome ~ plan + practice_hours + sleep_hours,
           data = .,
           trace = FALSE) %>%
  car::Anova(type = 3) %>%
  tidy()

goofy %>%
  filter(performance_outcome %in% c("Messy", "Solid")) %>%
  multinom(performance_outcome ~ plan + practice_hours + sleep_hours,
           data = .,
           trace = FALSE) %>%
  car::Anova(type = 3) %>%
  tidy()

goofy %>%
  filter(performance_outcome %in% c("Disaster", "Solid")) %>%
  multinom(performance_outcome ~ plan + practice_hours + sleep_hours,
           data = .,
           trace = FALSE) %>%
  car::Anova(type = 3) %>%
  tidy()

Finally, we can look at the model-specific individual predictor significance.
However, our tidy() output is totally overwhelming. We will filter it down to the rows of interest.
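One way to cut the output down, assuming the full model is stored in m1_full: keep only the rows for a single term across all three submodels.

```r
library(broom)
library(dplyr)

# Coefficient-level results for practice time, one row per submodel
tidy(m1_full) %>%
  filter(term == "practice_hours")
```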
In this lecture, we have introduced multinomial logistic regression.
We again saw the logit link function.
In the next lecture, we will review data visualization for logistic regressions.