In previous weeks, we have built up our understanding of data visualization.
In this lecture, we will focus on visualizing logistic regression models.
\begin{align*} \text{logit}(\hat{\pi}) &= \hat{\beta}_0 + \hat{\beta}_1 x_1 + ... + \hat{\beta}_k x_k \\ \ln \left( \frac{\hat{\pi}}{1-\hat{\pi}} \right) &= \hat{\beta}_{0} + \hat{\beta}_{1} x_1 + ... + \hat{\beta}_{k} x_k \end{align*}
\begin{align*} \text{logit}(\hat{\pi}_j) &= \hat{\beta}_0 + \hat{\beta}_1 x_1 + ... + \hat{\beta}_k x_k \\ \ln \left( \frac{\hat{\pi}_j}{1-\hat{\pi}_j} \right) &= \hat{\beta}_{0} + \hat{\beta}_{1} x_1 + ... + \hat{\beta}_{k} x_k \end{align*}
\begin{align*} \text{logit}(\pi) &= \beta_0 + \beta_1 x_1 + ... + \beta_k x_k \\ \ln\left(\frac{\pi}{1 - \pi}\right) &= \beta_0 + \beta_1 x_1 + ... + \beta_k x_k \\ \frac{\pi}{1 - \pi} &= e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k} \\ \pi &= (1-\pi) e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k} \\ \pi &= e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k} - \pi(e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k}) \\ \pi + \pi(e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k}) &= e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k} \\ \pi(1+e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k}) &= e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k} \end{align*}
\begin{align*} \text{logit}(\pi) &= \beta_0 + \beta_1 x_1 + ... + \beta_k x_k \\ & \ \ \vdots \\ \pi(1+e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k}) &= e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k} \\ \pi &= \frac{e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k}} \end{align*}
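This inverse-logit step can be checked numerically in R. As a small sketch (using an arbitrary example value for the linear predictor), note that the built-in `plogis()` function computes exactly $e^{\eta}/(1+e^{\eta})$:

```r
# Sketch: verify the inverse logit numerically for an example linear predictor.
eta <- 0.75                          # an arbitrary example value of b0 + b1*x1 + ... + bk*xk
pi_byhand <- exp(eta) / (1 + exp(eta))  # the formula derived above
pi_builtin <- plogis(eta)               # R's built-in inverse logit
pi_byhand
pi_builtin                              # the two agree
```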
Goofy and Max are gearing up for a huge audition. To make sure he’s ready, Max decides to test out a few different preparation plans leading up to the audition. Each plan mixes a different approach to practicing, resting, and getting through the day.
Each day, Max tracks a few behaviors he thinks might affect how well things go, then records whether he nailed the (practice) audition for that day.
Pulling in the data,
\ln \left( \frac{\hat{\pi}}{1-\hat{\pi}} \right) = -4.17 - 1.41 \text{ C} -0.66 \text{ G} - 0.31 \text{ M} + 0.65 \text{ prac} + 0.40 \text{ sleep}
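A fitted equation of this form would come from `glm()` with `family = binomial`. As a hedged sketch, assuming the dataset and variable names used in the code below (`max`, `nailed_audition`, `plan`, `practice_hours`, `sleep_hours`):

```r
# Sketch: fitting the binary logistic regression that produced the equation above.
# Assumes `max` contains nailed_audition (0/1), plan (factor with dummy-coded
# levels C, G, M vs. a Balanced reference), practice_hours, and sleep_hours.
fit <- glm(nailed_audition ~ plan + practice_hours + sleep_hours,
           data = max, family = binomial)
summary(fit)   # coefficients are on the log-odds scale
```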
Let’s create predicted values for this model.
Let’s put practice hours on the x-axis, holding sleep hours constant at its median value.
We can create lines for each plan.
max <- max %>%
  mutate(predicted_chaos = exp(-4.17 - 1.41*1 - 0.66*0 - 0.31*0 + 0.65*practice_hours + 0.40*7) /
           (1 + exp(-4.17 - 1.41*1 - 0.66*0 - 0.31*0 + 0.65*practice_hours + 0.40*7)),
         predicted_goofy = exp(-4.17 - 1.41*0 - 0.66*1 - 0.31*0 + 0.65*practice_hours + 0.40*7) /
           (1 + exp(-4.17 - 1.41*0 - 0.66*1 - 0.31*0 + 0.65*practice_hours + 0.40*7)),
         predicted_max = exp(-4.17 - 1.41*0 - 0.66*0 - 0.31*1 + 0.65*practice_hours + 0.40*7) /
           (1 + exp(-4.17 - 1.41*0 - 0.66*0 - 0.31*1 + 0.65*practice_hours + 0.40*7)),
         predicted_balanced = exp(-4.17 - 1.41*0 - 0.66*0 - 0.31*0 + 0.65*practice_hours + 0.40*7) /
           (1 + exp(-4.17 - 1.41*0 - 0.66*0 - 0.31*0 + 0.65*practice_hours + 0.40*7)))

max %>%
  ggplot(aes(x = practice_hours)) +
  geom_point(aes(y = nailed_audition, color = plan), alpha = 0.2) +
  geom_line(aes(y = predicted_chaos, color = "Chaotic")) +
  geom_line(aes(y = predicted_goofy, color = "Goofy")) +
  geom_line(aes(y = predicted_max, color = "Max")) +
  geom_line(aes(y = predicted_balanced, color = "Balanced")) +
  labs(x = "Practice Hours",
       y = "Predicted Probability of Nailed Performance",
       color = "Plan") +
  theme_bw()

During performance weekends, things don’t just go “well” or “poorly” — they can unfold in several very different ways. Depending on how Max prepares and which plan he follows with Goofy, the final outcome of the performance falls into one of four categories:
Pulling in the data,
\begin{align*} \ln \left( \frac{\pi_{\text{Messy}}}{\pi_{\text{Solid}}} \right) &= 6.59 + 0.52\,\text{G} + 0.69\,\text{M} + 1.75\,\text{C} - 0.55\,\text{prac} - 1.00\,\text{sleep} \\ \ln \left( \frac{\pi_{\text{Disaster}}}{\pi_{\text{Solid}}} \right) &= 4.45 + 1.29\,\text{G} + 2.93\,\text{M} + 1.05\,\text{C} - 0.30\,\text{prac} - 0.89\,\text{sleep} \\ \ln \left( \frac{\pi_{\text{Iconic}}}{\pi_{\text{Solid}}} \right) &= 1.96 - 2.09\,\text{G} - 2.53\,\text{M} - 1.74\,\text{C} + 0.22\,\text{prac} - 0.19\,\text{sleep} \\ \end{align*}
There are different ways to think about this.
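One way: the three baseline-category equations jointly determine all four category probabilities. Writing $\eta_{\text{Messy}}$, $\eta_{\text{Disaster}}$, and $\eta_{\text{Iconic}}$ for the three linear predictors above, the standard multinomial (softmax) form is:

\begin{align*} \pi_{\text{Solid}} &= \frac{1}{1 + e^{\eta_{\text{Messy}}} + e^{\eta_{\text{Disaster}}} + e^{\eta_{\text{Iconic}}}} \\ \pi_{\text{Messy}} &= \frac{e^{\eta_{\text{Messy}}}}{1 + e^{\eta_{\text{Messy}}} + e^{\eta_{\text{Disaster}}} + e^{\eta_{\text{Iconic}}}} \end{align*}

and similarly for Disaster and Iconic. Another way, used in the code below, applies the two-category formula $e^{\eta}/(1+e^{\eta})$ to each equation separately, which treats each outcome as a head-to-head comparison against Solid.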
# messy
6.59 + 0.52*goofy + 0.69*max + 1.75*chaos - 0.55*practice_hours - 1.00*median(sleep_hours)
# disaster
4.45 + 1.29*goofy + 2.93*max + 1.05*chaos - 0.30*practice_hours - 0.89*median(sleep_hours)
# iconic
1.96 - 2.09*goofy - 2.53*max - 1.74*chaos + 0.22*practice_hours - 0.19*median(sleep_hours)

# messy
exp(6.59 + 0.52*goofy + 0.69*max + 1.75*chaos - 0.55*practice_hours - 1.00*median(sleep_hours))/(1+exp(6.59 + 0.52*goofy + 0.69*max + 1.75*chaos - 0.55*practice_hours - 1.00*median(sleep_hours)))
# disaster
exp(4.45 + 1.29*goofy + 2.93*max + 1.05*chaos - 0.30*practice_hours - 0.89*median(sleep_hours))/(1+exp(4.45 + 1.29*goofy + 2.93*max + 1.05*chaos - 0.30*practice_hours - 0.89*median(sleep_hours)))
# iconic
exp(1.96 - 2.09*goofy - 2.53*max - 1.74*chaos + 0.22*practice_hours - 0.19*median(sleep_hours))/(1+exp(1.96 - 2.09*goofy - 2.53*max - 1.74*chaos + 0.22*practice_hours - 0.19*median(sleep_hours)))

# messy
# exp(6.59 + 0.52*goofy + 0.69*max + 1.75*chaos - 0.55*practice_hours - 1.00*median(sleep_hours))/(1+exp(6.59 + 0.52*goofy + 0.69*max + 1.75*chaos - 0.55*practice_hours - 1.00*median(sleep_hours)))
goofy <- goofy %>%
  mutate(messy_g = exp(6.59 + 0.52*1 + 0.69*0 + 1.75*0 - 0.55*practice_hours - 1.00*median(sleep_hours))/(1+exp(6.59 + 0.52*1 + 0.69*0 + 1.75*0 - 0.55*practice_hours - 1.00*median(sleep_hours))),
         messy_m = exp(6.59 + 0.52*0 + 0.69*1 + 1.75*0 - 0.55*practice_hours - 1.00*median(sleep_hours))/(1+exp(6.59 + 0.52*0 + 0.69*1 + 1.75*0 - 0.55*practice_hours - 1.00*median(sleep_hours))),
         messy_c = exp(6.59 + 0.52*0 + 0.69*0 + 1.75*1 - 0.55*practice_hours - 1.00*median(sleep_hours))/(1+exp(6.59 + 0.52*0 + 0.69*0 + 1.75*1 - 0.55*practice_hours - 1.00*median(sleep_hours))),
         messy_b = exp(6.59 + 0.52*0 + 0.69*0 + 1.75*0 - 0.55*practice_hours - 1.00*median(sleep_hours))/(1+exp(6.59 + 0.52*0 + 0.69*0 + 1.75*0 - 0.55*practice_hours - 1.00*median(sleep_hours))))
# disaster
# exp(4.45 + 1.29*goofy + 2.93*max + 1.05*chaos - 0.30*practice_hours - 0.89*median(sleep_hours))/(1+exp(4.45 + 1.29*goofy + 2.93*max + 1.05*chaos - 0.30*practice_hours - 0.89*median(sleep_hours)))
goofy <- goofy %>%
  mutate(disaster_g = exp(4.45 + 1.29*1 + 2.93*0 + 1.05*0 - 0.30*practice_hours - 0.89*median(sleep_hours))/(1+exp(4.45 + 1.29*1 + 2.93*0 + 1.05*0 - 0.30*practice_hours - 0.89*median(sleep_hours))),
         disaster_m = exp(4.45 + 1.29*0 + 2.93*1 + 1.05*0 - 0.30*practice_hours - 0.89*median(sleep_hours))/(1+exp(4.45 + 1.29*0 + 2.93*1 + 1.05*0 - 0.30*practice_hours - 0.89*median(sleep_hours))),
         disaster_c = exp(4.45 + 1.29*0 + 2.93*0 + 1.05*1 - 0.30*practice_hours - 0.89*median(sleep_hours))/(1+exp(4.45 + 1.29*0 + 2.93*0 + 1.05*1 - 0.30*practice_hours - 0.89*median(sleep_hours))),
         disaster_b = exp(4.45 + 1.29*0 + 2.93*0 + 1.05*0 - 0.30*practice_hours - 0.89*median(sleep_hours))/(1+exp(4.45 + 1.29*0 + 2.93*0 + 1.05*0 - 0.30*practice_hours - 0.89*median(sleep_hours))))
# iconic
# exp(1.96 - 2.09*goofy - 2.53*max - 1.74*chaos + 0.22*practice_hours - 0.19*median(sleep_hours))/(1+exp(1.96 - 2.09*goofy - 2.53*max - 1.74*chaos + 0.22*practice_hours - 0.19*median(sleep_hours)))
goofy <- goofy %>%
  mutate(iconic_g = exp(1.96 - 2.09*1 - 2.53*0 - 1.74*0 + 0.22*practice_hours - 0.19*median(sleep_hours))/(1+exp(1.96 - 2.09*1 - 2.53*0 - 1.74*0 + 0.22*practice_hours - 0.19*median(sleep_hours))),
         iconic_m = exp(1.96 - 2.09*0 - 2.53*1 - 1.74*0 + 0.22*practice_hours - 0.19*median(sleep_hours))/(1+exp(1.96 - 2.09*0 - 2.53*1 - 1.74*0 + 0.22*practice_hours - 0.19*median(sleep_hours))),
         iconic_c = exp(1.96 - 2.09*0 - 2.53*0 - 1.74*1 + 0.22*practice_hours - 0.19*median(sleep_hours))/(1+exp(1.96 - 2.09*0 - 2.53*0 - 1.74*1 + 0.22*practice_hours - 0.19*median(sleep_hours))),
         iconic_b = exp(1.96 - 2.09*0 - 2.53*0 - 1.74*0 + 0.22*practice_hours - 0.19*median(sleep_hours))/(1+exp(1.96 - 2.09*0 - 2.53*0 - 1.74*0 + 0.22*practice_hours - 0.19*median(sleep_hours))))

For this to graph properly, we need to create a variable that “recodes” the outcomes to 0/1.
Use 0 for the reference group.
Use 1 for all other categories.
In our example,
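a sketch of that recoding, assuming the outcome column is `performance_outcome` with Solid as the reference group (the `y_recode` name matches the plotting code below):

```r
# Recode the outcome: 0 for the reference category (Solid), 1 for all other categories.
goofy <- goofy %>%
  mutate(y_recode = if_else(performance_outcome == "Solid", 0, 1))
```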
goofy %>%
  filter(performance_outcome %in% c("Disaster", "Solid")) %>%
  ggplot(aes(x = practice_hours)) +
  geom_point(aes(y = y_recode), alpha = 0.2, size = 2) +
  geom_line(aes(y = disaster_c, color = "Chaotic"), linewidth = 1) +
  geom_line(aes(y = disaster_g, color = "Goofy"), linewidth = 1) +
  geom_line(aes(y = disaster_m, color = "Max"), linewidth = 1) +
  geom_line(aes(y = disaster_b, color = "Balanced"), linewidth = 1) +
  labs(x = "Practice Hours",
       y = "Predicted Probability of a Disaster Performance",
       color = "Practice Plan") +
  xlim(0, 8) +
  theme_bw()

goofy %>%
  filter(plan == "Balanced") %>%
  ggplot(aes(x = practice_hours)) +
  geom_point(aes(y = y_recode, color = performance_outcome), size = 2, alpha = 0.2) +
  geom_line(aes(y = disaster_b, color = "Disaster"), linewidth = 1) +
  geom_line(aes(y = messy_b, color = "Messy"), linewidth = 1.2) +
  geom_line(aes(y = iconic_b, color = "Iconic"), linewidth = 1.2) +
  labs(x = "Practice Hours",
       y = "Predicted Probability (vs. Solid Performance)",
       color = "Type of Performance") +
  xlim(0, 8) +
  theme_bw()

In this lecture, we learned how to visualize logistic regression models.
Binary outcomes $\to$ binary logistic regression.
Nominal outcomes $\to$ multinomial logistic regression.
Again, we are building upon what we have already learned in this course.
Next week: Poisson and negative binomial regressions.