In the previous weeks, we have built up what we understand about data visualization. All three weeks, we were dealing with a continuous outcome that we assumed had a normal distribution. In this lecture, we will focus on visualizing beta regression models.

Recall our data for task completion:

```r
operations <- read_csv("https://raw.githubusercontent.com/samanthaseals/SDSII/refs/heads/main/files/data/lectures/W4_daisy.csv")
operations %>% head()
```
We constructed the model,
\begin{align*} \text{logit}(\mu) = \ & 2.38 + 0.83 \text{ Minnie} - 0.04 \text{ task load} - 0.08 \text{ shift hours} + \\ & 0.02 \text{ Main Street} - 0.31 \text{ Toontown} - \\ & 0.05 \text{ Minnie} \times \text{task load} \end{align*}
Because the model is fit on the logit scale, predicted values must be back-transformed before plotting. Solving the link function for the mean,

\begin{align*} \text{logit}(\mu) &= \beta_0 + \beta_1 x_1 + ... + \beta_k x_k \\ \ln\left(\frac{\mu}{1 - \mu}\right) &= \beta_0 + \beta_1 x_1 + ... + \beta_k x_k \\ \frac{\mu}{1 - \mu} &= e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k} \\ \mu &= (1-\mu) e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k} \\ \mu &= e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k} - \mu \, e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k} \\ \mu + \mu \, e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k} &= e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k} \\ \mu \left(1+e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k}\right) &= e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k} \\ \mu &= \frac{e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k}} \end{align*}
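The final expression above is the inverse logit. As a quick side note (not from the lecture), base R already provides this function as plogis(), so it can stand in for a hand-written exp()/(1 + exp()); the value of eta below is a hypothetical linear predictor, not one of our fitted values:

```r
# inverse logit by hand vs. R's built-in plogis()
eta <- 1.34                           # hypothetical linear predictor value
mu_by_hand <- exp(eta) / (1 + exp(eta))
mu_plogis  <- plogis(eta)             # plogis(q) = 1 / (1 + exp(-q))
all.equal(mu_by_hand, mu_plogis)      # TRUE
```

Because plogis() maps any real number into (0, 1), it guarantees predictions stay on the scale of a rate.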
Recall our stratified models.
For Minnie,
\begin{align*} \text{logit}(\mu) = \ & 3.21 - 0.09 \text{ task load} -0.08 \text{ shift hours } + \\ & 0.02 \text{ Main Street} - 0.31 \text{ Toontown} \end{align*}
For Daisy,

\begin{align*} \text{logit}(\mu) = \ & 2.38 - 0.04 \text{ task load} - 0.08 \text{ shift hours} + \\ & 0.02 \text{ Main Street} - 0.31 \text{ Toontown} \end{align*}
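As a quick check, the stratified coefficients agree with the interaction model: Minnie's intercept and task load slope combine the main effects with the Minnie terms,

\begin{align*} \text{intercept}_{\text{Minnie}} &= 2.38 + 0.83 = 3.21 \\ \text{task load slope}_{\text{Minnie}} &= -0.04 - 0.05 = -0.09 \end{align*}

while Daisy, the reference character, keeps the main-effect values directly.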
Let’s have the shift hours on the x-axis, task completion on the y-axis, and lines defined by character and location.
```r
operations <- operations %>%
  mutate(logit_minnie_ms = 3.21 - 0.09*median(task_load) - 0.08*shift_hours + 0.02*1 - 0.31*0,
         logit_minnie_tt = 3.21 - 0.09*median(task_load) - 0.08*shift_hours + 0.02*0 - 0.31*1,
         logit_minnie_ws = 3.21 - 0.09*median(task_load) - 0.08*shift_hours + 0.02*0 - 0.31*0,
         logit_daisy_ms  = 2.38 - 0.04*median(task_load) - 0.08*shift_hours + 0.02*1 - 0.31*0,
         logit_daisy_tt  = 2.38 - 0.04*median(task_load) - 0.08*shift_hours + 0.02*0 - 0.31*1,
         logit_daisy_ws  = 2.38 - 0.04*median(task_load) - 0.08*shift_hours + 0.02*0 - 0.31*0)

operations %>%
  select(completion_rate,
         logit_minnie_ms, logit_minnie_tt, logit_minnie_ws,
         logit_daisy_ms, logit_daisy_tt, logit_daisy_ws) %>%
  head(n = 4)
```

Like in gamma regression, we can see that the scaling is not correct.
These predictions are still on the logit scale, so we back-transform with the inverse logit, exp()/(1 + exp()), to put them on the scale of the completion rate:

```r
operations <- operations %>%
  mutate(minnie_ms = exp(3.21 - 0.09*median(task_load) - 0.08*shift_hours + 0.02*1 - 0.31*0) /
                     (1 + exp(3.21 - 0.09*median(task_load) - 0.08*shift_hours + 0.02*1 - 0.31*0)),
         minnie_tt = exp(3.21 - 0.09*median(task_load) - 0.08*shift_hours + 0.02*0 - 0.31*1) /
                     (1 + exp(3.21 - 0.09*median(task_load) - 0.08*shift_hours + 0.02*0 - 0.31*1)),
         minnie_ws = exp(3.21 - 0.09*median(task_load) - 0.08*shift_hours + 0.02*0 - 0.31*0) /
                     (1 + exp(3.21 - 0.09*median(task_load) - 0.08*shift_hours + 0.02*0 - 0.31*0)),
         daisy_ms  = exp(2.38 - 0.04*median(task_load) - 0.08*shift_hours + 0.02*1 - 0.31*0) /
                     (1 + exp(2.38 - 0.04*median(task_load) - 0.08*shift_hours + 0.02*1 - 0.31*0)),
         daisy_tt  = exp(2.38 - 0.04*median(task_load) - 0.08*shift_hours + 0.02*0 - 0.31*1) /
                     (1 + exp(2.38 - 0.04*median(task_load) - 0.08*shift_hours + 0.02*0 - 0.31*1)),
         daisy_ws  = exp(2.38 - 0.04*median(task_load) - 0.08*shift_hours + 0.02*0 - 0.31*0) /
                     (1 + exp(2.38 - 0.04*median(task_load) - 0.08*shift_hours + 0.02*0 - 0.31*0)))

operations %>%
  select(completion_rate,
         minnie_ms, minnie_tt, minnie_ws,
         daisy_ms, daisy_tt, daisy_ws) %>%
  head(n = 4)
```

Now, we can build our visualizations as normal.
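The back-transformation above spells out each linear predictor twice, once in the numerator and once in the denominator. As a sketch (not from the lecture) of a less repetitive version, we can compute the linear predictor once and feed it to plogis(); the data frame below is a hypothetical stand-in for operations, using the same column names:

```r
library(dplyr)

# hypothetical stand-in for the operations data frame
operations_demo <- tibble::tibble(task_load   = c(10, 12, 9, 15),
                                  shift_hours = c(6, 8, 4, 10))

operations_demo <- operations_demo %>%
  # World Showcase is the reference location, so both indicators are 0
  mutate(eta_minnie_ws = 3.21 - 0.09*median(task_load) - 0.08*shift_hours,
         minnie_ws     = plogis(eta_minnie_ws))  # plogis() is the inverse logit

operations_demo %>% select(shift_hours, minnie_ws)
```

Writing the linear predictor once also makes it harder for the numerator and denominator to drift out of sync when coefficients are updated.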
For Minnie,
```r
operations %>% ggplot(aes(x = shift_hours)) +
  geom_point(aes(y = completion_rate, color = location), alpha = 0.5) +
  geom_line(aes(y = minnie_ms), color = "#00BA38", linewidth = 1) +
  geom_line(aes(y = minnie_tt), color = "#00BFC4", linewidth = 1) +
  geom_line(aes(y = minnie_ws), color = "#F8766D", linewidth = 1) +
  labs(x = "Hours on Shift",
       y = "Task Completion",
       color = "Location") +
  theme_bw()
```
For Daisy,
```r
operations %>% ggplot(aes(x = shift_hours)) +
  geom_point(aes(y = completion_rate, color = location), alpha = 0.5) +
  geom_line(aes(y = daisy_ms), color = "#00BA38", linewidth = 1) +
  geom_line(aes(y = daisy_tt), color = "#00BFC4", linewidth = 1) +
  geom_line(aes(y = daisy_ws), color = "#F8766D", linewidth = 1) +
  labs(x = "Hours on Shift",
       y = "Task Completion",
       color = "Location") +
  theme_bw()
```

Now that we have the visualization, we can link it back to interpretations and inference. Recall,
The interaction between character and task load is significant (p < 0.001).
Shift length is a significant predictor of completion rate (p < 0.001).
Location is a significant predictor of completion rate (p = 0.008).
There is not a difference in the mean completion rate between Epcot’s World Showcase and Main Street (p = 0.867).
There is a difference in the mean completion rate between Epcot’s World Showcase and Toontown (p = 0.010).
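One side note on the plots: they hard-code one geom_line() and one hex color per location. A sketch (not from the lecture) of an alternative is to pivot the predicted columns to long format so ggplot2 maps the lines to the legend automatically; the data frame below is a hypothetical stand-in for operations, reusing the column names created earlier:

```r
library(tidyr)
library(ggplot2)

# hypothetical stand-in for the operations data frame with Daisy's predictions
operations_demo <- data.frame(shift_hours     = rep(1:12, each = 3),
                              completion_rate = runif(36, 0.5, 1))
operations_demo$daisy_ms <- plogis(2.38 - 0.04*10 - 0.08*operations_demo$shift_hours + 0.02)
operations_demo$daisy_tt <- plogis(2.38 - 0.04*10 - 0.08*operations_demo$shift_hours - 0.31)
operations_demo$daisy_ws <- plogis(2.38 - 0.04*10 - 0.08*operations_demo$shift_hours)

# pivot the three predicted columns into (line, predicted) pairs
plot_data <- pivot_longer(operations_demo,
                          cols = c(daisy_ms, daisy_tt, daisy_ws),
                          names_to = "line", values_to = "predicted")

ggplot(plot_data, aes(x = shift_hours)) +
  geom_point(aes(y = completion_rate), alpha = 0.5) +
  geom_line(aes(y = predicted, color = line), linewidth = 1) +
  labs(x = "Hours on Shift", y = "Task Completion", color = "Location") +
  theme_bw()
```

Mapping color inside aes() lets ggplot2 build the legend for us, at the cost of recoding the line labels if we want the legend to read "Main Street," "Toontown," and "World Showcase."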
In this lecture, we learned how to visualize beta regression models.
Again, we are building upon what we have already learned in this course.
Now that we are venturing outside of the normal distribution, we need to think carefully about how to create predicted values.
Next week: Logistic regression