In the previous weeks, we have built up our understanding of data visualization.
We have seen this week that there are three types of interactions:
Graphing a model with only categorical variables and interactions will follow what we learned for models with only categorical variables in Week 2.
In this lecture we will focus on models with at least one continuous predictor in them.
Recall the Clarabelle data,
We have a dataset with 300 days of operation for:
\hat{\text{wait time}} = 4.88 + 0.08 \text{ orders} + 2.88 \text{ temp.} + 0.06 \text{ orders $\times$ temp.}
\begin{align*} \hat{\text{wait time}|\text{op.}} &= 4.88 + 0.08 \text{ orders} \\ \hat{\text{wait time}|\text{temp.}} &= 7.76 + 0.14 \text{ orders} \end{align*}
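As a worked step, the stratified equations come from substituting the indicator value into the interaction model. This assumes temp. is coded 1 for a temperamental machine and 0 for a fully operational one; that coding is inferred from the coefficients rather than stated above.

\begin{align*} \text{temp.} = 0: \quad \hat{\text{wait time}} &= 4.88 + 0.08 \text{ orders} + 2.88(0) + 0.06 \text{ orders}(0) = 4.88 + 0.08 \text{ orders} \\ \text{temp.} = 1: \quad \hat{\text{wait time}} &= 4.88 + 0.08 \text{ orders} + 2.88(1) + 0.06 \text{ orders}(1) = 7.76 + 0.14 \text{ orders} \end{align*}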
We will use the stratified models for visualization.
| orders_per_hour | machine_status | y_op | y_temp |
|---|---|---|---|
| 7.645707 | Temperamental | 5.491657 | 8.830399 |
| 24.174138 | Fully Operational | 6.813931 | 11.144379 |
| 24.802865 | Fully Operational | 6.864229 | 11.232401 |
| 11.827346 | Fully Operational | 5.826188 | 9.415828 |
| 26.913342 | Temperamental | 7.033067 | 11.527868 |
| 13.390703 | Fully Operational | 5.951256 | 9.634698 |
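The y_op and y_temp columns in the table can be produced by evaluating each stratified equation at every observed value of orders_per_hour. The following is a minimal sketch, assuming dplyr is installed; `clarabelle_head` is a hypothetical stand-in that holds only the six rows shown above, not the full dataset.

```r
library(dplyr)

# Hypothetical stand-in for the data: only the six rows shown in the table.
clarabelle_head <- data.frame(
  orders_per_hour = c(7.645707, 24.174138, 24.802865,
                      11.827346, 26.913342, 13.390703),
  machine_status = c("Temperamental", "Fully Operational", "Fully Operational",
                     "Fully Operational", "Temperamental", "Fully Operational")
)

# Evaluate both stratified lines at every x value, regardless of the row's
# own machine status, so each line can be drawn across the whole x range.
clarabelle_head <- clarabelle_head %>%
  mutate(y_op   = 4.88 + 0.08 * orders_per_hour,
         y_temp = 7.76 + 0.14 * orders_per_hour)

clarabelle_head
```

Every row gets both predicted values because each line is plotted over all observed order counts, not just those belonging to its own group.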
clarabelle %>% ggplot(aes(x = orders_per_hour)) +
geom_point(aes(y = avg_wait_time, color = machine_status), alpha = 0.5) +
geom_line(aes(y = y_op), color = "#F8766D", linewidth = 1) +
geom_line(aes(y = y_temp), color = "#00BFC4", linewidth = 1) +
labs(x = "Orders per Hour",
y = "Average Wait Time (minutes)",
color = "Machine Status") +
theme_bw()

\begin{align*} \hat{\text{wait time}} &= 4.90 \\ &\quad + 0.08 \text{ orders} \\ &\quad + 1.16 \text{ seasonal} + 3.95 \text{ limited edition} \\ &\quad - 0.01 \text{ orders $\times$ seasonal} + 0.02 \text{ orders $\times$ limited edition} \end{align*}
Although the interaction was not significant (p = 0.7206), we can still provide a data visualization of that specific model.
\begin{align*} \hat{\text{wait time}|\text{no special}} &= 4.90 + 0.08 \text{ orders} \\ \hat{\text{wait time}|\text{seasonal}} &= 6.06 + 0.07 \text{ orders} \\ \hat{\text{wait time}|\text{limited edition}} &= 8.85 + 0.10 \text{ orders} \end{align*}
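The three stratified equations can be recovered from the fitted coefficients by adding each group's main effect to the intercept and each group's interaction term to the orders slope. The sketch below assumes seasonal and limited edition are 0/1 indicators with no special as the reference group (inferred from the equations, not stated in the text); the variable names are illustrative only.

```r
# Coefficients copied from the interaction model above.
# Assumption: "seasonal" and "limited edition" are 0/1 indicators,
# with "None" (no special) as the reference group.
b0                <- 4.90
b_orders          <- 0.08
b_seasonal        <- 1.16
b_limited         <- 3.95
b_orders_seasonal <- -0.01
b_orders_limited  <- 0.02

# Group intercept = b0 + group main effect;
# group slope = orders slope + group interaction coefficient.
stratified <- data.frame(
  flavor_special = c("None", "Seasonal", "Limited Edition"),
  intercept = c(b0, b0 + b_seasonal, b0 + b_limited),
  slope     = c(b_orders,
                b_orders + b_orders_seasonal,
                b_orders + b_orders_limited)
)

stratified
```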
| orders_per_hour | flavor_special | y_none | y_seasonal | y_limited |
|---|---|---|---|---|
| 7.645707 | Seasonal | 5.511657 | 6.595199 | 9.614571 |
| 24.174138 | None | 6.833931 | 7.752190 | 11.267414 |
| 24.802865 | None | 6.884229 | 7.796201 | 11.330286 |
| 11.827346 | Limited Edition | 5.846188 | 6.887914 | 10.032735 |
| 26.913342 | None | 7.053067 | 7.943934 | 11.541334 |
| 13.390703 | None | 5.971256 | 6.997349 | 10.189070 |
clarabelle %>% ggplot(aes(x = orders_per_hour)) +
geom_point(aes(y = avg_wait_time, color = flavor_special), alpha = 0.5) +
geom_line(aes(y = y_none), color = "#00BA38", linewidth = 1) +
geom_line(aes(y = y_seasonal), color = "#619CFF", linewidth = 1) +
geom_line(aes(y = y_limited), color = "#F8766D", linewidth = 1) +
labs(x = "Orders per Hour",
y = "Average Wait Time (minutes)",
color = "Flavor") +
theme_bw()

Wait!!!
The test for interaction was not significant, so why do the slopes look different?
Eh…
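Remember that a model with a categorical predictor and no interaction gives parallel lines; an interaction should produce clearly different rates of change. We are not seeing that here: the fitted slopes (0.08, 0.07, and 0.10) differ only slightly, so the lines are nearly parallel.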
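One way to see how little the lines diverge is to compare the vertical gap between two of them at a low and a high order count. This is a rough sketch using the stratified equations for no special and limited edition, evaluated at the smallest and largest orders_per_hour values among the six rows shown earlier (not necessarily the extremes of the full data).

```r
# Smallest and largest order counts among the six rows shown in the table.
orders <- c(7.645707, 26.913342)

# Stratified predictions for two of the groups.
y_none    <- 4.90 + 0.08 * orders
y_limited <- 8.85 + 0.10 * orders

# Vertical gap between the two lines at each end, in minutes.
gap <- y_limited - y_none
gap

# How much the gap changes from the low end to the high end.
gap_change <- diff(gap)
gap_change
```

The gap is about 4.1 minutes at the low end and about 4.5 minutes at the high end, so it changes by less than half a minute across these values.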
Thus – we can visualize models with non-significant interactions, but we should be careful about how we interpret them.
This lecture has demonstrated how to visualize models with interaction terms.
To keep it simple, we focused on models with at least one continuous predictor.
We now have the general building blocks for regression analysis.