y = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k + \varepsilon
In the last lectures, we learned about both continuous \times continuous and categorical \times categorical interactions.
In this lecture, we will combine the two concepts to explore continuous \times categorical interactions.
Clarabelle runs a popular milkshake booth in Toontown. On busy days, customers can pile up quickly, and long wait times may affect satisfaction. Clarabelle wants to understand which factors most strongly influence average customer wait time and whether some factors work differently together.
We have a dataset with 300 days of operation for:
We first want to consider average wait time as a function of avg orders per hour, status of milkshake machine, and their interaction.
Let’s examine summary statistics,
# A tibble: 2 × 3
variable mean_sd median_iqr
<chr> <chr> <chr>
1 avg_wait_time 8.1 (3.4) 7.4 (4.2)
2 orders_per_hour 25.3 (14.2) 22.5 (18.6)
clarabelle %>% filter(machine_status == "Fully Operational") %>% mean_median(avg_wait_time, orders_per_hour)
# A tibble: 2 × 3
variable mean_sd median_iqr
<chr> <chr> <chr>
1 avg_wait_time 7.0 (2.7) 6.7 (3.2)
2 orders_per_hour 26.0 (14.5) 23.5 (19.3)
clarabelle %>% filter(machine_status == "Temperamental") %>% mean_median(avg_wait_time, orders_per_hour)
# A tibble: 2 × 3
variable mean_sd median_iqr
<chr> <chr> <chr>
1 avg_wait_time 11.2 (3.3) 10.8 (4.3)
2 orders_per_hour 23.4 (13.3) 21.6 (15.0)
m1 <- glm(avg_wait_time ~ orders_per_hour + machine_status + orders_per_hour:machine_status,
family = "gaussian",
data = clarabelle)
m1 %>% coefficients()
(Intercept)
4.87966729
orders_per_hour
0.08167935
machine_statusTemperamental
2.87831326
orders_per_hour:machine_statusTemperamental
0.06360992
\hat{\text{wait time}} = 4.88 + 0.08 \text{ orders} + 2.88 \text{ temp.} + 0.06 \text{ orders $\times$ temp.}
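The formula for m1 spells out both main effects and the interaction term. As a small sketch of how R reads that formula, the block below fits the long form and the `*` shorthand and compares the coefficients. It uses simulated data with made-up parameters, not the clarabelle dataset; the names `sim`, `long_form`, and `short_form` are illustrative only.

```r
set.seed(42)  # simulated data only; NOT the clarabelle dataset

n <- 200
sim <- data.frame(
  orders_per_hour = runif(n, min = 5, max = 60),
  machine_status  = factor(sample(c("Fully Operational", "Temperamental"),
                                  n, replace = TRUE))
)

# Hypothetical data-generating values, chosen only for illustration.
is_temp <- sim$machine_status == "Temperamental"
sim$avg_wait_time <- 5 + 0.08 * sim$orders_per_hour +
  3 * is_temp + 0.06 * sim$orders_per_hour * is_temp +
  rnorm(n, sd = 2)

# Long form: main effects plus the interaction, written out term by term.
long_form <- glm(avg_wait_time ~ orders_per_hour + machine_status +
                   orders_per_hour:machine_status,
                 family = "gaussian", data = sim)

# Shorthand: `a * b` expands to `a + b + a:b` in an R formula.
short_form <- glm(avg_wait_time ~ orders_per_hour * machine_status,
                  family = "gaussian", data = sim)

# Both fits have the same coefficient names and estimates.
print(all.equal(coef(long_form), coef(short_form)))
```

Either spelling gives the same model; the long form just makes the interaction term visible.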
We use tidy() to determine that yes, the interaction is significant (p = 0.011).
\begin{align*} \hat{\text{wait time}} &= 4.88 + 0.08 \text{ orders} + 2.88(0) + 0.06(0) \text{ orders } \\ &= 4.88 + 0.08 \text{ orders} \end{align*}
\begin{align*} \hat{\text{wait time}} &= 4.88 + 0.08 \text{ orders} + 2.88(1) + 0.06(1) \text{ orders } \\ &= 7.76 + 0.14 \text{ orders} \end{align*}
\begin{align*} \hat{\text{wait time}|\text{operational}} &= 4.88 + 0.08 \text{ orders} \\ \hat{\text{wait time}|\text{temperamental}} &= 7.76 + 0.14 \text{ orders} \end{align*}
When the machine is fully operational, for every additional order per hour, average wait time increases by 0.08 minutes.
When the machine is temperamental, for every additional order per hour, average wait time increases by 0.14 minutes.
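The stratified lines can also be recovered by plugging 0 or 1 into the fitted coefficients. The sketch below does that arithmetic with the coefficients reported for m1; the helper `stratified_line()` is a hypothetical name introduced here for illustration, not something from the lecture code.

```r
# Coefficients as reported by `m1 %>% coefficients()` above.
b <- c(
  "(Intercept)"                                 = 4.87966729,
  "orders_per_hour"                             = 0.08167935,
  "machine_statusTemperamental"                 = 2.87831326,
  "orders_per_hour:machine_statusTemperamental" = 0.06360992
)

# Hypothetical helper: intercept and slope of the fitted line for a given
# value of the temperamental indicator (0 = fully operational, 1 = temperamental).
stratified_line <- function(b, temperamental) {
  c(
    intercept = b[["(Intercept)"]] +
      temperamental * b[["machine_statusTemperamental"]],
    slope = b[["orders_per_hour"]] +
      temperamental * b[["orders_per_hour:machine_statusTemperamental"]]
  )
}

operational   <- stratified_line(b, temperamental = 0)
temperamental <- stratified_line(b, temperamental = 1)

print(round(rbind(operational, temperamental), 3))
```

Summing the unrounded coefficients gives a temperamental slope of about 0.145, so the reported value depends slightly on whether rounding happens before or after adding the terms.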
Clarabelle now wants to consider average wait time as a function of average orders per hour, whether there are special flavors, and their interaction.
Let’s examine summary statistics,
# A tibble: 2 × 3
variable mean_sd median_iqr
<chr> <chr> <chr>
1 avg_wait_time 6.8 (2.9) 6.7 (3.3)
2 orders_per_hour 25.3 (14.6) 22.4 (18.9)
clarabelle %>% filter(flavor_special == "Limited Edition") %>% mean_median(avg_wait_time, orders_per_hour)
# A tibble: 2 × 3
variable mean_sd median_iqr
<chr> <chr> <chr>
1 avg_wait_time 11.4 (3.4) 11.4 (4.0)
2 orders_per_hour 27.6 (15.9) 25.4 (21.0)
# A tibble: 2 × 3
variable mean_sd median_iqr
<chr> <chr> <chr>
1 avg_wait_time 7.7 (2.7) 7.2 (3.3)
2 orders_per_hour 24.0 (12.7) 22.2 (14.7)
m2a <- glm(avg_wait_time ~ orders_per_hour + flavor_special + orders_per_hour:flavor_special,
family = "gaussian",
data = clarabelle)
m2a %>% coefficients()
(Intercept) orders_per_hour
8.84927244 0.09105015
flavor_specialNone flavor_specialSeasonal
-3.94666767 -2.78508254
orders_per_hour:flavor_specialNone orders_per_hour:flavor_specialSeasonal
-0.01495659 -0.02459592
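# Assumed step (not shown in the original): make "None" the reference level,
# which matches the coefficient names and order in the output for m2b.
clarabelle <- clarabelle %>%
  mutate(flavor_special = factor(flavor_special, levels = c("None", "Seasonal", "Limited Edition")))
m2b <- glm(avg_wait_time ~ orders_per_hour + flavor_special + orders_per_hour:flavor_special,
family = "gaussian",
data = clarabelle)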
m2b %>% coefficients()
(Intercept)
4.902604769
orders_per_hour
0.076093559
flavor_specialSeasonal
1.161585135
flavor_specialLimited Edition
3.946667673
orders_per_hour:flavor_specialSeasonal
-0.009639335
orders_per_hour:flavor_specialLimited Edition
0.014956586
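The two coefficient printouts use different reference levels ("Limited Edition" for m2a, "None" for m2b), but they appear to describe the same three fitted lines. The sketch below checks this by rebuilding each flavor's intercept and slope from both sets of reported coefficients. The short names inside the vectors (`int`, `orders_none`, and so on) are labels chosen here for readability.

```r
# Coefficients as reported above.
# m2a: reference level "Limited Edition"; m2b: reference level "None".
m2a <- c(int = 8.84927244, orders = 0.09105015,
         none = -3.94666767, seasonal = -2.78508254,
         orders_none = -0.01495659, orders_seasonal = -0.02459592)
m2b <- c(int = 4.902604769, orders = 0.076093559,
         seasonal = 1.161585135, limited = 3.946667673,
         orders_seasonal = -0.009639335, orders_limited = 0.014956586)

# Intercept and slope implied for each flavor level under m2a.
lines_a <- rbind(
  none     = c(m2a[["int"]] + m2a[["none"]],
               m2a[["orders"]] + m2a[["orders_none"]]),
  seasonal = c(m2a[["int"]] + m2a[["seasonal"]],
               m2a[["orders"]] + m2a[["orders_seasonal"]]),
  limited  = c(m2a[["int"]], m2a[["orders"]])
)

# The same quantities under m2b.
lines_b <- rbind(
  none     = c(m2b[["int"]], m2b[["orders"]]),
  seasonal = c(m2b[["int"]] + m2b[["seasonal"]],
               m2b[["orders"]] + m2b[["orders_seasonal"]]),
  limited  = c(m2b[["int"]] + m2b[["limited"]],
               m2b[["orders"]] + m2b[["orders_limited"]])
)
colnames(lines_a) <- colnames(lines_b) <- c("intercept", "slope")

print(round(lines_a, 3))
print(round(lines_b, 3))
```

Both tables agree to the printed precision, so changing the reference level changes which comparisons the coefficients express, not the fitted lines themselves.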
\hat{\text{wait time}} = 4.90 + 0.08 \text{ orders} + 1.16 \text{ seasonal} + 3.95 \text{ limited edition} - 0.01 \text{ orders $\times$ seasonal} + 0.01 \text{ orders $\times$ limited edition}
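Because the interaction involves a factor with three levels, it spans two coefficients, so we use car::Anova() rather than tidy() here. The interaction is not significant (p = 0.721). We do not have justification for stratifying the model by special flavor.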
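An interaction between orders and a three-level factor is carried by two coefficients, so testing it means testing both at once. As a rough sketch of that idea, the block below compares nested models with base R's anova(), a swapped-in technique that asks the same kind of joint question but is not guaranteed to reproduce the car::Anova() statistic. The data are simulated with made-up parameters and no true interaction; this is not the clarabelle dataset.

```r
set.seed(1)  # simulated data only; NOT the clarabelle dataset

n <- 300
flavors <- c("None", "Seasonal", "Limited Edition")
sim <- data.frame(
  orders_per_hour = runif(n, min = 5, max = 60),
  flavor_special  = factor(sample(flavors, n, replace = TRUE), levels = flavors)
)

# Hypothetical data-generating model: common slope, different intercepts.
sim$avg_wait_time <- 5 + 0.08 * sim$orders_per_hour +
  1 * (sim$flavor_special == "Seasonal") +
  4 * (sim$flavor_special == "Limited Edition") +
  rnorm(n, sd = 2.5)

# Reduced model: main effects only.
reduced <- glm(avg_wait_time ~ orders_per_hour + flavor_special,
               family = "gaussian", data = sim)

# Full model: adds the two interaction coefficients.
full <- glm(avg_wait_time ~ orders_per_hour + flavor_special +
              orders_per_hour:flavor_special,
            family = "gaussian", data = sim)

# One F test for both interaction coefficients together (2 degrees of freedom).
joint_test <- anova(reduced, full, test = "F")
print(joint_test)
```

The second row of the table reports a single p-value for the whole interaction, which is the quantity of interest when a factor has more than two levels.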
Let’s simplify the model for illustrative purposes.
The overall model,
\begin{align*} \hat{\text{wait time}} = 4.&90 \\ & + 0.08 \text{ orders} \\ & + 1.16 \text{ seasonal} + 3.95 \text{ limited edition} \\ & - 0.01 \text{ orders $\times$ seasonal} + 0.01 \text{ orders $\times$ limited edition} \end{align*}
No special flavor,
\begin{align*} \hat{\text{wait time}} = 4.&90 \\ & + 0.08 \text{ orders} \\ & + 1.16 (0) + 3.95 (0) \\ & - 0.01 (0) \text{ orders} + 0.01 (0) \text{ orders} \\ = 4.&90 + 0.08 \text{ orders} \end{align*}
Seasonal,
\begin{align*} \hat{\text{wait time}} = 4.&90 \\ & + 0.08 \text{ orders} \\ & + 1.16 (1) + 3.95 (0) \\ & - 0.01 (1) \text{ orders} + 0.01 (0) \text{ orders} \\ = 6.&06 + 0.07 \text{ orders} \end{align*}
Limited edition,
\begin{align*} \hat{\text{wait time}} = 4.&90 \\ & + 0.08 \text{ orders} \\ & + 1.16 (0) + 3.95 (1) \\ & - 0.01 (0) \text{ orders} + 0.01 (1) \text{ orders} \\ = 8.&85 + 0.09 \text{ orders} \end{align*}
\begin{align*} \hat{\text{wait time}|\text{no special}} &= 4.90 + 0.08 \text{ orders} \\ \hat{\text{wait time}|\text{seasonal}} &= 6.06 + 0.07 \text{ orders} \\ \hat{\text{wait time}|\text{limited edition}} &= 8.85 + 0.09 \text{ orders} \end{align*}
When there are no special flavors, for every additional order per hour, average wait time increases by 0.08 minutes.
When there are seasonal flavors, for every additional order per hour, average wait time increases by 0.07 minutes.
When there are limited edition flavors, for every additional order per hour, average wait time increases by 0.09 minutes.
Thinking about the practical implications,
These interpretations help us drive home the message that the flavor situation really doesn’t affect the relationship between wait time and number of orders.
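To put the slopes on a practical scale, the sketch below converts each flavor's slope into the extra average wait associated with 10 more orders per hour. It uses the coefficients reported for m2b; the 10-order increment and the vector names are choices made here for illustration.

```r
# Slope-related coefficients as reported by `m2b %>% coefficients()`.
b <- c(orders = 0.076093559,
       orders_seasonal = -0.009639335,
       orders_limited  = 0.014956586)

# Slope of wait time on orders within each flavor level.
slopes <- c(
  none     = b[["orders"]],
  seasonal = b[["orders"]] + b[["orders_seasonal"]],
  limited  = b[["orders"]] + b[["orders_limited"]]
)

# Extra minutes of average wait associated with 10 more orders per hour.
extra_wait <- 10 * slopes
print(round(extra_wait, 2))

# Largest gap between any two flavor levels, in minutes.
print(round(diff(range(extra_wait)), 2))
```

Across the three flavor situations, the extra wait for 10 more orders per hour differs by roughly a quarter of a minute, which is in line with the non-significant interaction.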
This lecture is brief because continuous \times categorical interactions are conceptually similar to both continuous \times continuous and categorical \times categorical interactions.
Remember that everything in this course, especially up to this point, is a building block.
This lecture in particular begins to combine several concepts.
As we move forward, we will continue to combine concepts to build more complex models.
It is impossible to prepare you for all of the “what if” situations you will encounter in your career.
In the next lecture, we will review how to visualize interactions.