y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon
Until now, we have talked about models with only main effects.
Today, we will begin talking about interactions.
Recall interactions from two-way ANOVA: an interaction means the effect of one factor depends on the level of the other factor.
Interactions work (and are specified) the same way in regression.
The usual caveats apply:
We do not want to load models with too many interactions.
We favor simplicity over interactions that do not add much to the predictive power of the model.
We do not want higher than two-way interactions unless necessary.
In this lecture, we will focus on continuous \times continuous interactions.
We will construct what is called a hierarchical well-formulated (HWF) model.
This means that when a higher-order interaction term is included in the model, all lower-order terms are also included.
e.g., when a two-way interaction is included, we also include the corresponding main effects. y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2
e.g., when a three-way interaction is included, we also include the corresponding main effects and two-way interactions. y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_2 x_3 + \beta_7 x_1 x_2 x_3
On a busy day at the clubhouse, Mickey Mouse wants to understand what drives “happiness” at the end of the day. For each day, he records (in the clubhouse dataset): clubhouse_happiness (the outcome), time_with_friends (in hours), goofy_laughs (a count), and donald_grumbles (a count).
m1 <- glm(clubhouse_happiness ~ time_with_friends + goofy_laughs + time_with_friends:goofy_laughs,
family = "gaussian",
data = clubhouse)
m1 %>% tidy()

\hat{\text{happiness}} = 20.13 + 7.78 \text{ time} + 1.30 \text{ laughs} - 0.16 \ (\text{time$\times$laughs})
How do we interpret the interaction term?
This is the change in the effect of one predictor for a one-unit increase in the other predictor.
In an example without context, suppose we have the model,
\hat{y} = 5.2 + 2.3 x_1 + 4.5 x_2 - 1.2 (x_1 \times x_2)
The coefficient for the interaction term (-1.2) indicates that for every one unit increase in x_2, the effect of x_1 on y decreases by 1.2 units.
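We can verify this interpretation numerically. The slope of x_1 is (2.3 - 1.2 x_2), so it shrinks as x_2 grows (a minimal sketch using the hypothetical coefficients above):

```r
# Slope of x1 as a function of x2, from the model above:
# effect of x1 = b1 + b3 * x2
b1 <- 2.3    # main effect of x1
b3 <- -1.2   # interaction coefficient
slope_x1 <- function(x2) b1 + b3 * x2
slope_x1(0)   # 2.3
slope_x1(1)   # 1.1
slope_x1(2)   # -0.1
```

Note that for large enough x_2, the effect of x_1 can even change sign.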
\hat{\text{happiness}} = 20.13 + 7.78 \text{ time} + 1.30 \text{ laughs} - 0.16 \ (\text{time$\times$laughs})
The interaction term tells us that for each additional goofy laugh, the effect of one hour of time with friends on happiness decreases by 0.16 points.
In the case of a continuous \times continuous interaction, we can use the individual p-value (from tidy()) to determine statistical significance.
Hypotheses
H_0{:}\ \beta_3 = 0 vs. H_1{:}\ \beta_3 \neq 0
Test Statistic & p-Value
The p-value for the interaction term (from tidy()) is 0.070.
At the \alpha = 0.05 level, the interaction is not significant and should not be included in our model.
There are two ways we can specify interactions in our model:
the : operator, to specify each interaction term individually;
the * operator, to specify all main effects and interactions at once.
Consider a model with time with friends (x_1), goofy laughs (x_2), Donald grumbles (x_3), and all possible interactions (x_1 x_2, x_1 x_3, x_2 x_3, x_1 x_2 x_3).
Using the : operator,
y ~ x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3 + x1:x2:x3
Using the * operator,
y ~ x1 * x2 * x3
Obviously the * operator is faster to type, but remember that it gives us less control over which terms enter the model.
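One way to convince yourself the two specifications are equivalent is to compare their design matrices on simulated data (a minimal sketch; the data and variable names here are arbitrary):

```r
# The two formulas expand to the same set of model terms.
set.seed(1)
d <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
colon <- model.matrix(~ x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3 + x1:x2:x3, data = d)
star  <- model.matrix(~ x1 * x2 * x3, data = d)
identical(sort(colnames(colon)), sort(colnames(star)))  # TRUE
```

The fitted coefficients will therefore match exactly; only how you write the formula differs.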
Applying this to our data,
Using the : operator,
m2 <- glm(clubhouse_happiness ~ time_with_friends + goofy_laughs + donald_grumbles +
            time_with_friends:goofy_laughs + time_with_friends:donald_grumbles +
            goofy_laughs:donald_grumbles + time_with_friends:goofy_laughs:donald_grumbles,
          family = "gaussian",
          data = clubhouse)
Using the * operator,
m3 <- glm(clubhouse_happiness ~ time_with_friends * goofy_laughs * donald_grumbles,
          family = "gaussian",
          data = clubhouse)
The estimates (95% CIs) from tidy() are identical under both specifications:

| term | : operator | * operator |
|---|---|---|
| (Intercept) | 59.34 (4.9, 113.79) | 59.34 (4.9, 113.79) |
| time_with_friends | 1.02 (-12.33, 14.38) | 1.02 (-12.33, 14.38) |
| goofy_laughs | 0.34 (-1.42, 2.1) | 0.34 (-1.42, 2.1) |
| donald_grumbles | -3.8 (-8.75, 1.15) | -3.8 (-8.75, 1.15) |
| time_with_friends:goofy_laughs | 0.07 (-0.36, 0.5) | 0.07 (-0.36, 0.5) |
| time_with_friends:donald_grumbles | 0.64 (-0.56, 1.84) | 0.64 (-0.56, 1.84) |
| goofy_laughs:donald_grumbles | 0.08 (-0.08, 0.24) | 0.08 (-0.08, 0.24) |
| time_with_friends:goofy_laughs:donald_grumbles | -0.02 (-0.06, 0.02) | -0.02 (-0.06, 0.02) |
When do I remove interaction terms from my models? Generally, when they are not statistically significant and add little to the model's predictive power.
However, be cautious when removing interaction terms. This may require discussion with the team.
Because I want the option of removing individual terms easily, I prefer to use the : operator when specifying interactions.
When I have multiple (two-way) interactions in my model, I will sometimes perform a partial F test to assess the significance of all interaction terms at once.
The corresponding hypotheses are H_0{:}\ \beta_4 = \beta_5 = \beta_6 = 0 (no interaction terms are needed) vs. H_1{:} at least one of \beta_4, \beta_5, \beta_6 \neq 0.
Then, in R, we take a similar approach to what we saw before,
full <- glm(clubhouse_happiness ~ time_with_friends + goofy_laughs + donald_grumbles +
time_with_friends:goofy_laughs + time_with_friends:donald_grumbles + goofy_laughs:donald_grumbles,
family = "gaussian",
data = clubhouse)
reduced <- glm(clubhouse_happiness ~ time_with_friends + goofy_laughs + donald_grumbles,
family = "gaussian",
data = clubhouse)
anova(reduced, full, test = "F")

True backward selection would involve starting from the full model, removing the single least significant term, refitting, and repeating until all remaining terms are significant.
My version of backward selection: remove the nonsignificant interaction terms (highest-order first), refit, and then consider removing any main effects that are not involved in a remaining interaction.
This is not a perfect method, but it is a practical way to simplify models with interaction terms.
Note that because we are working with hierarchical well-formulated models, if main effects are involved in interactions, they must remain in the model as main effects.
y ~ x1 + x2 + x3 + x1:x2 + x1:x3 \to no main effects can be removed.
y ~ x1 + x2 + x1:x2 + x1:x3 \to not a valid model!
If a main effect is not involved in an interaction, it can be removed if desired.
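As a self-contained sketch of the partial F test described above (simulated data with made-up effect sizes, not the clubhouse data):

```r
# Simulate data where only the x1:x2 interaction truly matters.
set.seed(42)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 5 + 2 * x1 + 3 * x2 - 1 * x3 + 1.5 * x1 * x2 + rnorm(n)
d  <- data.frame(y, x1, x2, x3)

# Full model: all two-way interactions; reduced model: main effects only.
full    <- glm(y ~ x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3,
               family = "gaussian", data = d)
reduced <- glm(y ~ x1 + x2 + x3, family = "gaussian", data = d)

# Partial F test: do the interaction terms jointly improve the model?
anova(reduced, full, test = "F")  # here, a small p-value
```

With a true interaction built into the data, the test rejects H_0, so at least one interaction term should stay (and we would keep the model HWF while pruning the rest).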
Let’s solidify our understanding of interaction terms.
Let’s return to Example 1,
\hat{\text{happiness}} = 20.13 + 7.78 \text{ time} + 1.30 \text{ laughs} - 0.16 \ (\text{time$\times$laughs})
To better understand the interaction, we can compute simple slopes.
What is the slope for time with friends when goofy laughs = 0? 25? 50?
\hat{\text{happiness}} = 20.13 + 7.78 \text{ time} + 1.30 \text{ laughs} - 0.16 \ (\text{time$\times$laughs})
\begin{align*} \hat{\text{happiness}} &= 20.13 + 7.78 \text{ time} + 1.30(0) - 0.16 \ (\text{time$\times$0}) \\ &= 20.13 + 7.78 \text{ time} \end{align*}
\hat{\text{happiness}} = 20.13 + 7.78 \text{ time} + 1.30 \text{ laughs} - 0.16 \ (\text{time$\times$laughs})
\begin{align*} \hat{\text{happiness}} &= 20.13 + 7.78 \text{ time} + 1.30(25) - 0.16 \ (\text{time$\times$25}) \\ &= 52.63 + 3.78 \text{ time} \end{align*}
Let’s look at these together,
When there are no goofy laughs, the slope for time with friends is 7.78. This means that as time increases by one hour, happiness increases by 7.78 points.
When there are 25 goofy laughs, the slope for time with friends is 3.78. This means that as time increases by one hour, happiness increases by 3.78 points.
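For completeness, the third case asked above (50 laughs):

\begin{align*} \hat{\text{happiness}} &= 20.13 + 7.78 \text{ time} + 1.30(50) - 0.16 \ (\text{time$\times$50}) \\ &= 85.13 - 0.22 \text{ time} \end{align*}

Here the simple slope is slightly negative: in the fitted model, once laughs reach 50, additional time with friends no longer increases predicted happiness.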
Can we make an overall statement?
As the number of goofy laughs increases, the effect of time with friends on happiness decreases. This is consistent with the negative interaction term (-0.16).
How would I explain it to Mickey? As the number of goofy laughs increases, each additional hour with friends adds less to the clubhouse happiness level.
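The simple slopes above can also be computed directly from the fitted coefficients, rather than by substituting into the full equation each time:

```r
# Simple slope of time = main effect of time + interaction * laughs
# (coefficient values taken from the fitted equation above).
b_time <- 7.78   # main effect of time with friends
b_int  <- -0.16  # time-by-laughs interaction
laughs <- c(0, 25, 50)
b_time + b_int * laughs  # 7.78  3.78  -0.22
```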
Note!
How do I choose which variable to plug values in for?
I look at the research question / narrative that we have to this point. Sometimes it is obvious which one the “story” is about.
The “interesting variable” is the one that we will allow to be modified by the interaction.
What if I can’t tell or both are of interest?
I do not decide. I will ask the research team for clarification.
Sometimes I will compute one simple slope each way to give a concrete example.
Always remember to underscore that this is for interpreting the model – it does not change the model itself!
This lecture introduced the concepts of interaction terms in our models.
We started with the continuous \times continuous interaction type to get comfortable with the idea of interactions in regression.
As we can see, relaying the meaning of interaction terms can be tricky and, depending on the number and type of interaction terms, can steeply increase the complexity of the model.
In the next lecture, we will discuss categorical \times categorical interactions.