July 15, 2025
Tuesday
Recall that ANOVA allows us to compare the means of three or more groups.
In one-way ANOVA, we are only considering one factor (grouping variable).
Now, we will discuss two-way ANOVA, which allows us to consider a second factor (grouping variable).
We now partition the treatment sum of squares, SSTrt, into pieces for each factor and for their interaction.
Recall that SSE is the “catch all” for unexplained variance.
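Written out, the partition looks like this. The pieces of SSTrt add up exactly when every treatment group has the same number of observations; in the pony grades example later in this section, $1225.90 + 96.56 + 2836.14 \approx 4158.61$ and $4158.61 + 5515.86 = 9674.47$.

$$
\text{SS}_{\text{Tot}} = \text{SS}_{\text{Trt}} + \text{SS}_{\text{E}}, \qquad
\text{SS}_{\text{Trt}} = \text{SS}_{\text{A}} + \text{SS}_{\text{B}} + \text{SS}_{\text{AB}}
$$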
Let’s discuss some of the language used in two-way ANOVA.
Factor A has $a$ levels.
Factor B has $b$ levels.
There are $ab$ treatment groups, one for each combination of a level of A with a level of B.
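Now that we are including two factors, we must also consider the interaction term (AB), which allows the effect of one factor on the outcome to depend on the level of the other factor.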
In our example, suppose Pinkie Pie is testing a new apple pie recipe and has ponies taste then rate the new recipe,
Source | Sum of Squares | df | Mean Square | F |
---|---|---|---|---|
A | SSA | dfA | MSA | FA |
B | SSB | dfB | MSB | FB |
AB | SSAB | dfAB | MSAB | FAB |
Error | SSE | dfE | MSE | |
Total | SSTot | dfTot | | |
Let there be $a$ levels of factor A and $b$ levels of factor B. For each source X (A, B, or AB), the mean square and the F statistic are

$$\text{MS}_{\text{X}} = \frac{\text{SS}_{\text{X}}}{\text{df}_{\text{X}}}$$

$$\text{F}_{\text{X}} = \frac{\text{MS}_{\text{X}}}{\text{MS}_{\text{E}}}$$
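The degrees of freedom in the table are the standard ones for a two-way ANOVA that includes the interaction, where $n$ is the total number of observations:

$$
\text{df}_{\text{A}} = a - 1, \quad
\text{df}_{\text{B}} = b - 1, \quad
\text{df}_{\text{AB}} = (a - 1)(b - 1), \quad
\text{df}_{\text{E}} = n - ab, \quad
\text{df}_{\text{Tot}} = n - 1
$$

For instance, the pony grades example has $a = 3$ types and $b = 2$ settings, so $\text{df}_{\text{AB}} = 2$, and $\text{df}_{\text{E}} = 204$ corresponds to $n = 210$. If the interaction is dropped, its degrees of freedom move into the error term, giving $\text{df}_{\text{E}} = n - a - b + 1$; with $n = 120$ and $a = b = 2$ this is $117$ instead of $116$, matching the quiz score example.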
We will use the `two_way_ANOVA_table()` function from `library(ssstats)` to construct the two-way ANOVA table.

Summary statistics of grade (mean (SD) and median (IQR)) by pony type, by study setting, and by type × setting:

```
# A tibble: 3 × 4
  type    variable mean_sd    median_iqr
  <chr>   <chr>    <chr>      <chr>
1 Earth   grade    79.3 (5.4) 79.3 (6.8)
2 Pegasus grade    79.7 (7.1) 79.8 (11.4)
3 Unicorn grade    84.6 (6.6) 84.0 (11.2)

# A tibble: 2 × 4
  setting variable mean_sd    median_iqr
  <chr>   <chr>    <chr>      <chr>
1 Cafe    grade    81.9 (5.5) 82.3 (8.8)
2 Library grade    80.5 (7.8) 79.4 (10.7)

# A tibble: 6 × 5
  type    setting variable mean_sd    median_iqr
  <chr>   <chr>   <chr>    <chr>      <chr>
1 Earth   Cafe    grade    80.4 (5.5) 80.5 (7.1)
2 Earth   Library grade    78.2 (5.0) 78.2 (6.6)
3 Pegasus Cafe    grade    84.7 (5.1) 84.8 (6.3)
4 Pegasus Library grade    74.8 (5.0) 73.5 (5.7)
5 Unicorn Cafe    grade    80.6 (5.0) 80.6 (7.7)
6 Unicorn Library grade    88.6 (5.5) 89.2 (7.9)
```
Two-Way ANOVA Table

Source | Sum of Squares | df | Mean Squares | F | p |
---|---|---|---|---|---|
Regression | 4158.61 | 5 | | | |
• type | 1225.90 | 2 | 612.95 | 22.67 | < 0.001 |
• setting | 96.56 | 1 | 96.56 | 3.57 | 0.060 |
• Interaction | 2836.14 | 2 | 1418.07 | 52.45 | < 0.001 |
Error | 5515.86 | 204 | 27.04 | | |
Total | 9674.47 | 209 | | | |
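The Mean Squares, F, and p columns can be recomputed from the Sum of Squares and df columns with the formulas for $\text{MS}_{\text{X}}$ and $\text{F}_{\text{X}}$. The base-R sketch below uses only the numbers printed in the table (`pf()` gives the upper-tail area of the F distribution); it is a check on the arithmetic, not a description of how `ssstats` builds the table.

```r
# Sums of squares and degrees of freedom copied from the two-way ANOVA table
ss <- c(type = 1225.90, setting = 96.56, interaction = 2836.14, error = 5515.86)
df <- c(type = 2, setting = 1, interaction = 2, error = 204)

# Mean squares: MS_X = SS_X / df_X (the last entry is MSE)
ms <- ss / df

# F statistics: F_X = MS_X / MSE, for the three effect rows only
f_stat <- ms[1:3] / ms["error"]

# p-values: upper-tail area of F(df_X, df_E) beyond each observed F
p_val <- pf(f_stat, df1 = df[1:3], df2 = df["error"], lower.tail = FALSE)

print(round(ms, 2))
print(round(f_stat, 2))
print(signif(p_val, 3))
```

Small differences from the printed table (for example, 4158.60 versus 4158.61 for the sum of the three effect rows) come from rounding in the displayed sums of squares.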
We will use the `two_way_ANOVA_HT()` function from `library(ssstats)` to perform the test for the interaction.

At Friendship University, college-aged ponies enroll in the introductory STEM course “Applied Equestrian Engineering” (data frame: grades). Researchers want to know how two factors, pony type (type) and study setting (setting), influence the overall grade (percentage scale; grade).

Determine whether there is an interaction between pony type and study setting. Test at the $\alpha = 0.01$ level.
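Our updated code:

```r
library(dplyr)     # provides the %>% pipe
library(ssstats)

grades %>% two_way_ANOVA(continuous = grade,  # outcome
                         A = type,            # factor A: pony type
                         B = setting,         # factor B: study setting
                         interaction = TRUE,  # include the A x B term
                         alpha = 0.01)
```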
```
Test for Interaction (type × setting):
H₀: The relationship between grade and type does not depend on setting.
H₁: The relationship between grade and type depends on setting.
Test Statistic: F(2, 204) = 52.45
p-value: p = < 0.001
Conclusion: Reject the null hypothesis (p = < 0.001 < α = 0.01)
```
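The rejection-region and p-value steps can be reproduced from the printed output with base R's `qf()` and `pf()`. This sketch reuses only the numbers shown above ($F = 52.45$ on 2 and 204 degrees of freedom, $\alpha = 0.01$); it is not the `ssstats` code.

```r
alpha <- 0.01
df_ab <- 2      # interaction df: (a - 1)(b - 1) = (3 - 1)(2 - 1)
df_e  <- 204    # error df
f_ab  <- 52.45  # observed F for the type x setting interaction

# Rejection region: reject H0 when F_AB >= F_{alpha; df_AB, df_E}
f_crit <- qf(1 - alpha, df1 = df_ab, df2 = df_e)

# p-value: upper-tail area beyond the observed F
p_ab <- pf(f_ab, df1 = df_ab, df2 = df_e, lower.tail = FALSE)

cat("Critical value:", round(f_crit, 3), "\n")
cat("p-value:", format.pval(p_ab, eps = 0.001), "\n")
cat("Reject H0?", f_ab >= f_crit, "\n")
```

Both routes lead to the same decision: the observed F is far above the critical value, and the p-value is far below $\alpha$.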
We will use the `profile_plot()` function from `library(ssstats)` to construct basic profile plots.

At Friendship University, college-aged ponies enroll in the introductory STEM course “Applied Equestrian Engineering” (data frame: grades). Researchers want to know how two factors influence the overall grade (percentage scale).
Let’s now construct the profile plot with pony type (type) on the x-axis and create the lines using study setting (setting).
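As a base-R point of comparison (not the `ssstats` function), `interaction.plot()` draws the same kind of profile plot. The sketch below uses only the six cell means printed in the type × setting summary table, so it needs neither the raw grades data nor `ssstats`; with the raw data you would pass the type, setting, and grade columns of grades instead.

```r
# Cell means of grade copied from the type x setting summary table
cell_means <- data.frame(
  type    = factor(rep(c("Earth", "Pegasus", "Unicorn"), each = 2)),
  setting = factor(rep(c("Cafe", "Library"), times = 3)),
  grade   = c(80.4, 78.2, 84.7, 74.8, 80.6, 88.6)
)

# Profile plot: pony type on the x-axis, one line per study setting
with(cell_means,
     interaction.plot(x.factor = type, trace.factor = setting, response = grade,
                      xlab = "Pony type", ylab = "Mean grade",
                      trace.label = "Setting", type = "b", pch = c(16, 17)))

# Cafe minus Library gap within each pony type; parallel lines would give equal gaps
gap <- with(cell_means, tapply(grade, type, function(g) g[1] - g[2]))
print(round(gap, 1))
```

The Cafe minus Library gaps (2.2, 9.9, and -8.0 points for Earth, Pegasus, and Unicorn) differ in size and even in sign, so the profile lines cross; this is what the significant interaction looks like graphically.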
We will use the `two_way_ANOVA()` function from `library(ssstats)` to perform the tests for main effects.

At Ponyville High, students in a general science class (quiz_scores) were divided based on their membership in the Science Club (yes/no; club) and their primary method of studying outside of school (solo/group; study).
Researchers want to know whether these factors impact science quiz scores (out of 100 points; score).

Let’s first check for an interaction ($\alpha = 0.05$):
```
Test for Interaction (club × study):
H₀: The relationship between score and club does not depend on study.
H₁: The relationship between score and club depends on study.
Test Statistic: F(1, 116) = 0
p-value: p = 0.966
Conclusion: Fail to reject the null hypothesis (p = 0.9656 ≥ α = 0.05)
```
Because the interaction is not significant, let’s remove it from the model and test the main effects.
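In base R, removing the interaction is a change in the model formula, from `club * study` to `club + study`. The sketch below uses simulated data (the seed, cell size, and effect sizes are made up for illustration; it is not the quiz_scores data) in a balanced 2 × 2 layout with 120 students, so the error degrees of freedom match the output in this example: 116 with the interaction and 117 without it.

```r
set.seed(4173)

# Hypothetical balanced 2 x 2 design, 30 students per cell (n = 120).
# Simulated scores with two main effects and no interaction; NOT the real data.
sim <- expand.grid(club = c("No", "Yes"), study = c("Group", "Solo"), rep = 1:30)
sim$club  <- factor(sim$club)
sim$study <- factor(sim$study)
sim$score <- 75 + 5 * (sim$club == "Yes") - 5 * (sim$study == "Solo") +
  rnorm(nrow(sim), sd = 8)

# Step 1: model with the interaction (club * study = club + study + club:study)
fit_int <- aov(score ~ club * study, data = sim)
print(summary(fit_int))

# Step 2: drop the interaction and keep only the main effects
fit_add <- aov(score ~ club + study, data = sim)
print(summary(fit_add))
```

Removing the interaction returns its single degree of freedom to the error term, which is why the main-effect tests use $F(1, 117)$.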
```
Test for Main Effect club:
H₀: μ_Club = μ_No Club
H₁: At least one mean is different.
Test Statistic: F(1, 117) = 11.19
p-value: p = 0.001
Conclusion: Reject the null hypothesis (p = 0.0011 < α = 0.05)

Test for Main Effect study:
H₀: μ_Group = μ_Solo
H₁: At least one mean is different.
Test Statistic: F(1, 117) = 11.24
p-value: p = 0.001
Conclusion: Reject the null hypothesis (p = 0.0011 < α = 0.05)
```
The ANOVA assumptions we learned last week still apply to two-way ANOVA.
We assume that the error term follows a normal distribution with mean 0 and constant variance $\sigma^2$, i.e.,
$$\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)$$
Very important note: the assumption is on the error term and NOT on the outcome!
We will use the residual (the difference between the observed value and the predicted value) to assess assumptions:
$$e_{ij} = y_{ij} - \hat{y}_{ij}$$
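When the interaction is in the model, the predicted value $\hat{y}_{ij}$ is simply the mean of the observation's treatment group, so the residual is the observation minus its cell mean. The base-R sketch below checks this on simulated data (a hypothetical 3 × 2 design with 5 observations per cell, not one of the course data sets) by comparing hand-computed residuals with those from `aov()`.

```r
set.seed(2025)

# Hypothetical two-factor data: 3 x 2 design, 5 observations per cell
d <- expand.grid(A = c("a1", "a2", "a3"), B = c("b1", "b2"), rep = 1:5)
d$y <- rnorm(nrow(d), mean = 80, sd = 6)

# Full two-way model with interaction
fit <- aov(y ~ A * B, data = d)

# Predicted value = cell mean of the observation's (A, B) group
y_hat <- ave(d$y, d$A, d$B)
e     <- d$y - y_hat   # residual: observed minus predicted

# Hand-computed residuals agree with the ones stored in the fitted model
stopifnot(isTRUE(all.equal(unname(e), unname(residuals(fit)))))
print(head(data.frame(y = d$y, y_hat = y_hat, e = e)))
```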
We will use the `ANOVA2_assumptions()` function from `library(ssstats)` to request the graphs necessary to assess our assumptions.

At Ponyville High, students in a general science class (quiz_scores) were divided based on their membership in the Science Club (yes/no; club) and their primary method of studying outside of school (solo/group; study). Researchers want to know whether these factors impact science quiz scores (out of 100 points; score).

Let’s now check the ANOVA assumptions.
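For readers working without `ssstats`, a rough base-R version of two common residual graphs is sketched below: a normal Q-Q plot (normality) and residuals versus fitted values (constant variance). Which graphs `ANOVA2_assumptions()` actually draws is not assumed here, and the data are simulated (a hypothetical 2 × 2 layout with 120 observations), not the quiz_scores data.

```r
set.seed(2025)

# Hypothetical 2 x 2 data for illustration only (NOT quiz_scores)
d <- expand.grid(club = c("No", "Yes"), study = c("Group", "Solo"), rep = 1:30)
d$score <- rnorm(nrow(d), mean = 75, sd = 8)

# Main-effects model, mirroring the final model in the quiz example
fit <- aov(score ~ club + study, data = d)
e   <- residuals(fit)

op <- par(mfrow = c(1, 2))
# Normality: points should fall close to the reference line
qqnorm(e, main = "Normal Q-Q plot of residuals")
qqline(e)
# Constant variance: vertical spread should look similar across fitted values
plot(fitted(fit), e, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)
par(op)
```

Both graphs use the residuals rather than the raw scores, in line with the note that the assumptions are on the error term and not on the outcome.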
STA4173 - Biostatistics - Summer 2025