Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ = 0
Alternative: H₁: μ₁ − μ₂ ≠ 0
Test statistic: t(98) = 0.52
p-value: p = 0.604
Conclusion: Fail to reject the null hypothesis (p = 0.6041 ≥ α = 0.05)
Previous: two groups, continuous outcome
Now: more than two groups, continuous outcome
One-way ANOVA
Kruskal-Wallis
We have previously discussed testing the difference between two groups.
To compare the means of more than two groups, we will use a method called analysis of variance (ANOVA).
Fun fact: the two-sample t-test is a special case of ANOVA.
Two-sample t-test:
Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ = 0
Alternative: H₁: μ₁ − μ₂ ≠ 0
Test statistic: t(98) = 0.52
p-value: p = 0.604
Conclusion: Fail to reject the null hypothesis (p = 0.6041 ≥ α = 0.05)
One-Way ANOVA:
H₀: μ_A = μ_B
H₁: At least one group mean is different
Test Statistic: F(1, 98) = 0.271
p-value: p = 0.604
Conclusion: Fail to reject the null hypothesis (p = 0.6041 ≥ α = 0.05)
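The equivalence can be checked numerically. Below is a minimal sketch in base R using simulated data (the group labels, sample sizes, means, and seed are hypothetical, not the data behind the output above). With two groups and equal variances assumed, the squared t statistic should equal the F statistic, and the two p-values should agree. The output above shows the same pattern up to rounding: 0.52^2 ≈ 0.27.

```r
# Hypothetical data: two independent groups of 50 (not the course data)
set.seed(4173)
dat <- data.frame(
  group = factor(rep(c("A", "B"), each = 50)),
  y     = c(rnorm(50, mean = 10, sd = 2), rnorm(50, mean = 10.3, sd = 2))
)

# Two-sample t-test assuming equal variances
t_res <- t.test(y ~ group, data = dat, var.equal = TRUE)

# One-way ANOVA on the same data
f_res <- anova(lm(y ~ group, data = dat))

t_stat <- unname(t_res$statistic)  # t with n - 2 = 98 df
f_stat <- f_res$`F value`[1]       # F with (1, 98) df

# The squared t statistic and the F statistic coincide
c(t_squared = t_stat^2, F = f_stat)

# So do the two p-values
c(p_t = t_res$p.value, p_F = f_res$`Pr(>F)`[1])
```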
The computations for ANOVA are more involved than what we’ve seen before.
An ANOVA table will be constructed in order to perform the hypothesis test.
| Source | Sum of Squares | df | Mean Squares | F |
|---|---|---|---|---|
| Treatment | SSTrt | dfTrt | MSTrt | F0 |
| Error | SSE | dfE | MSE | |
| Total | SSTot | dfTot | | |
Once this is put together, we can perform the hypothesis test.
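To compute the sums of squares, we need the overall mean, \bar{x}, and, for each group i, the sample size, n_i, the sample mean, \bar{x}_i, and the sample variance, s_i^2:
\bar{x}, \ \ n_i, \ \ \bar{x}_i, \ \ s_i^2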
\begin{align*} \text{SS}_{\text{Trt}} &= \sum_{i=1}^k n_i(\bar{x}_i-\bar{x})^2 \\ \text{SS}_{\text{E}} &= \sum_{i=1}^k (n_i-1)s_i^2 \\ \text{SS}_{\text{Tot}} &= \text{SS}_{\text{Trt}} + \text{SS}_{\text{E}} \end{align*}
\begin{align*} \text{df}_{\text{Trt}} &= k-1\\ \text{df}_{\text{E}} &= n-k\\ \text{df}_{\text{Tot}} &= n-1 \end{align*}
Once we have the sum of squares and corresponding degrees of freedom, we can compute the mean squares.
In the case of one-way ANOVA, \begin{align*} \text{MS}_{\text{Trt}} &= \frac{\text{SS}_{\text{Trt}}}{\text{df}_{\text{Trt}}} \\ \text{MS}_{\text{E}} &= \frac{\text{SS}_{\text{E}}}{\text{df}_{\text{E}}} \end{align*}
In general, a mean square is a sum of squares divided by its own degrees of freedom:
\text{MS}_X = \frac{\text{SS}_X}{\text{df}_X}
Finally, we have the test statistic.
Generally, we construct an F statistic for ANOVA by dividing the MS of interest by \text{MS}_{\text{E}}:
F_X = \frac{\text{MS}_X}{\text{MS}_{\text{E}}}
In one-way ANOVA, the MS of interest is \text{MS}_{\text{Trt}}:
F_0 = \frac{\text{MS}_{\text{Trt}}}{\text{MS}_{\text{E}}}
| Source | Sum of Squares | df | Mean Squares | F |
|---|---|---|---|---|
| Treatment | SSTrt | dfTrt | MSTrt | F0 |
| Error | SSE | dfE | MSE | |
| Total | SSTot | dfTot | | |
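The steps above can be sketched by hand in base R. This is an illustration under stated assumptions, not the one_way_ANOVA_table() implementation: the data are simulated, and the names dat, group, and y are hypothetical. The table is built from the group sizes, means, and variances exactly as in the formulas above, and the last line shows base R's own ANOVA table for comparison.

```r
# Hypothetical data: three groups with unequal sample sizes
set.seed(4173)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(8, 10, 12))),
  y     = c(rnorm(8, 10, 2), rnorm(10, 11, 2), rnorm(12, 13, 2))
)

# Ingredients: overall mean, group sizes, group means, group variances
xbar   <- mean(dat$y)
n_i    <- tapply(dat$y, dat$group, length)
xbar_i <- tapply(dat$y, dat$group, mean)
s2_i   <- tapply(dat$y, dat$group, var)

k <- length(n_i)  # number of groups
n <- sum(n_i)     # total sample size

# Sums of squares
ss_trt <- sum(n_i * (xbar_i - xbar)^2)
ss_e   <- sum((n_i - 1) * s2_i)
ss_tot <- ss_trt + ss_e

# Degrees of freedom
df_trt <- k - 1
df_e   <- n - k
df_tot <- n - 1

# Mean squares and test statistic
ms_trt <- ss_trt / df_trt
ms_e   <- ss_e / df_e
f0     <- ms_trt / ms_e

anova_table <- data.frame(
  Source = c("Treatment", "Error", "Total"),
  SS     = c(ss_trt, ss_e, ss_tot),
  df     = c(df_trt, df_e, df_tot),
  MS     = c(ms_trt, ms_e, NA),
  F      = c(f0, NA, NA)
)
print(anova_table)

# Base R's ANOVA table for the same data, for comparison
anova(lm(y ~ group, data = dat))
```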
We will use the one_way_ANOVA_table() function from library(ssstats) to construct the ANOVA table.
In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.
To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (ability_score) among the four pony types. We will test at the \alpha=0.05 level.
Let’s first create the ANOVA table. How should we update this code?
Let’s first create the ANOVA table. Our updated code:
In one-way ANOVA, hypotheses always take the same form:
H_0: \ \mu_1 = \mu_2 = \cdots = \mu_k
H_1: \ \text{at least one } \mu_i \text{ is different}
Note: you must fill in the “k” when writing hypotheses!
Test statistic:
F_0 = \frac{\text{MS}_{\text{Trt}}}{\text{MS}_{\text{E}}}
p-Value:
p = P[F_{k-1,n-k} \ge F_0]
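The p-value is an upper-tail area of the F distribution, which base R's pf() can compute. The sketch below is illustrative only: the helper name f_p_value is hypothetical, and the inputs are the F(1, 98) = 0.271 reported in the two-group comparison earlier, for which the output listed p = 0.604.

```r
# Upper-tail probability of an F distribution: P[F_{df_trt, df_e} >= f0]
# (f_p_value is a hypothetical helper name used for illustration)
f_p_value <- function(f0, df_trt, df_e) {
  pf(f0, df1 = df_trt, df2 = df_e, lower.tail = FALSE)
}

# Statistic and degrees of freedom from the earlier two-group output
p <- f_p_value(0.271, df_trt = 1, df_e = 98)
round(p, 3)
```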
We will use the one_way_ANOVA() function from library(ssstats) to construct the corresponding hypothesis test.
In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.
To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (ability_score) among the four pony types. We will test at the \alpha=0.05 level.
Let’s now formulate the hypothesis test. How should we update this code?
Let’s now formulate the hypothesis test. Our updated code:
One-Way ANOVA:
H₀: μ_Alicorn = μ_Earth = μ_Pegasus = μ_Unicorn
H₁: At least one group mean is different
Test Statistic: F(3, 136) = 1.526
p-value: p = 0.210
Conclusion: Fail to reject the null hypothesis (p = 0.2105 ≥ α = 0.05)
Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).
To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types. We will again test at the \alpha=0.05 level.
Let’s first create the ANOVA table. How should we update this code?
Let’s first create the ANOVA table. Our updated code:
Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).
To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types. We will again test at the \alpha=0.05 level.
Let’s now formulate the hypothesis test. How should we update this code?
Let’s now formulate the hypothesis test. Our updated code:
magical_studies %>% one_way_ANOVA(continuous = coordination_score,
                                  grouping = pony_type,
                                  alpha = 0.05)
One-Way ANOVA:
H₀: μ_Alicorn = μ_Earth = μ_Pegasus = μ_Unicorn
H₁: At least one group mean is different
Test Statistic: F(3, 136) = 18.396
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p = < 0.001 < α = 0.05)
There’s a difference, but what is the difference?
Let’s look at summary statistics:
# A tibble: 4 × 4
pony_type variable mean_sd median_iqr
<chr> <chr> <chr> <chr>
1 Alicorn coordination_score 87.7 (8.5) 88.0 (13.6)
2 Earth coordination_score 75.1 (8.4) 75.3 (9.6)
3 Pegasus coordination_score 77.6 (8.4) 78.5 (10.0)
4 Unicorn coordination_score 84.9 (7.7) 84.3 (11.2)
Recall our hypotheses in one-way ANOVA:
H_0: \ \mu_1 = \mu_2 = \cdots = \mu_k
H_1: \ \text{at least one } \mu_i \text{ is different}
The F test does not tell us which mean is different… only that a difference exists.
In theory, we could perform repeated t tests to determine pairwise differences.
Recall that the Type I error rate, \alpha, is the probability of incorrectly rejecting H_0.
Suppose we are comparing 5 groups.
This is 10 pairwise comparisons!!
If we perform repeated t tests under \alpha=0.05, we inflate the overall Type I error rate to roughly 1-(1-0.05)^{10} \approx 0.40 (treating the tests as independent)! 😵
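The arithmetic behind these two numbers can be sketched in a few lines of base R. The calculation treats the 10 tests as independent, which is a simplifying assumption because pairwise comparisons share data; the variable names are chosen for illustration.

```r
k     <- 5     # number of groups
alpha <- 0.05  # Type I error rate for each individual t test

# Number of pairwise comparisons among k groups
m <- choose(k, 2)

# Probability of at least one false rejection across m tests,
# assuming the tests are independent (a simplification)
fwer <- 1 - (1 - alpha)^m

m               # 10 comparisons
round(fwer, 2)  # about 0.40
```

Changing k shows how quickly the problem grows: the number of comparisons increases quadratically with the number of groups.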
When performing posthoc comparisons, we can choose one of two paths:
Control the Type I error rate across all comparisons.
Do not control the Type I error rate.
Note that controlling the Type I error rate is more conservative than when we do not control it.
Generally, statisticians:
do not control the Type I error rate if examining the results of pilot/preliminary studies that are exploring for general relationships.
do control the Type I error rate if examining the results of confirmatory studies and are attempting to confirm relationships observed in pilot/preliminary studies.
The posthoc tests we will learn:
Tukey’s test
Fisher’s least significant difference
Caution: we should only perform posthoc tests if we have determined that a general difference exists!
Tukey’s test allows us to do all pairwise comparisons while controlling \alpha.
Hypotheses
H_0: \ \mu_i = \mu_j
H_1: \ \mu_i \ne \mu_j
Test Statistic
Q = \frac{|\bar{y}_i - \bar{y}_j|}{\sqrt{ \frac{\text{MS}_{\text{E}}}{2} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }}
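To make the Q statistic concrete, here is a minimal base R sketch for a single pair of groups. It is not the posthoc_tukey() implementation: the data are simulated, and the names dat, group, and y are hypothetical. The adjusted p-value comes from the studentized range distribution (ptukey), and the last line shows what base R's TukeyHSD() reports for the same pair.

```r
# Hypothetical data: three groups with unequal sample sizes
set.seed(4173)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(8, 10, 12))),
  y     = c(rnorm(8, 10, 2), rnorm(10, 11, 2), rnorm(12, 13, 2))
)

fit  <- aov(y ~ group, data = dat)
df_e <- df.residual(fit)               # n - k
ms_e <- sum(residuals(fit)^2) / df_e   # MS_E from the ANOVA table
k    <- nlevels(dat$group)

n_i    <- tapply(dat$y, dat$group, length)
ybar_i <- tapply(dat$y, dat$group, mean)

# Q for the pair (A, B), following the formula above
q_ab <- as.numeric(
  abs(ybar_i["A"] - ybar_i["B"]) /
    sqrt((ms_e / 2) * (1 / n_i["A"] + 1 / n_i["B"]))
)

# Adjusted p-value: upper tail of the studentized range distribution
p_ab <- ptukey(q_ab, nmeans = k, df = df_e, lower.tail = FALSE)
p_ab

# Base R's built-in Tukey procedure, same pair
TukeyHSD(fit)$group["B-A", "p adj"]
```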
We will use the posthoc_tukey() function from library(ssstats) to perform Tukey’s posthoc test (the resulting p-values are adjusted for multiple comparisons).
Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).
To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know what differences exist in their magical coordination scores (coordination_score). We will again test at the \alpha=0.05 level.
Let’s now formulate Tukey’s posthoc test. How should we update this code?
Let’s now formulate Tukey’s posthoc test. Our updated code:
Fisher’s least significant difference (LSD) test allows us to test all pairwise comparisons but does not control the Type I error rate, \alpha, across those comparisons.
Hypotheses:
H_0: \ \mu_i = \mu_j
H_1: \ \mu_i \ne \mu_j
Test Statistic:
t = \frac{|\bar{y}_i - \bar{y}_j|}{\sqrt{ \text{MS}_{\text{E}} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }}
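The same kind of check works for Fisher's statistic. The sketch below is not the posthoc_fisher() implementation: the data are simulated, and the names dat, group, and y are hypothetical. The t statistic uses the pooled MS_E with n - k degrees of freedom, and the p-value is left unadjusted. Base R's pairwise.t.test() with a pooled standard deviation and no adjustment is shown for the same pair.

```r
# Hypothetical data: three groups with unequal sample sizes
set.seed(4173)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(8, 10, 12))),
  y     = c(rnorm(8, 10, 2), rnorm(10, 11, 2), rnorm(12, 13, 2))
)

fit  <- aov(y ~ group, data = dat)
df_e <- df.residual(fit)               # n - k
ms_e <- sum(residuals(fit)^2) / df_e   # MS_E from the ANOVA table

n_i    <- tapply(dat$y, dat$group, length)
ybar_i <- tapply(dat$y, dat$group, mean)

# t for the pair (A, B), following the formula above
t_ab <- as.numeric(
  abs(ybar_i["A"] - ybar_i["B"]) /
    sqrt(ms_e * (1 / n_i["A"] + 1 / n_i["B"]))
)

# Unadjusted two-sided p-value from the t distribution with n - k df
p_ab <- 2 * pt(t_ab, df = df_e, lower.tail = FALSE)
p_ab

# Base R equivalent: pooled SD, no multiplicity adjustment
pairwise.t.test(dat$y, dat$group,
                p.adjust.method = "none", pool.sd = TRUE)$p.value["B", "A"]
```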
We will use the posthoc_fisher() function from library(ssstats) to perform Fisher’s posthoc test (the resulting p-values are not adjusted for multiple comparisons).
Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).
To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know what differences exist in their magical coordination scores (coordination_score). We will again test at the \alpha=0.05 level.
Let’s now formulate Fisher’s posthoc test. How should we update this code?
Let’s now formulate Fisher’s posthoc test. Our updated code:
| Pairwise Comparison | Mean Difference (\bar{x}_d) | Unadjusted p (Fisher) | Adjusted p (Tukey) |
|---|---|---|---|
| Alicorn vs. Earth | 12.63 | < 0.001 | < 0.001 |
| Alicorn vs. Pegasus | 10.18 | < 0.001 | < 0.001 |
| Unicorn vs. Earth | 9.83 | < 0.001 | < 0.001 |
| Unicorn vs. Pegasus | 7.38 | < 0.001 | 0.001 |
| Alicorn vs. Unicorn | 2.80 | 0.151 | 0.487 |
| Pegasus vs. Earth | 2.44 | 0.227 | 0.602 |
STA4173 - Biostatistics - Fall 2025