July 10, 2025
Thursday
Previous: two groups, continuous outcome
Now: more than two groups, continuous outcome
One-way ANOVA
Kruskal-Wallis
We have previously discussed testing the difference between two groups.
To compare three or more groups, we will use a method called analysis of variance (ANOVA).
Fun fact: the two-sample t-test is a special case of ANOVA.
Two-sample t-test:
Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ = 0
Alternative: H₁: μ₁ − μ₂ ≠ 0
Test statistic: t(98) = 0.52
p-value: p = 0.604
Conclusion: Fail to reject the null hypothesis (p = 0.6041 ≥ α = 0.05)
One-Way ANOVA:
H₀: μ_A = μ_B
H₁: At least one group mean is different
Test Statistic: F(1, 98) = 0.271
p-value: p = 0.604
Conclusion: Fail to reject the null hypothesis (p = 0.6041 ≥ α = 0.05)
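The equivalence shown in the two outputs above (0.52² ≈ 0.27, identical p-values) can be checked on any two-group data set. The following is a minimal base R sketch, assuming simulated toy data (the data frame `toy` and its columns are hypothetical, not the course data) and using `t.test()` and `anova()` rather than the ssstats functions.

```r
# Hypothetical toy data: two groups of 50, so df = 98 as in the output above.
set.seed(4173)
toy <- data.frame(
  group = factor(rep(c("A", "B"), each = 50)),
  y     = c(rnorm(50, mean = 10, sd = 2), rnorm(50, mean = 10.3, sd = 2))
)

# Two-sample t-test assuming equal variances.
t_res <- t.test(y ~ group, data = toy, var.equal = TRUE)

# One-way ANOVA on the same data.
f_res <- anova(lm(y ~ group, data = toy))

t_stat <- unname(t_res$statistic)
f_stat <- f_res[["F value"]][1]

# With two groups, F equals t squared and the p-values agree.
c(t_squared = t_stat^2, F = f_stat)
c(p_t = t_res$p.value, p_F = f_res[["Pr(>F)"]][1])
```

The agreement holds only for the equal-variance t-test; Welch's version (`var.equal = FALSE`) would generally not match.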
The computations for ANOVA are more involved than what we’ve seen before.
An ANOVA table will be constructed in order to perform the hypothesis test.
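Source | Sum of Squares | df | Mean Squares | F |
---|---|---|---|---|
Treatment | \text{SS}_{\text{Trt}} | \text{df}_{\text{Trt}} | \text{MS}_{\text{Trt}} | F_0 |
Error | \text{SS}_{\text{E}} | \text{df}_{\text{E}} | \text{MS}_{\text{E}} | |
Total | \text{SS}_{\text{Tot}} | \text{df}_{\text{Tot}} | | |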
Once this is put together, we can perform the hypothesis test.
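To fill in the table, we need the overall mean, the group sizes, the group means, and the group variances:
\bar{x}, \ \ n_i, \ \ \bar{x}_i, \ \ s_i^2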
\begin{align*} \text{SS}_{\text{Trt}} &= \sum_{i=1}^k n_i(\bar{x}_i-\bar{x})^2 \\ \text{SS}_{\text{E}} &= \sum_{i=1}^k (n_i-1)s_i^2 \\ \text{SS}_{\text{Tot}} &= \text{SS}_{\text{Trt}} + \text{SS}_{\text{E}} \end{align*}
\begin{align*} \text{df}_{\text{Trt}} &= k-1\\ \text{df}_{\text{E}} &= n-k\\ \text{df}_{\text{Tot}} &= n-1 \end{align*}
Once we have the sum of squares and corresponding degrees of freedom, we can compute the mean squares.
In the case of one-way ANOVA, \begin{align*} \text{MS}_{\text{Trt}} &= \frac{\text{SS}_{\text{Trt}}}{\text{df}_{\text{Trt}}} \\ \text{MS}_{\text{E}} &= \frac{\text{SS}_{\text{E}}}{\text{df}_{\text{E}}} \end{align*}
More generally, for any source of variation X,
\text{MS}_X = \frac{\text{SS}_X}{\text{df}_X}
Finally, we have the test statistic.
Generally, we construct an F for ANOVA by dividing the MS of interest by \text{MS}_{\text{E}}:
F_X = \frac{\text{MS}_X}{\text{MS}_{\text{E}}}
F_0 = \frac{\text{MS}_{\text{Trt}}}{\text{MS}_{\text{E}}}
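The formulas above can be traced step by step in base R. This is a sketch on a small made-up data set (the data frame `toy` and every value in it are hypothetical); it builds the table by hand from the summary statistics and then cross-checks against `anova(lm())`. It is not the ssstats implementation.

```r
# Hypothetical toy data: k = 3 groups, n = 12 observations in total.
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(4, 5, 3))),
  y     = c(12, 15, 11, 14,  18, 17, 20, 16, 19,  13, 12, 16)
)

n_i    <- tapply(toy$y, toy$group, length)  # group sizes
xbar_i <- tapply(toy$y, toy$group, mean)    # group means
s2_i   <- tapply(toy$y, toy$group, var)     # group variances
xbar   <- mean(toy$y)                       # overall mean
k      <- length(n_i)
n      <- sum(n_i)

# Sums of squares.
ss_trt <- sum(n_i * (xbar_i - xbar)^2)
ss_e   <- sum((n_i - 1) * s2_i)
ss_tot <- ss_trt + ss_e

# Degrees of freedom, mean squares, and the test statistic.
df_trt <- k - 1
df_e   <- n - k
ms_trt <- ss_trt / df_trt
ms_e   <- ss_e / df_e
f0     <- ms_trt / ms_e

by_hand <- data.frame(
  Source = c("Treatment", "Error", "Total"),
  SS     = c(ss_trt, ss_e, ss_tot),
  df     = c(df_trt, df_e, n - 1),
  MS     = c(ms_trt, ms_e, NA),
  F      = c(f0, NA, NA)
)
by_hand

# Cross-check against base R.
anova(lm(y ~ group, data = toy))
```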
We will use the one_way_ANOVA_table() function from library(ssstats) to construct the ANOVA table.
In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.
To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (ability_score) among the four pony types. We will test at the \alpha=0.05 level.
Let’s first create the ANOVA table. How should we update this code?
Let’s first create the ANOVA table. Our updated code:
In one-way ANOVA, hypotheses always take the same form:
H_0: \mu_1 = \mu_2 = \cdots = \mu_k
H_1: at least one \mu_i is different
Note: you must fill in the “k” when writing hypotheses!
Test statistic:
F_0 = \frac{\text{MS}_{\text{Trt}}}{\text{MS}_{\text{E}}}
p-Value:
p = P[F_{k-1,n-k} \ge F_0]
We will use the one_way_ANOVA() function from library(ssstats) to construct the corresponding hypothesis test.
Let’s now formulate the hypothesis test. How should we update this code?
Let’s now formulate the hypothesis test. Our updated code,
One-Way ANOVA:
H₀: μ_Alicorn = μ_Earth = μ_Pegasus = μ_Unicorn
H₁: At least one group mean is different
Test Statistic: F(3, 136) = 1.526
p-value: p = 0.210
Conclusion: Fail to reject the null hypothesis (p = 0.2105 ≥ α = 0.05)
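The reported p-value can be recovered from the test statistic and its degrees of freedom, since p = P[F_{k-1,n-k} \ge F_0]. A short base R sketch, plugging in the rounded values from the output above (so the result should agree with the reported p only up to rounding):

```r
# Values reported in the output above: F(3, 136) = 1.526.
f0     <- 1.526
df_trt <- 3     # k - 1, with k = 4 pony types
df_e   <- 136   # n - k

# Upper-tail probability of the F distribution: p = P[F >= F0].
p_value <- pf(f0, df1 = df_trt, df2 = df_e, lower.tail = FALSE)
p_value

# TRUE means we fail to reject H0 at alpha = 0.05.
p_value >= 0.05
```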
Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).
To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types. We will again test at the \alpha=0.05 level.
Let’s first create the ANOVA table. How should we update this code?
Let’s first create the ANOVA table. Our updated code:
Let’s now formulate the hypothesis test. How should we update this code?
Let’s now formulate the hypothesis test. Our updated code:
magical_studies %>% one_way_ANOVA(continuous = coordination_score,
grouping = pony_type,
alpha = 0.05)
One-Way ANOVA:
H₀: μ_Alicorn = μ_Earth = μ_Pegasus = μ_Unicorn
H₁: At least one group mean is different
Test Statistic: F(3, 136) = 18.396
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p = < 0.001 < α = 0.05)
There’s a difference, but what is the difference?
Let’s look at summary statistics:
# A tibble: 4 × 4
pony_type variable mean_sd median_iqr
<chr> <chr> <chr> <chr>
1 Alicorn coordination_score 87.7 (8.5) 88.0 (13.6)
2 Earth coordination_score 75.1 (8.4) 75.3 (9.6)
3 Pegasus coordination_score 77.6 (8.4) 78.5 (10.0)
4 Unicorn coordination_score 84.9 (7.7) 84.3 (11.2)
Recall our hypotheses in one-way ANOVA:
H_0: \mu_1 = \mu_2 = \cdots = \mu_k
H_1: at least one \mu_i is different
The F test does not tell us which mean is different… only that a difference exists.
In theory, we could perform repeated t tests to determine pairwise differences.
Recall that the Type I error rate, \alpha, is the probability of incorrectly rejecting H_0.
Suppose we are comparing 5 groups.
This is 10 pairwise comparisons!!
If we perform repeated t tests under \alpha=0.05, we are inflating the Type I error to 0.40! 😵
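The two numbers above (10 comparisons, error rate of about 0.40) come from a short calculation. The sketch below assumes the m tests are independent, which is only an approximation for pairwise comparisons that share groups.

```r
k     <- 5      # number of groups
alpha <- 0.05   # Type I error rate for each individual t-test

# Number of pairwise comparisons among k groups.
m <- choose(k, 2)

# Probability of at least one false rejection across m independent tests.
fwer <- 1 - (1 - alpha)^m

m
round(fwer, 2)
```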
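When performing posthoc comparisons, we can choose one of two paths:
Control the Type I error rate (adjust for multiple comparisons).
Do not control the Type I error rate (no adjustment).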
Note that controlling the Type I error rate is more conservative than when we do not control it.
Generally, statisticians:
do not control the Type I error rate if examining the results of pilot/preliminary studies that are exploring for general relationships.
do control the Type I error rate if examining the results of confirmatory studies and are attempting to confirm relationships observed in pilot/preliminary studies.
The posthoc tests we will learn:
Tukey’s test
Fisher’s least significant difference
Caution: we should only perform posthoc tests if we have determined that a general difference exists!
Tukey’s test allows us to do all pairwise comparisons while controlling \alpha.
Hypotheses
H_0: \mu_i = \mu_j
H_1: \mu_i \ne \mu_j
Test Statistic
Q = \frac{|\bar{y}_i - \bar{y}_j|}{\sqrt{ \frac{\text{MS}_{\text{E}}}{2} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }}
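To see how Q turns into an adjusted p-value, the sketch below computes it by hand for one pair and compares the result with base R's `TukeyHSD()`. The data frame `toy` is hypothetical, and this is not the posthoc_tukey() implementation; it only illustrates the formula above using the studentized range distribution (`ptukey()`).

```r
# Hypothetical toy data with k = 3 groups.
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(4, 5, 3))),
  y     = c(12, 15, 11, 14,  18, 17, 20, 16, 19,  13, 12, 16)
)

fit  <- aov(y ~ group, data = toy)
df_e <- df.residual(fit)
ms_e <- deviance(fit) / df_e   # MS_E = SS_E / df_E
k    <- nlevels(toy$group)

n_i    <- tapply(toy$y, toy$group, length)
ybar_i <- tapply(toy$y, toy$group, mean)

# Q for the pair (A, B), following the formula above.
q_ab <- abs(ybar_i[["A"]] - ybar_i[["B"]]) /
  sqrt(ms_e / 2 * (1 / n_i[["A"]] + 1 / n_i[["B"]]))

# Adjusted p-value from the studentized range distribution.
p_ab <- ptukey(q_ab, nmeans = k, df = df_e, lower.tail = FALSE)
p_ab

# Base R's Tukey test for all pairs; the "B-A" row should match p_ab.
tukey <- TukeyHSD(fit)
tukey$group
```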
We will use the posthoc_tukey() function from library(ssstats) to perform Tukey’s posthoc test (the resulting p-values are adjusted for multiple comparisons).
Returning to the magical coordination data (magical_studies): Twilight Sparkle wants to know what differences exist in magical coordination scores (coordination_score) among the pony types (pony_type). We will again test at the \alpha=0.05 level.
Let’s now formulate Tukey’s posthoc test. How should we update this code?
Let’s now formulate Tukey’s posthoc test. Our updated code:
Fisher’s least significant difference (LSD) test allows us to test all pairwise comparisons but does not control \alpha.
Hypotheses:
H_0: \mu_i = \mu_j
H_1: \mu_i \ne \mu_j
Test Statistic:
t = \frac{|\bar{y}_i - \bar{y}_j|}{\sqrt{ \text{MS}_{\text{E}} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }}
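The statistic above is an ordinary t statistic that borrows \text{MS}_{\text{E}} (and its degrees of freedom, n - k) from the ANOVA table. The sketch below checks this by hand against base R's `pairwise.t.test()` with a pooled standard deviation and no adjustment. The toy data are hypothetical, the two-sided p-value is an assumption, and this is not the posthoc_fisher() implementation.

```r
# Hypothetical toy data with k = 3 groups.
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(4, 5, 3))),
  y     = c(12, 15, 11, 14,  18, 17, 20, 16, 19,  13, 12, 16)
)

fit  <- aov(y ~ group, data = toy)
df_e <- df.residual(fit)        # n - k
ms_e <- deviance(fit) / df_e    # MS_E = SS_E / df_E

n_i    <- tapply(toy$y, toy$group, length)
ybar_i <- tapply(toy$y, toy$group, mean)

# t for the pair (A, B), following the formula above.
t_ab <- abs(ybar_i[["A"]] - ybar_i[["B"]]) /
  sqrt(ms_e * (1 / n_i[["A"]] + 1 / n_i[["B"]]))

# Two-sided, unadjusted p-value on df_E degrees of freedom.
p_ab <- 2 * pt(t_ab, df = df_e, lower.tail = FALSE)
p_ab

# Base R equivalent: pooled SD, no multiple-comparison adjustment.
lsd <- pairwise.t.test(toy$y, toy$group,
                       p.adjust.method = "none", pool.sd = TRUE)
lsd$p.value
```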
We will use the posthoc_fisher() function from library(ssstats) to perform Fisher’s posthoc test (the resulting p-values are not adjusted for multiple comparisons).
Let’s now formulate Fisher’s posthoc test. How should we update this code?
Let’s now formulate Fisher’s posthoc test. Our updated code:
Pairwise Comparison | \bar{x}_d | Unadjusted p | Adjusted p |
---|---|---|---|
Alicorn vs. Earth | 12.63 | < 0.001 | < 0.001 |
Alicorn vs. Pegasus | 10.18 | < 0.001 | < 0.001 |
Unicorn vs. Earth | 9.83 | < 0.001 | < 0.001 |
Unicorn vs. Pegasus | 7.38 | < 0.001 | 0.001 |
Alicorn vs. Unicorn | 2.80 | 0.151 | 0.487 |
Pegasus vs. Earth | 2.44 | 0.227 | 0.602 |
We previously discussed testing three or more means using ANOVA.
We also discussed that ANOVA is an extension of the two-sample t-test.
Recall that the t-test has two assumptions:
Equal variance between groups.
Normal distribution.
We will now extend our knowledge of checking assumptions.
y_{ij} = \mu + \tau_i + \varepsilon_{ij}
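where:
y_{ij} is the outcome for observation j in group i,
\mu is the overall mean,
\tau_i is the effect of group (treatment) i, and
\varepsilon_{ij} is the error term.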
We assume that the error term follows a normal distribution with mean 0 and a constant variance, \sigma^2. i.e., \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)
Very important note: the assumption is on the error term and NOT on the outcome!
We will use the residual (the difference between the observed value and the predicted value) to assess assumptions: e_{ij} = y_{ij} - \hat{y}_{ij}
Normality: quantile-quantile plot
Variance: scatterplot of the residuals against the predicted values
Like with t-tests, we will assess these assumptions graphically.
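For intuition about what these graphs contain, here is a base R sketch on simulated data (the data frame `toy` is hypothetical). It extracts the residuals and predicted values from a one-way ANOVA fit and draws the two plots described above. Whether ANOVA_assumptions() produces exactly these plots is an assumption.

```r
# Hypothetical toy data: three groups of 20 with a common variance.
set.seed(4173)
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  y     = c(rnorm(20, 75, 8), rnorm(20, 80, 8), rnorm(20, 85, 8))
)

fit  <- aov(y ~ group, data = toy)
e    <- residuals(fit)   # e_ij = y_ij - yhat_ij
yhat <- fitted(fit)      # in one-way ANOVA, yhat_ij is the group mean

# Normality: points should fall close to the reference line.
qqnorm(e)
qqline(e)

# Variance: the vertical spread should look similar across groups.
plot(yhat, e, xlab = "Predicted value", ylab = "Residual")
abline(h = 0, lty = 2)
```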
We will use the ANOVA_assumptions() function from library(ssstats) to request the graphs necessary to assess our assumptions.
Returning to the magical coordination data (magical_studies): Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types (pony_type).
Let’s now check the ANOVA assumptions. How should we change the following code?
Let’s now check the ANOVA assumptions. Our updated code:
We can formally check the variance assumption with the Brown-Forsythe-Levene test (yes, from Module 1!).
Hypotheses
H_0: \sigma^2_1 = \sigma^2_2 = \cdots = \sigma^2_k
H_1: at least one \sigma^2_i is different
Recall the variances_HT() function from library(ssstats).
Let’s now check the ANOVA assumptions. How should we change the following code?
Let’s now check the ANOVA assumptions. Our updated code:
Brown-Forsythe-Levene test for equality of variances:
Null: σ²_Unicorn = σ²_Pegasus = σ²_Earth = σ²_Alicorn
Alternative: At least one variance is different
Test statistic: F(3,136) = 0.185
p-value: p = 0.906
Conclusion: Fail to reject the null hypothesis (p = 0.9063 ≥ α = 0.05)
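One common way to compute a Brown-Forsythe test is to run a one-way ANOVA on the absolute deviations from each group's median. The base R sketch below does this on simulated data (the data frame `toy` is hypothetical); whether variances_HT() uses exactly this computation is an assumption.

```r
# Hypothetical toy data: three groups of 20 with a common variance.
set.seed(4173)
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  y     = c(rnorm(20, 75, 8), rnorm(20, 80, 8), rnorm(20, 85, 8))
)

# Absolute deviation of each observation from its own group median.
toy$z <- abs(toy$y - ave(toy$y, toy$group, FUN = median))

# One-way ANOVA on the deviations: a large F suggests unequal spread.
bf <- anova(lm(z ~ group, data = toy))
bf

bf_f <- bf[["F value"]][1]
bf_p <- bf[["Pr(>F)"]][1]
c(F = bf_f, p = bf_p)
```

The degrees of freedom are k - 1 and n - k, matching the F(3,136) reported above for four groups.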
Hypotheses
Test Statistic and p-Value
Rejection Region
Conclusion/Interpretation
\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)
We also discussed how to assess the assumptions:
Graphically using the ANOVA_assumptions() function.
Confirming the variance assumption using the BFL (variances_HT()).
If we break either assumption, we will turn to the nonparametric alternative, the Kruskal-Wallis test.
The Kruskal-Wallis test determines if k independent samples come from populations with the same distribution.
Hypotheses
\chi^2_0 = \frac{12}{n(n+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(n+1) \sim \chi^2_{\text{df}}
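where
R_i is the sum of the ranks in group i,
n_i is the sample size of group i,
n is the total sample size, and
\text{df} = k-1.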
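The test statistic can be reproduced by ranking all n observations together and summing the ranks within each group. The base R sketch below does this on made-up data with no tied values (the data frame `toy` is hypothetical) and compares the result with `kruskal.test()`. With ties, `kruskal.test()` applies a correction that the formula above does not include.

```r
# Hypothetical toy data with k = 3 groups and no tied values.
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(4, 5, 3))),
  y     = c(12.1, 15.3, 11.4, 14.2,  18.5, 17.7, 20.1, 16.8, 19.0,
            13.6, 12.9, 16.0)
)

r   <- rank(toy$y)                    # ranks over the pooled sample
R_i <- tapply(r, toy$group, sum)      # rank sum per group
n_i <- tapply(r, toy$group, length)   # group sizes
n   <- sum(n_i)
k   <- length(n_i)

# Kruskal-Wallis statistic, following the formula above.
chi2_0 <- 12 / (n * (n + 1)) * sum(R_i^2 / n_i) - 3 * (n + 1)

# Upper-tail chi-square probability with k - 1 degrees of freedom.
p_value <- pchisq(chi2_0, df = k - 1, lower.tail = FALSE)
c(statistic = chi2_0, p = p_value)

# Cross-check against base R.
kw <- kruskal.test(y ~ group, data = toy)
kw
```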
We will use the kruskal_HT() function from library(ssstats) to perform the Kruskal-Wallis test.
Twilight Sparkle is now conducting an experiment to evaluate the magical pulse activity of a new alchemical potion. She hypothesizes that the potion may affect ponies differently depending on their type. To investigate, she carefully measures the number of magical pulses emitted per minute after administering the potion to different ponies.
For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful.
Let’s explore the data first. Because there are more than two groups, we know either ANOVA or Kruskal-Wallis is required.
Let’s now perform the appropriate hypothesis test. Test at the \alpha=0.05 level.
How should we change the following code?
Our updated code,
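For posthoc comparisons after the Kruskal-Wallis, Dunn’s test compares the mean ranks of groups i and j:
z_0 = \frac{|\bar{R}_i - \bar{R}_j|}{\sqrt{ \frac{n(n+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }}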
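A hand computation of this statistic for every pair might look like the base R sketch below. The data frame `toy` is hypothetical and has no ties, and the two-sided p-value from the standard normal distribution is an assumption; software implementations differ on one-sided versus two-sided reporting, so this is not necessarily what posthoc_dunn() returns.

```r
# Hypothetical toy data with k = 3 groups and no tied values.
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(4, 5, 3))),
  y     = c(12.1, 15.3, 11.4, 14.2,  18.5, 17.7, 20.1, 16.8, 19.0,
            13.6, 12.9, 16.0)
)

r      <- rank(toy$y)                    # ranks over the pooled sample
rbar_i <- tapply(r, toy$group, mean)     # mean rank per group
n_i    <- tapply(r, toy$group, length)   # group sizes
n      <- sum(n_i)

# All pairs of group labels, one pair per column.
pairs <- combn(levels(toy$group), 2)

# z_0 for each pair, following the formula above.
z <- apply(pairs, 2, function(p) {
  abs(rbar_i[[p[1]]] - rbar_i[[p[2]]]) /
    sqrt(n * (n + 1) / 12 * (1 / n_i[[p[1]]] + 1 / n_i[[p[2]]]))
})

# Assumption: two-sided p-values from the standard normal distribution.
p_unadj <- 2 * pnorm(z, lower.tail = FALSE)

data.frame(
  comparison   = paste(pairs[1, ], "vs.", pairs[2, ]),
  z            = z,
  p_unadjusted = p_unadj
)
```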
!! WAIT !! What about adjusting \alpha?
The function we will be using allows us to turn on/off the adjustment for multiple comparison.
To adjust the p-value directly, where m is the number of pairwise comparisons,
p_{\text{B}} = \min(p \times m,\ 1)
When we want to adjust \alpha instead (Bonferroni):
\alpha_{\text{B}} = \frac{\alpha}{m}
We will use the posthoc_dunn() function from library(ssstats) to perform Dunn’s posthoc test.
We will test at the \alpha=0.05 level.
Let’s now examine posthoc testing. Suppose we are not interested in adjusting for multiple comparisons (this is an exploratory study). How should we change this code?
Let’s now examine posthoc testing. Suppose we are not interested in adjusting for multiple comparisons (this is an exploratory study). Our updated code,
Let’s now examine posthoc testing. Suppose we do want to adjust for multiple comparisons (this is a confirmatory study). How should we change this code?
Pairwise Comparison | Unadjusted p | Adjusted p |
---|---|---|
Earth vs. Pegasus | 0.027 | 0.080 |
Earth vs. Unicorn | 0.338 | 1.000 |
Pegasus vs. Unicorn | 0.001 | 0.004 |
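As a rough check on the table above, the Bonferroni formulas can be applied to the rounded unadjusted p-values in base R. This is a sketch, not the posthoc_dunn() implementation. Small differences from the Adjusted p column (0.081 vs. 0.080, 0.003 vs. 0.004) are expected, presumably because the function adjusts the unrounded p-values.

```r
# Unadjusted p-values copied from the table above (rounded to 3 decimals).
p_unadj <- c("Earth vs. Pegasus"   = 0.027,
             "Earth vs. Unicorn"   = 0.338,
             "Pegasus vs. Unicorn" = 0.001)

m     <- length(p_unadj)   # number of pairwise comparisons
alpha <- 0.05

# Path 1: adjust the p-values and compare them with alpha.
p_bonf <- pmin(p_unadj * m, 1)
p_bonf

# The same adjustment using base R's built-in helper.
p.adjust(p_unadj, method = "bonferroni")

# Path 2: adjust alpha and compare the unadjusted p-values with it.
alpha_b <- alpha / m
alpha_b

# Both paths lead to the same reject / fail-to-reject decisions.
rbind(adjust_p = p_bonf < alpha, adjust_alpha = p_unadj < alpha_b)
```

Only Pegasus vs. Unicorn remains significant after the adjustment, while Earth vs. Pegasus is significant only when no adjustment is made.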
STA4173 - Biostatistics - Summer 2025