One-Way ANOVA
Kruskal-Wallis

July 10, 2025
Thursday

Introduction: Topics

Previous: two groups, continuous outcome
Now: more than two groups, continuous outcome
One-way ANOVA
- posthoc testing
- assumptions
Kruskal-Wallis
- posthoc testing

Introduction: Analysis of Variance

We have previously discussed testing the difference between two groups.
- What about when there are three or more groups?
We will use a method called analysis of variance (ANOVA).
- This method partitions the variance of the outcome into variance due to the groups and variance due to “other” factors.
Fun fact: the two-sample t-test is a special case of ANOVA.
- If you perform ANOVA when comparing two means, you will obtain the same results as the two-sample t-test.

Introduction: Analysis of Variance

Two-sample t-test:

Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ = 0
Alternative: H₁: μ₁ − μ₂ ≠ 0
Test statistic: t(98) = 0.52
p-value: p = 0.604
Conclusion: Fail to reject the null hypothesis (p = 0.6041 ≥ α = 0.05)

In ANOVA,

One-Way ANOVA: 
H₀: μ_A = μ_B
H₁: At least one group mean is different
Test Statistic: F(1, 98) = 0.271
p-value: p = 0.604
Conclusion: Fail to reject the null hypothesis (p = 0.6041 ≥ α = 0.05)
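
The equivalence can be checked in base R. This is only a sketch: it uses the built-in ToothGrowth data as a stand-in two-group example (an assumption; it is not the data behind the output above) and base R functions rather than ssstats. The squared t statistic should equal the F statistic, just as 0.52^2 ≈ 0.27 above, and the p-values should match.

```r
# ToothGrowth ships with base R; `supp` has exactly two levels (OJ, VC).
# Equal-variance two-sample t-test
tt <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)

# One-way ANOVA on the same two groups
av <- anova(lm(len ~ supp, data = ToothGrowth))

t_stat <- unname(tt$statistic)
f_stat <- av[["F value"]][1]

# t^2 and F side by side, then the two p-values
c(t_squared = t_stat^2, F = f_stat)
c(p_t_test = tt$p.value, p_anova = av[["Pr(>F)"]][1])
```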

One-Way ANOVA

In one-way ANOVA, we are partitioning the variability in our outcome (SS_Total) into two pieces:
- Variability due to the group (SS_Treatment),
- Variability due to “other factors” (SS_Error).
  - Think of this like a “catch all” for other sources of error – things we did not adjust for in our model.

One-Way ANOVA: ANOVA Table

The computations for ANOVA are more involved than what we’ve seen before.
An ANOVA table will be constructed in order to perform the hypothesis test.

Source	Sum of Squares	df	Mean Squares	F
Treatment	SS_Trt	df_Trt	MS_Trt	F₀
Error	SS_E	df_E	MS_E
Total	SS_Tot	df_Tot

Once this is put together, we can perform the hypothesis test.
- Our test statistic is F_0.

One-Way ANOVA: (Hand) Computations

Before we begin our computations, it is helpful to know

\bar{x}, \ \ n_i, \ \ \bar{x}_i, \ \ s_i^2

where,
- \bar{x} is the overall mean,
- n_i is the sample size for group i,
- \bar{x}_i is the mean for group i, and
- s_i^2 is the variance for group i

One-Way ANOVA: (Hand) Computations

We begin our computations with the sums of squares:

\begin{align*} \text{SS}_{\text{Trt}} &= \sum_{i=1}^k n_i(\bar{x}_i-\bar{x})^2 \\ \text{SS}_{\text{E}} &= \sum_{i=1}^k (n_i-1)s_i^2 \\ \text{SS}_{\text{Tot}} &= \text{SS}_{\text{Trt}} + \text{SS}_{\text{E}} \end{align*}

One-Way ANOVA: (Hand) Computations

Each sum of squares has degrees of freedom, where k is the number of groups and n is the total sample size:

\begin{align*} \text{df}_{\text{Trt}} &= k-1\\ \text{df}_{\text{E}} &= n-k\\ \text{df}_{\text{Tot}} &= n-1 \end{align*}

One-Way ANOVA: (Hand) Computations

Once we have the sum of squares and corresponding degrees of freedom, we can compute the mean squares.
In the case of one-way ANOVA, \begin{align*} \text{MS}_{\text{Trt}} &= \frac{\text{SS}_{\text{Trt}}}{\text{df}_{\text{Trt}}} \\ \text{MS}_{\text{E}} &= \frac{\text{SS}_{\text{E}}}{\text{df}_{\text{E}}} \end{align*}
- Note that there is no \text{MS}_{\text{Tot}}!

One-Way ANOVA: (Hand) Computations

A note about mean squares: they are almost always constructed the same way.

\text{MS}_X = \frac{\text{SS}_X}{\text{df}_X}

This is important to know for future statistics courses that may have you calculating things by hand.

One-Way ANOVA: (Hand) Computations

Finally, we have the test statistic.
Generally, we construct an F for ANOVA by dividing the MS of interest by \text{MS}_{\text{E}},

F_X = \frac{\text{MS}_X}{\text{MS}_{\text{E}}}

In one-way ANOVA, we are only constructing the F for treatment,

F_0 = \frac{\text{MS}_{\text{Trt}}}{\text{MS}_{\text{E}}}

One-Way ANOVA: (Hand) Computations

We are finally done constructing our ANOVA table! As a reminder,

Source	Sum of Squares	df	Mean Squares	F
Treatment	SS_Trt	df_Trt	MS_Trt	F₀
Error	SS_E	df_E	MS_E
Total	SS_Tot	df_Tot
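
The hand computations can be sketched in base R. This is an illustration under assumptions: it uses the built-in PlantGrowth data (three groups) as a stand-in for a real dataset, and the variable names are made up for this example. The last line compares the hand-built F_0 with R's own ANOVA table.

```r
# PlantGrowth ships with base R: 30 plant weights in 3 groups (ctrl, trt1, trt2)
y <- PlantGrowth$weight
g <- PlantGrowth$group

# Ingredients: overall mean, group sizes, group means, group variances
xbar   <- mean(y)
n_i    <- tapply(y, g, length)
xbar_i <- tapply(y, g, mean)
s2_i   <- tapply(y, g, var)
k <- length(n_i)   # number of groups
n <- sum(n_i)      # total sample size

# Sums of squares
ss_trt <- sum(n_i * (xbar_i - xbar)^2)
ss_e   <- sum((n_i - 1) * s2_i)
ss_tot <- ss_trt + ss_e

# Degrees of freedom, mean squares, and the F statistic
df_trt <- k - 1
df_e   <- n - k
ms_trt <- ss_trt / df_trt
ms_e   <- ss_e / df_e
f0     <- ms_trt / ms_e

# Compare with R's built-in ANOVA table
builtin <- anova(lm(weight ~ group, data = PlantGrowth))
c(by_hand = f0, built_in = builtin[["F value"]][1])
```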

One-Way ANOVA: ANOVA Table (R)

We will use the one_way_ANOVA_table() function from library(ssstats) to construct the ANOVA table.

dataset_name %>% one_way_ANOVA_table(continuous = continuous_variable,
                                     grouping = grouping_variable)

One-Way ANOVA: ANOVA Table

In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.
To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (ability_score) among the four pony types. We will test at the \alpha=0.05 level.
Let’s first create the ANOVA table. How should we update this code?

dataset_name %>% one_way_ANOVA_table(continuous = continuous_variable,
                                     grouping = grouping_variable)

One-Way ANOVA: ANOVA Table

In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.
To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (ability_score) among the four pony types.
Let’s first create the ANOVA table. Our updated code:

magical_studies %>% one_way_ANOVA_table(continuous = ability_score,
                                        grouping = pony_type)

One-Way ANOVA: ANOVA Table

Running the code,

magical_studies %>% one_way_ANOVA_table(continuous = ability_score,
                                        grouping = pony_type)

One-Way ANOVA Table
Source	Sum of Squares	df	Mean Squares	F
Treatment	503.84	3	167.95	1.53
Error	14,963.20	136	110.02
Total	15,467.04	139
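
As a quick arithmetic check, the mean squares and F in this table can be rebuilt from the sums of squares and degrees of freedom. The numbers below are copied from the table above; the variable names are only for illustration.

```r
# Values copied from the ANOVA table above
ss_trt <- 503.84
df_trt <- 3
ss_e   <- 14963.20
df_e   <- 136

# Mean squares and F, following MS = SS / df and F = MS_Trt / MS_E
ms_trt <- ss_trt / df_trt
ms_e   <- ss_e / df_e
f0     <- ms_trt / ms_e

round(c(MS_Trt = ms_trt, MS_E = ms_e, F0 = f0), 2)

# SS_Tot is the sum of the two pieces
ss_trt + ss_e
```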

Hypothesis Testing: One-Way-ANOVA

In one-way ANOVA, hypotheses always take the same form:
- H_0: \ \mu_1 = \mu_2 = ... = \mu_k
- H_1: at least one is different
Note: you must fill in the “k” when writing hypotheses!
- e.g., if there are four means, your hypotheses are
  - H_0: \ \mu_1 = \mu_2 = \mu_3 = \mu_4
  - H_1: at least one is different
- e.g., in our My Little Pony (MLP) example,
  - H_0: \ \mu_{\text{unicorn}} = \mu_{\text{earth}} = \mu_{\text{alicorn}} = \mu_{\text{pegasus}}
  - H_1: at least one is different

Hypothesis Testing: One-Way ANOVA

Test statistic:

F_0 = \frac{\text{MS}_{\text{Trt}}}{\text{MS}_{\text{E}}}

p-Value:

p = P[F_{k-1,n-k} \ge F_0]
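
The p-value is an upper-tail area of the F distribution, which base R computes with pf(). A minimal sketch, using the mean squares and degrees of freedom from the ability score ANOVA table shown earlier (k = 4 groups, n = 140 ponies); the variable names are only for illustration.

```r
# Mean squares from the ability score ANOVA table
ms_trt <- 167.95
ms_e   <- 110.02
f0     <- ms_trt / ms_e

k <- 4     # number of groups
n <- 140   # total sample size (df_Tot + 1)

# p = P[F_{k-1, n-k} >= F_0]: upper tail of the F distribution
p <- pf(f0, df1 = k - 1, df2 = n - k, lower.tail = FALSE)
round(p, 3)
```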

Hypothesis Testing: One-Way ANOVA (R)

We will use the one_way_ANOVA() function from library(ssstats) to construct the corresponding hypothesis test.

dataset_name %>% one_way_ANOVA(continuous = continuous_variable,
                               grouping = grouping_variable,
                               alpha = specified_alpha)

Hypothesis Testing: One-Way ANOVA

In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.
To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (ability_score) among the four pony types. We will test at the \alpha=0.05 level.
Let’s now formulate the hypothesis test. How should we update this code?

dataset_name %>% one_way_ANOVA(continuous = continuous_variable,
                               grouping = grouping_variable,
                               alpha = specified_alpha)

Hypothesis Testing: One-Way ANOVA

In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.
To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (ability_score) among the four pony types. We will test at the \alpha=0.05 level.
Let’s now formulate the hypothesis test. Our updated code,

magical_studies %>% one_way_ANOVA(continuous = ability_score,
                                  grouping = pony_type, 
                                  alpha = 0.05)

Hypothesis Testing: One-Way ANOVA

Running the code,

magical_studies %>% one_way_ANOVA(continuous = ability_score,
                                  grouping = pony_type, 
                                  alpha = 0.05)

One-Way ANOVA: 
H₀: μ_Alicorn = μ_Earth = μ_Pegasus = μ_Unicorn
H₁: At least one group mean is different
Test Statistic: F(3, 136) = 1.526
p-value: p = 0.210
Conclusion: Fail to reject the null hypothesis (p = 0.2105 ≥ α = 0.05)

Hypothesis Testing: One-Way ANOVA

Hypotheses
- H_0: \ \mu_{\text{alicorn}} = \mu_{\text{unicorn}} = \mu_{\text{earth}} = \mu_{\text{pegasus}}
- H_1: at least one \mu_i is different
Test Statistic and p-Value
- F_0 = 1.526
- p = 0.210
Rejection Region
- Reject H_0 if p < \alpha; \alpha=0.05.
Conclusion/Interpretation
- Fail to reject H_0 (p \text{ vs } \alpha \to p = 0.210 > 0.05). There is not sufficient evidence to suggest that there is a difference in magical ability between the types of ponies.

Hypothesis Testing: One-Way ANOVA

Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).
To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types. We will again test at the \alpha=0.05 level.
Let’s first create the ANOVA table. How should we update this code?

dataset_name %>% one_way_ANOVA_table(continuous = continuous_variable,
                                     grouping = grouping_variable)

Hypothesis Testing: One-Way ANOVA

Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).
To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types. We will again test at the \alpha=0.05 level.
Let’s first create the ANOVA table. Our updated code:

magical_studies %>% one_way_ANOVA_table(continuous = coordination_score,
                                        grouping = pony_type)

Hypothesis Testing: One-Way ANOVA

Running the code:

magical_studies %>% one_way_ANOVA_table(continuous = coordination_score,
                                        grouping = pony_type)

One-Way ANOVA Table
Source	Sum of Squares	df	Mean Squares	F
Treatment	3,745.44	3	1,248.48	18.40
Error	9,229.82	136	67.87
Total	12,975.26	139

Hypothesis Testing: One-Way ANOVA

Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).
To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types. We will again test at the \alpha=0.05 level.
Let’s now formulate the hypothesis test. How should we update this code?

dataset_name %>% one_way_ANOVA(continuous = continuous_variable,
                               grouping = grouping_variable,
                               alpha = specified_alpha)

Hypothesis Testing: One-Way ANOVA

Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).
To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types. We will again test at the \alpha=0.05 level.
Let’s now formulate the hypothesis test. Our updated code:

magical_studies %>% one_way_ANOVA(continuous = coordination_score,
                                  grouping = pony_type,
                                  alpha = 0.05)

Hypothesis Testing: One-Way ANOVA

Running the code:

magical_studies %>% one_way_ANOVA(continuous = coordination_score,
                                  grouping = pony_type,
                                  alpha = 0.05)

One-Way ANOVA: 
H₀: μ_Alicorn = μ_Earth = μ_Pegasus = μ_Unicorn
H₁: At least one group mean is different
Test Statistic: F(3, 136) = 18.396
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p = < 0.001 < α = 0.05)

Hypothesis Testing: One-Way ANOVA

Hypotheses
- H_0: \ \mu_{\text{alicorn}} = \mu_{\text{unicorn}} = \mu_{\text{earth}} = \mu_{\text{pegasus}}
- H_1: at least one \mu_i is different
Test Statistic and p-Value
- F_0 = 18.396
- p < 0.001
Rejection Region
- Reject H_0 if p < \alpha; \alpha=0.05.
Conclusion/Interpretation
- Reject H_0 (p \text{ vs } \alpha \to p < 0.001 < 0.05). There is sufficient evidence to suggest that there is a difference in magical coordination between the types of ponies.

Introduction: Posthoc Testing

There’s a difference, but what is the difference?
Let’s look at summary statistics:

magical_studies %>% 
  group_by(pony_type) %>%
  mean_median(coordination_score)

# A tibble: 4 × 4
  pony_type variable           mean_sd    median_iqr 
  <chr>     <chr>              <chr>      <chr>      
1 Alicorn   coordination_score 87.7 (8.5) 88.0 (13.6)
2 Earth     coordination_score 75.1 (8.4) 75.3 (9.6) 
3 Pegasus   coordination_score 77.6 (8.4) 78.5 (10.0)
4 Unicorn   coordination_score 84.9 (7.7) 84.3 (11.2)

Can we definitively say which groups are different…?

Introduction: Posthoc Testing

Recall our hypotheses in one-way ANOVA,
- H_0: \mu_1 = \mu_2 = ... = \mu_k
- H_1: at least one \mu_i is different
The F test does not tell us which mean is different… only that a difference exists.
In theory, we could perform repeated t tests to determine pairwise differences.
- Recall that ANOVA is an extension of the t test… or that the t test is a special case of ANOVA.
- However, this will increase the Type I error rate (\alpha).

Introduction: Posthoc Testing

Recall that the Type I error rate, \alpha, is the probability of incorrectly rejecting H_0.
- i.e., we are saying there is a difference between the means when there is actually not a difference.
Suppose we are comparing 5 groups.
- This is 10 pairwise comparisons!!
  - 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, 4-5
- If we perform repeated t tests, each at \alpha=0.05, and treat the 10 tests as independent, the overall Type I error rate inflates to 1-(1-0.05)^{10} \approx 0.40! 😵
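
The inflation can be reproduced with a few lines of base R. This sketch treats the pairwise tests as independent, which is a simplifying assumption (tests that share a group are not truly independent), so the result is an approximation.

```r
alpha <- 0.05
k <- 5                 # number of groups

# Number of pairwise comparisons among k groups
m <- choose(k, 2)

# Chance of at least one false rejection across m independent tests
fwer <- 1 - (1 - alpha)^m

c(comparisons = m, familywise_error = round(fwer, 2))
```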

Introduction: Posthoc Testing

When performing posthoc comparisons, we can choose one of two paths:
- Control the Type I (familywise) error rate.
- Do not control the Type I error rate.
Note that controlling the Type I error rate is more conservative than not controlling it.
- “Conservative” = more difficult to reject.
Generally, statisticians:
- do not control the Type I error rate if examining the results of pilot/preliminary studies that are exploring for general relationships.
- do control the Type I error rate if examining the results of confirmatory studies and are attempting to confirm relationships observed in pilot/preliminary studies.

Introduction: Posthoc Testing

The posthoc tests we will learn:
- Tukey’s test
  - Performs all pairwise tests and controls the Type I error rate
- Fisher’s least significant difference
  - Performs all pairwise tests but does not control the Type I error rate
Caution: we should only perform posthoc tests if we have determined that a general difference exists!
- i.e., when we reject when looking at the F test in ANOVA

Posthoc Testing: Tukey’s Test

Tukey’s test allows us to do all pairwise comparisons while controlling \alpha.
Hypotheses
- H_0: \ \mu_i = \mu_j
- H_1: \ \mu_i \ne \mu_j
Test Statistic

Q = \frac{|\bar{y}_i - \bar{y}_j|}{\sqrt{ \frac{\text{MS}_{\text{E}}}{2} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }}
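
To see the statistic in action, here is a base R sketch. Assumptions: it uses the built-in PlantGrowth data as a stand-in, and TukeyHSD() and ptukey() from base R rather than the ssstats function. It computes Q by hand for one pair and checks that the resulting p-value matches the adjusted p-value from TukeyHSD().

```r
# PlantGrowth ships with base R: weight measured in 3 groups
y <- PlantGrowth$weight
g <- PlantGrowth$group

fit <- aov(weight ~ group, data = PlantGrowth)
tk  <- TukeyHSD(fit)$group          # matrix with columns: diff, lwr, upr, p adj

# Pieces of the Q statistic
ybar <- sapply(split(y, g), mean)   # group means
n_i  <- sapply(split(y, g), length) # group sizes
df_e <- df.residual(fit)
ms_e <- deviance(fit) / df_e        # SS_E / df_E
k    <- length(n_i)

# Q for the trt1 vs. ctrl comparison
q <- abs(ybar[["trt1"]] - ybar[["ctrl"]]) /
  sqrt(ms_e / 2 * (1 / n_i[["trt1"]] + 1 / n_i[["ctrl"]]))

# p-value from the studentized range distribution
p_hand <- ptukey(q, nmeans = k, df = df_e, lower.tail = FALSE)

c(by_hand = p_hand, TukeyHSD = tk["trt1-ctrl", "p adj"])
```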

Posthoc Testing: Tukey’s Test (R)

We will use the posthoc_tukey() function from library(ssstats) to perform Tukey’s posthoc test (the resulting p-values are adjusted for multiple comparisons).

dataset_name %>% posthoc_tukey(continuous = continuous_variable,
                               grouping = grouping_variable)

Posthoc Testing: Tukey’s Test

Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).
To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know what differences exist in their magical coordination scores (coordination_score). We will again test at the \alpha=0.05 level.
Let’s now formulate Tukey’s posthoc test. How should we update this code?

dataset_name %>% posthoc_tukey(continuous = continuous_variable,
                               grouping = grouping_variable)

Posthoc Testing: Tukey’s Test

Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).
To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know what differences exist in their magical coordination scores (coordination_score). We will again test at the \alpha=0.05 level.
Let’s now formulate Tukey’s posthoc test. Our updated code:

magical_studies %>% posthoc_tukey(continuous = coordination_score,
                                  grouping = pony_type)

Posthoc Testing: Tukey’s Test

Running the code,

magical_studies %>% posthoc_tukey(continuous = coordination_score,
                                  grouping = pony_type)

Posthoc Testing: Tukey’s Test

Restating results,
- \mu_{\text{alicorn}} \ne \mu_{\text{earth}} (\bar{x}_d = 12.63, p < 0.001)
- \mu_{\text{alicorn}} \ne \mu_{\text{pegasus}} (\bar{x}_d = 10.18, p < 0.001)
- \mu_{\text{unicorn}} \ne \mu_{\text{earth}} (\bar{x}_d = 9.83, p < 0.001)
- \mu_{\text{unicorn}} \ne \mu_{\text{pegasus}} (\bar{x}_d = 7.38, p = 0.001)
- \mu_{\text{alicorn}} = \mu_{\text{unicorn}} (\bar{x}_d = 2.80, p = 0.487)
- \mu_{\text{pegasus}} = \mu_{\text{earth}} (\bar{x}_d = 2.44, p = 0.602)

Posthoc Testing: Fisher’s Test

Fisher’s test allows us to test all pairwise comparisons but does not control \alpha.
Hypotheses:
- H_0: \ \mu_i = \mu_j
- H_1: \ \mu_i \ne \mu_j
Test Statistic:

t = \frac{|\bar{y}_i - \bar{y}_j|}{\sqrt{ \text{MS}_{\text{E}} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }}
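
A base R sketch of the same idea, under assumptions: the built-in PlantGrowth data stands in for a real dataset, and pairwise.t.test() with a pooled SD and no p-value adjustment is used as an equivalent of Fisher's approach rather than the ssstats function. The hand-computed p-value for one pair is compared with the unadjusted p-value from pairwise.t.test().

```r
# PlantGrowth ships with base R: weight measured in 3 groups
y <- PlantGrowth$weight
g <- PlantGrowth$group

# Unadjusted pairwise t tests that share the pooled error variance (MS_E)
pw <- pairwise.t.test(y, g, p.adjust.method = "none", pool.sd = TRUE)

# Pieces of the t statistic
fit  <- aov(weight ~ group, data = PlantGrowth)
ybar <- sapply(split(y, g), mean)   # group means
n_i  <- sapply(split(y, g), length) # group sizes
df_e <- df.residual(fit)
ms_e <- deviance(fit) / df_e        # SS_E / df_E

# t for the trt1 vs. ctrl comparison
t_stat <- abs(ybar[["trt1"]] - ybar[["ctrl"]]) /
  sqrt(ms_e * (1 / n_i[["trt1"]] + 1 / n_i[["ctrl"]]))

# Two-sided p-value on df_E degrees of freedom
p_hand <- 2 * pt(t_stat, df = df_e, lower.tail = FALSE)

c(by_hand = p_hand, pairwise_t = pw$p.value["trt1", "ctrl"])
```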

Posthoc Testing: Fisher’s Test (R)

We will use the posthoc_fisher() function from library(ssstats) to perform Fisher’s posthoc test (the resulting p-values are not adjusted for multiple comparisons).

dataset_name %>% posthoc_fisher(continuous = continuous_variable,
                                grouping = grouping_variable)

Posthoc Testing: Fisher’s Test

Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).
To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know what differences exist in their magical coordination scores (coordination_score). We will again test at the \alpha=0.05 level.
Let’s now formulate Fisher’s posthoc test. How should we update this code?

dataset_name %>% posthoc_fisher(continuous = continuous_variable,
                                grouping = grouping_variable)

Posthoc Testing: Fisher’s Test

Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).
To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know what differences exist in their magical coordination scores (coordination_score). We will again test at the \alpha=0.05 level.
Let’s now formulate Fisher’s posthoc test. Our updated code:

magical_studies %>% posthoc_fisher(continuous = coordination_score,
                                   grouping = pony_type)

Posthoc Testing: Fisher’s Test

Running the code,

magical_studies %>% posthoc_fisher(continuous = coordination_score,
                                   grouping = pony_type)

Posthoc Testing: Fisher’s Test

Restating results,
- \mu_{\text{alicorn}} \ne \mu_{\text{earth}} (\bar{x}_d = 12.63, p < 0.001)
- \mu_{\text{alicorn}} \ne \mu_{\text{pegasus}} (\bar{x}_d = 10.18, p < 0.001)
- \mu_{\text{unicorn}} \ne \mu_{\text{earth}} (\bar{x}_d = 9.83, p < 0.001)
- \mu_{\text{unicorn}} \ne \mu_{\text{pegasus}} (\bar{x}_d = 7.38, p < 0.001)
- \mu_{\text{alicorn}} = \mu_{\text{unicorn}} (\bar{x}_d = 2.80, p = 0.151)
- \mu_{\text{pegasus}} = \mu_{\text{earth}} (\bar{x}_d = 2.44, p = 0.227)

Posthoc Testing: Tukey’s vs. Fisher’s

We have now learned:
- Tukey’s test: Performs all pairwise tests and controls the Type I error rate
- Fisher’s test: Performs all pairwise tests but does not control the Type I error rate
Sometimes Tukey’s and Fisher’s will agree with each other.
- This is the case in our example.
Other times they do not agree – by design, Tukey’s makes it harder to reject.
- We may see more rejections when using Fisher’s.

Posthoc Testing: Tukey’s vs. Fisher’s

Comparing the two side-by-side,

Pairwise Comparison	\bar{x}_d	Unadjusted p	Adjusted p
Alicorn vs. Earth	12.63	< 0.001	< 0.001
Alicorn vs. Pegasus	10.18	< 0.001	< 0.001
Unicorn vs. Earth	9.83	< 0.001	< 0.001
Unicorn vs. Pegasus	7.38	< 0.001	0.001
Alicorn vs. Unicorn	2.80	0.151	0.487
Pegasus vs. Earth	2.44	0.227	0.602

Introduction: ANOVA Assumptions

We previously discussed testing three or more means using ANOVA.
We also discussed that ANOVA is an extension of the two-sample t-test.
Recall that the t-test has two assumptions:
- Equal variance between groups.
- Normal distribution.
We will now extend our knowledge of checking assumptions.

ANOVA Assumptions: Definition

We can represent ANOVA with the following model:

y_{ij} = \mu + \tau_i + \varepsilon_{ij}

where:
- y_{ij} is the j^{\text{th}} observation in the i^{\text{th}} group,
- \mu is the overall (grand) mean,
- \tau_i is the treatment effect for group i, and
- \varepsilon_{ij} is the error term for the j^{\text{th}} observation in the i^{\text{th}} group.

ANOVA Assumptions: Definition

We assume that the error term follows a normal distribution with mean 0 and a constant variance, \sigma^2. i.e., \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)
Very important note: the assumption is on the error term and NOT on the outcome!
We will use the residual (the difference between the observed value and the predicted value) to assess assumptions: e_{ij} = y_{ij} - \hat{y}_{ij}

ANOVA Assumptions: Graphical Assessment

Normality: quantile-quantile plot
- Should have points close to the 45^\circ line
- We will focus on the “center” portion of the plot
Variance: scatterplot of the residuals against the predicted values
- Should be “equal spread” between the groups
- No “pattern”
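
Both graphs can be sketched in base R. Assumptions: the built-in PlantGrowth data is a stand-in, and the plots below are plain base R graphics, not necessarily what the ssstats function draws. Note that qqline() draws a reference line through the first and third quartiles, which plays the role of the 45^\circ line here.

```r
# PlantGrowth ships with base R: weight measured in 3 groups
fit <- aov(weight ~ group, data = PlantGrowth)

res  <- residuals(fit)   # e_ij = y_ij - predicted value
pred <- fitted(fit)      # predicted value

# In one-way ANOVA the predicted value is simply the group mean
group_means <- ave(PlantGrowth$weight, PlantGrowth$group)

# Normality: quantile-quantile plot of the residuals
qqnorm(res)
qqline(res)

# Variance: residuals against predicted values (one column of points per group)
plot(pred, res, xlab = "Predicted value", ylab = "Residual")
abline(h = 0, lty = 2)
```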

ANOVA Assumptions: Graphical Assessment (R)

Like with t-tests, we will assess these assumptions graphically.
We will use the ANOVA_assumptions() function from library(ssstats) to request the graphs necessary to assess our assumptions.

dataset_name %>% ANOVA_assumptions(continuous = continuous_variable,
                                   grouping = grouping_variable)

ANOVA Assumptions: Graphical Assessment

To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.
Let’s now check the ANOVA assumptions. How should we change the following code?

dataset_name %>% ANOVA_assumptions(continuous = continuous_variable,
                                   grouping = grouping_variable)

ANOVA Assumptions: Graphical Assessment

To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.
Let’s now check the ANOVA assumptions. Our updated code:

magical_studies %>% ANOVA_assumptions(continuous = coordination_score,
                                      grouping = pony_type)

ANOVA Assumptions: Graphical Assessment

Running the code,

magical_studies %>% ANOVA_assumptions(continuous = coordination_score,
                                      grouping = pony_type)

Error in qq_plot/rvf_plot: non-numeric argument to binary operator

ANOVA Assumptions: Test for Variance (R)

We can formally check the variance assumption with the Brown-Forsythe-Levene test (yes, from Module 1!).
Hypotheses
- H_0: \ \sigma^2_1 = \sigma^2_2 = ... = \sigma^2_k
- H_1: at least one \sigma^2_i is different.
Recall the variances_HT() function from library(ssstats).

dataset_name %>% variances_HT(continuous = continuous_variable,
                              grouping = grouping_variable)
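
For intuition, the Brown-Forsythe version of Levene's test can be sketched in base R as a one-way ANOVA on the absolute deviations from each group's median. Assumptions: the built-in PlantGrowth data is a stand-in, and this is the textbook construction; whether variances_HT() computes it exactly this way is not confirmed here.

```r
# PlantGrowth ships with base R: weight measured in 3 groups
y <- PlantGrowth$weight
g <- PlantGrowth$group

# Absolute deviation of each observation from its own group median
z <- abs(y - ave(y, g, FUN = median))

# One-way ANOVA on the deviations: a large F suggests unequal spread
bf <- anova(lm(z ~ g))

f0 <- bf[["F value"]][1]
p  <- bf[["Pr(>F)"]][1]
c(F = f0, df1 = bf$Df[1], df2 = bf$Df[2], p = p)
```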

ANOVA Assumptions: Test for Variance

To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.
Let’s now check the ANOVA assumptions. How should we change the following code?

dataset_name %>% variances_HT(continuous = continuous_variable,
                              grouping = grouping_variable)

ANOVA Assumptions: Test for Variance

To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.
Let’s now check the ANOVA assumptions. Our updated code:

magical_studies %>% variances_HT(continuous = coordination_score,
                                 grouping = pony_type)

ANOVA Assumptions: Test for Variance

Running the code,

magical_studies %>% variances_HT(continuous = coordination_score,
                                 grouping = pony_type)

Brown-Forsythe-Levene test for equality of variances:
Null: σ²_Unicorn = σ²_Pegasus = σ²_Earth = σ²_Alicorn 
Alternative: At least one variance is different 
Test statistic: F(3,136) = 0.185 
p-value: p = 0.906
Conclusion: Fail to reject the null hypothesis (p = 0.9063 ≥ α = 0.05)

ANOVA Assumptions: Test for Variance

Hypotheses
- H_0: \ \sigma^2_{\text{unicorn}} = \sigma^2_{\text{pegasus}} = \sigma^2_{\text{earth}} = \sigma^2_{\text{alicorn}}
- H_1: at least one \sigma^2_i is different
Test Statistic and p-Value
- F_0 = 0.185; p = 0.906
Rejection Region
- Reject if p < \alpha; \alpha=0.05.
Conclusion/Interpretation
- Fail to reject H_0. There is not sufficient evidence to suggest that the variances are different (i.e., the variance assumption is not broken).

Introduction: Kruskal-Wallis

We just discussed the ANOVA assumptions.

\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)

We also discussed how to assess the assumptions:
- Graphically using the ANOVA_assumptions() function.
- Confirming the variance assumption using the BFL (variances_HT()).
If we break either assumption, we will turn to the nonparametric alternative, the Kruskal-Wallis.

Hypothesis Testing: Kruskal-Wallis

If we break ANOVA assumptions, we should implement the nonparametric version, the Kruskal-Wallis.
- The Kruskal-Wallis is an extension of the Wilcoxon rank sum (as ANOVA is an extension of the two-sample t-test).
The Kruskal-Wallis test determines if k independent samples come from populations with the same distribution; the hypotheses are stated in terms of the group medians, M_i.
Hypotheses
- H_0: M_1 = ... = M_k
- H_1: at least one M_i is different

Hypothesis Testing: Kruskal-Wallis

Test Statistic

\chi^2_0 = \frac{12}{n(n+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(n+1) \sim \chi^2_{\text{df}}

where
- R_i is the sum of the ranks for group i,
- n_i is the sample size for group i,
- n = \sum_{i=1}^k n_i = total sample size,
- k is the number of groups, and
- \text{df} = k-1
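
The statistic can be built by hand in base R and compared with kruskal.test(). Assumptions: the built-in PlantGrowth data is a stand-in, and the variable names are only for illustration. One detail not shown in the formula above: kruskal.test() divides by a correction factor when there are tied values, so the sketch applies the same correction before comparing.

```r
# PlantGrowth ships with base R: weight measured in 3 groups
y <- PlantGrowth$weight
g <- PlantGrowth$group

n   <- length(y)                 # total sample size
r   <- rank(y)                   # ranks of the pooled data (ties get average ranks)
R_i <- tapply(r, g, sum)         # sum of ranks per group
n_i <- tapply(r, g, length)      # sample size per group
k   <- length(n_i)               # number of groups

# Statistic exactly as written in the formula above
chi_sq <- 12 / (n * (n + 1)) * sum(R_i^2 / n_i) - 3 * (n + 1)

# Tie correction used by kruskal.test()
ties <- table(y)
chi_sq_adj <- chi_sq / (1 - sum(ties^3 - ties) / (n^3 - n))

# p-value from the chi-square distribution with k - 1 degrees of freedom
p_hand <- pchisq(chi_sq_adj, df = k - 1, lower.tail = FALSE)

kw <- kruskal.test(weight ~ group, data = PlantGrowth)
c(by_hand = chi_sq_adj, kruskal_test = unname(kw$statistic))
```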

Hypothesis Testing: Kruskal-Wallis (R)

We will use the kruskal_HT() function from library(ssstats) to perform the Kruskal-Wallis test.

dataset_name %>% kruskal_HT(continuous = continuous_variable,
                            grouping = grouping_variable,
                            alpha = specified_alpha)

Hypothesis Testing: Kruskal-Wallis

Twilight Sparkle is now conducting an experiment to evaluate the magical pulse activity of a new alchemical potion. She hypothesizes that the potion may affect ponies differently depending on their type. To investigate, she carefully measures the number of magical pulses emitted per minute after administering the potion to different ponies.
For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful.
Let’s explore the data first. Because there are more than two groups, we know that either ANOVA or Kruskal-Wallis is required.

Hypothesis Testing: Kruskal-Wallis

She collects data (magical_pulse) for each pony and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful.

magical_pulse %>% 
  group_by(pony_type) %>%
  mean_median(pulse)

# A tibble: 3 × 4
  pony_type variable mean_sd   median_iqr
  <chr>     <chr>    <chr>     <chr>     
1 Earth     pulse    5.0 (2.3) 4.5 (3.0) 
2 Pegasus   pulse    6.1 (2.2) 5.5 (2.0) 
3 Unicorn   pulse    4.5 (2.2) 4.0 (3.0)

Hypothesis Testing: Kruskal-Wallis

magical_pulse %>% ANOVA_assumptions(continuous = pulse,
                                    grouping = pony_type)

Error in qq_plot/rvf_plot: non-numeric argument to binary operator

Hypothesis Testing: Kruskal-Wallis

For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Let’s now perform the appropriate hypothesis test. Test at the \alpha=0.05 level.
How should we change the following code?

dataset_name %>% kruskal_HT(continuous = continuous_variable,
                            grouping = grouping_variable,
                            alpha = specified_alpha)

Hypothesis Testing: Kruskal-Wallis

For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Let’s now perform the appropriate hypothesis test. Test at the \alpha=0.05 level.
Our updated code,

magical_pulse %>% kruskal_HT(continuous = pulse,
                             grouping = pony_type,
                             alpha = 0.05)

Hypothesis Testing: Kruskal-Wallis

Running the code,

magical_pulse %>% kruskal_HT(continuous = pulse,
                             grouping = pony_type,
                             alpha = 0.05)

Kruskal–Wallis Rank-Sum Test

H₀: M_Earth = M_Pegasus = M_Unicorn
H₁: At least one group is different

Test Statistic: X(2) = 10.616,
p = 0.005
Conclusion: Reject the null hypothesis (p = 0.005 < α = 0.05)

Hypothesis Testing: Kruskal-Wallis

Hypotheses
- H_0: \ M_{\text{earth}} = M_{\text{pegasus}} = M_{\text{unicorn}}
- H_1: at least one M_i is different
Test Statistic and p-Value
- \chi_0^2 = 10.616; p = 0.005
Rejection Region
- Reject H_0 if p < \alpha; \alpha=0.05.
Conclusion/Interpretation
- Reject H_0 (p = 0.005 < \alpha = 0.05). There is sufficient evidence to suggest that pulse differs for at least one of the pony types.
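
The reported p-value can be recovered from the test statistic with base R. This is only a quick check: it uses the statistic printed in the output (10.616) and k = 3 groups, so df = k - 1 = 2.

```r
# Values reported in the output above
chi2_0 <- 10.616
k      <- 3      # Earth, Pegasus, Unicorn
alpha  <- 0.05

# Upper-tail chi-square probability with df = k - 1
p_value <- pchisq(chi2_0, df = k - 1, lower.tail = FALSE)
round(p_value, 3)

# Decision rule: reject H_0 when p < alpha
p_value < alpha
```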

Posthoc Testing: Dunn’s Test

We can also perform posthoc testing in the Kruskal-Wallis setting using Dunn’s test.
- Rather than compare pairwise means, it compares pairwise average ranks.
Hypotheses:
- H_0: \ M_{i} = M_{j}
- H_1: \ M_{i} \ne M_{j}
Test Statistic:

z_0 = \frac{\bar{R}_i - \bar{R}_j}{\sqrt{ \frac{n(n+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }}
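
A minimal sketch of this computation in base R, using made-up data with no ties (not the course data). The sign of each z only shows which group has the larger average rank; the two-sided p-value depends on its absolute value. Any tie correction that a packaged implementation might apply is left out here.

```r
# Hypothetical toy data: three groups of four observations, no tied values
y <- c(1.2, 3.4, 2.2, 5.1,   # group A
       6.3, 7.7, 4.8, 8.0,   # group B
       2.9, 0.5, 3.9, 1.7)   # group C
g <- factor(rep(c("A", "B", "C"), each = 4))

n         <- length(y)
mean_rank <- sapply(split(rank(y), g), mean)  # average rank per group
n_i       <- lengths(split(y, g))             # sample size per group

# All pairwise comparisons: one column per pair (i, j)
pairs <- combn(levels(g), 2)

z <- apply(pairs, 2, function(p) {
  (mean_rank[[p[1]]] - mean_rank[[p[2]]]) /
    sqrt(n * (n + 1) / 12 * (1 / n_i[[p[1]]] + 1 / n_i[[p[2]]]))
})
names(z) <- apply(pairs, 2, paste, collapse = " - ")

# Two-sided p-values from the standard normal distribution (unadjusted)
p_unadjusted <- 2 * pnorm(-abs(z))
data.frame(Z = z, p = round(p_unadjusted, 3))
```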

Posthoc Testing: Dunn’s Test

!! WAIT !! What about adjusting \alpha?
The function we will be using allows us to turn the adjustment for multiple comparisons on or off.
To adjust the p-value directly (Bonferroni), with m denoting the number of pairwise comparisons,

p_{\text{B}} = \min(p \times m,\ 1)

The adjustment can also be made directly to \alpha (and not p),

\alpha_{\text{B}} = \frac{\alpha}{m}
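
The two forms of the adjustment lead to the same decisions, which a short base R sketch can show. The p-values here are hypothetical, chosen only for illustration; p.adjust() is the base R function for this kind of correction.

```r
# Hypothetical unadjusted p-values for m = 3 pairwise comparisons
p     <- c(0.012, 0.030, 0.200)
m     <- length(p)
alpha <- 0.05

# Option 1: inflate each p-value, compare with the original alpha
p_B <- pmin(p * m, 1)
reject_via_p <- p_B < alpha

# Option 2: shrink alpha, compare with the original p-values
alpha_B <- alpha / m
reject_via_alpha <- p < alpha_B

# Both routes give the same reject / fail-to-reject pattern
identical(reject_via_p, reject_via_alpha)

# The manual p-value adjustment matches base R's Bonferroni method
all.equal(p_B, p.adjust(p, method = "bonferroni"))
```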

Posthoc Testing: Dunn’s Test (R)

We will use the posthoc_dunn() function from library(ssstats) to perform Dunn’s posthoc test.
When we want to adjust for multiple comparisons (Bonferroni):

dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = TRUE)

When we do not want to adjust for multiple comparisons:

dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = FALSE)

Posthoc Testing: Dunn’s Test

For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.
Let’s now examine posthoc testing. Suppose we are not interested in adjusting for multiple comparisons (this is an exploratory study). How should we change this code?

dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = TRUE OR FALSE)

Posthoc Testing: Dunn’s Test

For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.
Let’s now examine posthoc testing. Suppose we are not interested in adjusting for multiple comparisons (this is an exploratory study). Our updated code,

magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = FALSE)

Posthoc Testing: Dunn’s Test

Running the code,

magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = FALSE)

         Comparison          Z     p
1   Earth - Pegasus -2.2183479 0.027
2   Earth - Unicorn  0.9574435 0.338
3 Pegasus - Unicorn  3.1757915 0.001

Posthoc Testing: Dunn’s Test

Restating results,
- M_{\text{earth}} \ne M_{\text{pegasus}} (p = 0.027)
- M_{\text{earth}} = M_{\text{unicorn}} (p = 0.338)
- M_{\text{pegasus}} \ne M_{\text{unicorn}} (p = 0.001)

Posthoc Testing: Dunn’s Test

For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.
Let’s now examine posthoc testing. Suppose we do want to adjust for multiple comparisons (this is a confirmatory study). How should we change this code?

dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = TRUE OR FALSE)

Posthoc Testing: Dunn’s Test

For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.
Let’s now examine posthoc testing. Suppose we do want to adjust for multiple comparisons (this is a confirmatory study). Our updated code,

magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = TRUE)

Posthoc Testing: Dunn’s Test

Running the code,

magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = TRUE)

         Comparison          Z     p
1   Earth - Pegasus -2.2183479 0.080
2   Earth - Unicorn  0.9574435 1.000
3 Pegasus - Unicorn  3.1757915 0.004

Posthoc Testing: Dunn’s Test

Restating results,
- M_{\text{earth}} = M_{\text{pegasus}} (p = 0.080)
- M_{\text{earth}} = M_{\text{unicorn}} (p = 1.000)
- M_{\text{pegasus}} \ne M_{\text{unicorn}} (p = 0.004)

Posthoc Testing: Dunn’s Test

Comparing the two side by side,

Pairwise Comparison	Unadjusted p	Adjusted p
Earth vs. Pegasus	0.027	0.080
Earth vs. Unicorn	0.338	1.000
Pegasus vs. Unicorn	0.001	0.004
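
Both columns can be approximately reproduced from the Z statistics printed above. This sketch assumes that posthoc_dunn() reports two-sided standard normal p-values and applies the Bonferroni adjustment to the unrounded values; that is an assumption about the function's internals, not something stated here.

```r
# Z statistics copied from the posthoc_dunn() output above
z <- c("Earth - Pegasus"   = -2.2183479,
       "Earth - Unicorn"   =  0.9574435,
       "Pegasus - Unicorn" =  3.1757915)

# Number of pairwise comparisons among k = 3 groups
m <- choose(3, 2)

# Two-sided p-values, then the Bonferroni adjustment min(p * m, 1)
p_unadjusted <- 2 * pnorm(-abs(z))
p_adjusted   <- pmin(p_unadjusted * m, 1)

round(cbind(p_unadjusted, p_adjusted), 3)
```

Multiplying the rounded unadjusted p-values by 3 would give slightly different numbers (for example 0.001 × 3 = 0.003 rather than 0.004), which is why the adjustment is applied before rounding.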

Wrap Up

Today we have covered one-way ANOVA.
- One-way ANOVA table
- Test for equality among k means
- ANOVA assumptions
- Nonparametric alternative (Kruskal-Wallis)
- Posthoc testing
  - Tukey’s (ANOVA, adjusted)
  - Fisher’s (ANOVA, unadjusted)
  - Dunn’s (Kruskal-Wallis, adjusted or unadjusted)

Wrap Up

Next class: Two-way ANOVA
- ANOVA table
- Interaction terms
- Main effects
- Profile plots
- Posthoc testing
Thursday next week: Project 2!

Wrap Up

Daily activity: .qmd is available on Canvas.
- Due date: Monday, July 14, 2025.
You will upload the resulting .html file on Canvas.
- Please refer to the help guide on the Biostat website if you need help with submission.
Housekeeping:
- Quiz at 1:15!
- Do you have questions for me?
- Do you need my help with anything from prior lectures? Practices? Project 1?
