One-Way ANOVA
Kruskal-Wallis

July 10, 2025
Thursday

Introduction: Topics

  • Previous: two groups, continuous outcome

  • Now: more than two groups, continuous outcome

  • One-way ANOVA

    • posthoc testing
    • assumptions
  • Kruskal-Wallis

    • posthoc testing

Introduction: Analysis of Variance

  • We have previously discussed testing the difference between two groups.

    • What about when there are three or more groups?
  • We will use a method called analysis of variance (ANOVA).

    • This method partitions the variance of the outcome into variance due to the groups and variance due to “other” factors.
  • Fun fact: the two-sample t-test is a special case of ANOVA.

    • If you perform ANOVA when comparing two means, you will obtain the same results as the two-sample t-test.

Introduction: Analysis of Variance

  • Fun fact: the two-sample t-test is a special case of ANOVA.

  • Two-sample t-test:

Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ = 0
Alternative: H₁: μ₁ − μ₂ ≠ 0
Test statistic: t(98) = 0.52
p-value: p = 0.604
Conclusion: Fail to reject the null hypothesis (p = 0.6041 ≥ α = 0.05)
  • In ANOVA, the same comparison gives an equivalent result:
One-Way ANOVA: 
H₀: μ_A = μ_B
H₁: At least one group mean is different
Test Statistic: F(1, 98) = 0.271
p-value: p = 0.604
Conclusion: Fail to reject the null hypothesis (p = 0.6041 ≥ α = 0.05)

One-Way ANOVA

  • In one-way ANOVA, we are partitioning the variability in our outcome (SSTotal) into two pieces:
    • Variability due to the group (SSTreatment),
    • Variability due to “other factors” (SSError).
      • Think of this like a “catch all” for other sources of error: things we did not adjust for in our model.

One-Way ANOVA: ANOVA Table

  • The computations for ANOVA are more involved than what we’ve seen before.

  • An ANOVA table will be constructed in order to perform the hypothesis test.

Source Sum of Squares df Mean Squares F
Treatment SSTrt dfTrt MSTrt F0
Error SSE dfE MSE
Total SSTot dfTot
  • Once this is put together, we can perform the hypothesis test.

    • Our test statistic is F_0.

One-Way ANOVA: (Hand) Computations

  • Before we begin our computations, it is helpful to know

\bar{x}, \ \ n_i, \ \ \bar{x}_i, \ \ s_i^2

  • where,
    • \bar{x} is the overall mean,
    • n_i is the sample size for group i,
    • \bar{x}_i is the mean for group i, and
    • s_i^2 is the variance for group i

One-Way ANOVA: (Hand) Computations

  • We begin our computations with the sums of squares:

\begin{align*} \text{SS}_{\text{Trt}} &= \sum_{i=1}^k n_i(\bar{x}_i-\bar{x})^2 \\ \text{SS}_{\text{E}} &= \sum_{i=1}^k (n_i-1)s_i^2 \\ \text{SS}_{\text{Tot}} &= \text{SS}_{\text{Trt}} + \text{SS}_{\text{E}} \end{align*}
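This decomposition can be checked numerically. Below is a small sketch in Python (the course's examples use R, but the arithmetic is language-agnostic) with made-up data for three groups; it builds SS_Trt and SS_E from the group summaries and confirms they add up to the total sum of squares computed directly from the pooled data.

```python
import numpy as np

# made-up data: three groups (hypothetical values, for illustration only)
groups = [np.array([4.0, 5.0, 6.0, 7.0]),
          np.array([6.0, 8.0, 9.0]),
          np.array([3.0, 4.0, 4.0, 5.0, 6.0])]

all_obs = np.concatenate(groups)
xbar = all_obs.mean()                  # overall (grand) mean

# SS_Trt = sum of n_i * (xbar_i - xbar)^2
ss_trt = sum(len(g) * (g.mean() - xbar)**2 for g in groups)
# SS_E = sum of (n_i - 1) * s_i^2  (sample variances, ddof=1)
ss_e = sum((len(g) - 1) * g.var(ddof=1) for g in groups)

# SS_Tot can also be computed directly from the pooled observations
ss_tot_direct = ((all_obs - xbar)**2).sum()
print(ss_trt + ss_e, ss_tot_direct)    # the two totals agree
```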

One-Way ANOVA: (Hand) Computations

  • Each sum of squares has degrees of freedom:

\begin{align*} \text{df}_{\text{Trt}} &= k-1\\ \text{df}_{\text{E}} &= n-k\\ \text{df}_{\text{Tot}} &= n-1 \end{align*}

One-Way ANOVA: (Hand) Computations

  • Once we have the sum of squares and corresponding degrees of freedom, we can compute the mean squares.

  • In the case of one-way ANOVA, \begin{align*} \text{MS}_{\text{Trt}} &= \frac{\text{SS}_{\text{Trt}}}{\text{df}_{\text{Trt}}} \\ \text{MS}_{\text{E}} &= \frac{\text{SS}_{\text{E}}}{\text{df}_{\text{E}}} \end{align*}

    • Note that there is no \text{MS}_{\text{Tot}}!

One-Way ANOVA: (Hand) Computations

  • A note about mean squares: they are almost always constructed the same way.

\text{MS}_X = \frac{\text{SS}_X}{\text{df}_X}

  • This is important to know for future statistics courses that may have you calculating things by hand.

One-Way ANOVA: (Hand) Computations

  • Finally, we have the test statistic.

  • Generally, we construct an F statistic for ANOVA by dividing the MS of interest by \text{MS}_{\text{E}},

F_X = \frac{\text{MS}_X}{\text{MS}_{\text{E}}}

  • In one-way ANOVA, we are only constructing the F for treatment,

F_0 = \frac{\text{MS}_{\text{Trt}}}{\text{MS}_{\text{E}}}
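Chaining the formulas together: a short Python sketch (made-up data again) that builds MS_Trt, MS_E, and F_0 from the group summaries, then checks the result against scipy.stats.f_oneway, which performs the same one-way ANOVA.

```python
import numpy as np
from scipy import stats

groups = [np.array([4.0, 5.0, 6.0, 7.0]),
          np.array([6.0, 8.0, 9.0]),
          np.array([3.0, 4.0, 4.0, 5.0, 6.0])]
k = len(groups)
n = sum(len(g) for g in groups)
xbar = np.concatenate(groups).mean()

ss_trt = sum(len(g) * (g.mean() - xbar)**2 for g in groups)
ss_e = sum((len(g) - 1) * g.var(ddof=1) for g in groups)

ms_trt = ss_trt / (k - 1)    # MS_Trt = SS_Trt / df_Trt
ms_e = ss_e / (n - k)        # MS_E   = SS_E   / df_E
f0 = ms_trt / ms_e           # F_0    = MS_Trt / MS_E

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f0, f_scipy)           # the two F statistics agree
```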

One-Way ANOVA: (Hand) Computations

  • We are finally done constructing our ANOVA table! As a reminder,
Source Sum of Squares df Mean Squares F
Treatment SSTrt dfTrt MSTrt F0
Error SSE dfE MSE
Total SSTot dfTot

One-Way ANOVA: ANOVA Table (R)

  • We will use the one_way_ANOVA_table() function from library(ssstats) to construct the ANOVA table.
dataset_name %>% one_way_ANOVA_table(continuous = continuous_variable,
                                     grouping = grouping_variable)

One-Way ANOVA: ANOVA Table

  • In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.

  • To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (ability_score) among the four pony types. We will test at the \alpha=0.05 level.

  • Let’s first create the ANOVA table. How should we update this code?

dataset_name %>% one_way_ANOVA_table(continuous = continuous_variable,
                                     grouping = grouping_variable)

One-Way ANOVA: ANOVA Table

  • In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.

  • To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (ability_score) among the four pony types.

  • Let’s first create the ANOVA table. Our updated code:

magical_studies %>% one_way_ANOVA_table(continuous = ability_score,
                                        grouping = pony_type)

One-Way ANOVA: ANOVA Table

  • Running the code,
magical_studies %>% one_way_ANOVA_table(continuous = ability_score,
                                        grouping = pony_type)
One-Way ANOVA Table
Source Sum of Squares df Mean Squares F
Treatment 503.84 3 167.95 1.53
Error 14,963.20 136 110.02
Total 15,467.04 139
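The entries in this table can be reproduced from one another; a quick arithmetic check in Python (values transcribed from the table above):

```python
# sums of squares and degrees of freedom from the ANOVA table
ss_trt, ss_e = 503.84, 14963.20
df_trt, df_e = 3, 136

ms_trt = ss_trt / df_trt    # MS_Trt = SS_Trt / df_Trt
ms_e = ss_e / df_e          # MS_E   = SS_E   / df_E
f0 = ms_trt / ms_e          # F_0    = MS_Trt / MS_E
print(round(ms_trt, 2), round(ms_e, 2), round(f0, 2))  # → 167.95 110.02 1.53
```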

Hypothesis Testing: One-Way-ANOVA

  • In one-way ANOVA, hypotheses always take the same form:

    • H_0: \ \mu_1 = \mu_2 = ... = \mu_k
    • H_1: at least one is different
  • Note: you must fill in the “k” when writing hypotheses!

    • e.g., if there are four means, your hypotheses are
      • H_0: \ \mu_1 = \mu_2 = \mu_3 = \mu_4
      • H_1: at least one is different
    • e.g., in our MLP example,
      • H_0: \ \mu_{\text{unicorn}} = \mu_{\text{earth}} = \mu_{\text{alicorn}} = \mu_{\text{pegasus}}
      • H_1: at least one is different

Hypothesis Testing: One-Way ANOVA

Test statistic:

F_0 = \frac{\text{MS}_{\text{Trt}}}{\text{MS}_{\text{E}}}

p-Value:

p = P[F_{k-1,n-k} \ge F_0]
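The p-value is the upper-tail area of the F distribution with (k−1, n−k) degrees of freedom. For the pony ANOVA table above (F_0 ≈ 1.53 on 3 and 136 df), this area can be computed with SciPy's F distribution:

```python
from scipy import stats

f0, df1, df2 = 1.53, 3, 136   # values from the pony ANOVA table
p = stats.f.sf(f0, df1, df2)  # upper-tail area: P[F_{3,136} >= F_0]
print(round(p, 3))
```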

Hypothesis Testing: One-Way ANOVA (R)

  • We will use the one_way_ANOVA() function from library(ssstats) to construct the corresponding hypothesis test.
dataset_name %>% one_way_ANOVA(continuous = continuous_variable,
                               grouping = grouping_variable,
                               alpha = specified_alpha)

Hypothesis Testing: One-Way ANOVA

  • In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.

  • To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (ability_score) among the four pony types. We will test at the \alpha=0.05 level.

  • Let’s now formulate the hypothesis test. How should we update this code?

dataset_name %>% one_way_ANOVA(continuous = continuous_variable,
                               grouping = grouping_variable,
                               alpha = specified_alpha)

Hypothesis Testing: One-Way ANOVA

  • In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.

  • To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (ability_score) among the four pony types. We will test at the \alpha=0.05 level.

  • Let’s now formulate the hypothesis test. Our updated code,

magical_studies %>% one_way_ANOVA(continuous = ability_score,
                                  grouping = pony_type, 
                                  alpha = 0.05)

Hypothesis Testing: One-Way ANOVA

  • Running the code,
magical_studies %>% one_way_ANOVA(continuous = ability_score,
                                  grouping = pony_type, 
                                  alpha = 0.05)
One-Way ANOVA: 
H₀: μ_Alicorn = μ_Earth = μ_Pegasus = μ_Unicorn
H₁: At least one group mean is different
Test Statistic: F(3, 136) = 1.526
p-value: p = 0.210
Conclusion: Fail to reject the null hypothesis (p = 0.2105 ≥ α = 0.05)

Hypothesis Testing: One-Way ANOVA

  • Hypotheses
    • H_0: \ \mu_{\text{alicorn}} = \mu_{\text{unicorn}} = \mu_{\text{earth}} = \mu_{\text{pegasus}}
    • H_1: at least one \mu_i is different
  • Test Statistic and p-Value
    • F_0 = 1.526
    • p = 0.210
  • Rejection Region
    • Reject H_0 if p < \alpha; \alpha=0.05.
  • Conclusion/Interpretation
    • Fail to reject H_0 (p = 0.210 > \alpha = 0.05). There is not sufficient evidence to suggest that there is a difference in magical ability between the types of ponies.

Hypothesis Testing: One-Way ANOVA

  • Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types. We will again test at the \alpha=0.05 level.

  • Let’s first create the ANOVA table. How should we update this code?

dataset_name %>% one_way_ANOVA_table(continuous = continuous_variable,
                                     grouping = grouping_variable)

Hypothesis Testing: One-Way ANOVA

  • Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types. We will again test at the \alpha=0.05 level.

  • Let’s first create the ANOVA table. Our updated code:

magical_studies %>% one_way_ANOVA_table(continuous = coordination_score,
                                        grouping = pony_type)

Hypothesis Testing: One-Way ANOVA

  • Running the code:
magical_studies %>% one_way_ANOVA_table(continuous = coordination_score,
                                        grouping = pony_type)
One-Way ANOVA Table
Source Sum of Squares df Mean Squares F
Treatment 3,745.44 3 1,248.48 18.40
Error 9,229.82 136 67.87
Total 12,975.26 139

Hypothesis Testing: One-Way ANOVA

  • Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types. We will again test at the \alpha=0.05 level.

  • Let’s now formulate the hypothesis test. How should we update this code?

dataset_name %>% one_way_ANOVA(continuous = continuous_variable,
                               grouping = grouping_variable,
                               alpha = specified_alpha)

Hypothesis Testing: One-Way ANOVA

  • Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is difference in magical coordination scores (coordination_score) among the four pony types. We will again test at the \alpha=0.05 level.

  • Let’s now formulate the hypothesis test. Our updated code:

magical_studies %>% one_way_ANOVA(continuous = coordination_score,
                                  grouping = pony_type,
                                  alpha = 0.05)

Hypothesis Testing: One-Way ANOVA

  • Running the code:
magical_studies %>% one_way_ANOVA(continuous = coordination_score,
                                  grouping = pony_type,
                                  alpha = 0.05)
One-Way ANOVA: 
H₀: μ_Alicorn = μ_Earth = μ_Pegasus = μ_Unicorn
H₁: At least one group mean is different
Test Statistic: F(3, 136) = 18.396
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p < 0.001 < α = 0.05)

Hypothesis Testing: One-Way ANOVA

  • Hypotheses
    • H_0: \ \mu_{\text{alicorn}} = \mu_{\text{unicorn}} = \mu_{\text{earth}} = \mu_{\text{pegasus}}
    • H_1: at least one \mu_i is different
  • Test Statistic and p-Value
    • F_0 = 18.396
    • p < 0.001
  • Rejection Region
    • Reject H_0 if p < \alpha; \alpha=0.05.
  • Conclusion/Interpretation
    • Reject H_0 (p < 0.001 < \alpha = 0.05). There is sufficient evidence to suggest that there is a difference in magical coordination between the types of ponies.

Introduction: Posthoc Testing

  • There’s a difference, but what is the difference?

  • Let’s look at summary statistics:

magical_studies %>% 
  group_by(pony_type) %>%
  mean_median(coordination_score)
# A tibble: 4 × 4
  pony_type variable           mean_sd    median_iqr 
  <chr>     <chr>              <chr>      <chr>      
1 Alicorn   coordination_score 87.7 (8.5) 88.0 (13.6)
2 Earth     coordination_score 75.1 (8.4) 75.3 (9.6) 
3 Pegasus   coordination_score 77.6 (8.4) 78.5 (10.0)
4 Unicorn   coordination_score 84.9 (7.7) 84.3 (11.2)
  • Can we definitively say which groups are different…?

Introduction: Posthoc Testing

  • Recall our hypotheses in one-way ANOVA,

    • H_0: \mu_1 = \mu_2 = ... = \mu_k
    • H_1: at least one \mu_i is different
  • The F test does not tell us which mean is different… only that a difference exists.

  • In theory, we could perform repeated t tests to determine pairwise differences.

    • Recall that ANOVA is an extension of the t test… or that the t test is a special case of ANOVA.
    • However, this will increase the Type I error rate (\alpha).

Introduction: Posthoc Testing

  • Recall that the Type I error rate, \alpha, is the probability of incorrectly rejecting H_0.

    • i.e., we are saying there is a difference between the means when there is actually not a difference.
  • Suppose we are comparing 5 groups.

    • This is 10 pairwise comparisons!!

      • 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, 4-5
    • If we perform all 10 t tests at \alpha=0.05, we inflate the familywise Type I error rate to 1-(1-0.05)^{10} \approx 0.40! 😵
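The 0.40 figure comes from the complement rule: with 10 comparisons each tested at α = 0.05 (treated as independent for this back-of-the-envelope calculation), the chance of at least one false rejection is 1 − (1 − 0.05)^10. A quick check in Python:

```python
from math import comb

alpha = 0.05
m = comb(5, 2)                    # 10 pairwise comparisons among 5 groups
familywise = 1 - (1 - alpha)**m   # P(at least one false rejection)
print(m, round(familywise, 2))    # → 10 0.4
```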

Introduction: Posthoc Testing

  • When performing posthoc comparisons, we can choose one of two paths:

    • Control the Type I (familywise) error rate.
    • Do not control the Type I error rate.
  • Note that controlling the Type I error rate is more conservative than not controlling it.

    • “Conservative” = more difficult to reject.
  • Generally, statisticians:

    • do not control the Type I error rate if examining the results of pilot/preliminary studies that are exploring for general relationships.

    • do control the Type I error rate if examining the results of confirmatory studies and are attempting to confirm relationships observed in pilot/preliminary studies.

Introduction: Posthoc Testing

  • The posthoc tests we will learn:

    • Tukey’s test

      • Performs all pairwise tests and controls the Type I error rate
    • Fisher’s least significant difference

      • Performs all pairwise tests but does not control the Type I error rate
  • Caution: we should only perform posthoc tests if we have determined that a general difference exists!

    • i.e., only when we reject H_0 for the overall F test in ANOVA

Posthoc Testing: Tukey’s Test

  • Tukey’s test allows us to do all pairwise comparisons while controlling \alpha.

  • Hypotheses

    • H_0: \ \mu_i = \mu_j
    • H_1: \ \mu_i \ne \mu_j
  • Test Statistic

Q = \frac{|\bar{y}_i - \bar{y}_j|}{\sqrt{ \frac{\text{MS}_{\text{E}}}{2} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }}
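A sketch of this computation in Python for the Alicorn vs. Unicorn comparison from the pony example. The group sizes are not reported on the slides, so equal sizes of 35 (140 ponies across 4 types) are assumed here; MS_E = 67.87 comes from the coordination-score ANOVA table. The Tukey-adjusted p-value uses SciPy's studentized range distribution.

```python
import math
from scipy import stats

diff = 2.80        # |ybar_i - ybar_j| for Alicorn vs. Unicorn (from the slides)
mse = 67.87        # MS_E from the coordination-score ANOVA table
ni = nj = 35       # assumed equal group sizes: 140 ponies / 4 types
k, df_e = 4, 136   # number of groups; error degrees of freedom

q = diff / math.sqrt((mse / 2) * (1 / ni + 1 / nj))
p_adj = stats.studentized_range.sf(q, k, df_e)  # Tukey-adjusted p-value
print(round(q, 2), round(p_adj, 2))
```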

Posthoc Testing: Tukey’s Test (R)

  • We will use the posthoc_tukey() function from library(ssstats) to perform Tukey’s posthoc test (resulting p-values are adjusted for multiple comparisons).
dataset_name %>% posthoc_tukey(continuous = continuous_variable,
                               grouping = grouping_variable)

Posthoc Testing: Tukey’s Test

  • Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know what differences exist in their magical coordination scores (coordination_score). We will again test at the \alpha=0.05 level.

  • Let’s now formulate Tukey’s posthoc test. How should we update this code?

dataset_name %>% posthoc_tukey(continuous = continuous_variable,
                               grouping = grouping_variable)

Posthoc Testing: Tukey’s Test

  • Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know what differences exist in their magical coordination scores (coordination_score). We will again test at the \alpha=0.05 level.

  • Let’s now formulate Tukey’s posthoc test. Our updated code:

magical_studies %>% posthoc_tukey(continuous = coordination_score,
                                  grouping = pony_type)

Posthoc Testing: Tukey’s Test

  • Running the code,
magical_studies %>% posthoc_tukey(continuous = coordination_score,
                                  grouping = pony_type)

Posthoc Testing: Tukey’s Test

  • Restating results,
    • \mu_{\text{alicorn}} \ne \mu_{\text{earth}} (\bar{x}_d = 12.63, p < 0.001)
    • \mu_{\text{alicorn}} \ne \mu_{\text{pegasus}} (\bar{x}_d = 10.18, p < 0.001)
    • \mu_{\text{unicorn}} \ne \mu_{\text{earth}} (\bar{x}_d = 9.83, p < 0.001)
    • \mu_{\text{unicorn}} \ne \mu_{\text{pegasus}} (\bar{x}_d = 7.38, p = 0.001)
    • \mu_{\text{alicorn}} = \mu_{\text{unicorn}} (\bar{x}_d = 2.80, p = 0.487)
    • \mu_{\text{pegasus}} = \mu_{\text{earth}} (\bar{x}_d = 2.44, p = 0.602)

Posthoc Testing: Fisher’s Test

  • Fisher’s allows us to test all pairwise comparisons but does not control \alpha.

  • Hypotheses:

    • H_0: \ \mu_i = \mu_j
    • H_1: \ \mu_i \ne \mu_j
  • Test Statistic:

t = \frac{|\bar{y}_i - \bar{y}_j|}{\sqrt{ \text{MS}_{\text{E}} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }}
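The same pair under Fisher's LSD, with the same assumptions as before (equal group sizes of 35; MS_E from the table). Note that for a given pair Q = √2 · t, so Tukey's and Fisher's order the comparisons identically; only the reference distribution, and hence the p-value, changes.

```python
import math
from scipy import stats

diff = 2.80        # |ybar_i - ybar_j| for Alicorn vs. Unicorn
mse = 67.87        # MS_E from the coordination-score ANOVA table
ni = nj = 35       # assumed equal group sizes
df_e = 136         # error degrees of freedom

t = diff / math.sqrt(mse * (1 / ni + 1 / nj))
p_unadj = 2 * stats.t.sf(t, df_e)   # two-sided, no multiplicity adjustment
print(round(t, 2), round(p_unadj, 2))
```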

Posthoc Testing: Fisher’s Test (R)

  • We will use the posthoc_fisher() function from library(ssstats) to perform Fisher’s posthoc test (resulting p-values are not adjusted for multiple comparisons).
dataset_name %>% posthoc_fisher(continuous = continuous_variable,
                                grouping = grouping_variable)

Posthoc Testing: Fisher’s Test

  • Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know what differences exist in their magical coordination scores (coordination_score). We will again test at the \alpha=0.05 level.

  • Let’s now formulate Fisher’s posthoc test. How should we update this code?

dataset_name %>% posthoc_fisher(continuous = continuous_variable,
                                grouping = grouping_variable)

Posthoc Testing: Fisher’s Test

  • Twilight Sparkle is now leading a study to understand whether ponies from different types specialize in different areas of magical strength. She develops a “magical coordination” score, which reflects how well a pony can use magic to interact with objects in motion (e.g., catching falling books).

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know what differences exist in their magical coordination scores (coordination_score). We will again test at the \alpha=0.05 level.

  • Let’s now formulate Fisher’s posthoc test. Our updated code:

magical_studies %>% posthoc_fisher(continuous = coordination_score,
                                   grouping = pony_type)

Posthoc Testing: Fisher’s Test

  • Running the code,
magical_studies %>% posthoc_fisher(continuous = coordination_score,
                                   grouping = pony_type)

Posthoc Testing: Fisher’s Test

  • Restating results,
    • \mu_{\text{alicorn}} \ne \mu_{\text{earth}} (\bar{x}_d = 12.63, p < 0.001)
    • \mu_{\text{alicorn}} \ne \mu_{\text{pegasus}} (\bar{x}_d = 10.18, p < 0.001)
    • \mu_{\text{unicorn}} \ne \mu_{\text{earth}} (\bar{x}_d = 9.83, p < 0.001)
    • \mu_{\text{unicorn}} \ne \mu_{\text{pegasus}} (\bar{x}_d = 7.38, p < 0.001)
    • \mu_{\text{alicorn}} = \mu_{\text{unicorn}} (\bar{x}_d = 2.80, p = 0.151)
    • \mu_{\text{pegasus}} = \mu_{\text{earth}} (\bar{x}_d = 2.44, p = 0.227)

Posthoc Testing: Tukey’s vs. Fisher’s

  • We have now learned:
    • Tukey’s test: Performs all pairwise tests and controls the Type I error rate
    • Fisher’s test: Performs all pairwise tests but does not control the Type I error rate
  • Sometimes Tukey’s and Fisher’s will agree with each other.
    • This is the case in our example.
  • Other times they do not agree: by design, Tukey’s makes it harder to reject.
    • We may see more rejections when using Fisher’s.

Posthoc Testing: Tukey’s vs. Fisher’s

  • Comparing the two side-by-side,
Pairwise Comparison \bar{x}_d Unadjusted p Adjusted p
Alicorn vs. Earth 12.63 < 0.001 < 0.001
Alicorn vs. Pegasus 10.18 < 0.001 < 0.001
Unicorn vs. Earth 9.83 < 0.001 < 0.001
Unicorn vs. Pegasus 7.38 < 0.001 0.001
Alicorn vs. Unicorn 2.80 0.151 0.487
Pegasus vs. Earth 2.44 0.227 0.602

Introduction: ANOVA Assumptions

  • We previously discussed testing three or more means using ANOVA.

  • We also discussed that ANOVA is an extension of the two-sample t-test.

  • Recall that the t-test has two assumptions:

    • Equal variance between groups.

    • Normal distribution.

  • We will now extend our knowledge of checking assumptions.

ANOVA Assumptions: Definition

  • We can represent ANOVA with the following model:

y_{ij} = \mu + \tau_i + \varepsilon_{ij}

  • where:

    • y_{ij} is the j^{\text{th}} observation in the i^{\text{th}} group,
    • \mu is the overall (grand) mean,
    • \tau_i is the treatment effect for group i, and
    • \varepsilon_{ij} is the error term for the j^{\text{th}} observation in the i^{\text{th}} group.

ANOVA Assumptions: Definition

  • We assume that the error term follows a normal distribution with mean 0 and a constant variance, \sigma^2. i.e., \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)

  • Very important note: the assumption is on the error term and NOT on the outcome!

  • We will use the residual (the difference between the observed value and the predicted value) to assess assumptions: e_{ij} = y_{ij} - \hat{y}_{ij}
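Since the fitted value in one-way ANOVA is just the group mean, the residuals are easy to form by hand. A Python sketch with made-up data, confirming that the residuals sum to zero within each group:

```python
import numpy as np

# made-up data for two groups (illustration only)
groups = {"A": np.array([4.0, 5.0, 6.0, 7.0]),
          "B": np.array([6.0, 8.0, 9.0])}

# e_ij = y_ij - yhat_ij, where yhat_ij is the group mean
residuals = {name: y - y.mean() for name, y in groups.items()}
for name, e in residuals.items():
    print(name, e.sum())   # each group's residuals sum to (numerically) zero
```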

ANOVA Assumptions: Graphical Assessment

  • Normality: quantile-quantile plot

    • Should have points close to the 45^\circ line
    • We will focus on the “center” portion of the plot
  • Variance: scatterplot of the residuals against the predicted values

    • Should be “equal spread” between the groups
    • No “pattern”

ANOVA Assumptions: Graphical Assessment (R)

  • Like with t-tests, we will assess these assumptions graphically.

  • We will use the ANOVA_assumptions() function from library(ssstats) to request the graphs necessary to assess our assumptions.

dataset_name %>% ANOVA_assumptions(continuous = continuous_variable,
                                   grouping = grouping_variable)

ANOVA Assumptions: Graphical Assessment

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.

  • Let’s now check the ANOVA assumptions. How should we change the following code?

dataset_name %>% ANOVA_assumptions(continuous = continuous_variable,
                                   grouping = grouping_variable)

ANOVA Assumptions: Graphical Assessment

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.

  • Let’s now check the ANOVA assumptions. Our updated code:

magical_studies %>% ANOVA_assumptions(continuous = coordination_score,
                                      grouping = pony_type)

ANOVA Assumptions: Graphical Assessment

  • Running the code,
magical_studies %>% ANOVA_assumptions(continuous = coordination_score,
                                      grouping = pony_type)
(Output: a Q-Q plot of the residuals and a scatterplot of the residuals against the predicted values.)

ANOVA Assumptions: Test for Variance (R)

  • We can formally check the variance assumption with the Brown-Forsythe-Levene test (yes, from Module 1!).

  • Hypotheses

    • H_0: \ \sigma^2_1 = \sigma^2_2 = ... = \sigma^2_k
    • H_1: at least one \sigma^2_i is different.
  • Recall the variances_HT() function from library(ssstats).

dataset_name %>% variances_HT(continuous = continuous_variable,
                              grouping = grouping_variable)
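Outside of the ssstats wrapper, the same Brown-Forsythe test is available as SciPy's Levene test with center="median". A sketch with simulated data (the groups here are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# three simulated groups with equal true variance (illustration only)
a = rng.normal(10, 2, size=40)
b = rng.normal(12, 2, size=40)
c = rng.normal(11, 2, size=40)

# center="median" is the Brown-Forsythe variant of Levene's test
stat, p = stats.levene(a, b, c, center="median")
print(stat, p)
```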

ANOVA Assumptions: Test for Variance

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.

  • Let’s now check the ANOVA assumptions. How should we change the following code?

dataset_name %>% variances_HT(continuous = continuous_variable,
                              grouping = grouping_variable)

ANOVA Assumptions: Test for Variance

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.

  • Let’s now check the ANOVA assumptions. Our updated code:

magical_studies %>% variances_HT(continuous = coordination_score,
                                 grouping = pony_type)

ANOVA Assumptions: Test for Variance

  • Running the code,
magical_studies %>% variances_HT(continuous = coordination_score,
                                 grouping = pony_type)
Brown-Forsythe-Levene test for equality of variances:
Null: σ²_Unicorn = σ²_Pegasus = σ²_Earth = σ²_Alicorn 
Alternative: At least one variance is different 
Test statistic: F(3,136) = 0.185 
p-value: p = 0.906
Conclusion: Fail to reject the null hypothesis (p = 0.9063 ≥ α = 0.05)

ANOVA Assumptions: Test for Variance

  • Hypotheses

    • H_0: \ \sigma^2_{\text{unicorn}} = \sigma^2_{\text{pegasus}} = \sigma^2_{\text{earth}} = \sigma^2_{\text{alicorn}}
    • H_1: at least one \sigma^2_i is different
  • Test Statistic and p-Value

    • F_0 = 0.185; p = 0.906
  • Rejection Region

    • Reject if p < \alpha; \alpha=0.05.
  • Conclusion/Interpretation

    • Fail to reject H_0. There is not sufficient evidence to suggest that the variances are different (i.e., the variance assumption is not broken).

Introduction: Kruskal-Wallis

  • We just discussed the ANOVA assumptions.

\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)

  • We also discussed how to assess the assumptions:

    • Graphically using the ANOVA_assumptions() function.

    • Confirming the variance assumption using the BFL (variances_HT()).

  • If we break either assumption, we will turn to the nonparametric alternative, the Kruskal-Wallis.

Hypothesis Testing: Kruskal-Wallis

  • If we break ANOVA assumptions, we should implement the nonparametric version, the Kruskal-Wallis.

    • The Kruskal-Wallis is an extension of the Wilcoxon rank sum (as ANOVA is an extension of the two-sample t-test).
  • The Kruskal-Wallis test determines if k independent samples come from populations with the same distribution.

  • Hypotheses

    • H_0: M_1 = ... = M_k
    • H_1: at least one M_i is different

Hypothesis Testing: Kruskal-Wallis

  • Test Statistic

\chi^2_0 = \frac{12}{n(n+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(n+1) \sim \chi^2_{\text{df}}

  • where

    • R_i is the sum of the ranks for group i,
    • n_i is the sample size for group i,
    • n = \sum_{i=1}^k n_i = total sample size,
    • k is the number of groups, and
    • \text{df} = k-1
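The rank-sum formula can be verified against scipy.stats.kruskal. The sketch below uses tie-free made-up data (with ties, SciPy applies a correction the plain formula omits); the hand-computed statistic matches SciPy's exactly.

```python
import numpy as np
from scipy import stats

# made-up, tie-free data for three groups (illustration only)
groups = [np.array([1.2, 3.4, 5.6, 7.1]),
          np.array([2.3, 4.5, 8.9]),
          np.array([0.8, 6.7, 9.0, 9.5])]

pooled = np.concatenate(groups)
n = len(pooled)
ranks = stats.rankdata(pooled)   # ranks of the pooled sample

# split the pooled ranks back into groups and apply the formula
h = 0.0
start = 0
for g in groups:
    r_i = ranks[start:start + len(g)].sum()   # R_i: rank sum for group i
    h += r_i**2 / len(g)
    start += len(g)
h = 12 / (n * (n + 1)) * h - 3 * (n + 1)

h_scipy, p = stats.kruskal(*groups)
print(h, h_scipy)    # identical when there are no ties
```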

Hypothesis Testing: Kruskal-Wallis (R)

  • We will use the kruskal_HT() function from library(ssstats) to perform the Kruskal-Wallis test.
dataset_name %>% kruskal_HT(continuous = continuous_variable,
                            grouping = grouping_variable,
                            alpha = specified_alpha)

Hypothesis Testing: Kruskal-Wallis

  • Twilight Sparkle is now conducting an experiment to evaluate the magical pulse activity of a new alchemical potion. She hypothesizes that the potion may affect ponies differently depending on pony type. To investigate, she carefully measures the number of magical pulses emitted per minute after administering the potion to different ponies.

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful.

  • Let’s explore the data first. Due to the number of groups, we know either ANOVA or Kruskal-Wallis is required.

Hypothesis Testing: Kruskal-Wallis

  • She collects data (magical_pulse) for each pony and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful.
magical_pulse %>% 
  group_by(pony_type) %>%
  mean_median(pulse)
# A tibble: 3 × 4
  pony_type variable mean_sd   median_iqr
  <chr>     <chr>    <chr>     <chr>     
1 Earth     pulse    5.0 (2.3) 4.5 (3.0) 
2 Pegasus   pulse    6.1 (2.2) 5.5 (2.0) 
3 Unicorn   pulse    4.5 (2.2) 4.0 (3.0) 

Hypothesis Testing: Kruskal-Wallis

magical_pulse %>% ANOVA_assumptions(continuous = pulse,
                                    grouping = pony_type)
[ANOVA_assumptions() output: QQ plot and residuals vs. fitted plot]

Hypothesis Testing: Kruskal-Wallis

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Let’s now perform the appropriate hypothesis test. Test at the \alpha=0.05 level.

  • How should we change the following code?

dataset_name %>% kruskal_HT(continuous = continuous_variable,
                            grouping = grouping_variable,
                            alpha = specified_alpha)

Hypothesis Testing: Kruskal-Wallis

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Let’s now perform the appropriate hypothesis test. Test at the \alpha=0.05 level.

  • Our updated code,

magical_pulse %>% kruskal_HT(continuous = pulse,
                             grouping = pony_type,
                             alpha = 0.05)

Hypothesis Testing: Kruskal-Wallis

  • Running the code,
magical_pulse %>% kruskal_HT(continuous = pulse,
                             grouping = pony_type,
                             alpha = 0.05)
Kruskal–Wallis Rank-Sum Test

H₀: M_Earth = M_Pegasus = M_Unicorn
H₁: At least one group is different

Test Statistic: χ²(2) = 10.616,
p = 0.005
Conclusion: Reject the null hypothesis (p = 0.005 < α = 0.05)

Hypothesis Testing: Kruskal-Wallis

  • Hypotheses
    • H_0: \ M_{\text{earth}} = M_{\text{pegasus}} = M_{\text{unicorn}}
    • H_1: at least one M_i is different
  • Test Statistic and p-Value
    • \chi_0^2 = 10.616; p = 0.005
  • Rejection Region
    • Reject H_0 if p < \alpha; \alpha=0.05.
  • Conclusion/Interpretation
    • Reject H_0 (p \text{ vs } \alpha \to p = 0.005 < 0.05). There is sufficient evidence to suggest that there is a difference in pulse between the pony types.

Posthoc Testing: Dunn’s Test

  • We can also perform posthoc testing in the Kruskal-Wallis setting using Dunn’s test.
    • Rather than compare pairwise means, it compares pairwise average ranks.
  • Hypotheses:
    • H_0: \ M_{i} = M_{j}
    • H_1: \ M_{i} \ne M_{j}
  • Test Statistic:

z_0 = \frac{|\bar{R}_i - \bar{R}_j|}{\sqrt{ \frac{n(n+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }}
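As a numeric illustration of the statistic above, this is a small Python sketch (made-up mean ranks, not the pony data): the absolute difference in mean ranks is divided by its null standard error, which depends only on the total sample size and the two group sizes.

```python
# Sketch: Dunn's z for one pairwise comparison, from the formula above.
import math

def dunn_z(rbar_i, rbar_j, n, n_i, n_j):
    """|difference in mean ranks| over its standard error under H0."""
    se = math.sqrt(n * (n + 1) / 12 * (1 / n_i + 1 / n_j))
    return abs(rbar_i - rbar_j) / se

# e.g., mean ranks 4.0 vs 8.0, groups of size 3, n = 9 total (hypothetical)
z = dunn_z(4.0, 8.0, n=9, n_i=3, n_j=3)
print(round(z, 3))  # 4 / sqrt(5) ≈ 1.789
```

The z statistic is then referred to the standard normal distribution to obtain the pairwise p-value.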

Posthoc Testing: Dunn’s Test

  • !! WAIT !! What about adjusting \alpha?

  • The function we will be using allows us to turn on/off the adjustment for multiple comparison.

  • To adjust the p-value directly,

p_{\text{B}} = \min(p \times m,\ 1)

  • The adjustment can also be made directly to \alpha (and not p),

\alpha_{\text{B}} = \frac{\alpha}{m}
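The two Bonferroni routes above are interchangeable: inflating each p (capped at 1) and comparing to α gives the same reject/fail-to-reject decisions as comparing the raw p to α/m. A short Python sketch with made-up p-values (m = 3 pairwise comparisons):

```python
# Sketch: the two equivalent Bonferroni adjustments above.
m = 3            # e.g., three pairwise comparisons among k = 3 groups
alpha = 0.05
p_values = [0.010, 0.020, 0.400]   # hypothetical unadjusted p-values

p_adj = [min(p * m, 1.0) for p in p_values]   # adjust the p-values...
alpha_adj = alpha / m                          # ...or adjust alpha instead

for p, pb in zip(p_values, p_adj):
    assert (pb < alpha) == (p < alpha_adj)     # identical decisions
print(p_adj, alpha_adj)
```

This is why the `posthoc_dunn()` function only needs a single `adjust = TRUE/FALSE` switch: it can report adjusted p-values and leave α alone.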

Posthoc Testing: Dunn’s Test (R)

  • We will use the posthoc_dunn() function from library(ssstats) to perform Dunn’s posthoc test.

  • When we want to adjust \alpha (Bonferroni):

dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = TRUE)
  • When we do not want to adjust \alpha:
dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = FALSE)

Posthoc Testing: Dunn’s Test

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.

  • Let’s now examine posthoc testing. Suppose we are not interested in adjusting for multiple comparisons (this is an exploratory study). How should we change this code?

dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = TRUE OR FALSE)

Posthoc Testing: Dunn’s Test

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.

  • Let’s now examine posthoc testing. Suppose we are not interested in adjusting for multiple comparisons (this is an exploratory study). Our updated code,

magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = FALSE)

Posthoc Testing: Dunn’s Test

  • Running the code,
magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = FALSE)
         Comparison          Z     p
1   Earth - Pegasus -2.2183479 0.027
2   Earth - Unicorn  0.9574435 0.338
3 Pegasus - Unicorn  3.1757915 0.001

Posthoc Testing: Dunn’s Test

  • Restating results,
    • M_{\text{earth}} \ne M_{\text{pegasus}} (p = 0.027)
    • M_{\text{earth}} = M_{\text{unicorn}} (p = 0.338)
    • M_{\text{pegasus}} \ne M_{\text{unicorn}} (p = 0.001)

Posthoc Testing: Dunn’s Test

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.

  • Let’s now examine posthoc testing. Suppose we do want to adjust for multiple comparisons (this is a confirmatory study). How should we change this code?

dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = TRUE OR FALSE)

Posthoc Testing: Dunn’s Test

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.

  • Let’s now examine posthoc testing. Suppose we do want to adjust for multiple comparisons (this is a confirmatory study). Our updated code,

magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = TRUE)

Posthoc Testing: Dunn’s Test

  • Running the code,
magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = TRUE)
         Comparison          Z     p
1   Earth - Pegasus -2.2183479 0.080
2   Earth - Unicorn  0.9574435 1.000
3 Pegasus - Unicorn  3.1757915 0.004

Posthoc Testing: Dunn’s Test

  • Restating results,
    • M_{\text{earth}} = M_{\text{pegasus}} (p = 0.080)
    • M_{\text{earth}} = M_{\text{unicorn}} (p = 1.000)
    • M_{\text{pegasus}} \ne M_{\text{unicorn}} (p = 0.004)

Posthoc Testing: Dunn’s Test

  • Comparing the two side by side,
Pairwise Comparison Unadjusted p Adjusted p
Earth vs. Pegasus 0.027 0.080
Earth vs. Unicorn 0.338 1.000
Pegasus vs. Unicorn 0.001 0.004
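The two columns above are linked by the Bonferroni formula: each adjusted p is min(3 · p, 1) applied to the unadjusted p, up to rounding of the displayed values (the table shows p-values rounded to three decimals). A quick Python check:

```python
# Sketch: adjusted column ≈ min(3·p, 1) of the unadjusted column,
# allowing slack because the displayed p-values are rounded.
unadjusted = [0.027, 0.338, 0.001]
adjusted = [0.080, 1.000, 0.004]
for p, pb in zip(unadjusted, adjusted):
    assert abs(min(3 * p, 1.0) - pb) <= 0.002  # rounding slack
print("consistent up to rounding")
```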

Wrap Up

  • Today we have covered one-way ANOVA.
    • One-way ANOVA table
    • Test for equality among k means
    • ANOVA assumptions
    • Nonparametric alternative (Kruskal-Wallis)
    • Posthoc testing
      • Tukey’s (ANOVA, adjusted)
      • Fisher’s (ANOVA, unadjusted)
      • Dunn’s (Kruskal-Wallis, adjusted or unadjusted)

Wrap Up

  • Next class: Two-way ANOVA
    • ANOVA table
    • Interaction terms
    • Main effects
    • Profile plots
    • Posthoc testing
  • Thursday next week: Project 2!

Wrap Up

  • Daily activity: .qmd is available on Canvas.
    • Due date: Monday, July 14, 2025.
  • You will upload the resulting .html file on Canvas.
    • Please refer to the help guide on the Biostat website if you need help with submission.
  • Housekeeping:
    • Quiz at 1:15!
    • Do you have questions for me?
    • Do you need my help with anything from prior lectures? Practices? Project 1?