ANOVA Assumptions and
Kruskal-Wallis

STA4173: Biostatistics
Spring 2025

Introduction: ANOVA Assumptions

  • We previously discussed testing three or more means using ANOVA.

  • We also discussed that ANOVA is an extension of the two-sample t-test.

  • Recall that the t-test has two assumptions:

    • Equal variance between groups.

    • Normal distribution.

  • We will extend our knowledge of checking assumptions today.

ANOVA Assumptions: Definition

  • We can represent ANOVA with the following model:

y_{ij} = \mu + \tau_i + \varepsilon_{ij}

  • where:

    • y_{ij} is the j^{\text{th}} observation in the i^{\text{th}} group,
    • \mu is the overall (grand) mean,
    • \tau_i is the treatment effect for group i, and
    • \varepsilon_{ij} is the error term for the j^{\text{th}} observation in the i^{\text{th}} group.

ANOVA Assumptions: Definition

  • We assume that the error term follows a normal distribution with mean 0 and a constant variance, \sigma^2. i.e., \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)

  • Very important note: the assumption is on the error term and NOT on the outcome!

  • We will use the residual (the difference between the observed value and the predicted value) to assess assumptions: e_{ij} = y_{ij} - \hat{y}_{ij}
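  • A minimal sketch of pulling these quantities in R (assuming a fitted aov object named m, as in the dental example later):
y_hat <- fitted(m)    # predicted values, y-hat_ij
e <- residuals(m)     # residuals, e_ij = y_ij - y-hat_ij
head(cbind(y_hat, e))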

ANOVA Assumptions: Graphical Assessment

  • Normality: quantile-quantile plot

    • Should have points close to the 45^\circ line
    • We will focus on the “center” portion of the plot
  • Variance: scatterplot of the residuals against the predicted values

    • Should be “equal spread” between the groups
    • No “pattern”
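  • A minimal base-R sketch of these two plots (assuming a fitted aov object m; the anova_check() function below produces polished versions of the same idea):
qqnorm(residuals(m)); qqline(residuals(m))    # quantile-quantile plot of the residuals
plot(fitted(m), residuals(m),
     xlab = "Predicted", ylab = "Residual")   # residuals vs. predicted values
abline(h = 0, lty = 2)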

ANOVA Assumptions: Graphical Assessment

  • Like with t-tests, we will assess these assumptions graphically.

  • We will return to the classpackage package and use the anova_check() function.

library(classpackage) 
anova_check(m)

ANOVA Assumptions: Graphical Assessment

  • Recall the dental example from last week,
library(tidyverse)
strength <- c(15.4, 12.9, 17.2, 16.6, 19.3,
              17.2, 14.3, 17.6, 21.6, 17.5,
               5.5,  7.7, 12.2, 11.4, 16.4,
              11.0, 12.4, 13.5,  8.9,  8.1)
system <- c(rep("Cojet",5), rep("Silistor",5), rep("Cimara",5), rep("Ceramic",5))
data <- tibble(system, strength)
m <- aov(strength ~ system, data = data)
summary(m)
            Df Sum Sq Mean Sq F value  Pr(>F)   
system       3  200.0   66.66   7.545 0.00229 **
Residuals   16  141.4    8.84                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA Assumptions: Assessing Graphically

  • Let’s assess the assumptions,
library(classpackage)
anova_check(m)

ANOVA Assumptions: Test for Variance

  • We can formally check the variance assumption with the Brown-Forsythe-Levene (BFL) test.

    • This test transforms the data and then performs ANOVA!
  • The test statistic is calculated as follows, F_0 = \frac{\sum_{i=1}^k n_i (\bar{z}_i - \bar{z})^2/(k-1)}{\sum_{i=1}^k \sum_{j=1}^{n_i}(z_{ij}-\bar{z}_i)^2/(n-k) }, where

    • k is the number of groups,
    • n_i is the sample size of group i,
    • n = \sum_{i=1}^k n_i, and
    • z_{ij} = |y_{ij} - \tilde{y}_i|, where \tilde{y}_i is the median of group i
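  • A minimal sketch of this calculation using the dental data from earlier (the resulting F statistic should match the leveneTest() result we see shortly):
library(tidyverse)
z_data <- data %>%
  group_by(system) %>%
  mutate(z = abs(strength - median(strength))) %>%  # z_ij = |y_ij - group median|
  ungroup()
summary(aov(z ~ system, data = z_data))             # F value here is F_0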

ANOVA Assumptions: Test for Variance

  • Hypotheses

    • H_0: \ \sigma^2_1 = ... = \sigma^2_k
    • H_1: at least one \sigma^2_i is different
  • Test Statistic

    • F_0 (taken from the resulting ANOVA table)
  • p-Value

    • p = P[F_{\text{df}_{\text{Trt}}, \text{df}_{\text{E}}} \ge F_0]
  • Rejection Region

    • Reject if p < \alpha.

ANOVA Assumptions: Test for Variance

  • We will use the leveneTest() function from the car package.

    • Note: I do not load the car package because it masks a function we need from the tidyverse.
car::leveneTest(model_results)
  • In our dental example,
car::leveneTest(m)

ANOVA Assumptions: Test for Variance

  • Hypotheses

    • H_0: \ \sigma^2_1 = \sigma^2_2 = \sigma^2_3 = \sigma^2_4
    • H_1: at least one \sigma^2_i is different
  • Test Statistic and p-Value

    • F_0 = 0.734
    • p = 0.547
  • Rejection Region

    • Reject if p < \alpha; \alpha=0.01.
  • Conclusion/Interpretation

    • Fail to reject H_0. There is not sufficient evidence to suggest that the variances are different (i.e., the variance assumption is not broken).
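  • As a quick sketch, the p-value above is the upper tail of an F distribution with df_Trt = k - 1 = 3 and df_E = n - k = 16:
pf(0.734, df1 = 3, df2 = 16, lower.tail = FALSE)  # approximately 0.547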

Introduction: Kruskal-Wallis

  • We just discussed the ANOVA assumptions.

\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)

  • We also discussed how to assess the assumptions:

    • Graphically using the anova_check() function.

    • Formally testing the variance assumption using the BFL test.

  • If we break an assumption, we will turn to the nonparametric alternative, the Kruskal-Wallis.

Kruskal-Wallis Test

  • If we break ANOVA assumptions, we should implement the nonparametric version, the Kruskal-Wallis.

  • The Kruskal-Wallis test determines if k independent samples come from populations with the same distribution.

  • Our new hypotheses are

    • H_0: M_1 = ... = M_k, where M_i is the population median of group i
    • H_1: at least one M_i is different

Kruskal-Wallis Test

  • The test statistic is as follows:

\chi^2_0 = \frac{12}{n(n+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(n+1),

  • where

    • R_i is the sum of the ranks for group i,
    • n_i is the sample size for group i,
    • n = \sum_{i=1}^k n_i = total sample size, and
    • k is the number of groups.
  • The test statistic \chi^2_0 (often denoted H) follows a \chi^2 distribution with k-1 degrees of freedom.
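  • A minimal sketch of this calculation for the dental data (without the tie correction that kruskal.test() applies, so the value may differ slightly when ties are present):
r <- rank(data$strength)              # ranks of all observations combined
R_i <- tapply(r, data$system, sum)    # rank sum for each group
n_i <- tapply(r, data$system, length) # sample size for each group
n <- sum(n_i); k <- length(n_i)
chi2_0 <- 12/(n*(n + 1)) * sum(R_i^2/n_i) - 3*(n + 1)
pchisq(chi2_0, df = k - 1, lower.tail = FALSE)  # p-value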

Kruskal-Wallis Test

  • Hypotheses

    • H_0: \ M_1 = ... = M_k
    • H_1: at least one M_i is different
  • Test Statistic

    • \chi^2_0 = \frac{12}{n(n+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(n+1)
  • p-Value

    • p = P[\chi^2_{k-1} \ge \chi^2_0]
  • Rejection Region

    • Reject H_0 if p < \alpha

Kruskal-Wallis Test

  • We will use the kruskal.test() function to perform the Kruskal-Wallis test.
kruskal.test(continuous_variable ~ grouping_variable, 
             data = dataset_name)
  • Applying this to our dental dataset,
kruskal.test(strength ~ system, data = data)

    Kruskal-Wallis rank sum test

data:  strength by system
Kruskal-Wallis chi-squared = 12.515, df = 3, p-value = 0.005812

Example

  • Hypotheses

    • H_0: \ M_1 = M_2 = M_3 = M_4
    • H_1: at least one M_i is different
  • Test Statistic and p-Value

    • \chi_0^2 = 12.515
    • p = 0.006
  • Rejection Region

    • Reject H_0 if p < \alpha; \alpha=0.01.
  • Conclusion/Interpretation

    • Reject H_0. There is sufficient evidence to suggest that there is a difference in strength between the four systems.

Kruskal-Wallis: Posthoc Testing

  • We can also perform posthoc testing in the Kruskal-Wallis setting.

  • The setup is just like Tukey’s: we perform all pairwise comparisons while controlling the Type I error rate.

  • Instead of using |\bar{y}_i - \bar{y}_j|, we will use |\bar{R}_i - \bar{R}_j|, where \bar{R}_i is the average rank of group i.

  • The comparison we are making:

    • We declare M_i \ne M_j if |\bar{R}_i - \bar{R}_j| \ge KW, where KW = \frac{q_{\alpha}(k, \infty)}{\sqrt{2}} \sqrt{\frac{n(n+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} and q_{\alpha}(k, \infty) is the critical value from the Studentized range distribution.
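  • A minimal sketch of this critical value for the dental data (k = 4, n = 20, all n_i = 5); note that the kruskalmc() function below may use a slightly different approximation for its critical.dif:
alpha <- 0.01; k <- 4; n <- 20; n_i <- 5; n_j <- 5
KW <- qtukey(1 - alpha, nmeans = k, df = Inf)/sqrt(2) *
  sqrt(n*(n + 1)/12 * (1/n_i + 1/n_j))
KW   # compare each |Rbar_i - Rbar_j| to this value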

Kruskal-Wallis: Posthoc Testing

  • We will use the kruskalmc() function from the pgirmess package.
kruskalmc(continuous_variable ~ grouping_variable, 
          data = dataset_name)
  • In our example,
library(pgirmess) 
kruskalmc(strength ~ system, 
          alpha = 0.01,
          data = data)
Multiple comparison test after Kruskal-Wallis 
alpha: 0.01 
Comparisons
                 obs.dif critical.dif stat.signif
Ceramic-Cimara       0.2      11.7637       FALSE
Ceramic-Cojet        7.9      11.7637       FALSE
Ceramic-Silistor    10.3      11.7637       FALSE
Cimara-Cojet         8.1      11.7637       FALSE
Cimara-Silistor     10.5      11.7637       FALSE
Cojet-Silistor       2.4      11.7637       FALSE
  • Which pairs are significantly different?

Wrap Up

  • Today we have talked about assessing ANOVA assumptions and performing the nonparametric alternative, the Kruskal-Wallis.

  • Per usual, we should only look at posthoc testing when we’ve detected an overall difference with the Kruskal-Wallis.

  • Next lecture: two-way ANOVA.