ANOVA Assumptions
Kruskal-Wallis

Introduction: Topics

  • We have discussed one-way ANOVA.
    • This allows us to compare a continuous variable across multiple groups.
    • One-way ANOVA with two groups is equivalent to a two-sample t-test.
  • Today, we will continue on with one-way ANOVA:
    • ANOVA assumptions
    • Nonparametric alternative: Kruskal-Wallis
      • posthoc testing

Introduction: ANOVA Assumptions

  • We previously discussed testing three or more means using ANOVA.

  • We also discussed that ANOVA is an extension of the two-sample t-test.

  • Recall that the t-test has two assumptions:

    • Equal variance between groups.

    • Normal distribution.

  • We will now extend our knowledge of checking assumptions.

ANOVA Assumptions: Definition

  • We can represent ANOVA with the following model:

y_{ij} = \mu + \tau_i + \varepsilon_{ij}

  • where:

    • y_{ij} is the j^{\text{th}} observation in the i^{\text{th}} group,
    • \mu is the overall (grand) mean,
    • \tau_i is the treatment effect for group i, and
    • \varepsilon_{ij} is the error term for the j^{\text{th}} observation in the i^{\text{th}} group.
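The model above can be simulated directly; a minimal Python sketch (group labels, treatment effects, and sample sizes are invented for illustration):

```python
import random

random.seed(1)

mu = 50                            # overall (grand) mean
tau = {"A": -2, "B": 0, "C": 2}    # treatment effect tau_i for each group
sigma = 1.5                        # common error standard deviation

# y_ij = mu + tau_i + eps_ij, with eps_ij ~ N(0, sigma^2)
data = {g: [mu + t + random.gauss(0, sigma) for _ in range(5)]
        for g, t in tau.items()}

for g, ys in data.items():
    print(g, [round(y, 1) for y in ys])
```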

ANOVA Assumptions: Definition

  • We assume that the error term follows a normal distribution with mean 0 and a constant variance, \sigma^2. i.e., \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)

  • Very important note: the assumption is on the error term and NOT on the outcome!

  • We will use the residual (the difference between the observed value and the predicted value) to assess assumptions: e_{ij} = y_{ij} - \hat{y}_{ij}
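In one-way ANOVA the predicted value \hat{y}_{ij} is simply the group mean, so residuals can be computed by hand; a minimal Python sketch with toy data invented for illustration:

```python
# Residual in one-way ANOVA: e_ij = y_ij - ybar_i, where the fitted value
# for every observation in group i is that group's mean. (Toy data.)
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 9.0, 11.0],
}

residuals = {}
for g, ys in groups.items():
    ybar = sum(ys) / len(ys)              # predicted value for group g
    residuals[g] = [y - ybar for y in ys]

print(residuals)  # residuals within each group sum to zero
```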

ANOVA Assumptions: Graphical Assessment

  • Normality: quantile-quantile plot

    • Should have points close to the 45^\circ line
    • We will focus on the “center” portion of the plot
  • Variance: scatterplot of the residuals against the predicted values

    • Should be “equal spread” between the groups
    • No “pattern”

ANOVA Assumptions: Graphical Assessment (R)

  • Like with t-tests, we will assess these assumptions graphically.

  • We will use the ANOVA_assumptions() function from library(ssstats) to request the graphs necessary to assess our assumptions.

dataset_name %>% ANOVA_assumptions(continuous = continuous_variable,
                                   grouping = grouping_variable)

ANOVA Assumptions: Graphical Assessment

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.

  • Let’s now check the ANOVA assumptions. How should we change the following code?

dataset_name %>% ANOVA_assumptions(continuous = continuous_variable,
                                   grouping = grouping_variable)

ANOVA Assumptions: Graphical Assessment

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.

  • Let’s now check the ANOVA assumptions. Our updated code:

magical_studies %>% ANOVA_assumptions(continuous = coordination_score,
                                      grouping = pony_type)

ANOVA Assumptions: Graphical Assessment

  • Running the code,
magical_studies %>% ANOVA_assumptions(continuous = coordination_score,
                                      grouping = pony_type)

ANOVA Assumptions: Test for Variance (R)

  • We can formally check the variance assumption with the Brown-Forsythe-Levene test (yes, from Module 1!).

  • Hypotheses

    • H_0: \ \sigma^2_1 = \sigma^2_2 = ... = \sigma^2_k
    • H_1: at least one \sigma^2_i is different.
  • Recall the variances_HT() function from library(ssstats).

dataset_name %>% variances_HT(continuous = continuous_variable,
                              grouping = grouping_variable)

ANOVA Assumptions: Test for Variance

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.

  • Let’s now check the ANOVA assumptions. How should we change the following code?

dataset_name %>% variances_HT(continuous = continuous_variable,
                              grouping = grouping_variable)

ANOVA Assumptions: Test for Variance

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.

  • Let’s now check the ANOVA assumptions. Our updated code:

magical_studies %>% variances_HT(continuous = coordination_score,
                                 grouping = pony_type)

ANOVA Assumptions: Test for Variance

  • Running the code,
magical_studies %>% variances_HT(continuous = coordination_score,
                                 grouping = pony_type)
Brown-Forsythe-Levene test for equality of variances:
Null: σ²_Unicorn = σ²_Pegasus = σ²_Earth = σ²_Alicorn 
Alternative: At least one variance is different 
Test statistic: F(3,136) = 0.185 
p-value: p = 0.906
Conclusion: Fail to reject the null hypothesis (p = 0.9063 ≥ α = 0.05)

ANOVA Assumptions: Test for Variance

  • Hypotheses

    • H_0: \ \sigma^2_{\text{unicorn}} = \sigma^2_{\text{pegasus}} = \sigma^2_{\text{earth}} = \sigma^2_{\text{alicorn}}
    • H_1: at least one \sigma^2_i is different
  • Test Statistic and p-Value

    • F_0 = 0.185; p = 0.906
  • Rejection Region

    • Reject if p < \alpha; \alpha=0.05.
  • Conclusion/Interpretation

    • Fail to reject H_0. There is not sufficient evidence to suggest that the variances are different (i.e., the variance assumption is not broken).

Introduction: Kruskal-Wallis

  • We just discussed the ANOVA assumptions.

\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)

  • We also discussed how to assess the assumptions:

    • Graphically using the ANOVA_assumptions() function.

    • Confirming the variance assumption using the BFL (variances_HT()).

  • If we break either assumption, we will turn to the nonparametric alternative, the Kruskal-Wallis test.

Hypothesis Testing: Kruskal-Wallis

  • If we break ANOVA assumptions, we should implement the nonparametric version, the Kruskal-Wallis test.

    • The Kruskal-Wallis test is an extension of the Wilcoxon rank sum test (as ANOVA is an extension of the two-sample t-test).
  • The Kruskal-Wallis test determines if k independent samples come from populations with the same distribution.

  • Hypotheses

    • H_0: M_1 = ... = M_k
    • H_1: at least one M_i is different

Hypothesis Testing: Kruskal-Wallis

  • Test Statistic

\chi^2_0 = \frac{12}{n(n+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(n+1) \sim \chi^2_{\text{df}}

  • where

    • R_i is the sum of the ranks for group i,
    • n_i is the sample size for group i,
    • n = \sum_{i=1}^k n_i = total sample size,
    • k is the number of groups, and
    • \text{df} = k-1
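The statistic can be computed directly from the formula above; a minimal Python sketch (toy data invented for illustration; assumes no ties):

```python
# Kruskal-Wallis chi-square statistic, straight from the definition.
groups = {"A": [1.2, 3.4], "B": [5.6, 7.8]}

# Pool all observations, rank them (1 = smallest), and sum ranks per group
pooled = sorted((y, g) for g, ys in groups.items() for y in ys)
rank_sums = {g: 0 for g in groups}
for rank, (_, g) in enumerate(pooled, start=1):
    rank_sums[g] += rank

n = sum(len(ys) for ys in groups.values())   # total sample size
H = (12 / (n * (n + 1))
     * sum(r ** 2 / len(groups[g]) for g, r in rank_sums.items())
     - 3 * (n + 1))
print(round(H, 1))  # 2.4 for this toy example
```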

Hypothesis Testing: Kruskal-Wallis (R)

  • We will use the kruskal_HT() function from library(ssstats) to perform the Kruskal-Wallis test.
dataset_name %>% kruskal_HT(continuous = continuous_variable,
                            grouping = grouping_variable,
                            alpha = specified_alpha)

Hypothesis Testing: Kruskal-Wallis

  • Twilight Sparkle is now conducting an experiment to evaluate the magical pulse activity of a new alchemical potion. She hypothesizes that the potion may affect ponies differently depending on pony type. To investigate, she carefully measures the number of magical pulses emitted per minute after administering the potion to different ponies.

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful.

  • Let’s explore the data first. Due to the number of groups, we know either ANOVA or Kruskal-Wallis is required.

Hypothesis Testing: Kruskal-Wallis

  • She collects data (magical_pulse) for each pony and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful.
magical_pulse %>% 
  group_by(pony_type) %>%
  mean_median(pulse)
# A tibble: 3 × 4
  pony_type variable mean_sd   median_iqr
  <chr>     <chr>    <chr>     <chr>     
1 Earth     pulse    5.0 (2.3) 4.5 (3.0) 
2 Pegasus   pulse    6.1 (2.2) 5.5 (2.0) 
3 Unicorn   pulse    4.5 (2.2) 4.0 (3.0) 

Hypothesis Testing: Kruskal-Wallis

magical_pulse %>% ANOVA_assumptions(continuous = pulse,
                                    grouping = pony_type)

Hypothesis Testing: Kruskal-Wallis

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Let’s now perform the appropriate hypothesis test. Test at the \alpha=0.05 level.

  • How should we change the following code?

dataset_name %>% kruskal_HT(continuous = continuous_variable,
                            grouping = grouping_variable,
                            alpha = specified_alpha)

Hypothesis Testing: Kruskal-Wallis

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Let’s now perform the appropriate hypothesis test. Test at the \alpha=0.05 level.

  • Our updated code,

magical_pulse %>% kruskal_HT(continuous = pulse,
                             grouping = pony_type,
                             alpha = 0.05)

Hypothesis Testing: Kruskal-Wallis

  • Running the code,
magical_pulse %>% kruskal_HT(continuous = pulse,
                             grouping = pony_type,
                             alpha = 0.05)
Kruskal–Wallis Rank-Sum Test

H₀: M_Earth = M_Pegasus = M_Unicorn
H₁: At least one group is different

Test Statistic: χ²(2) = 10.616,
p = 0.005
Conclusion: Reject the null hypothesis (p = 0.005 < α = 0.05)

Hypothesis Testing: Kruskal-Wallis

  • Hypotheses
    • H_0: \ M_{\text{earth}} = M_{\text{pegasus}} = M_{\text{unicorn}}
    • H_1: at least one M_i is different
  • Test Statistic and p-Value
    • \chi_0^2 = 10.616; p = 0.005
  • Rejection Region
    • Reject H_0 if p < \alpha; \alpha=0.05.
  • Conclusion/Interpretation
    • Reject H_0 (p = 0.005 < \alpha = 0.05). There is sufficient evidence to suggest that there is a difference in pulse between the pony types.

Posthoc Testing: Dunn’s Test

  • We can also perform posthoc testing in the Kruskal-Wallis setting using Dunn’s test.
    • Rather than comparing pairwise means, it compares pairwise average ranks.
  • Hypotheses:
    • H_0: \ M_{i} = M_{j}
    • H_1: \ M_{i} \ne M_{j}
  • Test Statistic:

z_0 = \frac{|\bar{R}_i - \bar{R}_j|}{\sqrt{ \frac{n(n+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }}
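The test statistic is a one-line computation given the average ranks; a minimal Python sketch (mean ranks and sample sizes invented for illustration):

```python
import math

# Dunn's z for one pairwise comparison, from the formula above.
n = 6                        # total sample size
n_i, n_j = 3, 3              # group sample sizes
rbar_i, rbar_j = 2.0, 5.0    # average rank within each group

se = math.sqrt(n * (n + 1) / 12 * (1 / n_i + 1 / n_j))
z = abs(rbar_i - rbar_j) / se
print(round(z, 3))  # 1.964 for these toy values
```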

Posthoc Testing: Dunn’s Test

  • !! WAIT !! What about adjusting \alpha?

  • The function we will be using allows us to turn on/off the adjustment for multiple comparison.

  • To adjust the p-value directly (where m is the number of comparisons),

p_{\text{B}} = \min(p \times m,\ 1)

  • The adjustment can also be made directly to \alpha (and not p),

\alpha_{\text{B}} = \frac{\alpha}{m}
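Both adjustments are one-line computations; a minimal Python sketch using the unadjusted Dunn p-values from this example (note that software adjusts the unrounded p-values, so its adjusted values can differ in the last digit):

```python
# Bonferroni adjustment for m comparisons (m = 3 pairwise tests here,
# matching three groups; raw p-values taken from the example output).
m = 3
raw_p = [0.027, 0.338, 0.001]

adj_p = [min(p * m, 1.0) for p in raw_p]   # p_B = min(p * m, 1)
alpha_B = 0.05 / m                         # or adjust alpha instead of p
print([round(p, 3) for p in adj_p], round(alpha_B, 4))
```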

Posthoc Testing: Dunn’s Test (R)

  • We will use the posthoc_dunn() function from library(ssstats) to perform Dunn’s posthoc test.

  • When we want to adjust \alpha (Bonferroni):

dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = TRUE)
  • When we do not want to adjust \alpha:
dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = FALSE)

Posthoc Testing: Dunn’s Test

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.

  • Let’s now examine posthoc testing. Suppose we are not interested in adjusting for multiple comparisons (this is an exploratory study). How should we change this code?

dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = TRUE OR FALSE)

Posthoc Testing: Dunn’s Test

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.

  • Let’s now examine posthoc testing. Suppose we are not interested in adjusting for multiple comparisons (this is an exploratory study). Our updated code,

magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = FALSE)

Posthoc Testing: Dunn’s Test

  • Running the code,
magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = FALSE)
         Comparison          Z     p
1   Earth - Pegasus -2.2183479 0.027
2   Earth - Unicorn  0.9574435 0.338
3 Pegasus - Unicorn  3.1757915 0.001

Posthoc Testing: Dunn’s Test

  • Restating results,
    • M_{\text{earth}} \ne M_{\text{pegasus}} (p = 0.027)
    • M_{\text{earth}} = M_{\text{unicorn}} (p = 0.338)
    • M_{\text{pegasus}} \ne M_{\text{unicorn}} (p = 0.001)

Posthoc Testing: Dunn’s Test

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.

  • Let’s now examine posthoc testing. Suppose we do want to adjust for multiple comparisons (this is a confirmatory study). How should we change this code?

dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = TRUE OR FALSE)

Posthoc Testing: Dunn’s Test

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.

  • Let’s now examine posthoc testing. Suppose we do want to adjust for multiple comparisons (this is a confirmatory study). Our updated code,

magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = TRUE)

Posthoc Testing: Dunn’s Test

  • Running the code,
magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = TRUE)
         Comparison          Z     p
1   Earth - Pegasus -2.2183479 0.080
2   Earth - Unicorn  0.9574435 1.000
3 Pegasus - Unicorn  3.1757915 0.004

Posthoc Testing: Dunn’s Test

  • Restating results,
    • M_{\text{earth}} = M_{\text{pegasus}} (p = 0.080)
    • M_{\text{earth}} = M_{\text{unicorn}} (p = 1)
    • M_{\text{pegasus}} \ne M_{\text{unicorn}} (p = 0.004)

Posthoc Testing: Dunn’s Test

  • Comparing the two side by side,
Pairwise Comparison   Unadjusted p   Adjusted p
Earth vs. Pegasus     0.027          0.080
Earth vs. Unicorn     0.338          1.000
Pegasus vs. Unicorn   0.001          0.004

Wrap Up

  • Today we have covered one-way ANOVA.
    • One-way ANOVA table
    • Test for equality among k means
    • ANOVA assumptions
    • Nonparametric alternative (Kruskal-Wallis)
    • Posthoc testing
      • Tukey’s (ANOVA, adjusted)
      • Fisher’s (ANOVA, unadjusted)
      • Dunn’s (Kruskal-Wallis, adjusted or unadjusted)

Wrap Up

  • Next class:
    • Lab: One-way ANOVA and Kruskal-Wallis
    • Quiz: One-way ANOVA and Kruskal-Wallis
  • Next week:
    • Monday: No class (Columbus Day)
    • Tuesday: Catch up period, 4/305, 9:30-10:45
    • Meeting 2: Two-way ANOVA lecture