ANOVA Assumptions
Kruskal-Wallis

Introduction: Topics

  • We have discussed one-way ANOVA.
    • This allows us to compare a continuous variable across multiple groups.
    • One-way ANOVA with two groups is equivalent to a two-sample t-test.
  • Today, we will continue on with one-way ANOVA:
    • ANOVA assumptions
    • Nonparametric alternative: Kruskal-Wallis
      • posthoc testing

Introduction: ANOVA Assumptions

  • We previously discussed testing three or more means using ANOVA.

  • We also discussed that ANOVA is an extension of the two-sample t-test.

  • Recall that the t-test has two assumptions:

    • Equal variance between groups.

    • Normal distribution.

  • We will now extend our knowledge of checking assumptions.

ANOVA Assumptions: Definition

  • We can represent ANOVA with the following model:

y_{ij} = \mu + \tau_i + \varepsilon_{ij}

  • where:

    • y_{ij} is the j^{\text{th}} observation in the i^{\text{th}} group,
    • \mu is the overall (grand) mean,
    • \tau_i is the treatment effect for group i, and
    • \varepsilon_{ij} is the error term for the j^{\text{th}} observation in the i^{\text{th}} group.
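The model above can be simulated directly; a minimal Python sketch (group labels, treatment effects, and sample sizes are invented for illustration):

```python
import random

random.seed(1)

mu = 50                            # overall (grand) mean
tau = {"A": -2, "B": 0, "C": 2}    # treatment effect tau_i for each group
sigma = 1.5                        # common error standard deviation

# y_ij = mu + tau_i + eps_ij, with eps_ij ~ N(0, sigma^2)
data = {g: [mu + t + random.gauss(0, sigma) for _ in range(5)]
        for g, t in tau.items()}

for g, ys in data.items():
    print(g, [round(y, 1) for y in ys])
```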

ANOVA Assumptions: Definition

  • We assume that the error term follows a normal distribution with mean 0 and a constant variance, \sigma^2. i.e., \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)

  • Very important note: the assumption is on the error term and NOT on the outcome!

  • We will use the residual (the difference between the observed value and the predicted value) to assess assumptions: e_{ij} = y_{ij} - \hat{y}_{ij}
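In one-way ANOVA the predicted value \hat{y}_{ij} is simply the group mean, so residuals can be computed by hand; a minimal Python sketch with toy data invented for illustration:

```python
# Residual in one-way ANOVA: e_ij = y_ij - ybar_i, where the fitted value
# for every observation in group i is that group's mean. (Toy data.)
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 9.0, 11.0],
}

residuals = {}
for g, ys in groups.items():
    ybar = sum(ys) / len(ys)              # predicted value for group g
    residuals[g] = [y - ybar for y in ys]

print(residuals)  # residuals within each group sum to zero
```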

ANOVA Assumptions: Graphical Assessment

  • Normality: quantile-quantile plot

    • Should have points close to the 45^\circ line
    • We will focus on the “center” portion of the plot
  • Variance: scatterplot of the residuals against the predicted values

    • Should be “equal spread” between the groups
    • No “pattern”

ANOVA Assumptions: Graphical Assessment (R)

  • Like with t-tests, we will assess these assumptions graphically.

  • We will use the ANOVA_assumptions() function from library(ssstats) to request the graphs necessary to assess our assumptions.

dataset_name %>% ANOVA_assumptions(continuous = continuous_variable,
                                   grouping = grouping_variable)

ANOVA Assumptions: Graphical Assessment

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.

  • Let’s now check the ANOVA assumptions. How should we change the following code?

dataset_name %>% ANOVA_assumptions(continuous = continuous_variable,
                                   grouping = grouping_variable)

ANOVA Assumptions: Graphical Assessment

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.

  • Let’s now check the ANOVA assumptions. Our updated code:

magical_studies %>% ANOVA_assumptions(continuous = coordination_score,
                                      grouping = pony_type)

ANOVA Assumptions: Graphical Assessment

  • Running the code,
magical_studies %>% ANOVA_assumptions(continuous = coordination_score,
                                      grouping = pony_type)

ANOVA Assumptions: Test for Variance (R)

  • We can formally check the variance assumption with the Brown-Forsythe-Levene test (yes, from Module 1!).

  • Hypotheses

    • H_0: \ \sigma^2_1 = \sigma^2_2 = ... = \sigma^2_k
    • H_1: at least one \sigma^2_i is different.
  • Recall the variances_HT() function from library(ssstats).

dataset_name %>% variances_HT(continuous = continuous_variable,
                              grouping = grouping_variable)

ANOVA Assumptions: Test for Variance

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.

  • Let’s now check the ANOVA assumptions. How should we change the following code?

dataset_name %>% variances_HT(continuous = continuous_variable,
                              grouping = grouping_variable)

ANOVA Assumptions: Test for Variance

  • To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.

  • Let’s now check the ANOVA assumptions. Our updated code:

magical_studies %>% variances_HT(continuous = coordination_score,
                                 grouping = pony_type)

ANOVA Assumptions: Test for Variance

  • Running the code,
magical_studies %>% variances_HT(continuous = coordination_score,
                                 grouping = pony_type)
Brown-Forsythe-Levene test for equality of variances:
Null: σ²_Unicorn = σ²_Pegasus = σ²_Earth = σ²_Alicorn 
Alternative: At least one variance is different 
Test statistic: F(3,136) = 0.185 
p-value: p = 0.906
Conclusion: Fail to reject the null hypothesis (p = 0.9063 ≥ α = 0.05)

ANOVA Assumptions: Test for Variance

  • Hypotheses

    • H_0: \ \sigma^2_{\text{unicorn}} = \sigma^2_{\text{pegasus}} = \sigma^2_{\text{earth}} = \sigma^2_{\text{alicorn}}
    • H_1: at least one \sigma^2_i is different
  • Test Statistic and p-Value

    • F_0 = 0.185; p = 0.906
  • Rejection Region

    • Reject if p < \alpha; \alpha=0.05.
  • Conclusion/Interpretation

    • Fail to reject H_0. There is not sufficient evidence to suggest that the variances are different (i.e., the variance assumption is not broken).

Introduction: Kruskal-Wallis

  • We just discussed the ANOVA assumptions.

\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)

  • We also discussed how to assess the assumptions:

    • Graphically using the ANOVA_assumptions() function.

    • Confirming the variance assumption using the BFL (variances_HT()).

  • If we break either assumption, we will turn to the nonparametric alternative, the Kruskal-Wallis test.

Hypothesis Testing: Kruskal-Wallis

  • If we break ANOVA assumptions, we should implement the nonparametric version, the Kruskal-Wallis test.

    • The Kruskal-Wallis test is an extension of the Wilcoxon rank sum test (as ANOVA is an extension of the two-sample t-test).
  • The Kruskal-Wallis test determines if k independent samples come from populations with the same distribution.

  • Hypotheses

    • H_0: M_1 = ... = M_k
    • H_1: at least one M_i is different

Hypothesis Testing: Kruskal-Wallis

  • Test Statistic

\chi^2_0 = \frac{12}{n(n+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(n+1) \sim \chi^2_{\text{df}}

  • where

    • R_i is the sum of the ranks for group i,
    • n_i is the sample size for group i,
    • n = \sum_{i=1}^k n_i = total sample size,
    • k is the number of groups, and
    • \text{df} = k-1
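The statistic can be computed directly from the formula above; a minimal Python sketch (toy data invented for illustration; assumes no ties):

```python
# Kruskal-Wallis chi-square statistic, straight from the definition.
groups = {"A": [1.2, 3.4], "B": [5.6, 7.8]}

# Pool all observations, rank them (1 = smallest), and sum ranks per group
pooled = sorted((y, g) for g, ys in groups.items() for y in ys)
rank_sums = {g: 0 for g in groups}
for rank, (_, g) in enumerate(pooled, start=1):
    rank_sums[g] += rank

n = sum(len(ys) for ys in groups.values())   # total sample size
H = (12 / (n * (n + 1))
     * sum(r ** 2 / len(groups[g]) for g, r in rank_sums.items())
     - 3 * (n + 1))
print(round(H, 1))  # 2.4 for this toy example
```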

Hypothesis Testing: Kruskal-Wallis (R)

  • We will use the kruskal_HT() function from library(ssstats) to perform the Kruskal-Wallis test.
dataset_name %>% kruskal_HT(continuous = continuous_variable,
                            grouping = grouping_variable,
                            alpha = specified_alpha)

Hypothesis Testing: Kruskal-Wallis

  • Twilight Sparkle is now conducting an experiment to evaluate the magical pulse activity of a new alchemical potion. She hypothesizes that the potion may affect ponies differently depending on pony type. To investigate, she carefully measures the number of magical pulses emitted per minute after administering the potion to different ponies.

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful.

  • Let’s explore the data first. Due to the number of groups, we know either ANOVA or Kruskal-Wallis is required.

Hypothesis Testing: Kruskal-Wallis

  • She collects data (magical_pulse) for each pony and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful.
magical_pulse %>% 
  group_by(pony_type) %>%
  mean_median(pulse)
# A tibble: 3 × 4
  pony_type variable mean_sd   median_iqr
  <chr>     <chr>    <chr>     <chr>     
1 Earth     pulse    5.0 (2.3) 4.5 (3.0) 
2 Pegasus   pulse    6.1 (2.2) 5.5 (2.0) 
3 Unicorn   pulse    4.5 (2.2) 4.0 (3.0) 

Hypothesis Testing: Kruskal-Wallis

magical_pulse %>% ANOVA_assumptions(continuous = pulse,
                                    grouping = pony_type)

Hypothesis Testing: Kruskal-Wallis

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Let’s now perform the appropriate hypothesis test. Test at the \alpha=0.05 level.

  • How should we change the following code?

dataset_name %>% kruskal_HT(continuous = continuous_variable,
                            grouping = grouping_variable,
                            alpha = specified_alpha)

Hypothesis Testing: Kruskal-Wallis

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Let’s now perform the appropriate hypothesis test. Test at the \alpha=0.05 level.

  • Our updated code,

magical_pulse %>% kruskal_HT(continuous = pulse,
                             grouping = pony_type,
                             alpha = 0.05)

Hypothesis Testing: Kruskal-Wallis

  • Running the code,
magical_pulse %>% kruskal_HT(continuous = pulse,
                             grouping = pony_type,
                             alpha = 0.05)
Kruskal–Wallis Rank-Sum Test

H₀: M_Earth = M_Pegasus = M_Unicorn
H₁: At least one group is different

Test Statistic: χ²(2) = 10.616,
p = 0.005
Conclusion: Reject the null hypothesis (p = 0.005 < α = 0.05)

Hypothesis Testing: Kruskal-Wallis

  • Hypotheses
    • H_0: \ M_{\text{earth}} = M_{\text{pegasus}} = M_{\text{unicorn}}
    • H_1: at least one M_i is different
  • Test Statistic and p-Value
    • \chi_0^2 = 10.616; p = 0.005
  • Rejection Region
    • Reject H_0 if p < \alpha; \alpha=0.05.
  • Conclusion/Interpretation
    • Reject H_0 (p = 0.005 < \alpha = 0.05). There is sufficient evidence to suggest that there is a difference in pulse between the pony types.

Posthoc Testing: Dunn’s Test

  • We can also perform posthoc testing in the Kruskal-Wallis setting using Dunn’s test.
    • Rather than comparing pairwise means, it compares pairwise average ranks.
  • Hypotheses:
    • H_0: \ M_{i} = M_{j}
    • H_1: \ M_{i} \ne M_{j}
  • Test Statistic:

z_0 = \frac{|\bar{R}_i - \bar{R}_j|}{\sqrt{ \frac{n(n+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }}
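The test statistic is a one-line computation given the average ranks; a minimal Python sketch (mean ranks and sample sizes invented for illustration):

```python
import math

# Dunn's z for one pairwise comparison, from the formula above.
n = 6                        # total sample size
n_i, n_j = 3, 3              # group sample sizes
rbar_i, rbar_j = 2.0, 5.0    # average rank within each group

se = math.sqrt(n * (n + 1) / 12 * (1 / n_i + 1 / n_j))
z = abs(rbar_i - rbar_j) / se
print(round(z, 3))  # 1.964 for these toy values
```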

Posthoc Testing: Dunn’s Test

  • !! WAIT !! What about adjusting \alpha?

  • The function we will be using allows us to turn on/off the adjustment for multiple comparison.

  • To adjust the p-value directly (where m is the number of comparisons),

p_{\text{B}} = \min(p \times m,\ 1)

  • The adjustment can also be made directly to \alpha (and not p),

\alpha_{\text{B}} = \frac{\alpha}{m}
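Both adjustments are one-line computations; a minimal Python sketch using the unadjusted Dunn p-values from this example (note that software adjusts the unrounded p-values, so its adjusted values can differ in the last digit):

```python
# Bonferroni adjustment for m comparisons (m = 3 pairwise tests here,
# matching three groups; raw p-values taken from the example output).
m = 3
raw_p = [0.027, 0.338, 0.001]

adj_p = [min(p * m, 1.0) for p in raw_p]   # p_B = min(p * m, 1)
alpha_B = 0.05 / m                         # or adjust alpha instead of p
print([round(p, 3) for p in adj_p], round(alpha_B, 4))
```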

Posthoc Testing: Dunn’s Test (R)

  • We will use the posthoc_dunn() function from library(ssstats) to perform Dunn’s posthoc test.

  • When we want to adjust \alpha (Bonferroni):

dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = TRUE)
  • When we do not want to adjust \alpha:
dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = FALSE)

Posthoc Testing: Dunn’s Test

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.

  • Let’s now examine posthoc testing. Suppose we are not interested in adjusting for multiple comparisons (this is an exploratory study). How should we change this code?

dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = TRUE OR FALSE)

Posthoc Testing: Dunn’s Test

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.

  • Let’s now examine posthoc testing. Suppose we are not interested in adjusting for multiple comparisons (this is an exploratory study). Our updated code,

magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = FALSE)

Posthoc Testing: Dunn’s Test

  • Running the code,
magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = FALSE)
         Comparison          Z     p
1   Earth - Pegasus -2.2183479 0.027
2   Earth - Unicorn  0.9574435 0.338
3 Pegasus - Unicorn  3.1757915 0.001

Posthoc Testing: Dunn’s Test

  • Restating results,
    • M_{\text{earth}} \ne M_{\text{pegasus}} (p = 0.027)
    • M_{\text{earth}} = M_{\text{unicorn}} (p = 0.338)
    • M_{\text{pegasus}} \ne M_{\text{unicorn}} (p = 0.001)

Posthoc Testing: Dunn’s Test

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.

  • Let’s now examine posthoc testing. Suppose we do want to adjust for multiple comparisons (this is a confirmatory study). How should we change this code?

dataset_name %>% posthoc_dunn(continuous = continuous_variable,
                              grouping = grouping_variable,
                              adjust = TRUE OR FALSE)

Posthoc Testing: Dunn’s Test

  • For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.

  • Let’s now examine posthoc testing. Suppose we do want to adjust for multiple comparisons (this is a confirmatory study). Our updated code,

magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = TRUE)

Posthoc Testing: Dunn’s Test

  • Running the code,
magical_pulse %>% posthoc_dunn(continuous = pulse,
                               grouping = pony_type,
                               adjust = TRUE)
         Comparison          Z     p
1   Earth - Pegasus -2.2183479 0.080
2   Earth - Unicorn  0.9574435 1.000
3 Pegasus - Unicorn  3.1757915 0.004

Posthoc Testing: Dunn’s Test

  • Restating results,
    • M_{\text{earth}} = M_{\text{pegasus}} (p = 0.080)
    • M_{\text{earth}} = M_{\text{unicorn}} (p = 1)
    • M_{\text{pegasus}} \ne M_{\text{unicorn}} (p = 0.004)

Posthoc Testing: Dunn’s Test

  • Comparing the two side by side,
Pairwise Comparison   Unadjusted p   Adjusted p
Earth vs. Pegasus     0.027          0.080
Earth vs. Unicorn     0.338          1.000
Pegasus vs. Unicorn   0.001          0.004

Wrap Up

  • Today we have covered one-way ANOVA.
    • One-way ANOVA table
    • Test for equality among k means
    • ANOVA assumptions
    • Nonparametric alternative (Kruskal-Wallis)
    • Posthoc testing
      • Tukey’s (ANOVA, adjusted)
      • Fisher’s (ANOVA, unadjusted)
      • Dunn’s (Kruskal-Wallis, adjusted or unadjusted)

Wrap Up

  • Next class:
    • Lab: One-way ANOVA and Kruskal-Wallis
    • Quiz: One-way ANOVA and Kruskal-Wallis
  • Next week:
    • Monday: No class (Columbus Day)
    • Tuesday: Catch up period, 4/305, 9:30-10:45
    • Meeting 2: Two-way ANOVA lecture