We previously discussed testing three or more means using ANOVA.
We also discussed that ANOVA is an extension of the two-sample t-test.
Recall that the t-test has two assumptions:
Equal variance between groups.
Normal distribution.
We will now extend our knowledge of checking assumptions.
y_{ij} = \mu + \tau_i + \varepsilon_{ij}
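where y_{ij} is the jth observation in group i, \mu is the overall mean, \tau_i is the effect of group i, and \varepsilon_{ij} is the error term.
We assume that the error term follows a normal distribution with mean 0 and constant variance \sigma^2; i.e., \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)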
Very important note: the assumption is on the error term and NOT on the outcome!
We will use the residual (the difference between the observed value and the predicted value) to assess assumptions: e_{ij} = y_{ij} - \hat{y}_{ij}
Like with t-tests, we will assess these assumptions graphically.
Normality: quantile-quantile plot of the residuals
Variance: scatterplot of the residuals against the predicted values
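To make the residual definition concrete, here is a minimal Python sketch. It relies on the fact that in a one-way ANOVA the predicted value for every observation in a group is that group's sample mean. The data and the name anova_residuals are hypothetical, chosen only for illustration; this is not the ANOVA_assumptions() function.

```python
from statistics import mean

# Hypothetical scores for three groups (illustrative values, not the course data).
scores = {
    "A": [4.0, 6.0, 8.0],
    "B": [10.0, 12.0, 14.0],
    "C": [1.0, 2.0, 3.0],
}


def anova_residuals(groups):
    """Return (predicted, residual) pairs for each observation.

    The predicted value for an observation is its group's sample mean,
    so the residual is e_ij = y_ij - (mean of group i).
    """
    result = {}
    for name, values in groups.items():
        group_mean = mean(values)
        result[name] = [(group_mean, y - group_mean) for y in values]
    return result


residuals = anova_residuals(scores)
for name, pairs in residuals.items():
    print(name, [e for _, e in pairs])
```

The residuals would go into the quantile-quantile plot, and the (predicted, residual) pairs would go into the scatterplot.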
We will use the ANOVA_assumptions() function from library(ssstats) to request the graphs necessary to assess our assumptions.
To investigate this, data is collected (magical_studies) on a random sample of ponies from each pony type (pony_type). Twilight Sparkle wants to know if there is a difference in magical coordination scores (coordination_score) among the four pony types.
Let’s now check the ANOVA assumptions. How should we change the following code?
Let’s now check the ANOVA assumptions. Our updated code:
We can formally check the variance assumption with the Brown-Forsythe-Levene test (yes, from Module 1!).
Hypotheses
H_0: \sigma^2_1 = \sigma^2_2 = \cdots = \sigma^2_k
H_1: at least one \sigma^2_i is different
Recall the variances_HT() function from library(ssstats).
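As a rough illustration of what a Brown-Forsythe-type test computes, the sketch below runs a one-way ANOVA F statistic on the absolute deviations of each observation from its group median. It assumes the median-centered form of the test; it is not the implementation of variances_HT(), whose internals are not shown here. The function name brown_forsythe_f and the data are hypothetical.

```python
from statistics import mean, median


def brown_forsythe_f(groups):
    """One-way ANOVA F statistic computed on |y_ij - median_i|.

    Returns (F, df_between, df_within).
    """
    # Step 1: absolute deviations from each group's median.
    deviations = [[abs(y - median(g)) for y in g] for g in groups]

    k = len(deviations)
    n = sum(len(d) for d in deviations)
    grand_mean = mean(z for d in deviations for z in d)

    # Step 2: between- and within-group sums of squares of the deviations.
    ss_between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations)
    ss_within = sum((z - mean(d)) ** 2 for d in deviations for z in d)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: the second group is twice as spread out as the first.
example_groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
f_stat, df1, df2 = brown_forsythe_f(example_groups)
print(f_stat, df1, df2)
```

The degrees of freedom are k - 1 and n - k, so a study with four groups has 3 numerator degrees of freedom.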
Let’s now check the ANOVA assumptions. How should we change the following code?
Let’s now check the ANOVA assumptions. Our updated code:
Brown-Forsythe-Levene test for equality of variances:
Null: σ²_Unicorn = σ²_Pegasus = σ²_Earth = σ²_Alicorn
Alternative: At least one variance is different
Test statistic: F(3,136) = 0.185
p-value: p = 0.906
Conclusion: Fail to reject the null hypothesis (p = 0.9063 ≥ α = 0.05)
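Hypotheses
H_0: \sigma^2_{\text{Unicorn}} = \sigma^2_{\text{Pegasus}} = \sigma^2_{\text{Earth}} = \sigma^2_{\text{Alicorn}}
H_1: at least one variance is different
Test Statistic and p-Value
F_0 = 0.185, p = 0.906
Rejection Region
Reject H_0 if p < \alpha; \alpha = 0.05
Conclusion/Interpretation
Fail to reject H_0 (p = 0.906 ≥ \alpha = 0.05). There is not sufficient evidence to suggest that the variances differ among the four pony types.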
Recall the ANOVA assumption on the error term: \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)
We also discussed how to assess the assumptions:
Graphically using the ANOVA_assumptions() function.
Confirming the variance assumption using the Brown-Forsythe-Levene (BFL) test (variances_HT()).
If we break either ANOVA assumption, we turn to the nonparametric alternative, the Kruskal-Wallis test.
The Kruskal-Wallis test determines if k independent samples come from populations with the same distribution.
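Hypotheses
H_0: the k populations have the same distribution
H_1: at least one population has a different distribution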
\chi^2_0 = \frac{12}{n(n+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(n+1) \sim \chi^2_{\text{df}}
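where R_i is the sum of the ranks in group i, n_i is the sample size of group i, n is the total sample size, k is the number of groups, and \text{df} = k - 1.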
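The test statistic above can be computed by hand in a few steps: pool the observations, rank them, sum the ranks within each group, and plug into the formula. The sketch below is a minimal Python version under two stated assumptions: tied values receive the average of their ranks, and no tie correction is applied (matching the formula as written). It is not the kruskal_HT() function; the names midranks and kruskal_wallis_statistic and the data are hypothetical.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def kruskal_wallis_statistic(groups):
    """Return (chi2_0, df) using the formula without a tie correction."""
    pooled = [y for g in groups for y in g]
    n = len(pooled)
    ranks = midranks(pooled)

    total = 0.0
    offset = 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])  # R_i
        total += rank_sum ** 2 / len(g)                # R_i^2 / n_i
        offset += len(g)

    chi2_0 = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return chi2_0, len(groups) - 1


# Hypothetical data: three completely separated groups of size 3.
chi2_0, df = kruskal_wallis_statistic([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(chi2_0, df)
```

For this data the rank sums are 6, 15, and 24, so the sum of R_i^2 / n_i is 279 and the statistic is (12 / 90)(279) - 30 = 7.2 on 2 degrees of freedom.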
We will use the kruskal_HT() function from library(ssstats) to perform the Kruskal-Wallis test.
Twilight Sparkle is now conducting an experiment to evaluate the magical pulse activity of a new alchemical potion. She hypothesizes that the potion may affect ponies differently depending on their type. To investigate, she carefully measures the number of magical pulses emitted per minute after administering the potion to different ponies.
For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful.
Let’s explore the data first. Due to the number of groups, we know that either ANOVA or the Kruskal-Wallis test is required.
Let’s now perform the appropriate hypothesis test. Test at the \alpha=0.05 level.
How should we change the following code?
Our updated code,
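Dunn’s test compares each pair of groups i and j using their mean ranks:
z_0 = \frac{|\bar{R}_i - \bar{R}_j|}{\sqrt{ \frac{n(n+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }}
where \bar{R}_i is the mean rank of group i, n_i is the sample size of group i, and n is the total sample size.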
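A short Python sketch of this pairwise statistic follows. It uses the formula exactly as written (no tie correction) and, as an assumption, converts z_0 to a two-sided p-value from the standard normal distribution. The function names dunn_z and two_sided_p and the input values are hypothetical; this is not the posthoc_dunn() function.

```python
from math import sqrt
from statistics import NormalDist


def dunn_z(mean_rank_i, mean_rank_j, n_i, n_j, n):
    """Pairwise z statistic: |difference in mean ranks| / standard error."""
    standard_error = sqrt(n * (n + 1) / 12 * (1 / n_i + 1 / n_j))
    return abs(mean_rank_i - mean_rank_j) / standard_error


def two_sided_p(z):
    """Two-sided p-value for a z statistic (assumed convention)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical example: mean ranks 2 and 8, three ponies per group, n = 9 overall.
z_0 = dunn_z(2.0, 8.0, n_i=3, n_j=3, n=9)
p_value = two_sided_p(z_0)
print(z_0, p_value)
```

Here the standard error is sqrt(7.5 × 2/3) = sqrt(5), so z_0 = 6 / sqrt(5).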
!! WAIT !! What about adjusting \alpha?
The function we will be using allows us to turn the adjustment for multiple comparisons on or off.
To adjust the p-value directly,
p_{\text{B}} = \min(p \times m,\ 1)
When we want to adjust \alpha instead (Bonferroni):
\alpha_{\text{B}} = \frac{\alpha}{m}
Here, m is the number of pairwise comparisons.
We will use the posthoc_dunn() function from library(ssstats) to perform Dunn’s posthoc test.
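The two Bonferroni adjustments lead to the same decisions: comparing p × m to \alpha is equivalent to comparing p to \alpha / m. The Python sketch below checks this on hypothetical p-values. The function names and the numbers are illustrative only and do not come from posthoc_dunn().

```python
def bonferroni_adjust_p(p_values):
    """Multiply each p-value by the number of comparisons m, capped at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


def bonferroni_alpha(alpha, m):
    """Divide alpha by the number of comparisons m."""
    return alpha / m


alpha = 0.05
unadjusted = [0.01, 0.04, 0.50]  # hypothetical p-values, m = 3 comparisons

adjusted = bonferroni_adjust_p(unadjusted)
alpha_b = bonferroni_alpha(alpha, len(unadjusted))

# Route 1: adjusted p-values against the original alpha.
decisions_by_p = [p_adj < alpha for p_adj in adjusted]
# Route 2: unadjusted p-values against the adjusted alpha.
decisions_by_alpha = [p < alpha_b for p in unadjusted]

print(adjusted, alpha_b)
print(decisions_by_p, decisions_by_alpha)
```

Only the first comparison is rejected under either route. The third p-value is capped at 1 because 0.50 × 3 exceeds 1.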
For each pony, she collects data (magical_pulse) and records the number of magical pulses observed during a one-minute interval (pulse). Twilight suspects that the average number of pulses might differ slightly between groups (pony_type), but she is unsure whether any differences are meaningful. Test at the \alpha=0.05 level.
Let’s now examine posthoc testing. Suppose we are not interested in adjusting for multiple comparisons (this is an exploratory study). How should we change this code?
Let’s now examine posthoc testing. Suppose we are not interested in adjusting for multiple comparisons (this is an exploratory study). Our updated code,
Let’s now examine posthoc testing. Suppose we do want to adjust for multiple comparisons (this is a confirmatory study). How should we change this code?
| Pairwise Comparison | Unadjusted p | Adjusted p |
|---|---|---|
| Earth vs. Pegasus | 0.027 | 0.080 |
| Earth vs. Unicorn | 0.338 | 1.000 |
| Pegasus vs. Unicorn | 0.001 | 0.004 |
STA4173 - Biostatistics - Fall 2025