One-Way Analysis of Variance

STA4173: Biostatistics
Spring 2025

Introduction: Analysis of Variance

  • We have previously discussed testing the difference between two groups.

    • What about when there are three or more groups?
  • We will use a method called analysis of variance (ANOVA).

    • This method partitions the variance of the outcome into variance due to the groups and variance due to “other” factors.
  • Fun fact: the two-sample t-test is a special case of ANOVA.

    • If you perform ANOVA when comparing two means, you will obtain the same results as the two-sample t-test.
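  • A quick demonstration in R (with simulated data, not course data): fitting a two-group one-way ANOVA reproduces the pooled two-sample t-test, with matching p-values and F_0 = t^2.
set.seed(1)
y   <- c(rnorm(10, mean = 5), rnorm(10, mean = 6))   # two made-up groups
grp <- factor(rep(c("A", "B"), each = 10))
t.test(y ~ grp, var.equal = TRUE)$p.value       # pooled two-sample t-test
anova(lm(y ~ grp))$`Pr(>F)`[1]                  # one-way ANOVA with k = 2; same p-value
t.test(y ~ grp, var.equal = TRUE)$statistic^2   # equals the ANOVA F statistic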

Hypotheses

  • Hypotheses all take the same form:

    • H_0: \ \mu_1 = \mu_2 = ... = \mu_k
    • H_1: at least one is different
  • Note 1: you must fill in the “k” when writing hypotheses!

    • e.g., if there are four means, your hypotheses are

      • H_0: \ \mu_1 = \mu_2 = \mu_3 = \mu_4
      • H_1: at least one is different
  • Note 2: ANOVA does not tell us which means are different, just if a general difference exists!

ANOVA Table

  • The computations for ANOVA are more involved than what we’ve seen before.

  • An ANOVA table will be constructed in order to perform the hypothesis test.

Source      Sum of Squares   df       Mean Squares   F
Treatment   SS_Trt           df_Trt   MS_Trt         F_0
Error       SS_E             df_E     MS_E
Total       SS_Tot           df_Tot
  • Once this is put together, we can perform the hypothesis test.

    • Our test statistic is F_0.

The F Distribution

  • The F distribution is derived as the ratio of two variances.

    • The variances each have their own degrees of freedom: df_numerator and df_denominator.
  • The F distribution’s shape depends on these two degrees of freedom.
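  • A small sketch of working with the F distribution in R (the df pairs below are illustrative choices, not tied to a specific problem):
qf(0.95, df1 = 3, df2 = 16)                      # critical value cutting off the upper 5% tail
pf(7.5, df1 = 3, df2 = 16, lower.tail = FALSE)   # upper-tail area P[F >= 7.5]
curve(df(x, df1 = 3, df2 = 16), from = 0, to = 5, ylab = "density")   # one df pair
curve(df(x, df1 = 10, df2 = 50), add = TRUE, lty = 2)                 # a different df pair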

ANOVA Computations

  • Again, here’s where we are headed with our computations:
Source      Sum of Squares   df       Mean Squares   F
Treatment   SS_Trt           df_Trt   MS_Trt         F_0
Error       SS_E             df_E     MS_E
Total       SS_Tot           df_Tot
  • We are partitioning the variance of our outcome into:

    • Variance due to the grouping (treatment)

    • Variance due to “other” factors (error)

      • Think of this like a “catch all” for other sources of error – things we did not adjust for in our model.

ANOVA Computations

  • Before we begin our computations, it will be helpful to know

\bar{x}, \ \ n_i, \ \ \bar{x}_i, \ \ s_i^2

  • where,
    • \bar{x} is the overall mean,
    • n_i is the sample size for group i,
    • \bar{x}_i is the mean for group i, and
    • s_i^2 is the variance for group i

ANOVA Computations

  • We begin our computations with the sums of squares:

\begin{align*} \text{SS}_{\text{Trt}} &= \sum_{i=1}^k n_i(\bar{x}_i-\bar{x})^2 \\ \text{SS}_{\text{E}} &= \sum_{i=1}^k (n_i-1)s_i^2 \\ \text{SS}_{\text{Tot}} &= \text{SS}_{\text{Trt}} + \text{SS}_{\text{E}} \end{align*}

  • and each sum of squares has degrees of freedom:
    • \text{df}_{\text{Trt}} = k-1 (number of groups – 1)
    • \text{df}_{\text{E}} = n-k (overall sample size – number of groups)
    • \text{df}_{\text{Tot}} = n-1 (overall sample size – 1) = \text{df}_{\text{Trt}} + \text{df}_{\text{E}}
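  • Here is a sketch of these formulas in R, using made-up summaries for three groups of five (the numbers are illustrative only):
n_i    <- c(5, 5, 5)           # group sample sizes
xbar_i <- c(10.2, 12.8, 9.5)   # group means
s2_i   <- c(4.1, 3.6, 5.0)     # group variances
xbar   <- sum(n_i * xbar_i) / sum(n_i)   # overall mean, weighted by group size

SS_Trt <- sum(n_i * (xbar_i - xbar)^2)   # treatment (between-group) sum of squares
SS_E   <- sum((n_i - 1) * s2_i)          # error (within-group) sum of squares
SS_Tot <- SS_Trt + SS_E

k <- length(n_i); n <- sum(n_i)
df_Trt <- k - 1   # 2
df_E   <- n - k   # 12
df_Tot <- n - 1   # 14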

ANOVA Computations

  • Once we have the sum of squares and corresponding degrees of freedom, we have the mean squares.

  • Generally, a mean square is the sum of squares divided by its df, \text{MS}_X = \frac{\text{SS}_X}{\text{df}_X}

  • In the case of one-way ANOVA, \begin{align*} \text{MS}_{\text{Trt}} &= \frac{\text{SS}_{\text{Trt}}}{\text{df}_{\text{Trt}}} \\ \text{MS}_{\text{E}} &= \frac{\text{SS}_{\text{E}}}{\text{df}_{\text{E}}} \end{align*}

    • Note that there is no \text{MS}_{\text{Tot}}!

ANOVA Computations

  • Finally, we have the test statistic.

  • Generally, we construct an F for ANOVA by dividing the MS of interest by MS_{\text{E}}, F_X = \frac{\text{MS}_X}{\text{MS}_{\text{E}}}

  • In one-way ANOVA, we are only constructing the F for treatment, F_0 = \frac{\text{MS}_{\text{Trt}}}{\text{MS}_{\text{E}}}
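  • Continuing the made-up illustration from the sums-of-squares slide (SS_Trt ≈ 30.23 and SS_E = 50.8, with df_Trt = 2 and df_E = 12), the mean squares and F statistic follow directly:
SS_Trt <- 30.23; df_Trt <- 2
SS_E   <- 50.80; df_E   <- 12
MS_Trt <- SS_Trt / df_Trt   # ~15.1
MS_E   <- SS_E / df_E       # ~4.23
F_0    <- MS_Trt / MS_E     # ~3.6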

ANOVA Computations

  • We are finally done constructing our ANOVA table! As a reminder,
Source      Sum of Squares   df       Mean Squares   F
Treatment   SS_Trt           df_Trt   MS_Trt         F_0
Error       SS_E             df_E     MS_E
Total       SS_Tot           df_Tot

ANOVA: R Syntax

  • We can use the aov() and summary() functions.
m <- aov(continuous_variable ~ grouping_variable,
         data = dataset_name)
summary(m)
  • However, note that ANOVA is regression (and regression is ANOVA).
    • We can also use lm() to define the model and anova() to construct the ANOVA table.
m <- lm(continuous_variable ~ grouping_variable,
         data = dataset_name)
anova(m)

Example - Dental

  • Prosthodontists specialize in the restoration of oral function, including the use of dental implants, veneers, dentures, and crowns. A researcher wanted to compare the shear bond strength of different repair kits for repairs of chipped porcelain veneer.

  • He randomly divided 20 porcelain specimens into four treatment groups: group 1 used the Cojet system, group 2 used the Silistor system, group 3 used the Cimara system, and group 4 used the Ceramic Repair system.

  • At the conclusion of the study, shear bond strength (in megapascals, MPa) was measured according to ISO 10477. The data are as follows,

strength <- c(15.4, 12.9, 17.2, 16.6, 19.3,
              17.2, 14.3, 17.6, 21.6, 17.5,
               5.5,  7.7, 12.2, 11.4, 16.4,
              11.0, 12.4, 13.5,  8.9,  8.1)
system <- c(rep("Cojet",5), rep("Silistor",5), rep("Cimara",5), rep("Ceramic",5))
library(tibble)
data <- tibble(system, strength)

Example - Dental

  • What is the continuous variable?

  • What is the grouping variable?

Example - Dental

  • Our first step will be to construct an ANOVA table for the data.
m1 <- aov(strength ~ system, data = data)
summary(m1)
            Df Sum Sq Mean Sq F value  Pr(>F)   
system       3  200.0   66.66   7.545 0.00229 **
Residuals   16  141.4    8.84                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m2 <- lm(strength ~ system, data = data)
anova(m2)

Hypothesis Testing

  • Hypotheses

    • H_0: \ \mu_1 = \mu_2 = ... = \mu_k
    • H_1: at least one mean is different
  • Test Statistic

    • F_0 (pulled from the ANOVA table)
  • p-Value

    • p = P[F_{\text{df}_{\text{Trt}}, \text{df}_{\text{E}}} \ge F_0]
  • Rejection Region

    • Reject H_0 if p<\alpha.
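  • In R, this p-value is an upper-tail F probability, so pf() reproduces the Pr(>F) column of the ANOVA table. Using the dental output from the previous slide (F_0 = 7.545, df_Trt = 3, df_E = 16):
pf(7.545, df1 = 3, df2 = 16, lower.tail = FALSE)   # ~0.0023, matching Pr(>F)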

Example - Dental

  • Using the dental data:
summary(m1)
            Df Sum Sq Mean Sq F value  Pr(>F)   
system       3  200.0   66.66   7.545 0.00229 **
Residuals   16  141.4    8.84                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Determine if there is a difference in average strength between the groups. Test at the \alpha=0.01 level.

Example - Dental

  • Hypotheses

    • H_0: \ \mu_1 = \mu_2 = \mu_3 = \mu_4
    • H_1: at least one mean is different
  • Test Statistic and p-Value

    • F_0 = 7.545
    • p = 0.002
  • Rejection Region

    • Reject H_0 if p<\alpha; \alpha = 0.01.
  • Conclusion/Interpretation

    • Reject H_0. There is sufficient evidence to suggest that there is a difference in average strength between the four groups.

Introduction: Posthoc Testing

  • Today we have introduced ANOVA. Recall the hypotheses,

    • H_0: \mu_1 = \mu_2 = ... = \mu_k
    • H_1: at least one \mu_i is different
  • The F test does not tell us which mean is different… only that a difference exists.

  • In theory, we could perform repeated t tests to determine pairwise differences.

    • Recall that ANOVA is an extension of the t test… or that the t test is a special case of ANOVA.

    • However, this will increase the Type I error rate (\alpha).

Introduction: Posthoc Testing

  • Recall that the Type I error rate, \alpha, is the probability of incorrectly rejecting H_0.

    • i.e., we are saying there is a difference between the means when there is actually not a difference.
  • Suppose we are comparing 5 groups.

    • This is 10 pairwise comparisons!!

      • 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, 4-5
    • If we perform repeated t tests under \alpha=0.05, we are inflating the Type I error to 0.40! 😵
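  • A quick sanity check of that number, under the simplifying assumption that the 10 comparisons are independent:
m_comp <- choose(5, 2)    # 10 pairwise comparisons among 5 groups
1 - (1 - 0.05)^m_comp     # ~0.40 familywise Type I error rate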

Introduction: Posthoc Testing

  • When performing posthoc comparisons, we can choose one of two paths:

    • Control the Type I (familywise) error rate.
    • Do not control the Type I error rate.
  • Note that controlling the Type I error rate is more conservative than not controlling it.

    • “Conservative” = more difficult to reject.
  • Generally, statisticians:

    • do not control the Type I error rate if examining the results of pilot/preliminary studies that are exploring for general relationships.

    • do control the Type I error rate if examining the results of confirmatory studies and are attempting to confirm relationships observed in pilot/preliminary studies.

Introduction: Posthoc Testing

  • The posthoc tests we will learn:

    • Tukey’s test

      • Performs all pairwise tests and controls the Type I error rate
    • Fisher’s least significant difference

      • Performs all pairwise tests but does not control the Type I error rate
    • Dunnett’s test

      • Compares each group to a control group and controls the Type I error rate
  • Caution: we should only perform posthoc tests if we have determined that a general difference exists!

    • i.e., we rejected when looking at the F test in ANOVA

Example

  • Recall the dental example from earlier,
m1 <- aov(strength ~ system, data = data)
summary(m1)
            Df Sum Sq Mean Sq F value  Pr(>F)   
system       3  200.0   66.66   7.545 0.00229 **
Residuals   16  141.4    8.84                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Are we justified in posthoc testing? (Recall: \alpha=0.01).

Tukey’s Test

  • Tukey’s test allows us to do all pairwise comparisons while controlling \alpha.

  • The underlying idea of the comparison:

    • We declare \mu_i \ne \mu_j if |\bar{y}_i - \bar{y}_j| \ge W, where W = \frac{q_{\alpha}(k, \text{df}_{\text{E}})}{\sqrt{2}} \sqrt{\text{MSE} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}

      • q_{\alpha}(k, \text{df}_{\text{E}}) is the critical value from the Studentized range distribution.
  • We will use the TukeyHSD() function.

    • Note that this requires us to have created our model using the aov() function.
m <- aov(continuous_variable ~ grouping_variable, data = dataset_name)
TukeyHSD(m)$grouping_variable

Tukey’s Test

  • Let’s apply Tukey’s to the dental data.
m <- aov(strength ~ system, data = data)
TukeyHSD(m, conf.level = 0.99)$system
                  diff         lwr       upr       p adj
Cimara-Ceramic   -0.14 -7.04151507  6.761515 0.999845202
Cojet-Ceramic     5.50 -1.40151507 12.401515 0.044147158
Silistor-Ceramic  6.86 -0.04151507 13.761515 0.010458208
Cojet-Cimara      5.64 -1.26151507 12.541515 0.038206781
Silistor-Cimara   7.00  0.09848493 13.901515 0.008990873
Silistor-Cojet    1.36 -5.54151507  8.261515 0.886304336
  • Which are significantly different at the \alpha=0.01 level?
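  • As a rough check of the W cutoff from the previous slide, we can compute it by hand with the Studentized range quantile function qtukey() and the dental ANOVA table values (MSE = 8.84, df_E = 16, k = 4, n_i = n_j = 5):
q_crit <- qtukey(0.99, nmeans = 4, df = 16)          # critical value for alpha = 0.01
W <- (q_crit / sqrt(2)) * sqrt(8.84 * (1/5 + 1/5))   # ~6.9
  • Only the Silistor-Cimara difference (7.00) exceeds W, which agrees with the adjusted p-values above.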

Fisher’s Test

  • Fisher’s least significant difference allows us to test all pairwise comparisons, but it does not control \alpha.

  • The underlying idea of the comparison:

    • We declare \mu_i \ne \mu_j if |\bar{y}_i - \bar{y}_j| \ge \text{LSD}, where \text{LSD} = t_{1-\alpha/2, \text{df}_\text{E}} \sqrt{\text{MSE} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}
  • We will use the LSD.test() function from the agricolae package.

    • Note that, like Tukey’s, this requires us to have created our model using the aov() function.
library(agricolae)
results <- summary(m)
(LSD.test(dataset_name$continuous_variable, # continuous outcome
          dataset_name$grouping_variable, # grouping variable
          results[[1]]$Df[2], # df_E
          results[[1]]$`Mean Sq`[2], # MSE
          alpha = alpha_level) # can omit if alpha = 0.05
  )[5] # limit to only the pairwise comparison results

Fisher’s Test

  • Let’s apply Fisher’s to the dental data.
library(agricolae)
results <- summary(m)
LSD.test(data$strength, 
         data$system, 
         results[[1]]$Df[2], 
         results[[1]]$`Mean Sq`[2],
         alpha = 0.01)[5]
$groups
         data$strength groups
Silistor         17.64      a
Cojet            16.28      a
Ceramic          10.78      b
Cimara           10.64      b
  • Which are significantly different at the \alpha=0.01 level?
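  • As a rough check of the LSD cutoff, we can compute it by hand from the dental ANOVA table values (\alpha = 0.01, MSE = 8.84, df_E = 16, group sizes of 5):
t_crit <- qt(1 - 0.01/2, df = 16)          # t critical value, ~2.92
LSD <- t_crit * sqrt(8.84 * (1/5 + 1/5))   # ~5.5
  • Group means that differ by more than the LSD receive different letters above: Silistor and Cojet (letter a) each differ from Ceramic and Cimara (letter b) by more than the LSD, while groups sharing a letter do not.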

Dunnett’s Test

  • Dunnett’s test allows us to compare each treatment group to the control group, while controlling \alpha.

    • This has fewer comparisons than Tukey’s because we are not comparing non-control groups to one another.

    • i.e., we are sharing the \alpha between fewer comparisons now, which is preferred if we are not interested in the comparisons between non-control groups.

  • The underlying idea of the comparison:

    • We declare \mu_i \ne \mu_j if |\bar{y}_i - \bar{y}_j| \ge D, where D = d_{\alpha}(k-1, \text{df}_{\text{E}}) \sqrt{\text{MSE} \left( \frac{1}{n_i} + \frac{1}{n_c} \right)},

      • d_{\alpha}(k-1, \text{df}_{\text{E}}) is the critical value from Dunnett’s table.

Dunnett’s Test

  • We will use the DunnettTest() function from the DescTools package to perform Dunnett’s test.
library(DescTools)
DunnettTest(x=dataset_name$continuous_variable, 
            g=dataset_name$grouping_variable, 
            control = "name of control group")
  • The p-values are adjusted, so you can directly compare them to the specified \alpha.

Dunnett’s Test

  • Let’s apply Dunnett’s to the dental data.

    • We will treat “Ceramic” as the control group.
library(DescTools)
DunnettTest(x=data$strength, 
            g=data$system, 
            control = "Ceramic")

  Dunnett's test for comparing several treatments with a control :  
    95% family-wise confidence level

$Ceramic
                  diff     lwr.ci    upr.ci   pval    
Cimara-Ceramic   -0.14 -5.0138317  4.733832 0.9997    
Cojet-Ceramic     5.50  0.6261683 10.373832 0.0258 *  
Silistor-Ceramic  6.86  1.9861683 11.733832 0.0058 ** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Which are significantly different at the \alpha=0.01 level?