One-Way Analysis of Variance

STA4173: Biostatistics
Spring 2025

Introduction: Analysis of Variance

  • We have previously discussed testing the difference between two groups.

    • What about when there are three or more groups?
  • We will use a method called analysis of variance (ANOVA).

    • This method partitions the variance of the outcome into variance due to the groups and variance due to “other” factors.
  • Fun fact: the two-sample t-test is a special case of ANOVA.

    • If you perform ANOVA when comparing two means, you will obtain the same results as the two-sample t-test.
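  • A quick demonstration in R (with simulated data, not course data): fitting a two-group one-way ANOVA reproduces the pooled two-sample t-test, with matching p-values and F_0 = t^2.
set.seed(1)
y   <- c(rnorm(10, mean = 5), rnorm(10, mean = 6))   # two made-up groups
grp <- factor(rep(c("A", "B"), each = 10))
t.test(y ~ grp, var.equal = TRUE)$p.value       # pooled two-sample t-test
anova(lm(y ~ grp))$`Pr(>F)`[1]                  # one-way ANOVA with k = 2; same p-value
t.test(y ~ grp, var.equal = TRUE)$statistic^2   # equals the ANOVA F statistic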

Hypotheses

  • Hypotheses all take the same form:

    • H_0: \ \mu_1 = \mu_2 = ... = \mu_k
    • H_1: at least one is different
  • Note 1: you must fill in the “k” when writing hypotheses!

    • e.g., if there are four means, your hypotheses are

      • H_0: \ \mu_1 = \mu_2 = \mu_3 = \mu_4
      • H_1: at least one is different
  • Note 2: ANOVA does not tell us which means are different, just if a general difference exists!

ANOVA Table

  • The computations for ANOVA are more involved than what we’ve seen before.

  • An ANOVA table will be constructed in order to perform the hypothesis test.

Source      Sum of Squares   df       Mean Squares   F
Treatment   SS_Trt           df_Trt   MS_Trt         F_0
Error       SS_E             df_E     MS_E
Total       SS_Tot           df_Tot
  • Once this is put together, we can perform the hypothesis test.

    • Our test statistic is F_0.

The F Distribution

  • The F distribution is derived as the ratio of two variances.

    • The variances each have their own degrees of freedom: df_numerator and df_denominator.
  • The F distribution’s shape depends on these two degrees of freedom.
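  • A small sketch of working with the F distribution in R (the df pairs below are illustrative choices, not tied to a specific problem):
qf(0.95, df1 = 3, df2 = 16)                      # critical value cutting off the upper 5% tail
pf(7.5, df1 = 3, df2 = 16, lower.tail = FALSE)   # upper-tail area P[F >= 7.5]
curve(df(x, df1 = 3, df2 = 16), from = 0, to = 5, ylab = "density")   # one df pair
curve(df(x, df1 = 10, df2 = 50), add = TRUE, lty = 2)                 # a different df pair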

ANOVA Computations

  • Again, here’s where we are headed with our computations:
Source      Sum of Squares   df       Mean Squares   F
Treatment   SS_Trt           df_Trt   MS_Trt         F_0
Error       SS_E             df_E     MS_E
Total       SS_Tot           df_Tot
  • We are partitioning the variance of our outcome into:

    • Variance due to the grouping (treatment)

    • Variance due to “other” factors (error)

      • Think of this like a “catch all” for other sources of error – things we did not adjust for in our model.

ANOVA Computations

  • Before we begin our computations, it will be helpful to know

\bar{x}, \ \ n_i, \ \ \bar{x}_i, \ \ s_i^2

  • where,
    • \bar{x} is the overall mean,
    • n_i is the sample size for group i,
    • \bar{x}_i is the mean for group i, and
    • s_i^2 is the variance for group i

ANOVA Computations

  • We begin our computations with the sums of squares:

\begin{align*} \text{SS}_{\text{Trt}} &= \sum_{i=1}^k n_i(\bar{x}_i-\bar{x})^2 \\ \text{SS}_{\text{E}} &= \sum_{i=1}^k (n_i-1)s_i^2 \\ \text{SS}_{\text{Tot}} &= \text{SS}_{\text{Trt}} + \text{SS}_{\text{E}} \end{align*}

  • and each sum of squares has degrees of freedom:
    • \text{df}_{\text{Trt}} = k-1 (number of groups – 1)
    • \text{df}_{\text{E}} = n-k (overall sample size – number of groups)
    • \text{df}_{\text{Tot}} = n-1 (overall sample size – 1) = \text{df}_{\text{Trt}} + \text{df}_{\text{E}}
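  • Here is a sketch of these formulas in R, using made-up summaries for three groups of five (the numbers are illustrative only):
n_i    <- c(5, 5, 5)           # group sample sizes
xbar_i <- c(10.2, 12.8, 9.5)   # group means
s2_i   <- c(4.1, 3.6, 5.0)     # group variances
xbar   <- sum(n_i * xbar_i) / sum(n_i)   # overall mean, weighted by group size

SS_Trt <- sum(n_i * (xbar_i - xbar)^2)   # treatment (between-group) sum of squares
SS_E   <- sum((n_i - 1) * s2_i)          # error (within-group) sum of squares
SS_Tot <- SS_Trt + SS_E

k <- length(n_i); n <- sum(n_i)
df_Trt <- k - 1   # 2
df_E   <- n - k   # 12
df_Tot <- n - 1   # 14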

ANOVA Computations

  • Once we have the sum of squares and corresponding degrees of freedom, we have the mean squares.

  • Generally, a mean square is the sum of squares divided by its df, \text{MS}_X = \frac{\text{SS}_X}{\text{df}_X}

  • In the case of one-way ANOVA, \begin{align*} \text{MS}_{\text{Trt}} &= \frac{\text{SS}_{\text{Trt}}}{\text{df}_{\text{Trt}}} \\ \text{MS}_{\text{E}} &= \frac{\text{SS}_{\text{E}}}{\text{df}_{\text{E}}} \end{align*}

    • Note that there is no \text{MS}_{\text{Tot}}!

ANOVA Computations

  • Finally, we have the test statistic.

  • Generally, we construct an F for ANOVA by dividing the MS of interest by MS_{\text{E}}, F_X = \frac{\text{MS}_X}{\text{MS}_{\text{E}}}

  • In one-way ANOVA, we are only constructing the F for treatment, F_0 = \frac{\text{MS}_{\text{Trt}}}{\text{MS}_{\text{E}}}
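  • Continuing the made-up illustration from the sums-of-squares slide (SS_Trt ≈ 30.23 and SS_E = 50.8, with df_Trt = 2 and df_E = 12), the mean squares and F statistic follow directly:
SS_Trt <- 30.23; df_Trt <- 2
SS_E   <- 50.80; df_E   <- 12
MS_Trt <- SS_Trt / df_Trt   # ~15.1
MS_E   <- SS_E / df_E       # ~4.23
F_0    <- MS_Trt / MS_E     # ~3.6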

ANOVA Computations

  • We are finally done constructing our ANOVA table! As a reminder,
Source      Sum of Squares   df       Mean Squares   F
Treatment   SS_Trt           df_Trt   MS_Trt         F_0
Error       SS_E             df_E     MS_E
Total       SS_Tot           df_Tot

ANOVA: R Syntax

  • We can use the aov() and summary() functions.
m <- aov(continuous_variable ~ grouping_variable,
         data = dataset_name)
summary(m)
  • However, note that ANOVA is regression (and regression is ANOVA).
    • We can also use lm() to define the model and anova() to construct the ANOVA table.
m <- lm(continuous_variable ~ grouping_variable,
         data = dataset_name)
anova(m)

Example - Dental

  • Prosthodontists specialize in the restoration of oral function, including the use of dental implants, veneers, dentures, and crowns. A researcher wanted to compare the shear bond strength of different repair kits for repairs of chipped porcelain veneer.

  • He randomly divided 20 porcelain specimens into four treatment groups: group 1 used the Cojet system, group 2 used the Silistor system, group 3 used the Cimara system, and group 4 used the Ceramic Repair system.

  • At the conclusion of the study, shear bond strength (in megapascals, MPa) was measured according to ISO 10477. The data are as follows,

strength <- c(15.4, 12.9, 17.2, 16.6, 19.3,
              17.2, 14.3, 17.6, 21.6, 17.5,
               5.5,  7.7, 12.2, 11.4, 16.4,
              11.0, 12.4, 13.5,  8.9,  8.1)
system <- c(rep("Cojet",5), rep("Silistor",5), rep("Cimara",5), rep("Ceramic",5))
library(tibble)
data <- tibble(system, strength)

Example - Dental

  • What is the continuous variable?

  • What is the grouping variable?

Example - Dental

  • Our first step will be to construct an ANOVA table for the data.
m1 <- aov(strength ~ system, data = data)
summary(m1)
            Df Sum Sq Mean Sq F value  Pr(>F)   
system       3  200.0   66.66   7.545 0.00229 **
Residuals   16  141.4    8.84                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m2 <- lm(strength ~ system, data = data)
anova(m2)

Hypothesis Testing

  • Hypotheses

    • H_0: \ \mu_1 = \mu_2 = ... = \mu_k
    • H_1: at least one mean is different
  • Test Statistic

    • F_0 (pulled from the ANOVA table)
  • p-Value

    • p = P[F_{\text{df}_{\text{Trt}}, \text{df}_{\text{E}}} \ge F_0]
  • Rejection Region

    • Reject H_0 if p<\alpha.
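  • In R, this p-value is an upper-tail F probability, so pf() reproduces the Pr(>F) column of the ANOVA table. Using the dental output from the previous slide (F_0 = 7.545, df_Trt = 3, df_E = 16):
pf(7.545, df1 = 3, df2 = 16, lower.tail = FALSE)   # ~0.0023, matching Pr(>F)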

Example - Dental

  • Using the dental data:
summary(m1)
            Df Sum Sq Mean Sq F value  Pr(>F)   
system       3  200.0   66.66   7.545 0.00229 **
Residuals   16  141.4    8.84                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Determine if there is a difference in average strength between the groups. Test at the \alpha=0.01 level.

Example - Dental

  • Hypotheses

    • H_0: \ \mu_1 = \mu_2 = \mu_3 = \mu_4
    • H_1: at least one mean is different
  • Test Statistic and p-Value

    • F_0 = 7.545
    • p = 0.002
  • Rejection Region

    • Reject H_0 if p<\alpha; \alpha = 0.01.
  • Conclusion/Interpretation

    • Reject H_0. There is sufficient evidence to suggest that there is a difference in average strength between the four groups.

Introduction: Posthoc Testing

  • Today we have introduced ANOVA. Recall the hypotheses,

    • H_0: \mu_1 = \mu_2 = ... = \mu_k
    • H_1: at least one \mu_i is different
  • The F test does not tell us which mean is different… only that a difference exists.

  • In theory, we could perform repeated t tests to determine pairwise differences.

    • Recall that ANOVA is an extension of the t test… or that the t test is a special case of ANOVA.

    • However, this will increase the Type I error rate (\alpha).

Introduction: Posthoc Testing

  • Recall that the Type I error rate, \alpha, is the probability of incorrectly rejecting H_0.

    • i.e., we are saying there is a difference between the means when there is actually not a difference.
  • Suppose we are comparing 5 groups.

    • This is 10 pairwise comparisons!!

      • 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, 4-5
    • If we perform repeated t tests under \alpha=0.05, we are inflating the Type I error to 0.40! 😵
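  • A quick sanity check of that number, under the simplifying assumption that the 10 comparisons are independent:
m_comp <- choose(5, 2)    # 10 pairwise comparisons among 5 groups
1 - (1 - 0.05)^m_comp     # ~0.40 familywise Type I error rate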

Introduction: Posthoc Testing

  • When performing posthoc comparisons, we can choose one of two paths:

    • Control the Type I (familywise) error rate.
    • Do not control the Type I error rate.
  • Note that controlling the Type I error rate is more conservative than not controlling it.

    • “Conservative” = more difficult to reject.
  • Generally, statisticians:

    • do not control the Type I error rate if examining the results of pilot/preliminary studies that are exploring for general relationships.

    • do control the Type I error rate if examining the results of confirmatory studies and are attempting to confirm relationships observed in pilot/preliminary studies.

Introduction: Posthoc Testing

  • The posthoc tests we will learn:

    • Tukey’s test

      • Performs all pairwise tests and controls the Type I error rate
    • Fisher’s least significant difference

      • Performs all pairwise tests but does not control the Type I error rate
    • Dunnett’s test

      • Compares each group to a control group and controls the Type I error rate
  • Caution: we should only perform posthoc tests if we have determined that a general difference exists!

    • i.e., we rejected when looking at the F test in ANOVA

Example

  • Recall the dental example from earlier,
m1 <- aov(strength ~ system, data = data)
summary(m1)
            Df Sum Sq Mean Sq F value  Pr(>F)   
system       3  200.0   66.66   7.545 0.00229 **
Residuals   16  141.4    8.84                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Are we justified in posthoc testing? (Recall: \alpha=0.01).

Tukey’s Test

  • Tukey’s test allows us to do all pairwise comparisons while controlling \alpha.

  • The underlying idea of the comparison:

    • We declare \mu_i \ne \mu_j if |\bar{y}_i - \bar{y}_j| \ge W, where W = \frac{q_{\alpha}(k, \text{df}_{\text{E}})}{\sqrt{2}} \sqrt{\text{MSE} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}

      • q_{\alpha}(k, \text{df}_{\text{E}}) is the critical value from the Studentized range distribution.
  • We will use the TukeyHSD() function.

    • Note that this requires us to have created our model using the aov() function.
m <- aov(continuous_variable ~ grouping_variable, data = dataset_name)
TukeyHSD(m)$grouping_variable

Tukey’s Test

  • Let’s apply Tukey’s to the dental data.
m <- aov(strength ~ system, data = data)
TukeyHSD(m, conf.level = 0.99)$system
                  diff         lwr       upr       p adj
Cimara-Ceramic   -0.14 -7.04151507  6.761515 0.999845202
Cojet-Ceramic     5.50 -1.40151507 12.401515 0.044147158
Silistor-Ceramic  6.86 -0.04151507 13.761515 0.010458208
Cojet-Cimara      5.64 -1.26151507 12.541515 0.038206781
Silistor-Cimara   7.00  0.09848493 13.901515 0.008990873
Silistor-Cojet    1.36 -5.54151507  8.261515 0.886304336
  • Which are significantly different at the \alpha=0.01 level?
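  • As a rough check of the W cutoff from the previous slide, we can compute it by hand with the Studentized range quantile function qtukey() and the dental ANOVA table values (MSE = 8.84, df_E = 16, k = 4, n_i = n_j = 5):
q_crit <- qtukey(0.99, nmeans = 4, df = 16)          # critical value for alpha = 0.01
W <- (q_crit / sqrt(2)) * sqrt(8.84 * (1/5 + 1/5))   # ~6.9
  • Only the Silistor-Cimara difference (7.00) exceeds W, which agrees with the adjusted p-values above.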

Fisher’s Test

  • Fisher’s least significant difference allows us to test all pairwise comparisons, but it does not control \alpha.

  • The underlying idea of the comparison:

    • We declare \mu_i \ne \mu_j if |\bar{y}_i - \bar{y}_j| \ge \text{LSD}, where \text{LSD} = t_{1-\alpha/2, \text{df}_\text{E}} \sqrt{\text{MSE} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}
  • We will use the LSD.test() function from the agricolae package.

    • Note that, like Tukey’s, this requires us to have created our model using the aov() function.
library(agricolae)
results <- summary(m)
(LSD.test(dataset_name$continuous_variable, # continuous outcome
          dataset_name$grouping_variable, # grouping variable
          results[[1]]$Df[2], # df_E
          results[[1]]$`Mean Sq`[2], # MSE
          alpha = alpha_level) # can omit if alpha = 0.05
  )[5] # limit to only the pairwise comparison results

Fisher’s Test

  • Let’s apply Fisher’s to the dental data.
library(agricolae)
results <- summary(m)
LSD.test(data$strength, 
         data$system, 
         results[[1]]$Df[2], 
         results[[1]]$`Mean Sq`[2],
         alpha = 0.01)[5]
$groups
         data$strength groups
Silistor         17.64      a
Cojet            16.28      a
Ceramic          10.78      b
Cimara           10.64      b
  • Which are significantly different at the \alpha=0.01 level?
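  • As a rough check of the LSD cutoff, we can compute it by hand from the dental ANOVA table values (\alpha = 0.01, MSE = 8.84, df_E = 16, group sizes of 5):
t_crit <- qt(1 - 0.01/2, df = 16)          # t critical value, ~2.92
LSD <- t_crit * sqrt(8.84 * (1/5 + 1/5))   # ~5.5
  • Group means that differ by more than the LSD receive different letters above: Silistor and Cojet (letter a) each differ from Ceramic and Cimara (letter b) by more than the LSD, while groups sharing a letter do not.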

Dunnett’s Test

  • Dunnett’s test allows us to compare each treatment group to the control group, while controlling \alpha.

    • This has fewer comparisons than Tukey’s because we are not comparing non-control groups to one another.

    • i.e., we are sharing the \alpha between fewer comparisons now, which is preferred if we are not interested in the comparisons between non-control groups.

  • The underlying idea of the comparison:

    • We declare \mu_i \ne \mu_j if |\bar{y}_i - \bar{y}_j| \ge D, where D = d_{\alpha}(k-1, \text{df}_{\text{E}}) \sqrt{\text{MSE} \left( \frac{1}{n_i} + \frac{1}{n_c} \right)},

      • d_{\alpha}(k-1, \text{df}_{\text{E}}) is the critical value from Dunnett’s table.

Dunnett’s Test

  • We will use the DunnettTest() function from the DescTools package to perform Dunnett’s test.
library(DescTools)
DunnettTest(x=dataset_name$continuous_variable, 
            g=dataset_name$grouping_variable, 
            control = "name of control group")
  • The p-values are adjusted, so you can directly compare them to the specified \alpha.

Dunnett’s Test

  • Let’s apply Dunnett’s to the dental data.

    • We will treat “Ceramic” as the control group.
library(DescTools)
DunnettTest(x=data$strength, 
            g=data$system, 
            control = "Ceramic")

  Dunnett's test for comparing several treatments with a control :  
    95% family-wise confidence level

$Ceramic
                  diff     lwr.ci    upr.ci   pval    
Cimara-Ceramic   -0.14 -5.0138317  4.733832 0.9997    
Cojet-Ceramic     5.50  0.6261683 10.373832 0.0258 *  
Silistor-Ceramic  6.86  1.9861683 11.733832 0.0058 ** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Which are significantly different at the \alpha=0.01 level?