Module 2 Review

Putting It All Together

  • One-way ANOVA / Kruskal-Wallis:
    • Continuous outcome.
    • Only one factor (grouping variable).
      • More than two groups.
  • Two-way ANOVA:
    • Continuous outcome.
    • Two factors (grouping variables), A & B.
      • Two or more groups in each factor.
    • Interaction between factors A & B.
  • ANOVA assumptions:
    • Normality of residuals.
    • Equal variances.

Example 1

  • Leif has started a veggie-gardening club in Animal Crossing. Four popular fertilizer options are being tried on carrot plots:

    1. Nook’s GreenGro
    2. Leif’s Compost Tea
    3. Brewster’s Grounds
    4. No fertilizer (control)
  • Each option is applied to several 1 m² carrot plots (one fertilizer per plot). At harvest, villagers record the carrot yield (kg) from each plot.

  • The research question is, do mean carrot yields differ among the four fertilizer options?

Example 1

  • Looking at summary statistics,
# A tibble: 4 × 4
  fertilizer         variable mean_sd   median_iqr
  <fct>              <chr>    <chr>     <chr>     
1 No Fertilizer      yield_kg 1.5 (0.3) 1.5 (0.5) 
2 Nook's GreenGro    yield_kg 1.8 (0.3) 1.7 (0.5) 
3 Leif's Compost Tea yield_kg 2.1 (0.3) 2.2 (0.3) 
4 Brewster's Grounds yield_kg 2.4 (0.3) 2.4 (0.3) 

Example 1

  • Checking the ANOVA assumptions,
  • Normality (qq plots) looks okay.

  • Variance looks okay.

Example 1

  • Checking the ANOVA assumptions,
Brown-Forsythe-Levene test for equality of variances:
Null: σ²_Brewster's Grounds = σ²_Leif's Compost Tea = σ²_No Fertilizer = σ²_Nook's GreenGro 
Alternative: At least one variance is different 
Test statistic: F(3,76) = 0.093 
p-value: p = 0.964
Conclusion: Fail to reject the null hypothesis (p = 0.964 ≥ α = 0.05)
  • Is the variance assumption broken?

Example 1

  • Constructing the one-way ANOVA test,
One-Way ANOVA: 
H₀: μ_No Fertilizer = μ_Nook's GreenGro = μ_Leif's Compost Tea = μ_Brewster's Grounds
H₁: At least one group mean is different
Test Statistic: F(3, 76) = 28.785
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p < 0.001 < α = 0.05)
  • Is there a significant difference between the fertilizers?

Example 1

  • Suppose we want to look at posthoc tests to see which fertilizers differ.

  • If we want to adjust, we will use Tukey’s,

  • Which fertilizers differ significantly?

Example 1

  • Suppose we want to look at posthoc tests to see which fertilizers differ.

  • If we do not want to adjust, we will use Fisher’s,

  • Which fertilizers differ significantly?

Example 2

  • Leif notices some strange patterns this season. We are now testing the four fertilizer options on 1 m² tomato plots:

    1. Nook’s GreenGro
    2. Leif’s Compost Tea
    3. Brewster’s Grounds
    4. No fertilizer (control)
  • Each option is applied to several 1 m² tomato plots (one fertilizer per plot). At harvest, villagers record the tomato yield (kg) from each plot.

  • The research question is, do mean tomato yields differ among the four fertilizer options?

Example 2

  • Looking at summary statistics,
# A tibble: 4 × 4
  fertilizer         variable mean_sd   median_iqr
  <fct>              <chr>    <chr>     <chr>     
1 No Fertilizer      yield_kg 1.7 (0.2) 1.6 (0.3) 
2 Nook's GreenGro    yield_kg 1.9 (0.3) 2.0 (0.6) 
3 Leif's Compost Tea yield_kg 2.3 (0.9) 1.8 (1.1) 
4 Brewster's Grounds yield_kg 2.3 (0.5) 2.3 (0.5) 
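An informal way to see why the equal-variance assumption may be in doubt is to compare the largest and smallest group standard deviations in the table above. The sketch below does only that. The cutoff of 2 is a common rule of thumb assumed here for illustration, not something these slides prescribe, and it does not replace a formal test.

```python
# Standard deviations (kg) copied from the summary tables in Examples 1 and 2.
carrot_sd = {
    "No Fertilizer": 0.3,
    "Nook's GreenGro": 0.3,
    "Leif's Compost Tea": 0.3,
    "Brewster's Grounds": 0.3,
}
tomato_sd = {
    "No Fertilizer": 0.2,
    "Nook's GreenGro": 0.3,
    "Leif's Compost Tea": 0.9,
    "Brewster's Grounds": 0.5,
}

# Assumed informal cutoff (rule of thumb), not taken from the slides.
RULE_OF_THUMB = 2.0


def sd_ratio(sds):
    """Return the largest group SD divided by the smallest group SD."""
    values = list(sds.values())
    return max(values) / min(values)


for label, sds in [("carrot", carrot_sd), ("tomato", tomato_sd)]:
    ratio = sd_ratio(sds)
    verdict = "looks unequal" if ratio > RULE_OF_THUMB else "looks similar"
    print(f"{label}: largest/smallest SD = {ratio:.1f} ({verdict})")
```

The carrot groups share one SD, while the tomato SDs range from 0.2 to 0.9.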

Example 2

  • Checking the ANOVA assumptions,
  • Normality (qq plots) looks okay…?

  • Variance is questionable…

Example 2

  • Checking the ANOVA assumptions,
Brown-Forsythe-Levene test for equality of variances:
Null: σ²_No Fertilizer = σ²_Nook's GreenGro = σ²_Leif's Compost Tea = σ²_Brewster's Grounds 
Alternative: At least one variance is different 
Test statistic: F(3,68) = 3.908 
p-value: p = 0.012
Conclusion: Reject the null hypothesis (p = 0.012 < α = 0.05)
  • Is the variance assumption broken?
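The decision made from the variance test in Examples 1 and 2 can be sketched as a small rule. The helper name `choose_one_factor_test` is hypothetical, and the rule simply mirrors the workflow these slides follow (ANOVA when equal variances are not rejected, Kruskal-Wallis otherwise); it is not a universal recommendation.

```python
ALPHA = 0.05  # significance level used throughout the slides


def choose_one_factor_test(variance_test_p, alpha=ALPHA):
    """Hypothetical helper: pick a one-factor test from the equal-variance p-value."""
    if variance_test_p < alpha:
        # Equal variances rejected: use the rank-based alternative.
        return "Kruskal-Wallis"
    # Equal variances not rejected: proceed with the parametric test.
    return "one-way ANOVA"


example_1 = choose_one_factor_test(0.964)  # carrots, Brown-Forsythe-Levene p-value
example_2 = choose_one_factor_test(0.012)  # tomatoes, Brown-Forsythe-Levene p-value
print(example_1, "|", example_2)
```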

Example 2

  • Constructing the Kruskal-Wallis test,
Kruskal–Wallis Rank-Sum Test

H₀: M_No Fertilizer = M_Nook's GreenGro = M_Leif's Compost Tea = M_Brewster's Grounds
H₁: At least one group is different

Test Statistic: χ²(3) = 16.959
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p < 0.001 < α = 0.05)
  • Is there a significant difference between the fertilizers?
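The reported p-value can be checked by hand, because the Kruskal-Wallis statistic is compared against a chi-square distribution with (number of groups − 1) = 3 degrees of freedom. The sketch below uses a closed-form tail probability that holds only for 3 degrees of freedom; the function name `chi2_sf_df3` is made up for this illustration.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df.

    For exactly 3 degrees of freedom the tail has a closed form:
    erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(16.959)  # test statistic from the output above
print(f"p = {p_value:.5f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

As a sanity check, the same function returns roughly 0.05 at 7.815, the usual 5% critical value for 3 degrees of freedom.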

Example 2

  • Suppose we want to look at posthoc tests to see which fertilizers differ.

  • If we want to adjust, we will use the adjusted version of Dunn’s test,

                               Comparison          Z      p
1 Brewster's Grounds - Leif's Compost Tea  1.4414191  0.897
2      Brewster's Grounds - No Fertilizer  4.0136753 <0.001
3      Leif's Compost Tea - No Fertilizer  2.5722562  0.061
4    Brewster's Grounds - Nook's GreenGro  2.3492742  0.113
5    Leif's Compost Tea - Nook's GreenGro  0.9078551  1.000
6         No Fertilizer - Nook's GreenGro -1.6644010  0.576
  • Which fertilizers differ significantly?

Example 2

  • Suppose we want to look at posthoc tests to see which fertilizers differ.

  • If we do not want to adjust, we will use the unadjusted version of Dunn’s test,

                               Comparison          Z      p
1 Brewster's Grounds - Leif's Compost Tea  1.4414191  0.149
2      Brewster's Grounds - No Fertilizer  4.0136753 <0.001
3      Leif's Compost Tea - No Fertilizer  2.5722562  0.010
4    Brewster's Grounds - Nook's GreenGro  2.3492742  0.019
5    Leif's Compost Tea - Nook's GreenGro  0.9078551  0.364
6         No Fertilizer - Nook's GreenGro -1.6644010  0.096
  • Which fertilizers differ significantly?
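The two Dunn's test tables share the same Z statistics and differ only in the p column. The sketch below recomputes both columns from Z. It assumes the adjustment is Bonferroni (multiply each p-value by the number of comparisons and cap at 1); the slides do not name the method, but that assumption reproduces the adjusted values shown.

```python
import math

# Z statistics copied from the Dunn's test tables above.
dunn_z = {
    "Brewster's Grounds - Leif's Compost Tea": 1.4414191,
    "Brewster's Grounds - No Fertilizer": 4.0136753,
    "Leif's Compost Tea - No Fertilizer": 2.5722562,
    "Brewster's Grounds - Nook's GreenGro": 2.3492742,
    "Leif's Compost Tea - Nook's GreenGro": 0.9078551,
    "No Fertilizer - Nook's GreenGro": -1.6644010,
}


def two_sided_p(z):
    """Two-sided standard normal p-value for a Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni(p, n_tests):
    """Assumed adjustment: multiply by the number of tests and cap at 1."""
    return min(1.0, p * n_tests)


n_tests = len(dunn_z)  # 4 groups give 6 pairwise comparisons
unadjusted = {pair: two_sided_p(z) for pair, z in dunn_z.items()}
adjusted = {pair: bonferroni(p, n_tests) for pair, p in unadjusted.items()}

for pair in dunn_z:
    print(f"{pair:<40} {unadjusted[pair]:.3f} {adjusted[pair]:.3f}")

n_sig_unadjusted = sum(p < 0.05 for p in unadjusted.values())
n_sig_adjusted = sum(p < 0.05 for p in adjusted.values())
```

Under these assumptions, three comparisons fall below α = 0.05 before adjustment and only one remains below it afterward, which shows how the adjustment trades power for control of the overall error rate.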

Example 3

  • Leif is now testing the four fertilizer options on both carrot and tomato plots:
    1. Nook’s GreenGro
    2. Leif’s Compost Tea
    3. Brewster’s Grounds
    4. No fertilizer (control)
  • The research questions are:
    1. Is there an interaction between fertilizer type and crop type on yield?
    2. If no interaction, do mean crop yields differ among the four fertilizer options?
    3. If no interaction, do mean crop yields differ between carrot and tomato plots?

Example 3

  • Looking at summary statistics,
# A tibble: 8 × 5
  crop   fertilizer         variable mean_sd   median_iqr
  <fct>  <fct>              <chr>    <chr>     <chr>     
1 Carrot No Fertilizer      yield_kg 1.7 (0.3) 1.7 (0.3) 
2 Carrot Nook's GreenGro    yield_kg 1.8 (0.3) 1.7 (0.3) 
3 Carrot Leif's Compost Tea yield_kg 2.2 (0.4) 2.0 (0.6) 
4 Carrot Brewster's Grounds yield_kg 2.4 (0.3) 2.5 (0.5) 
5 Tomato No Fertilizer      yield_kg 1.7 (0.3) 1.7 (0.4) 
6 Tomato Nook's GreenGro    yield_kg 1.9 (0.3) 2.0 (0.3) 
7 Tomato Leif's Compost Tea yield_kg 2.4 (0.3) 2.3 (0.4) 
8 Tomato Brewster's Grounds yield_kg 2.5 (0.4) 2.5 (0.5) 

Example 3

  • Checking the two-way ANOVA assumptions,

Example 3

  • Examining the two-way ANOVA (remember to always check for the interaction first!),
Two-Way ANOVA Table
Source          Sum of Squares   df   Mean Squares       F         p
Regression               12.24    7
  • fertilizer           11.65    3           3.88   35.09   < 0.001
  • crop                  0.28    1           0.28    2.49     0.117
  • Interaction           0.31    3           0.10    0.93     0.431
Error                    12.40  112           0.11
Total                    24.63  119

  • Is the interaction term significant?

  • Should we keep or remove the interaction term?
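Each F statistic in the table is a mean square divided by the error mean square, where a mean square is a sum of squares divided by its degrees of freedom. The sketch below recomputes the F column from the rounded values printed in the Example 3 table, so small differences from the printed F values are expected.

```python
# Sums of squares and degrees of freedom copied from the Example 3 table above.
effects = {
    "fertilizer": (11.65, 3),
    "crop": (0.28, 1),
    "interaction": (0.31, 3),
}
ss_error, df_error = 12.40, 112


def mean_square(ss, df):
    """Mean square = sum of squares / degrees of freedom."""
    return ss / df


ms_error = mean_square(ss_error, df_error)

# F = (effect mean square) / (error mean square)
f_stats = {
    name: mean_square(ss, df) / ms_error
    for name, (ss, df) in effects.items()
}

for name, f in f_stats.items():
    print(f"{name:<12} F = {f:.2f}")
```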

Example 3

  • Removing the interaction term,
Two-Way ANOVA Table
Source          Sum of Squares   df   Mean Squares       F         p
Regression               11.93    4
  • fertilizer           11.65    3           3.88   35.16   < 0.001
  • crop                  0.28    1           0.28    2.50     0.117
Error                    12.71  115           0.11
Total                    24.63  119

  • Are there significant main effects?
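Comparing the two Example 3 tables shows what removing the interaction does: its sum of squares and degrees of freedom are folded into the error row. The sketch below checks that bookkeeping using the rounded values printed in the tables; it is an arithmetic check, not a refit of the model.

```python
# Values copied from the Example 3 table that includes the interaction.
ss_error_full, df_error_full = 12.40, 112
ss_interaction, df_interaction = 0.31, 3

# Dropping the interaction pools its SS and df into the error term.
ss_error_reduced = ss_error_full + ss_interaction
df_error_reduced = df_error_full + df_interaction
ms_error_reduced = ss_error_reduced / df_error_reduced

print(f"Error SS = {ss_error_reduced:.2f}")
print(f"Error df = {df_error_reduced}")
print(f"Error MS = {ms_error_reduced:.2f}")
```

The pooled values agree with the Error row of the table without the interaction.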

Example 3

  • Looking at the tests for main effects,
Test for Main Effect fertilizer:

H₀: μ_No Fertilizer = μ_Nook's GreenGro = μ_Leif's Compost Tea = μ_Brewster's Grounds
H₁: At least one mean is different.
Test Statistic: F(3, 115) = 35.16
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p < 0.001 < α = 0.05)

Test for Main Effect crop:

H₀: μ_Carrot = μ_Tomato
H₁: μ_Carrot ≠ μ_Tomato
Test Statistic: F(1, 115) = 2.50
p-value: p = 0.117
Conclusion: Fail to reject the null hypothesis (p = 0.117 ≥ α = 0.05)
  • Is fertilizer a significant factor for crop yield?

  • Is crop type a significant factor for crop yield?

Example 3

  • Looking at the profile plot,
  • What statements can we make about this analysis?

Example 4

  • Leif is now testing the four fertilizer options on pumpkin, wheat, and potato plots:
    1. Nook’s GreenGro
    2. Leif’s Compost Tea
    3. Brewster’s Grounds
    4. No fertilizer (control)
  • The research questions are:
    1. Is there an interaction between fertilizer type and crop type on yield?
    2. If no interaction, do mean crop yields differ among the four fertilizer options?
    3. If no interaction, do mean crop yields differ among the three crop types?

Example 4

  • Looking at summary statistics,
# A tibble: 12 × 5
   crop    fertilizer         variable mean_sd   median_iqr
   <fct>   <fct>              <chr>    <chr>     <chr>     
 1 Pumpkin No Fertilizer      yield_kg 2.0 (0.2) 1.9 (0.2) 
 2 Pumpkin Nook's GreenGro    yield_kg 2.3 (0.4) 2.1 (0.5) 
 3 Pumpkin Leif's Compost Tea yield_kg 2.5 (0.4) 2.5 (0.6) 
 4 Pumpkin Brewster's Grounds yield_kg 2.8 (0.3) 2.6 (0.3) 
 5 Wheat   No Fertilizer      yield_kg 1.8 (0.2) 1.8 (0.3) 
 6 Wheat   Nook's GreenGro    yield_kg 1.9 (0.2) 1.9 (0.1) 
 7 Wheat   Leif's Compost Tea yield_kg 2.4 (0.3) 2.4 (0.3) 
 8 Wheat   Brewster's Grounds yield_kg 2.3 (0.3) 2.3 (0.4) 
 9 Potato  No Fertilizer      yield_kg 2.3 (0.3) 2.2 (0.3) 
10 Potato  Nook's GreenGro    yield_kg 2.2 (0.3) 2.2 (0.5) 
11 Potato  Leif's Compost Tea yield_kg 2.3 (0.3) 2.4 (0.3) 
12 Potato  Brewster's Grounds yield_kg 2.4 (0.3) 2.4 (0.5) 

Example 4

  • Checking the two-way ANOVA assumptions,

Example 4

  • Examining the two-way ANOVA (remember to always check for the interaction first!),
Two-Way ANOVA Table
Source          Sum of Squares   df   Mean Squares       F         p
Regression               10.00   11
  • fertilizer            6.00    3           2.00   22.72   < 0.001
  • crop                  1.92    2           0.96   10.92   < 0.001
  • Interaction           2.08    6           0.35    3.94     0.001
Error                    11.62  132           0.09
Total                    21.62  143

  • Is the interaction term significant?

  • Should we keep or remove the interaction term?
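The df column of a two-way ANOVA table follows directly from the design. The sketch below computes it from the number of levels in each factor and the total number of plots. The function name `two_way_df` is hypothetical, and the total of 144 plots is inferred from the Total df of 143 rather than stated in the slides.

```python
def two_way_df(levels_a, levels_b, n_total):
    """Degrees of freedom for a two-way ANOVA that includes the interaction."""
    df_a = levels_a - 1
    df_b = levels_b - 1
    df_ab = df_a * df_b                 # interaction
    df_model = df_a + df_b + df_ab      # equals levels_a * levels_b - 1
    df_total = n_total - 1
    df_error = df_total - df_model
    return {
        "A": df_a,
        "B": df_b,
        "A:B": df_ab,
        "model": df_model,
        "error": df_error,
        "total": df_total,
    }


# Example 4: 4 fertilizers, 3 crops, 144 plots (inferred from Total df = 143).
example_4 = two_way_df(levels_a=4, levels_b=3, n_total=144)
print(example_4)
```

The same function with 4 fertilizers, 2 crops, and 120 plots gives the df column of the Example 3 table.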

Example 4

  • Looking at the test for the interaction,
Test for Interaction (fertilizer × crop):

H₀: The relationship between yield_kg and fertilizer does not depend on crop.
H₁: The relationship between yield_kg and fertilizer depends on crop.
Test Statistic: F(6, 132) = 3.94
p-value: p = 0.001
Conclusion: Reject the null hypothesis (p = 0.001 < α = 0.05)
  • Is there a significant interaction?

Example 4

  • Looking at the profile plot,
  • What statements can we make about this analysis?
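A profile plot is drawn from the cell means, so the interaction can also be read from the Example 4 summary table. The sketch below computes each fertilizer's gain over the unfertilized control within each crop. Using the control as the baseline is a choice made for this illustration; with no interaction, the gains would be about the same in every crop (parallel lines on the plot).

```python
# Cell means (kg) copied from the Example 4 summary table.
means = {
    "Pumpkin": {"No Fertilizer": 2.0, "Nook's GreenGro": 2.3,
                "Leif's Compost Tea": 2.5, "Brewster's Grounds": 2.8},
    "Wheat": {"No Fertilizer": 1.8, "Nook's GreenGro": 1.9,
              "Leif's Compost Tea": 2.4, "Brewster's Grounds": 2.3},
    "Potato": {"No Fertilizer": 2.3, "Nook's GreenGro": 2.2,
               "Leif's Compost Tea": 2.3, "Brewster's Grounds": 2.4},
}


def gain_over_control(crop_means, control="No Fertilizer"):
    """Difference between each fertilizer's mean and the control mean."""
    base = crop_means[control]
    return {
        fertilizer: round(mean - base, 1)
        for fertilizer, mean in crop_means.items()
        if fertilizer != control
    }


gains = {crop: gain_over_control(crop_means) for crop, crop_means in means.items()}
for crop, crop_gains in gains.items():
    print(crop, crop_gains)
```

Brewster's Grounds, for instance, gains 0.8 kg over the control for pumpkin but only 0.1 kg for potato, which is the kind of crop-dependent fertilizer effect the interaction test picks up.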

Wrap Up

  • Today’s review:
    • One-way ANOVA
    • Kruskal-Wallis
    • Posthoc testing
    • Two-way ANOVA
    • Interaction terms
    • Profile plots
    • ANOVA Assumptions
  • Next class:
    • Regression!!!!! 💕