All t-tests assume approximate normality of the data.
In the case of a one-sample t-test, the measure of interest should approximately follow a normal distribution.
In the case of a two-sample t-test, the measure of interest in each group should approximately follow a normal distribution.
Note that a paired t-test is technically a one-sample t-test, so we will examine normality of the differences.
There are formal tests for normality; however, we will not use them here.
Instead, we will assess normality using a quantile-quantile (Q-Q) plot.
A Q-Q plot helps us visually check if our data follows a specific distribution (here, the normal).
How do we read Q-Q plots? If the points fall roughly along the reference line, the data are consistent with the proposed distribution; systematic curvature or S-shaped departures suggest non-normality.
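Before turning to the ssstats helpers, a minimal base-R illustration of the same idea (x is a hypothetical sample, not from our data):
set.seed(1)
x <- rnorm(30)  # hypothetical sample
qqnorm(x)       # sample quantiles vs. theoretical normal quantiles
qqline(x)       # reference line; points close to it suggest approximate normality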
Recall the test we ran earlier for our example:
wing_flap %>% independent_mean_HT(grouping = target,
continuous = apples,
mu = 5,
alternative = "greater",
alpha = 0.05)
Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ ≤ 5
Alternative: H₁: μ₁ − μ₂ > 5
Test statistic: t(23) = 5.445
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p < 0.001 < α = 0.05)
We will use the independent_qq() function from library(ssstats) to assess normality.
Let’s now look at the normality assumption for our example.
How should we change the code for our dataset?
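The slide’s code is not preserved in these notes; a minimal sketch, assuming independent_qq() shares the grouping/continuous interface of independent_mean_HT():
wing_flap %>% independent_qq(grouping = target,
                             continuous = apples)
# one Q-Q panel per group; points near the reference line support normality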
In addition to normality, the two-sample t-test assumes equal variance between groups.
We can check this assumption and easily adjust if the assumption is broken.
Graphical method: scatterplot of residuals
Formal method: test for equal variances (Brown-Forsythe-Levene)
We will use the plot_residuals() function from library(ssstats) to graphically assess the assumption of equal variance.
Let’s now look at the variance assumption for our example.
How should we change the code for our dataset?
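The updated code is not preserved here; a minimal sketch, assuming plot_residuals() uses the same grouping/continuous arguments:
wing_flap %>% plot_residuals(grouping = target,
                             continuous = apples)
# roughly equal vertical spread of residuals across groups supports equal variance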
If we believe the assumption may be violated, we can test for equal variance using the Brown-Forsythe-Levene (BFL) test.
This test is valid for more than two groups (read: we will see it again!)
Hypotheses:
H₀: σ₁² = σ₂² = … = σₖ²
H₁: at least one σᵢ² is different
Test statistic, where z_{ij} = |y_{ij} - \tilde{y}_i| is the absolute deviation of observation j in group i from its group median \tilde{y}_i, \text{df}_{\text{num}} = k - 1, and \text{df}_{\text{den}} = N - k:
F_0 = \frac{\sum_{i=1}^k n_i(\bar{z}_{i.}-\bar{z}_{..})^2/(k-1)}{\sum_{i=1}^k \sum_{j=1}^{n_i} (z_{ij}-\bar{z}_{i.})^2/(N-k)} \sim F_{\text{df}_{\text{num}}, \text{df}_{\text{den}}}
Note that the BFL is an upper-tailed test, which is different from when we test means using the t distribution.
p-value:
p = P\left[F_{\text{df}_{\text{num}}, \text{df}_{\text{den}}} \ge F_0\right]
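To see what the statistic computes, here is an illustrative by-hand version in base R (simulated data; car::leveneTest() with center = median performs the same test):
set.seed(42)
g <- factor(rep(c("Above", "Below"), each = 10))    # two hypothetical groups
y <- rnorm(20, mean = ifelse(g == "Above", 10, 0))  # hypothetical response
z <- abs(y - ave(y, g, FUN = median))               # absolute deviations from group medians
anova(lm(z ~ g))                                    # the one-way ANOVA F on z is the BFL statistic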
We will use the variances_HT() function from library(ssstats).
Let’s now test the variance assumption for our example.
How should we change the code for our dataset?
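The slide’s call is not preserved; a plausible sketch that would produce the output below, assuming variances_HT() follows the same grouping/continuous pattern as independent_mean_HT():
wing_flap %>% variances_HT(grouping = target,
                           continuous = apples,
                           alpha = 0.05)  # argument names assumed, not confirmed ssstats syntax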
Brown-Forsythe-Levene test for equality of variances:
Null: σ²_Above = σ²_Below
Alternative: At least one variance is different
Test statistic: F(1,23) = 0.063
p-value: p = 0.804
Conclusion: Fail to reject the null hypothesis (p = 0.8045 ≥ α = 0.05)
What do we do if we have actually broken the variance assumption?
If the normality assumption holds, we can use Satterthwaite’s approximation for degrees of freedom.
\text{df}=\frac{ \left( \frac{s^2_1}{n_1} + \frac{s_2^2}{n_2} \right)^2 }{ \frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1} }
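A quick numeric check of the formula (illustrative helper, not part of ssstats; the inputs below are made up):
satterthwaite_df <- function(s1, s2, n1, n2) {
  num <- (s1^2 / n1 + s2^2 / n2)^2
  den <- (s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1)
  num / den  # generally non-integer and no larger than n1 + n2 - 2
}
satterthwaite_df(s1 = 2.1, s2 = 4.8, n1 = 12, n2 = 13)  # hypothetical summary statistics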
We will still use the independent_mean_CI() function from library(ssstats) to find the confidence interval; its generic syntax now takes a variance argument at the end.
We will use the independent_mean_HT() function from library(ssstats) to perform the necessary calculations for the hypothesis test, again with the variance argument at the end. A sketch of both calls is below.
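The generic-syntax slides are not preserved in these notes; a minimal sketch, assuming the variance argument accepts "equal" or "unequal" (these values are assumptions, not confirmed ssstats syntax):
wing_flap %>% independent_mean_CI(grouping = target,
                                  continuous = apples,
                                  variance = "equal")  # or "unequal" for Satterthwaite
# a confidence-level argument presumably exists; its name is not shown in these notes
wing_flap %>% independent_mean_HT(grouping = target,
                                  continuous = apples,
                                  mu = 0,
                                  alternative = "two.sided",
                                  alpha = 0.10,
                                  variance = "unequal")  # assumed value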
Let’s look at the 90% CI for our example under both ways of calculating degrees of freedom:
Assuming equal variance:
The point estimate for the difference in means is x̄₁ − x̄₂ = 10.0556
The 90% confidence interval for μ₁ − μ₂ is (8.4642, 11.647)
Assuming unequal variance (Satterthwaite):
The point estimate for the difference in means is x̄₁ − x̄₂ = 10.0556
The 90% confidence interval for μ₁ − μ₂ is (8.3697, 11.7415)
Let’s look at the hypothesis test results for our example under both ways of calculating degrees of freedom (α = 0.10):
Assuming equal variance:
Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ = 0
Alternative: H₁: μ₁ − μ₂ ≠ 0
Test statistic: t(23) = 10.829
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p < 0.001 < α = 0.10)
Assuming unequal variance:
Two-sample t-test for two independent means and unequal variance:
Null: H₀: μ₁ − μ₂ = 0
Alternative: H₁: μ₁ − μ₂ ≠ 0
Test statistic: t(15.06) = 10.454
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p < 0.001 < α = 0.10)
Now consider a paired example (α = 0.01).
Paired t-test for the mean of differences:
Null: H₀: μ_d = 0
Alternative: H₁: μ_d ≠ 0
Test statistic: t(24) = -0.859
p-value: p = 0.399
Conclusion: Fail to reject the null hypothesis (p = 0.3991 ≥ α = 0.01)
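For reference, base R’s t.test() performs the analogous paired test; the column names here are hypothetical:
t.test(wing_flap$after, wing_flap$before, paired = TRUE,
       conf.level = 0.99)  # matches α = 0.01; 'after'/'before' are assumed column names
# equivalent to a one-sample t-test on the differences d = after - before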
We will use the dependent_qq() function from library(ssstats) to assess normality of the differences.
Let’s now look at the normality assumption for our example.
How should we change the code for our dataset?
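The updated call is not preserved in these notes; as a sketch, checking the differences directly in base R gives the same assessment (column names hypothetical):
d <- wing_flap$after - wing_flap$before  # hypothetical paired columns
qqnorm(d)
qqline(d)  # points near the line support normality of the differences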