July 1, 2025
Tuesday
All t-tests assume approximate normality of the data.
In the case of a one-sample t-test, the measure of interest must approximately follow a normal distribution.
In the case of a two-sample t-test, the measure of interest in each group must approximately follow a normal distribution.
Note that a paired t-test is technically a one-sample t-test on the differences, so we will examine normality of the differences.
There are formal tests for normality; however, we will not use them here.
Instead, we will assess normality using a quantile-quantile (Q-Q) plot.
A Q-Q plot helps us visually check if our data follows a specific distribution (here, the normal).
How do we read Q-Q plots? If the points fall approximately along the reference line, the normality assumption is reasonable; systematic curvature, or points drifting away from the line in the tails, suggests a departure from normality.
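As a minimal base-R illustration of the idea (the ssstats helpers we use below wrap this same plot; the vector x here is simulated purely for demonstration):
set.seed(4173)
x <- rnorm(30, mean = 10, sd = 2)  # made-up data for illustration
qqnorm(x)  # sample quantiles vs. theoretical normal quantiles
qqline(x)  # reference line; points hugging the line support normality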
wing_flap %>% independent_mean_HT(grouping = target,
continuous = apples,
mu = 5,
alternative = "greater",
alpha = 0.05)
Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ = 5
Alternative: H₁: μ₁ − μ₂ > 5
Test statistic: t(23) = 5.445
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p < 0.001 < α = 0.05)
We will use the independent_qq() function from library(ssstats) to assess normality.
Let’s now look at the normality assumption for our example. How should we change the code for our dataset?
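A sketch of the updated call, assuming independent_qq() takes the same continuous and grouping arguments as independent_mean_HT() (the argument names are an assumption):
wing_flap %>%
  independent_qq(continuous = apples,  # argument names assumed to mirror independent_mean_HT()
                 grouping = target)    # one Q-Q plot per group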
In addition to normality, the two-sample t-test assumes equal variance between groups.
We can check this assumption and easily adjust if the assumption is broken.
Graphical method: scatterplot of residuals
Formal method: test for equal variances (Brown-Forsythe-Levene)
We will use the plot_residuals() function from library(ssstats) to graphically assess the assumption of equal variance.
Let’s now look at the equal-variance assumption for our example. How should we change the code for our dataset? Our updated code:
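(A sketch; plot_residuals() is assumed to follow the same continuous/grouping convention as the other ssstats helpers.)
wing_flap %>%
  plot_residuals(continuous = apples,  # assumed argument names
                 grouping = target)    # look for roughly equal vertical spread in each group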
If we believe the assumption may be violated, we can test for equal variance using the Brown-Forsythe-Levene (BFL) test.
This test is valid for more than two groups (read: we will see it again!)
Hypotheses:
H₀: σ₁² = σ₂² = ⋯ = σₖ²
H₁: at least one σᵢ² differs
F_0 = \frac{\sum_{i=1}^k n_i(\bar{z}_{i.}-\bar{z}_{..})^2/(k-1)}{\sum_{i=1}^k \sum_{j=1}^{n_i} (z_{ij}-\bar{z}_{i.})^2/(N-k)} \sim F_{\text{df}_{\text{num}}, \text{df}_{\text{den}}}
where z_{ij} = |y_{ij} - \tilde{y}_{i.}| is the absolute deviation of each observation from its group median, \text{df}_{\text{num}} = k-1, and \text{df}_{\text{den}} = N-k.
Note that the BFL is a one-tailed (upper-tailed) test, which is different from testing means using the t distribution.
p-value:
p = P\left[F_{\text{df}_{\text{num}}, \text{df}_{\text{den}}} \ge F_0\right]
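For reference, the same procedure is available outside of ssstats: leveneTest() in the car package performs the Brown-Forsythe-Levene test when centered at the median. A sketch with our data (using the apples and target columns from earlier):
library(car)                                   # leveneTest() lives in the car package
leveneTest(apples ~ target, data = wing_flap,  # formula: continuous ~ grouping
           center = median)                    # center = median gives the Brown-Forsythe-Levene variant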
We will use the variances_HT() function from library(ssstats).
Let’s now test the variance assumption for our example. How should we change the code for our dataset? Our updated code is:
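(A sketch; variances_HT() is assumed to share the continuous/grouping/alpha convention of the other ssstats functions.)
wing_flap %>%
  variances_HT(continuous = apples,  # assumed argument names
               grouping = target,
               alpha = 0.05)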
Brown-Forsythe-Levene test for equality of variances:
Null: σ²_Above = σ²_Below
Alternative: At least one variance is different
Test statistic: F(1,23) = 0.063
p-value: p = 0.804
Conclusion: Fail to reject the null hypothesis (p = 0.8045 ≥ α = 0.05)
What do we do if we have actually broken the variance assumption?
If the normality assumption holds, we can use Satterthwaite’s approximation for degrees of freedom.
\text{df}=\frac{ \left( \frac{s^2_1}{n_1} + \frac{s_2^2}{n_2} \right)^2 }{ \frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1} }
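A quick sketch of this calculation in R, with made-up summary statistics purely for illustration:
s1 <- 2.1; n1 <- 12  # sample SD and size, group 1 (illustrative numbers)
s2 <- 3.4; n2 <- 13  # sample SD and size, group 2 (illustrative numbers)
v1 <- s1^2 / n1      # estimated variance of the group 1 sample mean
v2 <- s2^2 / n2      # estimated variance of the group 2 sample mean
(v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))  # Satterthwaite df; generally not an integer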
We will still use the independent_mean_CI function from library(ssstats) to find the confidence interval; the only change is the variance argument at the end.
Generic syntax:
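(A sketch with placeholder names; the confidence argument and the accepted values of variance are assumptions, patterned on the independent_mean_HT() call shown earlier.)
dataset %>%
  independent_mean_CI(continuous = variable,  # placeholder names, not from our data
                      grouping = group,
                      confidence = 0.90,      # confidence-level argument name is an assumption
                      variance = "equal")     # or "unequal"; accepted values are assumptions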
We will use the independent_mean_HT function from library(ssstats) to perform the necessary calculations for the hypothesis test; again, the variance argument comes at the end.
Generic syntax:
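(A sketch with placeholder names; compare the wing_flap call earlier, with the assumed variance argument appended.)
dataset %>%
  independent_mean_HT(continuous = variable,  # placeholders for your data
                      grouping = group,
                      mu = 0,
                      alternative = "two",
                      alpha = 0.05,
                      variance = "unequal")   # "equal" (pooled df) or "unequal" (Satterthwaite); values assumed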
Let’s look at the 90% CI for our example under both ways of calculating degrees of freedom:
Assuming equal variance:
The point estimate for the difference in means is x̄₁ − x̄₂ = 10.0556
The 90% confidence interval for μ₁ − μ₂ is (8.4642, 11.647)
Assuming unequal variance:
The point estimate for the difference in means is x̄₁ − x̄₂ = 10.0556
The 90% confidence interval for μ₁ − μ₂ is (8.3697, 11.7415)
Let’s look at the hypothesis test results for our example under both ways of calculating degrees of freedom (α = 0.10):
Assuming equal variance:
Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ = 0
Alternative: H₁: μ₁ − μ₂ ≠ 0
Test statistic: t(23) = 10.829
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p < 0.001 < α = 0.1)
Assuming unequal variance:
Two-sample t-test for two independent means and unequal variance:
Null: H₀: μ₁ − μ₂ = 0
Alternative: H₁: μ₁ − μ₂ ≠ 0
Test statistic: t(15.06) = 10.454
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p < 0.001 < α = 0.1)
Now consider the paired design: is there a change in wing-flap rate pre- vs. post-training?
Paired t-test for the mean of differences:
Null: H₀: μ_d = 0
Alternative: H₁: μ_d ≠ 0
Test statistic: t(24) = -0.859
p-value: p = 0.399
Conclusion: Fail to reject the null hypothesis (p = 0.3991 ≥ α = 0.01)
We will use the dependent_qq() function from library(ssstats) to assess normality.
Let’s now look at the normality assumption for our example. How should we change the code for our dataset? Our updated code:
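(A sketch; dependent_qq()’s argument names are assumed to mirror dependent_median_HT()’s col1/col2.)
wing_flap %>%
  dependent_qq(col1 = pre_training_wfr,   # assumed argument names
               col2 = post_training_wfr)  # Q-Q plot of the paired differences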
The t-tests we have already learned are considered parametric methods.
Nonparametric methods do not have distributional assumptions.
Why don’t we always use nonparametric methods?
They are often less efficient: a larger sample size is required to achieve the same power as the corresponding parametric test.
They discard useful information :(
The Wilcoxon Rank Sum test is a nonparametric alternative to the two-sample t-test.
Instead of comparing group means, we will now turn to comparing the ranks of the data.
Let us first consider a simple example, x: 1, 7, 10, 2, 6, 8
Our first step is to reorder the data: x: 1, 2, 6, 7, 8, 10
Then, we replace the values with their ranks: R: 1, 2, 3, 4, 5, 6
What if the data values are not all unique? We assign each tied value the average of the ranks it spans.
For example, x: 9, 8, 8, 0, 3, 4, 4, 8
Let’s reorder: x: 0, 3, 4, 4, 8, 8, 8, 9
Rank ignoring ties: R: 1, 2, 3, 4, 5, 6, 7, 8
Now, the final rank: R: 1, 2, 3.5, 3.5, 6, 6, 6, 8
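Base R’s rank() function uses exactly this average-rank convention for ties:
x <- c(9, 8, 8, 0, 3, 4, 4, 8)
rank(x)  # 8.0 6.0 6.0 1.0 2.0 3.5 3.5 6.0 -- tied values share the average rank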
T_0 = \sum R_{\text{sample 1}} - \frac{n_1(n_1+1)}{2}
where \sum R_{\text{sample 1}} is the sum of the ranks assigned to sample 1 and n_1 is the size of sample 1.
Note that the p-value is calculated by R :)
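Base R’s wilcox.test() computes this for us: its W statistic equals T_0 above. A sketch on small made-up samples (no ties, so R reports an exact p-value):
x1 <- c(1, 7, 10, 2, 6, 8)   # sample 1 (illustrative values)
x2 <- c(3, 4, 5, 9, 11, 12)  # sample 2 (illustrative values)
wilcox.test(x1, x2)          # W = 13 here, matching T_0 = 34 - 6(7)/2; p-value computed by R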
We will use the independent_median_HT function from library(ssstats) to perform the necessary calculations for the hypothesis test.
Generic syntax:
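(A sketch with placeholder names, patterned on the concrete call that follows.)
dataset %>%
  independent_median_HT(continuous = variable,  # placeholders for your data
                        grouping = group,
                        alternative = "two",    # or "greater"/"less"
                        m = 0,                  # hypothesized difference in medians
                        alpha = 0.05)
For our example, the updated call is: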
wing_flap %>% independent_median_HT(continuous = apples,
grouping = target,
alternative = "greater",
m = 5,
alpha = 0.05)
Wilcoxon Rank Sum Test:
Null: H₀: M₁ - M₂ ≤ 5
Alternative: H₁: M₁ - M₂ > 5
Test statistic: T = 135.5
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p < 0.001 < α = 0.05)
Comparing the two-sample t-test and the Wilcoxon Rank Sum results side by side:
Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ = 5
Alternative: H₁: μ₁ − μ₂ > 5
Test statistic: t(23) = 5.445
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p < 0.001 < α = 0.05)
Wilcoxon Rank Sum Test:
Null: H₀: M₁ - M₂ ≤ 5
Alternative: H₁: M₁ - M₂ > 5
Test statistic: T = 135.5
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p < 0.001 < α = 0.05)
The Wilcoxon Signed Rank test is a nonparametric alternative to the dependent t-test.
Instead of examining the mean of the difference, we will now turn to examining the ranks of the differences.
Before ranking, we will find the difference between the paired observations and eliminate any 0 differences.
When ranking, the differences are ranked by the absolute value of the difference.
Then, ranks can be identified as “positive” or “negative” based on the direction of the difference.
X | Y | D | |D| | Signed rank |
---|---|---|---|---|
5 | 8 | −3 | 3 | −1.5 |
8 | 5 | 3 | 3 | +1.5 |
4 | 4 | 0 | 0 | (dropped) |
T_0 = \begin{cases} R_+ = \text{sum of positive ranks} & \text{if left-tailed} \\ R_- = \text{sum of negative ranks} & \text{if right-tailed} \\ \min(R_+, R_-) & \text{if two-tailed} \end{cases}
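For reference, base R’s wilcox.test(..., paired = TRUE) reports V = R₊, the sum of the positive ranks of the nonzero differences, rather than the min(R₊, R₋) convention above, but it tests the same hypotheses. A sketch with made-up pairs (R will warn about the ties and the zero difference and use an approximation):
pre  <- c(5, 8, 4, 9, 7)  # illustrative paired data
post <- c(8, 5, 4, 6, 3)
wilcox.test(pre, post, paired = TRUE)  # V = 8 here: the zero difference is dropped, then positive ranks are summed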
We will use the dependent_median_HT function from library(ssstats) to perform the necessary calculations for the hypothesis test.
Generic syntax:
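(A sketch with placeholder names, patterned on the concrete call shown below.)
dataset %>%
  dependent_median_HT(col1 = first_column,   # placeholder column names
                      col2 = second_column,
                      alternative = "two",
                      m = 0,                 # hypothesized median difference
                      alpha = 0.05)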
Perform the appropriate hypothesis test to determine if there is a difference in wing-flap rate pre- and post-training. Test at the α = 0.01 level.
How should we change the code above for our dataset? Our updated code:
wing_flap %>% dependent_median_HT(col1 = pre_training_wfr,
col2 = post_training_wfr,
alternative = "two",
m = 0,
alpha = 0.01)
Wilcoxon Signed-Rank Test for the median of differences:
Null: H₀: M_d = 0
Alternative: H₁: M_d ≠ 0
Test statistic: T = 188
p-value: p = 0.501
Conclusion: Fail to reject the null hypothesis (p = 0.5011 ≥ α = 0.01)
STA4173 - Biostatistics - Summer 2025