Two-Sample Medians

Introduction: Topics

Last week, we talked about checking the assumptions on t-tests.
- If we violate the equal-variance assumption of the two-sample t-test \to use the two-sample t-test for unequal variances.
- What happens when we break the normality assumption?
Nonparametric alternatives
- Independent medians (M_1-M_2)
- Dependent medians (M_d)

Introduction: Nonparametrics

The t-tests we have already learned are considered parametric methods.
- They assume the data (or the differences) come from a specific distribution, the normal.
Nonparametric methods do not assume a specific distribution for the data.
- We typically transform the data to their ranks and then perform calculations.
Why don’t we always use nonparametric methods?
- They are often less efficient: when the parametric assumptions hold, a larger sample size is required to achieve the same power (i.e., the same probability of a Type II error).
- They discard useful information :(

Introduction: Two Independent Medians

The Wilcoxon Rank Sum test is a nonparametric alternative to the two-sample t-test.
Instead of comparing group means, we will now turn to comparing the ranks of the data.
Let us first consider a simple example, x: 1, 7, 10, 2, 6, 8
Our first step is to reorder the data: x: 1, 2, 6, 7, 8, 10
Then, we replace with the ranks: R: 1, 2, 3, 4, 5, 6
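In R, base R's `rank()` function performs this ranking step. A minimal sketch using the example values above; note that `rank()` returns the ranks in the original order of the data, so sorting first reproduces the slide.

```r
# Example data from the slide
x <- c(1, 7, 10, 2, 6, 8)

# Reorder the data, then rank (reproduces the slide: 1, 2, ..., 6)
sorted_x <- sort(x)
rank(sorted_x)

# rank() can also be applied directly; the ranks are then listed in the original order
rank(x)  # 1 4 6 2 3 5
```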

Two Independent Medians: Ranking Data

What if the data values are not all unique? Tied values each receive the average of the ranks they would otherwise occupy.
For example, x: 9, 8, 8, 0, 3, 4, 4, 8
Let’s reorder: x: 0, 3, 4, 4, 8, 8, 8, 9
Rank ignoring ties: R: 1, 2, 3, 4, 5, 6, 7, 8
Now, the final rank: R: 1, 2, 3.5, 3.5, 6, 6, 6, 8
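Both rows of ranks above can be reproduced with base R's `rank()` through its `ties.method` argument. A minimal sketch on the same example; `"average"` is the default and gives the final ranks.

```r
# Example data from the slide, with ties at 4 and at 8
x <- c(9, 8, 8, 0, 3, 4, 4, 8)
sorted_x <- sort(x)

# "Rank ignoring ties": ties are broken by order of appearance
rank(sorted_x, ties.method = "first")    # 1 2 3 4 5 6 7 8

# Final rank: tied values share the average of their positions (the default)
rank(sorted_x, ties.method = "average")  # 1.0 2.0 3.5 3.5 6.0 6.0 6.0 8.0
```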

Hypothesis Testing: Two Independent Medians

Hypotheses: Two Tailed
- H_0: \ M_1-M_2 = M_0
- H_1: \ M_1-M_2 \ne M_0
Hypotheses: Left Tailed
- H_0: \ M_1-M_2 \ge M_0
- H_1: \ M_1-M_2 < M_0
Hypotheses: Right Tailed
- H_0: \ M_1-M_2 \le M_0
- H_1: \ M_1-M_2 > M_0

Hypothesis Testing: Two Independent Medians

T_0 = \sum R_1 - \frac{n_1(n_1+1)}{2}
where
- \sum R_1 is the sum of the ranks for the first group, after ranking both groups together as one combined sample
- n_1 is the sample size of the first group
Note: the p-value is calculated by R :)
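The formula can be checked by hand in base R. A minimal sketch on hypothetical data (not the wing_flap data). The way the hypothesized difference M_0 enters, by subtracting it from group 1 before ranking, is how base R's `wilcox.test()` handles its `mu` argument; whether ssstats does exactly the same is an assumption.

```r
# Hypothetical data (NOT the wing_flap data): apples eaten by two groups
above <- c(12, 15, 9, 20, 14)
below <- c(6, 8, 10, 5, 7)
m <- 5  # hypothesized difference M_1 - M_2

# Shift group 1 by m, then rank all observations together (ties get average ranks)
r <- rank(c(above - m, below))
n1 <- length(above)

# Sum of group 1's ranks, minus its smallest possible value n1(n1 + 1)/2
T0 <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2
T0

# Base R's wilcox.test() reports the same quantity as W
res <- wilcox.test(above, below, mu = m, alternative = "greater", exact = FALSE)
unname(res$statistic)
```

Subtracting n_1(n_1+1)/2 makes T_0 start at 0 when group 1 holds all of the smallest ranks.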

Hypothesis Testing: Two Independent Medians (R)

We will use the independent_median_HT function from library(ssstats) to perform the necessary calculations for the hypothesis test.
Generic syntax:

dataset_name %>% independent_median_HT(continuous = continuous_variable,
                                       grouping = grouping_variable,
                                       alternative = "alternative_direction",
                                       m = hypothesized_difference,
                                       alpha = specified_alpha)

Hypothesis Testing: Two Independent Medians

Recall our example:
- Perform the appropriate hypothesis test to determine if the above target pegasi are eating more than 5 more apples than the below target pegasi. Test at the \alpha=0.05 level.
How should we change the following code?

dataset_name %>% independent_median_HT(continuous = continuous_variable,
                                       grouping = grouping_variable,
                                       alternative = "alternative_direction",
                                       m = hypothesized_difference,
                                       alpha = specified_alpha)

Hypothesis Testing: Two Independent Medians

Recall our example:
- Perform the appropriate hypothesis test to determine if the above target pegasi are eating more than 5 more apples than the below target pegasi. Test at the \alpha=0.05 level.
Our updated code,

wing_flap %>% independent_median_HT(continuous = apples,
                                    grouping = target,
                                    alternative = "greater",
                                    m = 5,
                                    alpha = 0.05)

Hypothesis Testing: Two Independent Medians

Running the code,

wing_flap %>% independent_median_HT(continuous = apples,
                                    grouping = target,
                                    alternative = "greater",
                                    m = 5,
                                    alpha = 0.05)

Wilcoxon Rank Sum Test for two independent medians
Null: H₀: M₁ - M₂ ≤ 5
Alternative: H₁: M₁ - M₂ > 5
Test statistic: T = 135.5
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p = < 0.001 < α = 0.05)

Hypothesis Testing: Two Independent Medians

Hypotheses:
- H_0: \ M_{\text{above}} - M_{\text{below}} \le 5
- H_1: \ M_{\text{above}} - M_{\text{below}} > 5
Test Statistic and p-Value
- T_0 = 135.5, p < 0.001
Rejection Region
- Reject H_0 if p < \alpha; \alpha = 0.05
Conclusion and interpretation
- Reject H_0 (p \text{ vs } \alpha \to p < 0.001 < 0.05). There is sufficient evidence to suggest that the median number of apples eaten by pegasi above target is more than 5 greater than that of pegasi below target.

Two Independent Groups: Means vs. Medians

t-test for independent means:

wing_flap %>% independent_mean_HT(continuous = apples, 
                                  grouping = target,
                                  mu = 5, 
                                  alternative = "greater", 
                                  alpha = 0.05)

Wilcoxon rank sum for independent medians:

wing_flap %>% independent_median_HT(continuous = apples,
                                    grouping = target,
                                    m = 5,
                                    alternative = "greater",
                                    alpha = 0.05)
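For comparison outside ssstats, base R's `t.test()` and `wilcox.test()` are rough counterparts of these two calls. A minimal sketch on hypothetical data (not the wing_flap data); ssstats may differ in details such as the continuity correction or how ties are handled, so treat this as an approximation rather than what ssstats runs internally.

```r
# Hypothetical data (NOT the wing_flap data)
above <- c(12, 15, 9, 20, 14)
below <- c(6, 8, 10, 5, 7)

# Two-sample t-test assuming equal variances, H1: mu_1 - mu_2 > 5
t_res <- t.test(above, below, mu = 5, alternative = "greater", var.equal = TRUE)

# Wilcoxon rank sum test, H1: M_1 - M_2 > 5
w_res <- wilcox.test(above, below, mu = 5, alternative = "greater", exact = FALSE)

c(t = unname(t_res$statistic), df = unname(t_res$parameter), p = t_res$p.value)
c(W = unname(w_res$statistic), p = w_res$p.value)
```

With equal variances assumed, the t-test has n_1 + n_2 - 2 degrees of freedom (here 8).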

Two Independent Groups: Means vs. Medians

t-test for independent means:

Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ ≤ 5
Alternative: H₁: μ₁ − μ₂ > 5
Test statistic: t(23) = 5.445
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p = < 0.001 < α = 0.05)

Wilcoxon rank sum for independent medians:

Wilcoxon Rank Sum Test for two independent medians
Null: H₀: M₁ - M₂ ≤ 5
Alternative: H₁: M₁ - M₂ > 5
Test statistic: T = 135.5
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p = < 0.001 < α = 0.05)

Introduction: Two Dependent Medians

The Wilcoxon Signed Rank test is a nonparametric alternative to the dependent t-test.
Instead of examining the mean of the differences, we will now turn to examining the ranks of the differences.
Before ranking, we will find the difference between the paired observations and eliminate any 0 differences.
- Note 1: eliminating 0 differences is the big difference from the other tests!
- Note 2: because we are eliminating 0 differences, our sample size becomes the number of pairs with a nonzero difference.

Two Dependent Medians: Ranking Data

When ranking, the differences are ranked by their absolute values.
Then, ranks can be identified as “positive” or “negative” based on the direction of the difference.

X   Y   D = X - Y   |D|   Signed rank
5   8   -3          3     -1.5
8   5    3          3     +1.5
4   4    0          0     (removed)

In this (very basic) example, we started with n=3, but reduced to n=2 due to the 0 difference of the third observation.
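The same steps can be sketched in base R with the three pairs from the table: compute D, drop the zeros, rank |D|, then attach the sign of D.

```r
# Example pairs from the table
x <- c(5, 8, 4)
y <- c(8, 5, 4)

d <- x - y               # differences: -3, 3, 0
d <- d[d != 0]           # drop zero differences, so n goes from 3 to 2
r <- rank(abs(d))        # rank |D|; the tie at 3 gets the average rank 1.5
signed_rank <- sign(d) * r
signed_rank              # -1.5  1.5
length(d)                # 2
```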

Hypothesis Testing: Two Dependent Medians

Hypotheses: Two Tailed
- H_0: \ M_d=M_0
- H_1: \ M_d \ne M_0
Hypotheses: Left Tailed
- H_0: \ M_d \ge M_0
- H_1: \ M_d < M_0
Hypotheses: Right Tailed
- H_0: \ M_d \le M_0
- H_1: \ M_d > M_0
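Note! M_d is the median of the paired differences d = (first group) - (second group); in general, this is not the same as M_1 - M_2.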

Hypothesis Testing: Two Dependent Medians

Test Statistic

T_0 = \begin{cases} R_+ = \text{sum of positive ranks} & \text{if left-tailed} \\ R_- = \text{sum of negative ranks} & \text{if right-tailed} \\ \min(R_+, R_-) & \text{if two-tailed} \end{cases}

The p-value is calculated by R.
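A minimal sketch of the test statistic in base R, on hypothetical paired data (not the wing_flap data). It follows the case definition above. Note that base R's `wilcox.test()` always reports V = R_+, whatever the alternative, so it matches T_0 only in the left-tailed case; how ssstats reports T is not confirmed here.

```r
# Hypothetical paired data (NOT the wing_flap data)
pre  <- c(30, 32, 28, 35, 31, 29)
post <- c(28, 32, 30, 31, 27, 30)
m <- 0  # hypothesized median difference M_0

# Differences (first column minus second), shifted by M_0; drop zeros
d <- pre - post - m
d <- d[d != 0]            # one pair has d = 0, so n drops from 6 to 5

# Rank |d| (ties get average ranks), then split the ranks by sign
r <- rank(abs(d))
R_plus  <- sum(r[d > 0])  # sum of positive ranks
R_minus <- sum(r[d < 0])  # sum of negative ranks

# T_0 for each alternative, following the definition above
T0 <- c(left = R_plus, right = R_minus, two = min(R_plus, R_minus))
T0

# Base R reports V = R_plus for every alternative
V <- unname(wilcox.test(pre, post, paired = TRUE, mu = m, exact = FALSE)$statistic)
V
```

As a check, R_+ + R_- always equals n(n+1)/2 for the n nonzero differences.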

Hypothesis Testing: Two Dependent Medians (R)

We will use the dependent_median_HT function from library(ssstats) to perform the necessary calculations for the hypothesis test.
Generic syntax:

dataset_name %>% dependent_median_HT(col1 = first_group,
                                     col2 = second_group,
                                     alternative = "alternative_direction",
                                     m = hypothesized_value,
                                     alpha = specified_alpha)

Hypothesis Testing: Two Dependent Medians

Perform the appropriate hypothesis test to determine if there is a difference in wing-flap rate pre- and post-training. Test at the \alpha=0.01 level.
How should we change the following code?

dataset_name %>% dependent_median_HT(col1 = first_group,
                                     col2 = second_group,
                                     alternative = "alternative_direction",
                                     m = hypothesized_value,
                                     alpha = specified_alpha)

Hypothesis Testing: Two Dependent Medians

Perform the appropriate hypothesis test to determine if there is a difference in wing-flap rate pre- and post-training. Test at the \alpha=0.01 level.
Our updated code,

wing_flap %>% dependent_median_HT(col1 = pre_training_wfr,
                                  col2 = post_training_wfr,
                                  alternative = "two",
                                  m = 0,
                                  alpha = 0.01)

Hypothesis Testing: Two Dependent Medians

Running the code,

wing_flap %>% dependent_median_HT(col1 = pre_training_wfr,
                                  col2 = post_training_wfr,
                                  alternative = "two",
                                  m = 0,
                                  alpha = 0.01)

Wilcoxon Signed-Rank Test for the median of differences:
Null: H₀: M_d = 0
Alternative: H₁: M_d ≠ 0
Test statistic: T = 188
p-value: p = 0.501
Conclusion: Fail to reject the null hypothesis (p = 0.5011 ≥ α = 0.01)

Hypothesis Testing: Two Dependent Medians

Hypotheses:
- H_0: \ M_{\text{d}} = 0, where M_{\text{d}} is the median of the differences (pre - post)
- H_1: \ M_{\text{d}} \ne 0
Test Statistic and p-Value
- T_0 = 188, p = 0.501
Rejection Region
- Reject H_0 if p < \alpha; \alpha = 0.01
Conclusion and interpretation
- Fail to reject H_0 (p \text{ vs } \alpha \to p = 0.501 > 0.01). There is not sufficient evidence to suggest that the median difference in wing-flap rate (pre - post) differs from 0.

Two Dependent Groups: Means vs. Medians

t-test for dependent means:

wing_flap %>% dependent_mean_HT(col1 = pre_training_wfr,
                                col2 = post_training_wfr,
                                alternative = "two",
                                mu = 0,
                                alpha = 0.01)

Wilcoxon signed rank for dependent medians:

wing_flap %>% dependent_median_HT(col1 = pre_training_wfr,
                                  col2 = post_training_wfr,
                                  alternative = "two",
                                  m = 0,
                                  alpha = 0.01)
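Base R's `t.test()` and `wilcox.test()` with `paired = TRUE` are rough counterparts of these two calls. A minimal sketch on hypothetical paired data (not the wing_flap data); details such as the continuity correction may differ from ssstats. It also shows Note 2 from earlier: the paired t-test keeps every pair, while the signed rank test drops the zero difference.

```r
# Hypothetical paired data (NOT the wing_flap data)
pre  <- c(30, 32, 28, 35, 31, 29)
post <- c(28, 32, 30, 31, 27, 30)
alpha <- 0.01

# Paired t-test on the mean of differences (all 6 pairs are used, so df = 5)
t_res <- t.test(pre, post, paired = TRUE, mu = 0, alternative = "two.sided")

# Wilcoxon signed rank test on the median of differences (the zero difference is dropped)
w_res <- wilcox.test(pre, post, paired = TRUE, mu = 0,
                     alternative = "two.sided", exact = FALSE)

c(t = unname(t_res$statistic), df = unname(t_res$parameter), p = t_res$p.value)
c(V = unname(w_res$statistic), p = w_res$p.value)

# Compare each p-value with alpha
c(t_reject = t_res$p.value < alpha, wilcoxon_reject = w_res$p.value < alpha)
```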

Two Dependent Groups: Means vs. Medians

t-test for dependent means:

Paired t-test for the mean of differences:
Null: H₀: μ_d = 0
Alternative: H₁: μ_d ≠ 0
Test statistic: t(24) = 0.859
p-value: p = 0.399
Conclusion: Fail to reject the null hypothesis (p = 0.3991 ≥ α = 0.01)

Wilcoxon signed rank for dependent medians:

Wilcoxon Signed-Rank Test for the median of differences:
Null: H₀: M_d = 0
Alternative: H₁: M_d ≠ 0
Test statistic: T = 188
p-value: p = 0.501
Conclusion: Fail to reject the null hypothesis (p = 0.5011 ≥ α = 0.01)

Wrap Up

Today’s lecture:
- Wilcoxon rank sum (nonparametric alternative to the independent t-test)
- Wilcoxon signed rank (nonparametric alternative to the dependent t-test)
Next class:
- R lab: Wilcoxons
- Quiz: Wilcoxons
Next week:
- Review of Module 1
  - Quiz for Module 1!
- Project 1 work time