Wilcoxon Rank Sum

STA4173: Biostatistics
Spring 2025

Introduction

We last discussed assumptions on t-tests
- Dependent / paired t-test: normality
- Independent two-sample t-test: normality and variance
If we break the normality assumption, we must look to nonparametric methods.

Introduction

The t-tests we have already learned are considered parametric methods.
- There is a distributional assumption on the test.
Nonparametric methods do not have distributional assumptions.
- We typically transform the data to their ranks and then perform calculations.
Why don’t we always use nonparametric methods?
- They are often less efficient: a larger sample size is required to achieve the same probability of a Type I error.
- They discard useful information :(

Ranking Data

In the nonparametric tests we will be learning, the data will be ranked.
Let us first consider a simple example, x: \ 1, 7, 10, 2, 6, 8
Our first step is to reorder the data:x: \ 1, 2, 6, 7, 8, 10
Then, we replace with the ranks:R: \ 1, 2, 3, 4, 5, 6

Ranking Data

What if all data values are not unique?
- We will assign the average ranks.
For example, x: \ 9, 8, 8, 0, 3, 4, 4, 8
Let’s reorder:x: \ 0, 3, 4, 4, 8, 8, 8, 9
Rank ignoring ties:R: \ 1, 2, 3, 4, 5, 6, 7, 8
Now, the final rank:R: \ 1, 2, 3.5, 3.5, 6, 6, 6, 8

Wilcoxon Rank Sum

Hypotheses

H_0: M_1 - M_2 = M_0 | H_0: M_1 - M_2 \le M_0 | H_0: M_1 - M_2 \ge M_0
H_1: M_1 - M_2 \ne M_0 | H_1: M_1 - M_2 > M_0 | H_1: M_1 - M_2 < M_0

Test Statistic & p-Value

T = \sum R_{\text{sample 1}} - \frac{n_1(n_1+1)}{2}
p = (calculated by R :))

Rejection Region

Reject H_0 if p < \alpha.

Conclusion/Interpretation

[Reject or fail to reject] H_0.
There [is or is not] sufficient evidence to suggest [alternative hypothesis in words].

Wilcoxon Rank Sum

We will use the wilcox.test() function to perform the test,

wilcox.test(continuous_variable ~ grouping_variable,
            data = dataset_name,
            alternative = "alternative",
            mu = hypothesized_value,
            exact = FALSE)

Like before, R will use the group that is “first” in the grouping variable.
- “First” is in terms of numeric or alphabetical.

Wilcoxon Rank Sum

When exposed to an infection, a person typically develops antibodies. The extent to which the antibodies respond can be measured by looking at a person’s titer, which is a measure of the number of antibodies present. The higher the titer is, the more antibodies that are present.
The following data represent the titers of 11 ill people and 11 healthy people exposed to the tularemia virus in Vermont.
Is the level of titer in the ill group greater than the level of titer in the healthy group? Use the \alpha = 0.10 level of significance.

titer_levels <- tibble(level = c(640, 160, 1280, 320, 80, 640, 640, 160, 1280, 640, 160, 
                                  10, 320, 160, 160, 320, 320, 10, 320, 320, 80, 640),
                       group = c(rep("ill",11), rep("healthy",11)))

Recall the R syntax,

wilcox.test(continuous_variable ~ grouping_variable,
            data = dataset_name,
            alternative = "alternative",
            mu = hypothesized_value,
            exact = FALSE)

Wilcoxon Rank Sum

Is the level of titer in the ill group greater than the level of titer in the healthy group?

wilcox.test(level ~ group, 
            data = titer_levels,
            alternative = "less",
            exact = FALSE)


    Wilcoxon rank sum test with continuity correction

data:  level by group
W = 35, p-value = 0.04657
alternative hypothesis: true location shift is less than 0

Wilcoxon Rank Sum

Hypotheses

H_0: \ M_{\text{ill}} \le M_{\text{healthy}}
H_1: \ M_{\text{ill}} > M_{\text{healthy}}

Test Statistic and p-Value

W_0 = 35
p = 0.047

Rejection Region

Reject H_0 if p < \alpha; \alpha = 0.10.

Conclusion/Interpretation

Reject H_0.
There is sufficient evidence to suggest that the level of titer in the ill group is greater than the level of titer in the healthy group.

Wrap Up

Today we reviewed the Wilcoxon rank sum test.
- Nonparametric alternative to the two-sample t-test.
Next lecture:
- Wilcoxon signed rank