Tests for Proportions

STA4173: Biostatistics
Spring 2025

Introduction

Before today, we have focused on continuous outcomes.
Now we will focus on categorical (or qualitative) outcomes.
Today, we will review how to test one and two sample proportions.
We will estimate a proportion using \hat{p},

\hat{p} = \frac{x}{n}

We will estimate the difference between two proportions using \hat{p}_1 - \hat{p}_2,

\hat{p_1}- \hat{p_2} = \frac{x_1}{n_1} - \frac{x_2}{n_2}

One-Sample Proportions

Hypotheses
- H_0: p \ge p_0 | H_0: p \le p_0 | H_0: p = p_0
- H_1: p < p_0 | H_0: p > p_0 | H_1: p \ne p_0
Test Statistic

z_0 = \frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}

p-Value
- p = P[z \le z_0] | p = P[z \ge z_0] | p = 2P[z \ge |z_0|]
Rejection Region
- Reject H_0 if p < \alpha.
Conclusion/Interpretation
- [Reject or fail to reject] H_0.
- There [is or is not] sufficient evidence to suggest [alternative hypothesis in words].

One-Sample Proportions

(1–\alpha)100% CI for a population proportion, p

\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

To construct this interval, we require both:
- n\hat{p}(1-\hat{p}) \ge 10 and
- n \le 0.05N

One-Sample Proportions

We will use either the binom.test() function or the prop.test() function.
If we have n \le 30,

binom.test(x = num_success, 
           n = sample_size, 
           p = hypothesized_value, 
           alternative = "alternative")

If we have n > 30,

prop.test(x = num_success, 
          n = sample_size, 
          p = hypothesized_value, 
          alternative = "alternative",
          correct=FALSE)

One-Sample Proportions

Humira is a medication used to treat rheumatoid arthritis (RA). In clinical trials of Humira, 705 subjects diagnosed with RA were administered 40 mg of Humira every other week. Of the 705 subjects, 66 reported nausea as a side effect. It is known that the proportion of RA subjects in similar studies receiving a placebo who report nausea as a side effect is 0.08. Does the sample evidence represent significant evidence that a higher proportion of subjects receiving Humira experience nausea as a side effect than those taking a placebo? Test at the \alpha = 0.05 level of significance.
What are the important pieces?

One-Sample Proportions

Humira is a medication used to treat rheumatoid arthritis (RA). In clinical trials of Humira, 705 subjects diagnosed with RA were administered 40 mg of Humira every other week. Of the 705 subjects, 66 reported nausea as a side effect. It is known that the proportion of RA subjects in similar studies receiving a placebo who report nausea as a side effect is 0.08. Does the sample evidence represent significant evidence that a higher proportion of subjects receiving Humira experience nausea as a side effect than those taking a placebo? Test at the \alpha = 0.05 level of significance.
What is the point estimate, \hat{p}?
What is the 95% confidence interval for p?
Are there a higher proportion of subjects taking Humira experiencing nausea as a side effect than those taking a placebo?

One-Sample Proportions

Of the 705 subjects, 66 reported nausea as a side effect. It is known that the proportion of RA subjects in similar studies receiving a placebo who report nausea as a side effect is 0.08.

prop.test(x = 66, 
          n = 705,
          alternative = "two",
          correct=FALSE)


    1-sample proportions test without continuity correction

data:  66 out of 705, null probability 0.5
X-squared = 465.71, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.07426251 0.11737620
sample estimates:
         p 
0.09361702

Thus, the point estimate is \hat{p} = 0.094 and the 95% CI for p is (0.07, 0.12).

One-Sample Proportions

Of the 705 subjects, 66 reported nausea as a side effect. It is known that the proportion of RA subjects in similar studies receiving a placebo who report nausea as a side effect is 0.08.

prop.test(x = 66, 
          n = 705,
          p = 0.08,
          alternative = "greater",
          correct=FALSE)


    1-sample proportions test without continuity correction

data:  66 out of 705, null probability 0.08
X-squared = 1.7761, df = 1, p-value = 0.09131
alternative hypothesis: true p is greater than 0.08
95 percent confidence interval:
 0.07709288 1.00000000
sample estimates:
         p 
0.09361702

One-Sample Proportions

Hypotheses:
- H_0: \ p \le 0.08
- H_1: \ p > 0.08
Test Statistic and p-Value
- \chi^2 = 1.77
- p= 0.091
Rejection Region
- Reject H_0 if p < \alpha; \alpha=0.05.
Conclusion / Interpretation
- Fail to reject H_0.
- There is not sufficient evidence to suggest that the proportion of subjects taking Humira who experience nausea is greater than 0.08.

One-Sample Proportions

Which do you think is easier to raise – a boy or a girl? When asked this question in 1947, 24% of all Americans said raising a girl was easier. In June 2018, the Gallup Organization surveyed 1500 adult Americans, of which 408 felt it was easier to raise a girl. Does this result suggest the proportion of adult Americans who believe it is easier to raise a girl has changed since 1947? Test at the \alpha=0.10 level.
What are the important pieces?

One-Sample Proportions

Which do you think is easier to raise – a boy or a girl? When asked this question in 1947, 24% of all Americans said raising a girl was easier. In June 2018, the Gallup Organization surveyed 1500 adult Americans, of which 408 felt it was easier to raise a girl. Does this result suggest the proportion of adult Americans who believe it is easier to raise a girl has changed since 1947? Test at the \alpha=0.10 level.
What are the important pieces?
- Which do you think is easier to raise – a boy or a girl? When asked this question in 1947, 24% of all Americans said raising a girl was easier. In June 2018, the Gallup Organization surveyed 1500 adult Americans, of which 408 felt it was easier to raise a girl. Does this result suggest the proportion of adult Americans who believe it is easier to raise a girl has changed since 1947? Test at the \alpha=0.10 level.

One-Sample Proportions

prop.test(x = 408, 
          n = 1500,
          p = 0.24,
          correct=FALSE)


    1-sample proportions test without continuity correction

data:  408 out of 1500, null probability 0.24
X-squared = 8.4211, df = 1, p-value = 0.003709
alternative hypothesis: true p is not equal to 0.24
95 percent confidence interval:
 0.2500845 0.2950804
sample estimates:
    p 
0.272

Thus, the point estimate is \hat{p} = 0.272 and the 95% CI for p is (0.25, 0.30).

One-Sample Proportions

Hypotheses
- H_0: \ p = 0.24
- H_1: \ p \ne 0.24
Test Statistic and p-Value
- \chi^2 = 8.42
- p= 0.004
Rejection Region
- Reject H_0 if p < \alpha; \alpha=0.10.
Conclusion / Interpretation
- Reject H_0.
- There is sufficient evidence to suggest that the proportion of adult Americans who believe that it is easier to raise a girl has changed since 1947.

Two-Sample Proportions

Hypotheses
- H_0: p_1 - p_2 \ge d_0 | H_0: p_1 -p_2 \le d_0 | H_0: p_1-p_2 = d_0
- H_1: p_1-p_2 < d_0 | H_0: p_1 - p_2 > d_0 | H_1: p_1-p_2 \ne d_0
Test Statistic

z_0 = \frac{\left( \hat{p}_1 - \hat{p}_2 \right)- d_0}{\sqrt{\hat{p}\left(1-\hat{p}\right)\left( \frac{1}{n_1}+\frac{1}{n_2} \right)}}

where

\hat{p}_1 = \frac{x_1}{n_1}, \ \ \ \hat{p}_2 = \frac{x_2}{n_2}, \ \ \ \hat{p} = \frac{x_1+x_2}{n_1+n_2}

p-Value
- p = P[z \le z_0] | p = P[z \ge z_0] | p = 2P[z \ge |z_0|]
Rejection Region
- Reject H_0 if p < \alpha.

Two-Sample Proportions

(1–\alpha)100% CI for p_1-p_2

(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 (1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}

where

\hat{p}_1-\hat{p}_2=\frac{x_1}{n_1} - \frac{x_2}{n_2}

and
- x_i is the number of individuals in group i with a specified characteristic
- n_i is the sample size for group i
To construct this interval, we require:
- n_1 \hat{p}_1(1-\hat{p}_1) \ge 10 and
- n_2 \hat{p}_2(1-\hat{p}_2) \ge 10.

Two-Sample Proportions

In R, we again use the prop.test() function.

prop.test(x = c(count_1, count_2), 
          n = c(n_1, n_2), 
          alternative = "alternative",
          correct=FALSE)

Two-Sample Proportions

In clinical trials of Nasonex, 3774 adult and adolescent allergy patients (patients 12 years and older) were randomly divided into two groups.
- The patients in group 1 (experimental group) received 200 \mug of Nasonex.
  - Of the 2103 patients in the experimental group, 547 reported headaches as a side effect.
- The patients in group 2 (control group) received a placebo.
  - Of the 1671 patients in the control group, 368 reported headaches as a side effect.
Is there evidence to conclude that the proportion of Nasonex users who experienced headaches as a side effect is greater than the proportion in the control group?
Test at the \alpha= 0.05 level of significance.
What are the important pieces?

Two-Sample Proportions

In clinical trials of Nasonex, 3774 adult and adolescent allergy patients (patients 12 years and older) were randomly divided into two groups.
- The patients in group 1 (experimental group) received 200 \mug of Nasonex.
  - Of the 2103 patients in the experimental group, 547 reported headaches as a side effect.
- The patients in group 2 (control group) received a placebo.
  - Of the 1671 patients in the control group, 368 reported headaches as a side effect.
Is there evidence to conclude that the proportion of Nasonex users who experienced headaches as a side effect is greater than the proportion in the control group?
Test at the \alpha= 0.05 level of significance.
What are the important pieces?

Two-Sample Proportions

Of the 2103 patients in the experimental group, 547 reported headaches as a side effect.
Of the 1671 patients in the control group, 368 reported headaches as a side effect.

prop.test(x = c(547, 368), 
          n = c(2103, 1671), 
          correct=FALSE)


    2-sample test for equality of proportions without continuity correction

data:  c(547, 368) out of c(2103, 1671)
X-squared = 8.0618, df = 1, p-value = 0.004521
alternative hypothesis: two.sided
95 percent confidence interval:
 0.01255827 0.06719613
sample estimates:
   prop 1    prop 2 
0.2601046 0.2202274

Thus, \hat{p}_{\text{Exp}} = 0.260, \hat{p}_{\text{Ctrl}} = 0.220 and \hat{p}_{\text{Exp}} - \hat{p}_{\text{Ctrl}} = 0.04.
The 95% CI for \hat{p}_{\text{Exp}} - \hat{p}_{\text{Ctrl}} is (0.013, 0.067).

Two-Sample Proportions

prop.test(x = c(547, 368), 
          n = c(2103, 1671), 
          alternative = "greater",
          correct=FALSE)


    2-sample test for equality of proportions without continuity correction

data:  c(547, 368) out of c(2103, 1671)
X-squared = 8.0618, df = 1, p-value = 0.00226
alternative hypothesis: greater
95 percent confidence interval:
 0.01695043 1.00000000
sample estimates:
   prop 1    prop 2 
0.2601046 0.2202274

Two-Sample Proportions

Hypotheses
- H_0: \ p_{\text{Exp}} \le p_{\text{Ctrl}}
- H_1: \ p_{\text{Exp}} > p_{\text{Ctrl}}
Test Statistic and p-value
- \chi^2 = 8.06 or z_0 = \sqrt{8.0618} = 2.839
- p= 0.002
Rejection Region
- Reject H_0 if p < \alpha; \alpha=0.05.
Conclusion / Interpretation
- Reject H_0.
- There is sufficient evidence to suggest that the proportion of Nasonex users who experienced headaches as a side effect is greater than that of the control group.

Goodness-of-Fit

The goodness-of-fit test allows us to determine if a frequency distribution follows a specific distribution.
- This could be a named distribution (e.g., normal)
- It could also be a distribution without a name (e.g., the probabilities are specified)
Before we can perform the goodness-of-fit test, we must compute expected counts.E_i = n p_i
- e.g., suppose that we expect 25% of Skittles to be red; if we have 100 Skittles, we then expect 25 of them to be red.

Goodness-of-Fit

Hypotheses
- H_0: The random variable follows the specified distribution.
- H_1: The random variable does not follow the specified distribution.
Test Statistic
- \chi^2_0 = \sum_{i=1}^k \frac{(O_i-E_i)^2}{E_i}, where O_i = observed and E_i = expected
p-Value
- p = P[\chi^2_{k-1} \ge \chi^2_0], where k = number of categories
Rejection Region
- Reject H_0 if p < \alpha

Goodness-of-Fit

In R, we will use the chisq.test() function and plug in both the counts and the expected probabilities

counts <- c(count_1, count_2, ... , count_k) # create O_i vector
probs <- c(prob_1, prob_2, ..., prob_k) # create p_i vector
chisq.test(counts, p = probs)

Goodness-of-Fit

Using the economy data, below (based on the 2017 Current Population Survey, adjusted for inflation), determine if there is evidence to suggest that the distribution of income has changed since 2000.
Test at the \alpha = 0.05 level of significance.

Income	Observed	Probability
Under $15,000	161	0.099
$15,000 - $24,999	144	0.098
$25,000 - $34,999	138	0.093
$35,000 - $49,999	184	0.135
$50,000 - $74,999	247	0.179
$75,000 - $99,999	188	0.131
$100,000 - $149,999	217	0.149
$150,000 - $199,999	105	0.061
Over $200,000	116	0.055

Goodness-of-Fit

counts <- c(161, 144, 138, 184, 247, 188, 217, 105, 116) # create O_i vector
probs <- c(0.099, 0.098, 0.093, 0.135, 0.179, 0.131, 0.149, 0.061, 0.055) # create p_i vector
chisq.test(counts, p = probs)


    Chi-squared test for given probabilities

data:  counts
X-squared = 20.693, df = 8, p-value = 0.00801

Goodness-of-Fit

Hypotheses
- H_0: Distribution of income in 2017 follows the same distribution as 2000.
- H_1: Distribution of income in 2017 does not follow the same distribution as 2000.
Test Statistic and p-Value
- \chi^2_0 = 20.693
- p = 0.008
Rejection Region
- Reject H_0 if p < \alpha; \alpha=0.05
Conclusion/Interpretation
- Reject H_0.
- There is sufficient evidence to suggest that the distribution of income in 2017 does not follow the same distribution as in 2000.

Goodness-of-Fit

An obstetrician wants to know whether the proportion of children born on each day of the week is the same.
She randomly selects 500 birth records and obtains the data shown in the table below (based on data obtained from Vital Statistics of the United States, 2016).
Is there reason to believe that the day on which a child is born does not occur with equal frequency at the \alpha = 0.01 level of significance?

Sun	Mon	Tues	Weds	Thurs	Fri	Sat
46	76	83	81	81	80	53

What about expected probabilities?

Goodness-of-Fit

counts <- c(46, 76, 83, 81, 81, 80, 53) # create O_i vector
probs <- c(1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7) # create p_i vector
chisq.test(counts, p = probs)


    Chi-squared test for given probabilities

data:  counts
X-squared = 19.568, df = 6, p-value = 0.003305

Goodness-of-Fit

Hypotheses
- H_0: \ p_M = p_T = ... = p_{Su} = \frac{1}{7}.
- H_1: At least one proportion is different.
Test Statistic and p-Value
- \chi^2_0 = 19.568
- p = 0.003
Rejection Region
- Reject H_0 if p < \alpha; \alpha=0.01
Conclusion/Interpretation
- Reject H_0.
- There is sufficient evidence to suggest that at least one proportion is different.

Test for Independence

Let us now discuss testing two categorical variables to determine if a relationship exists.
Take, for example, this data:
We will use the \chi^2 test for independence to determine if happiness depends on marital status.

Test for Independence

Like in the goodness-of-fit test, we will first compute expected values.

We find the expected values,

E_{ij} = \frac{R_i C_j}{n}

where R_i is the total for row i, C_j is the total for column j, and n is the total sample size

Test for Independence

Hypotheses
- H_0: There is not a relationship between [var 1] and [var 2].
- H_1: There is a relationship between [var 1] and [var 2].
Test Statistic
- \chi_0^2 = \sum_{i=1}^k \frac{(O_i-E_i)^2}{E_i}
p-Value
- p = \text{P}[\chi^2_{(r-1)(c-1)} \ge \chi^2_0]
Rejection Region
- Reject H_0 if p < \alpha

Test for Independence

If given the contingency table, we can enter it in a matrix() (see example) and use the chisq.test() function.

chisq.test(matrix_name)

If given raw data, we can use the CrossTable() function in the gmodels package.
- Note: this function replicates PROC FREQ from SAS :)

CrossTable(dataset_name$row_name, dataset_name$column_name], 
           prop.chisq= FALSE,  # turn off proportion contributed to chi-square statistic
           prop.t = FALSE, # turn off total proportions
           chisq = TRUE) # request chi-square test

Test for Independence

In our example,

observed_table <- matrix(c(600, 63, 112, 144,
                           720, 142, 355, 459,
                           93, 51, 119, 127), 
                         nrow = 3, ncol = 4, byrow = T)
# I prefer to include breaks to make it look like the table given just for checking purposes
# make sure you edit the number of rows (nrow) and columns (ncol)!

rownames(observed_table) <- c("Very Happy", "Pretty Happy", "Not Too Happy") # name rows
colnames(observed_table) <- c("Married", "Widowed", "Divorced/Separated", "Never Married") # name cols

observed_table # print table to make sure it is what we want

              Married Widowed Divorced/Separated Never Married
Very Happy        600      63                112           144
Pretty Happy      720     142                355           459
Not Too Happy      93      51                119           127

chisq.test(observed_table) # chi-squared test for independence


    Pearson's Chi-squared test

data:  observed_table
X-squared = 224.12, df = 6, p-value < 2.2e-16

Test for Independence

Hypotheses
- H_0: Happiness does not depend on marital status.
- H_1: Happiness depends on marital status.
Test Statistic and p-Value
- \chi^2_0 = 224.116
- p < 0.001
Rejection Region
- Reject H_0 if p < \alpha; \alpha=0.05
Conclusion/Interpretation
- Reject H_0.
- There is sufficient evidence to suggest that happiness depends on marital status.

Test for Independence

To see how the CrossTable() function works, let’s explore the Palmer penguin dataset.

library(gmodels)
penguins <- palmerpenguins::penguins 
CrossTable(penguins$species, penguins$sex, 
           prop.chisq= FALSE,  # turn off proportion contributed to chi-square statistic
           prop.t = FALSE, # turn off total proportions
           chisq = TRUE) # request chi-square test


 
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|-------------------------|

 
Total Observations in Table:  333 

 
                 | penguins$sex 
penguins$species |    female |      male | Row Total | 
-----------------|-----------|-----------|-----------|
          Adelie |        73 |        73 |       146 | 
                 |     0.500 |     0.500 |     0.438 | 
                 |     0.442 |     0.435 |           | 
-----------------|-----------|-----------|-----------|
       Chinstrap |        34 |        34 |        68 | 
                 |     0.500 |     0.500 |     0.204 | 
                 |     0.206 |     0.202 |           | 
-----------------|-----------|-----------|-----------|
          Gentoo |        58 |        61 |       119 | 
                 |     0.487 |     0.513 |     0.357 | 
                 |     0.352 |     0.363 |           | 
-----------------|-----------|-----------|-----------|
    Column Total |       165 |       168 |       333 | 
                 |     0.495 |     0.505 |           | 
-----------------|-----------|-----------|-----------|

 
Statistics for All Table Factors


Pearson's Chi-squared test 
------------------------------------------------------------
Chi^2 =  0.04860717     d.f. =  2     p =  0.9759894

Test for Independence

Hypotheses
- H_0: Biological sex of penguin depends on penguin species.
- H_1: Biological sex of penguin does not depend on penguin species.
Test Statistic and p-Value
- \chi^2_0 = 0.049
- p = 0.976
Rejection Region
- Reject H_0 if p < \alpha; \alpha=0.05
Conclusion/Interpretation
- Fail to reject H_0. There is not sufficient evidence to suggest that the biological sex of penguins depends on species of penguin.