Tests for Proportions

STA4173: Biostatistics
Spring 2025

Introduction

  • Before today, we have focused on continuous outcomes.

  • Now we will focus on categorical (or qualitative) outcomes.

  • Today, we will review how to test one and two sample proportions.

  • We will estimate a proportion using \hat{p},

\hat{p} = \frac{x}{n}

  • We will estimate the difference between two proportions using \hat{p}_1 - \hat{p}_2,

\hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2}

One-Sample Proportions

  • Hypotheses
    • H_0: p \ge p_0 | H_0: p \le p_0 | H_0: p = p_0
    • H_1: p < p_0 | H_1: p > p_0 | H_1: p \ne p_0
  • Test Statistic

z_0 = \frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}

  • p-Value
    • p = P[z \le z_0] | p = P[z \ge z_0] | p = 2P[z \ge |z_0|]
  • Rejection Region
    • Reject H_0 if p < \alpha.
  • Conclusion/Interpretation
    • [Reject or fail to reject] H_0.
    • There [is or is not] sufficient evidence to suggest [alternative hypothesis in words].

One-Sample Proportions

  • (1-\alpha)100% CI for a population proportion, p

\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

  • To construct this interval, we require both:
    • n\hat{p}(1-\hat{p}) \ge 10 and
    • n \le 0.05N, where N is the population size.

One-Sample Proportions

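  • For an exact test based on the binomial distribution,
binom.test(x = num_success, 
           n = sample_size, 
           p = hypothesized_value, 
           alternative = "alternative") # "two.sided", "less", or "greater"
  • If we have n > 30, we can use the large-sample test,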
prop.test(x = num_success, 
          n = sample_size, 
          p = hypothesized_value, 
          alternative = "alternative",
          correct=FALSE)

One-Sample Proportions

  • Humira is a medication used to treat rheumatoid arthritis (RA). In clinical trials of Humira, 705 subjects diagnosed with RA were administered 40 mg of Humira every other week. Of the 705 subjects, 66 reported nausea as a side effect. It is known that the proportion of RA subjects in similar studies receiving a placebo who report nausea as a side effect is 0.08. Does the sample evidence represent significant evidence that a higher proportion of subjects receiving Humira experience nausea as a side effect than those taking a placebo? Test at the \alpha = 0.05 level of significance.

  • What are the important pieces?

One-Sample Proportions

  • What is the point estimate, \hat{p}?

  • What is the 95% confidence interval for p?

  • Is the proportion of subjects experiencing nausea as a side effect higher among those taking Humira than among those taking a placebo?

One-Sample Proportions

  • Of the 705 subjects, 66 reported nausea as a side effect. It is known that the proportion of RA subjects in similar studies receiving a placebo who report nausea as a side effect is 0.08.
prop.test(x = 66, 
          n = 705,
          alternative = "two.sided", # p is omitted, so R tests against its default of 0.5; we only use the estimate and CI here
          correct=FALSE)

    1-sample proportions test without continuity correction

data:  66 out of 705, null probability 0.5
X-squared = 465.71, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.07426251 0.11737620
sample estimates:
         p 
0.09361702 
  • Thus, the point estimate is \hat{p} = 0.094 and the 95% CI for p is (0.07, 0.12).
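  • Note that with correct=FALSE, prop.test() reports a Wilson score interval for a single proportion rather than the \hat{p} \pm z_{\alpha/2} interval given earlier, so a hand calculation will differ slightly. A minimal sketch of the hand calculation for the Humira data (variable names are illustrative, not part of the course code):

```r
x <- 66    # subjects reporting nausea
n <- 705   # sample size
p_hat <- x / n

# sample-size condition for the interval: should be at least 10
n * p_hat * (1 - p_hat)

z_crit <- qnorm(1 - 0.05 / 2)           # z_{alpha/2} for a 95% interval
se <- sqrt(p_hat * (1 - p_hat) / n)     # estimated standard error of p_hat
wald_ci <- c(p_hat - z_crit * se, p_hat + z_crit * se)
round(wald_ci, 3)
```

  • The hand-computed interval is approximately (0.072, 0.115); both intervals round to (0.07, 0.12).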

One-Sample Proportions

  • Of the 705 subjects, 66 reported nausea as a side effect. It is known that the proportion of RA subjects in similar studies receiving a placebo who report nausea as a side effect is 0.08.
prop.test(x = 66, 
          n = 705,
          p = 0.08,
          alternative = "greater",
          correct=FALSE)

    1-sample proportions test without continuity correction

data:  66 out of 705, null probability 0.08
X-squared = 1.7761, df = 1, p-value = 0.09131
alternative hypothesis: true p is greater than 0.08
95 percent confidence interval:
 0.07709288 1.00000000
sample estimates:
         p 
0.09361702 
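  • R reports X-squared rather than z_0; for a one-sample test without continuity correction, X-squared is z_0 squared. A short sketch reproducing the test statistic and one-sided p-value from the formulas (variable names are illustrative):

```r
x  <- 66     # subjects reporting nausea
n  <- 705    # sample size
p0 <- 0.08   # hypothesized (placebo) proportion

p_hat <- x / n
z0 <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # test statistic
p_value <- pnorm(z0, lower.tail = FALSE)       # P[z >= z0] for H_1: p > p0

c(z0 = z0, z0_squared = z0^2, p_value = p_value)
```

  • Here z0^2 should agree with the X-squared value in the output, and p_value with the reported p-value.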

One-Sample Proportions

  • Hypotheses:

    • H_0: \ p \le 0.08
    • H_1: \ p > 0.08
  • Test Statistic and p-Value

    • \chi^2 = 1.78 or z_0 = \sqrt{1.7761} = 1.333
    • p= 0.091
  • Rejection Region

    • Reject H_0 if p < \alpha; \alpha=0.05.
  • Conclusion / Interpretation

    • Fail to reject H_0.

    • There is not sufficient evidence to suggest that the proportion of subjects taking Humira who experience nausea is greater than 0.08.
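  • As a cross-check, the same hypotheses can be tested with the exact binomial test introduced earlier. This is only a sketch; the object name exact_fit is illustrative, and the resulting p-value is compared against \alpha in the same way.

```r
# exact binomial test of H_0: p <= 0.08 vs H_1: p > 0.08
exact_fit <- binom.test(x = 66,
                        n = 705,
                        p = 0.08,
                        alternative = "greater")

exact_fit$estimate   # point estimate, x / n
exact_fit$p.value    # exact one-sided p-value
```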

One-Sample Proportions

  • Which do you think is easier to raise – a boy or a girl? When asked this question in 1947, 24% of all Americans said raising a girl was easier. In June 2018, the Gallup Organization surveyed 1500 adult Americans, of which 408 felt it was easier to raise a girl. Does this result suggest the proportion of adult Americans who believe it is easier to raise a girl has changed since 1947? Test at the \alpha=0.10 level.

  • What are the important pieces?

One-Sample Proportions

prop.test(x = 408, 
          n = 1500,
          p = 0.24,
          correct=FALSE)

    1-sample proportions test without continuity correction

data:  408 out of 1500, null probability 0.24
X-squared = 8.4211, df = 1, p-value = 0.003709
alternative hypothesis: true p is not equal to 0.24
95 percent confidence interval:
 0.2500845 0.2950804
sample estimates:
    p 
0.272 
  • Thus, the point estimate is \hat{p} = 0.272 and the 95% CI for p is (0.25, 0.30).
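  • For a two-sided alternative, the p-value doubles the upper-tail area beyond |z_0|. A short sketch reproducing the output above from the formulas (variable names are illustrative):

```r
x  <- 408    # respondents who said a girl is easier to raise
n  <- 1500   # sample size
p0 <- 0.24   # proportion from 1947

p_hat <- x / n
z0 <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)        # test statistic
p_value <- 2 * pnorm(abs(z0), lower.tail = FALSE)   # 2 P[z >= |z0|]

c(z0 = z0, z0_squared = z0^2, p_value = p_value)
```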

One-Sample Proportions

  • Hypotheses

    • H_0: \ p = 0.24
    • H_1: \ p \ne 0.24
  • Test Statistic and p-Value

    • \chi^2 = 8.42
    • p= 0.004
  • Rejection Region

    • Reject H_0 if p < \alpha; \alpha=0.10.
  • Conclusion / Interpretation

    • Reject H_0.

    • There is sufficient evidence to suggest that the proportion of adult Americans who believe that it is easier to raise a girl has changed since 1947.

Two-Sample Proportions

  • Hypotheses
    • H_0: p_1 - p_2 \ge d_0 | H_0: p_1 -p_2 \le d_0 | H_0: p_1-p_2 = d_0
    • H_1: p_1 - p_2 < d_0 | H_1: p_1 - p_2 > d_0 | H_1: p_1 - p_2 \ne d_0
  • Test Statistic

z_0 = \frac{\left( \hat{p}_1 - \hat{p}_2 \right)- d_0}{\sqrt{\hat{p}\left(1-\hat{p}\right)\left( \frac{1}{n_1}+\frac{1}{n_2} \right)}}

  • where

\hat{p}_1 = \frac{x_1}{n_1}, \ \ \ \hat{p}_2 = \frac{x_2}{n_2}, \ \ \ \hat{p} = \frac{x_1+x_2}{n_1+n_2}

  • p-Value
    • p = P[z \le z_0] | p = P[z \ge z_0] | p = 2P[z \ge |z_0|]
  • Rejection Region
    • Reject H_0 if p < \alpha.

Two-Sample Proportions

  • (1-\alpha)100% CI for p_1-p_2

(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 (1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}

  • where

\hat{p}_1-\hat{p}_2=\frac{x_1}{n_1} - \frac{x_2}{n_2}

  • and

    • x_i is the number of individuals in group i with a specified characteristic
    • n_i is the sample size for group i
  • To construct this interval, we require:

    • n_1 \hat{p}_1(1-\hat{p}_1) \ge 10 and
    • n_2 \hat{p}_2(1-\hat{p}_2) \ge 10.

Two-Sample Proportions

prop.test(x = c(x_1, x_2), 
          n = c(n_1, n_2), 
          alternative = "alternative",
          correct=FALSE)

Two-Sample Proportions

  • In clinical trials of Nasonex, 3774 adult and adolescent allergy patients (patients 12 years and older) were randomly divided into two groups.

    • The patients in group 1 (experimental group) received 200 \mu g of Nasonex.

      • Of the 2103 patients in the experimental group, 547 reported headaches as a side effect.
    • The patients in group 2 (control group) received a placebo.

      • Of the 1671 patients in the control group, 368 reported headaches as a side effect.
  • Is there evidence to conclude that the proportion of Nasonex users who experienced headaches as a side effect is greater than the proportion in the control group?

  • Test at the \alpha= 0.05 level of significance.

  • What are the important pieces?

Two-Sample Proportions

  • Of the 2103 patients in the experimental group, 547 reported headaches as a side effect.

  • Of the 1671 patients in the control group, 368 reported headaches as a side effect.

prop.test(x = c(547, 368), 
          n = c(2103, 1671), 
          correct=FALSE)

    2-sample test for equality of proportions without continuity correction

data:  c(547, 368) out of c(2103, 1671)
X-squared = 8.0618, df = 1, p-value = 0.004521
alternative hypothesis: two.sided
95 percent confidence interval:
 0.01255827 0.06719613
sample estimates:
   prop 1    prop 2 
0.2601046 0.2202274 
  • Thus, \hat{p}_{\text{Exp}} = 0.260, \hat{p}_{\text{Ctrl}} = 0.220 and \hat{p}_{\text{Exp}} - \hat{p}_{\text{Ctrl}} = 0.04.

  • The 95% CI for p_{\text{Exp}} - p_{\text{Ctrl}} is (0.013, 0.067).
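  • For two samples with correct=FALSE, the interval reported by prop.test() is the (1-\alpha)100% CI formula for p_1-p_2 given earlier. A minimal sketch of that calculation, including the sample-size conditions (variable names are illustrative):

```r
x <- c(547, 368)     # headaches: experimental, control
n <- c(2103, 1671)   # group sizes: experimental, control
p_hat <- x / n

# sample-size conditions: both values should be at least 10
n * p_hat * (1 - p_hat)

diff_hat <- p_hat[1] - p_hat[2]            # point estimate of p_1 - p_2
se <- sqrt(sum(p_hat * (1 - p_hat) / n))   # unpooled standard error
z_crit <- qnorm(1 - 0.05 / 2)              # z_{alpha/2} for a 95% interval
ci <- diff_hat + c(-1, 1) * z_crit * se
round(ci, 3)
```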

Two-Sample Proportions

prop.test(x = c(547, 368), 
          n = c(2103, 1671), 
          alternative = "greater",
          correct=FALSE)

    2-sample test for equality of proportions without continuity correction

data:  c(547, 368) out of c(2103, 1671)
X-squared = 8.0618, df = 1, p-value = 0.00226
alternative hypothesis: greater
95 percent confidence interval:
 0.01695043 1.00000000
sample estimates:
   prop 1    prop 2 
0.2601046 0.2202274 
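  • The X-squared value above is z_0 squared, where z_0 uses the pooled estimate \hat{p}. A short sketch of the calculation with d_0 = 0 (variable names are illustrative):

```r
x <- c(547, 368)     # headaches: experimental, control
n <- c(2103, 1671)   # group sizes: experimental, control

p_hat  <- x / n             # group proportions
p_pool <- sum(x) / sum(n)   # pooled proportion under H_0

z0 <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))
p_value <- pnorm(z0, lower.tail = FALSE)   # P[z >= z0] for H_1: p_1 > p_2

c(z0 = z0, z0_squared = z0^2, p_value = p_value)
```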

Two-Sample Proportions

  • Hypotheses

    • H_0: \ p_{\text{Exp}} \le p_{\text{Ctrl}}
    • H_1: \ p_{\text{Exp}} > p_{\text{Ctrl}}
  • Test Statistic and p-value

    • \chi^2 = 8.06 or z_0 = \sqrt{8.0618} = 2.839
    • p= 0.002
  • Rejection Region

    • Reject H_0 if p < \alpha; \alpha=0.05.
  • Conclusion / Interpretation

    • Reject H_0.

    • There is sufficient evidence to suggest that the proportion of Nasonex users who experienced headaches as a side effect is greater than that of the control group.

Goodness-of-Fit

  • The goodness-of-fit test allows us to determine if a frequency distribution follows a specific distribution.

    • This could be a named distribution (e.g., normal)

    • It could also be a distribution without a name (e.g., the probabilities are specified)

  • Before we can perform the goodness-of-fit test, we must compute the expected count for each category i from the sample size n and the hypothesized probability p_i,

E_i = n p_i

    • e.g., suppose that we expect 25% of Skittles to be red; if we have 100 Skittles, we then expect 25 of them to be red.

Goodness-of-Fit

  • Hypotheses

    • H_0: The random variable follows the specified distribution.
    • H_1: The random variable does not follow the specified distribution.
  • Test Statistic

    • \chi^2_0 = \sum_{i=1}^k \frac{(O_i-E_i)^2}{E_i}, where O_i = observed and E_i = expected
  • p-Value

    • p = P[\chi^2_{k-1} \ge \chi^2_0], where k = number of categories
  • Rejection Region

    • Reject H_0 if p < \alpha

Goodness-of-Fit

  • In R, we will use the chisq.test() function and plug in both the counts and the expected probabilities
counts <- c(O_1, O_2, ..., O_k) # create O_i vector
probs <- c(p_1, p_2, ..., p_k) # create p_i vector
chisq.test(counts, p = probs)

Goodness-of-Fit

  • Using the economy data, below (based on the 2017 Current Population Survey, adjusted for inflation), determine if there is evidence to suggest that the distribution of income has changed since 2000.

  • Test at the \alpha = 0.05 level of significance.

Income                 Observed (2017)   Probability (2000)
Under $15,000                      161                0.099
$15,000 - $24,999                  144                0.098
$25,000 - $34,999                  138                0.093
$35,000 - $49,999                  184                0.135
$50,000 - $74,999                  247                0.179
$75,000 - $99,999                  188                0.131
$100,000 - $149,999                217                0.149
$150,000 - $199,999                105                0.061
Over $200,000                      116                0.055

Goodness-of-Fit

counts <- c(161, 144, 138, 184, 247, 188, 217, 105, 116) # create O_i vector
probs <- c(0.099, 0.098, 0.093, 0.135, 0.179, 0.131, 0.149, 0.061, 0.055) # create p_i vector
chisq.test(counts, p = probs)

    Chi-squared test for given probabilities

data:  counts
X-squared = 20.693, df = 8, p-value = 0.00801
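  • To see where these numbers come from, the expected counts and the test statistic can be computed directly from E_i = n p_i and the \chi^2_0 formula. A minimal sketch (the names expected, chi_sq, and p_value are illustrative):

```r
counts <- c(161, 144, 138, 184, 247, 188, 217, 105, 116)                      # O_i
probs  <- c(0.099, 0.098, 0.093, 0.135, 0.179, 0.131, 0.149, 0.061, 0.055)    # p_i

n <- sum(counts)         # total sample size
expected <- n * probs    # E_i = n * p_i

chi_sq <- sum((counts - expected)^2 / expected)   # test statistic
df <- length(counts) - 1                          # k - 1 degrees of freedom
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

expected
c(chi_sq = chi_sq, df = df, p_value = p_value)
```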

Goodness-of-Fit

  • Hypotheses

    • H_0: Distribution of income in 2017 follows the same distribution as 2000.
    • H_1: Distribution of income in 2017 does not follow the same distribution as 2000.
  • Test Statistic and p-Value

    • \chi^2_0 = 20.693
    • p = 0.008
  • Rejection Region

    • Reject H_0 if p < \alpha; \alpha=0.05
  • Conclusion/Interpretation

    • Reject H_0.

    • There is sufficient evidence to suggest that the distribution of income in 2017 does not follow the same distribution as in 2000.

Goodness-of-Fit

  • An obstetrician wants to know whether the proportion of children born on each day of the week is the same.

  • She randomly selects 500 birth records and obtains the data shown in the table below (based on data obtained from Vital Statistics of the United States, 2016).

  • Is there reason to believe that the day on which a child is born does not occur with equal frequency at the \alpha = 0.01 level of significance?

Sun  Mon  Tues  Weds  Thurs  Fri  Sat
 46   76    83    81     81   80   53
  • What about expected probabilities?

Goodness-of-Fit

counts <- c(46, 76, 83, 81, 81, 80, 53) # create O_i vector
probs <- c(1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7) # create p_i vector
chisq.test(counts, p = probs)

    Chi-squared test for given probabilities

data:  counts
X-squared = 19.568, df = 6, p-value = 0.003305
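  • Under equal frequencies, every day has expected count 500/7. A short sketch showing two equivalent ways to request this test: rep() builds the vector of equal probabilities, and chisq.test() also assumes equal probabilities when p is omitted (the object names are illustrative):

```r
counts <- c(46, 76, 83, 81, 81, 80, 53)   # O_i, Sunday through Saturday

fit <- chisq.test(counts, p = rep(1/7, 7))   # equal probabilities, written compactly
fit$expected                                 # expected count for each day

fit_default <- chisq.test(counts)            # p omitted: equal probabilities assumed
fit_default$statistic
```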

Goodness-of-Fit

  • Hypotheses

    • H_0: \ p_{Sun} = p_{Mon} = \dots = p_{Sat} = \frac{1}{7}.
    • H_1: At least one proportion is different.
  • Test Statistic and p-Value

    • \chi^2_0 = 19.568
    • p = 0.003
  • Rejection Region

    • Reject H_0 if p < \alpha; \alpha=0.01
  • Conclusion/Interpretation

    • Reject H_0.
    • There is sufficient evidence to suggest that births do not occur with equal frequency across the days of the week.

Test for Independence

  • Let us now discuss testing two categorical variables to determine if a relationship exists.

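  • Take, for example, these data on happiness and marital status:

                Married  Widowed  Divorced/Separated  Never Married
Very Happy          600       63                 112            144
Pretty Happy        720      142                 355            459
Not Too Happy        93       51                 119            127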

  • We will use the \chi^2 test for independence to determine if happiness depends on marital status.

Test for Independence

  • As in the goodness-of-fit test, we first compute the expected counts.
  • The expected count for the cell in row i and column j is

E_{ij} = \frac{R_i C_j}{n}

  • where R_i is the total for row i, C_j is the total for column j, and n is the total sample size

Test for Independence

  • Hypotheses

    • H_0: There is not a relationship between [var 1] and [var 2].
    • H_1: There is a relationship between [var 1] and [var 2].
  • Test Statistic

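    • \chi_0^2 = \sum_{i=1}^r \sum_{j=1}^c \frac{(O_{ij}-E_{ij})^2}{E_{ij}}, where O_{ij} = observed and E_{ij} = expected
  • p-Value

    • p = \text{P}[\chi^2_{(r-1)(c-1)} \ge \chi^2_0], where r = number of rows and c = number of columns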
  • Rejection Region

    • Reject H_0 if p < \alpha

Test for Independence

  • If given the contingency table, we can enter it in a matrix() (see example) and use the chisq.test() function.
chisq.test([matrix name])
  • If given raw data, we can use the CrossTable() function in the gmodels package.

    • Note: this function replicates PROC FREQ from SAS :)
CrossTable(dataset_name$row_variable, 
           dataset_name$col_variable, 
           prop.chisq= FALSE,  # turn off proportion contributed to chi-square statistic
           prop.t = FALSE, # turn off total proportions
           chisq = TRUE) # request chi-square test

Test for Independence

  • In our example,
observed_table <- matrix(c(600, 63, 112, 144,
                           720, 142, 355, 459,
                           93, 51, 119, 127), 
                         nrow = 3, ncol = 4, byrow = TRUE)
# I prefer to include breaks to make it look like the table given just for checking purposes
# make sure you edit the number of rows (nrow) and columns (ncol)!

rownames(observed_table) <- c("Very Happy", "Pretty Happy", "Not Too Happy") # name rows
colnames(observed_table) <- c("Married", "Widowed", "Divorced/Separated", "Never Married") # name cols

observed_table # print table to make sure it is what we want
              Married Widowed Divorced/Separated Never Married
Very Happy        600      63                112           144
Pretty Happy      720     142                355           459
Not Too Happy      93      51                119           127
chisq.test(observed_table) # chi-squared test for independence

    Pearson's Chi-squared test

data:  observed_table
X-squared = 224.12, df = 6, p-value < 2.2e-16
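  • The expected counts E_{ij} = R_i C_j / n behind this result can be computed directly. A minimal sketch, using outer() to form every product of a row total and a column total (the names expected, chi_sq, and p_value are illustrative):

```r
observed_table <- matrix(c(600, 63, 112, 144,
                           720, 142, 355, 459,
                           93, 51, 119, 127),
                         nrow = 3, ncol = 4, byrow = TRUE)

row_totals <- rowSums(observed_table)   # R_i
col_totals <- colSums(observed_table)   # C_j
n <- sum(observed_table)                # total sample size

expected <- outer(row_totals, col_totals) / n   # E_ij = R_i * C_j / n

chi_sq <- sum((observed_table - expected)^2 / expected)      # test statistic
df <- (nrow(observed_table) - 1) * (ncol(observed_table) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(expected, 1)
c(chi_sq = chi_sq, df = df, p_value = p_value)
```

  • The same expected counts are stored in the test object, e.g., chisq.test(observed_table)$expected.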

Test for Independence

  • Hypotheses

    • H_0: Happiness does not depend on marital status.
    • H_1: Happiness depends on marital status.
  • Test Statistic and p-Value

    • \chi^2_0 = 224.116
    • p < 0.001
  • Rejection Region

    • Reject H_0 if p < \alpha; \alpha=0.05
  • Conclusion/Interpretation

    • Reject H_0.

    • There is sufficient evidence to suggest that happiness depends on marital status.

Test for Independence

  • To see how the CrossTable() function works, let’s explore the Palmer penguin dataset.
library(gmodels)
penguins <- palmerpenguins::penguins 
CrossTable(penguins$species, 
           penguins$sex, 
           prop.chisq= FALSE,  # turn off proportion contributed to chi-square statistic
           prop.t = FALSE, # turn off total proportions
           chisq = TRUE) # request chi-square test

 
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|-------------------------|

 
Total Observations in Table:  333 

 
                 | penguins$sex 
penguins$species |    female |      male | Row Total | 
-----------------|-----------|-----------|-----------|
          Adelie |        73 |        73 |       146 | 
                 |     0.500 |     0.500 |     0.438 | 
                 |     0.442 |     0.435 |           | 
-----------------|-----------|-----------|-----------|
       Chinstrap |        34 |        34 |        68 | 
                 |     0.500 |     0.500 |     0.204 | 
                 |     0.206 |     0.202 |           | 
-----------------|-----------|-----------|-----------|
          Gentoo |        58 |        61 |       119 | 
                 |     0.487 |     0.513 |     0.357 | 
                 |     0.352 |     0.363 |           | 
-----------------|-----------|-----------|-----------|
    Column Total |       165 |       168 |       333 | 
                 |     0.495 |     0.505 |           | 
-----------------|-----------|-----------|-----------|

 
Statistics for All Table Factors


Pearson's Chi-squared test 
------------------------------------------------------------
Chi^2 =  0.04860717     d.f. =  2     p =  0.9759894 
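  • As a cross-check that does not need the raw data or the gmodels package, the counts printed in the table above can be entered as a matrix and passed to chisq.test(). A short sketch (the object names penguin_table and fit are illustrative):

```r
penguin_table <- matrix(c(73, 73,
                          34, 34,
                          58, 61),
                        nrow = 3, ncol = 2, byrow = TRUE)
rownames(penguin_table) <- c("Adelie", "Chinstrap", "Gentoo")   # name rows
colnames(penguin_table) <- c("female", "male")                  # name cols

fit <- chisq.test(penguin_table)   # chi-squared test for independence
fit
```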


 

Test for Independence

  • Hypotheses

    • H_0: Biological sex of penguin does not depend on penguin species.
    • H_1: Biological sex of penguin depends on penguin species.
  • Test Statistic and p-Value

    • \chi^2_0 = 0.049
    • p = 0.976
  • Rejection Region

    • Reject H_0 if p < \alpha; \alpha=0.05
  • Conclusion/Interpretation

    • Fail to reject H_0. There is not sufficient evidence to suggest that the biological sex of penguins depends on species of penguin.