nose_bright n (pct)
No 15 (12.5%)
Yes 105 (87.5%)
Relationships between categorical variables.
One-sample proportion.
Two-sample proportions.
Test for goodness-of-fit.
Test for Independence.
Rudolph surveys elves in the North Pole Workshop to see if they think his nose is “bright enough for foggy nights.” He believes at least 80% of elves think so, and he wants to test that claim.
Let’s first look at the 95% confidence interval for π, the proportion of elves who think Rudolph’s nose is bright enough.
p̂ = 0.875 (105/120)
95% confidence interval for π: (0.804, 0.9228)
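The slides do not say which interval formula produced these limits. The numbers appear consistent with a Wilson score interval rather than the simpler Wald interval, and the sketch below checks this with base R only. It assumes nothing beyond the counts in the summary table (105 "Yes" out of 120); the variable names are illustrative.

```r
# Counts taken from the summary table: 105 "Yes" out of 120 elves surveyed.
x <- 105
n <- 120

# prop.test() with correct = FALSE returns the Wilson score interval
# for a single proportion (no continuity correction).
ci <- prop.test(x, n, conf.level = 0.95, correct = FALSE)$conf.int
round(ci, 4)

# For comparison: the Wald interval, p-hat +/- z * SE.
p_hat <- x / n
se <- sqrt(p_hat * (1 - p_hat) / n)
wald <- p_hat + c(-1, 1) * qnorm(0.975) * se
round(wald, 4)
```

The Wilson limits should agree with the interval reported above, while the Wald limits sit slightly higher, which is why the two should not be mixed when checking results by hand.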
Rudolph surveys elves in the North Pole Workshop to see if they think his nose is “bright enough for foggy nights.” He believes at least 90% of elves think so, and he wants to test that claim.
Now, let’s test the hypothesis – do at least 85% of elves think Rudolph’s nose is bright enough? Test at the α = 0.05 level.
Rudolph proposes backup LED antlers for safety. He surveys female and male reindeer about whether they support the LED plan. Rudolph wants to test whether support is higher among female reindeer than male reindeer.
Let’s find the 90% CI for π[Female] − π[Male].
Sample proportion (Female): 0.775
Sample proportion (Male): 0.64
Point estimate for the difference in proportions (p̂[Female] − p̂[Male]): 0.135
90% confidence interval for π[Female] − π[Male]: (-0.0208, 0.2908)
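To see where these limits come from, here is a minimal base R sketch of the Wald interval for a difference in proportions, using the counts reported for this survey (31 of 40 female reindeer and 32 of 50 male reindeer in support). It assumes an unpooled standard error, which is what the reported interval appears to use; the variable names are illustrative.

```r
# Supporters and sample sizes for each group.
x <- c(Female = 31, Male = 32)
n <- c(Female = 40, Male = 50)
p_hat <- x / n

# Point estimate of the difference in proportions.
diff_hat <- unname(p_hat["Female"] - p_hat["Male"])

# Unpooled standard error: each group contributes p(1 - p) / n.
se <- sqrt(sum(p_hat * (1 - p_hat) / n))

# 90% confidence leaves 5% in each tail.
z_star <- qnorm(0.95)
ci <- diff_hat + c(-1, 1) * z_star * se
round(ci, 4)
```

Because the interval contains 0, a two-sided comparison at the same level would not separate the two groups; the one-sided test is a different question.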
Let’s now determine if support is indeed higher among female reindeer. Test at the α = 0.10 level.
rudolph %>% two_prop_HT(binary = support_led,
                        grouping = sex,
                        event = "Support",
                        alternative = "greater",
                        alpha = 0.1)

Two-sample z-test for difference in proportions:
p̂[Female] = 0.775 (31/40)
p̂[Male] = 0.64 (32/50)
p̂[Female] − p̂[Male] = 0.135
Null: H₀: π[Female] − π[Male] ≤ 0
Alternative: H₁: π[Female] − π[Male] > 0
Test statistic: z = 1.43
p-value: p = 0.076
Conclusion: Reject the null hypothesis (p = 0.076 < α = 0.1)
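The test statistic can be reproduced by hand. The sketch below uses base R only and assumes an unpooled standard error, which appears to match the reported z = 1.43 (a pooled standard error would give roughly 1.39 instead). This is an inference from the numbers, not a description of how two_prop_HT() works internally.

```r
# Sample proportions from the output above.
p_f <- 31 / 40  # Female
p_m <- 32 / 50  # Male

# Unpooled standard error of the difference.
se <- sqrt(p_f * (1 - p_f) / 40 + p_m * (1 - p_m) / 50)

# z statistic for H0: pi_F - pi_M <= 0.
z <- (p_f - p_m) / se

# Right-tailed p-value, because the alternative is "greater".
p_value <- pnorm(z, lower.tail = FALSE)

round(z, 2)
p_value < 0.10  # TRUE means reject H0 at alpha = 0.10
```

The hand-computed p-value lands within rounding of the reported 0.076, and the decision is the same.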
Chi-square goodness-of-fit test:
Null: H₀: Observed frequencies match expected proportions
Alternative: H₁: Observed frequencies do not match expected proportions
Test statistic: χ²(5) = 2.08
p-value: p = 0.838
Conclusion: Fail to reject the null hypothesis (p = 0.838 ≥ α = 0.05)
Santa’s forecasting model predicts the distribution of toy requests: 30% Trains, 25% Dolls, 20% Teddy Bears, and 25% Games.
Rudolph samples 400 letters to Santa and records the most requested toy per letter in order to determine if the observed distribution matches Santa’s forecast. Because we want to be super sure, we will test at the α = 0.01 level.
rudolph %>% goodness_of_fit(categorical = toy_request,
                            expected = c("Trains" = 0.30,
                                         "Dolls" = 0.25,
                                         "Teddy Bears" = 0.20,
                                         "Games" = 0.25),
                            alpha = 0.01)

Chi-square goodness-of-fit test:
Null: H₀: Observed frequencies match expected proportions
Alternative: H₁: Observed frequencies do not match expected proportions
Test statistic: χ²(3) = 18.19
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p < 0.001 < α = 0.01)
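The observed counts are not shown here, so the statistic itself cannot be recomputed, but the pieces around it can. This base R sketch works out the expected counts under Santa's forecast, the degrees of freedom, and the critical value at α = 0.01, then compares the reported χ² = 18.19 against it. Variable names are illustrative.

```r
n <- 400  # letters sampled
forecast <- c(Trains = 0.30, Dolls = 0.25, "Teddy Bears" = 0.20, Games = 0.25)

# Forecast proportions must sum to 1 for a valid goodness-of-fit test.
stopifnot(isTRUE(all.equal(sum(forecast), 1)))

# Expected counts under H0: n times each forecast proportion.
expected_counts <- n * forecast
expected_counts

# Degrees of freedom: number of categories minus 1.
df <- length(forecast) - 1

# Critical value at alpha = 0.01 and p-value for the reported statistic.
crit <- qchisq(1 - 0.01, df)
chi_sq <- 18.19  # reported test statistic
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

chi_sq > crit    # TRUE means reject H0
p_value < 0.001
```

Given a vector of observed counts in the same category order, base R's chisq.test(observed, p = forecast) would run the full test.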
Rudolph suspects that a reindeer’s team position (lead, middle, back) might be associated with which Misfit Toy they identify with most: Jack-in-the-Box, Elephant with Polka Dots, or Charlie-in-the-Box.
After surveying 120 reindeer, Rudolph wants to test if favorite Misfit Toy depends on team position. Test at the α = 0.05 level.
Chi-square test for independence:
Null: H₀: misfit_toy and team are independent
Alternative: H₁: misfit_toy and team depend on one another
Test statistic: χ²(4) = 15.85
p-value: p = 0.003
Conclusion: Reject the null hypothesis (p = 0.003 < α = 0.05)
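The degrees of freedom and p-value in this output can be checked from the table dimensions alone. The sketch below assumes a 3 × 3 table (three team positions by three Misfit Toys, as described in the scenario) and uses only base R; the variable names are illustrative.

```r
n_rows <- 3  # team positions: lead, middle, back
n_cols <- 3  # Misfit Toys: three options

# Degrees of freedom for a test of independence: (r - 1) * (c - 1).
df <- (n_rows - 1) * (n_cols - 1)

chi_sq <- 15.85  # reported test statistic

# Critical value at alpha = 0.05 and right-tail p-value.
crit <- qchisq(0.95, df)
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

df
round(p_value, 3)
chi_sq > crit  # TRUE means reject H0 of independence
```

With the raw two-way table of counts in hand, base R's chisq.test() applied to that table would carry out the same test.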
This module covers the basics of categorical analysis.
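Key topics include one-sample proportions, two-sample proportions, the goodness-of-fit test, and the test for independence.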
Note that we could extend what we know about regression (using continuous outcomes) to categorical outcomes.
STA4173 - Biostatistics - Fall 2025