STA4173: Biostatistics
Spring 2025
In the last lecture, we focused on describing data.
Today, we will focus on drawing conclusions about two population means using data.
Point Estimate
The single value of a statistic that estimates the value of a parameter.
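Examples of point estimates: \bar{x} estimates \mu, s estimates \sigma, and \bar{x}_1-\bar{x}_2 estimates \mu_1-\mu_2.
A point estimate alone does not tell us how close it is likely to be to the parameter, so we also need to quantify our uncertainty.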
Confidence Interval
A range of plausible values for the parameter based on values observed in the sample.
\text{estimate} \pm \text{margin of error}
Level of Confidence
The long-run proportion of intervals, constructed from repeated samples, that capture the true parameter value; i.e., the success rate of the method.
(1-\alpha)100\% confidence interval for \mu_1-\mu_2
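(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, where t_{\alpha/2} is the critical value from a t distribution with \text{min}(n_1-1, n_2-1) degrees of freedom. This choice of degrees of freedom is conservative; R's Welch test instead uses the Welch-Satterthwaite approximation, which is why the output below reports df = 323.9.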
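A minimal Python sketch of the interval formula above (an illustration only; the course itself uses R). It also simulates the "success rate" idea from the level-of-confidence definition: draw many pairs of samples from two normal populations with a known difference, build an interval each time, and count how often it captures that difference. The function names, population means and standard deviations, sample sizes, and the use of the normal critical value in place of t_{\alpha/2} are all assumptions made for illustration; with 50 observations per group the two critical values are close.

```python
import math
import random
from statistics import NormalDist, fmean


def sample_var(xs):
    """Sample variance s^2 with the n - 1 denominator."""
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def welch_ci(x1, x2, crit):
    """(xbar1 - xbar2) +/- crit * sqrt(s1^2/n1 + s2^2/n2)."""
    estimate = fmean(x1) - fmean(x2)
    se = math.sqrt(sample_var(x1) / len(x1) + sample_var(x2) / len(x2))
    margin = crit * se
    return estimate - margin, estimate + margin


def conservative_df(n1, n2):
    """Degrees of freedom used on the slide: min(n1 - 1, n2 - 1)."""
    return min(n1 - 1, n2 - 1)


def welch_df(s1, s2, n1, n2):
    """Welch-Satterthwaite degrees of freedom (the df R's Welch test reports)."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def coverage(reps=2000, n1=50, n2=50, mu1=10.0, mu2=8.0, sd1=2.0, sd2=3.0,
             level=0.95, seed=1):
    """Share of simulated intervals that contain the true mu1 - mu2."""
    rng = random.Random(seed)
    # Normal critical value used as a large-sample stand-in for t_{alpha/2}
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    true_diff = mu1 - mu2
    hits = 0
    for _ in range(reps):
        x1 = [rng.gauss(mu1, sd1) for _ in range(n1)]
        x2 = [rng.gauss(mu2, sd2) for _ in range(n2)]
        lower, upper = welch_ci(x1, x2, crit)
        if lower <= true_diff <= upper:
            hits += 1
    return hits / reps


print(conservative_df(50, 40))  # 39
print(round(welch_df(2.0, 3.0, 50, 50), 1))
print(coverage())
```

The simulated coverage should land near 0.95 (a little below, because the normal critical value is slightly smaller than the matching t value). The Welch-Satterthwaite df is always at least min(n_1-1, n_2-1), which is why the slide's rule is the conservative one.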
Let’s find the 95% confidence interval for the difference in average weight (body_mass_g) between male and female (sex) penguins.
Remember the R syntax:
What is the continuous variable?
What is the grouping variable?
What is the dataset name?
What is the confidence level?
Welch Two Sample t-test
data: body_mass_g by sex
t = -8.5545, df = 323.9, p-value = 4.794e-16
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
-840.5783 -526.2453
sample estimates:
mean in group female mean in group male
3862.273 4545.685
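The pieces of the "estimate ± margin of error" form can be read back out of the R output. This short Python check uses only the numbers printed above (female minus male); the variable names are hypothetical.

```python
# Values copied from the 95% R output above (female minus male)
mean_female = 3862.273
mean_male = 4545.685
t_stat = -8.5545
lower, upper = -840.5783, -526.2453

estimate = mean_female - mean_male  # point estimate of mu_F - mu_M
midpoint = (lower + upper) / 2      # center of the interval
margin = (upper - lower) / 2        # margin of error
se = estimate / t_stat              # with mu_0 = 0, t = estimate / SE
crit = margin / se                  # critical value implied by the interval

print(f"estimate = {estimate:.3f}, midpoint = {midpoint:.3f}")
print(f"margin = {margin:.3f}, SE = {se:.3f}, implied t critical value = {crit:.3f}")
```

The midpoint matches the point estimate up to rounding, and the implied critical value is about 1.97, slightly above the normal value 1.96, as expected for a t distribution with roughly 324 degrees of freedom.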
Changing the confidence level to 99% gives:
Welch Two Sample t-test
data: body_mass_g by sex
t = -8.5545, df = 323.9, p-value = 4.794e-16
alternative hypothesis: true difference in means between group female and group male is not equal to 0
99 percent confidence interval:
-890.4112 -476.4124
sample estimates:
mean in group female mean in group male
3862.273 4545.685
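Comparing the two printed intervals shows what raising the confidence level does: the center stays at the point estimate while the interval widens. A small Python sketch using only the R output (the helper name is hypothetical):

```python
ci95 = (-840.5783, -526.2453)  # 95% interval from the R output
ci99 = (-890.4112, -476.4124)  # 99% interval from the R output


def center_and_width(ci):
    """Return the midpoint and the width of an interval (lower, upper)."""
    lo, hi = ci
    return (lo + hi) / 2, hi - lo


c95, w95 = center_and_width(ci95)
c99, w99 = center_and_width(ci99)
print(f"95%: center = {c95:.4f}, width = {w95:.4f}")
print(f"99%: center = {c99:.4f}, width = {w99:.4f}")
```

Both intervals are centered at the same difference in means; the 99% interval is roughly 100 g wider because it uses a larger critical value.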
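If a coin is fair, each independent flip lands tails (T) with probability 0.5, so the probability of five tails in a row is
\begin{align*} P[\text{T, T, T, T, T}] &= 0.5 \times 0.5 \times 0.5 \times 0.5 \times 0.5 \\ &= 0.03125 \end{align*}
If we actually observed five tails, such a small probability would lead us to doubt that the coin is fair; hypothesis testing formalizes this reasoning.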
Hypothesis Testing
A procedure, based on sample evidence and probability, used to test statements regarding a characteristic of one or more populations.
Steps in hypothesis testing
Make a statement regarding the nature of the population.
Collect evidence (sample data) to test the statement.
Analyze the data to assess the plausibility of the statement.
Note: if we have population parameters available, we do not need to perform a hypothesis test.
Hypothesis
A statement regarding a characteristic of one or more populations.
Null hypothesis, H_0
A statement to be tested.
Alternative hypothesis, H_1
A statement that we are trying to find evidence to support.
p-value
The probability of observing what we’ve observed or something more extreme, assuming the null hypothesis is true.
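The phrase "assuming the null hypothesis is true" can be made concrete with the coin example: simulate a fair coin (the null hypothesis) many times and record how often five flips are all tails. Assuming the alternative is "the coin is biased toward tails", nothing is more extreme than five tails out of five, so this probability is the p-value for that observation. The function name, seed, and number of repetitions are illustrative choices.

```python
import random


def prob_all_tails(n_flips=5, reps=100_000, p_tail=0.5, seed=42):
    """Monte Carlo estimate of P(every flip is tails) when P(T) = p_tail."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # One experiment: n_flips independent flips, each tails with prob p_tail
        if all(rng.random() < p_tail for _ in range(n_flips)):
            hits += 1
    return hits / reps


exact = 0.5 ** 5              # the calculation from the coin example
simulated = prob_all_tails()  # the same probability, estimated by simulation
print(exact, simulated)
```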
Hypothesis Test for Two Independent Means
Hypotheses
Test Statistic: t_0 = \frac{(\bar{x}_1-\bar{x}_2)-\mu_0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}, where \mu_0 is the hypothesized difference in means.
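A minimal Python version of the test statistic, assuming summary statistics are already available. The function name and the numbers in the usage lines are hypothetical, chosen so that s_1^2/n_1 = 9/9 = 1 and s_2^2/n_2 = 16/16 = 1, giving a standard error of \sqrt{2}; note that each variance is divided by its own sample size.

```python
import math


def two_sample_t(xbar1, xbar2, s1, s2, n1, n2, mu0=0.0):
    """t_0 = ((xbar1 - xbar2) - mu0) / sqrt(s1^2/n1 + s2^2/n2)."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return ((xbar1 - xbar2) - mu0) / se


# Hypothetical summary statistics with unequal sample sizes
print(two_sample_t(10, 8, 3, 4, 9, 16))         # observed difference 2 vs mu0 = 0
print(two_sample_t(10, 8, 3, 4, 9, 16, mu0=2))  # observed difference equals mu0
```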
p-Value
Rejection Region
Conclusion/Interpretation
[Reject or fail to reject] H_0.
There [is or is not] sufficient evidence to suggest [alternative hypothesis in words].
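For reference, the standard textbook forms of the hypotheses, p-value, and rejection region for each type of alternative are summarized below, with H_0: \mu_1-\mu_2=\mu_0 throughout and t denoting a t random variable with the appropriate degrees of freedom.

```latex
\begin{array}{llll}
\text{Test} & H_1 & p\text{-value} & \text{Reject } H_0 \text{ if} \\
\text{Two-tailed} & \mu_1-\mu_2 \ne \mu_0 & 2P(t \ge |t_0|) & |t_0| \ge t_{\alpha/2} \\
\text{Left-tailed} & \mu_1-\mu_2 < \mu_0 & P(t \le t_0) & t_0 \le -t_{\alpha} \\
\text{Right-tailed} & \mu_1-\mu_2 > \mu_0 & P(t \ge t_0) & t_0 \ge t_{\alpha}
\end{array}
```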
Consider the penguin data. Is there a significant difference in weight (body_mass_g) between male and female penguins? Test at the \alpha=0.05 level.
Remember the R syntax:
What is the continuous variable?
What is the grouping variable?
What is the dataset name?
What is the hypothesized difference?
What is the alternative?
Running the test in R gives the following output:
Welch Two Sample t-test
data: body_mass_g by sex
t = -8.5545, df = 323.9, p-value = 4.794e-16
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
-840.5783 -526.2453
sample estimates:
mean in group female mean in group male
3862.273 4545.685
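Hypotheses
H_0: \mu_F - \mu_M = 0 vs. H_1: \mu_F - \mu_M \ne 0, where \mu_F and \mu_M are the mean body masses (g) of female and male penguins.
Test Statistic and p-Value
t_0 = -8.5545 with df = 323.9; p = 4.794 \times 10^{-16} < 0.05.
Rejection Region
Reject H_0 if |t_0| \ge t_{0.025}; with about 324 degrees of freedom the critical value is about 1.97, and |t_0| = 8.5545 falls well inside the rejection region.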
Conclusion/Interpretation
Reject H_0.
There is sufficient evidence to suggest that the mean body mass differs between male and female penguins.
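The p-value approach and the rejection-region approach lead to the same conclusion here. A minimal Python sketch of both decision rules, with hypothetical function names; the critical value 1.967 is approximate, back-calculated from the 95% interval in the R output rather than looked up in a t table. Some texts reject when p \le \alpha; the two conventions differ only when p equals \alpha exactly.

```python
def decide_by_p(p_value, alpha=0.05):
    """p-value approach: reject H_0 when p < alpha."""
    return "Reject H_0" if p_value < alpha else "Fail to reject H_0"


def decide_by_region(t0, crit):
    """Rejection-region approach for a two-tailed test: reject when |t0| >= crit."""
    return "Reject H_0" if abs(t0) >= crit else "Fail to reject H_0"


# From the R output: t0 = -8.5545, p = 4.794e-16.
# The critical value 1.967 is an approximation (back-calculated, not from a table).
print(decide_by_p(4.794e-16))
print(decide_by_region(-8.5545, 1.967))
```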
Today we reviewed statistical inference.
Get-to-know-you quiz: complete in RStudio; due today.
Next meeting: how to conceptualize research questions; dependent t-test.