Review: Inferential Statistics & Two-Sample t-Tests

STA4173: Biostatistics
Spring 2025

Introduction

In the last lecture, we focused on describing data.
- Continuous data: mean with standard deviation, median with interquartile range
- Categorical data: count with percentage
Today, we will focus on drawing conclusions about two population means using data.
- Confidence intervals
- Hypothesis testing

Confidence Intervals

Point Estimate

The single value of a statistic that estimates the value of a parameter.

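Examples of point estimates:
- The sample mean \bar{x} estimates the population mean \mu.
- The difference in sample means \bar{x}_1 - \bar{x}_2 estimates \mu_1 - \mu_2.
It is necessary to know how good our estimate is, i.e., to quantify our uncertainty.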

Confidence Interval

A range of plausible values for the parameter based on values observed in the sample.

\text{estimate} \pm \text{margin of error}

Level of Confidence

The probability that the interval will capture the true parameter value in repeated samples, i.e., the success rate of the method.
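
The "success rate" reading can be checked with a small simulation. The sketch below is illustrative only: the population means (10 and 8), the standard deviation (2), the sample sizes (30 per group), and the seed are hypothetical choices, not values from the course data. It repeatedly draws two samples, builds a 95% confidence interval for the difference in means with t.test(), and records how often the interval contains the true difference.

```r
set.seed(4173)                       # arbitrary seed, for reproducibility only
true_diff <- 10 - 8                  # true mu_1 - mu_2 in this hypothetical setup

covered <- replicate(1000, {
  g1 <- rnorm(30, mean = 10, sd = 2) # sample from population 1
  g2 <- rnorm(30, mean = 8,  sd = 2) # sample from population 2
  ci <- t.test(g1, g2, conf.level = 0.95)$conf.int
  ci[1] <= true_diff && true_diff <= ci[2]   # did this interval capture the truth?
})

mean(covered)                        # proportion of intervals that captured it
```

The proportion should land close to 0.95; it will not be exactly 0.95 because only 1000 samples are drawn.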

Confidence Intervals

Because CIs are a range of values, we will use interval notation,

(lower bound, upper bound)

where
- lower bound = point estimate – margin of error
- upper bound = point estimate + margin of error
Make sure to state your confidence intervals in numeric order.
- i.e., the lower bound must be the smaller number and the upper bound must be the larger number.

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

(1-\alpha)100\% confidence interval for \mu_1-\mu_2

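(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2 }{n_1} + \frac{s_2^2}{n_2}} where t_{\alpha/2} has \text{min}(n_1-1, n_2-1) degrees of freedom. This is a conservative choice for hand calculation; R's t.test() instead uses the Welch (Satterthwaite) approximation, which is why its output can report non-integer degrees of freedom.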

To construct this interval, we require either:
- the two populations to be normally distributed, or
- the sample sizes to be sufficiently large (n_1 \ge 30 and n_2 \ge 30).
R syntax:

t.test(continuous_variable ~ grouping_variable,
       data = dataset_name,
       conf.level = confidence_level)
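
As a rough by-hand check of the formula, the sketch below computes the interval directly and compares it with t.test(). It uses R's built-in ToothGrowth data (len by supp) purely for illustration; that dataset and the object names are assumptions, not part of the lecture.

```r
# Built-in data, used only as a stand-in for two independent groups
x1 <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
x2 <- ToothGrowth$len[ToothGrowth$supp == "VC"]
n1 <- length(x1)
n2 <- length(x2)

est <- mean(x1) - mean(x2)                 # point estimate of mu_1 - mu_2
se  <- sqrt(var(x1) / n1 + var(x2) / n2)   # standard error of the difference

alpha  <- 0.05
t_crit <- qt(1 - alpha / 2, df = min(n1 - 1, n2 - 1))  # conservative df

ci_hand  <- c(est - t_crit * se, est + t_crit * se)    # estimate +/- margin of error
ci_welch <- as.numeric(
  t.test(len ~ supp, data = ToothGrowth, conf.level = 0.95)$conf.int
)                                                       # R's Welch interval

ci_hand
ci_welch
```

Both intervals are centered at the same point estimate. The by-hand interval is at least as wide, because the conservative degrees of freedom give a critical value at least as large as the one R uses.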

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

Recall the Palmer penguin data,

penguins <- palmerpenguins::penguins

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

Let’s find the 95% confidence interval for the difference in average weight (body_mass_g) between male and female (sex) penguins.
Remember the R syntax:

t.test(continuous_variable ~ grouping_variable,
       data = dataset_name,
       conf.level = confidence_level)

What is the continuous variable?
What is the grouping variable?
What is the dataset name?
What is the confidence level?

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

Let’s find the 95% confidence interval for the difference in weight (body_mass_g) between male and female penguins.

t.test(body_mass_g ~ sex,
       data = penguins,
       conf.level = 0.95)


    Welch Two Sample t-test

data:  body_mass_g by sex
t = -8.5545, df = 323.9, p-value = 4.794e-16
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
 -840.5783 -526.2453
sample estimates:
mean in group female   mean in group male 
            3862.273             4545.685

Thus, the 95% confidence interval for \mu_F - \mu_M is (-840.6, -526.2).

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

What about the 99% confidence interval for the difference in weight (body_mass_g) between male and female penguins?

t.test(body_mass_g ~ sex,
       data = penguins,
       conf.level = 0.99)

What do you expect? Recall that the 95% CI was (-840.6, -526.2).


    Welch Two Sample t-test

data:  body_mass_g by sex
t = -8.5545, df = 323.9, p-value = 4.794e-16
alternative hypothesis: true difference in means between group female and group male is not equal to 0
99 percent confidence interval:
 -890.4112 -476.4124
sample estimates:
mean in group female   mean in group male 
            3862.273             4545.685

Thus, the 99% confidence interval for \mu_F - \mu_M is (-890.4, -476.4).
- Recall that the 95% CI was (-840.6, -526.2). The 99% interval is wider: a higher level of confidence requires a larger margin of error.
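
To see how the interval width responds to the confidence level more generally, here is a minimal sketch. It again uses the built-in ToothGrowth data as a stand-in (an assumption, not the penguin data), and the levels 0.90, 0.95, and 0.99 are chosen only for illustration.

```r
levels_to_try <- c(0.90, 0.95, 0.99)

# Width (upper bound minus lower bound) of the interval at each level
widths <- sapply(levels_to_try, function(cl) {
  ci <- as.numeric(t.test(len ~ supp, data = ToothGrowth, conf.level = cl)$conf.int)
  ci[2] - ci[1]
})
names(widths) <- levels_to_try

widths   # the width grows as the confidence level grows
```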

Hypothesis Testing

A friend of yours wants to play a simple coin-flipping game.
- If the coin comes up heads, you win; if it comes up tails, your friend wins.
- Suppose the outcome of five plays of the game is T, T, T, T, T.
- Is your friend cheating?
  - We know the probability of flipping a tail is 0.5.
  - We can compute the probability of flipping five tails in a row.

\begin{align*} P[\text{T, T, T, T, T}] &= 0.5 \times 0.5 \times 0.5 \times 0.5 \times 0.5 \\ &= 0.03125 \end{align*}

Is this probability low enough to believe your friend is cheating?
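
The same probability can be reproduced in R. This is only a quick check of the arithmetic; the object names are illustrative.

```r
p_tail <- 0.5                                 # probability of a tail on one fair flip
p_five_tails <- p_tail^5                      # five independent flips, all tails
p_binom <- dbinom(5, size = 5, prob = 0.5)    # same value from the binomial distribution

p_five_tails   # 0.03125
p_binom        # 0.03125
```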

Hypothesis Testing

Hypothesis Testing

A procedure, based on sample evidence and probability, used to test statements regarding a characteristic of one or more populations.

Steps in hypothesis testing
1. Make a statement regarding the nature of the population.
2. Collect evidence (sample data) to test the statement.
3. Analyze the data to assess the plausibility of the statement.
Note: if we have population parameters available, we do not need to perform a hypothesis test.

Hypothesis Testing: Hypotheses

Hypothesis

A statement regarding a characteristic of one or more populations.

In hypothesis testing, we have two hypotheses: the null and the alternative.

Null hypothesis, H_0

A statement to be tested.

This is a statement of no change, no effect, or no difference.
It is assumed true until evidence indicates otherwise.

Alternative hypothesis, H_1

A statement that we are trying to find evidence to support.

Hypothesis Testing: Hypotheses

One sample tests:
- Two-tailed test
  - H_0: parameter = some value
  - H_1: parameter \ne some value
- Left-tailed test
  - H_0: parameter \ge some value
  - H_1: parameter < some value
- Right-tailed test
  - H_0: parameter \le some value
  - H_1: parameter > some value

Hypothesis Testing: Hypotheses

Two sample tests
- Two-tailed test
  - H_0: parameter₁ – parameter₂ = 0
  - H_1: parameter₁ – parameter₂ \ne 0
- Left-tailed test
  - H_0: parameter₁ – parameter₂ \ge 0
  - H_1: parameter₁ – parameter₂ < 0
- Right-tailed test
  - H_0: parameter₁ – parameter₂ \le 0
  - H_1: parameter₁ – parameter₂ > 0

Hypothesis Testing: Errors

We use data to draw conclusions about hypotheses.
- We will either reject or fail to reject the null (H_0).
If we draw the wrong conclusion, we make an error.
These can be classified as Type I or Type II errors, which occur with probabilities \alpha and \beta, respectively.
- \alpha and \beta are probabilities (i.e., are between 0 and 1).

Hypothesis Testing: Errors

As stated earlier, \alpha and \beta are the probabilities of making a Type I and a Type II error, respectively.
- \alpha = \text{P}[\text{reject } H_0 \text{ when } H_0 \text{ is true}]
- \beta = \text{P}[\text{fail to reject } H_0 \text{ when } H_1 \text{ is true}]
We also call \alpha the level of significance.
We should choose \alpha based on the level of error we are willing to withstand in the experiment.
- The \alpha that is commonly used is \alpha=0.05.
- Sometimes a smaller \alpha is used, e.g., \alpha=0.01 in a clinical trial.
For a fixed sample size (n), \alpha and \beta trade off: decreasing one increases the other.
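
The trade-off can be illustrated with power.t.test() from base R's stats package, since \beta = 1 - power. The inputs below (30 per group, a true difference of 0.5, a standard deviation of 1) are hypothetical values chosen for the sketch.

```r
alphas <- c(0.10, 0.05, 0.01)        # significance levels to compare

# Type II error probability at each alpha, with n held fixed
betas <- sapply(alphas, function(a) {
  pw <- power.t.test(n = 30, delta = 0.5, sd = 1,
                     sig.level = a, type = "two.sample")$power
  1 - pw
})

data.frame(alpha = alphas, beta = round(betas, 3))
```

With the sample size held fixed, each reduction in \alpha comes with a larger \beta.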

Hypothesis Testing: Test Statistics

After stating our hypotheses, we will construct a test statistic.
The choice of test statistic depends on:
1. The hypotheses being tested.
2. Assumptions made about the data.
The value of the test statistic depends on the sample data.
- If we were to draw a different sample, we would find a different value for the test statistic.
We will use the test statistic on our way to drawing conclusions about the hypotheses.

Hypothesis Testing: p-Values

p-value

The probability of observing what we’ve observed or something more extreme, assuming the null hypothesis is true.

After constructing test statistics, we will find the corresponding p-value.
Finding a p-value depends on the distribution being used.
We will compare the p-value to \alpha in order to draw conclusions.
- Reject H_0 if p < \alpha.

Hypothesis Testing: Conclusions and Interpretations

Once we’ve found the p-value, we can draw a conclusion.
- If p < \alpha, we reject H_0.
  - There is sufficient evidence to suggest that H_1 is true.
- If p \ge \alpha, we fail to reject H_0.
  - There is not sufficient evidence to suggest that H_1 is true.
Takeaways:
- We never “accept” the null.
- We always interpret in terms of H_1.
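
The decision rule can be sketched in a few lines of R. The example uses the built-in ToothGrowth data and \alpha = 0.05 as illustrative assumptions; the wording of the printed message is also just one possible phrasing.

```r
alpha <- 0.05
res <- t.test(len ~ supp, data = ToothGrowth)   # two-tailed Welch test
p <- res$p.value

# Compare p to alpha and phrase the conclusion in terms of H1
decision <- if (p < alpha) "Reject H0" else "Fail to reject H0"
evidence <- if (p < alpha) "is" else "is not"

cat(decision, "; there ", evidence,
    " sufficient evidence to suggest H1.\n", sep = "")
```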

Two-Sample t-Test

Hypothesis Test for Two Independent Means

Hypotheses

H_0: \mu_1-\mu_2 = \mu_0 | H_0: \mu_1-\mu_2 \le \mu_0 | H_0: \mu_1-\mu_2 \ge \mu_0
H_1: \mu_1-\mu_2 \ne \mu_0 | H_1: \mu_1 - \mu_2 > \mu_0 | H_1: \mu_1 - \mu_2 < \mu_0

Test Statistic

t_0 = \frac{(\bar{x}_1-\bar{x}_2)-\mu_0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}

p-Value

p = 2 P[t \ge |t_0|] | p = P[t \ge t_0] | p = P[t \le t_0]

Rejection Region

Reject H_0 if p < \alpha.

Conclusion/Interpretation

[Reject or fail to reject] H_0.
There [is or is not] sufficient evidence to suggest [alternative hypothesis in words].
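
A minimal sketch of the test statistic and the three p-values, computed by hand and set beside t.test(). The built-in ToothGrowth data is an illustrative assumption, and the degrees of freedom are taken from R's Welch output rather than derived here.

```r
x1 <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
x2 <- ToothGrowth$len[ToothGrowth$supp == "VC"]
mu0 <- 0                                         # hypothesized difference

# Test statistic: (difference in sample means - mu0) / standard error
t0 <- ((mean(x1) - mean(x2)) - mu0) /
  sqrt(var(x1) / length(x1) + var(x2) / length(x2))

res <- t.test(len ~ supp, data = ToothGrowth, mu = mu0)
df  <- unname(res$parameter)                     # Welch df as reported by R

p_two   <- 2 * pt(abs(t0), df, lower.tail = FALSE)  # H1: mu_1 - mu_2 != mu0
p_right <- pt(t0, df, lower.tail = FALSE)           # H1: mu_1 - mu_2 >  mu0
p_left  <- pt(t0, df)                               # H1: mu_1 - mu_2 <  mu0

c(t0 = t0, p_two = p_two, p_right = p_right, p_left = p_left)
```

The two-tailed value should match res$p.value, and the one-tailed values correspond to alternative = "greater" and alternative = "less".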

Two-Sample t-Test

R syntax:

t.test(continuous_variable ~ grouping_variable,
       data = dataset_name,
       mu = hypothesized_difference,
       alternative = alternative)

Important!!
- We are estimating \mu_1 - \mu_2, but R is going to subtract in the order of the grouping variable's levels (alphabetical for text, numeric for numbers).
  - e.g., if we have “Male” and “Female”, it will estimate \mu_{\text{Female}} - \mu_{\text{Male}}.
  - e.g., if we have the numbers 110 and 5, it will estimate \mu_{5} - \mu_{110}; stored as text, “110” sorts before “5”.
  - In the case of two-tailed tests, this does not matter… but beware when doing a one-tailed test!
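
One way to check the direction of subtraction before running a one-tailed test is to inspect the level order. The sketch below uses the built-in ToothGrowth data and the copy name tg as illustrative assumptions; relevel() is one option for changing which group comes first.

```r
# Level order decides the direction of subtraction.
levels(factor(c(110, 5)))        # numeric codes sort numerically: "5" "110"
levels(factor(c("110", "5")))    # text codes sort as strings: "110" "5"

levels(ToothGrowth$supp)         # "OJ" "VC", so t.test() estimates mu_OJ - mu_VC

tg <- ToothGrowth                # copy with the level order reversed
tg$supp <- relevel(tg$supp, ref = "VC")   # now estimates mu_VC - mu_OJ

t_orig <- t.test(len ~ supp, data = ToothGrowth, alternative = "greater")
t_flip <- t.test(len ~ supp, data = tg, alternative = "greater")

c(original = t_orig$p.value, flipped = t_flip$p.value)
```

The test statistic changes sign when the order is reversed, so the same alternative = "greater" tests opposite claims in the two calls.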

Two-Sample t-Test: Example

Consider the penguin data. Is there a significant difference in weight (body_mass_g) between male and female penguins? Test at the \alpha=0.05 level.
Remember the R syntax:

t.test(continuous_variable ~ grouping_variable,
       data = dataset_name,
       mu = hypothesized_difference,
       alternative = alternative)

What is the continuous variable?
What is the grouping variable?
What is the dataset name?
What is the hypothesized difference?
What is the alternative?

Two-Sample t-Test: Example

Consider the penguin data. Is there a significant difference in weight (body_mass_g) between male and female penguins? Test at the \alpha=0.05 level.
Remember the R syntax:

t.test(body_mass_g ~ sex,
       data = penguins,
       mu = 0,
       alternative = "two.sided")


    Welch Two Sample t-test

data:  body_mass_g by sex
t = -8.5545, df = 323.9, p-value = 4.794e-16
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
 -840.5783 -526.2453
sample estimates:
mean in group female   mean in group male 
            3862.273             4545.685

Is this a significant difference?

Two-Sample t-Test: Example

Hypotheses

H_0: \mu_1-\mu_2 = 0
H_1: \mu_1-\mu_2 \ne 0

Test Statistic and p-Value

t_0 = -8.55
p < 0.001

Rejection Region

Reject H_0 if p < \alpha; \alpha=0.05.

Conclusion/Interpretation

Reject H_0.
There is sufficient evidence to suggest that the mean weights of male and female penguins differ.

Hypothesis Testing: Practical vs. Statistical Significance

Hypothesis testing depends on sample size.
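For a given observed difference, the standard error shrinks as the sample size increases, so the p-value decreases.
As p-values decrease, we are more likely to reject the null hypothesis, even when the difference is too small to matter in practice.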
We must ask ourselves if the value we are testing against makes practical sense.
- A new weight loss medication where the average amount of weight lost was 1 lb over 6 months.
- A new weight loss medication where the average amount of weight lost was 15 lb over 6 months.
- A new teaching method that raised final exam scores by 2 points.
- A new teaching method that raised final exam scores by 15 points.
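
To make the sample-size effect concrete, the sketch below holds a small observed difference fixed and only changes n. All numbers are hypothetical: a 1 lb mean difference, a standard deviation of 10 lb in each group, and equal group sizes (in which case the degrees of freedom are 2n - 2).

```r
d <- 1                                # hypothetical observed mean difference (lb)
s <- 10                               # hypothetical standard deviation per group (lb)
n <- c(20, 200, 2000, 20000)          # per-group sample sizes to compare

se <- s * sqrt(2 / n)                 # standard error with equal n and equal s
t0 <- d / se                          # test statistic for H0: mu_1 - mu_2 = 0
p_values <- 2 * pt(abs(t0), df = 2 * n - 2, lower.tail = FALSE)

data.frame(n = n, t0 = round(t0, 2), p = signif(p_values, 3))
```

The difference stays at 1 lb in every row; only the sample size changes, yet the p-value moves from clearly non-significant to far below 0.05. Whether a 1 lb difference matters is a practical question the p-value cannot answer.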

Wrap Up

Today we reviewed statistical inference.
- Confidence intervals
- Hypothesis testing
Get to know you quiz - complete with RStudio - due today.
Next meeting: how to conceptualize research questions; dependent t-test.