Review: Inferential Statistics & Two-Sample t-Tests

STA4173: Biostatistics
Spring 2025

Introduction

  • In the last lecture, we focused on describing data.

    • Continuous data: mean with standard deviation, median with interquartile range
    • Categorical data: count with percentage
  • Today, we will focus on drawing conclusions about two population means using data.

    • Confidence intervals
    • Hypothesis testing

Confidence Intervals

Point Estimate

The single value of a statistic that estimates the value of a parameter.

  • Examples of point estimates:

  • It is necessary to know how good our estimation is, or to quantify our uncertainty.

Confidence Interval

A range of plausible values for the parameter based on values observed in the sample.

\text{estimate} \pm \text{margin of error}

Level of Confidence

The probability that the interval will capture the true parameter value in repeated samples. i.e., the success rate for the method.

Confidence Intervals

Confidence Intervals

  • Because CIs are a range of values, we will use interval notation,
(lower bound, upper bound)
  • where
    • lower bound = point estimate – margin of error
    • upper bound = point estimate + margin of error
  • Make sure to state your confidence intervals in numeric order.
    • i.e., the lower bound must be the smaller number and the upper bound must be the larger number.

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

(1-\alpha)100\% confidence interval for \mu_1-\mu_2

(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2 }{n_1} + \frac{s_2^2}{n_2}} where t_{\alpha/2} has \text{min}(n_1-1, n_2-1) degrees of freedom.

  • To construct this interval, we require either:
    • the two populations to be normally distributed or
    • the sample sizes are sufficiently large (n_1 \ge 30 and n_2 \ge 30)
  • R syntax:
t.test(continuous_variable ~ grouping_variable,
       data = dataset_name,
       conf.level = confidence_level) 

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

  • Recall the Palmer penguin data,
penguins <- palmerpenguins::penguins

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

  • Let’s find the 95% confidence interval for the difference in average weight (body_mass_g) between male and female (sex) penguins.

  • Remember the R syntax:

t.test(continuous_variable ~ grouping_variable,
       data = dataset_name,
       conf.level = confidence_level) 
  • What is the continuous variable?

  • What is the grouping variable?

  • What is the dataset name?

  • What is the confidence level?

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

  • Let’s find the 95% confidence interval for the difference in weight (body_mass_g) between male and female penguins.
t.test(body_mass_g ~ sex,
       data = penguins,
       conf.level = 0.95) 

    Welch Two Sample t-test

data:  body_mass_g by sex
t = -8.5545, df = 323.9, p-value = 4.794e-16
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
 -840.5783 -526.2453
sample estimates:
mean in group female   mean in group male 
            3862.273             4545.685 
  • Thus, the 95% confidence interval for \mu_F - \mu_M is (-840.6, -526.2).

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

  • What about the 99% confidence interval for the difference in weight (body_mass_g) between male and female penguins?
t.test(body_mass_g ~ sex,
       data = penguins,
       conf.level = 0.99) 
  • What do you expect? Recall that the 95% CI was (-840.6, -526.2).

Confidence Interval for \mathbf{\boldsymbol \mu_1-\boldsymbol\mu_2}

  • What about the 99% confidence interval for the difference in weight (body_mass_g) between male and female penguins?
t.test(body_mass_g ~ sex,
       data = penguins,
       conf.level = 0.99) 

    Welch Two Sample t-test

data:  body_mass_g by sex
t = -8.5545, df = 323.9, p-value = 4.794e-16
alternative hypothesis: true difference in means between group female and group male is not equal to 0
99 percent confidence interval:
 -890.4112 -476.4124
sample estimates:
mean in group female   mean in group male 
            3862.273             4545.685 
  • Thus, the 99% confidence interval for \mu_F - \mu_M is (-890.4, -476.4).
    • Recall that the 95% CI was (-840.6, -526.2).

Hypothesis Testing

  • A friend of yours wants to play a simple coin-flipping game.
    • If the coin comes up heads, you win; if it comes up tails, your friend wins.
    • Suppose the outcome of five plays of the game is T, T, T, T, T.
    • Is your friend cheating?

Hypothesis Testing

  • A friend of yours wants to play a simple coin-flipping game.
    • If the coin comes up heads, you win; if it comes up tails, your friend wins.
    • Suppose the outcome of five plays of the game is T, T, T, T, T.
    • Is your friend cheating?
      • We know the probability of flipping a tail is 0.5.
      • We can compute the probability of flipping five tails in a row.

\begin{align*} P[\text{T, T, T, T, T}] &= 0.5 \times 0.5 \times 0.5 \times 0.5 \times 0.5 \\ &= 0.03125 \end{align*}

  • Is this probability low enough to believe your friend is cheating?

Hypothesis Testing

Hypothesis Testing

A procedure, based on sample evidence and probability, used to test statements regarding a characteristic of one or more populations.

  • Steps in hypothesis testing

    1. Make a statement regarding the nature of the population.

    2. Collect evidence (sample data) to test the statement.

    3. Analyze the data to assess the plausibility of the statement.

  • Note: if we have population parameters available, we do not need to perform a hypothesis test.

Hypothesis Testing: Hypotheses

Hypothesis

A statement regarding a characteristic of one or more populations.

  • In hypothesis testing, we have two hypotheses: the null and the alternative.

Null hypothesis, H_0

A statement to be tested.

  • This is a statement of no change, no effect, or no difference.
  • It is assumed true until evidence indicates otherwise.

Alternative hypothesis, H_1

A statement that we are trying to find evidence to support.

Hypothesis Testing: Hypotheses

  • One sample tests:
    • Two-tailed test
      • H_0: parameter = some value
      • H_1: parameter \ne some value
    • Left-tailed test
      • H_0: parameter \ge some value
      • H_1: parameter < some value
    • Right-tailed test
      • H_0: parameter \le some value
      • H_1: parameter > some value

Hypothesis Testing: Hypotheses

  • Two sample tests
    • Two-tailed test
      • H_0: parameter1 – parameter2 = 0
      • H_1: parameter1 – parameter2 \ne 0
    • Left-tailed test
      • H_0: parameter1 – parameter2 \ge 0
      • H_1: parameter1 – parameter2 < 0
    • Right-tailed test
      • H_0: parameter1 – parameter2 \le 0
      • H_1: parameter1 – parameter2 > 0

Hypothesis Testing: Errors

  • We use data to draw conclusions about hypotheses.
    • We will either reject or fail to reject the null (H_0).
  • If we draw the wrong conclusion, we make an error.
  • These can be classified as Type I (\alpha) or Type II (\beta) errors.
    • \alpha and \beta are probabilities (i.e., are between 0 and 1).

Hypothesis Testing: Errors

  • As stated earlier, Type I (\alpha) and Type II (\beta) errors are probabilities.
    • \alpha = \text{P}[\text{reject } H_0 \text{ when } H_0 \text{ is true}]
    • \beta = \text{P}[\text{fail to reject } H_0 \text{ when } H_1 \text{ is true}]
  • We also call \alpha the level of significance.
  • We should choose \alpha based on the level of error we are willing to withstand in the experiment.
    • The \alpha that is commonly used is \alpha=0.05.
    • Sometimes, smaller \alpha is used. e.g., clinical trial \to \alpha=0.01.
  • For a fixed sample size (n), \alpha and \beta are inversely related.

Hypothesis Testing: Test Statistics

  • After stating our hypotheses, we will construct a test statistic.
  • The choice of test statistic depends on:
    1. The hypotheses being tested.
    2. Assumptions made about the data.
  • The value of the test statistic depends on the sample data.
    • If we were to draw a different sample, we would find a different value for the test statistic.
  • We will use the test statistic on our way to drawing conclusions about the hypotheses.

Hypothesis Testing: p-Values

p-value

The probability of observing what we’ve observed or something more extreme, assuming the null hypothesis is true.

  • After constructing test statistics, we will find the corresponding p-value.
  • Finding a p-value depends on the distribution being used.
  • We will compare the p-value to \alpha in order to draw conclusions.
    • Reject H_0 if p < \alpha.

Hypothesis Testing: Conclusions and Interpretations

  • Once we’ve found the p-value, we can draw a conclusion.
    • If p < \alpha, we reject H_0.
      • There is sufficient evidence to suggest that H_1 is true.
    • If p \ge \alpha, we fail to reject H_0.
      • There is not sufficient evidence to suggest that H_1 is true.
  • Take aways:
    • We never “accept” the null.
    • We always interpret in terms of H_1.

Two-Sample t-Test

Hypothesis Test for Two Independent Means

Hypotheses

  • H_0: \mu_1-\mu_2 = \mu_0 | H_0: \mu_1-\mu_2 \le \mu_0 | H_0: \mu_1-\mu_2 \ge \mu_0
  • H_1: \mu_1-\mu_2 \ne \mu_0 | H_0: \mu_1 - \mu_2 > \mu_0 | H_1: \mu_1 - \mu_2 < \mu_0

Test Statistic t_0 = \frac{(\bar{x}_1-\bar{x}_2)-\mu_0}{\sqrt{\frac{s_1^2}{n}+\frac{s_2^2}{n}}}

p-Value

  • p = 2 P[t \ge |t_0|] | p = P[t \ge |t_0|] | p = P[t \le |t_0|]

Rejection Region

  • Reject H_0 if p < \alpha.

Conclusion/Interpretation

  • [Reject or fail to reject] H_0.

  • There [is or is not] sufficient evidence to suggest [alternative hypothesis in words].

Two-Sample t-Test

  • R syntax:
t.test(continuous_variable ~ grouping_variable,
       data = dataset_name,
       mu = hypothesized_difference,
       alternative = alternative)
  • Important!!
    • We are estimating \mu_1 - \mu_2, but R is going to subtract in alphabetical or numeric order of the grouping variable.
      • e.g., if we have “Male” and “Female”, it will estimate \mu_{\text{Female}} - \mu_{\text{Male}}.
      • e.g., if we have “110” and “5”, it will estimate \mu_{5} - \mu_{110}.
      • In the case of two-tailed tets, this does not matter… but beware when doing a one-tailed test!

Two-Sample t-Test: Example

  • Consider the penguin data. Is there a significant difference in weight (body_mass_g) between male and female penguins? Test at the \alpha=0.05 level.

  • Remember the R syntax:

t.test(continuous_variable ~ grouping_variable,
       data = dataset_name,
       mu = hypothesized_difference,
       alternative = alternative)
  • What is the continuous variable?

  • What is the grouping variable?

  • What is the dataset name?

  • What is the hypothesized difference?

  • What is the alternative?

Two-Sample t-Test: Example

  • Consider the penguin data. Is there a significant difference in weight (body_mass_g) between male and female penguins? Test at the \alpha=0.05 level.

  • Remember the R syntax:

t.test(body_mass_g ~ sex,
       data = penguins,
       mu = 0,
       alternative = "two")

    Welch Two Sample t-test

data:  body_mass_g by sex
t = -8.5545, df = 323.9, p-value = 4.794e-16
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
 -840.5783 -526.2453
sample estimates:
mean in group female   mean in group male 
            3862.273             4545.685 
  • Is this a significant difference?

Two-Sample t-Test: Example

Hypotheses

  • H_0: \mu_1-\mu_2 = 0
  • H_1: \mu_1-\mu_2 \ne 0

Test Statistic and p-Value

  • t_0 = -8.55
  • p < 0.001

Rejection Region

  • Reject H_0 if p < \alpha; \alpha=0.05.

Conclusion/Interpretation

  • Reject H_0.

  • There is sufficient evidence to suggest that male and female penguins have different weights.

Hypothesis Testing: Practical vs. Statistical Significance

  • Hypothesis testing depends on sample size.
  • As the sample size increases, our p-values decrease necessarily.
  • As p-values decrease, we are more likely to reject the null hypothesis.
  • We must ask ourselves if the value we are testing against makes practical sense.
    • A new weight loss medication where the average amount of weight loss was 1 lb over 6 months.
    • A new weight loss medication where the average amount of weight lost was 15 lb over 6 months.
    • A new teaching method that raised final exam scores by 2 points.
    • A new teaching method that raised final exam scores by 15 points.

Wrap Up

  • Today we reviewed statistical inference.

    • Confidence intervals
    • Hypothesis testing
  • Get to know you quiz - complete with RStudio - due today.

  • Next meeting: how to conceptualize research questions; dependent t-test.