Point estimate: The single value of a statistic that estimates the value of a parameter.
Confidence interval: A range of plausible values for the parameter based on values observed in the sample.
\text{point estimate} \pm \text{margin of error}
What is the point estimate of:
\mu
\sigma
\pi (or p)
\mu_1-\mu_2
\pi_1-\pi_2
Confidence Intervals
We have different intervals based on the level of confidence.
Level of confidence: The probability, over repeated samples, that the method produces an interval capturing the true parameter value; i.e., the long-run success rate of the method.
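The "success rate" interpretation can be checked by simulation. Below is a minimal base-R sketch that assumes normally distributed data with a made-up population mean and SD (these values are hypothetical, not course data). It draws many samples, builds a 95% t-interval from each, and counts how often the intervals capture the true mean.

```r
# Coverage simulation: how often does a 95% t-interval capture the true mean?
set.seed(2024)          # fixed seed so the run is reproducible
true_mu    <- 50        # hypothetical population mean
true_sigma <- 8         # hypothetical population SD
n          <- 25        # sample size per simulated study
n_reps     <- 2000      # number of repeated samples

captured <- replicate(n_reps, {
  y  <- rnorm(n, mean = true_mu, sd = true_sigma)
  ci <- t.test(y, conf.level = 0.95)$conf.int   # 95% CI for the mean
  ci[1] < true_mu && true_mu < ci[2]            # TRUE if the interval covers mu
})

coverage <- mean(captured)   # long-run success rate of the method
coverage                     # should be close to 0.95
```

Any single interval either captures the true mean or it does not. The 95% describes how the method behaves across many samples.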
Confidence Intervals
Because CIs are a range of values, we will use interval notation,
(lower bound, upper bound)
where
lower bound = point estimate – margin of error
upper bound = point estimate + margin of error
Make sure to state your confidence intervals in numeric order.
i.e., the lower bound must be the smaller number and the upper bound must be the larger number.
For the entered variable (continuous), we will see:
Point estimate for \mu
Point estimate for \sigma
Confidence interval for \mu at the specified level (confidence)
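The sketch below is a rough base-R version of those three quantities: the sample mean estimates \mu, the sample SD estimates \sigma, and the interval is \bar{y} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}. The helper `one_mean_summary()` and the toy sample are hypothetical, and ssstats may compute or format its output differently. The interval is checked against base R's `t.test()`.

```r
# Hypothetical helper: point estimates for mu and sigma, plus a t-based CI for mu.
# A sketch of the textbook computation, not ssstats' internal code.
one_mean_summary <- function(y, confidence = 0.95) {
  n      <- length(y)
  ybar   <- mean(y)                                  # point estimate for mu
  s      <- sd(y)                                    # point estimate for sigma
  t_crit <- qt(1 - (1 - confidence) / 2, df = n - 1) # critical value from t(n - 1)
  margin <- t_crit * s / sqrt(n)                     # margin of error
  list(mean = ybar, sd = s,
       ci = c(lower = ybar - margin, upper = ybar + margin))
}

# Hypothetical toy sample (not the wing_flap data)
y   <- c(48, 55, 51, 60, 47, 53, 58, 50)
out <- one_mean_summary(y, confidence = 0.95)
out$ci
t.test(y, conf.level = 0.95)$conf.int   # base R gives the same interval
```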
Confidence Intervals: One-Sample Mean
In the skies above Cloudsdale, Pegasus trainers believe that an average healthy Pegasus flaps its wings 50 times per minute when cruising. To see if today’s young Pegasi conform to that standard, a researcher samples 25 Pegasi at the Cloudsdale Training Grounds and measures each pony’s wing‐flap rate (in flaps/minute).
A sample of our dataset:
Confidence Intervals: One-Sample Mean
Let’s find a 95% confidence interval for wing-flap rates.
wing_flap %>% one_mean_CI(wing_flap_rate)
Thus, the 95% CI for \mu is (49.13, 55.46).
Statistical Inference: Confidence Intervals
We have learned that confidence intervals give a plausible range for an unknown population parameter at a chosen confidence level.
e.g., 95% CI for \mu; 99% CI for \pi
What if we want to directly answer questions?
e.g., is the average wing-flap rate still 50 flaps/min?
We can use confidence intervals to answer these questions!
We will compare the interval to the question.
Recall that the 95% CI for mean wing-flap rate was (49.13, 55.46). Has the standard rate of 50 flaps/min changed?
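Comparing the interval to the question comes down to checking whether the hypothesized value lies inside it. For t-based procedures, a value inside a 95% CI corresponds to failing to reject a two-sided test of that value at \alpha = 0.05. The following is a short base-R sketch using the reported interval.

```r
# 95% CI for mean wing-flap rate (flaps/min) and the historical standard
ci  <- c(49.13, 55.46)
mu0 <- 50

# If mu0 lies inside the interval, 50 is a plausible value for mu,
# so the data give no evidence that the standard has changed.
inside <- ci[1] <= mu0 && mu0 <= ci[2]
inside   # TRUE
```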
Statistical Inference: Hypothesis Testing
We can also answer research questions more formally using hypothesis testing.
All hypothesis tests have the same components:
Hypotheses
Test Statistic
p-Value
Rejection Region
Conclusion
Interpretation
This process uses probability to make a determination, rather than looking at the interval estimate.
Hypothesis Testing
Hypothesis testing has several key components.
Hypotheses:
Null hypothesis (H_0): A statement of “no change, no effect, or no difference from what is expected.”
Alternative hypothesis (H_1 or H_{\text{A}}): What we are investigating; this represents a change, effect, or difference.
Test Statistic and p-Value:
Test statistic: A single number calculated from the sample, measuring how far the observed data are from what is expected under the null.
p-value: The probability of observing data as extreme as ours (or more extreme), assuming the null is true.
Rejection Region:
We will always use the same rejection region: p < \alpha.
Conclusion and Interpretation:
Conclusion: reject or fail to reject the null based on the calculated p-value and rejection region.
Interpretation: Give context to your results. Interpret in terms of the alternative hypothesis.
Hypothesis Testing
One sample tests:
Two-tailed test
H_0: parameter = some value
H_1: parameter \ne some value
Left-tailed test
H_0: parameter \ge some value
H_1: parameter < some value
Right-tailed test
H_0: parameter \le some value
H_1: parameter > some value
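The direction of H_1 determines which tail(s) of the t distribution make up the p-value. The helper `t_p_value()` below is a hypothetical base-R sketch, checked against `t.test()` on a made-up sample (not course data).

```r
# Hypothetical helper: p-value for an observed t statistic, by test direction.
t_p_value <- function(t0, df, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  t0 <- as.numeric(t0)   # drop names such as "t" from t.test() output
  df <- as.numeric(df)   # drop names such as "df"
  switch(alternative,
    two.sided = 2 * pt(-abs(t0), df),           # both tails: H1: parameter != value
    less      = pt(t0, df),                     # left tail:  H1: parameter < value
    greater   = pt(t0, df, lower.tail = FALSE)  # right tail: H1: parameter > value
  )
}

# Compare with base R's t.test() on a hypothetical toy sample
y <- c(48, 55, 51, 60, 47, 53, 58, 50)
for (alt in c("two.sided", "less", "greater")) {
  tt <- t.test(y, mu = 50, alternative = alt)
  cat(alt, ":", t_p_value(tt$statistic, tt$parameter, alt), "vs", tt$p.value, "\n")
}
```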
Hypothesis Testing
After stating our hypotheses, we will construct a test statistic.
The choice of test statistic depends on:
The hypotheses being tested.
Assumptions made about the data.
The value of the test statistic depends on the sample data.
If we were to draw a different sample, we would find a different value for the test statistic.
We will use the test statistic on our way to drawing conclusions about the hypotheses.
Hypothesis Testing
After constructing test statistics, we will find the corresponding p-value.
p-value: the probability of observing what we’ve observed or something more extreme, assuming the null hypothesis is true.
Finding a p-value depends on the distribution being used.
One-sample mean: t distribution.
One-sample proportion: z distribution.
We will compare the p-value to \alpha in order to draw conclusions.
Reject H_0 if p < \alpha.
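To see why the reference distribution matters, the base-R sketch below evaluates the same statistic, t_0 = 1.497 from the one-sample wing-flap test (df = 24), under both distributions. The t distribution has heavier tails than the standard normal, so it gives a larger p-value.

```r
# Same test statistic, two reference distributions
t0 <- 1.497   # one-sample t statistic from the wing-flap example, df = 24

p_t <- 2 * pt(-abs(t0), df = 24)   # t(24): used for a one-sample mean
p_z <- 2 * pnorm(-abs(t0))         # standard normal: the z reference distribution

c(t = p_t, z = p_z)
p_t < 0.10   # FALSE: with alpha = 0.10 we fail to reject H_0
```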
Hypothesis Testing
Once we’ve found the p-value, we can draw a conclusion.
If p < \alpha, we reject H_0.
There is sufficient evidence to suggest that H_1 is true.
If p \ge \alpha, we fail to reject H_0.
There is not sufficient evidence to suggest that H_1 is true.
Hypothesis Testing
For all hypothesis tests,
Rejection Region: Reject H_0 if p < \alpha.
Conclusion: [Reject or fail to reject] H_0.
Interpretation: There [is or is not] sufficient evidence to suggest [alternative hypothesis in words].
We NEVER accept H_0.
Practical vs. Statistical Significance
Hypothesis testing depends on sample size.
As the sample size increases, the standard error shrinks, so the same (nonzero) observed difference produces a smaller p-value.
As p-values decrease, we are more likely to reject the null hypothesis.
This means that we may be rejecting because of the sample size and not because of the size of the effect!
We must ask ourselves whether the value we are testing against (the size of the effect) makes practical sense. For example, compare:
A new weight loss medication with an average weight loss of 1 lb over 6 months.
A new weight loss medication with an average weight loss of 15 lb over 6 months.
A new teaching method that raised final exam scores by 2 points.
A new teaching method that raised final exam scores by 15 points.
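A quick base-R illustration of the sample-size effect: the observed mean difference (1 unit) and sample SD (10) below are hypothetical and held fixed while only n changes. The effect stays small, yet the p-value keeps falling.

```r
# Hypothetical scenario: a fixed observed mean difference of 1 unit with
# sample SD of 10, tested against H0: mu_diff = 0 at several sample sizes.
diff_obs <- 1
s        <- 10
n        <- c(25, 100, 400, 2500)

t0 <- diff_obs / (s / sqrt(n))       # standard error shrinks as n grows
p  <- 2 * pt(-abs(t0), df = n - 1)   # two-sided p-values

data.frame(n = n, t0 = t0, p_value = signif(p, 3))
```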
One-sample t-test for the population mean:
Null: H0: μ = 50
Alternative: H1: μ ≠ 50
Test statistic: t(24) = 1.497
p-value: p = 0.147
Conclusion: Fail to reject the null hypothesis (p = 0.1473 ≥ α = 0.1)
Hypothesis Testing: One Sample Mean
Hypotheses:
H_0: \ \mu = 50
H_1: \ \mu \ne 50
Test Statistic and p-Value
t_0 = 1.497, p = 0.147
Rejection Region
Reject H_0 if p < \alpha; \alpha = 0.10
Conclusion and interpretation
Fail to reject H_0 (p \text{ vs } \alpha \to 0.147 > 0.10). There is not sufficient evidence to suggest that the average wing-flap rate has changed from the historical value of 50 flaps/min.
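The test statistic connects back to the interval. From the reported 95% CI (49.13, 55.46) with n = 25, we can recover the point estimate and standard error, and from those t_0 = (\bar{y} - 50)/\text{SE}. The base-R sketch below does this. The CI bounds are rounded, so the result is approximate.

```r
# Reported 95% CI for mu (flaps/min) and the hypothesized value
ci  <- c(49.13, 55.46)
mu0 <- 50
n   <- 25

ybar   <- mean(ci)                         # point estimate (midpoint): 52.295
margin <- diff(ci) / 2                     # margin of error: 3.165
se     <- margin / qt(0.975, df = n - 1)   # margin = t* x SE, so SE = margin / t*
t0     <- (ybar - mu0) / se                # one-sample t statistic

round(t0, 3)   # 1.497
```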
Independent Data
Independent data: Observations in one group (or sample) do not influence or relate to observations in another group.
Examples:
Comparing the cruising speeds of a random sample of Pegasi vs. a random sample of Unicorns flying a short course.
Measuring friendship lesson quiz scores for a group of Cutie Mark Crusaders vs. a group of Wonderbolts Cadets.
Comparing the graduation rates of Unicorns and Pegasi.
Dependent Data
Dependent (paired) data: Each observation in the first sample is paired with exactly one observation in the second sample.
Examples:
Students’ magic‐proficiency scores before and after Princess Celestia’s advanced spell workshop.
Applejack’s apple‐yield (in bushels) from Sweet Apple Acres in spring vs. fall for the last 10 years.
Comparing the “Wonderbolts Tryouts” performance scores for Spitfire and Skyflare (twins).
Independent vs. Dependent Data
Are the following dependent or independent?
Rainbow Dash times two separate groups, Pegasi trainees and Unicorn cadets, on the same 200-meter aerial course.
Twilight Sparkle measures her own spell‐casting accuracy before and after attending Princess Celestia’s advanced magic workshop.
Applejack records bushel counts from Sweet Apple Acres in spring this year and compares them to bushel counts from Sugarcube’s orchard over the same period.
The Cutie Mark Crusaders each take a friendship-lesson quiz, and their scores are compared to a completely different group of ponies at the School of Friendship.
Fluttershy records the heart rates of the same group of critters before and after she plays soothing music for them.
Confidence Intervals: Two Independent Means
(1-\alpha)100\% confidence interval for \mu_1-\mu_2:
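The interval formula itself does not appear in this copy. One standard form assumes equal population variances. That matches the equal-variance two-sample t-test reported for these data, whose 23 degrees of freedom equal n_1 + n_2 - 2 for the 25 sampled Pegasi. The pooled interval is shown below. A Welch (unequal-variance) interval uses a different standard error and degrees of freedom.

```latex
(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2,\, n_1 + n_2 - 2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
```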
The Pegasus trainers insist that a healthy pony munches through 25 apples per day to stay strong and energetic. To look for differences between Pegasi above and below the target wing-flap rate, a researcher visits the apple stands at Sweet Apple Acres and records the exact number of apples each Pegasus in training eats in a typical day.
Use the wing-flap data to estimate the difference in apple consumption (apples) between those above and those below the target rate (target). Estimate using a 95% confidence interval.
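The following base-R sketch computes the pooled (equal-variance) interval by hand and checks it against `t.test(var.equal = TRUE)`. The helper `pooled_diff_ci()` and the apple counts are hypothetical, not the wing_flap data or ssstats' code.

```r
# Hypothetical helper: pooled (equal-variance) CI for mu_1 - mu_2.
pooled_diff_ci <- function(y1, y2, confidence = 0.95) {
  n1 <- length(y1); n2 <- length(y2)
  df <- n1 + n2 - 2
  sp <- sqrt(((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / df)  # pooled SD
  se <- sp * sqrt(1 / n1 + 1 / n2)                            # SE of ybar1 - ybar2
  t_crit <- qt(1 - (1 - confidence) / 2, df)
  est <- mean(y1) - mean(y2)                                  # point estimate
  c(lower = est - t_crit * se, upper = est + t_crit * se)
}

# Hypothetical apple counts (not the wing_flap data)
above <- c(31, 29, 34, 30, 33, 32)
below <- c(24, 22, 26, 25, 23)

pooled_diff_ci(above, below)
t.test(above, below, var.equal = TRUE, conf.level = 0.95)$conf.int  # same interval
```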
We will use the independent_mean_HT function from library(ssstats) to perform the necessary calculations for the hypothesis test.
Generic syntax:
dataset_name %>% independent_mean_HT(grouping = grouping_variable, continuous = continuous_variable, mu = hypothesized_value, alternative = "alternative_direction", alpha = specified_alpha)
For the entered variable (continuous), we will see:
Hypotheses (based on hypothesized_value and alternative)
Test statistic and p-value
Conclusion
Note! When looking at the grouping variable, R will subtract in alphabetic/numeric order of the groups, i.e., \mu_1 belongs to the group that comes first.
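Base R shows where this ordering comes from. Character groups are turned into factor levels sorted alphabetically, and the two-sample formula interface of `t.test()` subtracts in that level order. The sketch below assumes, as the note states, that ssstats follows the same ordering. The data frame `toy` is hypothetical.

```r
# Hypothetical toy data: apple counts by target group (not the wing_flap data)
toy <- data.frame(
  target = c("below", "above", "below", "above", "above", "below"),
  apples = c(24, 31, 26, 33, 30, 22)
)

# Character groups become factor levels sorted alphabetically: "above" first
levels(factor(toy$target))

# With the formula interface, base R's t.test() uses that level order,
# so the estimated difference is mean(above) - mean(below)
tt <- t.test(apples ~ target, data = toy, var.equal = TRUE)
tt$estimate                               # group means, "above" listed first
unname(tt$estimate[1] - tt$estimate[2])   # above minus below
```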
Hypothesis Testing: Two Independent Means
Perform the appropriate hypothesis test to determine if, on average, above-target Pegasi eat more than 5 more apples per day than below-target Pegasi. Test at the \alpha=0.05 level.
What is the direction of the test? How do you know?
What is the hypothesized value? How do you know?
What are the corresponding hypotheses?
Hypothesis Testing: Two Independent Means
How should we change the generic independent_mean_HT() syntax to carry out this test?
Hypothesis Testing: Two Independent Means
Our updated code should look like:
wing_flap %>% independent_mean_HT(grouping = target, continuous = apples, mu = 5, alternative = "greater", alpha = 0.05)
Hypothesis Testing: Two Independent Means
Running the code,
wing_flap %>% independent_mean_HT(grouping = target, continuous = apples, mu = 5, alternative = "greater", alpha = 0.05)
Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ = 5
Alternative: H₁: μ₁ − μ₂ > 5
Test statistic: t(23) = 5.445
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p = < 0.001 < α = 0.05)
Reject H_0 (p \text{ vs } \alpha \to p < 0.001 < 0.05). There is sufficient evidence to suggest that, on average, Pegasi above target eat more than 5 more apples per day than those below target.
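Because the alternative is "greater", the p-value is the upper-tail area of t(23) beyond the observed statistic. Here is a quick base-R check that uses only the reported t(23) = 5.445.

```r
# Right-tailed test (alternative = "greater"): p-value is the upper-tail area
t0  <- 5.445   # reported test statistic
dof <- 23      # reported degrees of freedom

p <- pt(t0, df = dof, lower.tail = FALSE)
p < 0.001      # TRUE, matching the reported "p < 0.001"
p < 0.05       # TRUE, so reject H_0 at alpha = 0.05
```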
Two Dependent Means: Summary Statistics
We are now interested in comparing two dependent groups.
Because the two measurements in each pair come from the same subject (or from matched subjects), we examine the difference within each pair,
d_i = y_{i,1} - y_{i,2}
After drawing samples, we have the following,
\bar{d} estimates \mu_d,
s^2_d estimates \sigma^2_d, and
n is the sample size.
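In base R, these summaries come straight from the vector of differences. A paired t-test is then just a one-sample t-test on those differences. The paired measurements below are hypothetical, and the sketch does not use ssstats.

```r
# Hypothetical paired measurements on the same five units (not course data)
before <- c(12.1, 11.4, 13.0, 12.6, 11.9)
after  <- c(12.8, 11.9, 13.1, 13.4, 12.2)

d     <- after - before   # within-pair differences d_i
d_bar <- mean(d)          # estimates mu_d
s2_d  <- var(d)           # estimates sigma_d^2
n     <- length(d)        # number of pairs

# A paired t-test is a one-sample t-test on the differences
paired  <- t.test(after, before, paired = TRUE)
one_smp <- t.test(d, mu = 0)
c(paired = unname(paired$statistic), one_sample = unname(one_smp$statistic))
```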
Two Dependent Means: Summary Statistics (R)
We will use the dependent_mean_median function from library(ssstats) to find the summary statistics for this data.
Note that this will compute summary statistics for:
x_d = x_1-x_2
x_1
x_2
Two Dependent Means: Summary Statistics
Princess Celestia has invited two groups of flyers to take part in a brand-new “SkyStride” aerial training camp. Before the camp begins, each pony perches on a floating platform while a team of Wonderbolt engineers uses magical sensors to record each pony’s baseline wing-flap rate (flaps per second) as they hover in place (pre_training_wfr).
Over two weeks, trainees attend identical flight drills: precision loops, cloud-weaving obstacle courses, and high-altitude sprints. At camp’s end, each flyer returns to the sensor platforms for post-training measurements (post_training_wfr).
Let’s find the summary statistics. How should this code be edited?
The point estimate for the mean difference is x̄ = -1.712.
The point estimate for the standard deviation of differences is s = 9.9701.
The 99% confidence interval for the mean difference μ_d is (-7.2891, 3.8651).
The 99% confidence interval for \mu_d is (-7.29, 3.87).
The point estimate for the mean difference is x̄ = 1.712.
The point estimate for the standard deviation of differences is s = 9.9701.
The 99% confidence interval for the mean difference μ_d is (-3.8651, 7.2891).
The 99% confidence interval for \mu_d is (-3.87, 7.29).
Confidence Intervals: Two Dependent Means
When looking at post - pre, the CI was (-7.29, 3.87).
When looking at pre - post, the CI was (-3.87, 7.29).
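The reported summary statistics for post - pre reproduce the 99% interval in the output. Reversing the subtraction negates every difference, which negates the interval and swaps its endpoints. Here is a base-R check using \bar{d} = -1.712, s_d = 9.9701, and n = 25.

```r
# Summary statistics reported for d = post - pre
d_bar <- -1.712
s_d   <- 9.9701
n     <- 25

t_crit <- qt(0.995, df = n - 1)   # 99% two-sided critical value
ci_post_minus_pre <- d_bar + c(-1, 1) * t_crit * s_d / sqrt(n)

# Reversing the subtraction negates every difference, so the interval
# is negated and its endpoints swap places
ci_pre_minus_post <- -rev(ci_post_minus_pre)

round(ci_post_minus_pre, 2)   # -7.29  3.87
round(ci_pre_minus_post, 2)   # -3.87  7.29
```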
Paired t-test for the mean of differences:
Null: H₀: μ_d = 0
Alternative: H₁: μ_d ≠ 0
Test statistic: t(24) = -0.859
p-value: p = 0.399
Conclusion: Fail to reject the null hypothesis (p = 0.3991 ≥ α = 0.01)
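The paired test statistic is the one-sample t statistic applied to the differences, t_0 = \bar{d} / (s_d/\sqrt{n}). This base-R sketch recomputes it from the reported summaries for post - pre.

```r
# Paired t statistic from the reported summary statistics (d = post - pre)
d_bar <- -1.712
s_d   <- 9.9701
n     <- 25

t0 <- d_bar / (s_d / sqrt(n))        # H0: mu_d = 0
p  <- 2 * pt(-abs(t0), df = n - 1)   # two-sided p-value

round(c(t0 = t0, p = p), 3)
p >= 0.01                            # TRUE: fail to reject at alpha = 0.01
```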