Relationships between continuous variables.
Correlation:
Linear regression:
During a puzzle-heavy session in Ponyville, Twilight wants to understand what helps small teams solve logic puzzles faster. For each puzzle, she records:
Our task is to determine how MindPractice, SleepHours, FocusMinutes, and PuzzleComplexity relate to SolveTime.
$$
\hat{\text{solve time}} = 37.54 - 4.12 \text{ practice} - 1.82 \text{ sleep} - 0.02 \text{ focus} + 3.10 \text{ complexity}
$$
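As a worked illustration of how the fitted equation produces a prediction, suppose a puzzle is attempted with 3 units of mind practice, 7 hours of sleep, 10 focus minutes, and a complexity of 6. These input values are hypothetical and chosen only because they sit near the middle of each variable's range.

$$
\begin{aligned}
\hat{\text{solve time}} &= 37.54 - 4.12(3) - 1.82(7) - 0.02(10) + 3.10(6) \\
&= 37.54 - 12.36 - 12.74 - 0.20 + 18.60 \\
&= 30.84
\end{aligned}
$$

Under these assumed inputs, the model predicts a solve time of about 30.8 minutes.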
| Predictor | Estimate (95% CI) | p-value |
|---|---|---|
| Mind Practice | -4.12 (-5.02, -3.22) | <0.001 |
| Sleep Hours | -1.82 (-2.58, -1.06) | <0.001 |
| Focus Minutes | -0.02 (-0.21, 0.17) | 0.857 |
| Puzzle Complexity | 3.10 (2.63, 3.57) | <0.001 |
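A table like the one above can be assembled from a fitted model object. The following is a minimal sketch, not the course's own code: the original `puzzles` data are not shown here, so it simulates a stand-in data frame (`puzzles_sim`, a hypothetical name) from the reported equation, with an assumed sample size and noise level. The estimates it prints will therefore not match the table exactly.

```r
# Simulated stand-in for the `puzzles` data (hypothetical, not the course data)
set.seed(1)
n <- 100  # assumed sample size
puzzles_sim <- data.frame(
  MindPractice     = rnorm(n, mean = 2.9,  sd = 1.0),
  SleepHours       = rnorm(n, mean = 7.1,  sd = 1.2),
  FocusMinutes     = rnorm(n, mean = 10.6, sd = 4.7),
  PuzzleComplexity = rnorm(n, mean = 6.0,  sd = 1.9)
)

# Outcome generated from the reported equation plus noise (noise sd is an assumption)
noise <- rnorm(n, sd = 5)
puzzles_sim$SolveTime <- 37.54 -
  4.12 * puzzles_sim$MindPractice -
  1.82 * puzzles_sim$SleepHours -
  0.02 * puzzles_sim$FocusMinutes +
  3.10 * puzzles_sim$PuzzleComplexity + noise

# Fit the multiple regression model
fit <- lm(SolveTime ~ MindPractice + SleepHours + FocusMinutes + PuzzleComplexity,
          data = puzzles_sim)

est <- coef(fit)                           # point estimates
ci  <- confint(fit, level = 0.95)          # 95% confidence intervals
p   <- coef(summary(fit))[, "Pr(>|t|)"]    # p-values for each coefficient

# Format as "estimate (lower, upper)" with a p-value column
results <- data.frame(
  Predictor = names(est),
  Estimate  = sprintf("%.2f (%.2f, %.2f)", est, ci[, 1], ci[, 2]),
  p_value   = ifelse(p < 0.001, "<0.001", sprintf("%.3f", p)),
  row.names = NULL
)
print(results[-1, ])  # drop the intercept row to mirror the table
```

Each slope is read as the expected change in solve time for a one-unit increase in that predictor, holding the other predictors constant.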
Likelihood Ratio Test for Significant Regression Line:
Null: H₀: β₁ = β₂ = ... = βₖ = 0
Alternative: H₁: At least one βᵢ ≠ 0
Test statistic: χ²(4) = 8661.511
p-value: p < 0.001
Conclusion: Reject the null hypothesis (p < 0.001 < α = 0.05). There is sufficient evidence that at least one slope differs from zero.
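The idea behind this test can be sketched by comparing the full model to an intercept-only model. The example below is an assumption-laden illustration on simulated data (the variable names `x1` to `x4` and `y` are hypothetical), so its statistic will not reproduce the value reported above, and the course may have computed its statistic with a different function or on a different scale.

```r
# Simulated data with four predictors (hypothetical, for illustration only)
set.seed(2)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 30 - 4 * d$x1 - 2 * d$x2 + 3 * d$x4 + rnorm(n, sd = 5)

# Full model (all slopes free) versus reduced model (all slopes fixed at zero)
full    <- lm(y ~ x1 + x2 + x3 + x4, data = d)
reduced <- lm(y ~ 1, data = d)

# Likelihood ratio statistic: twice the gain in log-likelihood
lrt_stat <- 2 * (as.numeric(logLik(full)) - as.numeric(logLik(reduced)))

# Degrees of freedom: number of slopes set to zero under the null
df <- length(coef(full)) - length(coef(reduced))

# Upper-tail chi-square p-value
p_value <- pchisq(lrt_stat, df = df, lower.tail = FALSE)

cat("LRT statistic:", round(lrt_stat, 3), "on", df, "df; p-value:", format.pval(p_value), "\n")
```

The degrees of freedom equal 4 here because the null hypothesis sets four slopes to zero, which matches the χ²(4) reported above.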
Let’s create a visualization of the model.
| Variable | Mean (SD) | Median (IQR) |
|---|---|---|
| FocusMinutes | 10.6 (4.7) | 10.7 (6.8) |
| MindPractice | 2.9 (1.0) | 2.8 (1.4) |
| PuzzleComplexity | 6.0 (1.9) | 5.9 (3.0) |
| SleepHours | 7.1 (1.2) | 7.1 (1.6) |
| SolveTime | 30.9 (9.4) | 31.1 (12.5) |
$$
\hat{\text{solve time}} = 37.54 - 4.12 \text{ practice} - 1.82 \text{ sleep} - 0.02 \text{ focus} + 3.10 \text{ complexity}
$$
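The plotting code that follows draws a line through a `predicted` column, but how that column is built is not shown here. One common approach, assumed in this sketch rather than confirmed by the source, is to let the x-axis variable (`SleepHours`) vary while holding the other predictors at their sample means. The data frame `puzzles_toy` and its values are hypothetical stand-ins for `puzzles`.

```r
library(dplyr)

# Toy rows standing in for `puzzles` (hypothetical values)
puzzles_toy <- data.frame(
  MindPractice     = c(2.0, 3.5, 2.8, 4.1, 1.9),
  SleepHours       = c(5.5, 6.5, 7.0, 8.0, 9.0),
  FocusMinutes     = c(8.0, 12.5, 10.0, 15.0, 6.5),
  PuzzleComplexity = c(4.0, 7.5, 6.0, 5.0, 8.0),
  SolveTime        = c(33.1, 36.4, 31.0, 20.2, 38.7)
)

# Plug into the fitted equation: SleepHours varies, all other predictors
# are held at their sample means (an assumption about how `predicted` is defined)
puzzles_toy <- puzzles_toy %>%
  mutate(predicted = 37.54 -
           4.12 * mean(MindPractice) -
           1.82 * SleepHours -
           0.02 * mean(FocusMinutes) +
           3.10 * mean(PuzzleComplexity))

print(puzzles_toy[, c("SleepHours", "SolveTime", "predicted")])
```

Because only `SleepHours` changes from row to row, the predicted values fall on a straight line with slope -1.82, which is what lets `geom_line()` draw a single regression line over the scatterplot.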
```r
library(tidyverse) # provides %>% and ggplot2

# `predicted` must already be a column of `puzzles`
puzzles %>% ggplot(aes(x = SleepHours, y = SolveTime)) + # specify x and y
  geom_point(size = 2.5, color = "gray50") + # plot points
  geom_line(aes(y = predicted), linewidth = 1, color = "black") + # plot line
  labs(x = "Sleep Duration (hours)", # edit x-axis label
       y = "Time to Complete Puzzle (min)") + # edit y-axis label
  theme_bw() # change theme of graph
```

This module covers the basics of linear regression.
As a reminder, this is just the beginning. There are many more advanced topics to explore, including:
STA4231 - Statistics for Data Science II dives deeper into regression topics.
STA4173 - Biostatistics - Fall 2025