In this course, we will review formulas, but we will use R for computational purposes.
You can install R and RStudio if you wish; both are free.
We also have access to the Posit Workbench (“the server”) through HMCSE.
I know that this is probably the first time you are seeing R (or any sort of programming).
<!-- comment here -->
# comment here
I expect students to try their best. This includes:
You must know how to answer questions using R.
You will not be expected to write code beyond what is shown in class.
When grading, I am looking for competency.
In base R, I can use mean() and sd().
In library(tidyverse), I can use summarize(mean(), sd()).
We will use library(tidyverse) because of %>% (the pipe operator).
library(tidyverse) is a collection of R packages designed for data science.
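To make the comparison concrete, here is a minimal sketch of both approaches. The `ponies` data frame and its values are invented for illustration; they are not the course dataset. The sketch loads library(dplyr) directly, which is one of the packages that library(tidyverse) loads.

```r
library(dplyr)  # also loaded by library(tidyverse); provides %>% and summarize()

# Hypothetical data, invented for illustration only
ponies <- data.frame(
  type = c("Earth", "Earth", "Unicorn", "Unicorn"),
  friendship = c(6, 8, 7, 9)
)

# Base R: one function call per statistic
mean(ponies$friendship)
sd(ponies$friendship)

# tidyverse: pipe the data into summarize() and get both statistics at once
friendship_summary <- ponies %>%
  summarize(mean = mean(friendship), sd = sd(friendship))
friendship_summary
```

The pipe reads left to right: take `ponies`, then summarize it. Both approaches give the same numbers; the piped version returns them together in one table.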
library(tidyverse) packages we will use:
library(readr): read in data files
library(dplyr): manipulate and summarize data
library(ggplot2): create data visualizations
library(tidyverse) website: https://www.tidyverse.org/
library(ssstats) is the package I have developed for this course.
It is tidyverse friendly (ready for %>%).
We will use mean_median() from library(ssstats) to summarize continuous variables.
We will use group_by() from library(tidyverse) to split the summaries by categories.
Using mean_median() to summarize the MLP dataset.
Using mean_median() to summarize the MLP dataset by pony type:
# A tibble: 12 × 4
   type    variable       mean_sd      median_iqr
   <chr>   <chr>          <chr>        <chr>
 1 Alicorn friendship     7.8 (1.3)    8.0 (2.0)
 2 Earth   friendship     7.6 (1.6)    8.0 (2.0)
 3 Pegasus friendship     7.6 (1.6)    8.0 (2.0)
 4 Unicorn friendship     7.5 (1.6)    8.0 (2.0)
 5 Alicorn magical_energy 9.0 (8.0)    6.2 (10.9)
 6 Earth   magical_energy NaN (NA)     NA (NA)
 7 Pegasus magical_energy NaN (NA)     NA (NA)
 8 Unicorn magical_energy 9.9 (9.6)    7.0 (11.1)
 9 Alicorn tail_shimmer   280.1 (64.5) 297.0 (104.0)
10 Earth   tail_shimmer   252.2 (65.2) 246.0 (100.0)
11 Pegasus tail_shimmer   263.5 (67.0) 265.0 (110.0)
12 Unicorn tail_shimmer   261.2 (65.0) 260.0 (100.0)
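For anyone curious about what a grouped "mean (sd)" and "median (IQR)" summary involves, the sketch below builds a similar table using only group_by() and summarize() from library(dplyr). It is not the code behind mean_median(); it is an approximation under the assumption that the output is formatted to one decimal place. The `ponies` data frame is invented for illustration.

```r
library(dplyr)  # also loaded by library(tidyverse)

# Hypothetical data, invented for illustration only
ponies <- data.frame(
  type = c("Earth", "Earth", "Earth", "Unicorn", "Unicorn", "Unicorn"),
  friendship = c(6, 8, 10, 7, 8, 9)
)

# Split by pony type, then compute and format the summaries within each group
by_type <- ponies %>%
  group_by(type) %>%
  summarize(
    mean_sd = sprintf("%.1f (%.1f)", mean(friendship), sd(friendship)),
    median_iqr = sprintf("%.1f (%.1f)", median(friendship), IQR(friendship))
  )
by_type
```

The result has one row per pony type. Removing the group_by() line gives a single overall summary instead.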
We will use n_pct() from library(ssstats) to summarize categorical variables.
For one variable, this returns $n_i \ (\%_i)$, the count and percentage for each category.
Using n_pct() to summarize the MLP dataset:
# A tibble: 4 × 5
  friendship Alicorn  Earth     Pegasus   Unicorn
       <dbl> <chr>    <chr>     <chr>     <chr>
1          1 0 (0.0%) 0 (0.0%)  0 (0.0%)  1 (0.2%)
2          2 0 (0.0%) 5 (0.3%)  1 (0.2%)  1 (0.2%)
3          3 0 (0.0%) 14 (0.8%) 6 (1.2%)  13 (2.0%)
4          4 2 (4.9%) 54 (3.2%) 14 (2.9%) 16 (2.4%)
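The one-variable "n (%)" summary can be approximated with library(dplyr) alone. The sketch below is not the code behind n_pct(); it only illustrates the idea of counting each category and dividing by the total. The `ponies` data frame and the `n_pct` column name are invented for illustration.

```r
library(dplyr)  # also loaded by library(tidyverse)

# Hypothetical data, invented for illustration only
ponies <- data.frame(
  type = c("Earth", "Earth", "Earth", "Unicorn")
)

# Count each category, then attach the percentage of the total
type_counts <- ponies %>%
  count(type) %>%
  mutate(n_pct = sprintf("%d (%.1f%%)", n, 100 * n / sum(n)))
type_counts
```

Here the percentages are out of all ponies. In a two-variable table, the denominator would be the total within each column or row instead.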
ggplot()
We will construct data visualizations using library(ggplot2), which is loaded automatically when we load library(tidyverse).
This package allows us to create a layered visualization.
ggplot() creates the base layer.
geom_X() creates the individual pieces.
geom_point() creates a scatterplot.
geom_line() creates connected lines.
geom_bar() creates a bar chart.
geom_histogram() creates a histogram.
ggplot()
We use ggplot() because it is very flexible - it allows us to customize every part of the graph.
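The layering idea can be sketched in a few lines: ggplot() sets up the base layer, and each additional piece is added with +. The `ponies` data frame and the axis labels are invented for illustration; they are not the course dataset.

```r
library(ggplot2)  # also loaded by library(tidyverse)

# Hypothetical data, invented for illustration only
ponies <- data.frame(
  friendship = c(6, 8, 7, 9),
  tail_shimmer = c(250, 300, 260, 310)
)

# Base layer: which data to use and which variables go on which axis
p <- ggplot(ponies, aes(x = friendship, y = tail_shimmer)) +
  # Next layer: draw one point per row (a scatterplot)
  geom_point() +
  # Final layer: customize the axis labels
  labs(x = "Friendship score", y = "Tail shimmer")
p
```

Swapping geom_point() for a different geom_X() changes the type of graph without touching the base layer.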
The R Graphics Cookbook is a great place to get basic code for graphs.
Remember that I do not expect you to memorize code. I do not have the code memorized.
ggplot() Layer
ggplot() creates the initial layer of the graph lasagna.
aes() in ggplot().
STA4173 - Biostatistics - Fall 2025