STA4173: Biostatistics
Spring 2025
Continuous Variables
A continuous variable is a variable that can take on an infinite set of possible values.
Between any two possible values, there are infinitely many other possible values.
These typically arise from measurement (height, weight, etc.).
Discrete Variables
A discrete variable is a variable that can only take on a finite or countable set of possible values.
The possible values can usually be listed.
These typically arise from categorizing (work vs. home) or counting.
Ratio Variables
A ratio variable is a variable that has a meaningful zero point, allowing comparisons of magnitude.
A true zero point indicates the absence of the quantity being measured.
All arithmetic operations (addition, subtraction, multiplication, division) are meaningful.
Interval Variables
An interval variable has an arbitrary zero point and differences between values are meaningful.
The zero point does not indicate a true absence.
A 1-unit difference always represents the same amount, wherever it falls on the scale.
Ordinal Variables
An ordinal variable has a meaningful order of responses; the exact differences between responses are not necessarily equal.
We understand which value is “greater” or “less,” but not by how much.
Arithmetic is not meaningful.
Nominal Variables
A nominal variable has no intrinsic order among its categories.
Categories are used merely as labels or names.
No arithmetic or ordering operations are meaningful.
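As a rough sketch of how the four levels of measurement above might be stored in R: the variable names and values below are made up for illustration and are not from the course data.

```r
# Hypothetical data, for illustration only
weight_kg  <- c(61.2, 75.0, 82.4)   # ratio: true zero, so ratios are meaningful
temp_c     <- c(36.6, 37.1, 38.4)   # interval: zero is arbitrary, differences are meaningful
pain       <- factor(c("mild", "severe", "moderate"),
                     levels = c("mild", "moderate", "severe"),
                     ordered = TRUE) # ordinal: categories with an order
blood_type <- factor(c("A", "O", "AB"))  # nominal: labels only

weight_kg[2] / weight_kg[1]  # "how many times heavier" makes sense
temp_c[3] - temp_c[1]        # "how many degrees warmer" makes sense
pain[2] > pain[1]            # TRUE: "severe" ranks above "mild"
is.ordered(blood_type)       # FALSE: no order among the categories
```

Storing ordinal data as an ordered factor lets R compare categories without pretending the gaps between them are equal.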
Sample Mean
The sample mean provides a single number that can represent a “typical” or central value in your data.
\bar{x} = \frac{\sum_{i=1}^n x_i}{n}
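The formula can be checked by hand in R. This is a minimal sketch with a made-up sample; the vector x is an assumption for illustration.

```r
# Hypothetical sample, for illustration only
x <- c(4, 8, 15, 16, 23, 42)

n <- length(x)               # number of observations
xbar_by_hand <- sum(x) / n   # sum of the x_i divided by n
xbar_builtin <- mean(x)      # base R equivalent

xbar_by_hand   # 18
xbar_builtin   # 18
```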
Sample Median
The sample median is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger.
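A sketch of finding the midpoint by hand, using a made-up sample. It assumes the usual convention, also used by R's median(), of averaging the two middle values when n is even.

```r
# Hypothetical sample, for illustration only
x <- c(4, 8, 15, 16, 23, 42)

x_sorted <- sort(x)
n <- length(x_sorted)

if (n %% 2 == 1) {
  # odd n: the single middle observation
  med_by_hand <- x_sorted[(n + 1) / 2]
} else {
  # even n: average of the two middle observations
  med_by_hand <- mean(x_sorted[c(n / 2, n / 2 + 1)])
}

med_by_hand   # 15.5
median(x)     # 15.5
```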
Sample Variance
The sample variance measures how “widely spread” the data points are around the mean.
s^2 = \frac{\sum_{i=1}^n x_i^2 - \frac{(\sum_{i=1}^n x_i)^2}{n}}{n-1}
When we have a mound-shaped and symmetric distribution, most observations (roughly 95%) will fall within 2 standard deviations of the mean.
Variance is expressed in squared units (units²), which typically are hard to interpret.
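The computational formula for s^2 can be checked against the algebraically equivalent definitional form (the sum of squared deviations from the mean divided by n - 1) and against R's var(). The sample below is made up for illustration.

```r
# Hypothetical sample, for illustration only
x <- c(4, 8, 15, 16, 23, 42)
n <- length(x)

# Computational ("shortcut") formula from the slide
s2_shortcut <- (sum(x^2) - sum(x)^2 / n) / (n - 1)

# Definitional form: squared deviations from the mean
s2_definition <- sum((x - mean(x))^2) / (n - 1)

# Base R
s2_builtin <- var(x)

s2_shortcut     # 182
s2_definition   # 182
s2_builtin      # 182
```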
Sample Standard Deviation
The sample standard deviation also measures how “widely spread” the data points are around the mean.
s = \sqrt{s^2}
Standard deviation is the square root of the variance, measuring spread in the original units of the data.
R syntax:
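A minimal sketch of the base R calls, using a made-up sample. The second half simulates mound-shaped data to illustrate the "within 2 standard deviations" guideline; the seed, sample size, and distribution parameters are assumptions chosen for illustration.

```r
# Hypothetical sample, for illustration only
x <- c(4, 8, 15, 16, 23, 42)

s <- sd(x)                    # sample standard deviation
all.equal(s, sqrt(var(x)))    # TRUE: s is the square root of s^2

# Simulated mound-shaped, symmetric data (assumed parameters)
set.seed(4173)
z <- rnorm(1000, mean = 70, sd = 10)

# Proportion of observations within 2 standard deviations of the mean
within_2sd <- abs(z - mean(z)) <= 2 * sd(z)
prop_within <- mean(within_2sd)
prop_within   # expected to be close to 0.95 for data like these
```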
Sample Interquartile Range
The sample interquartile range measures the spread of the middle 50% of data.
\text{IQR} = P_{75}-P_{25}
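A sketch of the IQR in R with a made-up sample. Note that quantile() uses an interpolation rule by default (type 7), so percentiles computed by hand with a different textbook rule may differ slightly.

```r
# Hypothetical sample, for illustration only
x <- c(4, 8, 15, 16, 23, 42)

# 25th and 75th percentiles using R's default quantile rule
q <- quantile(x, probs = c(0.25, 0.75))

iqr_by_hand <- unname(q[2] - q[1])   # P75 - P25
iqr_builtin <- IQR(x)                # base R equivalent

iqr_by_hand
iqr_builtin
```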
When should we use the mean vs. the median to describe the center of the distribution?
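One consideration, sketched here with made-up numbers, is how each measure reacts to an extreme value. The sample and the added outlier are assumptions for illustration only.

```r
# Hypothetical sample, for illustration only
x <- c(4, 8, 15, 16, 23, 42)
x_outlier <- c(x, 400)   # same data plus one extreme value

mean(x)            # 18
median(x)          # 15.5

mean(x_outlier)    # pulled far upward by the single extreme value
median(x_outlier)  # 16: barely moves
```

In this sketch the mean shifts substantially while the median changes very little, which is why the shape of the distribution matters when choosing between them.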
… How do we know the shape of the distribution?
We will explore histograms.
R code: we use the ggplot2 package for graphing, starting from ggplot().
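A minimal sketch of a histogram with ggplot2, assuming the package is installed. The data frame, the variable name los_days, the seed, and the number of bins are all hypothetical choices for illustration.

```r
library(ggplot2)

# Simulated right-skewed data (hypothetical "length of stay" values)
set.seed(4173)
df <- data.frame(los_days = rexp(200, rate = 1 / 5))

# Histogram: the shape suggests whether mean or median is the better summary
p <- ggplot(df, aes(x = los_days)) +
  geom_histogram(bins = 20, color = "black", fill = "grey70") +
  labs(x = "Length of stay (days)", y = "Count")

print(p)
```

Changing bins alters how smooth or jagged the histogram looks, so it can be worth trying a few values before judging the shape.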
Today we reviewed estimation.
Next week, we will review statistical inference.
Get-to-know-you quiz: complete with RStudio.
Join the Discord server!