Recall the general linear model, y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k, where:
\beta_0 is the y-intercept, or the average outcome (y) when all x_i = 0.
\beta_i is the slope for predictor i and describes the relationship between the predictor and the outcome, after adjusting (or accounting) for the other predictors in the model.
In the last lecture, we used a linear model to explore the relationships between clubhouse happiness and laughs, grumbles, and time spent with friends.
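On a busy day at the clubhouse, Mickey Mouse wants to understand what drives “happiness” at the end of the day. For each day, he records (in the clubhouse dataset): time spent with friends (time_with_friends), big, goofy laughs (goofy_laughs), how much Donald grumbles (donald_grumbles), and clubhouse happiness (clubhouse_happiness).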
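library(tidyverse) # provides %>% and ggplot2
library(broom)     # provides tidy()
# Simple linear regression: happiness as a function of time spent with friends
m1 <- glm(clubhouse_happiness ~ time_with_friends,
          family = "gaussian",
          data = clubhouse)
m1 %>% tidy()
\hat{\text{happiness}} = 57.55 + 3.49 \text{ time}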
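As a quick check on how to read this fitted equation, the worked arithmetic below plugs in a hypothetical day with 10 minutes spent with friends. The value 10 is chosen only for illustration and is not taken from the dataset.

```latex
\hat{\text{happiness}} = 57.55 + 3.49(10) = 57.55 + 34.90 = 92.45
```

Read the same way, a day with 0 minutes has a predicted happiness of 57.55, and each additional minute raises the prediction by 3.49.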
We can visualize this simple linear regression model with a scatterplot and regression line.
To create the regression line, we need to create predicted values from our model.
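One way to create those predicted values is sketched below: pull the estimated coefficients out of the model and apply them to each row. The data here are simulated as a stand-in for the clubhouse dataset (an assumption so the example runs on its own), so the coefficients will not match the ones reported above. The column name predicted_happiness_k1 matches the name used in the plotting code later in this lecture.

```r
library(dplyr)

# Simulated stand-in for the clubhouse data (an assumption for illustration only)
set.seed(1)
n <- 50
clubhouse <- data.frame(time_with_friends = runif(n, min = 0, max = 10))
clubhouse$clubhouse_happiness <- 57 + 3.5 * clubhouse$time_with_friends +
  rnorm(n, sd = 4)

m1 <- glm(clubhouse_happiness ~ time_with_friends,
          family = "gaussian",
          data = clubhouse)

# Pull the estimated intercept and slope out of the fitted model
b <- coef(m1)

# Predicted value for each day: intercept + slope * time
clubhouse <- clubhouse %>%
  mutate(predicted_happiness_k1 = b[["(Intercept)"]] +
           b[["time_with_friends"]] * time_with_friends)

head(clubhouse)
```

Writing the prediction out by hand keeps the link to the fitted equation visible; for a Gaussian model this gives the same values as fitted(m1).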
library(ggplot2)
The ggplot() function initializes a ggplot object.
We must add geom_TYPE() layers to actually see anything on the plot.
We add geoms to the plot using the + operator, and we can stack multiple geom_TYPE()s.
Ooops! That geom_line() didn’t work as expected.
In ggplot(), we set y to be the actual happiness values (clubhouse_happiness), so geom_line() inherited that mapping and connected the observed points. The line needs its own y mapping to the predicted values.
Now, we can work on “prettying” up our plot.
We can change the overall look of the plot with theme_NAME().
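The geom_line() problem described above can be reproduced with a small sketch. It uses simulated data in place of the clubhouse dataset (an assumption), and the object names p_zigzag and p_fitted are hypothetical. The first plot lets geom_line() inherit y from ggplot(); the second overrides y inside geom_line().

```r
library(ggplot2)

# Simulated stand-in for the clubhouse data (an assumption for illustration only)
set.seed(1)
clubhouse <- data.frame(time_with_friends = runif(50, min = 0, max = 10))
clubhouse$clubhouse_happiness <- 57 + 3.5 * clubhouse$time_with_friends +
  rnorm(50, sd = 4)

m1 <- glm(clubhouse_happiness ~ time_with_friends,
          family = "gaussian",
          data = clubhouse)
clubhouse$predicted_happiness_k1 <- unname(fitted(m1))

# geom_line() inherits y = clubhouse_happiness, so it connects the observed points
p_zigzag <- ggplot(clubhouse, aes(x = time_with_friends,
                                  y = clubhouse_happiness)) +
  geom_point() +
  geom_line()

# Overriding y inside geom_line() draws the fitted regression line instead
p_fitted <- ggplot(clubhouse, aes(x = time_with_friends,
                                  y = clubhouse_happiness)) +
  geom_point() +
  geom_line(aes(y = predicted_happiness_k1))

p_fitted
```

Mappings given in ggplot() are defaults for every layer, so only the layer that should differ needs its own aes().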
clubhouse %>% ggplot(aes(x = time_with_friends,
                         y = clubhouse_happiness)) +
  geom_point() +
  geom_line(aes(y = predicted_happiness_k1)) +
  labs(x = "Time Spent with Friends (minutes)",
       y = "Clubhouse Happiness",
       title = "Predicted relationship between happiness and time spent with friends") +
  theme_bw()
Outside of aes(), we can specify colors, line types, point shapes, etc.
m2 <- glm(clubhouse_happiness ~ time_with_friends + goofy_laughs,
          family = "gaussian",
          data = clubhouse)
m2 %>% tidy()
\hat{\text{happiness}} = 39.25 + 3.06 \text{ time} + 0.66 \text{ laughs}
Now that there’s an additional predictor, we can’t easily visualize the model with a simple 2D scatterplot.
Instead, we will visualize the relationship between y (clubhouse happiness) and x_1 (one predictor) while holding x_2 (the other predictor) constant.
In our example,
We will visualize the relationship between clubhouse happiness and time spent with friends.
Time spent with friends will be on the x-axis and allowed to vary.
We will hold goofy laughs constant at some value.
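The steps above can be sketched as follows. Holding goofy laughs at a fixed value c collapses the two-predictor model to a single line in time, with intercept \beta_0 + \beta_2 c and slope \beta_1. The data are simulated in place of the clubhouse dataset (an assumption), the median is used as the held value (one possible choice), and the names laughs_held, predicted_happiness_k2, and p2 are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in for the clubhouse data (an assumption for illustration only)
set.seed(2)
n <- 60
clubhouse <- data.frame(time_with_friends = runif(n, min = 0, max = 10),
                        goofy_laughs = rpois(n, lambda = 20))
clubhouse$clubhouse_happiness <- 40 + 3 * clubhouse$time_with_friends +
  0.65 * clubhouse$goofy_laughs + rnorm(n, sd = 5)

m2 <- glm(clubhouse_happiness ~ time_with_friends + goofy_laughs,
          family = "gaussian",
          data = clubhouse)
b <- coef(m2)

# Value at which goofy laughs is held constant (median is one possible choice)
laughs_held <- median(clubhouse$goofy_laughs)

# Time varies row by row; goofy laughs is fixed at laughs_held for every row
clubhouse <- clubhouse %>%
  mutate(predicted_happiness_k2 = b[["(Intercept)"]] +
           b[["time_with_friends"]] * time_with_friends +
           b[["goofy_laughs"]] * laughs_held)

p2 <- clubhouse %>% ggplot(aes(x = time_with_friends,
                               y = clubhouse_happiness)) +
  geom_point() +
  geom_line(aes(y = predicted_happiness_k2)) +
  labs(x = "Time Spent with Friends (minutes)",
       y = "Clubhouse Happiness") +
  theme_bw()

p2
```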
The median() is a convenient value to hold the other predictors at when drafting initial graphs for collaborators.
For our third example, let’s return to our full model.
We looked at clubhouse happiness (clubhouse_happiness) as a function of time spent with friends (time_with_friends), big, goofy laughs (goofy_laughs), and how much Donald grumbles (donald_grumbles).
m3 <- glm(clubhouse_happiness ~ time_with_friends + goofy_laughs + donald_grumbles,
          family = "gaussian",
          data = clubhouse)
m3 %>% tidy()
\hat{\text{happiness}} = 47.58 + 3.58 \text{ time} + 0.66 \text{ laughs} - 1.06 \text{ grumbles}
In this example, we have k = 3 predictors, so we again cannot visualize the full model with a simple 2D scatterplot.
Instead, we will visualize the relationship between y (clubhouse happiness) and x_1 (one predictor) while holding all other x_i (the other predictors) constant.
In our example,
We will visualize the relationship between clubhouse happiness and time spent with friends.
Time spent with friends will be on the x-axis and allowed to vary.
We will hold goofy laughs constant at some value.
We will also hold Donald grumbles constant at some value.
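With more predictors to hold constant, an alternative to writing the equation out by hand is to build a small data frame of prediction points and pass it to predict(). The sketch below does this with simulated data in place of the clubhouse dataset (an assumption) and holds both goofy laughs and Donald's grumbles at their medians (one possible choice). The names line_data, predicted_happiness, and p3 are hypothetical.

```r
library(ggplot2)

# Simulated stand-in for the clubhouse data (an assumption for illustration only)
set.seed(3)
n <- 80
clubhouse <- data.frame(time_with_friends = runif(n, min = 0, max = 10),
                        goofy_laughs = rpois(n, lambda = 20),
                        donald_grumbles = rpois(n, lambda = 5))
clubhouse$clubhouse_happiness <- 48 + 3.5 * clubhouse$time_with_friends +
  0.65 * clubhouse$goofy_laughs - 1 * clubhouse$donald_grumbles +
  rnorm(n, sd = 5)

m3 <- glm(clubhouse_happiness ~ time_with_friends + goofy_laughs + donald_grumbles,
          family = "gaussian",
          data = clubhouse)

# Time varies across its observed range; the other predictors are held at medians
line_data <- data.frame(
  time_with_friends = seq(min(clubhouse$time_with_friends),
                          max(clubhouse$time_with_friends),
                          length.out = 50),
  goofy_laughs = median(clubhouse$goofy_laughs),
  donald_grumbles = median(clubhouse$donald_grumbles)
)
line_data$predicted_happiness <- as.numeric(predict(m3, newdata = line_data))

# Points come from the observed data; the line comes from line_data
p3 <- ggplot(clubhouse, aes(x = time_with_friends,
                            y = clubhouse_happiness)) +
  geom_point() +
  geom_line(data = line_data, aes(y = predicted_happiness)) +
  labs(x = "Time Spent with Friends (minutes)",
       y = "Clubhouse Happiness") +
  theme_bw()

p3
```

Giving geom_line() its own data argument keeps the held-constant values out of the original dataset.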
For our fourth example, let’s again return to our full model.
We looked at clubhouse happiness (clubhouse_happiness) as a function of time spent with friends (time_with_friends), big, goofy laughs (goofy_laughs), and how much Donald grumbles (donald_grumbles).
m4 <- glm(clubhouse_happiness ~ time_with_friends + goofy_laughs + donald_grumbles,
          family = "gaussian",
          data = clubhouse)
m4 %>% tidy()
\hat{\text{happiness}} = 47.58 + 3.58 \text{ time} + 0.66 \text{ laughs} - 1.06 \text{ grumbles}
Let’s now consider the relationship between clubhouse happiness and Donald’s grumbles.
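To put Donald's grumbles on the x-axis, the roles swap: grumbles varies, while time spent with friends and goofy laughs are held constant. A minimal sketch follows, again using simulated data in place of the clubhouse dataset (an assumption) and medians as the held values (one possible choice). The names time_held, laughs_held, predicted_happiness_grumbles, and p4 are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in for the clubhouse data (an assumption for illustration only)
set.seed(4)
n <- 80
clubhouse <- data.frame(time_with_friends = runif(n, min = 0, max = 10),
                        goofy_laughs = rpois(n, lambda = 20),
                        donald_grumbles = rpois(n, lambda = 5))
clubhouse$clubhouse_happiness <- 48 + 3.5 * clubhouse$time_with_friends +
  0.65 * clubhouse$goofy_laughs - 1 * clubhouse$donald_grumbles +
  rnorm(n, sd = 5)

m4 <- glm(clubhouse_happiness ~ time_with_friends + goofy_laughs + donald_grumbles,
          family = "gaussian",
          data = clubhouse)
b <- coef(m4)

# Values at which the two non-plotted predictors are held constant
time_held <- median(clubhouse$time_with_friends)
laughs_held <- median(clubhouse$goofy_laughs)

# Grumbles varies row by row; time and laughs are fixed for every row
clubhouse <- clubhouse %>%
  mutate(predicted_happiness_grumbles = b[["(Intercept)"]] +
           b[["time_with_friends"]] * time_held +
           b[["goofy_laughs"]] * laughs_held +
           b[["donald_grumbles"]] * donald_grumbles)

p4 <- clubhouse %>% ggplot(aes(x = donald_grumbles,
                               y = clubhouse_happiness)) +
  geom_point() +
  geom_line(aes(y = predicted_happiness_grumbles)) +
  labs(x = "Donald's Grumbles",
       y = "Clubhouse Happiness") +
  theme_bw()

p4
```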
In this lecture, we explored how to visualize simple and multiple linear regression models using the ggplot2 library.
For simple linear regression, we visualized the relationship between the outcome and predictor using a scatterplot and regression line.
For multiple linear regression, we visualized the relationship between the outcome and one predictor while holding the other predictors constant.
Every week, we will review model visualization.
Next lecture: Model Assumptions