library(tidyverse)
pluto <- read_csv("https://raw.githubusercontent.com/samanthaseals/SDSII/refs/heads/main/files/data/lectures/W6_pluto.csv")
pluto %>% head()

In the previous weeks, we have built up what we understand about data visualization.
In this lecture, we will focus on visualizing Poisson and negative binomial regression models.
\ln(\hat{y}) = \hat{\beta}_0 + \hat{\beta}_1 x_1 + ... + \hat{\beta}_k x_k
The approach we will take will match the approach taken for gamma regression: compute predicted values on the log scale, then convert them back to the original scale of the outcome (using exp()).
Pluto spends his days at Mickey’s park chasing squirrels and interacting with guests. Disney researchers are interested in understanding what factors influence the number of squirrels Pluto chases per hour.
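For 300 observation periods, researchers recorded the number of squirrels chased along with characteristics of each period, including the temperature and the crowd level.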
\ln\left( y \right) = 1.32 + 0.02 \text{ temp} + 0.02 \text{ crowd}
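We want to construct a graph for this model. We will have the crowd level on the x-axis, the number of squirrels chased on the y-axis, and one line for each of three temperatures (50°F, 75°F, and 100°F).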
Now that the \ln() is present in the model, we need to be careful when creating predicted values.
\hat{y} = e^{\hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k}
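As a worked illustration of why the back-transformation matters, take the fitted squirrel model above and plug in temp = 75 and crowd = 10. These two values are chosen purely for illustration and are not taken from the data.

$$
\begin{aligned}
\ln(\hat{y}) &= 1.32 + 0.02(75) + 0.02(10) = 3.02 \\
\hat{y} &= e^{3.02} \approx 20.5
\end{aligned}
$$

The value 3.02 is on the log scale. The predicted count, about 20.5 squirrels chased per hour, only appears after exponentiating.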
\ln\left( y \right) = 1.32 + 0.02 \text{ temp} + 0.02 \text{ crowd}
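The graph code that follows uses columns named squirrels_50, squirrels_75, and squirrels_100. A minimal sketch of how such columns could be built is shown here, assuming they hold predictions from the model above at 50°F, 75°F, and 100°F. The small `pluto_demo` data frame is a hypothetical stand-in for `pluto`, so the example runs without downloading the data.

```r
library(dplyr)

# Hypothetical stand-in for `pluto`; only the `crowd` column is needed here.
pluto_demo <- tibble(crowd = c(0, 10, 20))

# Plug each fixed temperature into the fitted model (rounded coefficients
# from the equation above), then back-transform with exp() so the
# predictions are counts rather than log counts.
pluto_demo <- pluto_demo %>%
  mutate(squirrels_50  = exp(1.32 + 0.02*50  + 0.02*crowd),
         squirrels_75  = exp(1.32 + 0.02*75  + 0.02*crowd),
         squirrels_100 = exp(1.32 + 0.02*100 + 0.02*crowd))

pluto_demo
```

Applying the same mutate() call to pluto would add the three prediction columns alongside the observed data.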
Coding our graph as we have before, with the observed data as points (geom_point()) and the predicted values as lines (geom_line()):

pluto %>% ggplot(aes(x = crowd)) +
  # observed counts
  geom_point(aes(y = squirrels_chased), alpha = 0.5) +
  # predicted counts at three fixed temperatures
  geom_line(aes(y = squirrels_50, color = "50°F"), linewidth = 1.5) +
  geom_line(aes(y = squirrels_75, color = "75°F"), linewidth = 1.5) +
  geom_line(aes(y = squirrels_100, color = "100°F"), linewidth = 1.5) +
  scale_color_manual(values = c("50°F" = "#00BA38",
                                "75°F" = "#619CFF",
                                "100°F" = "#F8766D"),
                     breaks = c("50°F", "75°F", "100°F")) +
  labs(x = "Guests per 100 people (Crowd Level)",
       y = "Number of Squirrels Chased",
       color = "Temperature") +
  theme_bw()

Let’s now tackle the second example from the Poisson regression lecture.
We then stratified the analysis by whether Mickey is present to “get rid” of the three-way interaction. This gives two fitted models: one for when Mickey is present and one for when Mickey is not present.
Let’s stop and think about our graph.
We want to graph the predicted number of squirrels chased on the y-axis and temperature on the x-axis.
We should have time of day define the lines.
We should have two graphs, one for when Mickey is present and one for when Mickey is not present.
So, we will create six variables of predicted values:
Mickey present: morning, afternoon, and evening
Mickey not present: morning, afternoon, and evening
| Term | Mickey Present | Mickey Not Present |
|---|---|---|
| Intercept | 2.37 | 3.09 |
| Temperature | 0.02 | 0.009 |
| Time of Day (Evening) | -0.41 | -1.44 |
| Time of Day (Morning) | 0.41 | -1.69 |
| Temp x Evening | -0.001 | 0.01 |
| Temp x Morning | -0.01 | 0.02 |
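Written out as equations, the two columns of the table above correspond to the models below. This assumes evening and morning are 0/1 indicators with afternoon as the reference level, which is how the prediction code in this section treats them.

$$
\begin{aligned}
\text{Mickey present: } \ln\left( y \right) &= 2.37 + 0.02 \text{ temp} - 0.41 \text{ evening} + 0.41 \text{ morning} - 0.001 \,(\text{temp} \times \text{evening}) - 0.01 \,(\text{temp} \times \text{morning}) \\
\text{Mickey not present: } \ln\left( y \right) &= 3.09 + 0.009 \text{ temp} - 1.44 \text{ evening} - 1.69 \text{ morning} + 0.01 \,(\text{temp} \times \text{evening}) + 0.02 \,(\text{temp} \times \text{morning})
\end{aligned}
$$

Setting the indicators for each time of day collapses each model to a simple line in temperature on the log scale:

$$
\begin{aligned}
\text{Mickey present, morning: } \ln\left( y \right) &= (2.37 + 0.41) + (0.02 - 0.01) \text{ temp} = 2.78 + 0.01 \text{ temp} \\
\text{Mickey present, afternoon: } \ln\left( y \right) &= 2.37 + 0.02 \text{ temp} \\
\text{Mickey present, evening: } \ln\left( y \right) &= (2.37 - 0.41) + (0.02 - 0.001) \text{ temp} = 1.96 + 0.019 \text{ temp} \\
\text{Mickey not present, morning: } \ln\left( y \right) &= (3.09 - 1.69) + (0.009 + 0.02) \text{ temp} = 1.40 + 0.029 \text{ temp} \\
\text{Mickey not present, afternoon: } \ln\left( y \right) &= 3.09 + 0.009 \text{ temp} \\
\text{Mickey not present, evening: } \ln\left( y \right) &= (3.09 - 1.44) + (0.009 + 0.01) \text{ temp} = 1.65 + 0.019 \text{ temp}
\end{aligned}
$$

Each of the six lines in the graphs is one of these equations after exponentiating both sides.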
# Predicted counts when Mickey is present (afternoon is the reference level)
pluto <- pluto %>%
  mutate(squirrels_morning_mickey = exp(2.37 + 0.02*temperature - 0.41*0 + 0.41*1 - 0.001*temperature*0 - 0.01*temperature*1),
         squirrels_afternoon_mickey = exp(2.37 + 0.02*temperature - 0.41*0 + 0.41*0 - 0.001*temperature*0 - 0.01*temperature*0),
         squirrels_evening_mickey = exp(2.37 + 0.02*temperature - 0.41*1 + 0.41*0 - 0.001*temperature*1 - 0.01*temperature*0))

# Predicted counts when Mickey is not present
pluto <- pluto %>%
  mutate(squirrels_morning_no_mickey = exp(3.09 + 0.009*temperature - 1.44*0 - 1.69*1 + 0.01*temperature*0 + 0.02*temperature*1),
         squirrels_afternoon_no_mickey = exp(3.09 + 0.009*temperature - 1.44*0 - 1.69*0 + 0.01*temperature*0 + 0.02*temperature*0),
         squirrels_evening_no_mickey = exp(3.09 + 0.009*temperature - 1.44*1 - 1.69*0 + 0.01*temperature*1 + 0.02*temperature*0))

Coding our graph as we have before, with the observed data as points (geom_point()) and the predicted values as lines (geom_line()):

# Graph for the model when Mickey is present
pluto %>% ggplot(aes(x = temperature)) +
  geom_point(aes(y = squirrels_chased), alpha = 0.5) +
  geom_line(aes(y = squirrels_morning_mickey, color = "Morning"), linewidth = 1.5) +
  geom_line(aes(y = squirrels_afternoon_mickey, color = "Afternoon"), linewidth = 1.5) +
  geom_line(aes(y = squirrels_evening_mickey, color = "Evening"), linewidth = 1.5) +
  scale_color_manual(values = c("Morning" = "#00BA38",
                                "Afternoon" = "#619CFF",
                                "Evening" = "#F8766D"),
                     breaks = c("Morning", "Afternoon", "Evening")) +
  labs(x = "Temperature (°F)",
       y = "Number of Squirrels Chased",
       color = "Time of Day") +
  theme_bw()

# Graph for the model when Mickey is not present
pluto %>% ggplot(aes(x = temperature)) +
  geom_point(aes(y = squirrels_chased), alpha = 0.5) +
  geom_line(aes(y = squirrels_morning_no_mickey, color = "Morning"), linewidth = 1.5) +
  geom_line(aes(y = squirrels_afternoon_no_mickey, color = "Afternoon"), linewidth = 1.5) +
  geom_line(aes(y = squirrels_evening_no_mickey, color = "Evening"), linewidth = 1.5) +
  scale_color_manual(values = c("Morning" = "#00BA38",
                                "Afternoon" = "#619CFF",
                                "Evening" = "#F8766D"),
                     breaks = c("Morning", "Afternoon", "Evening")) +
  labs(x = "Temperature (°F)",
       y = "Number of Squirrels Chased",
       color = "Time of Day") +
  theme_bw()

In this lecture, we learned how to visualize Poisson and negative binomial regression models.
This lecture completes the new material component of this course!
Always remember that visualization is a key component of data analysis!
We use data visualization to check model assumptions, interpret results, and communicate findings.
When constructing graphs, always think about your audience and the story you want to tell with your data.
Now, we will focus on putting everything together with a final assignment.
Your main focus for the assignment will be to build appropriate models and visualize them effectively.
Your main focus for the (non-computational) final exam will be choosing the appropriate modeling strategy, interpreting model output, and performing statistical inference.