Beta Regression

Introduction

  • Recall the general linear model,

y = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k + \varepsilon

  • Until this week, we focused on applying the normal distribution to our model.

  • What happens when we have a response variable that is continuous but bounded between 0 and 1?

    • In this lecture, we will introduce beta regression.

Beta Regression

  • Beta regression is used specifically for continuous response variables that fall between 0 and 1.

    • These are bounded proportions (or rates) that do not include the endpoints.
    • Examples:
      • Market share
      • Pass rate for courses
      • Proportion of time a unit is operational

Beta Regression

  • Why shouldn’t we use the normal distribution here?

    • When we apply the normal distribution, we assume that the outcome (response) can take any value. i.e., y \in (-\infty, \infty)

    • However, in this case, our response variable is bounded between 0 and 1. i.e., y \in (0, 1)

Beta Distribution

  • The beta distribution is a continuous probability distribution defined on the interval (0, 1).

  • It is characterized by two shape parameters, \alpha and \beta, which determine the distribution’s shape.

Plot of Beta distribution with alpha = 1 and beta = 1
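  • For reference, the \text{Beta}(\alpha, \beta) density is

f(y; \alpha, \beta) = \frac{y^{\alpha - 1}(1 - y)^{\beta - 1}}{B(\alpha, \beta)}, \quad y \in (0, 1)

    • Its mean is \mu = \frac{\alpha}{\alpha + \beta}; when \alpha = \beta = 1, it reduces to the uniform distribution shown above.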

Beta Distribution

  • Changing \alpha and \beta changes the shape of the distribution:

Plot of Beta distribution with alpha = 5 and beta = 5
Plot of Beta distribution with alpha = 10 and beta = 10
Plot of Beta distribution with alpha = 10 and beta = 20
Plot of Beta distribution with alpha = 0.5 and beta = 0.5
Plot of Beta distribution with alpha = 2 and beta = 1
Plot of Beta distribution with alpha = 1 and beta = 2

Data Management

  • Note that the support for the beta distribution is (0, 1).

    • Exactly 0 and exactly 1 are not included in the support.
  • What does that mean for us?

    • If we have exact 0s or 1s in our response variable, we need to transform them slightly to fall within (0, 1).
  • One common transformation is from Smithson & Verkuilen (2006), y_{\text{new}} = \frac{y(n-1)+0.5}{n}

Data Management

y_{\text{new}} = \frac{y(n-1)+0.5}{n}

  • Let’s see how this works across various sample sizes,
    n        y = 0     y = 0.25   y = 0.5   y = 0.75   y = 1
    10       0.05      0.275      0.5       0.725      0.95
    100      0.005     0.2525     0.5       0.7475     0.995
    1000     0.0005    0.25025    0.5       0.74975    0.9995
    10000    0.00005   0.250025   0.5       0.749975   0.99995
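  • As a quick check, the first row of the table can be reproduced in R (a minimal sketch; the vector y and n = 10 are illustrative):

```r
# Smithson & Verkuilen (2006) adjustment: squeeze y into (0, 1)
y <- c(0, 0.25, 0.5, 0.75, 1)
n <- 10
y_new <- (y * (n - 1) + 0.5) / n
y_new  # 0.050 0.275 0.500 0.725 0.950
```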

Beta Regression

  • When we assume that y \sim \text{Beta}(\alpha, \beta), our model becomes

\text{logit}(\mu) = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k

  • In linear regression, we assume y \sim N(\mu, \sigma^2),

\mu = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k

Beta Regression

  • When we assume that y \sim \text{Beta}(\alpha, \beta), our model becomes

\text{logit}(\mu) = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k

  • Wait… \text{logit()}?

    • The \text{logit()} function is a link function. It maps \mu from (0, 1) \to (-\infty, \infty).
  • The \text{logit()} function is defined as

\text{logit}(\mu) = \log\left(\frac{\mu}{1 - \mu}\right)
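  • A minimal sketch of the link function in R; base R already provides the logit and its inverse as qlogis() and plogis():

```r
# The logit link maps (0, 1) to the whole real line
logit <- function(mu) log(mu / (1 - mu))

logit(0.5)   # 0: a mean of 0.5 sits at the center of the real line
qlogis(0.5)  # 0: same result from base R
plogis(0)    # 0.5: the inverse link maps back into (0, 1)
```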

Beta Regression (R)

  • In R, we can fit a beta regression model using the betareg package, then run the resulting model through tidy() (from the broom package).

    • We still specify models & examine results the same way!
m <- betareg(y ~ x1 + ... + xk,
             data = dataset_name,
             link = "logit")
m %>% tidy(conf.int = TRUE)
  • From last slide: the \text{logit()} function is a link function. It maps \mu from (0, 1) \to (-\infty, \infty).

Lecture Example Set Up

  • Minnie and Daisy have been hired as “Efficiency Leads” for three different park operations teams, and they want to understand what affects how well a shift goes.

  • Minnie or Daisy records information about their work shift:

    • Task completion rate (0 to 1): the proportion of assigned tasks that were completed by the end of the shift
    • Character: who was leading the shift (Minnie or Daisy)
    • Location: which operations team they worked with (Main Street Operations, Toontown Crew, or Epcot Showcase)
    • Task load: the number of tasks assigned during the shift
    • Shift hours: how long the shift lasted (in hours)

Lecture Example Set Up

  • Pulling in the data,
operations <- read_csv("https://raw.githubusercontent.com/samanthaseals/SDSII/refs/heads/main/files/data/lectures/W4_daisy.csv")
operations %>% head()

Beta Regression: Example

  • Let’s check the outcome of our data,
operations %>% ggplot(aes(x = completion_rate)) +
  geom_histogram(binwidth = 0.05, color = "black", fill = "lightblue") +
  labs(x = "Completion Rate", y = "Count") +
  theme_bw()
Histogram of completion rate

Beta Regression: Example

  • Let’s further examine the outcome,
operations %>% summarize(min = min(completion_rate),
                         median = median(completion_rate),
                         max = max(completion_rate))
  • We need to transform our data,
operations <- operations %>%
  mutate(completion_adj = (completion_rate * (nrow(operations) - 1) + 0.5) / nrow(operations))

Beta Regression: Example

  • Let’s now model completion rate as a function of character (character), location (location), task load (task_load), shift hours (shift_hours), and the interaction between character and task load.
m1 <- betareg(completion_adj ~ character + task_load + shift_hours + location + character:task_load,
             data = operations,
             link = "logit")
m1 %>% tidy()

Beta Regression: Example

  • We can now state the resulting model,
m1 %>% coefficients
                   (Intercept)                characterMinnie 
                    2.37702571                     0.82984905 
                     task_load                    shift_hours 
                   -0.03904442                    -0.08308716 
locationMain_Street_Operations          locationToontown_Crew 
                    0.01773734                    -0.30528088 
     characterMinnie:task_load                          (phi) 
                   -0.04533085                     6.82672202 

\begin{align*} \text{logit}(\mu) = \ & 2.38 + 0.83 \text{ Minnie} - 0.04 \text{ task load} -0.08 \text{ shift hours } + \\ & 0.02 \text{ Main Street} - 0.31 \text{ Toontown} - \\ & 0.05 \text{ Minnie} \times \text{task} \end{align*}

Interpreting the Model

  • Recall the general linear model when y \sim N(\mu, \sigma^2), \mu = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k,

    • \beta_i represents the change in the mean outcome for a one-unit increase in x_i, holding all other predictors constant.
  • But when y \sim \text{Beta}(\alpha, \beta), our model becomes \text{logit}(\mu) = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k,

    • \beta_i represents the change in the log-odds of the mean outcome for a one-unit increase in x_i, holding all other predictors constant.

Interpreting the Model

  • Recall the \text{logit()} function, \text{logit}(\mu) = \ln\left(\frac{\mu}{1 - \mu}\right)

    • This means that we are not modeling the outcome directly. Instead, we are modeling the log-odds of the mean outcome.
  • To interpret the coefficients on the original scale of the outcome, we can exponentiate them to get the odds ratio (OR) of the mean.

\text{OR} = e^{\beta_i}
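  • As a sketch of this calculation in R (the coefficient value is illustrative, taken as the task load slope from the lecture model):

```r
b <- -0.04             # illustrative logit-scale coefficient
or <- exp(b)           # odds ratio for a one-unit increase
pct <- 100 * (or - 1)  # percent change in the odds
or   # ~0.96
pct  # ~ -3.9, i.e., roughly a 4% decrease
```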

Interpreting the Model

  • To interpret the coefficients on the original scale of the outcome, we can exponentiate them to get the odds ratio (OR) of the mean. \text{OR} = e^{\beta_i}

    • When OR > 1, there is an increase in the odds of the mean outcome for a one-unit increase in x_i.
    • When OR < 1, there is a decrease in the odds of the mean outcome for a one-unit increase in x_i.
    • When OR = 1, there is no change in the odds of the mean outcome for a one-unit increase in x_i.

Interpreting the Model

  • To interpret the coefficients on the original scale of the outcome, we can exponentiate them to get the odds ratio (OR) of the mean. \text{OR} = e^{\beta_i}

  • Example interpretations:

    • For a one-unit increase in x_2, the odds of the mean rate are multiplied by 0.90. This is a 10% decrease.

Interpreting the Model: Example

  • Let’s examine the model results again,

\begin{align*} \text{logit}(\mu) = \ & 2.38 + 0.83 \text{ Minnie} - 0.04 \text{ task load} -0.08 \text{ shift hours } + \\ & 0.02 \text{ Main Street} - 0.31 \text{ Toontown} - \\ & 0.05 \text{ Minnie} \times \text{task} \end{align*}

  • Let’s first stratify by character to simplify the model.

Interpreting the Model: Example

  • For Minnie,

\begin{align*} \text{logit}(\mu) = \ & 3.21 - 0.09 \text{ task load} -0.08 \text{ shift hours } + \\ & 0.02 \text{ Main Street} - 0.31 \text{ Toontown} \end{align*}

  • For Daisy,

\begin{align*} \text{logit}(\mu) = \ & 2.38 - 0.04 \text{ task load} -0.08 \text{ shift hours } + \\ & 0.02 \text{ Main Street} - 0.31 \text{ Toontown} \end{align*}
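  • The stratified coefficients follow directly from the full model: setting Minnie = 1 folds the character main effect into the intercept and the interaction into the task load slope,

3.21 = 2.38 + 0.83 \qquad \text{and} \qquad -0.09 = -0.04 - 0.05

    • For Daisy (Minnie = 0), the remaining coefficients are unchanged.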

Interpreting the Model: Example

  • Now that we have stratified models, we can find the corresponding odds ratios.

    • We can do this one-by-one, but this is a quick way to do it for all coefficients,
# task load, shift hours, Main Street, Toontown
exp(c(-0.09, -0.08, 0.02, -0.31)) # Minnie
[1] 0.9139312 0.9231163 1.0202013 0.7334470
exp(c(-0.04, -0.08, 0.02, -0.31)) # Daisy
[1] 0.9607894 0.9231163 1.0202013 0.7334470
  • Note that the ORs for shift hours and location are the same between the two models. That is because we did not include interactions between character and those predictors.

    • As the number of shift hours increases by 1 hour, the odds of the mean completion rate are multiplied by 0.92 – this is a decrease of approximately 8%.

    • As compared to Epcot Showcase, the odds of the mean completion rate for Main Street Operations are multiplied by 1.02 – this is an increase of approximately 2%.

    • As compared to Epcot Showcase, the odds of the mean completion rate for Toontown Crew are multiplied by 0.73 – this is a decrease of approximately 27%.

Interpreting the Model: Example

  • Now that we have stratified models, we can find the corresponding odds ratios.
# task load, shift hours, Main Street, Toontown
exp(c(-0.09, -0.08, 0.02, -0.31)) # Minnie
[1] 0.9139312 0.9231163 1.0202013 0.7334470
exp(c(-0.04, -0.08, 0.02, -0.31)) # Daisy
[1] 0.9607894 0.9231163 1.0202013 0.7334470
  • Now, the OR for task load differs between the two characters due to the interaction.

    • For Minnie, as the task load increases by 1 unit, the odds of the mean completion rate are multiplied by 0.91 – this is a decrease of approximately 9%.

    • For Daisy, as the task load increases by 1 unit, the odds of the mean completion rate are multiplied by 0.96 – this is a decrease of approximately 4%.

  • Another statement we can make is that as additional tasks are assigned during the shift, Minnie’s odds of the mean completion rate decrease faster than Daisy’s.

Statistical Inference (R)

  • Our approach to determining significance remains the same.

    • For continuous and binary predictors, we can use the Wald test (z-test from tidy()).

    • For omnibus tests, we can use the likelihood ratio test (LRT).

      • Significance of regression line.
      • Significance of categorical predictors with c \ge 3 levels.
      • Interaction terms with categorical predictors with c \ge 3 levels.
  • With beta regression, we must use lrtest() from the lmtest package.

full <- betareg(y ~ x_1 + ... + x_k,
                data = dataset_name,
                link = "logit")
reduced <- betareg(y ~ subset of x_1 + ... + x_k,
                   data = dataset_name,
                   link = "logit")
lrtest(reduced, full)

Statistical Inference: Example

  • Let’s first determine if this is a significant regression line,
full <- betareg(completion_adj ~ character + task_load + shift_hours + location + character:task_load,
                data = operations,
                link = "logit")
reduced <- betareg(completion_adj ~ 1,
                   data = operations,
                   link = "logit")
lrtest(reduced, full)
  • Yes, this is a significant regression line (p < 0.001). At least one of the slopes is non-zero.

Statistical Inference: Example

  • What can we see with the tidy() results?
m1 %>% tidy() %>% select(term, estimate, p.value)
  • The interaction between character and task load is significant (p < 0.001).

    • We cannot discuss the main effects of character or task load.
  • Shift length is significant (p < 0.001).

Statistical Inference: Example

  • What can we see with the tidy() results?
m1 %>% tidy() %>% select(term, estimate, p.value)
  • What about location?

Statistical Inference: Example

  • Let’s look at the omnibus test for location.

    • Note that location is not involved in an interaction, so we can “look” at it and discuss it while ignoring the interaction term.
full <- betareg(completion_adj ~ character + task_load + shift_hours + location + character:task_load,
                data = operations,
                link = "logit")
reduced <- betareg(completion_adj ~ character + task_load + shift_hours + character:task_load,
                   data = operations,
                   link = "logit")
lrtest(reduced, full)
  • Yes, location is a significant predictor of completion rate (p = 0.008).

Statistical Inference: Example

  • Going back to our tidy() results,
m1 %>% tidy() %>% select(term, estimate, p.value) %>% filter(term %in% c("locationMain_Street_Operations", "locationToontown_Crew"))
  • The reference group is Epcot Showcase.

    • There is not a difference in the mean completion rate between Epcot Showcase and Main Street Operations (p = 0.867).

    • There is a difference in the mean completion rate between Epcot Showcase and Toontown Crew (p = 0.010).

Modeling the Precision?

  • Note that beta regression is unique in that it allows us to simultaneously model the mean and the precision, \phi (which controls the variance).

  • Let’s return to our example.

    • We modeled the mean as a function of character, task load, shift hours, location, and the interaction between character and task load.

    • Let’s now additionally model the precision as a function of character, location, and the interaction between the two.

m2 <- betareg(completion_adj ~ character + task_load + shift_hours + location + character:task_load | 
                character + location + character:location,
             data = operations,
             link = "logit")

Modeling the Precision?

  • Looking at the model results,
m2 %>% tidy()

Modeling the Precision?

  • What happens if we request the omnibus tests?
m1 %>% car::Anova(type = 3)
Warning in Anova.default(., type = 3): there are coefficients in coef(mod)
that are not in the model matrix:
(phi)
tests may be incorrect
Warning in Anova.default(., type = 3): there are rows/columns in vcov.
that are not in the model matrix:
(phi)
tests may be incorrect

Examining the Precision

  • For categorical predictors with c \ge 3 levels, we must use lrtest() from the lmtest package.
m2_full <- betareg(completion_adj ~ character + task_load + shift_hours + location + character:task_load | 
                     character + location + character:location,
                   data = operations,
                   link = "logit")

m2_reduced <- betareg(completion_adj ~ character + task_load + shift_hours + location + character:task_load | 
                     character + location,
                   data = operations,
                   link = "logit")

lrtest(m2_reduced, m2_full)
  • We see that in the precision model, the interaction term between character and location is significant (p < 0.001).

Wrap Up

  • This week we are turning our attention to continuous responses that are not suitable for the normal distribution.

  • Beta regression is used for continuous response variables that fall between 0 and 1.

  • The interpretations are now updated to reflect the use of the logit link function.

    • Remember that we are not reporting on the mean directly! We must interpret the odds ratio.
  • Next lecture: visualizing beta regression.

  • Next week: logistic regression for categorical outcomes.