Assignment 3: Bayesian Regression Analysis
1. You are a Data Analyst on the Card Operations Analytics Team at the Equestrian Credit Union (ECU). Your team supports internal business partners by identifying performance drivers and operational bottlenecks across member service workflows. Recently, senior management requested an analysis of case processing time for member disputes handled by two internal departments: Operations and Fraud. The goal is to better understand how analyst experience and workload influence resolution time and to identify potential areas for process improvement.
Consider the data in the linked Google Sheet, which contains the following variables:
- ProcessingTime – time to resolution (minutes)
- Team – team of employment; either Fraud or Operations
- Experience – length of employment (years)
- Workload – number of open cases
1a. Import the data.
1b. Construct the appropriate model to answer the following question: What factors explain variation in average case processing time, and does the impact of workload depend on analyst experience?
Remember to state the resulting model.
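One plausible way to write this model rests on three assumptions. Processing time gets a normal likelihood, Fraud is the reference level for Team, and an Experience × Workload interaction addresses whether the workload effect depends on experience. Priors are not specified here and depend on your software's defaults or your own choices:

```latex
\begin{aligned}
\text{ProcessingTime}_i &\sim \mathcal{N}(\mu_i, \sigma^2) \\
\mu_i &= \beta_0 + \beta_1\,\text{Ops}_i + \beta_2\,\text{Experience}_i + \beta_3\,\text{Workload}_i + \beta_4\,(\text{Experience}_i \times \text{Workload}_i)
\end{aligned}
```

Here $\text{Ops}_i = 1$ for the Operations team and 0 for Fraud. The resulting model is the same expression with each $\beta_j$ replaced by its estimate $\hat{\beta}_j$.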
1c. Construct the appropriate table to display results. Recall that you should display $\hat{\beta}_i \ (95\% \text{ CI for } \beta_i)$ or $\exp\{\hat{\beta}_i\} \ (95\% \text{ CI for } \exp\{\beta_i\})$.
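Below is a small Python sketch of how one table cell could be formatted. The function name `format_estimate` is hypothetical, and the interval endpoints are assumed to come from your fitted model's summary. Because exp() is monotonic, exponentiating the endpoints of a quantile-based 95% interval gives the corresponding interval for $\exp\{\beta_i\}$. That is how odds ratios (question 2) or rate ratios (question 3) would be reported.

```python
import math


def format_estimate(estimate, lower, upper, exponentiate=False, digits=2):
    """Return 'estimate (lower, upper)' for one row of a results table.

    With exponentiate=True the point estimate and both interval endpoints
    are passed through exp(), i.e. the multiplicative (odds ratio or rate
    ratio) scale.
    """
    if exponentiate:
        estimate, lower, upper = tuple(math.exp(v) for v in (estimate, lower, upper))
    return f"{estimate:.{digits}f} ({lower:.{digits}f}, {upper:.{digits}f})"
```

For example, `format_estimate(0.0, -1.0, 1.0, exponentiate=True)` returns `"1.00 (0.37, 2.72)"`.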
Insert your table here.
1d. Let’s investigate the relationships observed. Create predicted values for those on the Fraud team with {0, 5, 10, 15} years of experience.
1e. Let’s investigate the relationships observed. Create predicted values for those on the Fraud team with {1, 5, 10, 20} open cases.
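If the model includes an Experience × Workload term, a prediction needs values for both variables. The Python sketch below assumes that specification, with Fraud as the reference team. The coefficient names in `coef` are hypothetical and should be mapped to your model's output. The sketch builds one prediction per combination. For 1d, pair the four experience values with a range of workload values (for the lines in 1f); for 1e, do the reverse (for 1g).

```python
from itertools import product


def predict_processing_time(coef, team, experience, workload):
    """Predicted mean processing time (minutes) under the assumed linear model.

    coef is a dict keyed by hypothetical coefficient names; fill it with the
    estimates (e.g. posterior means) reported by your fitted model.
    """
    # Fraud is assumed to be the reference level, so only Operations gets the team shift.
    ops = 1.0 if team == "Operations" else 0.0
    return (coef["intercept"]
            + coef["operations"] * ops
            + coef["experience"] * experience
            + coef["workload"] * workload
            + coef["exp_x_work"] * experience * workload)  # interaction term


def prediction_grid(coef, experience_values, workload_values, team="Fraud"):
    """One (experience, workload, prediction) row per combination of the inputs."""
    return [(e, w, predict_processing_time(coef, team, e, w))
            for e, w in product(experience_values, workload_values)]
```

For example, `prediction_grid(coef, [0, 5, 10, 15], range(1, 21))` covers 1d and `prediction_grid(coef, range(0, 16), [1, 5, 10, 20])` covers 1e. The ranges of 1 to 20 cases and 0 to 15 years are placeholders, so swap in the observed ranges from the data. Because the model is linear in the coefficients, plugging in posterior means gives the posterior mean of $\mu$.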
1f. Let’s investigate the relationships observed. Construct the following graph, showing the relationship between processing time and number of cases:
- Scatterplot: x = Workload, y = ProcessingTime
- Regression lines for the Fraud team: multiple geom_line()s, each calling aes(y = predicted_column).
1g. Let’s investigate the relationships observed. Construct the following graph, showing the relationship between processing time and experience:
- Scatterplot: x = Experience, y = ProcessingTime
- Regression lines for the Fraud team: multiple geom_line()s, each calling aes(y = predicted_column).
1h. Is there evidence of an interaction effect? Why or why not? Use your model output as well as the graphs above to answer this question.
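Suppose the model includes an Experience × Workload term with coefficient $\beta_4$, and $\beta_3$ is the coefficient for Workload alone. Then the effect of one additional open case is not a single number; it shifts with experience:

```latex
\frac{\partial \mu_i}{\partial\,\text{Workload}_i} = \beta_3 + \beta_4\,\text{Experience}_i
```

For instance, the workload slope is $\beta_3$ for a new analyst and $\beta_3 + 10\beta_4$ for one with 10 years of experience. Two signals together support an interaction:

- a 95% interval for $\beta_4$ that sits clearly away from 0
- visibly non-parallel lines in the 1f and 1g graphs

Parallel lines and an interval centered near 0 point the other way.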
Insert your answer here.
1i. Write a brief summary of your model results. How would you explain the results to your boss?
Insert your answer here.
2. You are now a Data Analyst on the Member Insights & Retention Team at ECU. Your department collaborates with the Marketing and Contact Center Operations teams to identify members at risk of closing their accounts or discontinuing services. Member retention has become a top strategic priority following recent internal reports showing increased attrition among certain demographics.
Consider the data in the linked Google Sheet, which contains the following variables:
- ClosedAccount – whether the member closed their account within the last 12 months (1 = yes, 0 = no)
- Age – member age (years)
- DigitalEngagement – average number of monthly logins to ECU digital platforms (mobile app, website)
- AccountType – checking, savings, or credit card
2a. Import the data.
2b. Construct the appropriate model to answer the following question: What member characteristics are associated with a higher likelihood of account closure, and does the association between digital engagement and the risk of account closure vary by account type?
Remember to state the resulting model.
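One plausible specification uses a Bernoulli likelihood with a logit link and checking as the reference account type. DigitalEngagement × AccountType interaction terms address whether the engagement effect varies by account type. Priors are again left unspecified:

```latex
\begin{aligned}
\text{ClosedAccount}_i &\sim \text{Bernoulli}(\pi_i) \\
\log\!\left(\frac{\pi_i}{1-\pi_i}\right) &= \beta_0 + \beta_1\,\text{Age}_i + \beta_2\,\text{DE}_i + \beta_3\,\text{Sav}_i + \beta_4\,\text{CC}_i \\
&\quad + \beta_5\,(\text{DE}_i \times \text{Sav}_i) + \beta_6\,(\text{DE}_i \times \text{CC}_i)
\end{aligned}
```

Here $\text{DE}_i$ is DigitalEngagement, and $\text{Sav}_i$ and $\text{CC}_i$ are indicators for savings and credit card accounts. Exponentiated coefficients are odds ratios.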
2c. Construct the appropriate table to display results. Recall that you should display $\hat{\beta}_i \ (95\% \text{ CI for } \beta_i)$ or $\exp\{\hat{\beta}_i\} \ (95\% \text{ CI for } \exp\{\beta_i\})$.
Insert your table here.
2d. Let’s investigate the relationships observed. Create predicted values for members aged {18, 30, 45, 60} with checking accounts.
2e. Let’s investigate the relationships observed. Create predicted values for members aged {18, 30, 45, 60} with savings accounts.
2f. Let’s investigate the relationships observed. Create predicted values for members aged {18, 30, 45, 60} with credit card accounts.
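Below is a Python sketch of the predicted probabilities for 2d through 2f. It assumes a logistic model with Age, DigitalEngagement, account type, and engagement × account type terms, with checking as the reference level. All coefficient names are hypothetical. The log-odds $\eta$ are converted back to a probability with the inverse logit, $\pi = 1/(1 + e^{-\eta})$.

```python
import math
from itertools import product


def inv_logit(eta):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predict_closure_prob(coef, age, engagement, account_type):
    """Predicted probability of account closure under the assumed logistic model."""
    # Checking is assumed to be the reference level; coefficient names are hypothetical.
    sav = 1.0 if account_type == "savings" else 0.0
    cc = 1.0 if account_type == "credit card" else 0.0
    eta = (coef["intercept"]
           + coef["age"] * age
           + coef["engagement"] * engagement
           + coef["savings"] * sav
           + coef["credit_card"] * cc
           + coef["eng_x_savings"] * engagement * sav
           + coef["eng_x_credit_card"] * engagement * cc)
    return inv_logit(eta)


def closure_grid(coef, account_type, ages=(18, 30, 45, 60), engagement_values=range(0, 31)):
    """One (age, engagement, probability) row per age and engagement value."""
    return [(a, e, predict_closure_prob(coef, a, e, account_type))
            for a, e in product(ages, engagement_values)]
```

Calling `closure_grid(coef, "checking")`, `closure_grid(coef, "savings")` and `closure_grid(coef, "credit card")` gives the three sets of predictions. The engagement range of 0 to 30 logins is a placeholder for the observed range. Plugging posterior means into the inverse logit is a shortcut: because the transformation is nonlinear, averaging $\pi$ over posterior draws can give slightly different values.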
2g. Let’s investigate the relationships observed. Construct the following graph, showing the relationship between account closure and digital engagement:
- Scatterplot: x = DigitalEngagement, y = ClosedAccount
- Regression lines for those with checking accounts: multiple geom_line()s, each calling aes(y = predicted_column).
2h. Let’s investigate the relationships observed. Construct the following graph, showing the relationship between account closure and digital engagement:
- Scatterplot: x = DigitalEngagement, y = ClosedAccount
- Regression lines for those with savings accounts: multiple geom_line()s, each calling aes(y = predicted_column).
2i. Let’s investigate the relationships observed. Construct the following graph, showing the relationship between account closure and digital engagement:
- Scatterplot: x = DigitalEngagement, y = ClosedAccount
- Regression lines for those with credit card accounts: multiple geom_line()s, each calling aes(y = predicted_column).
2j. Let’s investigate the relationships observed. Construct the following graph, showing the relationship between account closure and digital engagement:
- Scatterplot: x = DigitalEngagement, y = ClosedAccount
- Regression lines for 18-year-olds: multiple geom_line()s, each calling aes(y = predicted_column).
2k. Let’s investigate the relationships observed. Construct the following graph, showing the relationship between account closure and digital engagement:
- Scatterplot: x = DigitalEngagement, y = ClosedAccount
- Regression lines for 45-year-olds: multiple geom_line()s, each calling aes(y = predicted_column).
2l. Is there evidence of an interaction effect? Why or why not? Use your model output as well as the graphs above to answer this question.
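This sketch assumes a logistic model with checking as the reference level. In it, $\beta_2$ is the DigitalEngagement (DE) coefficient, and $\beta_5$ and $\beta_6$ are its interactions with savings and credit card. The engagement slope on the log-odds scale then differs by account type:

```latex
\begin{aligned}
\text{Checking:}\quad & \frac{\partial\,\text{logit}(\pi_i)}{\partial\,\text{DE}_i} = \beta_2, & \text{OR} &= e^{\beta_2} \\
\text{Savings:}\quad & \frac{\partial\,\text{logit}(\pi_i)}{\partial\,\text{DE}_i} = \beta_2 + \beta_5, & \text{OR} &= e^{\beta_2 + \beta_5} \\
\text{Credit card:}\quad & \frac{\partial\,\text{logit}(\pi_i)}{\partial\,\text{DE}_i} = \beta_2 + \beta_6, & \text{OR} &= e^{\beta_2 + \beta_6}
\end{aligned}
```

Interaction coefficients near 0 mean roughly parallel lines on the log-odds scale. On the probability scale used in the 2g through 2k graphs, curves can look non-parallel even with no interaction term, because the inverse logit compresses changes near 0 and 1. Judge the interaction from the coefficients as well as the pictures.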
Insert your answer here.
2m. Write a brief summary of your model results. How would you explain the results to your boss?
Insert your answer here.
3. You are now a Data Analyst on the Fraud Monitoring & Risk Analytics team at ECU. The team tracks operational risk and partner performance across card portfolios. Senior leadership wants to understand what drives the weekly number of fraud alerts escalated to manual review so they can allocate staff and guide fraud-control investments.
Consider the data in the linked Google Sheet, which contains the following variables:
- Alerts – number of fraud alerts needing manual review that week
- Portfolio – type of portfolio (checking only, savings only, mixed)
- AvgTicket – average transaction amount that week (bits)
- TxnVolume – total number of transactions processed that week
3a. Import the data.
3b. Construct the appropriate model to answer the following questions: What factors explain variation in the count of weekly fraud alerts, after accounting for weekly transaction volume? In particular, how do alert counts differ by portfolio type?
Remember to state the resulting model.
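One plausible specification uses a Poisson likelihood with a log link and checking only as the reference portfolio. It enters log(TxnVolume) as an offset, meaning a term whose coefficient is fixed at 1. That is a standard way to account for weekly transaction volume:

```latex
\begin{aligned}
\text{Alerts}_i &\sim \text{Poisson}(\lambda_i) \\
\log(\lambda_i) &= \log(\text{TxnVolume}_i) + \beta_0 + \beta_1\,\text{AvgTicket}_i + \beta_2\,\text{Mixed}_i + \beta_3\,\text{SavOnly}_i
\end{aligned}
```

With the offset, the model describes alerts per transaction, and $\exp\{\beta_j\}$ is a rate ratio. For example, $\exp\{\beta_3\}$ compares the alert rate of savings-only portfolios with checking-only portfolios at the same average ticket. If the counts turn out to be overdispersed, a negative binomial likelihood is a common alternative.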
3c. Construct the appropriate table to display results. Recall that you should display $\hat{\beta}_i \ (95\% \text{ CI for } \beta_i)$ or $\exp\{\hat{\beta}_i\} \ (95\% \text{ CI for } \exp\{\beta_i\})$.
Insert your table here.
3d. Let’s investigate the relationships observed. Create predicted values for weeks with 40,000 transactions.
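Below is a Python sketch of the 3d predictions. It assumes a Poisson model with a log(TxnVolume) offset, AvgTicket, and portfolio indicators, with checking only as the reference portfolio. The coefficient names are hypothetical. Since $\exp\{\log(40000) + \eta\} = 40000 \cdot e^{\eta}$, the expected count is the transaction volume times the exponentiated linear predictor.

```python
import math
from itertools import product


def predict_alerts(coef, avg_ticket, portfolio, txn_volume=40_000):
    """Expected weekly alerts under the assumed Poisson model with a log(TxnVolume) offset."""
    # "checking only" is assumed to be the reference portfolio; coefficient names are hypothetical.
    mixed = 1.0 if portfolio == "mixed" else 0.0
    sav_only = 1.0 if portfolio == "savings only" else 0.0
    eta = (coef["intercept"]
           + coef["avg_ticket"] * avg_ticket
           + coef["mixed"] * mixed
           + coef["savings_only"] * sav_only)
    # exp(log(txn_volume) + eta) equals txn_volume * exp(eta)
    return txn_volume * math.exp(eta)


def alerts_grid(coef, avg_ticket_values, portfolios=("checking only", "savings only", "mixed")):
    """One (portfolio, avg_ticket, expected alerts) row per combination, at 40,000 transactions."""
    return [(p, t, predict_alerts(coef, t, p))
            for p, t in product(portfolios, avg_ticket_values)]
```

Calling `alerts_grid(coef, range(10, 101, 10))` returns one prediction per portfolio and average-ticket value. The 10 to 100 bit range is a placeholder for the observed range.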
3e. Let’s investigate the relationships observed. Construct the following graph, showing the relationship between the number of fraud alerts and average transaction amount:
- Scatterplot: x = AvgTicket, y = Alerts
- Regression lines for the savings-only portfolio: multiple geom_line()s, each calling aes(y = predicted_column).
3f. Write a brief summary of your model results. How would you explain the results to your boss?
Insert your answer here.