STA6349: Applied Bayesian Analysis
Welcome to Applied Bayesian Analysis - Fall 2024!
Days we definitely will not meet on Zoom:
General topics:
This is an applied class.
Lecture weeks:
Project weeks:
Final Exam:
Our course lectures and labs are posted on GitHub.
Please bookmark the repository: GitHub for STA6349.
You will want to look at my .qmd files for formatting / \LaTeX purposes.
Feel free to poke around my GitHub to see materials for other classes.
We will be using R in this course.
It is okay if you have not used R before!
Full disclosure: I am a biostatistician first, programmer second.
This means that I focus on the application of statistical methods and not on “understanding” the inner workings of R.
Sometimes my code is not elegant/efficient, and that’s okay! Because our focus is on the application of methods, we are interested in the code working.
I have learned so much from my students since implementing R in the classroom.
This is an applied class.
You can install R and RStudio on your computer for free.
Alternative to installing: RStudio Server hosted by UWF HMCSE
Do not use Citrix.
I encourage you to install R on your own machine if you are able.
In the “real world,” you will not have access to the server.
Installing on your own machine will help your future self troubleshoot issues.
Journal article: Tidy Data by Wickham (2014, Journal of Statistical Software)
Book chapter: Data Tidying by Wickham, Çetinkaya-Rundel, and Grolemund
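There are three interrelated rules that make a dataset tidy:
Each variable is a column; each column is a variable.
Each observation is a row; each row is an observation.
Each value is a cell; each cell is a single value.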
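A minimal sketch of what tidying can look like in practice, using pivot_longer() from tidyr. The small table and its column names (id, visit_1, visit_2, sbp) are made up for illustration and are not course data. The wide table stores one measurement per visit in separate columns; the long version has one row per observation.

```r
library(tibble)
library(tidyr)

# Hypothetical untidy table: one column per visit.
wide <- tibble(
  id      = c(1, 2),
  visit_1 = c(120, 135),
  visit_2 = c(118, 130)
)

# Reshape so that each row is one (id, visit) observation.
long <- pivot_longer(
  wide,
  cols      = c(visit_1, visit_2),
  names_to  = "visit",
  values_to = "sbp"
)

long
```

After reshaping, visit and sbp are each a single column, which is the form the tidyverse functions generally expect.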
tibble for modern data frames.
readr and haven for data import.
readr is pulled in with tidyverse.
haven needs to be called in on its own.
tidyr for data tidying.
dplyr for data manipulation.
ggplot2 for data visualization.
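A short sketch of calling these packages in at the top of a script, assuming both tidyverse and haven are already installed (the commented install line is a one-time step):

```r
# install.packages(c("tidyverse", "haven"))  # one-time install, if needed

library(tidyverse)  # attaches tibble, readr, tidyr, dplyr, ggplot2, and others
library(haven)      # not attached by library(tidyverse), so load it separately
```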
It is not possible for me to teach you everything you will ever need to know about programming in R.
tidyverse: data science in a box
A major advantage of using tidyverse is the common “language” between the functions.
Another advantage: the pipe operator, %>%.
Yes, there is a pipe operator now included in base R. No, I do not use it.
By default, %>% deposits everything that came before into the first argument of the next function.
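The first-argument behavior can be sketched with the starwars data that ships with dplyr. This is a minimal illustration, not course code: the object names piped and nested are made up here, and it loads dplyr directly (which supplies %>%) rather than the full tidyverse.

```r
library(dplyr)  # provides %>%, filter(), and the starwars data

# With the pipe: starwars becomes the first argument of filter().
piped <- starwars %>% filter(mass < 100)

# Without the pipe: the same call written out explicitly.
nested <- filter(starwars, mass < 100)

# Both versions should return the same rows.
identical(piped, nested)
```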
Error in tibble(starwars) %>% filter(mass < 100): could not find function "%>%"
This error appears when %>% is used before loading tidyverse.
Be comfortable with Googling for help with code to import data.
As a collaborative statistician, I have received the following file types:
There have been times where I have received data as a .xlsx, but I can’t get it to import properly.
Usually, the issue is that there is a character variable with too much text.
Sometimes, it’s that the variable type changes mid-dataset.
Sometimes the solution is saving it as a different file type (I default to .csv).
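One way to handle a variable whose type changes mid-dataset, sketched with read_csv() from readr. The file and its columns (id, dose) are hypothetical and are written to a temporary path so the example is self-contained. Reading everything as character first means nothing is silently dropped; the conversion then happens in code, where it can be retraced.

```r
library(readr)

# Hypothetical file: dose is numeric at first, then contains text.
path <- tempfile(fileext = ".csv")
writeLines(c("id,dose", "1,10", "2,20", "3,unknown"), path)

# Read every column as character instead of letting readr guess the types.
raw <- read_csv(path, col_types = cols(.default = col_character()))

# Convert deliberately; entries that are not numbers become NA.
raw$dose_num <- suppressWarnings(as.numeric(raw$dose))

raw
```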
Get comfortable Googling error messages.
Try not to do any data management within the original file type!
We want to be able to retrace our steps.
Reproducible research!
Functions:
select(): Selecting columns.
filter(): Filtering rows (observations).
mutate(): Adding or transforming columns.
summarise(): Summarizing data.
group_by(): Grouping data for summary operations.
%>%: Pipelines.
Today we have gently introduced data management in R.
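As a recap, the dplyr functions from today can be chained into a single pipeline. This is a minimal sketch using the starwars data that ships with dplyr; the new column bmi and the result name species_summary are made up for illustration.

```r
library(dplyr)

species_summary <- starwars %>%
  select(name, species, height, mass) %>%    # keep only the columns we need
  filter(!is.na(height), !is.na(mass)) %>%   # keep rows with both measurements
  mutate(bmi = mass / (height / 100)^2) %>%  # height is in cm, mass in kg
  group_by(species) %>%                      # one group per species
  summarise(n = n(), mean_bmi = mean(bmi))   # one summary row per species

species_summary
```

Each step receives the result of the previous one as its first argument, so the pipeline reads top to bottom in the order the work is done.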
I do not expect you to become an expert R programmer, but the more you practice, the easier it becomes.
Today’s activity: Quiz 0