gamma \to continuous & positive outcomes; skewed right
normal \to continuous outcomes; mound-shaped & symmetric distribution
Poisson \to count outcomes; skewed right
uniform \to each outcome has equal probability; rectangular distribution
Today: building up Bayesian analysis concepts
Thinking Like a Bayesian
Bayesian analysis involves updating beliefs based on observed data.
Thinking Like a Bayesian
This is the natural Bayesian knowledge-building process of:
acknowledging your preconceptions (prior distribution),
using data (data distribution) to update your knowledge (posterior distribution), and
repeating (posterior distribution \to new prior distribution)
Thinking Like a Bayesian
Bayesian and frequentist analyses share a common goal: to learn from data about the world around us.
Both Bayesian and frequentist analyses use data to fit models, make predictions, and evaluate hypotheses.
When working with the same data, they will typically produce a similar set of conclusions.
Statisticians typically identify as either a “Bayesian” or “frequentist” …
🚫 We are not going to “take sides.”
✅ We will see these as tools in our toolbox.
Thinking Like a Bayesian
Bayesian probability: the relative plausibility of an event.
Considers prior belief.
Thinking Like a Bayesian
Frequentist probability: the long-run relative frequency of a repeatable event.
Does not consider prior belief.
Thinking Like a Bayesian
The Bayesian framework depends upon prior information, data, and the balance between them.
The balance between the prior information and data is determined by the relative strength of each.
When we have little data, our posterior can rely more on prior knowledge.
As we collect more data, the prior can lose its influence.
Thinking Like a Bayesian
We can also use this approach to combine analysis results.
Thinking Like a Bayesian
We will use an example to work through Bayesian logic.
The Collins Dictionary named “fake news” the 2017 term of the year.
Fake, misleading, and biased news has proliferated along with online news and social media platforms, which allow users to post articles with little quality control.
We want to flag articles as “real” or “fake.”
We’ll examine a sample of 150 articles which were posted on Facebook and fact checked by five BuzzFeed journalists (Shu et al. 2017).
Thinking Like a Bayesian
Information about each article is stored in the fake_news dataset in the bayesrules package.
We could build a simple news filter which uses the following rule: since most articles are real, we should read and believe all articles.
While this filter would solve the problem of disregarding real articles, we would read lots of fake news.
It also accounts only for the overall rates of real and fake news, not their typical features.
Thinking Like a Bayesian
Suppose that the most recent article posted to a social media platform is titled: The president has a funny secret!
Some features of this title probably set off some red flags.
For example, the usage of an exclamation point might seem like an odd choice for a real news article.
In the dataset, what is the split of real and fake articles?
Thinking Like a Bayesian
In the dataset, what is the split of real and fake articles?
Our data backs up our instinct about this article: in this dataset, 26.67% (16 of 60) of fake news titles, but only 2.22% (2 of 90) of real news titles, use an exclamation point.
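These percentages come directly from the counts in the data. A minimal sketch of the arithmetic (in Python here, rather than the course's R/bayesrules setup):

```python
# Counts from the 150 fact-checked articles (Shu et al. 2017)
fake_total, fake_excl = 60, 16   # fake articles; those with "!" in the title
real_total, real_excl = 90, 2    # real articles; those with "!" in the title

p_excl_given_fake = fake_excl / fake_total  # P[A | B]   = 16/60 ≈ 0.2667
p_excl_given_real = real_excl / real_total  # P[A | B^c] = 2/90  ≈ 0.0222

print(f"{p_excl_given_fake:.2%} of fake vs {p_excl_given_real:.2%} of real titles use '!'")
```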
Thinking Like a Bayesian
We now have two pieces of contradictory information.
Our prior information suggested that incoming articles are most likely real.
However, the exclamation point data is more consistent with fake news.
Thinking like Bayesians, we know that balancing both pieces of information is important in developing a posterior understanding of whether the article is fake.
Building a Bayesian Model
Our fake news analysis studies two variables:
an article’s fake vs real status and
its use of exclamation points.
We can represent the randomness in these variables using probability models.
We will now build:
a prior probability model for our prior understanding of whether the most recent article is fake;
a model for interpreting the exclamation point data; and, eventually,
a posterior probability model which summarizes the posterior plausibility that the article is fake.
Building a Bayesian Model
Let’s now formalize our prior understanding of whether the new article is fake.
Based on our fake_news data, we saw that 40% of articles are fake and 60% are real.
Before reading the new article, there’s a 0.4 prior probability that it’s fake and a 0.6 prior probability it’s not.
P\left[B\right] = 0.40 \text{ and } P\left[B^c\right] = 0.60
Remember that a valid probability model must:
account for all possible events (every article is either fake or real);
assign a prior probability to each event; and
have probabilities that sum to one.
Building a Bayesian Model
| title_has_excl | fake       | real       |
|----------------|------------|------------|
| FALSE          | 73.3% (44) | 97.8% (88) |
| TRUE           | 26.7% (16) | 2.2% (2)   |
We have the following conditional probabilities:
If an article is fake (B), then there’s a roughly 26.67% chance it uses exclamation points in the title.
If an article is real (B^c), then there’s only a roughly 2.22% chance it uses exclamation points.
Building a Bayesian Model
We have the following conditional probabilities:
If an article is fake (B), then there’s a roughly 26.67% chance it uses exclamation points in the title.
If an article is real (B^c), then there’s only a roughly 2.22% chance it uses exclamation points.
Looking at the probabilities, we can see that 26.67% of fake articles vs. 2.22% of real articles use exclamation points.
Exclamation point usage is much more likely among fake news than real news.
We have evidence that the article is fake.
Building a Bayesian Model
Note that we know that the incoming article used exclamation points (A), but we do not actually know if the article is fake (B or B^c).
In this case, we compared P[A|B] and P[A|B^c] to ascertain the relative likelihood of observing A under different scenarios.
L\left[B|A\right] = P\left[A|B\right] \text{ and } L\left[B^c|A\right] = P\left[A|B^c\right]
Building a Bayesian Model
| Event             | B      | B^c    | Total  |
|-------------------|--------|--------|--------|
| Prior Probability | 0.4    | 0.6    | 1.0    |
| Likelihood        | 0.2667 | 0.0222 | 0.2889 |
L\left[B|A\right] = P\left[A|B\right] \text{ and } L\left[B^c|A\right] = P\left[A|B^c\right]
It is important for us to note that the likelihood function is not a probability function; notice that the likelihoods sum to 0.2889, not 1.
Rather, it gives us a framework for comparing the relative compatibility of our exclamation point data with B and B^c.
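One way to quantify this comparison is the ratio of the two likelihoods. A short sketch of the arithmetic (Python here, in place of the course's R tooling):

```python
# Likelihoods of the observed "!" data under each scenario
L_fake = 16 / 60   # L[B | A]   = P[A | B]   ≈ 0.2667
L_real = 2 / 90    # L[B^c | A] = P[A | B^c] ≈ 0.0222

# The likelihoods need not sum to 1 -- they do not form a probability model
print(f"sum of likelihoods: {L_fake + L_real:.4f}")  # 0.2889, not 1

# Exclamation points are 12 times as likely under fake news as under real news
print(f"likelihood ratio: {L_fake / L_real:.1f}")    # 12.0
```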
Building a Bayesian Model
| Event             | B (fake) | B^c (real) | Total  |
|-------------------|----------|------------|--------|
| Prior Probability | 0.4      | 0.6        | 1.0    |
| Likelihood        | 0.2667   | 0.0222     | 0.2889 |
Our prior information suggested that the article is most likely real,
P[B] = 0.4 < P[B^c] = 0.6
The data, however, is more consistent with the article being fake,
L[B|A] = 0.2667 > L[B^c|A] = 0.0222
Building a Bayesian Model
We can summarize our probabilities in a table,

|       | B   | B^c | Total |
|-------|-----|-----|-------|
| A     |     |     |       |
| A^c   |     |     |       |
| Total | 0.4 | 0.6 | 1     |
As found earlier, P[A|B] = 0.2667 and P[A|B^c]=0.0222.
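The remaining cells of the table follow from multiplying each prior probability by the matching conditional probability. A sketch of that arithmetic (Python, standing in for the course's R workflow):

```python
prior_fake, prior_real = 0.4, 0.6     # P[B] and P[B^c]
p_A_given_fake = 16 / 60              # P[A | B]   ≈ 0.2667
p_A_given_real = 2 / 90               # P[A | B^c] ≈ 0.0222

# Joint probabilities: P[A and B] = P[A | B] * P[B]
joint_fake = p_A_given_fake * prior_fake   # ≈ 0.1067
joint_real = p_A_given_real * prior_real   # ≈ 0.0133

# Total probability of an exclamation point in the title: P[A]
p_A = joint_fake + joint_real
print(f"P[A] = {p_A:.2f}")            # P[A] = 0.12
```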