STA6349: Applied Bayesian Analysis
In 1996, Garry Kasparov played a six-game chess match against the IBM supercomputer Deep Blue.
Of the six games, Kasparov won three, drew two, and lost one.
Thus, Kasparov won the overall match.
Kasparov and Deep Blue were to meet again for a six-game match in 1997.
Let \pi denote Kasparov’s chances of winning any particular game in the re-match.
Thus, \pi is a measure of his overall skill relative to Deep Blue.
Given the complexity of chess, machines, and humans, \pi is unknown and can vary over time.
i.e., \pi is a random variable.
Our first step is to specify a prior model. This model
identifies what values \pi can take,
assigns a prior weight or probability to each, and
ensures these probabilities sum to 1.
Based on what we were told, the prior model for \pi in our example is:
\pi | 0.2 | 0.5 | 0.8 | Total |
---|---|---|---|---|
f(\pi) | 0.10 | 0.25 | 0.65 | 1 |
Note that this is an incredibly simple model.
The win probability can technically be any number \in [0, 1].
However, this prior assumes that \pi has a discrete set of possibilities: 20%, 50%, or 80%.
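For later calculations, it helps to store this prior as vectors; a minimal R sketch (the names pi_vals and prior are just illustrative choices):

```r
# Possible values of pi under our simple, discrete prior model
pi_vals <- c(0.2, 0.5, 0.8)

# Prior probabilities f(pi) for each value; these must sum to 1
prior <- c(0.10, 0.25, 0.65)

sum(prior)  # check: should equal 1
```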
In the second step of our analysis, we collect and process data which can inform our understanding of \pi.
Here, Y = the number of the six games in the 1997 re-match that Kasparov wins.
Note that Y inherently depends upon \pi.
If \pi = 0.80, Y would tend to be high (Kasparov would win most of the six games, on average).
If \pi = 0.20, Y would tend to be low (he would win few games, on average).
Thus, we must model this dependence of Y on \pi using a conditional probability model.
We must make two assumptions about the chess match:
Games are independent (the outcome of one game does not influence the outcome of another).
Kasparov has an equal probability of winning any game in the match.
We will use a binomial model for this problem.
f(y|\pi) = {n \choose y} \pi^y (1-\pi)^{n-y} \ \text{for} \ y \in \{0, 1, \ldots, n\},
Y|\pi \sim \text{Bin}(6, \pi)
f(y=6|\pi=0.8) = {6 \choose 6} 0.8^6 (1-0.8)^{6-6} = 0.262144
f(y=0|\pi=0.8) = {6 \choose 0} 0.8^0 (1-0.8)^{6-0} = 0.000064
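These probabilities can be checked numerically; a quick sketch using base R's dbinom() (assuming an R workflow, as in the Bayes Rules! textbook):

```r
# P(Y = 6 | pi = 0.8): Kasparov wins all six games
dbinom(6, size = 6, prob = 0.8)   # 0.262144

# P(Y = 0 | pi = 0.8): Kasparov loses all six games
dbinom(0, size = 6, prob = 0.8)   # 6.4e-05, i.e., 0.000064
```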
Your turn!
We want to reproduce Figure 2.5 from the Bayes Rules! textbook (from Section 2.3.2).
Work with your group to come up with that graph.
Pick one person to present in 15 minutes.
Note that the Binomial gives us the theoretical model of the data we might observe.
Next step: how compatible are the observed data with each possible value of \pi?
Recall, f(y|\pi) = L(\pi|Y=y). When Y=1,
\begin{align*} L(\pi | y = 1) &= f(y=1|\pi) \\ &= {6 \choose 1} \pi^1 (1-\pi)^{6-1} \\ &= 6\pi(1-\pi)^5 \end{align*}
Your turn!
Use your results from earlier to tell me the resulting likelihood values.
\pi | 0.2 | 0.5 | 0.8 |
---|---|---|---|
L(\pi|y=1) | | | |
Your turn!
Use your results from earlier to tell me the resulting likelihood values.
\pi | 0.2 | 0.5 | 0.8 |
---|---|---|---|
L(\pi|y=1) | 0.3932 | 0.0938 | 0.0015 |
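As a numerical check of these likelihood values, a short R sketch (variable names are illustrative):

```r
pi_vals <- c(0.2, 0.5, 0.8)

# L(pi | y = 1) = f(y = 1 | pi), evaluated at each candidate value of pi
likelihood <- dbinom(1, size = 6, prob = pi_vals)
round(likelihood, 4)  # 0.3932 0.0938 0.0015
```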
Bayes’ Rule requires three pieces of information: the prior, the likelihood, and the normalizing constant.
Normalizing constant: a value that ensures that the sum of all probabilities is equal to 1.
It can be a scalar or a function.
Any distribution whose values do not sum to 1 has a normalizing constant that rescales it into a proper probability distribution.
\begin{align*} f(y=1) &= \sum_{\pi \in \{0.2, 0.5, 0.8 \}} L(\pi |y=1)f(\pi) \\ &= L(\pi=0.2|y=1)f(\pi=0.2) + L(\pi=0.5|y=1)f(\pi=0.5) + L(\pi=0.8|y=1)f(\pi=0.8) \\ &= \ ... \end{align*}
\begin{align*} f(y=1) &= \sum_{\pi \in \{0.2, 0.5, 0.8 \}} L(\pi |y=1)f(\pi) \\ &= L(\pi=0.2|y=1)f(\pi=0.2) + L(\pi=0.5|y=1)f(\pi=0.5) + L(\pi=0.8|y=1)f(\pi=0.8) \\ &\approx 0.3932 \cdot 0.10 + 0.0938 \cdot 0.25 + 0.0015 \cdot 0.65 \\ &\approx 0.0637 \end{align*}
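The same sum can be computed in R, reusing the prior and likelihood vectors (a sketch; normalizing_constant is an illustrative name):

```r
pi_vals <- c(0.2, 0.5, 0.8)
prior <- c(0.10, 0.25, 0.65)
likelihood <- dbinom(1, size = 6, prob = pi_vals)

# Normalizing constant f(y = 1): total probability of observing y = 1,
# averaged over the prior
normalizing_constant <- sum(likelihood * prior)
normalizing_constant  # 0.0637575 -- matches the hand calculation above up to rounding
```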
\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{normalizing constant}}
f(\pi | y=1) = \frac{f(\pi) L(\pi | y = 1)}{f(y=1)} \ \text{for} \ \pi \in \{ 0.2, 0.5, 0.8\}
Work with your group to find the posterior probabilities.
Note!! We do not have to calculate the normalizing constant!
Since f(Y=y) does not depend on \pi, it is just a constant; write f(Y=y) = 1/c.
Then, we say that
\begin{align*} f(\pi | y) &= \frac{f(\pi) L(\pi|y)}{f(y)} \\ & \propto f(\pi) L(\pi|y) \\ \\ \text{posterior} &\propto \text{prior} \cdot \text{likelihood} \end{align*}
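A minimal R sketch of this shortcut (names illustrative): multiply the prior by the likelihood, then divide by the sum so the posterior probabilities add to 1.

```r
pi_vals <- c(0.2, 0.5, 0.8)
prior <- c(0.10, 0.25, 0.65)
likelihood <- dbinom(1, size = 6, prob = pi_vals)

# The posterior is proportional to prior * likelihood
unnormalized <- prior * likelihood

# Dividing by the sum rescales the values so they sum to 1
posterior <- unnormalized / sum(unnormalized)
round(posterior, 3)  # 0.617 0.368 0.016
```

Dividing by sum(unnormalized) is the same as dividing by the normalizing constant f(y=1) computed earlier, so the two approaches agree.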
Today we have gone through how to find posterior probabilities using a discrete prior and a binomial likelihood.
Next week, we will learn about the Beta-Binomial model.