Bayesian Modeling for Random Variables

STA6349: Applied Bayesian Analysis

Example

  • In 1996, Garry Kasparov played a six-game chess match against the IBM supercomputer Deep Blue.

    • Of the six games, Kasparov won three, drew two, and lost one.

    • Thus, Kasparov won the overall match.

  • Kasparov and Deep Blue were to meet again for a six-game match in 1997.

  • Let \pi denote Kasparov’s chances of winning any particular game in the re-match.

    • Thus, \pi is a measure of his overall skill relative to Deep Blue.

    • Given the complexity of chess, machines, and humans, \pi is unknown and can vary over time.

  • i.e., \pi is a random variable.

Example

  • Our first step is to start with a prior model. This model

    • identifies what values \pi can take,

    • assigns a prior weight or probability to each, and

    • ensures these probabilities sum to 1.

  • Based on what we were told, the prior model for \pi in our example is:

\pi     0.2    0.5    0.8    Total
f(\pi)  0.10   0.25   0.65   1
  • Note that this is an incredibly simple model.

    • The win probability can technically be any number \in [0, 1].

    • However, this prior assumes that \pi has a discrete set of possibilities: 20%, 50%, or 80%.
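
  • As a quick illustration in R (a sketch of our own; the names pi_vals and prior are not from the slides), we can store this prior and confirm it is a valid probability model:

pi_vals <- c(0.2, 0.5, 0.8)   # possible values of pi
prior <- c(0.10, 0.25, 0.65)  # prior probability of each value
sum(prior)                    # a valid prior must sum to 1
[1] 1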

Binomial Data Model

  • In the second step of our analysis, we collect and process data which can inform our understanding of \pi.

  • Here, Y = the number of the six games in the 1997 re-match that Kasparov wins.

    • As the outcome of a chess match isn't predetermined, Y is a random variable that can take any value in \{0, 1, 2, 3, 4, 5, 6\}.
  • Note that Y inherently depends upon \pi.

    • If \pi = 0.80, Y would tend to be high (on average).

    • If \pi = 0.20, Y would tend to be low (on average).

  • Thus, we must model this dependence of Y on \pi using a conditional probability model.

Binomial Data Model

  • We must make two assumptions about the chess match:

    • Games are independent (the outcome of one game does not influence the outcome of another).

    • Kasparov has an equal probability of winning any game in the match.

      • i.e., probability of winning does not increase or decrease as the match goes on.
  • We will use a binomial model for this problem.

    • Recall the binomial pmf,

f(y|\pi) = {n \choose y} \pi^y (1-\pi)^{n-y}, \quad y \in \{0, 1, \ldots, n\}.

  • So in our case,

Y|\pi \sim \text{Bin}(6, \pi)

Binomial Data Model

  • Assuming \pi = 0.8, the probability that he would win all 6 games,

f(y=6|\pi=0.8) = {6 \choose 6} 0.8^6 (1-0.8)^{6-6},

dbinom(6, 6, 0.8)
[1] 0.262144
  • The probability that he would win all six games is approximately 26%.

Binomial Data Model

  • Assuming \pi = 0.8, the probability that he would win none of the games,

f(y=0|\pi=0.8) = {6 \choose 0} 0.8^0 (1-0.8)^{6-0},

dbinom(0, 6, 0.8)
[1] 6.4e-05
  • The probability that he would lose all six games is approximately 0%.
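
  • More generally, a single vectorized dbinom() call (a quick sketch of our own) evaluates the entire Bin(6, 0.8) pmf at once:

dbinom(0:6, 6, 0.8)
[1] 0.000064 0.001536 0.015360 0.081920 0.245760 0.393216 0.262144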

Binomial Data Model

  • Note that the Binomial gives us the theoretical model of the data we might observe.

    • In the end, Kasparov only won one of the six games against Deep Blue in 1997 (Y=1).
  • Next step: how compatible is this observed data with each possible value of \pi?

    • What is the likelihood of Kasparov winning Y=1 game under each possible \pi?
  • Recall, f(y|\pi) = L(\pi|Y=y). When Y=1,

\begin{align*} L(\pi | y = 1) &= f(y=1|\pi) \\ &= {6 \choose 1} \pi^1 (1-\pi)^{6-1} \\ &= 6\pi(1-\pi)^5 \end{align*}

  • Note that we do not expect all likelihoods to sum to 1.

Binomial Data Model

  • Your turn!

  • Use your results from earlier to tell me the resulting likelihood values.

\pi         0.2    0.5    0.8
L(\pi|y=1)

Binomial Data Model

  • Your turn!

  • Use your results from earlier to tell me the resulting likelihood values.

\pi         0.2     0.5     0.8
L(\pi|y=1)  0.3932  0.0938  0.0015
  • As we can see, the likelihoods do not sum to 1.
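
  • These values come directly from dbinom() (a small sketch of our own, reusing the hypothetical pi_vals vector from earlier):

dbinom(1, 6, pi_vals)        # L(pi | y = 1) at each candidate pi
[1] 0.393216 0.093750 0.001536
sum(dbinom(1, 6, pi_vals))   # confirms the likelihoods do not sum to 1
[1] 0.488502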

Normalizing Constant

  • Bayes’ Rule requires three pieces of information:

    • Prior
    • Likelihood
    • Normalizing constant
  • Normalizing constant: a value that ensures that the sum of all probabilities is equal to 1.

    • It can be a scalar or a function.

    • Every unnormalized distribution (one whose values do not sum to 1) will have a normalizing constant.

Normalizing Constant

  • We now must determine the total probability that Kasparov would win Y=1 games across all possible win probabilities \pi, f(y=1).

\begin{align*} f(y=1) &= \sum_{\pi \in \{0.2, 0.5, 0.8 \}} L(\pi |y=1)f(\pi) \\ &= L(\pi=0.2|y=1)f(\pi=0.2) + L(\pi=0.5|y=1)f(\pi=0.5) + L(\pi=0.8|y=1)f(\pi=0.8) \\ &= \ ... \end{align*}

  • Work with your group to find the normalizing constant.

Normalizing Constant

  • We now must determine the total probability that Kasparov would win Y=1 games across all possible win probabilities \pi, f(y=1).

\begin{align*} f(y=1) &= \sum_{\pi \in \{0.2, 0.5, 0.8 \}} L(\pi |y=1)f(\pi) \\ &= L(\pi=0.2|y=1)f(\pi=0.2) + L(\pi=0.5|y=1)f(\pi=0.5) + L(\pi=0.8|y=1)f(\pi=0.8) \\ &\approx 0.3932 \cdot 0.10 + 0.0938 \cdot 0.25 + 0.0015 \cdot 0.65 \\ &\approx 0.0637 \end{align*}

  • Across all possible values of \pi, there is about a 6% chance that Kasparov would win exactly one game.
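
  • In R, the same sum is a one-liner (a sketch of our own, assuming the pi_vals and prior vectors defined earlier):

likelihood <- dbinom(1, 6, pi_vals)  # L(pi | y = 1) for each pi
sum(likelihood * prior)              # f(y = 1), the normalizing constant
[1] 0.0637575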

Posterior Probability Model

  • Now recall,

\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{normalizing constant}}

  • In our example, where y = 1,

f(\pi | y=1) = \frac{f(\pi) L(\pi | y = 1)}{f(y=1)} \ \text{for} \ \pi \in \{ 0.2, 0.5, 0.8\}

  • Work with your group to find the posterior probabilities.

    • You will have one posterior probability for each value of \pi.

Posterior Probability Model

  • Note!! We do not have to calculate the normalizing constant!

  • Since f(y) does not depend on \pi, it is constant with respect to \pi; we can write f(y) = 1/c for some constant c.

  • Then, we say that

\begin{align*} f(\pi | y) &= \frac{f(\pi) L(\pi|y)}{f(y)} \\ & \propto f(\pi) L(\pi|y) \\ \\ \text{posterior} &\propto \text{prior} \cdot \text{likelihood} \end{align*}
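
  • A minimal sketch of this shortcut in R (assuming the pi_vals, prior, and likelihood objects from earlier): multiply prior by likelihood, then rescale so the results sum to 1.

unnormalized <- prior * likelihood             # prior x likelihood
posterior <- unnormalized / sum(unnormalized)  # rescale to sum to 1
round(posterior, 4)
[1] 0.6167 0.3676 0.0157

  • Relative to the prior, the posterior shifts most of its mass to \pi = 0.2, since winning only one game is far more plausible under a low win probability.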

Wrap Up

  • Today we have gone through how to find posterior probabilities using the binomial distribution.

  • Next week, we will learn about the Beta-Binomial model.