5

I have always wondered about the math behind Bayes' Theorem, because it looks simple and seems likely to have a simple explanation. I don't understand the relationship between P(A and B) over P(B), and why this ratio means "the probability of A given B."

  • 0
    @user1123950 I mean no offense, but there is a good YouTube video titled "Bayes' Theorem - Explained Like You're Five" and the lady explaining it did a good job. (Hope this helps) 2012-03-29

3 Answers

8

Here is a very informal justification, incomplete but I hope useful.

Suppose that we interview $1200$ people, of whom $700$ are women. Suppose that $500$ of the women use transit to get to work. Choose one of the people interviewed at random. Let $B$ be the event "the person chosen is a woman" and let $A$ be the event "the person chosen uses transit to get to work." So $P(A|B)$ is the probability that the person chosen uses transit, given that the person is a woman.

Let's solve the problem directly. There are $700$ women in the sample, of whom $500$ use transit to get to work. So given the information that the chosen person is a woman, we are looking only at the part of the sample space that consists of women. Thus effectively the sample space has been restricted to the $700$ women, and therefore $P(A|B)=\frac{500}{700}$.

The formula $P(A|B)=\frac{P(A \cap B)}{P(B)}$ gives precisely the same answer. This is because $P(A\cap B)=\frac{500}{1200}$ and $P(B)=\frac{700}{1200}$. When we divide, the $1200$s cancel, leaving $\frac{500}{700}$.
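As a quick check of the arithmetic, here is a minimal Python sketch that computes the answer both ways with exact fractions. The variable names are mine, chosen for illustration; only the counts $1200$, $700$, and $500$ come from the example.

```python
from fractions import Fraction

# Counts taken from the survey example (variable names are illustrative).
total_people = 1200
women = 700              # event B: the person chosen is a woman
women_on_transit = 500   # event "A and B": a woman who uses transit

# Direct approach: restrict the sample space to the 700 women.
p_direct = Fraction(women_on_transit, women)

# Formula approach: P(A and B) / P(B), both measured against all 1200 people.
p_a_and_b = Fraction(women_on_transit, total_people)
p_b = Fraction(women, total_people)
p_formula = p_a_and_b / p_b

print(p_direct, p_formula)  # 5/7 5/7
```

Both routes reduce to $\frac{5}{7}$ because the common denominator $1200$ drops out of the quotient.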

  • 0
    @MichaelHardy Even though the title of the question mentions Bayes, there is nothing in the question itself that involves Bayes's work or Bayesian ideas. Rather, the question asked seems to be "What is the motivation for the definition of conditional probability of $A$ given $B$ as $P(AB)/P(B)$?", and that is the question that André and I responded to. 2012-03-29
3

Probability theory is a mathematical model for explaining the statistical regularity observed in real life, and its axioms and definitions are chosen so as to mirror this regularity. Suppose an experiment is repeated $N$ times (independent trials). Then the observed relative frequencies are a probability measure, i.e. they satisfy the axioms of probability. Thus, we set $P(A) = \frac{N_A}{N}$ if $A$ is observed to have occurred on $N_A$ trials out of the $N$ trials.

Now consider events $A$, $B$, and $A\cap B = AB$, from whose observed relative frequencies we write
$$\begin{align*} P(A) &= \frac{N_A}{N},\\ P(B) &= \frac{N_B}{N},\\ P(AB) &= \frac{N_{AB}}{N}. \end{align*}$$

Given that the event $B$ has occurred, what should we define as the conditional probability $P(A\mid B)$ of $A$ given $B$? If we confine our attention to the $N_B$ trials on which $B$ occurred, what is the relative frequency of $A$ on these $N_B$ trials? Clearly, $A$ must have occurred on $N_{AB}$ of these $N_B$ trials, since any trial on which both $A$ and $B$ have occurred must be a trial on which $B$ has occurred. So, a reasonable assignment of probability values is
$$P(A\mid B) = \frac{N_{AB}}{N_B} = \frac{\frac{N_{AB}}{N}}{\frac{N_B}{N}} = \frac{P(AB)}{P(B)}.$$

The formal definition of conditional probability as the rightmost expression above is motivated by this development. Probabilities need not and should not be defined in terms of relative frequencies, but the behavior of relative frequencies is the real-life situation that probability theory seeks to model. If it weren't so, probability theory would be mathematics at its purest, a small part of measure theory with little relevance to real-world problems.
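The counting argument can be tried on simulated trials. The sketch below is only an illustration under assumptions of my own: the experiment is a fair six-sided die, $A$ is "the roll is even", and $B$ is "the roll is at least 4". None of these choices, nor the function name `relative_frequencies`, come from the argument itself. It counts $N_A$, $N_B$, and $N_{AB}$, then compares the frequency of $A$ among the $B$-trials with the ratio of the two overall frequencies.

```python
import random
from fractions import Fraction

def relative_frequencies(n_trials, seed=0):
    """Roll a fair die n_trials times and count how often A, B, and AB occur.

    Hypothetical events chosen for illustration:
      A = "the roll is even", B = "the roll is at least 4".
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n_a = n_b = n_ab = 0
    for _ in range(n_trials):
        roll = rng.randint(1, 6)
        in_a = roll % 2 == 0
        in_b = roll >= 4
        n_a += in_a
        n_b += in_b
        n_ab += in_a and in_b
    return n_a, n_b, n_ab

n = 60_000
n_a, n_b, n_ab = relative_frequencies(n)

# Relative frequency of A among only the trials on which B occurred.
restricted = Fraction(n_ab, n_b)

# Ratio of the two overall relative frequencies, N_AB/N over N_B/N.
ratio = Fraction(n_ab, n) / Fraction(n_b, n)

print(restricted == ratio)  # True: the factor N cancels exactly
print(float(restricted))    # should land near 2/3 for these events
```

For these particular events the exact value is $P(A\mid B) = \frac{2/6}{3/6} = \frac{2}{3}$, so the simulated frequency should settle near that as $N$ grows.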

1

Joe Blitzstein at Harvard gives a perfect answer to this question in Lecture 4 of Stat 110, including a derivation. The few minutes it takes are well worth watching.

https://youtu.be/P7NE4WF8j-Q?list=PLLVplP8OIVc8EktkrD3Q8td0GmId7DjW0