I found an intuitive explanation for Bayes' Theorem but I do not understand point 3:

enter image description here

With $P(A|B) = \frac{P(A \cap B)}{P(B)}$ I see that P(B) is a scaling factor for partitions of B to be able to sum up to 1. But I can't seem to wrap my head around the meaning of the ratio in Bayes' Theorem.

  • 1
    Here is one simple observation: If $P(A \mid B) > P(A)$, and we observe that $A$ is true, this provides some evidence that $B$ is true. For example, $P(\text{Bob carrying umbrella} \mid \text{it's raining}) > P(\text{Bob carrying umbrella})$. So if we observe that Bob is carrying an umbrella, this gives us some evidence that it's raining. It makes sense, then, that Bayes' formula tells us that $P(B \mid A) > P(B)$ in this case. 2017-01-02
  • 0
    Totally makes sense: if $P(A|B) > P(A)$, then the ratio in point 3 must be greater than 1, so it increases the probability of our prior assumption. I am still a bit shaky on understanding point 3 as a ratio, though. Is there a deeper meaning behind how much it can scale? Perhaps I am overthinking this. 2017-01-02
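To put rough numbers on the umbrella example, and on how far the ratio $\frac{P(A|B)}{P(A)}$ can scale, here is a minimal Python sketch. The probabilities are made-up values chosen purely for illustration, and `bayes_posterior` is a hypothetical helper name. The bound at the end follows from $P(B|A) \le 1$ and $P(A|B) \le 1$.

```python
def bayes_posterior(p_a_given_b, p_b, p_a):
    """Return P(B|A) = P(A|B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

# Assumed values: B = "it's raining", A = "Bob carries an umbrella".
p_b = 0.2              # prior P(B)
p_a_given_b = 0.9      # P(A|B)
p_a_given_not_b = 0.1  # P(A|not B)

# Law of total probability gives P(A).
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# The scaling factor applied to the prior.
ratio = p_a_given_b / p_a
p_b_given_a = bayes_posterior(p_a_given_b, p_b, p_a)

# Because P(B|A) <= 1 and P(A|B) <= 1, the ratio is at most
# min(1/P(A), 1/P(B)).
max_ratio = min(1 / p_a, 1 / p_b)

print(f"ratio={ratio:.3f} posterior={p_b_given_a:.3f} max_ratio={max_ratio:.3f}")
```

With these assumed numbers the ratio is above 1, so the posterior exceeds the prior, and the ratio stays below the bound. The rarer the evidence $A$ and the hypothesis $B$ are, the more room the ratio has to grow.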

4 Answers

2

I can give you a similar interpretation. Write it as $$\Pr(B|A)=\frac{\Pr(A|B)\Pr(B)}{\Pr(A)}=\frac{\Pr(B\cap A)}{\Pr(A)}$$

First notice the difference between $B$ and $B|A$. The former is an event in the sample space $\Omega$, and $\Pr(\Omega)=1$. We can therefore write $\Pr(B)$ as $$\frac{\Pr(B\cap\Omega)}{\Pr(\Omega)}=\frac{\Pr(B)}{\Pr(\Omega)}=\frac{\Pr(B)}{1}$$

However, $B|A$ means that for event $B$ the sample space is reduced from $\Omega$ to $A$; hence we have $\Pr(A)$ in the denominator instead of $\Pr(\Omega)$. Similarly, in the numerator we have $\Pr(B\cap A)$, since we only consider the part of $B$ that intersects $A$.
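The reduced-sample-space reading can be checked by counting outcomes. The sketch below assumes a fair six-sided die as the sample space (an example chosen here, not one from the answer), with a hypothetical helper `pr` that counts equally likely outcomes.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}            # sample space of a fair die
A = {n for n in omega if n % 2 == 0}  # event A: even roll
B = {n for n in omega if n > 3}       # event B: roll above 3

def pr(event, space):
    """Probability of `event` when every outcome in `space` is equally likely."""
    return Fraction(len(event & space), len(space))

p_b = pr(B, omega)            # sample space is omega
p_b_given_a = pr(B, A)        # sample space reduced to A
via_formula = pr(A & B, omega) / pr(A, omega)

print(p_b, p_b_given_a, via_formula)  # prints: 1/2 2/3 2/3
```

Counting inside $A$ directly and dividing $\Pr(B\cap A)$ by $\Pr(A)$ give the same value, which is the point of the paragraph above.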

1

What is $P(A|B)$ intuitively? One way to think about probabilities is as proportions. For example, $P(A)$ is the proportion of all possible outcomes in which the event $A$ happens. If you prefer a more geometric intuition, imagine that $A$ is some shape on a piece of paper; then $P(A)$ is the proportion of the paper's total area that $A$ covers.

What then does the conditional probability $P(A|B)$ correspond to? If we had two shapes $A$ and $B$ on our piece of paper, then $P(A|B)$ corresponds to the proportion of the shape $B$ that $A$ covers. In a way, we are restricting our space to the shape $B$ and asking what proportion of that restricted space $A$ occupies. In terms of probability, we are restricting the possible outcomes to those in which the event $B$ happens and asking in what proportion of those outcomes $A$ also happens.

The ratio $\frac{P(A|B)}{P(A)}$ tells us how the proportion of $A$ changes after we restrict the total space to $B$. The interesting thing about this ratio is that it is symmetric in $A$ and $B$, which is one way to interpret Bayes' Theorem. That is, given that neither $P(A)$ nor $P(B)$ is zero, $$\frac{P(A|B)}{P(A)}=\frac{P(B|A)}{P(B)}.$$

In terms of the formula above, how would one figure out $P(B|A)$? This is the proportion of $B$ in $A$. Trivially, we have $$P(B|A)=\frac{P(B|A)}{P(B)}P(B). $$ So this is the proportion of $B$ in the total space times the ratio by which it changes upon restriction to $A$. Bayes' theorem tells us that this ratio is exactly $\frac{P(A|B)}{P(A)}$, which gives the usual form of Bayes' theorem.
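The symmetry of the ratio can be checked on a concrete "piece of paper". The sketch below assumes a 10 x 10 grid of unit cells with two overlapping rectangles; the shapes, sizes, and the helper name `proportion` are illustrative choices, not part of the answer.

```python
from fractions import Fraction

# The paper is a 10 x 10 grid of unit cells; A and B are rectangles on it.
paper = {(x, y) for x in range(10) for y in range(10)}
A = {(x, y) for x in range(0, 6) for y in range(0, 5)}    # 30 cells
B = {(x, y) for x in range(4, 10) for y in range(3, 10)}  # 42 cells

def proportion(shape, space):
    """Fraction of `space` covered by `shape`."""
    return Fraction(len(shape & space), len(space))

p_a, p_b = proportion(A, paper), proportion(B, paper)
p_a_given_b = proportion(A, B)  # proportion of B covered by A
p_b_given_a = proportion(B, A)  # proportion of A covered by B

ratio_a = p_a_given_b / p_a     # change in A's proportion when restricting to B
ratio_b = p_b_given_a / p_b     # change in B's proportion when restricting to A

print(ratio_a, ratio_b)         # prints: 20/63 20/63
print(ratio_a * p_b == p_b_given_a)  # prints: True
```

Both ratios equal $\frac{P(A\cap B)}{P(A)P(B)}$, which is why swapping $A$ and $B$ leaves the ratio unchanged. Here the ratio is below 1 because the two rectangles overlap less than independent events would.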

  • 0
    Appreciate the expansion on the intuition! 2017-01-02
1

Well, you have a grasp of the intuition behind the following:$$\begin{align}\mathsf P(A\cap B)~&=~\mathsf P(B\mid A)~\mathsf P(A) \\[1ex] &=~\mathsf P(A\mid B)~\mathsf P(B)\end{align}$$

The measure of the intersection of A and B equals the measure of event A times the proportion of event B within event A, and also equals the measure of event B times the proportion of event A within event B.

So Bayes' Theorem is just a rearrangement: $$\raise{1.5ex}{\mathsf P(B\mid A)} ~=~ \dfrac{\mathsf P(A\mid B)~\mathsf P(B)}{\qquad\qquad\mathsf P(A)}$$

The proportion of event B within event A is equal to the proportion of event A within event B times the measure of event B divided by the measure of event A.

0

When I watch Peter Falk play Columbo on TV, I think of Bayes' Theorem.

Suppose you are a detective and your focus is on the most likely suspect, Mr. Brooks. As coincidences pile up, not even hard evidence, you 'know' he is your guy.

Let $B$ represent that Mr. Brooks is guilty of murder, the cause for finding a dead body. Suppose that, as you investigate, some new information comes to light, event $A$, and it seems relevant. As a detective, you know right away that

$P(B|A)=\lambda_A P(B) \le 1$ and $\lambda_A \ge 0$

You estimate $P(A|B)$ and from your experience in the world you have a good idea about $P(A)$.

If these two numbers are equal, then $A$ is not providing any useful information, it is status quo experience. As a detective, you know about Bayes' Formula, but due to your coarse estimates, you are on the alert just for the really strange stuff.

If $P(A|B) \gt P(A)$ by a large margin, then the odds that Mr. Brooks is guilty really climb.

For example, you know that if you ask certain questions, 90% of the time you will hear the same type of answer from your suspects. Suppose you ask Mr. Brooks and he has a very strange response. Let $A$ denote the strange-answer event. As usual, you don't worry too much about $P(A|B)$, but you figure that a guilty person would have that strange response at least 30% of the time.

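The chances that Mr. Brooks is guilty just went up at least 3-fold, since $P(A) \approx 0.1$ and $P(A|B) \ge 0.3$. If your estimate of $P(B|A)$ is now greater than $1$, don't worry: a true probability cannot exceed $1$, so this only signals that the estimates are coarse. All that remains is gathering evidence that will hold up in court.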
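The detective's arithmetic can be sketched as follows, reading $\lambda_A$ as the ratio $P(A|B)/P(A)$ (an interpretation of the text above). The priors passed in are hypothetical, and `update` is an illustrative helper name. Since $P(A|B)P(B) = P(A\cap B) \le P(A)$, a posterior above 1 means the three estimates cannot all hold at once.

```python
def update(prior, p_a_given_b, p_a):
    """Return (lambda_A, posterior, consistent) for one piece of evidence A.

    lambda_A   = P(A|B) / P(A)
    posterior  = lambda_A * prior
    consistent = False when the estimates imply a "probability" above 1
    """
    lam = p_a_given_b / p_a
    posterior = lam * prior
    return lam, posterior, posterior <= 1.0

p_a = 0.10          # strange answer overall (90% give the usual answer)
p_a_given_b = 0.30  # assumed rate of the strange answer for a guilty suspect

for prior in (0.05, 0.20, 0.40):  # hypothetical prior beliefs P(B)
    lam, posterior, ok = update(prior, p_a_given_b, p_a)
    note = "" if ok else " (inconsistent estimates)"
    print(f"prior={prior:.2f} lambda={lam:.1f} posterior={posterior:.2f}{note}")
```

With these numbers, any prior above $1/3$ produces an inconsistent result, which is the case where the coarse estimates of $P(A)$, $P(A|B)$, or $P(B)$ need revisiting.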

Detective Columbo: Uhh, Sir! Just one more thing - one more question.

  • 0
    This might be of interest: Bayes' Theorem for Intelligence Analysis (CIA paper) https://www.cia.gov/library/center-for-the-study-of-intelligence/kent-csi/vol16no2/html/v16i2a03p_0001.htm 2017-06-05