9

We know that the conditional probability $P(A | B)$ is undefined when $P(B) = 0$. But it seems to me that this is not always the case.

Consider choosing a real number $r$ such that $0 \leq r \lt 1$, where every real number in $[0, 1)$ is equally likely to be chosen.

Thus, the sample space $S = [0, 1)$. Let $A = \{0.1\}$ and $B = \{0.1, 0.2\}$. Therefore, $P(A) = P(B) = 0$.

Now, speaking from intuition, if we are given that $r$ is either $0.1$ or $0.2$, one might conclude that the probability that $r = 0.1$ is $0.5$. Therefore, it seems like $P(A | B) = 0.5$. But this contradicts the fact that since $P(B) = 0$, $P(A|B)$ is undefined.

Where am I making a mistake?

  • 4
    There is no **contradiction**. You would like to assign the value $1/2$ to the conditional probability, and the machinery does not assign a value. It certainly does not assign a value that contradicts your intuition. 2012-09-17
  • 0
    Intuitively, the fact that $P$ (or any other mathematical object) is undefined in some circumstances does not have to convey much mathematical meaning. It's more that you impose some assumptions so that your machinery works, but perhaps you throw away more than need be. 2012-09-17
  • 0
    For instance, with the standard Riemann integral, one might begin by defining it only for continuous functions on a finite, closed interval. Then there are plenty of functions whose integral is "undefined". But many extensions of the integral exist, and some "undefined" integrals may start to make sense and have well-defined values. 2012-09-17
  • 1
    And one more thing: how would you distinguish between "your" uniform distribution and "the distribution just like the uniform one, except $.1$ is twice as likely as $.2$"? ;) 2012-09-17
  • 0
    Finally, you might be interested to learn that there is such a thing as "conditioning on 0-probability events", but you have to do it globally in one step for it to have any mathematical meaning (like: rather than "glue" $0.1$ and $0.2$, "glue" all pairs with $x+y = 0.3$). The Wikipedia page gives some idea: http://en.wikipedia.org/wiki/Conditional_expectation Sadly, the page on conditional expectation does not cover it. Also, beware the Borel-Kolmogorov paradox: http://en.wikipedia.org/wiki/Borel%E2%80%93Kolmogorov_paradox 2012-09-17
  • 0
    The question you're asking is "what is the probability that $A$ happened knowing that something impossible (or extremely improbable) $B$ happened". Well, the intuitive answer is: the impossible can't happen! And if it does, then all hell breaks loose and we can't expect probability theory to help us make predictions. 2012-09-17
  • 0
    In your example the probability that X takes on some value in [0,1) is 1. Intuitively that means it could be 0.1. But the probability measure you defined gives X a uniform density and assigns the probability [b-a] to an interval [a,b] where 0<=a 2012-09-17

2 Answers

2

As you know, when $\mathrm P(B)\ne0$, $\mathrm P(A\mid B)=\mathrm P(A\cap B)/\mathrm P(B)$. The trouble in your context is that $\mathrm P(B)=0$, and even, as you say, that $\mathrm P(A)=0$. One way to nevertheless define a quantity $\mathrm P^*(A\mid B)$ akin to $\mathrm P(A\mid B)$ is to replace $A$ and $B$ by sets $A_t$ and $B_t$ whose probabilities are positive for every positive $t$ and such that, in a sense, $A_t\to A$ and $B_t\to B$ when $t\to0$. Then one can compute $\mathrm P(A_t\mid B_t)$ for every positive $t$ in the usual way and see whether this quantity has a limit when $t\to0$. If so, the limit can be chosen as $\mathrm P^*(A\mid B)$.

In the case $A=\{a\}$ and $B=\{a,b\}$ with $a$ and $b$ in $(0,1)$, one can consider $A_t=A+[-t,t]$ and $B_t=B+[-t,t]$, that is, $A_t=[a-t,a+t]$ and $B_t=[a-t,a+t]\cup[b-t,b+t]$.

Assume that $t$ is small enough. Then, $[a-t,a+t]\subset[0,1]$ hence $\mathrm P(A_t)=2t$, and $[a-t,a+t]\cup[b-t,b+t]\subset[0,1]$ with $[a-t,a+t]\cap[b-t,b+t]=\varnothing$, hence $\mathrm P(B_t)=4t$. Thus $\mathrm P(A_t\mid B_t)=\frac12$ for every $t$ small enough, which suggests indeed that $\mathrm P^*(A\mid B)=\frac12$ is a reasonable choice.

Note that this procedure is relatively robust since $[-t,t]$ could be replaced by any neighbourhood of $0$ shrinking to $\{0\}$ when $t\to0$, for example $[-2t,5t+t^4]$, without changing the final result.
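As a quick sanity check of the computation above, here is a minimal Python sketch using exact rational arithmetic. It assumes $a=0.1$ and $b=0.2$ as in the question, and the helper names `uniform_prob` and `cond_prob` are made up for illustration. It evaluates $\mathrm P(A_t\mid B_t)$ both for the symmetric neighbourhood $[-t,t]$ and for the skewed one $[-2t,5t+t^4]$.

```python
from fractions import Fraction

def uniform_prob(lo, hi):
    """P([lo, hi]) under the uniform law on [0, 1): length of the clipped interval."""
    lo, hi = max(lo, Fraction(0)), min(hi, Fraction(1))
    return max(hi - lo, Fraction(0))

def cond_prob(a, b, left, right):
    """P(A_t | B_t) for A_t = [a-left, a+right] and B_t = A_t u [b-left, b+right].

    Assumes the neighbourhood is small enough that the two intervals are disjoint.
    """
    lo_a, hi_a = a - left, a + right
    lo_b, hi_b = b - left, b + right
    assert hi_a < lo_b or hi_b < lo_a, "t is not small enough: intervals overlap"
    p_a = uniform_prob(lo_a, hi_a)
    p_b = p_a + uniform_prob(lo_b, hi_b)  # disjoint union, so probabilities add
    return p_a / p_b

a, b = Fraction(1, 10), Fraction(2, 10)
ts = [Fraction(1, 10**k) for k in range(2, 6)]

symmetric = [cond_prob(a, b, t, t) for t in ts]              # neighbourhood [-t, t]
skewed = [cond_prob(a, b, 2 * t, 5 * t + t**4) for t in ts]  # neighbourhood [-2t, 5t + t^4]

print(symmetric)
print(skewed)
```

Under the uniform law both intervals always have the same length, so the ratio does not depend on the shape of the neighbourhood, which is the robustness claimed above.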

More generally, assume that $A=\{a_1\}$ and $B=\{a_1,a_2,\ldots,a_n\}\subset(0,1)$, that $\mathrm P$ has density $f$, that $f$ is continuous at $a_k$ for every $1\leqslant k\leqslant n$, and that $f(a_1)+\cdots+f(a_n)>0$. The reasoning above then suggests choosing $\mathrm P^*(A\mid B)=\frac{f(a_1)}{f(a_1)+\cdots+f(a_n)}$. In effect, this is equivalent to replacing the nonexistent conditional probability $\mathrm P(\ \mid B)$ by the discrete probability measure $\mathrm P^*(\ \mid B)=\mu_B$ defined by $\mu_B(\{a_k\})=\frac{f(a_k)}{f(a_1)+\cdots+f(a_n)}$ for every $k$ and $\mu_B(C)=0$ if $B\cap C=\varnothing$.
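To see the density-weighted formula at work, here is a small sketch under an assumed density $f(x)=3x^2$ on $[0,1]$, chosen only for illustration (it does not come from the question), with distribution function $F(x)=x^3$. For $a_1=0.1$ and $a_2=0.2$ the formula predicts $f(a_1)/(f(a_1)+f(a_2))=0.03/0.15=1/5$, and the code compares this with $\mathrm P(A_t\mid B_t)$ for shrinking $t$. The function names are hypothetical.

```python
from fractions import Fraction

def cdf(x):
    # Assumed example: density f(x) = 3x^2 on [0, 1], hence F(x) = x^3.
    x = min(max(x, Fraction(0)), Fraction(1))
    return x**3

def density(x):
    # The assumed density itself, used only for the predicted limit.
    return 3 * x**2

def shrunk_cond_prob(points, t):
    """P(A_t | B_t) with A_t = [a_1 - t, a_1 + t] and B_t the union of
    [a_k - t, a_k + t] over all points; assumes t is small enough that
    these intervals are pairwise disjoint."""
    masses = [cdf(a + t) - cdf(a - t) for a in points]
    return masses[0] / sum(masses)

points = [Fraction(1, 10), Fraction(2, 10)]
predicted = density(points[0]) / sum(density(a) for a in points)

for k in range(2, 7):
    t = Fraction(1, 10**k)
    print(f"t = 1e-{k}: P(A_t | B_t) = {float(shrunk_cond_prob(points, t)):.10f}")
print("predicted limit:", predicted)
```

This also illustrates why the limit depends on the density near the points of $B$ rather than on the fact that each point has probability zero: two laws that both give probability zero to every point can yield different values of $\mathrm P^*(A\mid B)$.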

0

First of all, the conditional probability $$ P_B(A) = P(A | B) $$ is supposed to be a probability. If $B$ is a finite set with $P(B) = 0$, it is very easy to simply IMPOSE that the conditional probability $P_B$ is uniformly distributed, since your original $P$ has uniform distribution in some sense (in what sense?).

But what if $B = \mathbb{Q} \cap S$? What would $P_B$ be? Notice that $B = \{b_1, b_2, \dotsc\}$ is countable. So, if $P_B$ is uniformly distributed, $$ P_B(b_j) = P_B(b_1) \quad\text{ and }\quad P_B(B) = \sum_j P_B(b_j). $$ Since all the terms of this infinite sum are equal, it is $0$ if $P_B(b_1) = 0$ and $\infty$ if $P_B(b_1) > 0$. This implies that either $P(S | B) = 0$ or $P(S | B) = \infty$, which is NOT a probability at all.

If all you want is that $P(A \cap B) = P(A | B) P(B)$, then whenever $P(B) = 0$ you can define $P(A | B)$ whichever way you think suits best, or simply leave it undefined.

However, imagine that $S = [0,1] \times [0,1]$, and consider the product measure $\lambda \times \lambda$, where $\lambda$ is the Lebesgue measure on $[0,1]$. Then it could make sense to define conditional probabilities for the "slice" $B_x = \{x\} \times [0,1]$, where $P(A | B_x)$ would be the length of $A \cap B_x$. In the same way that you just imposed a "uniform distribution" on the set $\{0.1, 0.2\}$, I am imposing a "uniform distribution" on $B_x$. One nice thing here is that $$ P(A) = \int_0^1 P(A | B_x) \, dx. $$
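The identity above can be checked numerically on a concrete set. The sketch below assumes $A = \{(x,y) \in S : y \le x^2\}$, an example chosen here for illustration, so that the slice $A \cap B_x$ has length $x^2$ and $P(A) = 1/3$. It integrates the slice lengths with a midpoint rule and compares the result with a direct Monte Carlo estimate of $P(A)$. The function names are hypothetical.

```python
import random

def slice_length(x):
    # Length of {y in [0, 1] : (x, y) in A} for the assumed set A = {y <= x^2}.
    return x * x

def integrate_slices(n=1000):
    """Midpoint-rule approximation of the integral of P(A | B_x) over x in [0, 1]."""
    h = 1.0 / n
    return sum(slice_length((i + 0.5) * h) for i in range(n)) * h

def monte_carlo_area(samples=200_000, seed=0):
    """Direct estimate of P(A): the fraction of uniform points of the square that land in A."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if y <= x * x:
            hits += 1
    return hits / samples

print("integral of slice lengths:", integrate_slices())
print("Monte Carlo estimate of P(A):", monte_carlo_area())
```

Both numbers should be close to $1/3$. The Monte Carlo figure is only a statistical estimate, so agreement is expected to a couple of decimal places.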