
This is related to the content of the book by Grimmett and Stirzaker, "Probability and Random Processes" (3rd ed.). On page 111, it computes, as an example, the conditional density function of $X_1+X_2$ given $X_1=X_2$ for two i.i.d. exponential r.v.'s with parameter $\lambda$. Two methods are used (a sketch of each computation is included after the list):

1) using $Y_1=X_1+X_2$ and $Y_2=X_1/X_2$, which gives the result $f_{Y_1\vert Y_2}(y_1\vert 1)=\lambda^2 y_1 e^{-\lambda y_1}$ for $y_1\ge 0$;

2) using $Y_1=X_1+X_2$ and $Y_3=X_1-X_2$, which gives the result $f_{Y_1\vert Y_3}(y_1\vert 0)=\lambda e^{-\lambda y_1}$ for $y_1\ge 0$.
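
(For reference, here is a sketch of the change-of-variables computations behind these two results, as I reconstruct them; this is not the book's text verbatim. For method 2: $x_1=(y_1+y_3)/2$, $x_2=(y_1-y_3)/2$ has Jacobian $1/2$, so $f_{Y_1,Y_3}(y_1,y_3)=\frac{\lambda^2}{2}e^{-\lambda y_1}$ for $|y_3|<y_1$, and integrating out $y_1$ gives $f_{Y_3}(y_3)=\frac{\lambda}{2}e^{-\lambda|y_3|}$; the quotient at $y_3=0$ is $\lambda e^{-\lambda y_1}$. For method 1: $x_1=y_1y_2/(1+y_2)$, $x_2=y_1/(1+y_2)$ has Jacobian $y_1/(1+y_2)^2$, so $f_{Y_1,Y_2}(y_1,y_2)=\frac{\lambda^2 y_1 e^{-\lambda y_1}}{(1+y_2)^2}$ and $f_{Y_2}(y_2)=\frac{1}{(1+y_2)^2}$; the quotient at $y_2=1$ is $\lambda^2 y_1 e^{-\lambda y_1}$.)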

The explanation provided there is that the two results are based on different sets of information.

My question is: does this only occur for two r.v.'s when conditioning on an event of zero probability, or is it true in general, even when the condition involves only one r.v.? I am a little confused about the situation where the conditioning event has zero probability. Thanks!

2 Answers


The conclusion from this example is that sometimes we need to pay special attention when conditioning on events of zero probability. Suppose, as in the example, that $X_1$ and $X_2$ are i.i.d. exponentials. Fix $a, b > 0$ with $a < b$, and consider the question: what is ${\rm P}\big[ X_1+X_2 \in [a,b] \,\big|\, X_1=X_2 \big]$?

You may (naturally) wish to interpret this as
$$
{\rm P}\big[X_1 + X_2 \in [a,b]\,\big|\,X_1 - X_2 = 0 \big] = \lim_{h \to 0^+} {\rm P}\big[ X_1 + X_2 \in [a,b]\,\big|\,0 < X_1 - X_2 < h \big],
$$
so in this case
$$
{\rm P}\big[ X_1+X_2 \in [a,b] \,\big|\, X_1=X_2 \big] = \lim_{h \to 0^+} \frac{{\rm P}\big[ X_1 + X_2 \in [a,b],\, 0 < X_1 - X_2 < h\big]}{{\rm P}\big[0 < X_1 - X_2 < h\big]}.
$$
On the other hand, you may (less likely) wish to interpret it as
$$
{\rm P}\big[ X_1 + X_2 \in [a,b] \,\big|\, X_1 / X_2 = 1 \big] = \lim_{h \to 0^+} {\rm P}\big[ X_1 + X_2 \in [a,b] \,\big|\, 1 < X_1 / X_2 < 1+h \big],
$$
leading (since $1 < X_1/X_2 < 1+h$ is the same event as $0 < X_1 - X_2 < X_2 h$) to
$$
{\rm P}\big[ X_1+X_2 \in [a,b] \,\big|\, X_1=X_2 \big] = \lim_{h \to 0^+} \frac{{\rm P}\big[ X_1 + X_2 \in [a,b],\, 0 < X_1 - X_2 < X_2 h\big]}{{\rm P}\big[0 < X_1 - X_2 < X_2 h\big]}.
$$
However, it should not be surprising that the two interpretations lead to different probabilities: we are taking limits of the form $a_i(h)/b_i(h)$, $i=1,2$, with $a_i(h), b_i(h) \to 0$ as $h \to 0^+$, and ${\rm P}[0 < X_1 - X_2 < h]$ and ${\rm P}[0 < X_1 - X_2 < X_2 h]$ are not expected to have the same behavior as $h \to 0^+$. So the only problem was how to interpret conditioning on $X_1 = X_2$, and this is up to you.

In general, however, there is no such problem; you just use the formula $f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)}$ to find the conditional density function of $Y$ given $X=x$ (where $f_{X,Y}$ is the joint density function of $X$ and $Y$). The conditional distribution function of $Y$ given $X=x$ is then obtained by integrating the conditional density.
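
One can also see the two different limits numerically without simulation. Below is a minimal sketch (my own, not from the book) that evaluates both ratios by quadrature in the coordinates $S = X_1+X_2$, $D = X_1-X_2$, whose joint density is $\frac{\lambda^2}{2}e^{-\lambda s}$ on $|d| < s$: the strip $\{0 < D < h\}$ has $d$-width $\min(h, s)$, while $1 < X_1/X_2 < 1+h$ is equivalent to $0 < d < sh/(2+h)$. The constants $\lambda$, $a$, $b$ are arbitrary illustration values.

```python
# Evaluate the two conditional-probability ratios for shrinking h
# (a sketch under the change of variables described above).
import numpy as np
from scipy.integrate import quad

lam, a, b = 1.0, 1.0, 2.0  # arbitrary illustration values

def diff_ratio(h):
    """P[S in [a,b] | 0 < X1 - X2 < h] via the (S, D) density."""
    num, _ = quad(lambda s: lam**2 / 2 * np.exp(-lam * s) * min(h, s), a, b)
    den = (1 - np.exp(-lam * h)) / 2   # P(0 < X1-X2 < h); X1-X2 is Laplace(lam)
    return num / den

def quot_ratio(h):
    """P[S in [a,b] | 1 < X1/X2 < 1+h] via the (S, D) density."""
    num, _ = quad(lambda s: lam**2 / 2 * np.exp(-lam * s) * s * h / (2 + h), a, b)
    den = h / (2 * (2 + h))            # P(1 < X1/X2 < 1+h); ratio density 1/(1+r)^2
    return num / den

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"h = {h:g}: difference -> {diff_ratio(h):.6f}, "
          f"quotient -> {quot_ratio(h):.6f}")
```

As $h$ shrinks, the first ratio tends to $e^{-\lambda a} - e^{-\lambda b} \approx 0.2325$, while the second is constant in $h$ and equals $\int_a^b \lambda^2 s e^{-\lambda s}\,{\rm d}s \approx 0.3298$ (for $\lambda=1$, $a=1$, $b=2$), so the two interpretations genuinely disagree.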

EDIT: The conditional density function of $X_1+X_2$ given $X_1-X_2=0$ (where $X_1$ and $X_2$ are independent exponential($\lambda$) r.v.'s), when interpreting the conditioning with respect to the random variable $X_1-X_2$, is, by Eq. (11) in the book, the exponential($\lambda$) density function $\lambda e^{-\lambda y}$, $y \geq 0$. Hence ${\rm P}\big[X_1 + X_2 \in [a,b]\,\big|\,X_1 - X_2 = 0 \big]$ is given by
$$
\lim_{h \to 0^+} {\rm P}\big[ X_1 + X_2 \in [a,b]\,\big|\,0 < X_1 - X_2 < h \big] = \int_a^b \lambda e^{-\lambda y}\,{\rm d}y = e^{-\lambda a} - e^{-\lambda b}.
$$
On the other hand, the conditional density function of $X_1+X_2$ given $X_1/X_2=1$, when interpreting the conditioning with respect to the random variable $X_1/X_2$, is, by Eq. (10) in the book, the ${\rm Gamma}(2,\lambda)$ density function $\lambda^2 y e^{-\lambda y}$, $y \geq 0$ (i.e., the density function of $X_1+X_2$ itself; this is because $X_1+X_2$ and $X_1/X_2$ are independent). Hence ${\rm P}\big[ X_1 + X_2 \in [a,b] \,\big|\, X_1 / X_2 = 1 \big]$ is given by
$$
\lim_{h \to 0^+} {\rm P}\big[ X_1 + X_2 \in [a,b] \,\big|\, 1 < X_1 / X_2 < 1+h \big] = \int_a^b \lambda^2 y e^{-\lambda y}\,{\rm d}y = (\lambda a + 1)e^{-\lambda a} - (\lambda b + 1)e^{-\lambda b}.
$$
Both results agree with numerical simulations (approximating the probabilities for small values of $h$).
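
The answer does not include its simulation code; a minimal Monte Carlo sketch of that check, assuming NumPy and with $\lambda$, $a$, $b$, $h$, and the sample size chosen arbitrarily, could look like this:

```python
# Monte Carlo check of the two closed forms above (a sketch, not the
# answer's original code).  Conditioning on a zero-probability event is
# approximated by conditioning on a small-h event, as in the limits above.
import numpy as np

rng = np.random.default_rng(0)
lam, a, b = 1.0, 1.0, 2.0     # arbitrary illustration values
h, n = 1e-2, 10**7            # strip width and sample size (arbitrary)

x1 = rng.exponential(scale=1 / lam, size=n)
x2 = rng.exponential(scale=1 / lam, size=n)
s = x1 + x2

in_ab = (s >= a) & (s <= b)
diff_ev = (x1 - x2 > 0) & (x1 - x2 < h)       # event {0 < X1 - X2 < h}
quot_ev = (x1 / x2 > 1) & (x1 / x2 < 1 + h)   # event {1 < X1/X2 < 1+h}

print("difference estimate:", (in_ab & diff_ev).mean() / diff_ev.mean())
print("difference exact:   ", np.exp(-lam * a) - np.exp(-lam * b))
print("quotient estimate:  ", (in_ab & quot_ev).mean() / quot_ev.mean())
print("quotient exact:     ",
      (lam * a + 1) * np.exp(-lam * a) - (lam * b + 1) * np.exp(-lam * b))
```

For these parameter choices the two estimates should settle near $0.2325$ and $0.3298$ respectively, matching the two different closed forms up to the $O(h)$ bias and the sampling noise.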


This is not an answer, but rather a comment. This problem is related to the Borel paradox. In general, to define a conditional probability with respect to a $\sigma$-field (in this case, the one generated by continuous random variables), one needs the notion of regular conditional probability. In short, the (pointwise) conditional probability given a probability-zero event is not well defined.

  • @Qiang Li: The p.d.f. should be defined in the almost-everywhere sense. For examples such as $f_{Y\mid X}(y\mid x)$, the (conditional) p.d.f. is defined for all possible values of $X$. Then you can use the formula to calculate, e.g., ${\mathbb P}(Y\in A, X\in B) = \int_B\int_A f_{Y\mid X}(y\mid x)\,f_X(x)\,{\rm d}y\,{\rm d}x$. Another way to see the problem with, say, $f_{Y\mid X_1-X_2=0}(y)$ is that there is no event of non-zero probability that such a p.d.f. can be used to calculate.