8

Let $X$ and $Y$ be two events.

So $P(X)$ is the probability that $X$ happens, $P(Y)$ is the probability that $Y$ happens, and $P(X,Y)$ is the probability that both $X$ and $Y$ happen.

So what is the meaning of the following function: $h(X,Y)=\frac{P(X,Y)}{P(X)P(Y)}?$

I know that $h=1$ means $X$ and $Y$ are independent. What does it mean when $h>1$ or $h<1$?


3 Answers

10

Since you say that $X$ and $Y$ are events, let us rename them $A$ and $B$, to avoid confusion with random variables.

Then, at least in the environmental, medical and life sciences literature, $P(A\cap B)/(P(A)P(B))$ is called the observed to expected ratio (abbreviation o/e). The idea is that the numerator is the actual probability of $A\cap B$ while the denominator is what it would be if $A$ and $B$ were independent.

The o/e ratio is $1$ if $A$ and $B$ are independent. It is greater than $1$ if $A$ is favored by $B$, that is, if $P(A\mid B)>P(A)$, or, equivalently, if $B$ is favored by $A$. It is less than $1$ if the opposite holds.
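As a small illustration, here is a sketch in Python that computes the o/e ratio for events on a fair six-sided die. The die, the event names and the helper functions `prob` and `oe_ratio` are assumptions chosen for the example, not part of the question.

```python
from fractions import Fraction

def prob(event, omega):
    """Probability of `event` under the uniform measure on `omega` (event must be a subset)."""
    return Fraction(len(event), len(omega))

def oe_ratio(a, b, omega):
    """Observed-to-expected ratio P(A and B) / (P(A) P(B))."""
    return prob(a & b, omega) / (prob(a, omega) * prob(b, omega))

omega = frozenset(range(1, 7))   # outcomes of one fair die roll
even = frozenset({2, 4, 6})
high = frozenset({4, 5, 6})
low = frozenset({1, 2, 3})
small = frozenset({1, 2})

print(oe_ratio(even, high, omega))   # 4/3: "high" favors "even"
print(oe_ratio(even, low, omega))    # 2/3: "low" disfavors "even"
print(oe_ratio(even, small, omega))  # 1: independent events
```

Exact fractions are used so that the independent case gives exactly $1$ rather than a floating-point approximation.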

In the statistical analysis of genomic sequences, the CpG o/e ratio is especially important. It is the frequency of the word CG divided by the product of the frequencies of the letters C (cytosine) and G (guanine), see here for an example. The rough idea is that in nonfunctional portions of the genome, CpG o/e is much less than $1$ due to some well-known biological and chemical processes (a methylation-deamination of the cytosine when it is immediately followed by a guanine, if you want to know). By contrast, in portions of the genome called CpG islands, CpG o/e is only slightly smaller than $1$ or even greater than $1$, a fact which indicates a repression of these processes and, as a consequence, may signal some functional regions.
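The CpG ratio described above can be sketched in Python as follows. The function name `cpg_oe` and the toy sequence are hypothetical, and the normalization (dinucleotide frequency over the $n-1$ adjacent positions, letter frequencies over the $n$ positions) is one assumption among several conventions found in practice.

```python
from fractions import Fraction

def cpg_oe(seq):
    """Frequency of the word CG divided by freq(C) * freq(G) in `seq`."""
    seq = seq.upper()
    n = len(seq)
    # Frequency of the dinucleotide CG among the n - 1 adjacent pairs.
    f_cg = Fraction(seq.count("CG"), n - 1)
    # Single-letter frequencies among the n positions.
    f_c = Fraction(seq.count("C"), n)
    f_g = Fraction(seq.count("G"), n)
    return f_cg / (f_c * f_g)

toy = "ACGTCGCG"          # made-up sequence, for illustration only
ratio = cpg_oe(toy)
print(ratio, float(ratio))
```

On this toy sequence the ratio is well above $1$ because almost every C is followed by a G. The function raises `ZeroDivisionError` if the sequence contains no C or no G.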

4

You could notice (assuming you know about conditional probabilities, and that $P(X)>0$ and $P(Y)>0$) that $h(X,Y) = \frac{P(X Y )}{P(X) P(Y)} = \frac{P(X|Y)}{P(X)}= \frac{P(Y | X)}{P(Y)}$.

Hence, for example, $h(X,Y) > 1 \Leftrightarrow P(X |Y) > P(X)$, which, informally, says that the occurrence of event $Y$ increases the probability of event $X$ occurring (and vice versa). And that is all. Remember, though, that here $X,Y$ are events, not variables, i.e., it does not make sense to say that, e.g., $h(X,Y) > 1$ for some variables $X,Y$ globally (so that we could say that the variables are "positively dependent" or something like that). For general measures of dependence between random variables (or of correlation, which is a related though weaker property), see here.

Added: If we regard the events as two joint Bernoulli variables (we identify the event $X$ with the variable taking the value 1, so that $P(X=1)=p_X$), we can note that the covariance is given by $Cov_{X Y} = E(X Y) - E(X)E(Y)=p_{XY} - p_X \,p_Y$ and then $h(X,Y) = \frac{1}{1-Cov_{X Y}/p_{XY}} = 1 + \frac{Cov_{X Y}}{p_X \,p_Y}$.
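Rearranging the covariance definition $Cov_{XY} = p_{XY} - p_X\,p_Y$ gives $h = 1 + Cov_{XY}/(p_X\,p_Y)$, or equivalently $h = 1/(1 - Cov_{XY}/p_{XY})$. A quick check in Python, with made-up probabilities and a hypothetical helper `h_and_cov`:

```python
from fractions import Fraction

def h_and_cov(p_xy, p_x, p_y):
    """Return (h, covariance) for two Bernoulli variables with the given probabilities."""
    cov = p_xy - p_x * p_y      # Cov = E(XY) - E(X)E(Y)
    h = p_xy / (p_x * p_y)      # h = P(X,Y) / (P(X)P(Y))
    return h, cov

# Illustrative values: P(X)=1/2, P(Y)=1/2, P(X,Y)=1/3.
p_x, p_y, p_xy = Fraction(1, 2), Fraction(1, 2), Fraction(1, 3)
h, cov = h_and_cov(p_xy, p_x, p_y)

print(h, cov)                    # 4/3 1/12
print(1 + cov / (p_x * p_y))     # 4/3, same as h
print(1 / (1 - cov / p_xy))      # 4/3, same as h
```

In particular, $h>1$ exactly when the covariance is positive.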

-1

We first define two indicator random variables as $A = {\bf{1}}\left\{ {x \in X} \right\}$ and $B = {\bf{1}}\left\{ {y \in Y} \right\}$. Writing $P\left( A \right)$ for $P\left( A = 1 \right)$, $P\left( B \right)$ for $P\left( B = 1 \right)$ and $P\left( A \cap B \right)$ for $P\left( A = 1, B = 1 \right)$, the correlation coefficient ${\rho _{AB}}$ can be written as
\begin{align}
{\rho _{AB}} &= \frac{{E\left( {AB} \right) - E\left( A \right)E\left( B \right)}}{{\sqrt {Var\left( A \right)Var\left( B \right)} }}\\
&= \frac{{P\left( {A \cap B} \right) - P\left( A \right)P\left( B \right)}}{{\sqrt {P\left( A \right)\left( {1 - P\left( A \right)} \right)P\left( B \right)\left( {1 - P\left( B \right)} \right)} }}\\
&= \frac{{P\left( A \right)P\left( B \right)\left( {\frac{{P\left( {A \cap B} \right)}}{{P\left( A \right)P\left( B \right)}} - 1} \right)}}{{\sqrt {P\left( A \right)P\left( B \right)\left( {1 - P\left( A \right)} \right)\left( {1 - P\left( B \right)} \right)} }}\\
&= \left( {\frac{{P\left( {A \cap B} \right)}}{{P\left( A \right)P\left( B \right)}} - 1} \right)\sqrt {\frac{{P\left( A \right)}}{{1 - P\left( A \right)}}\frac{{P\left( B \right)}}{{1 - P\left( B \right)}}}.
\end{align}
We define $\gamma = \frac{{P\left( {A \cap B} \right)}}{{P\left( A \right)P\left( B \right)}}$. From the above equation and the definition of $\gamma$, we have
\begin{equation}
\gamma = 1 + {\rho _{AB}}\sqrt {\frac{{1 - P\left( A \right)}}{{P\left( A \right)}}\frac{{1 - P\left( B \right)}}{{P\left( B \right)}}}.
\end{equation}
If the probabilities ${P\left( A \right)}$ and ${P\left( B \right)}$ are fixed, $\gamma$ is a linear function of $\rho_{AB}$, in line with the covariance remark in the second answer.
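The relation between $\gamma$ and $\rho_{AB}$ can be checked numerically. The sketch below uses arbitrary illustrative probabilities and a hypothetical function name; it computes $\gamma$ directly and through the correlation coefficient.

```python
import math

def gamma_two_ways(p_a, p_b, p_ab):
    """Return (rho, gamma computed directly, gamma computed from rho)."""
    cov = p_ab - p_a * p_b
    # Correlation of two indicator variables: Var = p(1 - p) for each.
    rho = cov / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    gamma_direct = p_ab / (p_a * p_b)
    gamma_via_rho = 1 + rho * math.sqrt((1 - p_a) / p_a * (1 - p_b) / p_b)
    return rho, gamma_direct, gamma_via_rho

# Illustrative values only: P(A)=0.3, P(B)=0.4, P(A and B)=0.2.
rho, g_direct, g_rho = gamma_two_ways(0.3, 0.4, 0.2)
print(rho, g_direct, g_rho)
print(math.isclose(g_direct, g_rho))  # True
```

The two values of $\gamma$ agree up to floating-point rounding, and $\gamma>1$ here because $\rho_{AB}>0$.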