
My question

I have a set of $N$ random boolean variables $X_1, \ldots, X_N$ (each can be $1$ or $0$). For every $i \in [1, N]$, I know that

$$P(X_i = 1) = p^*$$

Now, I know that the variables are positively correlated, i.e., for every $i, j \in [1, N]$ I have:

$$P(X_i = 1 | X_j = 1) \geq p^*$$ $$P(X_i = 1 | X_j = 0) \leq p^*$$

(but I can't find an expression for those conditional probabilities)

Is the probability of at least one variable being $1$ larger or smaller than if they were independent?

The extreme case

Consider the case where all the variables are completely correlated, so that for every $i, j \in [1, N]$

$$P(X_i = 1 | X_j = 1) = 1$$ $$P(X_i = 1 | X_j = 0) = 0$$

Then the probability $p$ of at least one random variable being $1$ is $p = p^*$ (either all of them are $1$ or none is).

While in the case where they are completely independent, i.e.,

$$P(X_i = 1 | X_j = 1) = p^*$$ $$P(X_i = 1 | X_j = 0) = p^*$$

I have

$$p = 1 - (1 - p^*)^N$$

and since $(1 - p^*) < 1$ implies $(1 - p^*)^N < (1 - p^*)$ for $N \geq 2$, I have

$$p > p^*$$

So in the limiting case I know that being positively correlated makes the probability of at least one variable being $1$ smaller. Is this true for any positive correlation?
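The two extreme cases are easy to check numerically. A minimal sketch, using arbitrary illustrative values $p^* = 0.3$ and $N = 5$ (these are not from the question):

```python
# Compare the two extreme cases numerically (illustrative values).
p_star = 0.3
N = 5

# Fully correlated: all variables coincide, so P(at least one is 1) = p*.
p_all_correlated = p_star

# Fully independent: P(at least one is 1) = 1 - (1 - p*)^N.
p_independent = 1 - (1 - p_star) ** N   # ≈ 0.83193

# Independence gives the larger probability for any 0 < p* < 1 and N >= 2.
assert p_independent > p_all_correlated
```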

1 Answer

Is the probability of at least one variable being $1$ larger or smaller than if they were independent?

NB: For convenience I write $p$ instead of your $p^*$. Now the probability in question is $1-P(X_1=0,...,X_N=0)$, which equals $1-(1-p)^N$ if all the $X_i$ are mutually independent.

Theorem: Suppose $(X_1,...,X_N)$ is distributed on $\{0,1\}^N$ such that $P(X_i=1)=p\in(0,1)$ for all $1\le i\le N$, and $P(X_i=1\mid X_j=1)\ge p$ for all $1\le i\lt j\le N$. If $N=2$, then
$$1-P(X_1=0,...,X_N=0) \le 1-(1-p)^N $$ but if $N\ge 3$, then the preceding inequality holds for some distributions and fails for others.

Proof:

Note that because $P(X_i=1\mid X_j=1)\ge p$, the correlation coefficient between any two distinct $X_i$ and $X_j$ is nonnegative. Since $X_i^2=X_i$, we have $\operatorname{Var}(X_i)=E(X_i^2)-(EX_i)^2=p-p^2$ for every $i$, so $$\begin{align}\rho_{ij} &=\frac{E(X_iX_j)-(EX_i)(EX_j)}{\sqrt{\operatorname{Var}(X_i)\operatorname{Var}(X_j)}}=\rho_{ji}\\ \\ &= \frac{E(X_iX_j)-p^2}{p-p^2}\\ &\ge 0 \end{align}$$ because $$E(X_iX_j) = P(X_i=1,X_j=1) = P(X_i=1\mid X_j=1)\,P(X_j=1)=P(X_i=1\mid X_j=1)\cdot p\ge p^2. $$

Notation: Let $P(*)$ denote the sum of all $2^N$ joint probabilities. Let $P(*1_i)$ denote the sum of just those joint probabilities that have a $1$ in the $i$th position. Similarly, let $P(*1_i0_j)$ denote the sum of just those joint probabilities that have a $1$ in the $i$th position and a $0$ in the $j$th position, etc. We have the following general formulas: $$\begin{align}1 &= P(*)\tag{1}\\ p &= P(*1_i)=P(*1_i0_j)+P(*1_i1_j)\tag{2}\\ P(*1_i1_j)&=E(X_iX_j)= p^2 + (p-p^2)\rho_{ij}\tag{3}\\ P(*1_i0_j)&= p-P(*1_i1_j)=(p-p^2)(1-\rho_{ij})\tag{4}. \end{align}$$

For convenience we also write $P(x_1...x_N)$ to denote $P(X_1=x_1,...X_N=x_N)$.

Case $N=2$

We show that it always holds that $1-P(00)\le [1-P(00)]_\text{independent}$: $$\begin{align}1 &= P(*) = P(00)+P(11)+P(01)+P(10)\\ 1-P(00)&=P(11)+P(01)+P(10)\\ &= [p^2 + (p-p^2)\rho_{12}]+[(p-p^2)(1-\rho_{21})]+[(p-p^2)(1-\rho_{12})]\\ &= 2p-p^2-(p-p^2)\rho_{12}\\ &\color{blue}{\le} 2p-p^2=1-(1-p)^2=[1-P(00)]_\text{independent} \end{align}$$ because $p-p^2\ge 0$ and $\rho_{12}\ge 0$.
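The $N=2$ inequality can also be verified numerically: formulas (3) and (4) determine the joint distribution from $p$ and $\rho_{12}$, so we can sweep a grid of values (the grid points below are arbitrary) and check the bound:

```python
# Numeric check of the N=2 case: for a grid of p and nonnegative
# correlations rho, the joint distribution built from (3)-(4)
# satisfies 1 - P(00) <= 1 - (1-p)^2.
import itertools

for p, rho in itertools.product([0.1, 0.3, 0.5, 0.7, 0.9],
                                [0.0, 0.25, 0.5, 0.75, 1.0]):
    q = p - p * p                 # common variance p - p^2
    P11 = p * p + q * rho         # formula (3)
    P10 = P01 = q * (1 - rho)     # formula (4)
    P00 = 1 - P11 - P10 - P01
    assert P00 >= -1e-12                              # valid distribution
    assert 1 - P00 <= 1 - (1 - p) ** 2 + 1e-12        # the claimed bound
```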

Case $N\ge 3$

We give the proof for $N=3$ (the higher-dimensional cases are similar): it is not generally true that $1-P(000)\le [1-P(000)]_\text{independent}$; that is, $(X_1,X_2,X_3)$ may be jointly distributed such that $1-P(000)> 1-(1-p)^3$:

$$\begin{align}1 = P(*)&= P(000) + P(111) + P(*1_10_2)+P(*1_20_3)+P(*1_30_1)\\ 1-P(000)&=P(111) + P(*1_10_2)+P(*1_20_3)+P(*1_30_1)\\ &=P(111)+[(p-p^2)(1-\rho_{12})]+[(p-p^2)(1-\rho_{23})]+[(p-p^2)(1-\rho_{31})]\\ &=P(111)+(p-p^2)(3-\rho_{12}-\rho_{23}-\rho_{31}) \end{align}$$ It suffices to take $p=\frac{1}{2}$ and $\rho_{12}=\rho_{23}=\rho_{31}(=\rho\text{, say)},$ in which case $$\begin{align}1-P(000)&=P(111)+\frac{3}{4}(1-\rho). \end{align}$$ Thus, the proof will be accomplished if we can find a joint distribution such that $$P(111)+\frac{3}{4}(1-\rho)>1-(1-p)^3=\frac{7}{8}$$ i.e., such that $$P(111)>\frac{1}{8}+ \frac{3}{4}\rho.\tag{5}$$ Now from (3) and (4) we can deduce the following: $$\begin{align}P(110)=P(011)=P(101)&=\frac{1}{4}(1+\rho)-P(111)\tag{6a}\\ P(001)=P(010)=P(100)&=P(111)-\frac{1}{2}\rho.\tag{6b} \end{align}$$ Because each $P(x_1x_2x_3)$ must lie in the unit interval, we therefore need to find a value for $P(111)$ that satisfies, in addition to (5), also the following: $$\begin{align}\frac{1}{2}\rho\le P(111)\le \frac{1}{4}(1+\rho). \tag{7}\end{align}$$ There will be solutions to both (5) and (7) iff $$\begin{align}\frac{1}{8}+ \frac{3}{4}\rho < \frac{1}{4}(1+\rho)\end{align}$$ i.e., iff $\rho <\frac{1}{4}.$
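The feasibility window for $P(111)$ given by (5) and (7) can be checked with exact arithmetic. A small sketch (the helper `window` is mine, not from the answer), using `fractions` to avoid rounding:

```python
# Feasibility window for P(111) at p = 1/2, from constraints (5) and (7).
from fractions import Fraction as F

def window(rho):
    """Return (lower, upper) bounds on P(111): lower from (5) and (7), upper from (7)."""
    lo = max(F(1, 8) + F(3, 4) * rho, F(1, 2) * rho)
    hi = F(1, 4) * (1 + rho)
    return lo, hi

# Nonempty open window iff rho < 1/4:
assert window(F(1, 8))[0] < window(F(1, 8))[1]    # rho = 1/8: solutions exist
assert window(F(1, 4))[0] >= window(F(1, 4))[1]   # rho = 1/4: window closes
```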

Hence, for $N=3$, any joint distribution with $p=\frac{1}{2}$, $\ \ 0<\rho_{12}=\rho_{23}=\rho_{31}=\rho<\frac{1}{4}$, and $\frac{1}{2}\rho\le P(111)\le \frac{1}{4}(1+\rho)$ will be an example such that $1-P(000)> [1-P(000)]_\text{independent}$.

As an explicit example, we may take $p=\frac{1}{2}$, $\rho=\frac{1}{8}$, and $P(111)=\frac{1}{4}$. Then $$P(110)=P(011)=P(101)=\frac{1}{32},$$ $$P(001)=P(010)=P(100)=\frac{3}{16},$$ and $$1-P(000)=\frac{29}{32}>1-(1-\tfrac{1}{2})^3=\frac{7}{8}.$$
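This explicit distribution is easy to verify end to end with exact arithmetic; a sketch:

```python
# Verify the counterexample: marginals 1/2, pairwise correlations 1/8,
# and P(at least one 1) = 29/32 > 7/8 (the independent value).
from fractions import Fraction as F

P = {
    (1, 1, 1): F(1, 4),
    (1, 1, 0): F(1, 32), (0, 1, 1): F(1, 32), (1, 0, 1): F(1, 32),
    (0, 0, 1): F(3, 16), (0, 1, 0): F(3, 16), (1, 0, 0): F(3, 16),
}
P[(0, 0, 0)] = 1 - sum(P.values())   # = 3/32

# Marginals: P(X_i = 1) = 1/2 for each i.
for i in range(3):
    assert sum(pr for x, pr in P.items() if x[i] == 1) == F(1, 2)

# Pairwise correlations: rho_ij = (E[X_i X_j] - 1/4) / (1/4) = 1/8.
for i, j in [(0, 1), (1, 2), (0, 2)]:
    EXiXj = sum(pr for x, pr in P.items() if x[i] == 1 and x[j] == 1)
    assert (EXiXj - F(1, 4)) / F(1, 4) == F(1, 8)

# P(at least one 1) exceeds the independent value 7/8.
assert 1 - P[(0, 0, 0)] == F(29, 32) > F(7, 8)
```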