1
$\begingroup$

Say I have an infinite collection of $0/1$ random strings of length $n$ (i.e., say 010001110101), where each digit is an independent Bernoulli RV, with parameter $p_i$, $i:1...n$. Define the "trait value" of a string to be the number of "1"s. Define $X$ to be a RV representing the absolute difference between the trait values of a sample of two strings. Define $Y$ to be a RV representing the "distance" between two sampled strings, where this distance is simply the number of mismatched digits (i.e., so 01101-00111=2), also known as the Hamming distance.

Are $X$ and $Y$ correlated? If so, given $n$ and $p_i$, $i:1...n$, what is the covariance?

  • 0
    Yes, sorry for not being specific. The bits (Bernoulli RVs) are independent.2012-05-28

1 Answers 1

1

Thus, $X=\left|\sum\limits_iZ_i\right|$ and $Y=\sum\limits_i|Z_i|$ where the random variables $(Z_i)$ are independent and $Z_i=0$, $+1$ or $-1$ with probability $1-a_i$, $\frac12a_i$ and $\frac12a_i$ respectively, with $a_i=2p_i(1-p_i)$.

The covariance of $X$ and $Y$ is not easy to compute but the covariance of $X^2$ and $Y$ is. To begin with, $\mathrm E(Z_i)=0$ and $\mathrm E(Z_i^2)=a_i$ hence $\mathrm E(X^2)=\sum\limits_ia_i$. Likewise, $Z_i^2=|Z_i|$ with full probability hence $Y=\sum\limits_iZ_i^2$ and $\mathrm E(Y)=\sum\limits_ia_i$. Finally, $ X^2Y=\sum\limits_iZ_i^4+2\sum\limits_{i,j}Z_i^2Z_j^2+2\sum\limits_{i,j}Z_i^3Z_j+2\sum\limits_{i,j,k}Z_i^2Z_jZ_k, $ where the sums are over $i$ and $j$ distinct and over $i$, $j$ and $k$ distinct. Since $Z_i^4=|Z_i|$ with full probability, $ \mathrm E(X^2Y)=\sum\limits_ia_i+2\sum\limits_{i\ne j}a_ia_j, $ and finally, $ \mathrm{Cov}(X^2,Y)=\sum\limits_ia_i(1-a_i)=\sum\limits_i2p_i(1-p_i)(1-2p_i(1-p_i)). $ Edit: For each $i$, introduce $ A_i=\mathrm E\left|\sum\limits_{j\ne i}Z_j\right|,\qquad B_i=\mathrm E\left|1+\sum\limits_{j\ne i}Z_j\right|. $ Decomposing the expectations along the values of $Z_i$, one gets $\mathrm E(X\cdot|Z_i|)=a_iB_i$ and $\mathrm E(X)=a_iB_i+(1-a_i)A_i$. By convexity, $B_i\geqslant1+A_i$ hence $ \mathrm E(X)\mathrm E(|Z_i|)=a_i(a_iB_i+(1-a_i)A_i)\leqslant a_iB_i-a_i(1-a_i)=\mathrm E(X\cdot|Z_i|)-a_i(1-a_i). $ Summing up, this yields $ \mathrm{Cov}(X,Y)=\sum_i\mathrm E(X\cdot|Z_i|)-\mathrm E(X)\mathrm E(|Z_i|)\geqslant\sum\limits_ia_i(1-a_i)\gt0. $

  • 0
    Please see my recent concern, @Did...2015-07-07