
Let $X_i$ and $Y_i$, $i=1,\ldots,n$, be continuous i.i.d. random variables, uniformly distributed over $(0,1)$. Say we sample from these random variables and retain only the pairs satisfying $|X_i-Y_i|>\delta$, where $\delta$ is a given small positive constant.

I would like to prove that $$P\left(\frac{1}{n} \sum _{i=1}^n (Y_i-X_i)(1-2X_i)>\epsilon\right)\to1$$ as $n\to\infty$, where $\epsilon$ is a small positive constant which depends on $\delta$ but not on $n$. (EDIT: the summand originally read $(X_i-Y_i)(1-2X_i)$; this sign error has been corrected, see the comments below.)
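For intuition, here is a minimal Monte Carlo sketch of the claim (Python with NumPy, assumed available; the values of $\delta$, $\epsilon$, the seed, and the trial counts are illustrative choices, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_average(n, delta, rng):
    """Average of (Y - X)(1 - 2X) over n pairs retained under |X - Y| > delta."""
    xs = np.empty(0)
    ys = np.empty(0)
    while xs.size < n:                      # rejection sampling
        x = rng.uniform(size=2 * n)
        y = rng.uniform(size=2 * n)
        keep = np.abs(x - y) > delta
        xs = np.concatenate([xs, x[keep]])
        ys = np.concatenate([ys, y[keep]])
    x, y = xs[:n], ys[:n]
    return np.mean((y - x) * (1 - 2 * x))

delta, eps, trials = 0.1, 0.05, 200        # illustrative values
for n in (10, 100, 1000):
    hits = sum(conditional_average(n, delta, rng) > eps for _ in range(trials))
    print(f"n = {n:4d}: empirical P(average > eps) ~ {hits / trials:.2f}")
```

The empirical probability climbs toward $1$ as $n$ grows, in line with the claim.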

3 Answers


(This first part addressed the original version of the question, where the summand was $(X_i-Y_i)(1-2X_i)$.) It is most probable that for every $n$, $$ \mathrm P\left(\sum_{i=1}^n(X_i-Y_i)(1-2X_i)\gt0\right)\lt\frac12, $$ and, for every $\delta$, $$ \mathrm P\left(\sum_{i=1}^n(X_i-Y_i)(1-2X_i)\gt0\ \Bigg\vert\ \forall i,|X_i-Y_i|\gt\delta\right)\lt\frac12, $$ hence no $\epsilon\gt0$ will do.
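A quick unconditional check is consistent with this; a minimal Monte Carlo sketch (Python with NumPy assumed available; $n$, the trial count, and the seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 50, 20_000                      # illustrative values
x = rng.uniform(size=(trials, n))
y = rng.uniform(size=(trials, n))
sums = ((x - y) * (1 - 2 * x)).sum(axis=1)
print((sums > 0).mean())                    # well below 1/2
```

Since each term has mean $-\frac16$, the sum drifts negative and the event $\{\text{sum}>0\}$ becomes rare as $n$ grows.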


Regarding the revised version, let $Z_i=(Y_i-X_i)(1-2X_i)$. Then $(Z_i)_i$ is i.i.d. with mean $\mathrm E(Z_i)=\frac16$; hence, for every $\epsilon\lt\frac16$, the (weak) law of large numbers shows that $$ \mathrm P\left(\frac1n\sum_{i=1}^nZ_i\gt\epsilon\right)\to1. $$

Likewise, let $(X,Y,Z)$ be distributed like $(X_1,Y_1,Z_1)$, let $A_\delta=[|X-Y|\gt\delta]$, let $U^\delta$ be any random variable distributed like $Z$ conditional on $A_\delta$, and let $u_\delta=\mathrm E(U^\delta)=\mathrm E(Z\mid A_\delta)$. Then, for every $\epsilon\lt u_\delta$, the (weak) law of large numbers applied to an i.i.d. sequence $(U^\delta_i)_i$ distributed like $U^\delta$ shows that $$ \mathrm P\left(\frac1n\sum_{i=1}^nZ_i\gt\epsilon\ \Bigg\vert\ \forall i,|X_i-Y_i|\gt\delta\right)=\mathrm P\left(\frac1n\sum_{i=1}^nU^\delta_i\gt\epsilon\right)\to1. $$

To complete the proof, it remains to estimate $u_\delta$. First note that, by invariance of $A_\delta$ under the symmetry $(X,Y)\to(Y,X)$, $\mathrm E(Y-X\,;\,A_\delta)=0$, hence $u_\delta=2\mathrm E((X-Y)X\mid A_\delta)$. By the same symmetry, $\mathrm E((X-Y)X\mid A_\delta)=\mathrm E((Y-X)Y\mid A_\delta)$; summing these two representations of $\frac{u_\delta}2$ yields $u_\delta=\mathrm E((X-Y)X\mid A_\delta)+\mathrm E((Y-X)Y\mid A_\delta)=\mathrm E((X-Y)^2\mid A_\delta)$. This proves the claim that $u_\delta\gt0$.

One can go further since the density of $|X-Y|$ is $2(1-x)\,[0\leqslant x\leqslant 1]$ and, consequently, the density of $|X-Y|$ conditionally on $A_\delta$ is $f_\delta(x)=2(1-\delta)^{-2}(1-x)\,[\delta\leqslant x\leqslant 1]$.

Thus, $u_\delta=\int\limits_\delta^1x^2f_\delta(x)\,\mathrm dx=\frac16(1+2\delta+3\delta^2)$. In particular, any $\epsilon\lt\frac16$ works for every $0\leqslant\delta\lt1$, and $\epsilon=\frac16$ works for every $0\lt\delta\lt1$.
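As a numerical sanity check, both the identity $u_\delta=\mathrm E((X-Y)^2\mid A_\delta)$ and the closed form above can be confirmed by simulation; a minimal sketch (Python with NumPy assumed available; sample size and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000_000                                        # illustrative sample size
x = rng.uniform(size=n)
y = rng.uniform(size=n)
for delta in (0.0, 0.1, 0.5, 0.9):
    keep = np.abs(x - y) > delta                     # condition on A_delta
    u_hat = np.mean((y[keep] - x[keep]) * (1 - 2 * x[keep]))
    sq_hat = np.mean((x[keep] - y[keep]) ** 2)       # E((X-Y)^2 | A_delta)
    u_formula = (1 + 2 * delta + 3 * delta ** 2) / 6
    print(f"delta = {delta}: u_hat = {u_hat:.4f}, "
          f"E((X-Y)^2|A) = {sq_hat:.4f}, formula = {u_formula:.4f}")
```

All three columns agree up to Monte Carlo error, for every tested $\delta$.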

  • Thanks much (again), @did. One (last) comment: if we remove the condition $|X_i - Y_i| > \delta$ altogether (which of course is not the same as having $\delta=0$), does my version of the answer (without the $I_A$ part) also suffice? And then we can simply take $\epsilon$ to be any value $\lt 2\operatorname{Var}(X_i)$? (2012-07-15)
  • Removing the condition $|X_i-Y_i|\gt\delta$ **is the same thing as** having $\delta=0$. // Yes, one can choose any value $\epsilon\lt E((Y-X)(1-2X))=2E(X^2)-E(X)=\frac16$. (2012-07-15)
  • Of course, I meant $2(E(X^2)-E(X)^2)=2\operatorname{Var}(X)$. (2012-07-15)

Edit: here is a new version, with lighter computations and, hopefully, no errors. My thanks to Did for pointing out some problems in my former proof. Of course, my answer is now essentially the same as Did's...

Let $(X,Y)$ be a pair of random variables uniformly distributed in $[0,1]^2$. Let $\delta \in [0,1)$. The event $\{|X-Y|>\delta\}$ has probability:

$$\int_{[0,1]^2} 1_{|x-y|> \delta} \ dx \ dy = 2 \int_{[0,1]} \int_{[0,1]} 1_{x > \delta + y} dx \ dy = (1-\delta)^2.$$
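This probability is easy to confirm symbolically; a small sketch using sympy (assumed available), with `d` standing for $\delta$:

```python
import sympy as sp

x, y, d = sp.symbols('x y d', positive=True)
# P(|X - Y| > delta) = 2 * P(X > Y + delta), by symmetry in (X, Y)
p = 2 * sp.integrate(sp.integrate(1, (x, y + d, 1)), (y, 0, 1 - d))
print(sp.factor(p))  # (1 - d)**2, as claimed (sympy may print (d - 1)**2)
```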

Let $(\tilde{X}, \tilde{Y})$ be the random variables $(X,Y)$ conditioned on the event $\{|X-Y|>\delta\}$. This pair has the following density with respect to the Lebesgue measure on $[0,1]^2$:

$$\frac{1_{x > y + \delta} + 1_{x < y - \delta}}{(1-\delta)^2}.$$

Hence:

$$\mathbb{E} ((\tilde{X}-\tilde{Y})(1-2\tilde{X})) = \frac{1}{(1-\delta)^2} \int_{[0,1]^2} (x-y)(1-2x) (1_{x > y + \delta} + 1_{x < y - \delta}) \ dx \ dy$$

$$\cdots = \frac{1}{(1-\delta)^2} \int_\delta^1 \int_0^{x-\delta} (x-y)(1-2x) \ dy \ dx + \frac{1}{(1-\delta)^2} \int_0^{1-\delta} \int_{x+\delta}^1 (x-y)(1-2x) \ dy \ dx $$

The change of variables $(u,v) = (1-x,1-y)$ shows that these two integrals are the same, so we only need to compute one of them. Thanks to Wolfram,

$$\int_\delta^1 \int_0^{x-\delta} (x-y)(1-2x) \ dy \ dx = -\frac{1}{2} \int_\delta^1 (2x-1) (x^2-\delta^2) \ dx = -\frac{(1-\delta)^2 (1+2\delta+3\delta^2)}{12}.$$

Thus:

$$\mathbb{E} ((\tilde{X}-\tilde{Y})(1-2\tilde{X})) = -\frac{1+2\delta+3\delta^2}{6}.$$

This formula gives the correct limits as $\delta$ goes to $0$ or to $1$. It is also always negative (even for $\delta = 0$). By the law of large numbers, you should expect the sum to be negative for large $n$, almost surely.
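For completeness, both the key integral and the final expectation can be verified symbolically; a sketch with sympy (assumed available), again writing `d` for $\delta$:

```python
import sympy as sp

x, y, d = sp.symbols('x y d', positive=True)
inner = sp.integrate((x - y) * (1 - 2 * x), (y, 0, x - d))
half = sp.integrate(inner, (x, d, 1))          # one of the two equal integrals
expectation = sp.simplify(2 * half / (1 - d)**2)
print(sp.factor(half))        # equals -(1 - d)**2 * (1 + 2*d + 3*d**2) / 12
print(sp.factor(expectation)) # equals -(1 + 2*d + 3*d**2) / 6
```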

  • Hi, I found the problem - it was a typo; see the edited original question: $(Y_i-X_i)$ instead of $(X_i-Y_i)$... In this case, do we just remove the minus sign from the fraction in the RHS of your answer's last line? (2012-07-15)
  • @D.Thomine: Unless I am mistaken, the limit of $E_\delta((X-Y)(1-2X))$ when $\delta\to0$ should be $-\frac16$ but your formula yields $-\frac13$. (Also, since the result might be analytic with respect to $\delta$, I mention that when $\delta\to1$, the limit should be $-1$ and that the limit of your formula is $-\frac32$. Finally, I do not quite understand why the denominators $2-\delta$ survive until the end of your computations.) [This refers to the former version of the answer.] (2012-07-15)
  • @did: This computation is too complicated, it's not surprising that there are errors. I'm re-writing it in a simpler way. (2012-07-15)

[EDIT: the problem with this answer is that $I_A$ and $(Y_i-X_i)(1-2X_i)$ are dependent.]

Let $Z_i$ denote the summand multiplied by the indicator $I_A$ of $|X_i - Y_i| > \delta$, that is, $Z_i=(Y_i-X_i)(1-2X_i)I_A$. We need to prove that $P\left(\bar{Z}_n>\epsilon\right)\to1$. Since the $Z_i$ are i.i.d. and bounded, by the weak law of large numbers $\bar{Z}_n$ converges in probability to $E(Z_i)$. It is therefore sufficient to show that $E(Z_i) > 0$. We proceed to evaluate $E(Z_i)$ analytically: $$E(Z_i)=E(I_A)\left(E(Y_i)-2E(Y_iX_i)-E(X_i)+2E(X_i^2)\right)=2E(I_A)\left(E(X_i^2)-E(X_iY_i)\right)=2E(I_A)\operatorname{Var}(X_i)>0,$$ since $0\lt E(I_A)$ by definition, $X_i$ and $Y_i$ are independent with the same mean, and $\operatorname{Var}(X_i)>0$.
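The dependence flagged in the EDIT above is easy to see numerically: with $W=(Y-X)(1-2X)$, the factorization would require $E(WI_A)=E(W)E(I_A)$, which fails. A minimal sketch (Python with NumPy assumed available; $\delta$, sample size, and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, delta = 1_000_000, 0.5                  # illustrative values
x = rng.uniform(size=n)
y = rng.uniform(size=n)
w = (y - x) * (1 - 2 * x)                  # the summand without the indicator
ia = np.abs(x - y) > delta                 # the indicator I_A
print(np.mean(w * ia))                     # E(W I_A): about 0.115 here
print(np.mean(w) * np.mean(ia))            # E(W) E(I_A): about 0.042 -- not equal
```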

  • Since $A$ depends on $(X_i,Y_i)$, the factorization of $E(I_A)$ in the first displayed equality used to compute $E(Z_i)$ is wrong. In fact, $E(Z_i)=-2E((X_i-Y_i)X_iI_A)=-E((X_i-Y_i)^2I_A)\lt0$, where the second identity stems from a symmetry argument. (2012-07-14)
  • OK, I suspected $I_A$ was dependent... However, why do you have a minus sign in the RHS of $E(Z_i)=-2E((X_i-Y_i)X_iI_A)$? (2012-07-14)
  • I think I will let you suspect that as well. (2012-07-14)
  • OK, I found the error - see my comment to D. Thomine. (2012-07-15)
  • @did, could you just elaborate on your "symmetry argument" in your first comment here? (2012-07-15)
  • See the version of my answer which addresses your new question (but **please** stop modifying your question like that). (2012-07-15)