
Say you have two i.i.d. random vectors $x,y\sim U[0,1]^k$, i.e. uniform over the $k$-dimensional unit cube. What is the expected Euclidean distance between them once it has been normalized by the maximum possible distance, $\sqrt k$?

For $k=1$, I worked out that this is $1/3$. For $k=100$, Monte Carlo simulations tell me it is a little over $0.4$.

I tried to work out the math, but the integral $\frac{1}{\sqrt{k}}\int_{S_x} \int_{S_y} \sqrt{(x-y)^T (x-y)}\;dx\;dy$, where $S_x=S_y=[0,1]^k$, is beyond me for general $k$.
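
For reference, a minimal Monte Carlo sketch of the quantity I am estimating (assuming NumPy; the sample size and seed are arbitrary illustrative choices, not the ones from my own simulation):

```python
# Estimate E[ ||x - y|| / sqrt(k) ] for x, y ~ U[0,1]^k by simulation.
import numpy as np

rng = np.random.default_rng(0)

def mean_normalized_distance(k, n_samples=100_000):
    x = rng.random((n_samples, k))
    y = rng.random((n_samples, k))
    return np.linalg.norm(x - y, axis=1).mean() / np.sqrt(k)

print(mean_normalized_distance(1))    # close to 1/3
print(mean_normalized_distance(100))  # a little over 0.4 (about 0.41)
```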

2 Answers


The asymptotic value is $1/\sqrt6\approx0.40824829$.

To see this, consider i.i.d. random variables $X_i$ and $Y_i$ uniform on $[0,1]$ and write the quantity to be computed as $I_k=\mathrm E\left(\sqrt{Z_k}\right)$ with
$$Z_k=\frac1k\sum_{i=1}^k(X_i-Y_i)^2.$$
By the strong law of large numbers for i.i.d. bounded random variables, when $k\to\infty$, $Z_k\to z$ almost surely and in every $L^p$, with $z=\mathrm E(Z_1)$. In particular, $I_k\to \sqrt{z}$. Numerically,
$$z=\iint_{[0,1]^2}(x-y)^2\,\mathrm{d}x\,\mathrm{d}y=2\int_0^1x^2\,\mathrm{d}x-2\left(\int_0^1x\,\mathrm{d}x\right)^2=2\cdot\frac13-2\cdot\left(\frac12\right)^2=\frac16.$$
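
For what it is worth, a quick simulation illustrates the convergence of $I_k$ to $1/\sqrt6$ (a sketch assuming NumPy; the sample size per $k$ is an arbitrary choice):

```python
# Numerical illustration of I_k = E[sqrt(Z_k)] approaching 1/sqrt(6).
import numpy as np

rng = np.random.default_rng(1)
limit = 1 / np.sqrt(6)  # ~ 0.40824829

for k in (1, 10, 100, 1000):
    z_k = ((rng.random((10_000, k)) - rng.random((10_000, k))) ** 2).mean(axis=1)
    print(k, np.sqrt(z_k).mean(), limit)
```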


Edit (This answers a different question asked by the OP in the comments.)

Consider the maximum of $n\gg1$ independent copies of $kZ_k$ with $k\gg1$ and call $M_{n,k}$ its square root. A heuristic to estimate the typical behaviour of $M_{n,k}$ is as follows.

By the central limit theorem (and in a loose sense), $Z_k\approx z+N\sqrt{v/k}$, where $v$ is the variance of $Z_1$ and $N$ is a standard gaussian random variable. In particular, for every given nonnegative $s$,
$$\mathrm P\left(Z_k\ge z+s\right)\approx\mathrm P\left(N^2\ge ks^2/v\right).$$
Furthermore, the typical size of $M_{n,k}^2/k$ is $z+s$ where $s$ solves $\mathrm P(Z_k\ge z+s)\approx1/n$. Choose $q(n)$ such that $\mathrm P(N\ge q(n))=1/n$, that is, $q(n)$ is the upper $1/n$ quantile of the standard gaussian distribution. Then the typical size of $M_{n,k}^2$ is $k(z+s)$ where $s$ solves $ks^2/v=q(n)^2$. Finally,
$$M_{n,k}\approx \sqrt{kz+q(n)\sqrt{kv}}.$$
Numerically, $z=1/6$ and $v=\mathrm E\left((X_1-Y_1)^4\right)-z^2=\frac1{15}-\frac1{36}=\frac7{180}$, and you are interested in $k=1000$. For $n=10\,000$, $q(n)=3.719$ yields a typical size $M_{n,k}\approx13.78$, and for $n=100\,000$, $q(n)=4.265$ hence $M_{n,k}\approx13.90$ (these should be compared to the values you observed).
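
As a sanity check of this heuristic, here is a small sketch (assuming NumPy and SciPy are available; the chunking is only there to keep memory bounded and is not part of the argument) comparing $\sqrt{kz+q(n)\sqrt{kv}}$ with a direct simulation of $M_{n,k}$:

```python
# Compare the heuristic sqrt(k*z + q(n)*sqrt(k*v)) with the simulated maximum
# of ||X - Y|| over n independent copies in dimension k.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
k, z, v = 1000, 1 / 6, 7 / 180

for n in (10_000, 100_000):
    q = norm.isf(1 / n)  # q(n) with P(N >= q(n)) = 1/n
    heuristic = np.sqrt(k * z + q * np.sqrt(k * v))
    observed = 0.0       # running maximum of sqrt(k * Z_k) = ||X - Y||
    for _ in range(n // 1000):
        d = rng.random((1000, k)) - rng.random((1000, k))
        observed = max(observed, np.sqrt((d * d).sum(axis=1)).max())
    print(n, heuristic, observed)
```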

To make such estimates rigorous, and to understand why, in a sense, $M_{n,k}$ concentrates around the typical value computed above, see here.

  • @Didier I saw this problem in a CS class where the instructor was demonstrating counter-intuitive behaviors in high dimensions, and I couldn't remember whether he said he normalized by the sample maximum distance or the theoretical maximum distance. I just assumed that with a large enough sample the sample maximum and the theoretical maximum would be very close; you showed me that they are not. I'll have to go through your added solution several times to fully grasp it, but I do have a rough idea of what you are saying. Thanks! (2011-09-14)

This probably isn't anything nice for general $k$. You're asking for $ E \sqrt{ \sum_{i=1}^k (X_i-Y_i)^2/k } $ where the $X_i$ and $Y_i$ are independent uniform$(0,1)$ random variables. Let $Z_i = (X_i-Y_i)^2$; then you're looking for $ E \sqrt{ \left(\sum_{i=1}^k Z_i\right)/k }$. Think about what happens when $k$ gets large: by the law of large numbers, $\left(\sum_{i=1}^k Z_i\right)/k$ will be concentrated around $E(Z_1)$. Therefore, as $k$ gets large, I expect the answer to approach $\sqrt{E(Z_1)}=\sqrt{1/6}$.
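
A symbolic double check of that constant (a sketch assuming SymPy is available; the variable names are mine):

```python
# Verify E[(X - Y)^2] = 1/6 exactly, and hence the limit sqrt(1/6).
import sympy as sp

x, y = sp.symbols('x y')
z = sp.integrate((x - y) ** 2, (x, 0, 1), (y, 0, 1))
print(z, sp.sqrt(z), float(sp.sqrt(z)))  # 1/6, sqrt(6)/6, 0.4082...
```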