
Semi-hypothetical situation:

I have a single stationary point, A, in 3 spatial dimensions. I observe this point imprecisely, so that the observations are distributed normally about A, with the dimensions being independent. By this I mean that if you consider only the x-coordinates of the observations, those values will be distributed normally about the x-coordinate of A. Likewise with y and z. The variance in each dimension is the same.

I assume this forms a spherical cloud of observations around the point, denser toward the middle. The overall variance of these observations is the sum of the squared distances from each observation to A. Call this value variance1.

Now, let's say that I know the distance from A to the origin (or some other arbitrary point), i.e. A lies on a known spherical shell. So I modify each of my observations by moving it to the point on the shell closest to its original position.

Now we can calculate the variance again as the sum of the squared distances from point A. Call this variance2.

My question is: what is the difference between variance1 and variance2, in terms of the original variance in each dimension and the shell size?

1 Answer


Without loss of generality, let $\boldsymbol A = (1,0,0)$ and generate iid multivariate normal observations $\boldsymbol X_i$ according to the distribution $$\boldsymbol X \sim \operatorname{Normal}(\mu = \boldsymbol A, \Sigma = \sigma^2 I_3), \quad \sigma > 0.$$ Your question is basically asking for the variance of the scalar-valued random variable $$D_\sigma = \left|\frac{\boldsymbol X}{|\boldsymbol X|} - \boldsymbol A\right|,$$ which will be a function of $\sigma$ only. Note that we are measuring distance in the Euclidean metric in $\mathbb R^3$, not the metric on the unit sphere. This is not an easy thing to calculate. Intuitively, when $\sigma \gg 1$, the distribution of $\boldsymbol X/|\boldsymbol X|$ is almost uniform on the sphere, thus the asymptotic variance is $$\lim_{\sigma \to \infty} \operatorname{Var}[D_\sigma] = 2/9$$ (the result is left as an elementary exercise to the reader). The lower bound of course is zero, approached only in the limit $\sigma \to 0$. Numerical simulation gives the following plot of $\operatorname{Var}[D_\sigma]$ as a function of $\sigma^2 \in (0,50]$, plotted on a log-linear scale:

[Figure: $\operatorname{Var}[D_\sigma]$ versus $\sigma^2$ on a log-linear scale]
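For the limiting value $2/9$, here is one way the exercise can be worked. This is a sketch that assumes $\boldsymbol U = \boldsymbol X/|\boldsymbol X|$ is exactly uniform on the unit sphere; the symbols $\boldsymbol U$, $U_1$ (its first coordinate) and $D$ are introduced here for illustration only. Since $|\boldsymbol U| = |\boldsymbol A| = 1$ and $\boldsymbol A = (1,0,0)$, $$D^2 = |\boldsymbol U - \boldsymbol A|^2 = 2 - 2\,\boldsymbol U \cdot \boldsymbol A = 2 - 2U_1.$$ By Archimedes' hat-box theorem, $U_1$ is uniform on $[-1,1]$, so $$\operatorname{E}[D^2] = 2, \qquad \operatorname{E}[D] = \frac12 \int_{-1}^{1} \sqrt{2-2u}\,du = \frac43, \qquad \operatorname{Var}[D] = 2 - \left(\frac43\right)^2 = \frac29.$$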

It is worth noting that this curve has a clear global maximum, somewhere around $\sigma^2 \approx 5$. This suggests that the variance of $D_\sigma$ is not maximized by letting $\sigma \to \infty$: there is an intermediate range of values of $\sigma^2$ for which $\operatorname{Var}[D_\sigma]$ exceeds its asymptotic value. It would be a worthwhile exercise to see if $\arg\max \operatorname{Var}[D_\sigma]$ has a nice closed-form solution.
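A simulation of this kind can be sketched in a few lines of standard-library Python. This is not the code behind the plot above, only a minimal Monte Carlo estimate under the same setup ($\boldsymbol A = (1,0,0)$, unit shell). The function name `simulate_var_d`, its parameters, and the default sample size are hypothetical choices made for illustration.

```python
import math
import random


def simulate_var_d(sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of Var[D_sigma] for A = (1, 0, 0) on the unit sphere.

    Hypothetical helper for illustration; not the original author's code.
    """
    rng = random.Random(seed)
    total = 0.0      # running sum of D
    total_sq = 0.0   # running sum of D^2
    for _ in range(n_samples):
        # Draw X ~ Normal(A, sigma^2 * I_3), one coordinate at a time.
        x = 1.0 + sigma * rng.gauss(0.0, 1.0)
        y = sigma * rng.gauss(0.0, 1.0)
        z = sigma * rng.gauss(0.0, 1.0)
        r = math.sqrt(x * x + y * y + z * z)
        # X/|X| and A are both unit vectors, so the squared chord
        # distance is 2 - 2 * (X/|X|) . A = 2 - 2 * x / r.
        # Clamp at zero to guard against floating-point round-off.
        d_sq = max(0.0, 2.0 - 2.0 * x / r)
        total += math.sqrt(d_sq)
        total_sq += d_sq
    mean = total / n_samples
    # Population-style variance estimate: E[D^2] - (E[D])^2.
    return total_sq / n_samples - mean * mean
```

Calling `simulate_var_d(math.sqrt(s2))` over a grid of values `s2` in $(0, 50]$ gives a rough version of the curve. Locating the maximum reliably would likely need more samples per point than the default, since the curve is fairly flat near its peak.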