(Edit: problem statement revised.)
Definition: A random variable $X$ is sub-gaussian if $\Pr(|X|>t) \leq \exp(-t^2/k^2)$ for all $t\geq 0$ and for some constant $k>0$.
I want to compute the trace of a matrix by Monte Carlo simulation. For example, in lattice Quantum Chromodynamics, we need to compute the trace of a function of a large matrix, $\mathrm{tr}(f(\mathbf{A}'))$, where explicit computation of $f(\mathbf{A}')$ is impractical but computing the quadratic form $\mathbf{x}f(\mathbf{A}')\mathbf{x}^T$ is feasible. For simplicity, let's say the quadratic form is $\mathbf{x}\mathbf{A}\mathbf{x}^T$.
One standard approach is Monte Carlo simulation, where the trace is estimated by $X=\frac{1}{M}\sum_{i=1}^M\mathbf{x}_i\mathbf{A}\mathbf{x}_i^T$, with each entry of $\mathbf{x}_i$ sampled independently from $N(0,1)$. I know that $X$ is sharply concentrated around $E(X) = \mathrm{tr}(\mathbf{A})$; to be precise, $X$ is sub-gaussian.
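As a minimal numerical sketch of this estimator (with a hypothetical symmetric test matrix $\mathbf{A}$ standing in for $f(\mathbf{A}')$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 100, 10_000

# Hypothetical symmetric test matrix; in practice A would be f(A').
A = rng.standard_normal((n, n))
A = (A + A.T) / 2

# X = (1/M) * sum_i x_i A x_i^T with x_i ~ N(0, I); E[X] = tr(A).
X = np.mean([x @ A @ x for x in rng.standard_normal((M, n))])

print(X, np.trace(A))
```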
One standard way to show that a random variable is sharply concentrated around its mean is an $(\epsilon,\delta)$-approximation via a tail bound: $\Pr(|X - E(X)|>\epsilon)<\delta$, which can easily be computed from the sub-gaussian definition in my problem. This tells us how many samples $M$ are required to guarantee the $(\epsilon,\delta)$-approximation.
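To sketch that sample-count calculation (assuming, as a simplification, that the averaged estimator itself satisfies the sub-gaussian bound above with parameter $k/\sqrt{M}$):
$$\Pr(|X - E(X)| > \epsilon) \leq \exp\left(-\frac{M\epsilon^2}{k^2}\right) \leq \delta \quad\Longleftarrow\quad M \geq \frac{k^2}{\epsilon^2}\log\frac{1}{\delta}.$$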
Meanwhile, the Central Limit Theorem tells us that $\sqrt{M}(X- E(X))$ converges in distribution to $N(0, \sigma^2)$, which is sharply concentrated around the (shifted) mean $0$. Even though it requires $M\to\infty$, it gives the exact limiting distribution of $X$. Maybe this isn't the right way to put it, but these two results seem to me to sit at opposite ends of a spectrum:
- Sub-gaussian: a finite number of samples, but only an $(\epsilon,\delta)$-approximation.
- CLT: number of samples $\to\infty$, but an exact probability distribution.
Let's say I have two matrices $\mathbf{A}$ and $\mathbf{B}$ which differ in only one entry. I ran Monte Carlo simulations to obtain $X=\frac{1}{M}\sum_{i=1}^M \mathbf{x}_i\mathbf{A}\mathbf{x}_i^T$ and $Y=\frac{1}{M}\sum_{i=1}^M \mathbf{y}_i\mathbf{B}\mathbf{y}_i^T$, where each entry of $\mathbf{x}_i$ and $\mathbf{y}_i$ is independently sampled from $N(0,1)$.
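A minimal numerical sketch of this setup (with hypothetical $\mathbf{A}$ and $\mathbf{B}$ differing in a single diagonal entry):

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 50, 5_000

# Hypothetical pair: B equals A except at one entry.
A = rng.standard_normal((n, n))
B = A.copy()
B[0, 0] += 1.0

X = np.mean([x @ A @ x for x in rng.standard_normal((M, n))])
Y = np.mean([y @ B @ y for y in rng.standard_normal((M, n))])

# E[X] - E[Y] = tr(A) - tr(B), which is -1.0 here.
print(X - Y)
```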
Question: Is there any way to compare the probability density functions of $X$ and $Y$ point-wise, i.e., to show that $\exp(-\alpha)f_Y(t) \leq f_X(t) \leq \exp(\alpha)f_Y(t)$ with probability at least $1-\beta$ over $t$ drawn from the range of $X$ or $Y$?
This kind of inequality may seem odd, but it's used in some recent papers. For example, let's say $X$ is sampled from a Laplace distribution with density $h(x)=\frac{1}{2b}\exp(-|x|/b)$, and compare it to $Y$ sampled from a shifted Laplace distribution with density $h(x+\delta)=\frac{1}{2b}\exp(-|x+\delta|/b)$. Then the ratio satisfies $h(x)/h(x+\delta)\leq \exp(|\delta|/b)$, so I can show that $h(x) \leq \exp(|\delta|/b)\,h(x+\delta)$ holds on the entire domain of $X$.
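A quick numerical sanity check of this Laplace ratio bound (with hypothetical parameter choices $b=1$, $\delta=0.3$):

```python
import numpy as np

b, delta = 1.0, 0.3

def h(x):
    # Laplace density with scale b
    return np.exp(-np.abs(x) / b) / (2 * b)

xs = np.linspace(-10.0, 10.0, 10_001)
ratio = h(xs) / h(xs + delta)

# |x + delta| - |x| <= |delta|, so the ratio never exceeds exp(|delta|/b).
print(ratio.max(), np.exp(abs(delta) / b))
```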
In the case where the random variable is sampled from a standard Gaussian with density $g(x)=\frac{1}{\sqrt{2\pi}}\exp(-\frac{x^2}{2})$, the picture is slightly more nuanced: we can only show that $g(x)/g(x+\delta)$ is bounded in the region $|x|\leq \xi$, where $\xi$ depends on $\delta$. This yields an $(\alpha,\beta)$-approximation.
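For concreteness, here is a sketch of that Gaussian computation (taking $\delta>0$ for simplicity):
$$\frac{g(x)}{g(x+\delta)}=\exp\left(\frac{(x+\delta)^2-x^2}{2}\right)=\exp\left(x\delta+\frac{\delta^2}{2}\right)\leq \exp(\alpha) \quad\text{whenever } x\leq \frac{\alpha-\delta^2/2}{\delta},$$
so the ratio is controlled on $|x|\leq\xi$ with $\xi=(\alpha-\delta^2/2)/\delta$, and the Gaussian tail bound then gives $\Pr(|x|>\xi)\leq\beta$ for an appropriate $\beta$.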
This inequality may look too strong, but it's based on the intuition that
- sub-gaussian random variables must be sharply concentrated around the mean, and
- a sub-gaussian random variable is a generalization of a Gaussian random variable.
Is there any way to do that, or is my conjectured inequality too strong?