
Suppose I have a sequence of i.i.d. random variables drawn from a population with mean $\mu$ and variance $\sigma^2$.

I observe one at a time, then two at a time, then three at a time, and so on. Each time, I record the sample mean.

The Law of Large Numbers says that $\bar{X}_n \rightarrow \mu$ as $n \rightarrow \infty$.

How many observations are needed so that $\lvert \bar{X}_n - \mu \rvert < \epsilon$? Can this be expressed in terms of the mean and standard deviation?

  • This cannot be guaranteed. You can at most achieve $|\bar X-\mu|<\epsilon$ with probability $p$. (2017-01-13)
  • The Chebyshev inequality says $P[|(\frac{1}{n}\sum_{i=1}^nX_i)- \mu| \geq \epsilon] \leq \frac{\sigma^2}{n \epsilon^2}$ for all $\epsilon>0$ and all $n \in \{1, 2, 3, ...\}$. Tighter bounds, such as in Robert Israel's answer, can be obtained for known distributions. (2017-01-13)
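To make the Chebyshev comment concrete, here is a small sketch that inverts the bound $\sigma^2/(n\epsilon^2) \le \delta$ to get a sufficient sample size $n$. The values $\sigma = 1$, $\epsilon = 0.1$, $\delta = 0.05$ are illustrative choices, not given in the thread.

```python
import math

def chebyshev_n(sigma, eps, delta):
    """Smallest n with sigma^2 / (n * eps^2) <= delta, which by Chebyshev
    guarantees P(|sample mean - mu| >= eps) <= delta."""
    return math.ceil(sigma**2 / (delta * eps**2))

# Example: sigma = 1, eps = 0.1, failure probability at most 5%.
print(chebyshev_n(sigma=1.0, eps=0.1, delta=0.05))  # -> 2000
```

This is distribution-free, which is why the required $n$ is so large compared with bounds that use more information about the distribution.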

1 Answer

"How many observations are needed" is something that will vary from trial to trial. The best we can do is to determine a probability distribution.

If they are based on distinct samples (the "one at a time", "two at a time", "three at a time" don't overlap), the averages $\overline{X}_n$ are independent. The probability that you need more than $m$ observations (i.e. "one at a time", "two at a time", ... "$m$ at a time" are all out by at least $\epsilon$) is

$$\prod_{n=1}^m \mathbb P\left(\left|\overline{X}_n - \mu\right| \ge \epsilon\right)$$

If the random variables are normally distributed, $\overline{X}_n - \mu$ is normal with mean $0$ and variance $\sigma^2/n$. Thus $$\mathbb P\left(\left|\overline{X}_n - \mu\right| \ge \epsilon\right) = 2 \Phi(-\epsilon \sqrt{n}/\sigma) \sim \frac{\sqrt{2} \sigma}{\sqrt{n\pi} \epsilon} e^{-\epsilon^2 n/(2 \sigma^2)}\ \text{as}\ n \to \infty$$ where $\Phi$ is the standard normal CDF.
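As a sketch of the normal case, the product formula above can be evaluated numerically with the standard normal CDF $\Phi(x) = \frac{1}{2}(1 + \operatorname{erf}(x/\sqrt{2}))$. The parameter values in the example call are illustrative, not from the answer.

```python
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def p_all_miss(m, eps, sigma):
    """Probability that the averages over the first m (independent) batches
    are all at least eps away from mu, for normal data:
    prod_{n=1}^{m} P(|X_bar_n - mu| >= eps) = prod_{n=1}^{m} 2*Phi(-eps*sqrt(n)/sigma)."""
    prod = 1.0
    for n in range(1, m + 1):
        prod *= 2 * phi(-eps * math.sqrt(n) / sigma)
    return prod

# Example: eps = 0.5, sigma = 1; the probability of "still missing" decays
# rapidly in m, consistent with the asymptotic rate above.
print(p_all_miss(5, eps=0.5, sigma=1.0))
```

Since each factor is strictly less than 1 and tends to 0 geometrically fast, the probability of needing more than $m$ batches decays very quickly in $m$.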

For other distributions it will be more complicated.

Note: it is tempting to try the Central Limit Theorem here, but that temptation should be avoided, because we are not looking at a fixed number of standard deviations here. Instead, Large Deviations theory could be used.

  • Seems more direct to just use the Chebyshev inequality with the given info. (2017-01-13)
  • The Chebyshev inequality will give rather poor bounds, I think. Chernoff may be better. (2017-01-13)
  • Yes, or the Chernoff-Hoeffding bound. [For the sake of the asker of the question: If the i.i.d. random variables $\{X_i\}$ are bounded and always contained in an interval of size $M$, such as $X_i \in [a, a+M]$ always (for some $a \in \mathbb{R}$), then $$P\left[\left|\left(\frac{1}{n}\sum_{i=1}^n X_i\right) - \mu\right| \geq \epsilon\right] \leq 2\exp\left(\frac{-2n\epsilon^2}{M^2}\right)$$ which is qualitatively similar to the Gaussian case above.] (2017-01-14)