7
$\begingroup$

Let $x_i$ be iid nonnegative discrete random variables $E[x_i]=N/M$ for some integers $N, M$, variance $\sigma^2$ and higher moments known (finite).

Then, the sum $\displaystyle S = \sum_{i=1}^M x_i$ will have $E[S]=N$.

I'm interested in the probability that $S$ takes that precise value: $A=P\left(S=E[S]\right)$.

Applying the central limit theorem, I can write

$\displaystyle A \approx \frac{1}{\sqrt{ 2 \pi M \sigma^2}}$

My question is: can this approximation be refined?

ADDED: To add some example-context-motivation:

Lets consider $X$ as a sum of $N$ Bernoullis (0/1) with prob=$p$, such that $E(X)=E(N p)$ is an integer. We can compute exactly the probability that $X$ attains its expected value, it's a Binomial:

$\displaystyle P = P(X= N p) = {N \choose N p} p^{N p} q^{N q} \hspace{2cm}$ [1a]

We might also get an approximate value of that probability using the CTL (Central Limit Theorem)

$\displaystyle P \approx \frac{1}{\sqrt{2 \pi N p q}} \hspace{2cm} $ [2a]

If we take [1a] and use the Stirling approximation, with $K \approx (K/e)^K \sqrt{2 \pi K}$, we get the same value. Fine.

Now, we may try to refine the approximation, both from [1a] and [2a].

Plugging the next orden Stirling approximation in [1a], we get (I am not mistaken)

$\displaystyle P \approx \frac{1}{\sqrt{2 \pi N p q}} \left(1 - \frac{1- p q}{12 N p q} \right) \hspace{2cm} $ [1b]

To refine the CTL, one can think of

  • use some "continuity correction" to evaluate more precisely the (hipothetical) gaussian integral

  • add some terms from the Edgeworth expansions

  • do nothing of the above - because the CLT does not justify those procedures in this scenario (just one value of a discrete variable)

I'm not sure which is the correct way.

But let's try the first one: the next order approximation of the integral gives me (again, if I'm not mistaken)

$\displaystyle P \approx \frac{1}{\sqrt{2 \pi N p q}} \left(1 - \frac{1}{24 N p q} \right) \hspace{2cm} $ [2b]

This is not the same as [1b], but it's close.

Is this just casual? Was it a reasonable thing to do? Should I look (also/instead) after the Edgeworth expansions?

  • 0
    I think the Edgeworth expansion should work fine in general. The characteristic function of a discrete distribution looks like that of a continuous distribution, at least around $t=0$. The only difference (as noted by @whuber) will be when the characteristic function has multiple points of modulus $1$, which will occur just when the original distribution is nonzero only at $an+b$ for $n \in \mathbb{Z}$ and some a>1.2011-02-16

1 Answers 1

8

For a discrete random variable $X$ with support $\mathbb{Z}$, the Fourier transform of the probability distribution $P_x \equiv P[X=x]$ is given by $ \tilde{P}(k) = \sum_{x=-\infty}^{\infty} e^{ikx} P_x = E\left[e^{ikx}\right] = e^{h(k)}, $ where $ h(k) = \sum_{n=1}^{\infty} \kappa_{n} \frac {(ik)^{n}}{n!} $ is the natural logarithm of the characteristic function of $X$, and $\kappa_{n}$ is the $n$th cumulant of $X$. Recall that $\kappa_{1} = \mu$ is the mean and $\kappa_{2} = \sigma^2$ is the variance. The probability that a sum of $M$ independent variables $X_i$ with the same distribution is exactly $x \in {\mathbb{Z}}$ is then $ \begin{eqnarray} P\left[\sum_{i=1}^{M} X_i = x\right] &=& \int_{-\pi}^{\pi}\frac{dk}{2\pi} e^{-ikx}\tilde{P}(k)^M \\ &=& \int_{-\pi}^{\pi}\frac{dk}{2\pi} e^{Mh(k)-ikx} \\ &=& \int_{-\pi}^{\pi}\frac{dk}{2\pi} e^{ik(M\mu - x) - \frac{1}{2}M\sigma^2 k^2} \exp\left(\sum_{n=3}^{\infty}M\kappa_{n}\frac{(ik)^{n}}{n!}\right). \end{eqnarray} $ Considering the desired case where $x = M\mu \in {\mathbb{Z}}$, and making the change of variable $k \rightarrow k/(\sigma\sqrt{M})$, we have $ P\left[\sum_{i=1}^{M} X_i = M\mu\right] = \frac{1}{\sigma\sqrt{2\pi M}}\int_{-\pi\sigma\sqrt{M}}^{\pi\sigma\sqrt{M}} d\Phi(k) \exp\left(\sum_{n=3}^{\infty} \sigma^{-n}M^{1-\frac{1}{2}n}\kappa_{n}\frac{(ik)^{n}}{n!}\right), $ where $d\Phi(k) = \phi(k) dk$ is the standard normal distribution (with mean $0$ and variance $1$). Here we assume that exponential decays rapidly away from $k=0$, so we may replace the limits of integration by $\pm\infty$. Then, expanding the exponential in inverse powers of $M$, and using the fact that the $n$th central moment of the standard normal distribution vanishes for odd $n$ and is equal to $(n-1)!!$ for even $n$, we obtain the following: $ P\left[\sum_{i=1}^{M} X_i = M\mu\right] = \frac{1}{\sigma\sqrt{2\pi M}}\left(1 + \frac{\kappa_4}{8M\sigma^4} - \frac{5\kappa_3^2}{24M\sigma^6} + O(M^{-2})\right). $ This is essentially the Edgeworth expansion. If $X$ is the Bernoulli distribution with probability of success $p = \frac{1}{2}(1+a)$ (and of failure $q=\frac{1}{2}(1-a)$), then it is straightforward to verify that $ \begin{eqnarray} \kappa_2 &=& \sigma^2 = pq = \frac{1}{4}(1-a^2) \\ \kappa_3 &=& \frac{1}{4}(1-a^2)(-a) = -\frac{1}{4}a(1-a^2) \\ \kappa_4 &=& \frac{1}{8}(1-a^2)(3a^2-1), \end{eqnarray} $ and hence $ \begin{eqnarray} \frac{5\kappa_3^2}{24\sigma^6} &=& \frac{5a^2}{6(1-a^2)} \\ \frac{\kappa_4}{8\sigma^4} &=& \frac{3a^2 - 1}{4(1-a^2)}, \end{eqnarray} $ for a total correction term proportional to $ -\frac{5\kappa_3^2}{24M\sigma^6} + \frac{\kappa_4}{8M\sigma^4} = \frac{9a^2-3-10a^2}{12M(1-a^2)} = -\frac{3+a^2}{12M(1-a^2)} = -\frac{1-pq}{12Mpq}, $ which agrees with the Stirling approximation to the exact result.

  • 0
    Great answer! Only one doubt: I'm not sure if the aproximation introduced (when extending the integral, which assimilates our discrete-lattice variable to a continuous value) produces an error less that O(1/M). If your procedure is equivalent to a Edgeworth expansion of the distribution function and evaluating it in the points `u-1/2, u+1/2`, then I think Edgeworth gives an extra error for lattice vars (sort of a discretization error) that requires special correction if we want a total error of order better than $1/sqrt(M)$) (eg, http://goo.gl/c3WTB ). I'll check all this and let you know.2011-02-17