I have a sequence $X_i$ of i.i.d. random variables, each taking the values $+1$ and $-1$ with probability $1/2$. How do I find $\lim_{n \to \infty} P(\sum X_i \le x)$? Pretty obviously the sum has equal masses escaping to $+\infty$ and $- \infty$, but I am not able to write a proof down. Should I recover all the moments from the characteristic function and use them to describe the cdf?
Distribution of sum of iid binary random variables
-
0I assume that $x$ is fixed. We can either work with the $X_i$ directly, or let $Y_i=(X_i+1)/2$. Then use the Central Limit Theorem. More unpleasant but doable would be to make explicit estimates for probabilities associated with $\sum_1^n Y_i$. – 2012-03-12
-
0Note that $X_i$ is _not_ a Bernoulli random variable. The variables $Y_i = (X_i + 1)/2$ given by André are. – 2012-03-12
-
0http://en.wikipedia.org/wiki/De_Moivre%E2%80%93Laplace_theorem – 2012-03-12
-
0@AndréNicolas : The central limit theorem won't work here. The central limit theorem will give you $\lim_{n\to\infty} \Pr(\sum_{i=1}^n X_i/\sqrt{n} \le x)$. That $\sqrt{n}$ needs to be there. – 2012-03-12
-
0@Michael Hardy: I may have trouble with division. We want the probability that $\overline{X} \le \frac{x}{n}$, so the probability that a standard normal is $\le \frac{cx\sqrt{n}}{n}$, where $c$ is an irrelevant constant. For $n$ large, we are calculating the probability that a standard normal is $\le$ to a nearly $0$ number. – 2012-03-12
-
0@AndréNicolas : Sorry---I should have written $\sqrt{n}\;\overline X$. – 2012-03-13
2 Answers
We assume that $x$ is fixed for the duration of this solution. The first solution is deliberately pretty silly. For no good reason except familiarity, we let $Y_i=(X_i+1)/2$. Then the $Y_i$ are independent Bernoulli random variables, and $\sum_{1}^n X_i=2\sum_1^n Y_i -n$. We are interested in $P(\sum_1^n X_i \le x)$. This is $$P\left(\sum_1^n Y_i -\frac{n}{2} \le \frac{x}{2}\right).$$ The random variable $\sum_1^n Y_i$ has a binomial distribution.
Divide both sides by $n$. We want $$P\left(\overline{Y} -\frac{1}{2} \le \frac{x}{2n}\right).$$ The random variable $\overline{Y}$ has mean $\frac{1}{2}$ and variance $\frac{1}{4n}$. By the Central Limit Theorem, the probability that $\overline{Y}-\frac{1}{2}$ is $\le y$ is approximately the probability that $Z \le y\big/\sqrt{1/(4n)}=2y\sqrt{n}$, where $Z$ is standard normal.
Putting things together, we find that for $n$ large, the probability that $\overline{Y}-\frac{1}{2}$ is $\le \frac{x}{2n}$ is approximately the probability that $Z\le \frac{2x\sqrt{n}}{2n}$, that is, the probability that $Z\le \frac{x}{\sqrt{n}}$. By approximately we mean that the difference between the two probabilities approaches $0$ as $n\to\infty$.
As $n\to \infty$, the number $\frac{x}{\sqrt{n}}$ approaches $0$, so our probability approaches $\frac{1}{2}$.
Remark: The approach we used was very wasteful. In particular, the $Y_i$ were completely unnecessary! The $X_i$ have mean $0$ and variance $1$, what could be nicer than that? The random variable $\sum_1^n X_i$ has mean $0$ and variance $n$, and therefore standard deviation $\sqrt{n}$. So the probability that $\sum_1^n X_i \le x$ is the probability that $\frac{1}{\sqrt{n}}\sum_1^n X_i \le \frac{x}{\sqrt{n}}$, which is approximately the probability that $Z \le \frac{x}{\sqrt{n}}$, and as before this approaches $\frac{1}{2}$.
We have deliberately used the Central Limit Theorem imprecisely. That can be replaced by the precise limit version.
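As a numerical sanity check (not part of the original argument), here is a minimal Python sketch. It uses the same identification as above, $\sum_1^n X_i = 2B - n$ with $B \sim \text{Binomial}(n, 1/2)$, so $\sum_1^n X_i \le x$ exactly when $B \le (x+n)/2$, and computes the probability exactly; the function name `prob_sum_at_most` is my own invention for illustration.

```python
import math

def prob_sum_at_most(n, x):
    """Exact P(sum of n iid fair +/-1 signs <= x).

    The sum equals 2*B - n with B ~ Binomial(n, 1/2), so the event
    is B <= (x + n) / 2.
    """
    k_max = min(math.floor((x + n) / 2), n)  # largest admissible count of +1's
    if k_max < 0:
        return 0.0
    return sum(math.comb(n, k) for k in range(k_max + 1)) / 2**n

# For fixed x, the probability drifts toward 1/2 as n grows:
for n in (10, 100, 10000):
    print(n, prob_sum_at_most(n, 3))
```

For $n = 10$, $x = 3$ this gives $848/1024 = 0.828125$; by $n = 10000$ the value is already close to $\frac{1}{2}$, matching the limit derived above.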
-
0Thanks for the answer, Andre! It is simple, except for differing from the usual applications of the CLT where the variance tends to 0. – 2012-03-13
-
0@Bravo: Yes, in a certain sense: because the variance of the sum is large, $x$ (in standard deviation units) is close to $0$, so $x$ is virtually at the center of the distribution. The constant $x$ could be replaced by something that does not grow too fast, like $cn^{1/4}$. But $n^{1/2}$ would be too big. – 2012-03-13
You can also compute directly. Fix $x<0$ and let $k$ be the largest integer not exceeding $x$. By symmetry $P(S_n \le -1) = \frac{1}{2}\{1-P(S_n=0)\}$, so $$P(S_n \leq x)=\{1-P(S_n=0)\}\frac{1}{2}-P(S_n=-1)-\dots-P(S_n=k+1),$$ and note that each of those point probabilities converges to $0$, so the limit is $\frac{1}{2}$. (The case $x \ge 0$ is handled the same way, again by symmetry.)
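A small Python sketch, assuming nothing beyond the identity above, verifies it for a few values of $n$ and shows the point masses vanishing; the helper `point_prob` is a name I made up for this illustration.

```python
import math

def point_prob(n, j):
    """P(S_n = j) for S_n = sum of n iid fair +/-1 signs.

    Nonzero only when j and n have the same parity and |j| <= n,
    in which case it equals C(n, (n + j) / 2) / 2^n.
    """
    if (n + j) % 2 != 0 or abs(j) > n:
        return 0.0
    return math.comb(n, (n + j) // 2) / 2**n

# Check the identity for x = -2.5, so k = floor(x) = -3:
x = -2.5
k = math.floor(x)
for n in (8, 9, 50):
    lhs = sum(point_prob(n, j) for j in range(-n, k + 1))          # P(S_n <= x)
    rhs = ((1 - point_prob(n, 0)) / 2                              # P(S_n <= -1)
           - sum(point_prob(n, j) for j in range(k + 1, 0)))       # minus the gap terms
    assert abs(lhs - rhs) < 1e-12

# Each fixed point mass goes to 0 (it is of order 1/sqrt(n)),
# so P(S_n <= x) tends to (1 - 0)/2 = 1/2.
print(point_prob(10**4, 0))
```

Since $P(S_n = 0) \sim \sqrt{2/(\pi n)}$ and finitely many other point masses appear in the identity, every term except the leading $\frac{1}{2}$ vanishes in the limit.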