9

I was wondering how to get the cumulative distribution function of a sum of two random binomial variables.

$X + Y$, where $X$ has $n=15$ trials, $Y$ has $m=15$ trials, and the success probability is $p=0.2$ for both.

How would I express $P(15 \le X+Y \le 20)$ in terms of a cumulative distribution function?

Can I just add their probability mass functions $p_X(x)$ and $p_Y(y)$ and then get the CDF from there, or do I need to work from the joint distribution formulas?
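For concreteness, here is the computation I have in mind as a Python sketch (assuming NumPy/SciPy are available; convolving the two PMFs with `np.convolve` is my guess at how to combine them):

```python
import numpy as np
from scipy.stats import binom

n, m, p = 15, 15, 0.2

# PMFs of X and Y on their supports 0..n and 0..m
px = binom.pmf(np.arange(n + 1), n, p)
py = binom.pmf(np.arange(m + 1), m, p)

# Candidate PMF of X+Y: the convolution of the two PMFs, on support 0..n+m
pz = np.convolve(px, py)

# Candidate answer: sum the PMF of X+Y over 15..20
print(pz[15:21].sum())
```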

Any help would be greatly appreciated!

Thanks.

  • Thanks David, how can I give you reputation points based on your comments? :) – 2012-01-30

2 Answers

8

Assuming that $X$ and $Y$ are independent, you can use the following, standard, result:

Let $X_1$ and $X_2$ be discrete, independent random variables with ${\rm dist }(X_1)={\rm B}(n,p)$ and ${\rm dist }(X_2)={\rm B}(m,p)$, where ${\rm B}(n,p)$ denotes the binomial distribution with $n$ trials and success probability $p$.

By independence, it would seem that the total number of successes in $n$ trials of $X_1$ and $m$ trials of $X_2$ should be a binomial variable with parameters $n+m$ and $p$. We now show that this is, indeed, the case.

Let $Y=X_1+X_2$. We will find the probability mass function of $Y$. Since $Y$ is the total number of successes in $n$ trials of $X_1$ and $m$ trials of $X_2$, the random variable $Y$ takes the values $0$, $1$, $\ldots\,$, $n+m$. Using the Convolution Theorem, for $0\le k\le n+m$, we have:

$ \eqalign{ p_Y(k)&=\sum_{i=0}^k P[X_1=i,\,X_2=k-i]\cr &=\sum_{i=0}^k P[X_1=i]\cdot P[X_2=k-i]\cr &=\sum_{i=0}^k{n\choose i}(1-p)^{n-i}p^i\cdot{m\choose k-i}(1-p)^{m-(k-i)}p^{k-i}\cr &=(1-p)^{m+n-k}p^{k}\sum_{i=0}^k{n\choose i}{m\choose k-i}\cr &={m+n\choose k}(1-p)^{m+n-k}p^{k}. } $

Thus, ${\rm dist }(X_1+X_2)={\rm B}(n+m,p)$.
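Applied to the numbers in the question ($n=m=15$, $p=0.2$), this gives $X+Y\sim{\rm B}(30,0.2)$. Writing $F$ for the CDF of ${\rm B}(30,0.2)$, the probability asked about is

$ P(15\le X+Y\le 20)=F(20)-F(14)=\sum_{k=15}^{20}{30\choose k}(0.2)^k(0.8)^{30-k}. $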

In the above, we used the following:

Lemma

For any positive integers $n$ and $m$, and any integer $0\le k\le n+m$ (with the convention that ${a\choose b}=0$ when $b>a$): $ \sum_{i=0}^{k} {n\choose i}{m\choose k-i} = {n+m\choose k}. $

Proof: Apply the Binomial Theorem to the equality
$ (1+x)^n(1+x)^m=(1+x)^{n+m} $
to obtain
$\tag{1} \sum_{i=0}^n{n\choose i}x^{n-i}\cdot\sum_{j=0}^m{m\choose j}x^{m-j}=\sum_{k=0}^{n+m}{n+m\choose k}x^{n+m-k}. $
But
$ \eqalign{ \sum_{i=0}^n{n\choose i}x^{n-i}\cdot\sum_{j=0}^m{m\choose j}x^{m-j} &=\sum_{i=0}^n{n\choose i}\cdot\Bigl[\sum_{j=0}^m{m\choose j}x^{m-j}\Bigr]x^{n-i}\cr &=\sum_{i=0}^n\Bigl[\sum_{j=0}^m{n\choose i}{m\choose j}x^{n+m-(i+j)}\Bigr]. } $
Now, terms of the form $x^{n+m-k}$ in this double sum arise exactly when $j=k-i$ with $0\le i\le k$. Thus the $x^{n+m-k}$ term of the left hand side of equation $(1)$ is
$ \sum_{i=0}^k{n\choose i}{m\choose k-i}x^{m+n-k}. $
Since the $x^{n+m-k}$ term of the right hand side of equation $(1)$ is ${n+m\choose k}x^{n+m-k}$, we have
$ \sum_{i=0}^k{n\choose i}{m\choose k-i}={n+m\choose k}, $
as desired.
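A quick numerical spot-check of the lemma (my addition; Python's `math.comb` returns $0$ when the lower index exceeds the upper, matching the convention above):

```python
from math import comb

# Verify sum_{i=0}^{k} C(n,i) * C(m,k-i) == C(n+m,k)
for n, m in [(3, 7), (15, 15)]:
    for k in range(n + m + 1):
        lhs = sum(comb(n, i) * comb(m, k - i) for i in range(k + 1))
        assert lhs == comb(n + m, k)

print("identity holds for all tested (n, m, k)")
```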


Convolution Theorem:

The probability mass function of the sum of two independent discrete variables is the convolution of their probability mass functions:

Let $X_1$ and $X_2$ be independent, discrete random variables that take integer values, with respective probability mass functions $p_{X_1}$ and $p_{X_2}$. Let $Y=X_1+X_2$. Then for each integer $k$: $ p_Y(k)=\sum_{i}p_{X_1}(i)\,p_{X_2}(k-i), $ where the sum runs over all integers $i$. (When $X_1$ and $X_2$ are nonnegative, only the terms with $0\le i\le k$ are nonzero.)

The sum appearing on the right hand side of the above equality is called the convolution of $p_{X_1}$ and $p_{X_2}$.

Proof: Exercise.
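To make the theorem concrete, here is a minimal sketch (my addition, assuming Python and variables taking values in $\{0,1,\dots\}$) that transcribes the convolution sum directly:

```python
def pmf_of_sum(p1, p2):
    """Convolution of two PMFs given as lists, where p1[i] = P[X1 = i]."""
    out = [0.0] * (len(p1) + len(p2) - 1)
    for i, a in enumerate(p1):
        for j, b in enumerate(p2):
            out[i + j] += a * b  # P[X1 = i] * P[X2 = j] contributes to p_Y(i + j)
    return out

# Example: sum of two B(1, 1/2) variables is B(2, 1/2)
print(pmf_of_sum([0.5, 0.5], [0.5, 0.5]))  # [0.25, 0.5, 0.25]
```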

6

The binomially distributed random variable $X$ records the number of successes when we repeat an experiment independently $m$ times, with probability of success each time equal to $p$. The random variable $Y$ records the number of successes in $n$ independent trials, and we are told that $X$ and $Y$ are independent.

Define Bernoulli random variables $W_1, W_2, \dots, W_m, W_{m+1}, W_{m+2}, \dots, W_{m+n}$ as follows. Perform an experiment independently $m+n$ times, where the probability of success each time is $p$, and let $W_i=1$ if the $i$-th trial is a success and $W_i=0$ otherwise. Let $W=\sum_{i=1}^{m+n} W_i.$ Then $W$ has binomial distribution with parameters $m+n$, $p$. Note that $S=\sum_{i=1}^m W_i$ has binomial distribution with parameters $m$, $p$, and that $T=\sum_{i=m+1}^{m+n} W_i$ has binomial distribution with parameters $n$, $p$. So $S$ has the same distribution as $X$, and $T$ has the same distribution as $Y$.

Note also that $S$ and $T$ are independent. Since the distribution of $X+Y$ is completely determined by the distributions of $X$ and $Y$, we conclude that $X+Y$ has the same distribution as $S+T$. But $S+T=W$, so $X+Y$ is binomially distributed with parameters $m+n$, $p$.
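As an empirical sanity check (my addition, assuming Python with NumPy), one can simulate $X+Y$ alongside a single ${\rm B}(m+n,p)$ variable and compare moments:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
m, n, p, N = 15, 15, 0.2, 200_000

x = rng.binomial(m, p, size=N)        # X ~ B(m, p)
y = rng.binomial(n, p, size=N)        # Y ~ B(n, p), independent of X
w = rng.binomial(m + n, p, size=N)    # W ~ B(m+n, p)

# Both should have mean (m+n)p = 6 and variance (m+n)p(1-p) = 4.8
print((x + y).mean(), w.mean())
print((x + y).var(), w.var())
```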

Remark: Informally, $X+Y$ records the total number of successes in $m+n$ independent trials, where the probability of success on any trial is $p$. So it is "obvious" that $X+Y$ has binomial distribution, and there is nothing much to the argument above. However, showing that if $X$ and $Y$ are independent, then the Bernoulli components of $X$ and $Y$ are independent looks as if it may require some work. That was the reason for the workaround that used the fact that the distribution of $X+Y$ is completely determined by the individual distributions.