Suppose I have independent random variables $X_i$ which are distributed binomially via $X_i \sim \mathrm{Bin}(n_i, p_i)$.
Are there relatively simple formulae or at least bounds for the distribution $S = \sum_i X_i$ available?
See this paper (The Distribution of a Sum of Binomial Random Variables by Ken Butler and Michael Stephens).
This answer provides an R implementation of the explicit formula from the paper linked in the accepted answer (The Distribution of a Sum of Binomial Random Variables by Ken Butler and Michael Stephens). In fact, the code can be used to convolve any two independent distributions supported on the non-negative integers:
# explicitly combine two probability distributions, each given as a vector of
# probabilities (first element = count 0)
combine.distributions <- function(a, b) {

  # make a the longer vector, so the matrix below has more rows than columns
  if (length(a) < length(b)) {
    t <- a
    a <- b
    b <- t
  }

  # outer product: m[i, j] = P(A = i - 1) * P(B = j - 1)
  m <- a %*% t(b)

  # initialize the final result, element 1 = count 0
  result <- rep(0, length(a) + length(b) - 1)

  # add the probabilities, each row shifted one slot further along
  # the result vector
  for (i in 1:nrow(m)) {
    result[i:(ncol(m) + i - 1)] <- result[i:(ncol(m) + i - 1)] + m[i, ]
  }

  result
}

a <- dbinom(0:1000, 1000, 0.5)
b <- dbinom(0:2000, 2000, 0.9)

ab <- combine.distributions(a, b)

ab.df <- data.frame(N = 0:(length(ab) - 1), p = ab)

plot(ab.df$N, ab.df$p, type = "l")
One short answer is that a normal approximation still works well as long as the variance $\sigma^2 = \sum_i n_i p_i(1-p_i)$ is not too small. Compute the mean $\mu = \sum_i n_i p_i$ and the variance $\sigma^2$, and approximate $S$ by $N(\mu,\sigma^2)$.
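As a rough sketch of how this might look in R (the particular $n_i$ and $p_i$ below are illustrative values I chose, not taken from the question):

# normal approximation to S = sum of Bin(n_i, p_i), illustrative values only
n <- c(1000, 2000)
p <- c(0.5, 0.9)

mu    <- sum(n * p)                   # mean of S
sigma <- sqrt(sum(n * p * (1 - p)))   # standard deviation of S

# approximate P(S <= s), with a continuity correction
s <- 2300
pnorm(s + 0.5, mean = mu, sd = sigma)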
It is possible to get a Chernoff bound using the standard moment generating function method. (Split each $\mathrm{Bin}(n_i, p_i)$ variable into $n_i$ independent $\mathrm{Bernoulli}(p_i)$ variables, so that the sums below run over individual Bernoulli trials with success probabilities $p_i$.) For $t \ge 0$,
$$
\begin{aligned}
\Pr[S\ge s] &\le E\!\left[\exp\Bigl(t \sum_i X_i\Bigr)\right]\exp(-st)
\\&= \prod_i \bigl(1 + (e^t-1) p_i\bigr)\,\exp(-st)
\\&\le \exp\Bigl((e^t-1)\sum_i p_i - st\Bigr)
\\&= \exp\left(s-\sum_i p_i-s\log\frac{s}{\sum_i p_i}\right),
\end{aligned}
$$
where in the last step we took $t=\log(s/\sum_i p_i)$, which requires $s\ge\sum_i p_i$. This is essentially the standard Chernoff bound for equal probabilities, with $np$ replaced by the sum $\sum_i p_i$ (or by the average, if you set $s=ns'$).
Here we used the inequality $1+x\le e^x$; a slightly stronger bound may be possible without it, but it would be much messier.
Another way to look at the bound is that we bound each variable by a Poisson distribution with the same mean.
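A minimal sketch of this bound in R (my own illustration, not part of the original answer; the $p_i$ values are arbitrary, and the bound only applies for $s \ge \sum_i p_i$):

# Chernoff-style tail bound P(S >= s) <= exp(s - P - s*log(s/P)), with P = sum(p),
# where S is a sum of independent Bernoulli(p_i) variables;
# a Bin(n_i, p_i) term contributes n_i copies of p_i
chernoff.tail <- function(s, p) {
  P <- sum(p)
  stopifnot(s >= P)   # the bound is derived for s at or above the mean
  exp(s - P - s * log(s / P))
}

# illustrative example matching the convolution above: Bin(1000, 0.5) + Bin(2000, 0.9)
p <- c(rep(0.5, 1000), rep(0.9, 2000))
chernoff.tail(2350, p)

# can be compared with the exact tail from the convolution code above, e.g.
# sum(ab[(2350 + 1):length(ab)])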