For $i \in \{1, 2, \dots, K\}$, we have that $$X_i \sim \hbox{Bernoulli}(p_i), \quad 0 < p_i < 1.$$ I'm trying to calculate the probability mass function of the random variable $$S_K = \sum_{i=1}^{K} X_i.$$ I tried to do this recursively, i.e. in the following way
\begin{align}
\mathbb{P}(S_K = s) & = \mathbb{P}(S_{K} = s | S_{K-1} = s) \mathbb{P}(S_{K-1} = s) + \mathbb{P}(S_{K} = s | S_{K-1} = s-1) \mathbb{P}(S_{K-1} = s-1) \\
& = \mathbb{P}(X_K = 0)\mathbb{P}(S_{K-1} = s) + \mathbb{P}(X_K = 1)\mathbb{P} (S_{K-1} = s-1) \\
& = (1-p_K)\mathbb{P}(S_{K-1} = s) + p_K\mathbb{P} (S_{K-1} = s-1).
\end{align} But this led to discrepancies, and I'm not sure you can condition on the random variable $S_{K-1}$ in this way. Any ideas? Thanks!
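For reference, the last line of the recursion above can be turned into a few lines of R. This is only a sketch under the assumption that the $X_i$ are independent, which is when $\mathbb{P}(X_K = 1 \mid S_{K-1}) = p_K$ holds; the helper name `sum_pmf_indep` and the example probabilities are illustrative and not part of the original post.

```r
# Exact pmf of S_K for INDEPENDENT X_i ~ Bernoulli(p_i), built from
# P(S_K = s) = (1 - p_K) P(S_{K-1} = s) + p_K P(S_{K-1} = s - 1).
# sum_pmf_indep is a hypothetical helper name, not from the original post.
sum_pmf_indep <- function(p) {
  pmf <- 1                        # P(S_0 = 0) = 1
  for (pk in p) {
    # c(pmf, 0) holds P(S_{K-1} = s) and c(0, pmf) holds P(S_{K-1} = s - 1)
    pmf <- (1 - pk) * c(pmf, 0) + pk * c(0, pmf)
  }
  pmf                             # entry s + 1 holds P(S_K = s), s = 0..K
}

p <- c(0.2, 0.5, 0.9)             # illustrative probabilities only
pmf <- sum_pmf_indep(p)
pmf
```

If simulated frequencies disagree with this output, that suggests the simulated $X_i$ are not independent, so the unconditional $p_K$ is not the right factor in the recursion.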
Analytic distribution for the sum of correlated Bernoulli trials
-
You need to specify how they are correlated. For instance, does $p_i$ depend on the previous value of $X_i$, or on the previous several values? Is it random or deterministic in terms of them? – 2017-01-11
-
Can you describe the discrepancies? – 2017-01-11
-
So the $p_i$s don't directly depend on the $X_i$. Each $X_i$ is a realisation from a binary HMM, so the $p_i$s depend on a hidden Markov process – 2017-01-11
-
The discrepancies come from simulations – 2017-01-11
-
Ok. The first line is unassailable. But in the second line, when you separate out $P(X_K=0)$ etc., they still need to be conditional on $S_{K-1}$, right? – 2017-01-11
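Written out, the point of the last comment is the following. It uses only $S_K = S_{K-1} + X_K$ and the law of total probability, with no independence assumption:
\begin{align}
\mathbb{P}(S_K = s) & = \mathbb{P}(X_K = 0 \mid S_{K-1} = s)\,\mathbb{P}(S_{K-1} = s) + \mathbb{P}(X_K = 1 \mid S_{K-1} = s-1)\,\mathbb{P}(S_{K-1} = s-1).
\end{align}
The conditional factors reduce to $1-p_K$ and $p_K$ only when $X_K$ is independent of $S_{K-1}$. For dependent trials, such as those generated by an HMM, $S_{K-1}$ carries information about the hidden state and hence about $X_K$, so the unconditional $p_K$ cannot in general be substituted.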
1 Answer
Comment: The distribution of $S$ depends on how the $p_i$ are chosen. You say they are simulated, but you do not say according to what distribution or process. The beta family of distributions is a reasonable choice for modeling probabilities, because its support is $(0,1).$
I used $k = 10,$ and simulated distributions of $S$ for four beta distributions of different shapes, with R statistical software:
(1) $Beta(1,1)=Unif(0,1),$ giving $E(S) \approx 5.0,\,SD(S) \approx 1.58.$
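m = 10^6; k = 10 # m simulated sums, each of k Bernoulli trials
p = rbeta(m*k, 1, 1) # a separate success probability for every trial
x = rbinom(m*k, 1, p) # one Bernoulli draw per probability
DTA = matrix(x, nrow=m) # m x k matrix, each row has ten Bernoulli's
s = rowSums(DTA) # m sums of k
mean(s); sd(s)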
## 5.000446
## 1.580513
(2) $Beta(1,10),$ giving $E(S) \approx 0.91,\,SD(S) \approx 0.91.$
(3) $Beta(10,1),$ giving $E(S) \approx 9.1,\,SD(S) \approx 0.91.$
(4) $Beta(10,10),$ giving $E(S) \approx 5.0,\,SD(S) \approx 1.58.$
Histograms of results are shown below; each is based on a sample of a million sums of ten Bernoulli trials (with varying beta $p_i$).
Note: (a) I have no idea whether you have an application in mind, or if so, whether beta distributions are suitable. It might be interesting to see whether these are binomial distributions and to derive the means and variances.
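On the question in (a): when every trial gets its own independent $p \sim Beta(a,b)$, as in the simulation code above, each $X_i$ is marginally Bernoulli with success probability $a/(a+b)$ and the trials are independent, so $S$ should be $Binomial(k,\, a/(a+b))$. That agrees with the means and SDs reported for the four cases. The sketch below checks this numerically for case (2); the seed and the smaller `m` are arbitrary choices made here for a quick run, not values from the answer.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
m <- 10^5; k <- 10                   # smaller m than in the answer, to run quickly
a <- 1; b <- 10                      # shape parameters of case (2)

p <- rbeta(m * k, a, b)              # a fresh p for every trial
x <- rbinom(m * k, 1, p)             # one Bernoulli draw per probability
s <- rowSums(matrix(x, nrow = m))    # m sums of k Bernoulli trials

emp <- as.numeric(table(factor(s, levels = 0:k))) / m   # empirical pmf of S
thy <- dbinom(0:k, k, a / (a + b))                      # Binomial(k, a/(a+b)) pmf
max_diff <- max(abs(emp - thy))      # largest gap between the two pmfs
round(rbind(emp, thy), 4)
```

Changing `a` and `b` to the other three shape pairs gives the corresponding comparison for cases (1), (3) and (4).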
(b) In my beta models the ten Bernoulli's in each sum are independent. Some simple Markovian model (e.g., Ehrenfest urn, genetic inheritance model, or DNA sequencing model) might also be used.
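For the hidden Markov setting mentioned in the comments, one way to rescue the recursive idea is to track the running sum jointly with the hidden state, i.e. to iterate on $\mathbb{P}(S_t = s, Z_t = z)$ instead of $\mathbb{P}(S_t = s)$. The sketch below is a minimal illustration under assumptions not stated in the thread: a finite-state chain $Z_t$ with initial distribution `pi0` for $Z_1$, transition matrix `A` (rows sum to 1), and emission probabilities `e[z]` $= \mathbb{P}(X_t = 1 \mid Z_t = z)$. The function name `hmm_sum_pmf` and all parameter values are hypothetical.

```r
# Exact pmf of S_K = X_1 + ... + X_K when X_t is emitted by a hidden Markov chain.
# hmm_sum_pmf is a hypothetical helper; the model parametrisation is an assumption.
hmm_sum_pmf <- function(K, pi0, A, e) {
  n <- length(pi0)
  # alpha[s + 1, z] holds P(S = s, Z = z); before any emission S = 0, Z_1 ~ pi0
  alpha <- matrix(0, nrow = K + 1, ncol = n)
  alpha[1, ] <- pi0
  for (t in 1:K) {
    if (t > 1) alpha <- alpha %*% A                 # advance the hidden state
    emit0 <- sweep(alpha, 2, 1 - e, FUN = "*")      # X_t = 0: sum unchanged
    emit1 <- sweep(alpha, 2, e, FUN = "*")          # X_t = 1: sum moves up by one
    alpha <- emit0 + rbind(0, emit1[-(K + 1), , drop = FALSE])
  }
  rowSums(alpha)                                    # marginalise the hidden state
}

pi0 <- c(0.5, 0.5)                                  # illustrative values only
A   <- matrix(c(0.9, 0.1,
                0.2, 0.8), nrow = 2, byrow = TRUE)
e   <- c(0.1, 0.7)
pmf_hmm <- hmm_sum_pmf(10, pi0, A, e)
round(pmf_hmm, 4)
```

The step `alpha %*% A` is valid because, in an HMM, the next hidden state depends on the past only through the current hidden state. If the rows of `A` are all equal to `pi0`, the trials become independent and the result reduces to a binomial pmf, which gives a simple sanity check.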
