
For $i \in \{1, 2, \dots, K\}$, we have that $$X_i \sim \hbox{Bernoulli}(p_i), \quad 0 < p_i < 1.$$

I would like to calculate the probability mass function of the random variable $$S_K = \sum_{i=1}^{K} X_i.$$

I tried to do this recursively, i.e. in the following way \begin{align} \mathbb{P}(S_K = s) & = \mathbb{P}(S_{K} = s | S_{K-1} = s) \mathbb{P}(S_{K-1} = s) + \mathbb{P}(S_{K} = s | S_{K-1} = s-1) \mathbb{P}(S_{K-1} = s-1) \\ & = \mathbb{P}(X_K = 0)\mathbb{P}(S_{K-1} = s) + \mathbb{P}(X_K = 1)\mathbb{P} (S_{K-1} = s-1) \\ & = (1-p_K)\mathbb{P}(S_{K-1} = s) + p_K\mathbb{P} (S_{K-1} = s-1). \end{align}

But it led to discrepancies, and I'm not sure you can condition on the random variable $S_{K-1}$ in this way. Any ideas? Thanks!
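For reference, if the $X_i$ are independent, the recursion above is valid as written, and it can be evaluated by building up the pmf one trial at a time. A minimal R sketch under that independence assumption (which may not hold in the setting described here); the function name `pmf_sum_indep` and the example probabilities are hypothetical:

```r
# Sketch: pmf of S_K = X_1 + ... + X_K for INDEPENDENT X_i ~ Bernoulli(p_i).
# Independence is an assumption; pmf_sum_indep is a hypothetical helper name.
pmf_sum_indep <- function(p) {
  pmf <- 1                         # P(S_0 = 0) = 1
  for (pk in p) {
    # P(S_k = s) = (1 - p_k) P(S_{k-1} = s) + p_k P(S_{k-1} = s - 1)
    pmf <- c(pmf * (1 - pk), 0) + c(0, pmf * pk)
  }
  pmf                              # entry s + 1 holds P(S_K = s), s = 0, ..., K
}

p <- c(0.2, 0.5, 0.7)              # illustrative values only
pmf <- pmf_sum_indep(p)
```

Comparing this pmf with simulated frequencies is one way to see whether a discrepancy comes from dependence among the $X_i$ rather than from the recursion itself.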

  • You need to specify how they are correlated. For instance, does $p_i$ depend on the previous value of $X_i$, or on the previous several values of it? Is it random or deterministic in terms of them? (2017-01-11)
  • Can you describe the discrepancies? (2017-01-11)
  • So the $p_i$s don't directly depend on the $X_i$. Each $X_i$ is a realisation from a binary HMM, so the $p_i$s are dependent on a hidden Markov process. (2017-01-11)
  • The discrepancies come from simulations. (2017-01-11)
  • Ok. The first line is unassailable. But in the second line, when you separate out the $P(X_K=0),$ etc., they still need to be conditional on $S_{K-1},$ right? (2017-01-11)
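To spell out the last comment: without independence, the factors in the second line remain $\mathbb{P}(X_K = 0 \mid S_{K-1} = s)$ and $\mathbb{P}(X_K = 1 \mid S_{K-1} = s-1)$, and these equal $1 - p_K$ and $p_K$ for every $s$ exactly when $X_K$ is independent of $S_{K-1}$. For an HMM, one possible workaround is to run the recursion on the joint probability of the hidden state and the partial sum. The sketch below assumes a finite hidden chain with initial distribution `init`, transition matrix `A` (rows sum to 1) and emission probabilities `e[h]` $= \mathbb{P}(X_k = 1 \mid H_k = h)$; all of these names, and the function `pmf_sum_hmm`, are hypothetical and not taken from the question:

```r
# Sketch: pmf of S_K when X_1, ..., X_K are emitted by a hidden Markov chain.
# Assumed (hypothetical) inputs: init[h] = P(H_1 = h), A[h, h2] = P(H_{k+1} = h2 | H_k = h),
# e[h] = P(X_k = 1 | H_k = h). Requires K >= 1.
pmf_sum_hmm <- function(init, A, e, K) {
  n <- length(init)
  f <- matrix(0, nrow = n, ncol = K + 1)   # f[h, s + 1] = P(H_k = h, S_k = s)
  f[, 1] <- init * (1 - e)                 # k = 1, X_1 = 0
  f[, 2] <- init * e                       # k = 1, X_1 = 1
  if (K >= 2) for (k in 2:K) {
    pred <- t(A) %*% f                     # move the hidden state one step forward
    stay <- pred * (1 - e)                 # X_k = 0: the sum is unchanged
    up   <- cbind(0, pred[, -(K + 1), drop = FALSE]) * e   # X_k = 1: the sum shifts by one
    f <- stay + up
  }
  colSums(f)                               # marginalise the hidden state: P(S_K = s)
}

# Illustrative two-state example (values are made up)
A <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE)
pmf <- pmf_sum_hmm(init = c(0.5, 0.5), A = A, e = c(0.2, 0.8), K = 5)
```

The design choice is that the pair $(H_k, S_k)$ is itself a Markov chain even though $S_k$ alone is not, so conditioning on the pair makes each step of the recursion legitimate.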

1 Answer

Comment: The distribution of $S$ depends on how the $p_i$ are chosen. You say they are simulated, but you do not say according to what distribution or process. The beta family of distributions is a reasonable choice for modeling probabilities, because its support is $(0,1).$

I used $k = 10,$ and simulated distributions of $S$ for four beta distributions of different shapes, with R statistical software:

(1) $Beta(1,1)=Unif(0,1),$ giving $E(S) \approx 5.0,\,SD(S) \approx 1.58.$

m = 10^6;  k = 10;  p = rbeta(m*k,1,1); x = rbinom(m*k, 1, p)
DTA = matrix(x, nrow=m)  # m x k matrix, each row has k = 10 Bernoulli trials
s = rowSums(DTA)         # m sums of k
mean(s);  sd(s)
## 5.000446
## 1.580513

(2) $Beta(1,10),$ giving $E(S) \approx 0.91,\,SD(S) \approx 0.91.$

(3) $Beta(10,1),$ giving $E(S) \approx 9.1,\,SD(S) \approx 0.91.$

(4) $Beta(10,10),$ giving $E(S) \approx 5.0,\,SD(S) \approx 1.58.$
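Code is shown only for case (1); presumably the other cases differ only in the beta shape parameters. A sketch that wraps the same steps in a function so all four cases can be rerun (the helper name `sim_sum`, the smaller default `m`, and the seed are assumptions made here for illustration):

```r
# Sketch: repeat the simulation for arbitrary beta shape parameters (a, b).
# sim_sum is a hypothetical helper; m defaults to 10^5 to keep the run short.
sim_sum <- function(a, b, m = 10^5, k = 10) {
  p <- rbeta(m * k, a, b)          # a fresh p for every Bernoulli trial
  x <- rbinom(m * k, 1, p)         # Bernoulli(p) draws
  rowSums(matrix(x, nrow = m))     # m sums of k trials each
}

set.seed(1)                        # arbitrary seed, for reproducibility only
for (ab in list(c(1, 1), c(1, 10), c(10, 1), c(10, 10))) {
  s <- sim_sum(ab[1], ab[2])
  cat(sprintf("Beta(%g,%g): mean = %.2f, sd = %.2f\n",
              ab[1], ab[2], mean(s), sd(s)))
}
```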

Histograms of the results are shown below; each is based on a sample of a million sums of ten Bernoulli trials (with varying beta-distributed $p_i$).

[Figure: histograms of the simulated sums $S$ for the four beta models]

Note: (a) I have no idea whether you have an application in mind, or if so, whether beta distributions are suitable. It might be interesting to see whether these are binomial distributions and to derive the means and variances.

(b) In my beta models the ten Bernoulli trials in each sum are independent. Some simple Markovian model (e.g., Ehrenfest urn, genetic inheritance model, or DNA sequencing model) might also be used.
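On note (a): assuming the scheme in the R code above, where a fresh $p_i \sim Beta(a,b)$ is drawn independently for every trial, each $X_i$ is marginally Bernoulli with success probability $E(p_i)$ and the $X_i$ are independent, so the sum should be binomial. A short derivation under that assumption:

\begin{align} \mathbb{P}(X_i = 1) & = E(p_i) = \frac{a}{a+b}, \\ S & \sim \hbox{Binomial}\left(k, \frac{a}{a+b}\right), \\ E(S) & = \frac{ka}{a+b}, \qquad Var(S) = \frac{kab}{(a+b)^2}. \end{align}

With $k = 10$ this gives $E(S) = 5,\,SD(S) = \sqrt{2.5} \approx 1.58$ for $Beta(1,1)$ and $Beta(10,10),$ $E(S) = 10/11 \approx 0.91$ for $Beta(1,10),$ and $E(S) = 100/11 \approx 9.1$ for $Beta(10,1),$ the last two with $SD(S) = \sqrt{100/121} \approx 0.91.$ These values agree with the simulated ones.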