This article states that the formula for the average run lengths for large numbers of trials is:$\frac{1}{1-Pr(event\ in\ one\ trial)}.$
My questions
- What is the intuition behind this formula?
- Do you know an elementary proof for this result?
This article states that the formula for the average run lengths for large numbers of trials is:$\frac{1}{1-Pr(event\ in\ one\ trial)}.$
My questions
This is one of the workhorses of first courses in probability. Assume one repeats an experiment whose probability to happen in one trial is $p$, that the results of the successive trials are independent, and that one wishes to estimate the mean number $R$ of consecutive successes before the next failure (that is, a run of successes) once a success occurred.
For every $r\geqslant1$, the event $[R\geqslant r]$ corresponds to $r-1$ supplementary successes after the first one, hence $\mathrm P(R\geqslant r)=p^{r-1}$. Thus the distribution of $R$ is geometric with parameter $p$ and the mean length of a run is the expectation of $R$, that is, $ \langle R\rangle=\sum_{r\geqslant1}r\cdot\mathrm P(R=r)=\sum_{r\geqslant1}\mathrm P(R\geqslant r)=\sum_{s\geqslant0}p^s=\frac1{1-p}. $ Edit Alternatively, note that either a failure occurs immediately after the first success and then $R=1$, which happens with probability $1-p$, or a second success occurs immediately after the first success and then $R=1+R'$ where $R'$ is distributed like $R$, which happens with probability $p$. Taking expectations, one sees that $\langle R\rangle=(1-p)\cdot1+p\cdot(1+\langle R\rangle)$, that is, $\langle R\rangle=1/(1-p)$.
The run is ended by a failure, and the probability of failure is $1-p$, where $p$ is the probability of success. Since the density of failures in the sequence of results is $1-p$, the average length of a run ending in a failure must be the reciprocal of that. The failure itself isn't part of the run, so you might think you'd have to subtract $1$ from this result; however, if you're calculating the average length of a run, the fact that you're only looking at runs already guarantees that there's at least one success, and this success that you get "for free" offsets the failure to be subtracted.