
I want to know how to go about proving that the Good-Turing estimator has a total probability of $1$. I have seen this proof (page 2), but I found the first step unclear:

$$\sum_j \theta[j] = \sum_r \theta[r]N_r = \frac{1}{N}\sum_r \left[(r+1) \frac{N_{r+1}}{N_r}\right]N_r$$

Here $\theta[j]$ is the probability of the $j$-th $n$-gram, $\theta[r]$ is the probability of an $n$-gram that occurred $r$ times, and $N_r$ is the number of $n$-grams that occur $r$ times.

Since $\sum_r(r+1)N_{r+1} = \sum_r rN_r = N$, it's more or less straightforward that the whole thing sums to $1$. However, as I said, I don't understand the first part: $\sum_j \theta[j] = \sum_r \theta[r]N_r$. What is going on there?
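For concreteness, here is a quick numerical sanity check of that last expression (a minimal Python sketch with a made-up sample; the tokens are just placeholders): the mass $N_r\,\theta[r]=(r+1)N_{r+1}/N$ assigned to each count class, summed over $r\ge 0$, comes out to $1$.

```python
from collections import Counter

# A made-up sample W (any list of tokens would do).
W = ["a", "b", "a", "c", "b", "a", "d", "e", "e", "f"]

counts = Counter(W)            # how many times each word appears
N = sum(counts.values())       # total sample size
Nr = Counter(counts.values())  # N_r = number of words appearing exactly r times

# Good-Turing assigns total mass N_r * theta[r] = (r + 1) * N_{r+1} / N to the
# words seen r times (r = 0 is the mass reserved for unseen words).
# Summing over r telescopes to (1/N) * sum_r r * N_r = 1.
total = sum((r + 1) * Nr[r + 1] / N for r in range(0, max(Nr) + 1))
print(total)  # 1.0 (up to floating-point rounding)
```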

Thanks in advance.

1 Answer


$\theta[j]$ is the probability that a future sample will be $x_j$, and $\theta(r)$ is the probability of a word occurring given that it appeared $r$ times in the sample $W$. Let $J(r)=\{j:x_j\text{ appeared }r\text{ times in }W\}$. It’s assumed (near the bottom of page 1) that if $j,j\,'\in J(r)$, then $\theta[j]=\theta[j\,']$, and this common value is $\theta(r)$. Thus,

$$\begin{align*} \sum_j\theta[j]&=\sum_r\sum_{j\in J(r)}\theta[j]\\ &=\sum_r\theta(r)|J(r)|\\ &=\sum_r\theta(r)N_r\;. \end{align*}$$
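To make that regrouping concrete, here is a small Python sketch (the counts and the $\theta(r)$ values are hypothetical, chosen only for illustration): summing $\theta[j]$ word by word gives the same total as summing $\theta(r)\,N_r$ over the count classes, because each class $J(r)$ contributes its common value $\theta(r)$ exactly $N_r=|J(r)|$ times.

```python
from collections import Counter

# Hypothetical word counts in the sample W (for illustration only).
counts = {"a": 3, "b": 2, "c": 1, "d": 1, "e": 2, "f": 1}

# Hypothetical common values theta(r), one per observed count r.
theta = {1: 0.08, 2: 0.15, 3: 0.20}

# Left-hand side: sum theta[j] word by word; by assumption the probability of
# word j depends only on how many times j appeared.
lhs = sum(theta[counts[j]] for j in counts)

# Right-hand side: group the words into the classes J(r) first, so each
# theta(r) is counted N_r = |J(r)| times.
Nr = Counter(counts.values())  # N_r = number of words appearing exactly r times
rhs = sum(theta[r] * Nr[r] for r in Nr)

print(lhs, rhs)  # both equal 3*0.08 + 2*0.15 + 1*0.20 = 0.74
```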

  • Thanks. A question: how do you think about $\sum_{r} \sum_{j\in J(r)}\theta[j]$ so that it's obvious that the left side, the sum of all those probabilities, equals the right side, the sum of the probabilities of the elements that appeared $r$ times, summed over each possible $r$? (2012-08-28)
  • @Robert: It’s the same kind of rearrangement as in $$\begin{align*}2+3+1+3+2+1+1+1+2&=(1+1+1+1)+(2+2+2)+(3+3)\\&=4\cdot1+3\cdot2+2\cdot 3\;,\end{align*}$$ gathering together like terms. (2012-08-28)
  • Yes, I know, but for some reason it doesn't feel obvious. I will think a bit more about it. (2012-08-28)
  • Now it feels obvious. Thanks! (2012-08-28)
  • @Robert: :-) You’re welcome. (It may help that I’ve seen many very similar manipulations.) (2012-08-28)