
Given a string of random symbols with an a priori unknown distribution, what are the known algorithms for computing its Shannon entropy?

$H = - \sum_i \; p_i \log p_i$

Is there an algorithm to compute it without calculating the probabilities $p_i$ first? Having calculated the entropy $H_n$ of the first $n$ symbols, can I find the entropy $H_{n+m}$ of all $n+m$ symbols, knowing only $H_n$ about the first $n$ symbols?
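For reference, the straightforward plug-in estimator does compute the $p_i$ (as empirical frequencies) first. A minimal Python sketch, using base-2 logarithms so that $H$ comes out in bits:

```python
from collections import Counter
from math import log2

def empirical_entropy(symbols):
    """Plug-in estimate: count symbol frequencies, then H = -sum_i p_i log p_i."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(empirical_entropy("aabb"))  # 1.0 bit (two equally likely symbols)
print(empirical_entropy("aaaa"))  # 0.0 (a constant string has zero entropy)
```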

  • If you calculate the $p_i$ over all $m+n$ symbols, then you have $H_n=\frac n{m+n}H_{m+n}$, but I suspect that is not the type of result you want. If the $p_i$ are almost correct, it will almost scale. But I don't think there is an easy middle ground. (2012-02-02)

1 Answer


Is there an algorithm to compute it without calculating the probabilities $p_i$ first?

I doubt it, but I don't have a proof.

Having calculated the entropy $H_n$ of the first $n$ symbols can I find the entropy $H_{n+m}$ of the $n+m$ symbols (knowing about the first $n$ symbols only $H_n$)?

No. Suppose $H_n = 0$, so the first $n$ symbols are all identical copies of some symbol, and the final $m$ symbols are $b\ldots b$. If the first $n$ symbols are also $b$'s, then $H_{n+m} = 0$; if they are some other symbol $a \neq b$, then $H_{n+m} = -\frac{n}{n+m}\log\frac{n}{n+m} - \frac{m}{n+m}\log\frac{m}{n+m}$. Knowing only $H_n$, you cannot tell which case you are in.
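To see the counterexample numerically, here is a small Python check (reusing the plug-in estimator sketched in the question; the choice $n = m = 4$ and the symbols $a$, $b$ are arbitrary):

```python
from collections import Counter
from math import log2

def H(s):
    # plug-in entropy in bits
    c = Counter(s)
    return -sum((k / len(s)) * log2(k / len(s)) for k in c.values())

n, m = 4, 4
print(H("b" * n))            # H_n = 0 in both scenarios: all n symbols identical
print(H("b" * n + "b" * m))  # case 1: H_{n+m} = 0
print(H("a" * n + "b" * m))  # case 2: H_{n+m} = 1 bit
```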

  • @Yrogirg, if you define "good" tightly enough and have a sufficiently precise value of $H_n$, then you might be able to recover the frequencies and calculate the true probabilities; otherwise, don't expect much joy. (2012-02-02)