
Given a string of random symbols with an a priori unknown distribution, what are the known algorithms for computing its Shannon entropy?

$$H = - \sum_i \; p_i \log p_i$$

Is there an algorithm to compute it without calculating the probabilities $p_i$ first? Having calculated the entropy $H_n$ of the first $n$ symbols, can I find the entropy $H_{n+m}$ of the first $n+m$ symbols, knowing only $H_n$ about the first $n$?
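For reference, the straightforward baseline is the plug-in estimate, which does compute the $p_i$ first from empirical counts. A minimal Python sketch (base-2 logarithms; the function name is my own):

```python
from collections import Counter
from math import log2

def shannon_entropy(symbols):
    """Plug-in estimate: empirical p_i from counts, then H = -sum p_i log2 p_i."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(shannon_entropy("aabbba"))  # p_a = p_b = 1/2, so H = 1.0 bit per symbol
```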

  • If you assume the probabilities you found for the first $n$ symbols are correct for the next $m$, the entropy scales with the number of symbols, so you will have $H_{m+n}=\frac{m+n}{n}H_n$. (2012-02-01)
  • @RossMillikan And what if they are almost correct, but I want to improve them with the next $m$ symbols? (See the streaming sketch after these comments.) (2012-02-02)
  • If you calculate the $p_i$ over all $m+n$ symbols, then you have $H_n=\frac{n}{m+n}H_{m+n}$, but I suspect that is not the type of result you want. If they are almost correct, it will almost scale. But I don't think there is an easy middle ground. (2012-02-02)
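On the point raised in the second comment: $H_n$ alone cannot be updated exactly (see the answer below), but keeping the symbol counts alongside it makes an exact incremental update cheap. Writing $H = \log_2 n - \frac{1}{n}\sum_i c_i \log_2 c_i$ for counts $c_i$, only the running sum $\sum_i c_i \log_2 c_i$ needs adjusting per symbol. A sketch under that bookkeeping (the class name is my own):

```python
from math import log2

class StreamingEntropy:
    """Keep counts and the running sum S = sum_i c_i*log2(c_i);
    the empirical per-symbol entropy is then H = log2(n) - S/n,
    and each new symbol costs O(1) to absorb."""

    def __init__(self):
        self.counts = {}
        self.n = 0
        self.s = 0.0  # S = sum of c * log2(c) over all symbols seen

    def push(self, symbol):
        c = self.counts.get(symbol, 0)
        if c > 0:
            self.s -= c * log2(c)  # retract the old term for this symbol
        c += 1
        self.counts[symbol] = c
        self.s += c * log2(c)
        self.n += 1

    def entropy(self):
        return log2(self.n) - self.s / self.n if self.n else 0.0

est = StreamingEntropy()
for ch in "aabbbabaab":
    est.push(ch)
print(est.entropy())  # 1.0: five a's and five b's so far
```

Note this stores the full count table, not just $H_n$; the answer below shows why $H_n$ by itself is not enough.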

1 Answer


Is there an algorithm to compute it without calculating the probabilities $p_i$ first?

I doubt it, but I don't have a proof.

Having calculated the entropy $H_n$ of the first $n$ symbols, can I find the entropy $H_{n+m}$ of the first $n+m$ symbols, knowing only $H_n$ about the first $n$?

No. Suppose $H_n = 0$, i.e. the first $n$ symbols are all copies of a single symbol, and the final $m$ symbols are $b\ldots b$. Knowing only $H_n$, you cannot tell whether that repeated symbol was $b$, giving $H_{n+m} = 0$, or some other symbol $a$, giving $$H_{n+m} = -\sum_{i\in\{n,m\}} \frac{i}{n+m} \log \frac{i}{n+m}.$$
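A quick numeric check of this counterexample, reusing the plug-in formula (throwaway Python, base 2):

```python
from collections import Counter
from math import log2

def H(s):
    """Empirical per-symbol entropy of the string s."""
    return -sum((c / len(s)) * log2(c / len(s)) for c in Counter(s).values())

n, m = 4, 3
print(H("bbbb"), H("aaaa"))   # both prefixes have H_n = 0
print(H("bbbb" + "bbb"))      # H_{n+m} = 0
print(H("aaaa" + "bbb"))      # H_{n+m} = -(4/7)log2(4/7) - (3/7)log2(3/7) ≈ 0.985
```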

  • Assume that the input string is in some sense "good", i.e. $[a,a,b,b,b,a,b,a,a,b,\ldots]$ and not $[a,\ldots,a,b,\ldots,b]$. What then? (2012-02-02)
  • @Yrogirg, if you define "good" tightly enough and have a sufficiently precise value of $H_n$, then you might be able to recover the frequencies and calculate the true probabilities; otherwise, don't expect much joy. (A bisection sketch for the two-symbol case follows.) (2012-02-02)
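To illustrate the last comment in the simplest setting, a two-symbol alphabet: there $H_n$ determines the frequency pair up to swapping the two symbols, since the binary entropy function is strictly increasing on $[0,\tfrac12]$. A sketch that recovers $p$ by bisection (function names and tolerance are my own choices):

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p log2(p) - (1-p) log2(1-p), with the convention H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def invert_binary_entropy(h, tol=1e-12):
    """Return p in [0, 1/2] with binary_entropy(p) = h, by bisection
    (H is increasing there); the other consistent frequency is 1 - p."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = invert_binary_entropy(binary_entropy(0.3))
print(p)  # ~0.3: frequencies recovered up to swapping the two symbols
```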