
In this paper on page 4, it is said that $H(x_0 \mid P) = -\log P(x_0)$, provided that $P(x_0)$ is 1 and $P(x) = 0$, $\forall x \neq x_0$.

I thought that $-P(x_0)\log P(x_0)$ is $0$, since $\log P(x_0)$ is $0$, but later the author says that "$H(x_0 \mid P)$ is large in this case". Why?

1 Answer

That doesn't seem to be a "tutorial" but a paper. And the concepts and notation look rather strange to me. One does not normally speak of the entropy of "a message", but of a source. Indeed, the definition of the entropy (of a source, or of a given probability function $P$) is

$$H(P)=-\sum_i P(i) \, \log P(i)$$ which is the same as the expected value of the quantity $I_i=-\log P(i)$. The above quantity $I_i$ is usually known as the "information content" of the message $i$. Putting all that together, the entropy is the average of the "information content" of each message.
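That definition is easy to check numerically. A minimal sketch (the function name `entropy` is my own choice, not from the paper):

```python
import math

# Shannon entropy of a finite distribution P, in bits:
# H(P) = -sum_i P(i) * log2(P(i)).
# Zero-probability outcomes are skipped, since their contribution
# is taken as 0 by the limit x*log(x) -> 0 as x -> 0.
def entropy(P):
    return -sum(p * math.log2(p) for p in P if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1 bit
print(entropy([1.0, 0.0]))   # certain outcome: 0 bits
```

The degenerate distribution in the question ($P(x_0)=1$, everything else $0$) gives entropy $0$, as expected.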

What the paper (rather unorthodoxly) calls "the entropy of a message" is then the "information content" (or self-information, or surprisal measure) of a message.

And, then, yes, a message with $P(x_0) \to 0$ has $I_{x_0} \to \infty$ - which is to say that a very improbable message has an enormous information content (as intuition suggests). Of course, when one computes the true entropy as $H=E(I_x)$, this enormous information content gets multiplied by a small probability, and because $x \log x \to 0$ as $x \to 0$, the contribution of this message to the entropy tends to zero.
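The two limits above can be seen side by side in a few lines (the helper name `surprisal` is mine, not the paper's):

```python
import math

# Self-information (surprisal) of an outcome with probability p, in bits.
def surprisal(p):
    return -math.log2(p)

# As p -> 0 the surprisal I(p) grows without bound, but its weighted
# contribution p * I(p) to the entropy shrinks toward zero.
for p in (0.5, 0.1, 1e-3, 1e-9):
    print(f"p={p:g}  I(p)={surprisal(p):.3f}  p*I(p)={p * surprisal(p):.3e}")
```

Running it shows $I(p)$ increasing while $p \cdot I(p)$ decreases, which is exactly the "large information content, negligible entropy contribution" point.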

  • OK. I have updated it. 2017-02-09