That doesn't seem to be a "tutorial" but a paper. And the concepts and notation look rather strange to me. One does not normally speak of the entropy of "a message", but of a source. Indeed, the definition of the entropy (of a source, or of a given probability function $P$) is
$$H(P)=-\sum_i P(i) \, \log P(i)$$
which is the same as the expected value of the quantity $I_i=-\log P(i)$.
The above quantity $I_i$ is usually known as the "information content" of the message $i$. Putting all that together, the entropy is the average of the "information content" of each message.
What the paper (rather unorthodoxly) calls "the entropy of a message" is then the "information content" (or self-information, or surprisal) of a message.
And, then, yes, a message with $P(x_0) \to 0$ has $I_{x_0} \to \infty$ - which is to say that a very improbable message has an enormous information content (as intuition suggests). Of course, when one computes the true entropy as $H=E(I_x)$, this enormous information content gets multiplied by a small probability, and because $x \log x \to 0$ as $x \to 0^+$, the contribution of this message to the entropy tends to zero.
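To make the distinction concrete, here is a minimal Python sketch (the names `surprisal` and `entropy` are mine, not from the paper) showing that a rare outcome has a large information content but a negligible contribution to the entropy:

```python
import math

def surprisal(p):
    """Information content (self-information) of an outcome with probability p, in bits."""
    return -math.log2(p)

def entropy(probs):
    """Shannon entropy: the expected surprisal, in bits.
    Outcomes with p == 0 contribute nothing, by the limit x*log(x) -> 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

p = 1e-9
# A very improbable message has a huge information content...
print(surprisal(p))        # ~29.9 bits
# ...but its contribution p * I(p) to the entropy is tiny:
print(p * surprisal(p))    # ~3e-8 bits

# Sanity check: a fair coin has entropy exactly 1 bit
print(entropy([0.5, 0.5]))
```

Note that `entropy` skips zero-probability terms explicitly, mirroring the convention $0 \log 0 = 0$ justified by that limit.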