
Shannon formally defined the amount of information in a message as a function of the probability of occurrence of each possible message [1]. Given a universe of messages $\mathbf{M} = \{ m_1, m_2, \ldots, m_n \}$ and a probability $p(m_i)$ for the occurrence of each message, the expected information content (entropy) of a message in $\mathbf{M}$ is given by:

$H(\mathbf{M}) = \sum\limits_{i=1}^{n} -p(m_{i})\log_{2}(p(m_{i}))$
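For example (a quick numerical illustration, not taken from [1]): for a fair coin, $\mathbf{M} = \{\text{heads}, \text{tails}\}$ with $p(m_i) = \tfrac{1}{2}$ each, the formula gives $-\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1$ bit.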

How is this formula derived? Why did Shannon use $\log_2$ in it?

[1] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656, 1948.

  • bits, nats, whatever (2011-03-06)

2 Answers

7

Shannon proves in his paper that in order to encode a sequence of $n$ messages drawn from $\mathbf{M}$ you need roughly $nH(M)$ bits. You can take that as the definition of entropy, and then derive Shannon's formula.
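As a rough illustration of that claim (a minimal sketch, not Shannon's construction; the example distribution and helper names are my own), the following Python snippet compares $H(M)$ with the average codeword length of a Huffman code for the same source. For a long i.i.d. sequence of $n$ messages, the Huffman-coded length is then about $n$ times that average, i.e. roughly $nH(M)$ bits:

```python
import heapq
import math

def entropy(probs):
    """H(M) = -sum p_i * log2(p_i), in bits per message."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_code_lengths(probs):
    """Return the codeword length (in bits) Huffman coding assigns to each message."""
    # Heap items: (probability, unique tie-breaker, list of (message index, depth)).
    heap = [(p, i, [(i, 0)]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)
        p2, _, leaves2 = heapq.heappop(heap)
        # Merging two subtrees pushes all their leaves one level deeper.
        merged = [(i, d + 1) for i, d in leaves1 + leaves2]
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    _, _, leaves = heap[0]
    lengths = [0] * len(probs)
    for i, d in leaves:
        lengths[i] = d
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]   # arbitrary example source distribution
lengths = huffman_code_lengths(probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))
print(f"H(M)               = {entropy(probs):.3f} bits/message")
print(f"Huffman avg length = {avg_len:.3f} bits/message")
# For this dyadic distribution the two coincide exactly; in general
# H(M) <= avg_len < H(M) + 1, so coding n messages takes roughly n*H(M) bits.
```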

There are also axiomatic derivations of $H(M)$: you start from a few properties you expect an entropy function to have, and you end up with exactly this formula. I think that derivation is also in Shannon's paper.
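For reference, the three conditions Shannon imposes are roughly (from Appendix 2 of [1]):

1. $H$ should be continuous in the $p(m_i)$;
2. if all messages are equally likely, $p(m_i) = 1/n$, then $H$ should increase monotonically with $n$;
3. if a choice is broken down into two successive choices, $H$ should be the weighted sum of the individual values of $H$.

The only functions satisfying all three are of the form $H = -K\sum_i p(m_i)\log p(m_i)$ with $K > 0$; choosing the base-2 logarithm just fixes the unit (bits).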

The logarithm comes in because representing $T$ different messages with binary codewords requires at least $\log_2 T$ bits.
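Concretely (a small arithmetic check, not part of the original answer): with $T = 2^k$ equally likely messages, each has probability $p(m_i) = 1/T$, so the formula gives $\sum_{i=1}^{T} -\tfrac{1}{T}\log_2\tfrac{1}{T} = \log_2 T = k$ bits, which is exactly the length of a fixed-length binary code for $T$ messages.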

2

Toss a coin $n$ times. The probability of the outcome you get is $p=(1/2)^n$. The number of bits of information is $n$, which is $-\log_2 p$.
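Spelling out that last step: $-\log_2 p = -\log_2\big((1/2)^n\big) = -n\log_2\tfrac{1}{2} = n$ bits.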