
Exercise 2.27 in Elements of Information Theory (2nd ed.) reads:

Let $\mathbf{p} = (p_{1}, p_{2}, \ldots, p_{m})$ be a probability distribution on $m$ elements (i.e., $p_{i} \geq 0$ and $\sum_{i=1}^{m} p_{i} = 1$). Define a new distribution $\mathbf{q}$ on $m - 1$ elements as $q_{1} = p_{1}, q_{2} = p_{2}, \ldots, q_{m-2} = p_{m-2}$, and $q_{m-1} = p_{m-1} + p_{m}$ [i.e., the distribution $\mathbf{q}$ is the same as $\mathbf{p}$ on $\{1, 2, \ldots, m - 2\}$, and the probability of the last element in $\mathbf{q}$ is the sum of the last two probabilities of $\mathbf{p}$]. Show that $ H(\mathbf{p}) = H(\mathbf{q}) + (p_{m-1} + p_{m})H\left( \frac{p_{m-1}}{p_{m-1} + p_{m}}, \frac{p_{m}}{p_{m-1} + p_{m}} \right). $

Question 1. What does $H\left( \frac{p_{m-1}}{p_{m-1} + p_{m}}, \frac{p_{m}}{p_{m-1} + p_{m}} \right)$ mean in this context?

Question 2. Where is this notation introduced in the text?

Please do not provide a solution to the exercise.

  • Further, near Example 4.2.1, the book also expresses the entropy rate of a $2$-state Markov chain as $H(\mathcal{X}) = [\beta H(\alpha) + \alpha H(\beta)]/(\alpha + \beta)$. Apparently, $H(\alpha) \equiv H(\alpha, 1-\alpha)$. 2016-07-08

1 Answer


To define the entropy $H$, all you need is a probability distribution; that is, a vector of nonnegative real numbers summing to $1$. In this case, it happens to be the vector $\mathbf r = (a,b)$ with $ a := \frac{p_{m-1}}{p_{m-1}+p_m}, b := \frac{p_m}{p_{m-1} + p_m}. $

Note that before defining $H(\mathbf r) = H(a,b)$, you must still verify that $\mathbf r = (a,b)$ is a well-defined probability distribution. That is, you should check that: $a \geq 0, b \geq 0$, and $a+b = 1$. (Easy exercise!)

Added: The OP remarked that the book defines $H(X)$ for a random variable $X$, but not $H(\mathbf p)$ for a distribution vector $\mathbf p$, so, as per @Didier's comment below, I originally defined the notation $H(\mathbf p)$ explicitly. As it turns out, the book does define this notation after all; see the comments below. I will keep the explicit definition in my answer anyway.

For a probability distribution $\mathbf p$, we can define $H(\mathbf p)$ by the familiar Shannon formula: $ H(\mathbf p) := \sum_{i=1}^{n} p_i \log_2 \left(\frac{1}{p_i}\right). $ We can also relate this definition to the entropy of a random variable. Imagine a random variable $X$ with support $\{ 1,2, \ldots, n \}$ such that $\Pr[X = i] = p_i$. Then directly from the two definitions, we see that $H(X)$ is equal to $H(\mathbf p)$.
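The Shannon formula above is straightforward to compute directly. Below is a minimal sketch of $H(\mathbf p)$ as a function of a distribution vector, applied to the two-element vector $\mathbf r = (a, b)$ from the answer with the illustrative (hypothetical) values $p_{m-1} = 0.1$ and $p_m = 0.3$; the function name `H` and these numbers are my own choices, not from the book.

```python
import math

def H(p):
    """Shannon entropy (in bits) of a probability distribution, given as a
    sequence of nonnegative reals summing to 1. Terms with p_i = 0 contribute
    nothing, by the usual convention 0 * log(1/0) = 0."""
    assert all(x >= 0 for x in p), "probabilities must be nonnegative"
    assert abs(sum(p) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(x * math.log2(1.0 / x) for x in p if x > 0)

# Illustrative values: p_{m-1} = 0.1, p_m = 0.3 (hypothetical numbers).
s = 0.1 + 0.3
r = (0.1 / s, 0.3 / s)   # r = (a, b) = (0.25, 0.75); note a + b = 1
print(H(r))              # entropy of the binary distribution (0.25, 0.75)
```

Note that the `assert` lines carry out exactly the "easy exercise" mentioned above: they check that the input is a well-defined probability distribution before the entropy is evaluated.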

  • @Srivatsan You're right, thanks. 2011-09-06