5

According to the Wikipedia article on conditional entropy, $\sum_{x,y} p(x,y)\log p(x)=\sum_x p(x)\log p(x)$. Can someone please explain how?

3 Answers

4

$$\sum_{x,y} p(x,y)\log p(x)=\sum_x\sum_yp(x,y)\log p(x)=\sum_x\left(\sum_yp(x,y)\right)\log p(x)=\sum_x p(x)\log p(x)$$
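As a quick numerical sanity check of this chain of equalities, here is a minimal Python/NumPy sketch; the $2\times 3$ joint table is an arbitrary made-up example, not anything from the question:

```python
import numpy as np

# A made-up 2x3 joint distribution p(x, y): rows index x, columns index y.
p_xy = np.array([[0.10, 0.05, 0.15],
                 [0.20, 0.30, 0.20]])

# Marginalise y out: p(x) = sum_y p(x, y).
p_x = p_xy.sum(axis=1)

# Left-hand side: sum over all (x, y) of p(x, y) * log p(x).
lhs = np.sum(p_xy * np.log(p_x)[:, None])

# Right-hand side: sum over x of p(x) * log p(x).
rhs = np.sum(p_x * np.log(p_x))

print(np.isclose(lhs, rhs))  # True
```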

  • 0
    @joriki: I still didn't get it. Can you please give a little more detail? Maybe I'm missing something very basic here, some concept of probability. (2012-01-27)
  • 2
    @Buxme In the step $\sum \limits_{y} p(x,y) = p(x)$, you could think of the variable $y$ as being [marginalised out](http://en.wikipedia.org/wiki/Marginal_distribution). (2012-01-27)
  • 1
    @Buxme: Which of the three steps are you having trouble with? (2012-01-27)
  • 0
    @joriki: How did $y$ get away? The step Srivatsan mentioned in the second comment. (2012-01-27)
  • 0
    @Buxme: The other answers explain this step, one in general and one by example. The probability of obtaining $x$ is the probability of obtaining $x$ and any arbitrary value of $y$, and this is just the sum of all probabilities of obtaining $x$ and particular values of $y$. (2012-01-27)
  • 1
    @Buxme: That step is essentially the [law of total probability](http://en.wikipedia.org/wiki/Law_of_total_probability). The Wikipedia page contains a detailed explanation. Read it and get back to us if you still don't understand. (2012-01-27)
  • 0
    Think of $y$ as an index variable in the joint distribution: you sum it all out and are left with only the marginal on $x$. (2012-08-26)
3

I find the notation irritating, in that the same letter, $p$, is used to refer to two or more different functions.

I could write an algebraic proof, but I wonder if a concrete example might not shed more light. Suppose we have
$$
\begin{align}
P(X=0\ \&\ Y=0) = 1/10 & & P(X=0\ \&\ Y=1) = 2/10 \\ \\
P(X=1\ \&\ Y=0) = 3/10 & & P(X=1\ \&\ Y=1) = 4/10
\end{align}
$$
Then $P(X=0)=1/10+2/10=3/10$ and $P(X=1)=3/10+4/10=7/10$. So the first sum above is
$$
\underbrace{(1/10)\log (3/10) + (2/10)\log(3/10)} + \underbrace{(3/10)\log(7/10) + (4/10)\log(7/10)}.
$$
The second sum above is
$$
(3/10)\log(3/10) + (7/10)\log(7/10).
$$
So it's really just the distributive law.
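To double-check the arithmetic, here is the same example as a small Python sketch (assuming natural logarithms; any base gives the same equality):

```python
import math

# The joint table from the example above: p[(x, y)] = P(X=x & Y=y).
p = {(0, 0): 1/10, (0, 1): 2/10,
     (1, 0): 3/10, (1, 1): 4/10}

# Marginals P(X=x) = sum_y P(X=x & Y=y): here P(X=0)=3/10, P(X=1)=7/10.
p_x = {x: sum(v for (xx, _), v in p.items() if xx == x) for x in (0, 1)}

# The two sums from the answer.
lhs = sum(v * math.log(p_x[x]) for (x, _), v in p.items())
rhs = sum(px * math.log(px) for px in p_x.values())

print(math.isclose(lhs, rhs))  # True
```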

3

In the article, $p(x,y)$ is summed (or integrated, in the continuous case) over all possible values of $y$ for a fixed value of $x$:

$\sum_{y\in\mathcal Y} p(x,y)= p(x)$
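
One way to see why this step holds is to factor the joint distribution with the chain rule, $p(x,y)=p(x)\,p(y\mid x)$, and use the fact that the conditional probabilities of $y$ given $x$ sum to $1$:

$$\sum_{y\in\mathcal Y} p(x,y)=\sum_{y\in\mathcal Y} p(x)\,p(y\mid x)=p(x)\sum_{y\in\mathcal Y} p(y\mid x)=p(x)\cdot 1=p(x).$$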