
What is the simplest proof that mutual information is always non-negative? i.e., $I(X;Y)\ge0$

  • 3
    Convexity of the function $t\mapsto t\log t$. 2012-06-17
  • 0
    In addition, the convexity argument requires that the coefficients in the linear combination sum to 1. Since p(x,y) is a probability distribution, it fulfills this condition. 2018-08-10

1 Answer


By definition, $$I(X;Y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log\left(\frac{p(x)p(y)}{p(x,y)}\right).$$ The negative logarithm is convex and $\sum_{x \in X} \sum_{y \in Y} p(x,y) = 1$, so by Jensen's inequality, $$I(X;Y) \geq -\log\left( \sum_{x \in X} \sum_{y \in Y} p(x,y)\, \frac{p(x)p(y)}{p(x,y)} \right) = -\log\left( \sum_{x \in X} \sum_{y \in Y} p(x)p(y)\right) = -\log 1 = 0.$$ Q.E.D.
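The inequality is easy to check numerically. Below is a minimal sketch (using NumPy, with a hypothetical helper `mutual_information`) that computes $I(X;Y)$ from a joint distribution, adopting the standard convention $0\log 0 = 0$, and confirms it is non-negative for random joints and exactly zero when $X$ and $Y$ are independent:

```python
import numpy as np

rng = np.random.default_rng(0)

def mutual_information(p_xy):
    """I(X;Y) = sum_{x,y} p(x,y) log( p(x,y) / (p(x) p(y)) ), with 0 log 0 := 0."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x), shape (nx, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y), shape (1, ny)
    mask = p_xy > 0                         # skip zero-probability cells
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

# Random joint distributions: I(X;Y) >= 0 (up to floating-point noise).
for _ in range(1000):
    p = rng.random((4, 5))
    p /= p.sum()
    assert mutual_information(p) >= -1e-12

# Independent variables: the joint is the product of marginals, so I(X;Y) = 0.
px = np.array([0.2, 0.3, 0.5])
py = np.array([0.6, 0.4])
assert abs(mutual_information(np.outer(px, py))) < 1e-12
```

Equality holds in Jensen's inequality precisely when $p(x,y) = p(x)p(y)$ everywhere, i.e. when $X$ and $Y$ are independent, which the last check illustrates.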

  • 1
    What if the variables are continuous? 2016-01-13
  • 1
    @becko The same argument holds; you just have to replace the summations with integrals. 2016-01-13
  • 0
    Why must the sum of p(x,y) be 1? 2018-04-17
  • 1
    @Peter p(x,y) is a probability distribution, and so the sum of p(x,y) = 1. 2018-06-12