4
$\begingroup$

Problem statement:

The entropy of a probability vector $ p = (p_1, ... , p_n)^T $ is defined as $ H(p)= - \sum\limits_{i=1}^{n} p_i \log{p_i} $, subject to $ \sum\limits_{i=1}^{n} p_i = 1 \mbox{ and } p_i \geq 0$, where $ 0\log{0} = 0 $ by convention.

What is the largest and smallest entropy of any probability vector?

Discussion

I can maximise the expression using Lagrange multipliers without a problem. It's also equally easy to prove that the smallest entropy of any probability vector is 0, by noting that it must be non-negative, and providing an example with entropy 0.

However, I would very much like to prove that the minimum value is 0 using Lagrange multipliers (the fact that I can't seems to me to suggest that there is something that I haven't understood).

I set up the Lagrangian: $ L(p, \lambda) = - \sum p_i\log{p_i} + \lambda(\sum p_i - 1) $ and by differentiating to find stationary points obtain the system: $ {\frac{\partial{L}}{\partial{p_i}}}= -\log{p_i} - 1 + \lambda = 0 $. The obvious solution to this gives me my maximum point, but I am expecting another solution for the minimum.

  • Am I misusing Lagrange multipliers?
  • Is there another solution to my system that I am missing (maybe involving the "convention" about $ 0\log{0} = 0$)?
  • Do I need to worry about slackness and the condition $p_i \geq 0$ to find the minimum?
  • Is finding this minimum beyond the scope of this method for some reason, and my lecturers are in fact expecting me to resort to the "easy" method described above?

Any hints or answers are welcome, thank you in advance.

  • 0
    The inequality constraints do not appear in the Lagrangian, so can have no 'effect'. A bigger issue is that the objective is not differentiable at the boundary. In this case, it would be more straightforward to note that $H(p) \geq 0$, and that $H(p) = 0$ is only attained when each $p_i$ is 0 or 1.2012-10-03
  • 0
    Hi, thank you for your answer - I hadn't understood that the Lagrange method fails to provide extrema that lie on the boundary of the set of possible values for p.2012-10-03
  • 0
    There are two separate issues here. One is just the fact that the function is not differentiable on the whole domain. The second is that the inequality constraints $p_i \geq 0$ need to be incorporated. Handling the latter would involve a generalization using KKT conditions. Both are issues here.2012-10-03

2 Answers 2