
The goal is to maximize the following function: \begin{align} K_p(q) = q\log \frac{q}{p} + (1-q)\log \frac{1-q}{1-p} \end{align} where \begin{align} 0 \leq q \leq 1 \end{align} and $p \in (0,0.5)$ is some constant. In other words, I'm maximizing the KL divergence of one Bernoulli variable with success parameter $q$ with respect to another Bernoulli with success parameter $p$. The way I've set it up, I can see without formal calculations that it's maximized when $q=1$.
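(A quick numerical sanity check I'm adding, not part of the original question; the value $p=0.3$ is an arbitrary choice in $(0,0.5)$.) Evaluating $K_p$ on a grid confirms the eyeballed answer:

```python
import math

def kl_bernoulli(q, p):
    """KL divergence K_p(q) between Bernoulli(q) and Bernoulli(p).

    Uses the convention 0*log(0) = 0 at the endpoints q = 0 and q = 1.
    """
    s = 0.0
    if q > 0:
        s += q * math.log(q / p)
    if q < 1:
        s += (1 - q) * math.log((1 - q) / (1 - p))
    return s

p = 0.3  # arbitrary constant in (0, 0.5)
grid = [i / 1000 for i in range(1001)]
best_q = max(grid, key=lambda q: kl_bernoulli(q, p))
print(best_q, kl_bernoulli(best_q, p))  # maximised at q = 1, value -log(p)
```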

However, I'd like to use the KKT conditions to solve this problem formally and I'm running into problems doing that.

First, I set up the Lagrangian to minimize with the KKT conditions: \begin{align} \mathcal{L}(q,\mu,\lambda) & = -K_p(q) - \mu(1-q) - \lambda q \newline \textrm{where} \quad & \mu, \lambda \geq 0 \newline & 0 \leq q \leq 1 \newline & \mu(1-q) = 0 \newline & \lambda q = 0 \end{align} I then set the derivative in terms of $q$ to zero and plug it back in and get the following dual form: \begin{align} D(\mu,\lambda) & = -\mu + \log(1-p+pe^{\mu-\lambda}) \newline & = \log(e^{-\mu}-pe^{-\mu}+pe^{-\lambda}) \end{align} which I'm supposed to maximize.
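(A spot check I'm adding, not part of the question.) Solving $\partial\mathcal{L}/\partial q = 0$ with the sign conventions exactly as written gives the stationary point $q^* = pe^{\mu-\lambda}/(1-p+pe^{\mu-\lambda})$, and plugging it back into $\mathcal{L}$ should reproduce the closed-form dual above. A numerical check at random points:

```python
import math
import random

def kl_bernoulli(q, p):
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def lagrangian(q, p, mu, lam):
    # L(q, mu, lambda) = -K_p(q) - mu*(1 - q) - lambda*q, as set up above
    return -kl_bernoulli(q, p) - mu * (1 - q) - lam * q

def dual(p, mu, lam):
    # Closed form from the post: D(mu, lambda) = -mu + log(1 - p + p*e^{mu - lambda})
    return -mu + math.log(1 - p + p * math.exp(mu - lam))

random.seed(0)
for _ in range(5):
    p, mu, lam = random.uniform(0.01, 0.49), random.uniform(0, 2), random.uniform(0, 2)
    # Stationary point of L in q (my derivation): dL/dq = 0 gives
    # q* = p*e^{mu-lam} / (1 - p + p*e^{mu-lam}), which lies in (0, 1)
    r = math.exp(mu - lam)
    q_star = p * r / (1 - p + p * r)
    assert abs(lagrangian(q_star, p, mu, lam) - dual(p, mu, lam)) < 1e-10
print("dual formula matches L at its stationary point")
```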

But again, I run into the problem where the maximum of the dual is attained at $\mu=\lambda=0$ (and again I got these values by inspecting the equation, because setting the derivatives to zero was infeasible). According to the KKT conditions, this means that $q$ is neither zero nor one. But that can't be right, because the maximum of $K_p$ is attained when $q=1$.

And my obvious question is: what am I doing wrong? And how is this done correctly?

2 Answers


Is there any specific reason you need to use the KKT conditions to solve this problem? I have written out a formal way of solving it below.

$$f(q) = q \log \left( \frac{q}{p} \right) + \left( 1-q \right) \log \left( \frac{1-q}{1-p} \right)$$ $$\frac{\partial f}{\partial q} = \log \left( \frac{q}{p} \right) + q \times \frac{p}{q} \times \frac{1}{p} - \log \left( \frac{1-q}{1-p} \right) + \left( 1-q \right) \times \frac{1-p}{1-q} \times \frac{-1}{1-p} = \log \left( \frac{q}{p} \frac{1-p}{1-q} \right)$$

Now $q = p$ gives the minimum value of $0$, since $\frac{\partial^2 f}{\partial q^2} = \frac{1}{q} + \frac{1}{1-q} = \frac{1}{q (1-q)} > 0$.

Note that when $q>p$, $f'(q) > 0$, and when $q<p$, $f'(q) < 0$. Hence the maximum of $f$ on $[0,1]$ must occur at one of the endpoints.

$q=0$ gives us $f(0) = - \log(1-p)$ and $q=1$ gives us $f(1) = - \log(p)$.

So if $p \in (0,0.5]$, the maximum occurs at $q=1$ with maximum value $-\log(p)$, and if $p \in [0.5,1)$, the maximum occurs at $q=0$ with maximum value $-\log(1-p)$.
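(A finite-difference spot check I'm adding, not part of the original answer; $p=0.3$ is an arbitrary choice.) The closed-form derivative, its sign pattern around $q=p$, and the boundary comparison can all be verified numerically:

```python
import math

def f(q, p):
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def fprime(q, p):
    # Derivative from the answer: log( q(1-p) / (p(1-q)) )
    return math.log(q * (1 - p) / (p * (1 - q)))

p = 0.3
h = 1e-6
for q in (0.1, 0.3, 0.6, 0.9):
    # central finite difference vs. the closed form
    num = (f(q + h, p) - f(q - h, p)) / (2 * h)
    assert abs(num - fprime(q, p)) < 1e-5

# sign pattern: negative below p, zero at p, positive above
assert fprime(p, p) == 0.0
assert fprime(0.1, p) < 0 < fprime(0.6, p)

# boundary values: f(0) = -log(1-p), f(1) = -log(p); since p < 1/2, f(1) > f(0)
assert -math.log(p) > -math.log(1 - p)
print("derivative formula and boundary comparison check out")
```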

  • Thanks for the reply. But I'm solving this to learn about and get practice applying the KKT conditions. I just thought maximizing the KL divergence between two Bernoullis would be a nice toy problem for this. In real applications, I think I've only come across cases where people try to _minimize_ KL divergence, not maximize it. As well they should, since I can't really see any use for maximizing it. (2011-02-27)
  • But if this happens to be a problem where the KKT conditions can't be applied towards a solution, and you happen to know why, I'd be just as happy to find out. (2011-02-27)
  • @JasonMond: I think you got the sign of $\mu$ wrong. You have $0 \leq q \leq 1$. The way you have written it, you need $\mathcal{L}(q,\mu,\lambda) = -K_p(q) + \mu(1-q) - \lambda q$. (2011-02-27)
  • @Sivaram Ambikasaran: According to the textbook I got it from ("Numerical Optimization" by Nocedal and Wright), the sign in front of the multiplier is negative if the constraint is positive. (2011-02-27)
  • @Jason: Minimize $f$ subject to $g \leq 0$; then the Lagrangian is $\mathcal{L} = f + \mu g$ with $\mu g = 0$, $g \leq 0$ and $\mu \geq 0$, and hence both your $\lambda$ and $\mu$ must be of the opposite sign. (2011-02-27)
  • http://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions (2011-02-27)
  • @Sivaram Ambikasaran: Yes, but in this case $g = 1-q \geq 0$, so I need to put a negative sign in front of $(1-q)$ to flip the inequality. (2011-02-27)

Setting the derivative of $\mathcal{L}$ to zero does not guarantee that you obtain its minimum. In fact, $\mathcal{L}$ is a concave function of $q$ for every $p,\mu,\lambda$: its second derivative in $q$ is $-\frac{1}{q(1-q)} < 0$, so the stationary point is a maximum of $\mathcal{L}$ over $q$, not a minimum.

Also, the function $K_p$ you want to maximize is convex, so minimizing $-K_p$ is a nonconvex problem; therefore, solving the dual problem may not give you an optimal solution to the primal problem.
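(An illustration I'm adding, with arbitrary values $p=0.3$, $\mu=0.5$, $\lambda=0.2$.) The stationary point of the question's Lagrangian in $q$ can be checked numerically to be a maximizer of $\mathcal{L}$, which is exactly why plugging it back in does not produce the true dual:

```python
import math

def kl(q, p):
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def lagrangian(q, p, mu, lam):
    # The Lagrangian as written in the question:
    # L(q, mu, lambda) = -K_p(q) - mu*(1 - q) - lambda*q
    return -kl(q, p) - mu * (1 - q) - lam * q

p, mu, lam = 0.3, 0.5, 0.2
r = math.exp(mu - lam)
q_star = p * r / (1 - p + p * r)  # stationary point of L in q

# d^2 L / dq^2 = -1/(q(1-q)) < 0, so L is concave in q and the
# stationary point is a MAXIMUM of L, not the minimum the dual needs.
for q in (0.05, 0.5, 0.95):
    assert lagrangian(q, p, mu, lam) <= lagrangian(q_star, p, mu, lam) + 1e-12
print("the stationary point maximises the Lagrangian in q")
```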

  • Yes, I suspected as much after sleeping on this for a few days. I'm still not entirely comfortable with using Lagrange multipliers when it comes to inequality constraints. If you should ever come up with a way of maximizing $K_p$ using Lagrange multipliers, please share. (2011-03-01)
  • @JasonMond: To find KKT points with inequality constraints, you can consider all possible cases; for your problem these are (i) $q=1$, $\lambda=0$ and (ii) $q=0$, $\mu=0$. For each case, determine the remaining variables and check whether they satisfy all the other KKT conditions. If every condition is met, you have found a KKT point. (2011-03-01)