
In my earlier question I asked about a technical aspect of solving the system of equations that arises when looking for an entropy-maximizing distribution $p(y)$, continuous on $\mathbb{R}$, subject to a KL-divergence constraint involving a zero-mean Gaussian. That is, in addition to the usual probability-density and variance constraints, I have the following constraint on $p(y)$:

$D(p_N\|p)=\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-y^2/2\sigma^2}\log\frac{\frac{1}{\sqrt{2\pi}\,\sigma}e^{-y^2/2\sigma^2}}{p(y)}\,dy<\epsilon$
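(To spell out how this constraint enters the system below — this expansion is mine, added just for context: since $p_N$ is fixed, $D(p_N\|p)=-h(p_N)-\int_{-\infty}^{\infty}p_N(y)\log p(y)\,dy$, where $h(p_N)$ is the differential entropy of the Gaussian. So treating the constraint as active amounts to fixing the cross term $\int p_N\log p\,dy$, which is exactly how it shows up in the fourth equation of the system below.)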

Thanks to user anon, the form of the function $p(y)$ was found, but it is not a density function, and now I am trying to understand why this is the case.

First, here is the system of equations (copied from the earlier question) that I derived using the calculus of variations (and help from Gallager's "Information Theory and Reliable Communication"):

$\begin{align} 0&=\log(p(y))+1-\lambda-\gamma y^2-\eta \left(\frac{e^{-y^2/2}}{\sqrt{2\pi}}\right)\left(\frac{1}{p(y)}\right)\\ 0&=1-\int_{-\infty}^{\infty}p(y)dy\\ 0&=1-\int_{-\infty}^{\infty}y^2p(y)dy\\ 0&=c+\int_{-\infty}^{\infty}\frac{e^{-y^2/2}}{\sqrt{2\pi}}\log(p(y))dy \end{align} $

(for simplicity I set $\sigma=1$; $c=\epsilon+\frac{1}{2}\log(2\pi )$)
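For reference, here is a Lagrangian that reproduces the first equation (a sketch added for completeness; the sign conventions may differ from what Gallager uses):

$$J[p]=-\int_{-\infty}^{\infty}p\log p\,dy+\lambda\Big(\int_{-\infty}^{\infty}p\,dy-1\Big)+\gamma\Big(\int_{-\infty}^{\infty}y^2p\,dy-1\Big)+\eta\Big(\int_{-\infty}^{\infty}\frac{e^{-y^2/2}}{\sqrt{2\pi}}\log p\,dy+c\Big).$$

Setting the pointwise derivative with respect to $p(y)$ to zero gives $-\log p-1+\lambda+\gamma y^2+\eta\frac{e^{-y^2/2}}{\sqrt{2\pi}\,p}=0$, which is the first equation above after multiplying by $-1$.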

From anon's helpful comment, we can actually solve the first equation in terms of the Lambert W function to obtain the following:

$p(y)=\frac{\eta e^{-y^2/2}}{\sqrt{2\pi}\,W\!\left(\frac{\eta}{\sqrt{2\pi}}e^{(1-\lambda)-(1+2\gamma)y^2/2}\right)}$
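For completeness, here is the substitution behind this (my reconstruction of anon's step, assuming $\eta>0$ so that the argument of $W$ is positive): with $\alpha=1-\lambda-\gamma y^2$ and $\beta=\eta e^{-y^2/2}/\sqrt{2\pi}$, the first equation reads $\log p+\alpha=\beta/p$. Putting $u=\beta/p$ gives $\log\beta-\log u+\alpha=u$, i.e. $ue^u=\beta e^{\alpha}$, so $u=W(\beta e^{\alpha})$ and

$$p(y)=\frac{\beta}{W(\beta e^{\alpha})}=e^{-\alpha+W(\beta e^{\alpha})}.$$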

When $|y|\rightarrow\infty$, $e^{-ay^2+b}\rightarrow 0$, and since $W(0)=0$, $p(y)\rightarrow\infty$. Thus, this is obviously not a pdf!

This is entirely due to the KL-divergence constraint (a very similar situation arises when the variance constraint is removed). How does one explain this? There are obviously probability distributions that meet the KL-divergence constraint (e.g., a Gaussian with appropriately chosen variance). Does this mean that an optimal distribution does not exist, and that anything one tries is necessarily sub-optimal? Is there a rigorous explanation for this?
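(To make that last point concrete, using the standard formula for the divergence between zero-mean Gaussians, added here just for illustration: for $p=N(0,s^2)$ one gets

$$D(p_N\|p)=\log s+\frac{1}{2s^2}-\frac{1}{2},$$

which is $0$ at $s=1$ and continuous in $s$, so any $s$ sufficiently close to $1$ satisfies the divergence constraint; $s=1$ also satisfies the unit-variance constraint exactly.)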

Perhaps I did something wrong? Is there another method I should have employed?

  • Stupid question: is $1 + 2\gamma$ necessarily positive? (2011-09-22)

1 Answer


You established the indeterminate form $0/0$; from this you cannot casually conclude $p\to\infty$. In the last line of my comment that you cite, I used the trick $e^{W(t)}=t/W(t)$ to simplify the expression, but if you want to look at it asymptotically, you can also analyze the prior form
$$p(y)=\exp\!\left(-\alpha+W(e^{\alpha}\beta)\right),\qquad \alpha=1-\lambda-\gamma y^2,\quad \beta=\frac{\eta e^{-y^2/2}}{\sqrt{2\pi}}.$$

Now if $\gamma\in[-1/2,0)$, we get the form $p=e^{-\infty+0}$, so we find that $p\to0$. For $\gamma<-1/2$, you'll have to find a way to show
$$W\!\left(ae^{-(1/2+\gamma)y^2}\right)+\gamma y^2\to-\infty \quad\text{as } |y|\to\infty,$$
where $a>0$ is arbitrary (in our case $a=\eta e^{1-\lambda}/\sqrt{2\pi}$). Note that $W(x)>1$ for $x>e$, so $e^W\le We^W=x$ and hence $W\le\log x$ for sufficiently large $x$; this makes the expression above $-O(y^2)\to-\infty$ (allowing some notational sloppiness), proving $p\to0$ as $|y|\to\infty$ whenever $\gamma<0$.


This only addresses your claim that $p$ doesn't vanish at the extremes. For your broader questions concerning maximum entropy, KL divergence, or the original optimization problem, I really have no idea.
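As a quick numerical sanity check of the asymptotics above (a sketch I'm adding, not part of the argument itself; the multiplier values below are arbitrary placeholders, not solutions of the full system), one can evaluate the prior form with mpmath, which copes with the huge intermediate values that overflow double precision when $\gamma<-1/2$:

```python
# Sanity check: p(y) = exp(-alpha + W(exp(alpha)*beta)) should tend to 0
# as |y| grows, for several gamma < 0. lambda_ and eta are placeholders.
import mpmath as mp

mp.mp.dps = 50  # extra precision; exp(alpha)*beta overflows floats for gamma < -1/2

def p(y, gamma, lambda_=0.3, eta=1.0):
    alpha = 1 - lambda_ - gamma * y**2
    beta = eta * mp.exp(-mp.mpf(y)**2 / 2) / mp.sqrt(2 * mp.pi)
    w = mp.lambertw(mp.exp(alpha) * beta)  # principal branch, real for positive argument
    return mp.exp(-alpha + mp.re(w))

for gamma in (-0.25, -0.5, -2.0):
    print(gamma, [mp.nstr(p(y, gamma), 5) for y in (1, 5, 10, 20)])
```

The printed values shrink rapidly in $|y|$ for each $\gamma<0$, consistent with the $-O(y^2)$ bound above.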

  • Sorry, I had a family emergency of sorts and haven't been able to reply here. First of all, thank you again, anon. I am appreciative of your time -- you have really elucidated this for me. (2011-09-27)