
In the following integral, $p(x)$ and $q(x)$ are probability density functions. Can you help me determine in what situations this integral is infinite? For example, I think one such situation is when only $p(x)$ has an infinite peak.

$\int_{-\infty}^{\infty} p(x)\log\left(\frac{p(x)}{q(x)}\right)dx$

Thank you very much!

2 Answers

Answer 1 (score 4)

As Dinesh says, if the support of $\mathbf{P}$ includes points not in the support of $\mathbf{Q}$, then the Kullback-Leibler divergence will be infinite (or undefined). However, this is not the only way it can happen. For a simple example I'll use a discrete distribution, so your integral becomes the sum (taking logs base 2)
$$\sum_{n=1}^{\infty}\mathbf{P}(n)\log_2\left(\frac{\mathbf{P}(n)}{\mathbf{Q}(n)}\right).$$
Now define $\mathbf{P}(n) = 2^{-n}$ and
$$\mathbf{Q}(n)=\begin{cases} 2^{-n-2^n} & n\geq 2,\\ 1-\sum_{k=2}^\infty 2^{-k-2^k} & n=1.\end{cases}$$
Then for $n\geq 2$ we have $\frac{\mathbf{P}(n)}{\mathbf{Q}(n)}=2^{2^n}$, so each term is $\mathbf{P}(n)\log_2\left(\frac{\mathbf{P}(n)}{\mathbf{Q}(n)}\right)=2^{-n}\cdot 2^{n}=1$. Hence
$$D_{KL}(\mathbf{P}\parallel\mathbf{Q})=\sum_{n=1}^\infty \mathbf{P}(n)\log_2\left(\frac{\mathbf{P}(n)}{\mathbf{Q}(n)}\right)=\infty.$$
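A quick numerical check of this construction (a minimal sketch; the truncation points $N$ are arbitrary, and logs are base 2 as above) confirms that the partial sums grow without bound, one unit per term:

```python
def term(n):
    # P(n) * log2(P(n)/Q(n)). For n >= 2, P(n)/Q(n) = 2^(2^n),
    # so log2(P(n)/Q(n)) = 2^n and the term is 2^(-n) * 2^n = 1.
    # Using the exponent directly avoids floating-point underflow
    # in Q(n) = 2^(-n - 2^n), which is astronomically small.
    return 2.0 ** (-n) * 2 ** n

for N in (10, 100, 1000):
    partial_sum = sum(term(n) for n in range(2, N + 1))
    print(f"partial sum up to n = {N}: {partial_sum}")
```

The partial sum over $n = 2, \ldots, N$ is exactly $N - 1$, so the full series diverges even though both pmfs share the same support $\mathbb{N}$.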

I don't know of any nice necessary and sufficient conditions, but the best sufficient condition for finiteness I can come up with, in the discrete case, is this: if the Shannon entropy of $\mathbf{P}$, $\mathrm{H}(\mathbf{P})$, is finite and $\log\mathbf{Q}(x),\,\frac{\mathrm{d}\mathbf{P}}{\mathrm{d}\mathbf{Q}}\in\mathscr{L}^2(\mathbf{Q})$, then $D_{KL}(\mathbf{P}\parallel\mathbf{Q})$ is finite. In the case of continuous distributions with pdfs, it's just a matter of replacing all the pmfs with pdfs. The proof is identical in both cases: \begin{align} D_{KL}(\mathbf{P}\parallel\mathbf{Q}) & =E_\mathbf{P}\left[\log\left(\frac{\mathrm{d}\mathbf{P}}{\mathrm{d}\mathbf{Q}}\right)\right]\\ & = E_\mathbf{P}[\log\mathbf{P}(x)]-E_\mathbf{P}[\log\mathbf{Q}(x)]\\ & = -\mathrm{H}(\mathbf{P})-E_\mathbf{Q}\left[\frac{\mathrm{d}\mathbf{P}}{\mathrm{d}\mathbf{Q}}\log\mathbf{Q}(x)\right] \end{align} In the final line the first term is finite by assumption (note that $E_\mathbf{P}[\log\mathbf{P}(x)]=-\mathrm{H}(\mathbf{P})$), and the second term is finite by the Cauchy-Schwarz inequality, since both $\frac{\mathrm{d}\mathbf{P}}{\mathrm{d}\mathbf{Q}}$ and $\log\mathbf{Q}(x)$ lie in $\mathscr{L}^2(\mathbf{Q})$.
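To sanity-check the decomposition in the final line numerically (a minimal sketch using an arbitrary invented pair of pmfs on a shared support, with natural logs), both routes give the same value:

```python
import numpy as np

# Two arbitrary pmfs on {0, 1, 2, 3} with full shared support.
p = np.array([0.1, 0.4, 0.3, 0.2])
q = np.array([0.25, 0.25, 0.25, 0.25])

# Direct definition: D_KL(P||Q) = sum_x p(x) log(p(x)/q(x)).
d_kl = np.sum(p * np.log(p / q))

# Decomposition from the proof: -H(P) - E_Q[(dP/dQ) log Q],
# where dP/dQ = p/q and E_Q[f] = sum_x q(x) f(x).
entropy_p = -np.sum(p * np.log(p))
second_term = np.sum(q * (p / q) * np.log(q))  # equals E_P[log Q]
decomposed = -entropy_p - second_term

print(d_kl, decomposed)  # the two values agree
```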

Answer 2 (score 2)

The integral that you have is the Kullback-Leibler divergence between distributions $P$ and $Q$, $D_{KL}(P \parallel Q)$. This divergence is, roughly, a kind of "distance" between the two distributions. The reason "distance" is in quotes is that this divergence is not symmetric and hence is not a metric. However, a useful way to think about $D_{KL}(P \parallel Q)$ is as the penalty paid for mistaking distribution $P$ for distribution $Q$. This statement can be made precise using information theory. If $D_{KL}(P \parallel Q)$ is infinite, it is because the two distributions are so unlike each other that you incur an infinite penalty for mistaking $P$ for $Q$. This can happen, for instance, when distribution $P$ can produce values that distribution $Q$ never can; in that case, mistaking $P$ for $Q$ is indeed a grievous error. I will leave it to you to interpret this in terms of the integral above to derive the condition under which the divergence is infinite.
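To make the support condition concrete, here is a minimal sketch (the particular pmfs are invented for illustration) in which a single point with $P(x) > 0$ but $Q(x) = 0$ makes the divergence infinite:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])  # P puts mass 0.2 on the last point
q = np.array([0.7, 0.3, 0.0])  # Q assigns that point probability 0

# By convention, terms with p(x) == 0 contribute 0; any term with
# p(x) > 0 and q(x) == 0 makes the whole divergence infinite.
with np.errstate(divide="ignore"):
    terms = np.where(p > 0, p * np.log(p / q), 0.0)
print(terms.sum())  # inf
```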

Update: Adding more information in response to Marco's comment below; it got too unwieldy to be left as a comment. Given any $M>0$ and any distribution $P$, we can find a $Q$ such that $D_{KL}(P \parallel Q) > M$. But note that as $M$ grows, we need to adaptively change $Q$ to make sure the divergence exceeds $M$. This is different from saying $D_{KL}(P \parallel Q) = \infty$ for a given, fixed pair $P$, $Q$, which is what the question is asking about. I think this can happen only if the support of $P$ includes points not in the support of $Q$.

  • Comment: The update in this answer is incorrect. See my answer. (2018-09-04)