
Let $X_1,X_2,\dots$ be i.i.d. samples drawn from a discrete space $\mathcal{X}$ according to a probability distribution $P$, and denote the resulting empirical distribution based on $n$ samples by $\hat{P}_n$. Also let $Q$ be an arbitrary distribution on $\mathcal{X}$. It is clear that the KL-divergence satisfies

$KL( \hat{P}_n || Q) \stackrel{n\rightarrow \infty}{\longrightarrow} KL(P || Q)$,

but I am wondering whether there is any known quantitative rate of convergence for it, i.e., whether it can be shown that

$\Pr\Big[ | KL( \hat{P}_n || Q) - KL(P || Q) | \geq \delta\Big] \leq f(\delta, n, |\mathcal{X}|)$,

and, if so, what the best known expression for the right-hand side is.
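
For concreteness, here is a minimal simulation sketch of the quantity I am asking about. The distributions $P$ and $Q$ below are arbitrary illustrative choices on a three-letter alphabet (not part of the question itself), and $Q$ is taken to have full support so that all divergences are finite.

```python
# Minimal simulation sketch: how fast does KL(P_hat_n || Q) approach KL(P || Q)?
# P and Q below are arbitrary illustrative distributions on a 3-letter alphabet.
import numpy as np

rng = np.random.default_rng(0)
P = np.array([0.5, 0.3, 0.2])   # true distribution (illustrative)
Q = np.array([0.4, 0.4, 0.2])   # reference distribution, full support (illustrative)

def kl(p, q):
    """KL divergence D(p || q) in nats; terms with p_i = 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

true_kl = kl(P, Q)
for n in [100, 1_000, 10_000, 100_000]:
    samples = rng.choice(len(P), size=n, p=P)
    P_hat = np.bincount(samples, minlength=len(P)) / n   # empirical distribution
    print(f"n = {n:>7d}: |KL(P_hat||Q) - KL(P||Q)| = {abs(kl(P_hat, Q) - true_kl):.5f}")
```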

Thanks a lot!

  • Just to clarify: You're asking for a uniform bound in $P$ and $Q$? (2012-12-06)
  • @cardinal: Yes. In fact I want to know how many samples I should take to guarantee a maximum gap for the KL-divergence. Note that $n$ can be much larger than $|\mathcal{X}|$, i.e. $\hat{P}_n$ closely follows the true $P$. (2012-12-06)
  • I am wondering whether (or how) $D(\hat{P}_n\|P)\to 0$ is different from $D(\hat{P}_n\|Q)\to D(P\|Q)$. (2012-12-08)
  • @Ashok: The KL-divergence is not a _true_ metric (http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence). In particular, the triangle inequality does NOT hold, and $D(\hat{P}_n\|Q) - D(P\|Q)$ can be larger than $D(\hat{P}_n\|P)$. Hence, even if $D(\hat{P}_n\|P)\rightarrow 0$ implies $D(\hat{P}_n\|Q)\rightarrow D(P\|Q)$ (does it?), the convergence rate of the former does not guarantee the convergence rate of the latter. (2012-12-10)
  • @Sam: Yes, yes, you are right. I know the KL-divergence is not a metric. But I just wanted to know the difference between $D(\hat{P}_n\|P)\rightarrow 0$ and $D(\hat{P}_n\|Q)\to D(P\|Q)$. I am also very interested in your question. (2012-12-10)
  • Sam, I had hoped to get back to looking at this question a bit more closely, but am very tied up at the moment, so I thought I'd at least suggest in the interim that you look at Liam Paninski's work on estimating entropy and mutual information. I recall some very strong *negative* results for such questions for arbitrary entropy estimators. (2012-12-10)

1 Answer