
Let $X_1,X_2,\dots$ be i.i.d. samples drawn from a discrete space $\mathcal{X}$ according to a probability distribution $P$, and denote the empirical distribution based on the first $n$ samples by $\hat{P}_n$. Also let $Q$ be an arbitrary distribution on $\mathcal{X}$. It is clear that the KL divergence satisfies

$KL( \hat{P}_n || Q) \stackrel{n\rightarrow \infty}{\longrightarrow} KL(P || Q)$,

but I am wondering whether any quantitative rate of convergence is known for it, that is, whether it can be shown that

$\Pr\Big[ | KL( \hat{P}_n || Q) - KL(P || Q) | \geq \delta\Big] \leq f(\delta, n, |\mathcal{X}|)$,

and, if so, what the best known expression for the right-hand side is.
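
For concreteness, here is a small Monte Carlo sketch of the probability on the left-hand side. The alphabet, $P$, $Q$, $n$, and $\delta$ below are illustrative choices, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not from the question): a 4-point alphabet,
# full-support P and Q, sample size n, and threshold delta.
P = np.array([0.4, 0.3, 0.2, 0.1])
Q = np.array([0.25, 0.25, 0.25, 0.25])
n, delta, trials = 1000, 0.05, 5000

def kl(p, q):
    """KL(p || q) in nats, with the convention 0 * log(0/q) = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

kl_true = kl(P, Q)
deviations = 0
for _ in range(trials):
    counts = rng.multinomial(n, P)   # n i.i.d. samples from P
    P_hat = counts / n               # empirical distribution
    if abs(kl(P_hat, Q) - kl_true) >= delta:
        deviations += 1

print(f"KL(P || Q) = {kl_true:.4f}")
print(f"estimated Pr[|KL(P_hat || Q) - KL(P || Q)| >= delta] = {deviations / trials:.4f}")
```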

Thanks a lot!

  • Sam, I had hoped to get back to looking at this question a bit more closely, but I am very tied up at the moment, so I thought I'd at least suggest in the interim that you look at Liam Paninski's work on estimating entropy and mutual information. I recall some very strong *negative* results for such questions for arbitrary entropy estimators. (2012-12-10)

1 Answer


Here's a thought. I don't know whether I am in an imaginary world and asking too much; anyway, let me propose this. I am writing it as an answer since it is a little too long for a comment.

Suppose that, for every fixed $Q$, we can find a linear family $\Pi$ of probability distributions containing all the empirical measures $\hat{P}_n$ as well as $P$, and such that $P$ is the $I$-projection of $Q$ onto $\Pi$. Then the Pythagorean property $D(\hat{P}_n \| Q) = D(\hat{P}_n \| P) + D(P \| Q)$ holds, and hence the rate of convergence of $D(\hat{P}_n \| Q) \to D(P \| Q)$ is the same as that of $D(\hat{P}_n \| P) \to 0$.
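
A minimal numerical sketch of that last reduction, under illustrative choices of $P$ and $n$: it tracks the mean of $D(\hat{P}_n \| P)$ and compares it to $(|\mathcal{X}|-1)/(2n)$, using the classical fact that $2n\,D(\hat{P}_n \| P)$ is asymptotically $\chi^2$ with $|\mathcal{X}|-1$ degrees of freedom when $P$ has full support:

```python
import numpy as np

rng = np.random.default_rng(1)

P = np.array([0.4, 0.3, 0.2, 0.1])   # illustrative full-support P
trials = 2000

def kl(p, q):
    """KL(p || q) in nats, with the convention 0 * log(0/q) = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

for n in (100, 1000, 10000):
    # empirical measures from n i.i.d. draws, repeated over many trials
    emp = rng.multinomial(n, P, size=trials) / n
    mean_div = np.mean([kl(p_hat, P) for p_hat in emp])
    # classical heuristic: 2n * D(P_hat || P) is asymptotically chi^2 with
    # |X| - 1 degrees of freedom, so E[D] is roughly (|X| - 1) / (2n)
    print(f"n={n:6d}  mean D(P_hat || P) = {mean_div:.5f}   "
          f"(|X|-1)/(2n) = {(len(P) - 1) / (2 * n):.5f}")
```

The printed means shrink like $O(1/n)$, which is the kind of rate the proposed reduction would transfer to $D(\hat{P}_n \| Q) \to D(P \| Q)$ whenever the Pythagorean identity applies.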