
I just learned about Chernoff bounds and am wondering whether one can prove the Asymptotic Equipartition Property (AEP) using them instead of the Weak Law of Large Numbers (which is a consequence of Chebyshev's inequality). I started with what I thought was a simple case: a sequence of i.i.d. zero-mean Gaussian random variables $\{X_i\}$ with $X_i\sim\mathcal{N}(0,\sigma^2)$. We are interested in proving:

$P\left(\left|-\frac{1}{n}\sum_{i=1}^n\log \phi(X_i)-H(X)\right|\geq\epsilon\right)\rightarrow 0$

The first step in applying the Chernoff bound is finding the moment generating function. However, I ran into trouble immediately:

$\begin{array}{rcl}E\left[\exp\left(-t\log\phi(X)\right)\right]&=&\int e^{-t\log\phi(x)}\,\phi(x)\,dx\\ &=&\int\exp\left(-t\log\left(\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\right)\right)\phi(x)\,dx\\ &=&\int\exp\left(t\left(\log\left(\sqrt{2\pi}\,\sigma\right)+\frac{x^2}{2\sigma^2}\right)\right)\phi(x)\,dx\\ &=&\int\frac{\left[2\pi\sigma^2\right]^{t/2}\exp\left(tx^2/2\sigma^2\right)}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2}\,dx\\ &=&\int\frac{\left[2\pi\sigma^2\right]^{t/2}}{\sqrt{2\pi}\,\sigma}\exp\left(\frac{-x^2(1-t)}{2\sigma^2}\right)dx\\ &=&\,? \end{array}$ where $\phi(x)=\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}$. I am stuck because $t\in\mathbb{R}$ and can therefore be greater than one, in which case the last integral diverges...

What did I do wrong? Is it even possible to prove AEP using Chernoff Bounds?

  • What specific part is less easy to *digest*? (2012-01-19)

2 Answers


Let me abstract the situation a little and show that the inequality you have in mind stems from a rather general result (as mentioned in the comments, the following should be explained in every introductory text on large deviation principles). Consider an integrable random variable $\xi$, i.i.d. copies $(\xi_n)_{n\geqslant1}$ of $\xi$, and, for some $n\geqslant1$ and some positive $x$, the event

$ A_n=[\xi_1+\cdots+\xi_n\geqslant n\mathrm E(\xi)+nx]. $

Using first $\xi_i=\log\phi(X_i)$ and then $\xi_i=-\log\phi(X_i)$, the event you consider is the union of two events of the form $A_n$ (both with $x=\epsilon$), hence our task is to bound $\mathrm P(A_n)$; the reduction is spelled out just below.
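To unpack that reduction (spelled out here for convenience; it is implicit in the original answer): with $\xi_i=\log\phi(X_i)$ one has $\mathrm E(\xi)=-H(X)$, and taking $x=\epsilon$,

$ A_n=\left[\textstyle\sum_{i=1}^n\log\phi(X_i)\geqslant -nH(X)+n\epsilon\right]=\left[-\frac{1}{n}\sum_{i=1}^n\log\phi(X_i)\leqslant H(X)-\epsilon\right], $

while with $\xi_i=-\log\phi(X_i)$ one has $\mathrm E(\xi)=H(X)$, and taking $x=\epsilon$ again,

$ A_n=\left[-\frac{1}{n}\sum_{i=1}^n\log\phi(X_i)\geqslant H(X)+\epsilon\right]. $

The union of these two events is exactly $\left[\left|-\frac{1}{n}\sum_{i=1}^n\log\phi(X_i)-H(X)\right|\geqslant\epsilon\right]$.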

Now comes an additional, nontrivial hypothesis: assume that $\xi$ is exponentially integrable, that is:

$\qquad\qquad$ There exists $t_0\gt0$ such that $\mathrm E(\mathrm e^{t\xi})$ is finite for every $|t|\lt t_0$. $\qquad(\ast)$

Then the proof is simple. Start from the almost sure inequality, valid for every nonnegative $t$:

$ \mathrm e^{nt(\mathrm E(\xi)+x)}\,\mathbf 1_{A_n}\leqslant\mathrm e^{t(\xi_1+\cdots+\xi_n)}=\mathrm e^{t\xi_1}\cdots\mathrm e^{t\xi_n}. $

Integrate both sides and use the independence assumption. This yields

$ \mathrm e^{nt(\mathrm E(\xi)+x)}\,\mathrm P(A_n)\leqslant\mathrm E(\mathrm e^{t\xi})^n. $

We proved that, for every nonnegative $t$,

$ \mathrm P(A_n)\leqslant u(t)^n\qquad\text{where}\qquad u(t)=\mathrm E(\mathrm e^{t\xi})\,\mathrm e^{-t(\mathrm E(\xi)+x)}. $

Note that a lot of these inequalities are just useless. For example, $u(0)=1$ hence the $t=0$ version is $\mathrm P(A_n)\leqslant 1$... Big deal! Even worse: if $t\gt0$ is such that $u(t)\gt1$, or such that $\mathrm E(\mathrm e^{t\xi})$ is infinite, we painfully proved that $\mathrm P(A_n)$ is bounded by something greater than $1$, or infinite. :-)

On the other hand, if there exists at least some nonnegative $t$ such that $u(t)\lt1$, the inequality becomes a strong control on $\mathrm P(A_n)$, namely, that $\mathrm P(A_n)\to0$ at least as fast as a geometric sequence. Hence the key question becomes:

$\qquad\qquad\qquad$ Does there exist some $t\gt0$ such that $u(t)\lt1$?

To answer this, let us consider the limit $t\to0$ with $t\gt0$. Then,

$ \mathrm E(\mathrm e^{t\xi})=1+t\mathrm E(\xi)+o(t),\qquad \mathrm e^{-t(\mathrm E(\xi)+x)}=1-t(\mathrm E(\xi)+x)+o(t), $

hence $u(t)=1-tx+o(t)$. Since $x\gt0$, this first-order expansion implies that $u(t)\lt1$ for (at least) some (small) $t\gt0$. The proof is over.
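To make the expansion step fully explicit (a detail added here, not part of the original answer): $(\ast)$ guarantees that $t\mapsto\mathrm E(\mathrm e^{t\xi})$ is differentiable at $0$ with derivative $\mathrm E(\xi)$, hence

$ u(0)=1,\qquad u'(0)=\mathrm E(\xi)-(\mathrm E(\xi)+x)=-x\lt0, $

so $u$ is strictly below $1$ immediately to the right of $0$, which is exactly the assertion used to conclude.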

To sum up, the result stems from a Markov-type inequality plus a first-order expansion of the Laplace transform $t\mapsto\mathrm E(\mathrm e^{t\xi})$ at $t=0$. The Markov-type inequality is powerful because the $\xi_n$ are independent. The first-order expansion is valid because we assumed $(\ast)$.

Note that the proof above uses the finiteness of $\mathrm E(\mathrm e^{t\xi})$ for $0\leqslant t\lt t_0$ only. The finiteness for $-t_0\lt t\leqslant 0$ is needed when dealing with the event $[\xi_1+\cdots+\xi_n\leqslant n\mathrm E(\xi)-nx]$, hence the control $(\ast)$ must indeed be two-sided.

Coming back at last to the question you asked, if $\xi=\log\phi(X)$ with $X$ centered normal of variance $\sigma^2$ and $\phi$ the density of the distribution of $X$, then $\mathrm E(\mathrm e^{t\xi})$ is finite if and only if $\mathrm E(\phi(X)^{t})$ is finite if and only if $\mathrm E(\mathrm e^{-tX^2/(2\sigma^2)})$ is finite if and only if $t+1\gt0$, hence $(\ast)$ holds with $t_0=1$.
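For completeness, here is the closed form behind that claim (a computation of mine, not part of the original answer): for $X\sim\mathcal N(0,\sigma^2)$ and $t+1\gt0$,

$ \mathrm E\left(\phi(X)^{t}\right)=(2\pi\sigma^2)^{-t/2}\,\mathrm E\left(\mathrm e^{-tX^2/(2\sigma^2)}\right)=\frac{(2\pi\sigma^2)^{-t/2}}{\sqrt{1+t}}, $

using $\mathrm E(\mathrm e^{sX^2})=(1-2s\sigma^2)^{-1/2}$ for $s\lt1/(2\sigma^2)$; the expectation is infinite when $t+1\leqslant0$.

And here is a small numerical sketch of the whole argument for this Gaussian example, assuming NumPy is available; the variable names and the particular values of $\sigma$, $\epsilon$ and $n$ are mine, chosen only for illustration. It takes $\xi_i=-\log\phi(X_i)$, evaluates the Chernoff bound $\min_{0\lt t\lt1}u(t)^n$ from the closed-form Laplace transform, and compares it with a Monte Carlo estimate of $\mathrm P(A_n)$.

```python
import numpy as np

# Numerical sanity check for the Gaussian running example (a sketch of mine,
# not part of the original answer).  Here xi_i = -log(phi(X_i)) with
# X_i ~ N(0, sigma^2), so E(xi) = H(X) = 0.5*log(2*pi*e*sigma^2), and the
# upper-tail event A_n = [xi_1 + ... + xi_n >= n*(H + eps)] satisfies
# P(A_n) <= min over 0 < t < 1 of u(t)^n, u(t) = E(e^{t*xi}) * exp(-t*(H + eps)),
# with the closed form E(e^{t*xi}) = E(phi(X)^{-t}) = (2*pi*sigma^2)^{t/2} / sqrt(1 - t).

rng = np.random.default_rng(0)
sigma, eps, n = 1.0, 0.5, 20                      # illustrative values, not canonical
H = 0.5 * np.log(2 * np.pi * np.e * sigma**2)     # differential entropy of N(0, sigma^2)

t = np.linspace(0.01, 0.99, 99)
laplace = (2 * np.pi * sigma**2) ** (t / 2) / np.sqrt(1 - t)   # E(e^{t*xi}) for t < 1
u = laplace * np.exp(-t * (H + eps))
chernoff = (u ** n).min()                         # the bound min_t u(t)^n on P(A_n)

# Crude Monte Carlo estimate of P(A_n) for comparison.
reps = 200_000
X = rng.normal(0.0, sigma, size=(reps, n))
xi = 0.5 * np.log(2 * np.pi * sigma**2) + X**2 / (2 * sigma**2)   # -log(phi(X))
p_hat = (xi.sum(axis=1) >= n * (H + eps)).mean()

print(f"Chernoff bound: {chernoff:.4g}   Monte Carlo estimate of P(A_n): {p_hat:.4g}")
```

The printed bound should dominate the Monte Carlo estimate, and rerunning with larger $n$ shows the geometric decay predicted above.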

  • I finally was able to properly sit down and read your answer. Thank you, I think it's clear to me now. Basically, one *can* use Chernoff bounds to prove the AEP for a limited selection of distributions of the random variables (i.e. when $(\ast)$ holds). (2012-02-07)

I don't think we can use the Chernoff bound to prove the AEP. We should actually prove

$P\left(\left|-\frac{1}{n}\sum_{i=1}^n\log \phi(X_i)-H(X)\right|>\epsilon\right)\rightarrow 0,$

so we should prove both

$P\left(-\frac{1}{n}\sum_{i=1}^n\log \phi(X_i)>H(X)+\epsilon\right)\rightarrow 0 \quad\text{and}\quad P\left(-\frac{1}{n}\sum_{i=1}^n\log \phi(X_i)<H(X)-\epsilon\right)\rightarrow 0.$

The Chernoff bound reads, for any $t>0$, $P(X>x)\le e^{-t x}\, E(e^{t X})$. Your calculation of $E(e^{-t \log \phi(X)})$ is correct. This expectation exists for $t<1$.

Now

$P\left(-\frac{1}{n}\sum_{i=1}^n\log \phi(X_i)>H(X)+\epsilon\right)\le e^{-nt(H(X)+\epsilon)} \cdot E\left[e^{-t \sum_{i=1}^n\log \phi(X_i)}\right],$

where, by independence,

$E\left[e^{-t\sum_{i=1}^n\log \phi(X_i)}\right]=E\left(e^{-t \log \phi(X)}\right)^n=\left(\frac{1}{(\sqrt{2\pi}\sigma)^{1-t}}\int_{-\infty}^{\infty}\exp\left(\frac{-x^2(1-t)}{2\sigma^2}\right)dx\right)^n,$

which is finite for $t<1$. Choosing $t>0$ small enough makes the per-variable factor strictly less than $1$ (see the computation just below), so

$P\left(-\frac{1}{n}\sum_{i=1}^n\log \phi(X_i)>H(X)+\epsilon\right)\rightarrow 0.$

But I don't think we can use the Chernoff bound to show that the other probability also goes to $0$.
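To spell out why small $t$ works here (this computation is mine, not part of the original answer): the Gaussian integral gives

$E\left(e^{-t\log\phi(X)}\right)=\frac{1}{(\sqrt{2\pi}\sigma)^{1-t}}\sqrt{\frac{2\pi\sigma^2}{1-t}}=\frac{(2\pi\sigma^2)^{t/2}}{\sqrt{1-t}},\qquad t<1,$

and since $H(X)=\frac12\log(2\pi e\sigma^2)$ for a $\mathcal N(0,\sigma^2)$ variable, the per-variable factor in the bound is

$e^{-t(H(X)+\epsilon)}\,E\left(e^{-t\log\phi(X)}\right)=\frac{e^{-t\epsilon}\,e^{-t/2}}{\sqrt{1-t}}=\exp\left(-t\epsilon+\tfrac{t^2}{4}+O(t^3)\right),$

which is strictly less than $1$ for $t>0$ small enough (roughly $t<4\epsilon$), so the bound indeed decays geometrically in $n$.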

Edit: As Bullmoose requested, I will post the reason why the Chernoff bound doesn't work to show that the other probability goes to zero.

The Chernoff bound in the other case is $P(X<x)\le e^{t x}\, E(e^{-t X})$ for $t>0$. So what I think we would get in the other case is

$P\left(-\frac{1}{n}\sum_{i=1}^n\log \phi(X_i)<H(X)-\epsilon\right)\le e^{nt(H(X)-\epsilon)}\cdot E\left[e^{t \sum_{i=1}^n\log \phi(X_i)}\right]=\left(e^{t(H(X)-\epsilon)}\,E\left(e^{t\log\phi(X)}\right)\right)^n,$

and for a fixed $t>0$ the right-hand side goes to $\infty$ as $n\to \infty$ once $\epsilon$ is small enough.
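For the record (again a computation of mine, not part of the original answer): using $E\left(e^{t\log\phi(X)}\right)=E\left(\phi(X)^{t}\right)=(2\pi\sigma^2)^{-t/2}/\sqrt{1+t}$ for $t>-1$, the per-variable factor in this bound works out to

$e^{t(H(X)-\epsilon)}\,E\left(\phi(X)^{t}\right)=\frac{e^{-t\epsilon}\,e^{t/2}}{\sqrt{1+t}}=\exp\left(-t\epsilon+\tfrac{t^2}{4}+O(t^3)\right).$

For a fixed $t>0$ this exceeds $1$ once $\epsilon$ is small, which is the divergence described above; if instead $t$ is chosen small compared with $\epsilon$, the factor drops below $1$, which is how the other answer handles this tail.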

  • My comment on Bullmoose's answer applies to this one as well. (2012-01-05)