
From Wikipedia

The proof of Hoeffding's lemma uses Taylor's theorem and Jensen's inequality.

Let $X$ be any real-valued random variable with expected value $\mathbf{E}[X] = 0$ and such that $a \leq X \leq b$ almost surely. Then, for all $\lambda \in \mathbb{R}$, $$ \mathbf{E} \left[ e^{\lambda X} \right] \leq \exp \left( \frac{\lambda^2 (b - a)^2}{8} \right). $$

Proof of the lemma

Since $e^{\lambda x}$ is a convex function of $x$, we have $$ e^{\lambda x}\leq \frac{b-x}{b-a}e^{\lambda a}+\frac{x-a}{b-a}e^{\lambda b} \qquad \forall\, a\leq x\leq b. $$ So, $\mathbf{E}\left[e^{\lambda X}\right] \leq \frac{b-EX}{b-a}e^{\lambda a}+\frac{EX-a}{b-a}e^{\lambda b}$.
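
To spell out why this is the convexity (two-point Jensen) step, here is my own unpacking of the quoted line: every $x \in [a,b]$ is a convex combination of the endpoints,
$$ x = \frac{b-x}{b-a}\,a + \frac{x-a}{b-a}\,b, $$
so applying the defining inequality of a convex function to $f(u)=e^{\lambda u}$ with weight $t=\frac{b-x}{b-a}$ gives
$$ e^{\lambda x} = f\!\left(ta+(1-t)b\right) \leq t\,e^{\lambda a} + (1-t)\,e^{\lambda b} = \frac{b-x}{b-a}\,e^{\lambda a}+\frac{x-a}{b-a}\,e^{\lambda b}. $$
Taking expectations of both sides (using linearity of expectation) then yields the displayed bound on $\mathbf{E}\left[e^{\lambda X}\right]$.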

Let $h=\lambda(b-a)$, $p=\frac{-a}{b-a}$, and $L(h)=-hp+\ln(1-p+pe^h)$.

Then, $\frac{b-EX}{b-a}e^{\lambda a}+\frac{EX-a}{b-a}e^{\lambda b}=e^{L(h)}$, since $EX=0$.
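
For completeness, here is my own verification of this identity under the definitions above: with $EX=0$ the right-hand side of the previous bound is $(1-p)e^{\lambda a}+pe^{\lambda b}$ (since $1-p=\frac{b}{b-a}$ and $p=\frac{-a}{b-a}$), and because $\lambda a=-hp$ and $\lambda b=\lambda a+h$,
$$ (1-p)e^{\lambda a}+pe^{\lambda b} = e^{\lambda a}\left(1-p+pe^{h}\right) = e^{-hp+\ln\left(1-p+pe^{h}\right)} = e^{L(h)}. $$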

Computing the derivatives of $L(h)$, one finds $L(0)=L'(0)=0$ and $L''(h)\leq \frac{1}{4}$. By Taylor's theorem, $$ L(h)\leq \frac{1}{8}h^2=\frac{1}{8}\lambda^2(b-a)^2. $$ Hence, $\mathbf{E}\left[e^{\lambda X}\right] \leq e^{\frac{1}{8}\lambda^2(b-a)^2}$.
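
The derivative computations behind this step, spelled out (again my own filling-in of the quoted claims):
$$ L'(h) = -p + \frac{pe^{h}}{1-p+pe^{h}}, \qquad L''(h) = \frac{p(1-p)e^{h}}{\left(1-p+pe^{h}\right)^{2}}, $$
so $L(0)=L'(0)=0$, and writing $u=\frac{pe^{h}}{1-p+pe^{h}}$ gives $L''(h)=u(1-u)\leq\frac{1}{4}$. Taylor's theorem with Lagrange remainder then gives, for some $\theta$ between $0$ and $h$,
$$ L(h) = L(0) + L'(0)\,h + \tfrac{1}{2}L''(\theta)\,h^{2} \leq \tfrac{1}{8}h^{2}. $$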

I was wondering whether the proof actually uses Jensen's inequality. I don't think so.

Thanks!

  • It does so in the first line, doesn't it? (2012-10-16)

1 Answer


Jensen's inequality is used in the line:

Since $e^{\lambda x}$ is a convex function, we have $$ e^{\lambda x}\leq \frac{b-x}{b-a}e^{\lambda a}+\frac{x-a}{b-a}e^{\lambda b} \qquad \forall a\leq x\leq b.$$

  • Thanks! But it is derived from $e^{\lambda x}\leq \frac{b-x}{b-a}e^{\lambda a}+\frac{x-a}{b-a}e^{\lambda b}\ \ \forall\, a\leq x\leq b$, and then expectation is applied to both sides. Is that really Jensen's inequality? (2012-10-16)
  • @Tim: The inequality you mentioned is based on the convexity of the exponential function. This is nothing other than Jensen's inequality. (2012-10-16)
  • My apologies; I was a tad confused as well. The usual Jensen's inequality for convex functions was used (some use this inequality to _define_ a convex function, in particular if it isn't twice differentiable). In this particular case, there is no need to invoke the other (probabilistic) Jensen's inequality: $\phi(\mathbf{E}X) \le \mathbf{E}\,\phi(X)$ for convex $\phi$. (2012-10-16)
  • Can you point out which version of Jensen's inequality is used in the proof? I am confused. (2012-10-16)
  • @Ahriman: "This is nothing other than Jensen's inequality." Do you mean it uses some version of Jensen's inequality? (2012-10-16)
  • In the statement present in my answer, we see an instance of the "non-probabilistic" Jensen's inequality, valid for convex real-valued functions. Cf. [this Wikipedia article](http://en.wikipedia.org/wiki/Jensen's_inequality). The $f$ used in the first paragraph of the referenced article is $\exp(\lambda x)$ and the $t$ used is $\dfrac{b-x}{b-a}$. The other Jensen's inequality is not necessary in this proof. I hope it's clear now. (2012-10-16)
  • Thanks! So you are saying the defining inequality of a convex function, $f(tx+(1-t)y) \leq tf(x) + (1-t)f(y)$ for all $t \in [0,1]$, is a version of Jensen's inequality? (2012-10-16)
  • That seems an accurate summary of what's going on, yes. (2012-10-16)
  • This is a version of Jensen's inequality on a probability space consisting of two elements, which have probability $t$ and $1-t$ respectively (a worked instance is sketched after this thread). (2012-10-16)
  • @Ahriman: Thanks! Can you be explicit? What are the two elements in the probability space, so that LHS and RHS can be seen as those in a more usual Jensen's inequality? (2012-10-17)
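
A sketch of the two-point instance referred to in the thread above, using only quantities already defined in the proof (this elaboration is mine): fix $x\in[a,b]$, set $t=\frac{b-x}{b-a}$, and let $Y$ be a random variable with $\Pr(Y=a)=t$ and $\Pr(Y=b)=1-t$. Then $\mathbf{E}Y = ta+(1-t)b = x$, and the probabilistic Jensen's inequality $\phi(\mathbf{E}Y)\leq\mathbf{E}\,\phi(Y)$ for the convex function $\phi(u)=e^{\lambda u}$ reads
$$ e^{\lambda x} = \phi(\mathbf{E}Y) \leq \mathbf{E}\,\phi(Y) = \frac{b-x}{b-a}\,e^{\lambda a} + \frac{x-a}{b-a}\,e^{\lambda b}, $$
which is exactly the inequality used in the first line of the proof.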