
Let $Y_1, \ldots ,Y_n$ be iid with density $f(y;\theta)$.

We assume that $\dfrac{\partial}{\partial\theta}\log f(y;\theta)$ and $\dfrac{\partial^2}{\partial\theta^2}\log f(y;\theta)$ exist for all $\theta$. Consider a class $G$ of real functions $g(Y;\theta)$ such that:

1) $E_\theta[g(Y;\theta)] = 0$.

2) $\dfrac{\partial{g(Y;\theta)}}{\partial\theta}$ exists, is negative and bounded for all $\theta$ and $Y$.

3) $E_\theta[g^2(Y;\theta)] < \infty,\,$ for all $\theta$.

The goal is to show that the score function (derivative of the log likelihood with respect to $\theta$) is a member of $G$ that minimizes:

$ \frac{E_{\theta}[g^2(Y;\theta)]}{(E_\theta[\partial_\theta g(Y;\theta)])^2}$
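For concreteness, one standard member of $G$: if $Y \sim N(\theta, 1)$, then the score itself,

$$g(y;\theta) = y - \theta, \qquad \frac{\partial g}{\partial\theta} = -1,$$

satisfies all three conditions: $E_\theta[Y - \theta] = 0$, the derivative is negative and bounded, and $E_\theta[(Y - \theta)^2] = 1 < \infty$.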

Thanks for your help.


1 Answer


This is brief, but here is how the argument goes.

First, the formula

$$\frac{E_{\theta}[g^2(Y;\theta)]}{\left(E_{\theta}[\partial_{\theta} g(Y;\theta)]\right)^2}$$

is the asymptotic variance of the estimator obtained from the estimating equation $\sum_{i=1}^n g(Y_i;\hat\theta) = 0$. Take $g(y;\theta) = \frac{\partial}{\partial\theta}\log f(y;\theta)$ (that is, the score). Then the numerator is

$$E_{\theta}\left(\left[\frac{\partial}{\partial\theta}\log f(Y;\theta)\right]^2\right)$$

and the denominator is

$$\left(E_{\theta}\left[\frac{\partial^2}{\partial\theta^2}\log f(Y;\theta)\right]\right)^2.$$

Recall that the Fisher information is

$$I(\theta) = E_{\theta}\left[\frac{\partial}{\partial\theta}\log f(Y;\theta)\right]^2 = -E_{\theta}\left[\frac{\partial^2}{\partial\theta^2}\log f(Y;\theta)\right],$$

so the ratio simplifies to $\frac{1}{I(\theta)}$. This is the Cramér-Rao bound, i.e., the variance of the most efficient estimator: the smallest possible asymptotic variance in the class considered here.
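Sketching the minimality part (this is Godambe's optimality argument; it needs enough regularity to differentiate under the integral sign, which I assume here): write $s(y;\theta) = \frac{\partial}{\partial\theta}\log f(y;\theta)$ for the score. Differentiating condition 1), $\int g(y;\theta) f(y;\theta)\,dy = 0$, with respect to $\theta$ gives

$$E_\theta[\partial_\theta g(Y;\theta)] = -E_\theta[g(Y;\theta)\,s(Y;\theta)].$$

By Cauchy-Schwarz,

$$\left(E_\theta[\partial_\theta g]\right)^2 = \left(E_\theta[g\,s]\right)^2 \le E_\theta[g^2]\,E_\theta[s^2] = E_\theta[g^2]\,I(\theta),$$

so that

$$\frac{E_\theta[g^2]}{\left(E_\theta[\partial_\theta g]\right)^2} \ge \frac{1}{I(\theta)},$$

with equality if and only if $g$ is proportional to $s$. Hence the score attains the minimum over $G$.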

For a very detailed and rigorous treatment of the above, I recommend "Large sample estimation and hypothesis testing" by Newey and McFadden, which is freely available on the internet. See Theorem 3.1, p. 2143, for the variance of the original M-estimator; Theorem 3.3, p. 2146, for the special case of the score function; and Lemma 5.4, p. 2174, for why that is the minimal-variance estimator in that class.

Otherwise, the book by Young and Smith on statistical inference has a very simple, short proof of the Cramér-Rao bound in the scalar case.
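As a quick numerical sanity check (my own sketch, not from either reference: it assumes the $N(\theta,1)$ model, where $I(\theta) = 1$, and uses $\tanh(y-\theta)$ as an arbitrary alternative member of $G$):

```python
import numpy as np

# Monte Carlo check that the score minimizes E[g^2] / (E[dg/dtheta])^2
# in the N(theta, 1) model, where the Fisher information is I(theta) = 1.
rng = np.random.default_rng(0)
theta = 0.0
y = rng.normal(theta, 1.0, size=1_000_000)

def ratio(g, dg):
    """Estimate E[g^2] / (E[dg/dtheta])^2 by sample averages."""
    return np.mean(g**2) / np.mean(dg)**2

# The score: g(y; theta) = y - theta, with dg/dtheta = -1.
score_ratio = ratio(y - theta, -np.ones_like(y))

# Another member of G: g(y; theta) = tanh(y - theta); its derivative
# -sech^2(y - theta) is negative and bounded, and E_theta[g] = 0 by symmetry.
alt_ratio = ratio(np.tanh(y - theta), -1.0 / np.cosh(y - theta) ** 2)

print(score_ratio)  # approximately 1.0 = 1 / I(theta), the Cramér-Rao bound
print(alt_ratio)    # strictly larger than 1, as the inequality predicts
```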

  • @Sam Condition 2 is probably too strong. In any case, an additional assumption that the set of $\theta$'s is compact is required even for a weaker version. Condition 3 follows by Cauchy-Schwarz from the first condition. If you are still having difficulties, I think it is a good idea to ask a new question on these particular issues to clarify them (a question with a real-analysis tag). 2012-12-21