
This question comes from the proof of Neyman's factorization theorem in Robert V. Hogg, Joseph W. McKean, and Allen T. Craig, Introduction to Mathematical Statistics, 6th edition, pp. 376-377.

[Image: excerpt of the proof of the factorization theorem from Hogg, McKean & Craig, pp. 376-377, with the relevant one-to-one transformation marked by a red line.]

In the proof, a one-to-one transformation is used (indicated by the red line), but I do not understand why such a one-to-one transformation is guaranteed to exist. Could you explain?

Thank you for any help!

  • 0
$y_2,\dots,y_n$ can be chosen to be anything so that the transformation $(x_1,\dots,x_n)\mapsto (y_1,\dots,y_n)$ is bijective. The choice is not going to affect the proof (for one concrete way to do this, see the sketch after these comments). (2012-06-21)
  • 0
I don't get it. Could you please explain in a bit more detail? For example, if $u_1$ is not bijective, how does one construct such a one-to-one transformation? (2012-06-21)
  • 0
Yes, you seem to be right. (2012-06-21)
  • 0
The transformation of $x_1, x_2, \ldots, x_n$ to $y_1$ is the sufficient statistic given in the problem, but I am not sure how the other $y_i$'s are constructed to get the one-to-one transformation from the $x$'s to the $y$'s. (2012-06-22)
  • 0
This proof seems to be handwavy on purpose. For example, $y_{1}$ might not be differentiable, and then you would not be able to get the Jacobian. The important claim is that "if $T$ is sufficient and $T(x) = t$, then the density of $t$ is proportional to the density of $x$". There are some problems with that: the density of $t$ with respect to what? Is $T$ necessarily continuous? etc. I can show this rigorously using measure theory, but I don't know if that would be good for you. (2012-06-26)
  • 0
@madprob Yes please. I would appreciate it if you could write up a (correct) measure-theoretic proof. I learnt measure theory one year ago, although at present I have forgotten most of it :-( I'll try my best to understand the proof. Thank you. (2012-06-27)
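For concreteness, here is one common way the transformation is completed (a sketch only; it assumes, purely for illustration, that $u_1$ is the sum of the observations, which is just one possible case):

$$y_{1} = u_{1}(x_{1},\ldots,x_{n}) = \sum_{i=1}^{n}x_{i}, \qquad y_{2} = x_{2},\ \ldots,\ y_{n} = x_{n}.$$

This map is one-to-one with inverse $x_{1} = y_{1} - \sum_{i=2}^{n}y_{i}$, $x_{i} = y_{i}$ for $i \geq 2$, and its Jacobian equals $1$. In general one keeps $y_{1} = u_{1}(x_{1},\ldots,x_{n})$ and chooses $y_{2},\ldots,y_{n}$ (often simply some of the original coordinates) so that the whole map $(x_{1},\ldots,x_{n}) \mapsto (y_{1},\ldots,y_{n})$ is invertible; when no smooth invertible completion exists, the Jacobian argument breaks down, which is the gap the measure-theoretic proof below avoids.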

2 Answers

2

I wrote this as personal notes for a class, adapting the proof in Theory of Statistics by Mark Schervish.

We use the following definition of sufficiency:

$\textbf{Definition 1.1}:$ A statistic $T$ is $\textit{sufficient}$ for $\Theta$ if $\forall A \in \sigma(X)$, $\exists$ a version of $P_{\theta}(A|T)$ that is functionally independent of $\theta$. In this case, we write $P(A|T)$ for this common version.

This is the abstract version of the statement that $P_{\theta}(X=x|T=t)$ does not depend on $\theta$.

Next, we need the following lemmas (I can write out the proofs if it helps):

$\textbf{Lemma 1.1}$: If $\forall \theta \in \Omega$, $P_{\theta} << \nu$ for a $\sigma$-finite $\nu$, then $\exists (c_{i})_{i=1}^{\infty}$ in $[0,1]$ s.t. $\sum_{i=1}^{\infty}{c_{i}} = 1$ and $(\theta_{i})_{i=1}^{\infty}$ in $\Omega$ s.t. $\nu^{*} := \sum_{i=1}^{\infty}{c_{i}P_{\theta_{i}}}$ is a probability measure and $\forall \theta \in \Omega$, $P_{\theta} << \nu^{*} << \nu$.
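For intuition about $\nu^{*}$ (a simple special case, not needed in what follows): if $\Omega = \{\theta_{1},\theta_{2},\ldots\}$ is countable, one may take $c_{i} = 2^{-i}$, so that

$$\nu^{*} = \sum_{i=1}^{\infty}{2^{-i}P_{\theta_{i}}}$$

is a probability measure; if $\nu^{*}(A) = 0$ then $P_{\theta_{i}}(A) = 0$ for every $i$, so $P_{\theta} << \nu^{*}$ for every $\theta \in \Omega$, and $\nu^{*} << \nu$ because each $P_{\theta_{i}} << \nu$.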

$\textbf{Lemma 1.2}$: Suppose $\exists$ a $\sigma$-finite $\nu$ on $(\chi,\beta_{X})$ s.t. $\forall \theta \in \Omega, P_{\theta} << \nu$, and that $T$ is sufficient for $\Theta$. Take $\nu^{*}$ as in Lemma $1.1$. Then $\forall A \in \sigma(X)$ and $\forall \theta \in \Omega$, $\nu^{*}(A|T)$ is a $\nu^{*}$-version of $P_{\theta}(A|T)$.

$\textbf{Lemma 1.3}$: Suppose $\exists$ a $\sigma$-finite $\nu$ on $(\chi,\beta_{X})$ s.t. $\forall \theta \in \Omega, P_{\theta} << \nu$, and take $\nu^{*}$ as in Lemma $1.1$. There exist functions $m_{1}$ and $m_{2,\theta}$ s.t. $\forall \theta \in \Omega$ a $\nu$-version of $\frac{dP_{\theta}}{d\nu}$ satisfies $\frac{dP_{\theta}}{d\nu}(x) = m_{1}(x)\,m_{2,\theta}(x)$ $\forall x \in \chi$, with $m_{1}$ functionally independent of $\theta$ and $m_{2,\theta}$ $\sigma(T)$-measurable, iff $\forall \theta \in \Omega$ there exists a $\sigma(T)$-measurable $\nu^{*}$-version of $\frac{dP_{\theta}}{d\nu^{*}}$. Whenever there is no confusion, we will also denote this version by $\frac{dP_{\theta}}{d\nu^{*}}$.

$\textbf{Proof}$: $\textbf{Stage}$ 1 ("only if"): the factorization implies the existence of a $\sigma(T)$-measurable $\nu^{*}$-version of $\frac{dP_{\theta}}{d\nu^{*}}$.

$ \ $

First, observe that:

$$\frac{d\nu^{*}}{d\nu} = \sum_{j}{c_{j}\frac{dP_{\theta_{j}}}{d\nu}} =^{\nu} m_{1}\sum_{j}{c_{j}m_{2,\theta_{j}}}$$

Next, observe that, since $\nu^{*} << \nu$:

$$m_{1}m_{2,\theta} = \frac{dP_{\theta}}{d\nu} =^{\nu} \frac{dP_{\theta}}{d\nu^{*}} \frac{d\nu^{*}}{d\nu} =^{\nu} \frac{dP_{\theta}}{d\nu^{*}}\, m_{1}\sum_{j}{c_{j}m_{2,\theta_{j}}}$$

Observe that $A = \{x \in \chi: m_{1}\sum_{j}{c_{j}m_{2,\theta_{j}}} = 0\}$ satisfies $\nu^{*}(A) = 0$, since $\nu^{*}(A) = \int_{A}{d\nu^{*}} = \int_{A}{\frac{d\nu^{*}}{d\nu}d\nu} = \int_{A}{m_{1}\sum_{j}{c_{j}m_{2,\theta_{j}}}d\nu} = 0$. Hence, dividing on the complement of $A$,

$$\frac{dP_{\theta}}{d\nu^{*}} =^{\nu^{*}} \frac{m_{1}m_{2,\theta}}{m_{1}\sum_{j}{c_{j}m_{2,\theta_{j}}}} = \frac{m_{2,\theta}}{\sum_{j}{c_{j}m_{2,\theta_{j}}}}$$

Since $\frac{m_{2,\theta}}{\sum_{j}{c_{j}m_{2,\theta_{j}}}}$ is $\sigma(T)$-measurable, the proof of this direction is complete.

$ \ $

$\textbf{Stage}$ 2 ("if"): the existence of a $\sigma(T)$-measurable $\nu^{*}$-version implies the factorization.

$ \ $

We wish to prove that there exist $m_{1}$ and $m_{2,\theta}$ s.t. $\forall A \in \sigma(X)$,

$$\int_{A}{\frac{dP_{\theta}}{d\nu}d\nu} = \int_{A}{m_{1}m_{2,\theta}d\nu}$$

$$\int_{A}{\frac{dP_{\theta}}{d\nu}d\nu} =^{R.N.} \int_{A}{dP_{\theta}} =^{R.N.} \int_{A}{\frac{dP_{\theta}}{d\nu^{*}}d\nu^{*}} =^{R.N.} \int_{A}{\frac{dP_{\theta}}{d\nu^{*}}\frac{d\nu^{*}}{d\nu}d\nu}$$

(Here and below, $=^{R.N.}$ marks a step justified by the Radon-Nikodym theorem, $=^{T.L.}$ marks the tower law of conditional expectation, and $=^{L.2}$, $=^{L.3}$ mark applications of Lemmas $1.2$ and $1.3$.)

Taking $m_{1} = \frac{d\nu^{*}}{d\nu}$ and $m_{2,\theta} = \frac{dP_{\theta}}{d\nu^{*}}$ completes the proof: $m_{1}$ is functionally independent of $\theta$, and $m_{2,\theta}$ is $\sigma(T)$-measurable because we take the $\sigma(T)$-measurable $\nu^{*}$-version assumed to exist.

$ \ $

$ \ $

$\textbf{Theorem 1.1}$ (Fisher-Neyman Factorization): Suppose $\exists$ a $\sigma$-finite $\nu$ on $(\chi,\beta_{X})$ s.t. $\forall \theta \in \Omega, P_{\theta} << \nu$. Then $T$ is sufficient for $\Theta$ iff there exist functions $m_{1}$ and $m_{2,\theta}$ s.t. $\forall \theta \in \Omega$ a $\nu$-version of $\frac{dP_{\theta}}{d\nu}$ satisfies $\frac{dP_{\theta}}{d\nu}(x) = m_{1}(x)\,m_{2,\theta}(x)$ $\forall x \in \chi$, with $m_{1}$ functionally independent of $\theta$ and $m_{2,\theta}$ $\sigma(T)$-measurable.

$ \ $

$\textbf{Proof}:$ $\textbf{Stage}$ 1 ("only if"): sufficiency implies the factorization.

$ \ $

We wish to find $m_{1}$ and $m_{2,\theta}$ as in the theorem such that, $\forall A \in \sigma(X)$, $\forall \theta \in \Omega$:

$$\int_{A}{\frac{dP_{\theta}}{d\nu}d\nu} = \int_{A}{m_{1}m_{2,\theta}d\nu}$$

Take an arbitrary $A \in \sigma(X)$ and consider $\nu^{*}$ as in Lemma $1.1$.

$$\int_{A}{\frac{dP_{\theta}}{d\nu}d\nu} =^{R.N} E_{P_{\theta}}(I_{A}) = E_{P_{\theta}}(P_{\theta}(A|T)) = E_{\nu^{*}}(P_{\theta}(A|T) \frac{dP_{\theta}}{d\nu^{*}}) =^ {L.2} E_{\nu^{*}}(\nu^{*}(A|T) \frac{dP_{\theta}}{d\nu^{*}}) =^{T.L.}$$

$$= E_{\nu^{*}}(\nu^{*}(A|T) E_{\nu^{*}}(\frac{dP_{\theta}}{d\nu^{*}}|T)) := E_{\nu^{*}}(\nu^{*}(A|T) m_{2,\theta}) = E_{\nu^{*}}(E_{\nu^{*}}(I_{A}m_{2,\theta}|T)) =^{T.L.} E_{\nu^{*}}(I_{A}m_{2,\theta}) =^{R.N.}$$

$$= E_{\nu}(I_{A}\frac{d\nu^{*}}{d\nu}m_{2,\theta}) = \int_{A}{\frac{d\nu^{*}}{d\nu}m_{2,\theta}d\nu} := \int_{A}{m_{1}m_{2,\theta}d\nu}$$

Here $m_{1} := \frac{d\nu^{*}}{d\nu}$ is functionally independent of $\theta$ (it is built from the fixed mixture $\nu^{*}$), and $m_{2,\theta} := E_{\nu^{*}}(\frac{dP_{\theta}}{d\nu^{*}}|T)$ is $\sigma(T)$-measurable by construction, as the theorem requires.

$\textbf{Stage}$ 2 ("if"): the factorization implies sufficiency.

$ \ $

We will show that $\forall \theta \in \Omega$, $\nu^{*}(A|T)$ is a $P_{\theta}$-version of $P_{\theta}(A|T)$. Since $\nu^{*}(A|T)$ is functionally independent of $\theta$, the proof will then be complete. Take an arbitrary $\theta$; since $\nu^{*}(A|T)$ and $P_{\theta}(A|T)$ are $\sigma(T)$-measurable, it suffices to show that:

$$\forall B \in \sigma(T), \int_{B}{P_{\theta}(A|T)dP_{\theta}} = \int_{B}{\nu^{*}(A|T)dP_{\theta}}$$

Take an arbitrary $B \in \sigma(T)$:

$$\int_{B}{P_{\theta}(A|T)dP_{\theta}} = E_{P_{\theta}}(I_{A}I_{B}) =^{R.N.} E_{\nu^{*}}(I_{A}I_{B}\frac{dP_{\theta}}{d\nu^{*}}) =^{T.L.} E_{\nu^{*}}(E_{\nu^{*}}(I_{A}\frac{dP_{\theta}}{d\nu^{*}}|T)I_{B}) =^{L.3}$$

$$= E_{\nu^{*}}(E_{\nu^{*}}(I_{A}|T)\frac{dP_{\theta}}{d\nu^{*}}I_{B}) = E_{\nu^{*}}(\nu^{*}(A|T)I_{B}\frac{dP_{\theta}}{d\nu^{*}}) =^{R.N.} E_{P_{\theta}}(\nu^{*}(A|T)I_{B}) = \int_{B}{\nu^{*}(A|T)dP_{\theta}}$$

This completes the proof of Theorem $1.1$.
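To connect this with the statement in Hogg, McKean and Craig (as I read it, taking $\nu$ to be Lebesgue measure on $\mathbb{R}^{n}$ and $T = u_{1}(X_{1},\ldots,X_{n})$): since $m_{2,\theta}$ is $\sigma(T)$-measurable, it is a function of $u_{1}(x_{1},\ldots,x_{n})$ and $\theta$, and so plays the role of $k_{1}[u_{1}(x_{1},\ldots,x_{n});\theta]$, while $m_{1}$ plays the role of $k_{2}(x_{1},\ldots,x_{n})$:

$$f(x_{1};\theta)\cdots f(x_{n};\theta) = \frac{dP_{\theta}}{d\nu}(x) = \underbrace{m_{2,\theta}(x)}_{k_{1}[u_{1}(x);\theta]}\; \underbrace{m_{1}(x)}_{k_{2}(x)}$$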

  • 0
    Thank you very much! I'll try my best to understand it. Thank you again! (2012-06-28)
  • 0
    Good luck ;) and feel free to ask for any clarifications. (2012-06-28)
1

The proof is essentially the same as the one in the discrete case. Let $X = (X_{1},\ldots,X_{n})$ be a random vector with a discrete distribution.

$T$ is sufficient for $\theta$ if $P_{\theta}(X=x|T=t)$ does not depend on $\theta$.

If $T$ is sufficient, then writing $t = T(x)$ (so that $\{X=x\} \subseteq \{T=t\}$),

$$P_{\theta}(X=x) = P_{\theta}(X=x,T=t) = P_{\theta}(T=t)P_{\theta}(X=x|T=t) = g_{\theta}(t)f(x)$$

Conversely, if $P_{\theta}(X=x) = g_{\theta}(t)f(x)$ for all $x$ (again with $t = T(x)$),

$$P_{\theta}(T=t) = \sum_{x: T(x)=t}{P_{\theta}(X=x)} = g_{\theta}(t)\sum_{x: T(x)=t}{f(x)}$$

Hence,

$$P_{\theta}(X=x|T=t) = \frac{P_{\theta}(X=x,T=t)}{P_{\theta}(T=t)} = \frac{P_{\theta}(X=x)}{P_{\theta}(T=t)} = \frac{g_{\theta}(t)f(x)}{g_{\theta}(t)\sum_{y: T(y)=t}{f(y)}} = \frac{f(x)}{\sum_{y: T(y)=t}{f(y)}}$$

which does not depend on $\theta$.
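As a concrete illustration (a standard example, not taken from the book's proof): let $X_{1},\ldots,X_{n}$ be i.i.d. Bernoulli$(\theta)$ and let $T = \sum_{i=1}^{n}{X_{i}}$. Then

$$P_{\theta}(X=x) = \theta^{t}(1-\theta)^{n-t}, \qquad t = \sum_{i=1}^{n}{x_{i}},$$

which has the form $g_{\theta}(t)f(x)$ with $g_{\theta}(t) = \theta^{t}(1-\theta)^{n-t}$ and $f(x) = 1$, so $T$ is sufficient; and indeed

$$P_{\theta}(X=x|T=t) = \frac{\theta^{t}(1-\theta)^{n-t}}{\binom{n}{t}\theta^{t}(1-\theta)^{n-t}} = \frac{1}{\binom{n}{t}},$$

which does not depend on $\theta$.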

This is also, more or less, the argument that the book you posted is trying to follow.