
I have a conceptual question about sufficient statistics, based on the following problem:

Let $X_1, X_2, \ldots, X_n$ be a random sample of independent and identically distributed random variables from the following population:

$f(x\mid\theta)=I_{[\theta,\theta+1]}(x) \quad (\theta>0)$.

a) Find a sufficient statistic for $\theta$.

Here, I computed the joint density:

$f(\textbf{x}\mid\theta)=\prod_{i=1}^{n} I_{[\theta,\theta+1]}(x_i)$

Moreover: $\theta\leq X_i\leq\theta+1 \quad \forall i=1,2,\ldots,n$.

So: $\theta\leq X_{(1)}\leq X_i\leq X_{(n)}\leq \theta+1 \quad \forall i=1,2,\ldots,n$.

Then the natural thing is to use the statistic:

$T(\textbf{X})=(X_{(1)},X_{(n)})=(t_1,t_2)$.
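To make this concrete, here is a small Python sketch (my own illustration, not part of the problem statement; the function names are invented) that draws a sample from the density above and computes $T$:

```python
import random

def sample_uniform_shift(theta, n, rng=random):
    """Draw n i.i.d. values from the density I_[theta, theta+1], i.e. Uniform[theta, theta+1]."""
    return [theta + rng.random() for _ in range(n)]

def T(xs):
    """The candidate sufficient statistic T(x) = (x_(1), x_(n)) = (min, max)."""
    return (min(xs), max(xs))

theta = 2.5
xs = sample_uniform_shift(theta, 50)
t1, t2 = T(xs)
# Both order statistics necessarily lie in [theta, theta+1], with t1 <= t2.
print(t1, t2)
```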

So my question is: which of the following is the correct equivalent expression to use with the Factorization Theorem?

  1. $f(\textbf{x}\mid\theta)=I_{[X_{(1)},X_{(n)}]}(x_i) \cdot 1$

(Is the expression to the left of the "1" a function of $T$ and $\theta$?)

  2. $f(\textbf{x}\mid\theta)=I_{[\theta,X_{(n)}]}(t_1) \cdot I_{[X_{(1)},\theta +1]}(t_2)\cdot 1$

(Is the factor to the left of the "1" a function of $\theta$, even though $\theta$ only appears in the interval of the indicator function?)

And my last question would be:

In general, is the dimension of the minimal sufficient statistic greater than or equal to the dimension of $\theta$?

1 Answer


The joint density is completely described by the function you wrote, without needing to restrict the observations, since by definition, $$\mathbb 1_S(x) = \begin{cases} 1, & x \in S \\ 0, & x \not \in S. \end{cases}$$ So the expression $$f(\boldsymbol x \mid \theta) = \prod_{i=1}^n \mathbb 1_{[\theta, \theta+1]}(x_i) = \mathbb 1_{[\theta,\theta+1]}(x_{(1)}) \mathbb 1_{[\theta,\theta+1]}(x_{(n)})$$ is adequate.

The factorization theorem then allows us to choose a function $\boldsymbol T(\boldsymbol x)$ such that $$f(\boldsymbol x \mid \theta) = h(\boldsymbol x) g(\boldsymbol T(\boldsymbol x) \mid \theta).$$ Then $\boldsymbol T$ is called sufficient for $\theta$. In this case, $h(\boldsymbol x) = 1$: there are no factors of the joint density that are independent of the parameter. If we choose $\boldsymbol T(\boldsymbol x) = (x_{(1)}, x_{(n)})$, then the choice $$g(t_1, t_2 \mid \theta) = \mathbb 1_{[\theta,\theta+1]}(t_1) \mathbb 1_{[\theta,\theta+1]}(t_2)$$ yields the desired factorization.

It is not necessary to restrict the intervals on which the indicator functions are $1$ any further than the original $[\theta, \theta+1]$, because $x_{(1)}$ is really just a shortcut for $\min(x_1, x_2, \ldots, x_n)$ and similarly $x_{(n)} = \max(x_1, x_2, \ldots, x_n)$; i.e., $$\boldsymbol T(\boldsymbol x) = (\min \boldsymbol x, \max \boldsymbol x),$$ and it is always true that $x_{(1)} \le x_{(n)}$.
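To see the factorization concretely, here is a short Python check (my own sketch; the names `joint_density` and `g` are just illustrative) that the joint density equals $h(\boldsymbol x)\, g(\boldsymbol T(\boldsymbol x) \mid \theta)$ with $h \equiv 1$, for several values of $\theta$:

```python
def indicator(a, b, x):
    """I_[a,b](x): 1 if a <= x <= b, else 0."""
    return 1.0 if a <= x <= b else 0.0

def joint_density(xs, theta):
    """f(x | theta) = prod_i I_[theta, theta+1](x_i)."""
    p = 1.0
    for x in xs:
        p *= indicator(theta, theta + 1, x)
    return p

def g(t1, t2, theta):
    """g(t1, t2 | theta) = I_[theta,theta+1](t1) * I_[theta,theta+1](t2)."""
    return indicator(theta, theta + 1, t1) * indicator(theta, theta + 1, t2)

# Check f(x | theta) == h(x) * g(T(x) | theta) with h identically 1.
xs = [2.7, 3.1, 2.9, 3.4]
for theta in [2.0, 2.45, 2.5, 3.0]:
    t1, t2 = min(xs), max(xs)
    assert joint_density(xs, theta) == 1.0 * g(t1, t2, theta)
```

The equality is not an accident of these numbers: the product of indicators is $1$ exactly when $x_{(1)} \ge \theta$ and $x_{(n)} \le \theta + 1$, which is the same event on which $g(t_1, t_2 \mid \theta) = 1$.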

Regarding your last question, I suppose it may not be true if $\boldsymbol \theta = (\theta_1, \ldots, \theta_p)$ exhibits some kind of dependence among the parameters; but perhaps such a case could be regarded as contrived.

    Thanks a lot heropup, I've understood the dependence of the joint density on $\theta$. I suppose this leads to the fact that if I have something like $\mathbb 1_{[t_1,t_2]}(\text{something not depending on } \theta)$, it would also be a function of the statistic but not of $\theta$, right? 2017-02-24