What's the motivation for the definition of "regular family" in exponential family distributions?

I know the definition: a regular family is an exponential family whose natural parameter space is an open set. But I don't know why we need such a definition. How does the behaviour of a regular family differ from that of a non-regular family? Do you have any ideas about this?
1 Answer
Short story: the "regular family" condition arises naturally when you want the sufficient statistic $T(X)$ to be minimal. In plain English, for regular exponential families, any sufficient statistic is a function of $T(X)$.
From Keener's textbook:
Suppose $\mathcal{P}$ is an $s$-parameter exponential family with densities $p_\theta(x)=e^{\langle \eta(\theta), T(x)\rangle- B(\theta)} h(x)$ for $\theta \in \Omega$.
$T$ is a sufficient statistic, by the factorization theorem. Is it minimal sufficient? One way to show minimal sufficiency is to show "$p_\theta(x) \propto_\theta p_\theta(y)$ implies $T(x)=T(y)$," where $\propto_\theta$ means "for fixed $x$ and $y$, $p_\theta(x)/p_\theta(y)$ is a constant when viewed as a function of $\theta$."
In this case, $p_\theta(x) \propto_\theta p_\theta(y)$ implies $$\langle \eta(\theta), T(x)\rangle = \langle \eta(\theta), T(y)\rangle + c$$ for all $\theta$, where $c$ is constant in $\theta$ but maybe a function of $x$ and $y$.
If we take two points $\theta_1,\theta_2 \in \Omega$, apply the above to each, and subtract the two resulting expressions, we get $$\langle \eta(\theta_1)-\eta(\theta_2), T(x) - T(y) \rangle = 0.$$ If the exponential family is full rank, then $\eta(\theta_1)-\eta(\theta_2)$ can point in any direction in $\mathbb{R}^s$ for appropriately chosen $\theta_1$ and $\theta_2$; since $T(x)-T(y)$ is then orthogonal to a spanning set of directions, we must have $T(x)=T(y)$. Thus $T$ is a minimal sufficient statistic. This is where the open-set condition helps: differences of points in a nonempty open subset of $\mathbb{R}^s$ span all of $\mathbb{R}^s$. Without the full rank assumption we cannot conclude this.
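As a concrete sanity check of the criterion above, here is a minimal Python sketch (my own illustration, not from Keener) using the Poisson family, which is a regular exponential family with $\eta(\lambda)=\log\lambda$, $T(x)=x$, $B(\lambda)=\lambda$, $h(x)=1/x!$. For fixed $x$ and $y$, the log-ratio $\log p_\lambda(x)-\log p_\lambda(y)$ is constant in $\lambda$ exactly when $T(x)=T(y)$, i.e. $x=y$:

```python
import math

def log_poisson(x, lam):
    # log p_lam(x) = x*log(lam) - lam - log(x!)
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def log_ratio_is_constant(x, y, lams=(0.5, 1.0, 2.0, 5.0), tol=1e-9):
    # For fixed x, y: is log p_lam(x) - log p_lam(y) constant as a
    # function of lam? (Checked numerically over a few lam values.)
    ratios = [log_poisson(x, lam) - log_poisson(y, lam) for lam in lams]
    return max(ratios) - min(ratios) < tol

# T(x) = x here, so the ratio is constant exactly when T(x) = T(y).
print(log_ratio_is_constant(3, 3))  # True
print(log_ratio_is_constant(3, 4))  # False: the ratio contains -log(lam)
```

Here the log-ratio equals $(x-y)\log\lambda - \log(x!/y!)$, which depends on $\lambda$ whenever $x \neq y$, matching the proportionality criterion.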
Minimal sufficient vs minimal representation
We have $T(X)$. It is a minimal sufficient statistic if any other sufficient statistic $F(X)$ can be written as a function of $T(X)$, i.e. $F(X) = g(T(X))$.
$T(X)$ is a minimal representation if there is no nonzero vector $v$ such that $\langle v, T(x)\rangle$ is constant when viewed as a function of $x$. This ensures that each distribution is associated with exactly one $\eta(\theta)$: if $\langle v, T(x)\rangle$ were constant for some nonzero $v$, then replacing $\eta(\theta)$ with $\eta(\theta)+v$ would not change the distribution (the normalization constant changes to compensate). If a representation is not minimal, then I think you can reduce the dimension $s$ of the sufficient statistic.
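The invariance can be checked numerically. Below is a small sketch (my own example, not from the original answer) using an overcomplete Bernoulli representation $T(x) = (\mathbb{1}\{x=1\}, \mathbb{1}\{x=0\})$, where $T_1 + T_2 = 1$ identically, so $\langle v, T(x)\rangle$ is constant for any $v = (c, c)$ and shifting $\eta$ by such a $v$ leaves the distribution unchanged:

```python
import math

def bernoulli_prob(x, eta):
    # Overcomplete Bernoulli: T(x) = (1{x=1}, 1{x=0}), eta = (a, b).
    # Unnormalized weight for outcome x is exp(<eta, T(x)>).
    a, b = eta
    weights = [math.exp(b), math.exp(a)]  # x = 0, x = 1
    return weights[x] / sum(weights)

eta = (0.7, -0.3)
shifted = (0.7 + 5.0, -0.3 + 5.0)  # eta + v with v = (5, 5)

# Same distribution: the shift cancels in the normalization.
print(bernoulli_prob(1, eta), bernoulli_prob(1, shifted))
```

Because $T_1 + T_2 \equiv 1$, the representation can be reduced to the one-dimensional $T(x) = x$ with natural parameter $a - b$ (the usual logit), dropping $s$ from 2 to 1.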
I am not sure exactly how the two notions are related. I think a minimal sufficient statistic can correspond to either a minimal or a non-minimal representation.
- Thanks a lot for your answer! But I am still a little confused by $\langle \eta(\theta),T(x)\rangle=\langle \eta(\theta),T(x)\rangle+c$. Could you explain why it is not $\langle \eta(\theta),T(y)\rangle+c$? Thanks! – 2017-02-01
- @southdoor You are right, that was a typo. – 2017-02-01
- Thanks for your clarification! One more question: is the term "minimal sufficient" used in the answer the same concept as "minimal representation"? I know the opposite of the latter is an overcomplete representation. – 2017-02-01
- @southdoor See my edit. I am not sure they are related. – 2017-02-01