
Assume a single real-valued variable, observed frequently but irregularly over a series of time spans. Within each time span, I assume the samples are drawn from a distribution of values that the variable takes during that span... I intend to compare distributions between time spans, hoping to find patterns that are otherwise obscured by the detail of the irregular sampling.

The most obvious way I can see to characterise these distributions is the first $n$ central moments... so, for example, I can calculate approximations of the mean, variance, skewness and kurtosis for the sample data. I am aware, however, of distributions for which central moments are not meaningfully defined (the Cauchy distribution, for example) and recognise that trying to characterise such a distribution by approximated central moments is unlikely to provide useful insights.
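For concreteness, here is a minimal sketch of the kind of per-span summary I have in mind, using scipy.stats; the simulated data and all names are purely illustrative:

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the samples observed in one time span.
rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.5, size=500)

mean = np.mean(samples)
variance = np.var(samples, ddof=1)   # unbiased sample variance
skewness = stats.skew(samples)
kurt = stats.kurtosis(samples)       # excess kurtosis (0 for a normal)

print(mean, variance, skewness, kurt)
```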

Aside from central moments, what other approaches might I use to try to classify distributions?

1 Answer


Characteristic functions are often used. The characteristic function of the probability distribution of a real random variable $X$ is $ \psi_X(t) = \mathbb{E}\left(e^{itX}\right). $ For real $t$ this expectation always exists, since $\left|e^{itX}\right| = 1$. There's an immense literature on this topic.
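Note that the characteristic function can be estimated directly from the samples, with no model assumed, as the average of $e^{itX_j}$ over a grid of $t$ values; it exists even for heavy-tailed data such as Cauchy samples. A minimal Python sketch (function and variable names are illustrative):

```python
import numpy as np

def empirical_cf(samples, t):
    """Empirical characteristic function: mean of exp(i*t*x) over samples."""
    t = np.atleast_1d(t)
    return np.exp(1j * np.outer(t, samples)).mean(axis=1)

# Hypothetical example: compare two time spans on a grid of t values.
rng = np.random.default_rng(1)
span_a = rng.normal(size=300)
span_b = rng.standard_cauchy(size=300)  # heavy-tailed; moments undefined

t_grid = np.linspace(-5, 5, 101)
cf_a = empirical_cf(span_a, t_grid)
cf_b = empirical_cf(span_b, t_grid)

# One crude distance between the two distributions:
print(np.max(np.abs(cf_a - cf_b)))
```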

In one sense this next topic won't answer your question: you seem to want something that is still there when moments don't exist. But as an alternative to central moments, cumulants are of interest. The $n$th cumulant $\operatorname{cum}_n(X)$ of the probability distribution of a random variable exists only if the $n$th moment exists. It shares with the $n$th moment and the $n$th central moment the property of $n$th-degree homogeneity: $\operatorname{cum}_n(cX) = c^n \operatorname{cum}_n(X)$. When $n>1$, it shares with the $n$th central moment the property of translation-invariance: $\operatorname{cum}_n(X+c)=\operatorname{cum}_n(X)$. (If $n=1$, we have equivariance rather than invariance: $\operatorname{cum}_1(X+c)=\operatorname{cum}_1(X)+c$.) Finally, it shares with the second and third central moments the "cumulative property": for $X_1,\ldots,X_m$ independent, we have $\operatorname{cum}_n(X_1+\cdots+X_m)=\operatorname{cum}_n(X_1)+\cdots+\operatorname{cum}_n(X_m)$. Higher central moments, i.e. those with $n\ge4$, don't have that property.

The second and third cumulants are in fact the second and third central moments; the first cumulant is the expectation; the fourth cumulant is $ \operatorname{cum}_4(X) = \mathbb{E}\left( (X - \mathbb{E}(X))^4\right) - 3\left(\operatorname{var}(X)\right)^2. $
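For sample data, unbiased estimates of the first four cumulants (the k-statistics) are available via scipy.stats.kstat. A minimal sketch, with simulated data purely for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical example: k-statistics (unbiased estimators of the
# first four cumulants) from a sample.
rng = np.random.default_rng(2)
samples = rng.normal(loc=0.0, scale=2.0, size=1000)

k1, k2, k3, k4 = (stats.kstat(samples, n=n) for n in range(1, 5))
print(k1, k2, k3, k4)  # for a normal distribution, k3 and k4 are near 0
```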

See the Wikipedia article on cumulants.

  • Thanks for your answer. The snag as I see it with a characteristic-function approach is that I'd need a model for the system in order to pick a function that I'd then fit to my data. My situation is that I have data; I think treating it as a series of distributions is sensible, but I have no credible model for the system I'm measuring. I've already read about cumulants, but discounted these as I'd assumed they'd offer no advantage over central moments... In what circumstances would cumulants work best? – 2011-11-29