
I am interested in calculating the Hellinger distance $H(f,g)$ between two Beta distributions $f$ and $g$ whose parameters I already know. I am aware that you can calculate it directly using the 2-norm formula for discrete distributions, but it would be nicer to have a full analytical expression.

The Wikipedia page on the Hellinger distance gives nice expressions for the Gaussian, exponential, Weibull, and Poisson distributions. But how does one derive similar expressions for:

  1. the 2-parameter Beta distribution defined on $[0,1] \subset \mathbf{R}$?
  2. the 4-parameter Beta distribution having arbitrary support in $\mathbf{R}$?

The last option is also known (at least to me) as the Pearson Type I distribution (was this the origin of the Beta distribution?). I have used the pearsrnd function in MATLAB, and much of my data seems to fit a Type I distribution.

I just need it for univariate statistics. Please remember the factor $1/2$ as I need the distance in the range $[0,1]$. Whether or not the expression gives me the squared distance is not so important.
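For reference, the convention with the factor $1/2$ can be written as below. This is a sketch of the standard definition for two densities $f$ and $g$ on a common domain; the second equality only uses the fact that each density integrates to 1.

$$ H^2(f,g) = \frac{1}{2}\int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 \mathrm{d}x = 1 - \int \sqrt{f(x)\,g(x)}\,\mathrm{d}x, \qquad 0 \le H(f,g) \le 1. $$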

Addendum:

I tried to solve it directly in Mathematica 7 using Integrate. I created two functions: (1) hellingerDistanceA, which implements the integration directly, and (2) hellingerDistanceB, which evaluates the expression given by Sasha below. The answer by Sasha seems to be correct:

    hellingerDistanceA[a_, b_, c_, d_] :=
      1 - Integrate[
        Sqrt[
          Times[
            PDF[BetaDistribution[a, b], x],
            PDF[BetaDistribution[c, d], x]
          ]
        ],
        {x, 0, 1},
        Rule[Assumptions, {Element[x, Reals]}]
      ]

    hellingerDistanceB[a_, b_, c_, d_] :=
      1 - Divide[
        Beta[(a + c) / 2, (b + d) / 2],
        Sqrt[
          Times[
            Beta[a, b],
            Beta[c, d]
          ]
        ]
      ]

    hellingerDistanceA[1/2, 1/2, 5, 1] // N (* gives 0.251829... *)
    hellingerDistanceB[1/2, 1/2, 5, 1] // N (* also gives 0.251829... *)
    hellingerDistanceA[2, 2, 2, 5] // N (* gives 0.148165... *)
    hellingerDistanceB[2, 2, 2, 5] // N (* also gives 0.148165... *)

1 Answer


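Let $X$ and $Y$ be independent beta random variables, such that $X \sim \operatorname{Beta}(a_1, b_1)$ and $Y \sim \operatorname{Beta}(a_2, b_2)$, with densities $f_X(t) = t^{a_1-1}(1-t)^{b_1-1}/B(a_1,b_1)$ and $f_Y(t) = t^{a_2-1}(1-t)^{b_2-1}/B(a_2,b_2)$ on $[0,1]$. Then

$$ \begin{aligned} H^2(X, Y) &= 1 - \int_0^1 \sqrt{ f_X(t) f_Y(t) } \, \mathrm{d} t \\ &= 1 - \frac{1}{\sqrt{B(a_1,b_1) B(a_2,b_2)}} \int_0^1 t^{(a_1 + a_2)/2 - 1} (1-t)^{(b_1 + b_2)/2 - 1} \, \mathrm{d} t \\ &= 1 - \frac{B\left(\frac{a_1+a_2}{2}, \frac{b_1+b_2}{2}\right)}{\sqrt{B(a_1,b_1) B(a_2,b_2)}} \end{aligned} $$

Under the convention with the factor $1/2$, this expression is the squared Hellinger distance; take the square root to obtain $H(X,Y)$ itself. Both quantities lie in $[0,1]$.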
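For readers without Mathematica, here is a minimal Python sketch of the closed form above, $1 - B\left(\frac{a_1+a_2}{2}, \frac{b_1+b_2}{2}\right)/\sqrt{B(a_1,b_1)B(a_2,b_2)}$. The function names are my own, chosen for illustration, and the beta functions are evaluated in log-space via `math.lgamma` (an implementation choice, not something the derivation requires). Under the convention with the factor $1/2$, the value computed is the squared distance; `hellinger_beta` takes the square root.

```python
from math import exp, lgamma, sqrt


def log_beta(a, b):
    """Natural log of the beta function B(a, b), for a, b > 0."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def hellinger_sq_beta(a1, b1, a2, b2):
    """Closed form 1 - B((a1+a2)/2, (b1+b2)/2) / sqrt(B(a1,b1) B(a2,b2)).

    Illustrative helper (name is hypothetical): returns the squared
    Hellinger distance between Beta(a1, b1) and Beta(a2, b2) on [0, 1].
    """
    log_bc = (log_beta((a1 + a2) / 2, (b1 + b2) / 2)
              - 0.5 * (log_beta(a1, b1) + log_beta(a2, b2)))
    # Clamp at 0 to absorb tiny negative values caused by rounding.
    return max(0.0, 1.0 - exp(log_bc))


def hellinger_beta(a1, b1, a2, b2):
    """Hellinger distance itself: square root of the closed form."""
    return sqrt(hellinger_sq_beta(a1, b1, a2, b2))


if __name__ == "__main__":
    # Same parameter sets as the Mathematica check in the question.
    print(hellinger_sq_beta(0.5, 0.5, 5, 1))
    print(hellinger_sq_beta(2, 2, 2, 5))
```

The two printed values should agree with the `hellingerDistanceB` results quoted in the question (0.251829... and 0.148165...).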

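Calculations with 4-parameter beta random variables will be similar. If both distributions share the same support $[l, u]$, the substitution $t = (x - l)/(u - l)$ reduces the integral to the one above, so the result is unchanged. If the supports differ, the integral runs only over their intersection; it might be expressible in terms of incomplete beta functions.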
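When no closed form is at hand for the 4-parameter case, the integral can at least be approximated numerically. The sketch below is one possible approach, not a definitive implementation: it assumes each distribution is given as a tuple `(a, b, lo, hi)` (my own convention), integrates $\sqrt{f g}$ over the overlap of the two supports with a plain midpoint rule, and returns the squared distance. The midpoint rule converges slowly when a density is unbounded at an endpoint (shape parameter below 1), so treat those cases with care or use an adaptive quadrature routine instead.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Natural log of the beta function B(a, b), for a, b > 0."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_pdf_beta4(x, a, b, lo, hi):
    """Log-density of Beta(a, b) rescaled to (lo, hi); requires lo < x < hi."""
    width = hi - lo
    z = (x - lo) / width
    return ((a - 1) * log(z) + (b - 1) * log(1 - z)
            - log_beta(a, b) - log(width))


def hellinger_sq_beta4(p, q, n=20_000):
    """Approximate 1 - integral of sqrt(f_p * f_q) by the midpoint rule.

    p and q are (a, b, lo, hi) tuples (hypothetical convention).
    Only the overlap of the two supports contributes to the integral.
    """
    lo = max(p[2], q[2])
    hi = min(p[3], q[3])
    if lo >= hi:
        return 1.0  # disjoint supports: the overlap integral is zero
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h  # midpoint, strictly inside both supports
        total += exp(0.5 * (log_pdf_beta4(x, *p) + log_pdf_beta4(x, *q)))
    # Clamp at 0 to absorb small quadrature and rounding errors.
    return max(0.0, 1.0 - total * h)


if __name__ == "__main__":
    # Common support [2, 5]: should match the 2-parameter closed form.
    print(hellinger_sq_beta4((2, 2, 2.0, 5.0), (2, 5, 2.0, 5.0)))
    # Partially overlapping supports.
    print(hellinger_sq_beta4((2, 2, 0.0, 1.0), (2, 2, 0.5, 1.5)))
```

With a common support the first call should reproduce the 2-parameter value for $\operatorname{Beta}(2,2)$ versus $\operatorname{Beta}(2,5)$, which gives a quick sanity check on the quadrature.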

  • Good, because then everything works out :) Btw, great tools you people are building at Wolfram Research. (2012-07-02)