
I have a sequence of IID random variables $X_1, X_2, \dots, X_n$. In this particular case, each of the variables is Lévy distributed with PDF

$f(x) = \left(\frac{\lambda}{2 \pi x^3}\right)^{1/2} \exp\left(-\frac{\lambda}{2x}\right)$

for $x > 0$, and $f(x) = 0$ otherwise.
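As a quick sanity check of the density, here is a minimal Python sketch (standard library only). It codes $f(x)=\sqrt{\lambda/(2\pi x^3)}\,e^{-\lambda/(2x)}$ together with the standard Lévy CDF $F(x)=\operatorname{erfc}\big(\sqrt{\lambda/(2x)}\big)$ and compares $F$ with a midpoint-rule integral of $f$. The helper names (`levy_pdf`, `levy_cdf`, `integrate_pdf`) and the step count are hypothetical choices for illustration.

```python
import math


def levy_pdf(x: float, lam: float) -> float:
    """Lévy density with location 0 and scale lam; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return math.sqrt(lam / (2 * math.pi * x**3)) * math.exp(-lam / (2 * x))


def levy_cdf(x: float, lam: float) -> float:
    """Lévy CDF F(x) = erfc(sqrt(lam / (2x))) for x > 0, and 0 otherwise."""
    if x <= 0:
        return 0.0
    return math.erfc(math.sqrt(lam / (2 * x)))


def integrate_pdf(a: float, b: float, lam: float, steps: int = 200_000) -> float:
    """Midpoint-rule integral of levy_pdf over [a, b]."""
    h = (b - a) / steps
    return h * sum(levy_pdf(a + (i + 0.5) * h, lam) for i in range(steps))


if __name__ == "__main__":
    # The numerical integral over (0, 10] should agree with F(10).
    print(integrate_pdf(0.0, 10.0, 1.0), levy_cdf(10.0, 1.0))
```

Because $f$ vanishes very quickly near $0$ (the factor $e^{-\lambda/(2x)}$), a plain midpoint rule starting at $0$ is accurate here.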

I'm trying to find the probability, given constants $\tau > 0$ and $b < n$, that there exists an interval of length $\tau$ which contains at least $b$ of the $n$ random variables.
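The event itself is easy to test on a concrete sample, which gives a Monte Carlo baseline against which any formula can be checked. This is a minimal sketch. It assumes the standard fact that $\lambda/Z^2$ is Lévy distributed with scale $\lambda$ when $Z$ is standard normal, and the function names and defaults are hypothetical.

```python
import random


def has_dense_interval(xs, tau, b):
    """Return True if some closed interval of length tau holds >= b of the xs.

    Any such interval can be slid right until its left end meets the
    smallest point it contains, so trying [x, x + tau] for x in xs suffices.
    """
    return any(sum(1 for y in xs if x <= y <= x + tau) >= b for x in xs)


def sample_levy(lam, rng):
    """Draw Lévy(0, lam) as lam / Z**2 with Z standard normal."""
    z = rng.gauss(0.0, 1.0)
    return lam / (z * z)


def estimate_probability(n, b, tau, lam=1.0, trials=20_000, seed=0):
    """Monte Carlo estimate of P(some interval of length tau holds >= b of n draws)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [sample_levy(lam, rng) for _ in range(n)]
        hits += has_dense_interval(xs, tau, b)
    return hits / trials


if __name__ == "__main__":
    print(estimate_probability(n=10, b=3, tau=0.5, trials=5_000))
```

The brute-force check is $O(n^2)$ per sample, which is fine for small $n$. Whether the interval is open or closed does not matter for continuous variables.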

My first approach was to use order statistics. For example, if $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ are the order statistics, the probability that the $b$ smallest fall in an interval of length $\tau$ could be found from the joint distribution of $X_{(1)}$ and $X_{(b)}$. From the Wikipedia article on order statistics, for $j<k$ and $x\le y$,

$f_{X_{(j)},X_{(k)}}(x,y)dx\,dy=n!{[F_X(x)]^{j-1}\over(j-1)!}{[F_X(y)-F_X(x)]^{k-1-j}\over(k-1-j)!}{[1-F_X(y)]^{n-k}\over(n-k)!}f_X(x)f_X(y)\,dx\,dy$

This could then be integrated (with $j=1$, $k=b$) over the region $0 \leq x \leq y \leq x + \tau$, and the computation could be repeated for each group of $b$ consecutive order statistics and summed. However, I suspect this approach double counts (for example, both $X_{(1)}, \dots, X_{(b)}$ and $X_{(n-b+1)}, \dots, X_{(n)}$ could fall in disjoint intervals of length $\tau$), and it also seems difficult to obtain in closed form.
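For a single group, the integral can at least be done numerically. This is a sketch for $j=1$, $k=b$ and the Lévy distribution with scale $\lambda$. It assumes the substitution $u=F(x)$, $v=F(y)$, which turns $f(x)f(y)\,dx\,dy$ into $du\,dv$. The quantile comes from $F(x)=2\Phi(-\sqrt{\lambda/x})$, with $\Phi$ the standard normal CDF, so $F^{-1}(u)=\lambda/\Phi^{-1}(u/2)^2$. Grid sizes and function names are hypothetical. It computes $P(X_{(b)}-X_{(1)}\le\tau)$ only and does nothing about the double counting.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def levy_cdf(x, lam):
    """Lévy(0, lam) CDF: erfc(sqrt(lam / (2x))) for x > 0, else 0."""
    return 0.0 if x <= 0 else math.erfc(math.sqrt(lam / (2 * x)))


def levy_quantile(u, lam):
    """Inverse of levy_cdf, using F(x) = 2 * Phi(-sqrt(lam / x))."""
    z = _STD_NORMAL.inv_cdf(u / 2)
    return lam / (z * z)


def prob_smallest_b_within(n, b, tau, lam=1.0, m_outer=2000, m_inner=200):
    """P(X_(b) - X_(1) <= tau) from the joint density of (X_(1), X_(b)).

    With u = F(x), v = F(y) the integrand is
    n!/((b-2)! (n-b)!) * (v-u)^(b-2) * (1-v)^(n-b) on u < v < F(F^{-1}(u) + tau).
    Both integrals use the midpoint rule.
    """
    if not 2 <= b <= n:
        raise ValueError("need 2 <= b <= n")
    # n! / ((j-1)! (k-1-j)! (n-k)!) with j = 1, k = b
    c = math.factorial(n) / (math.factorial(b - 2) * math.factorial(n - b))
    total = 0.0
    for i in range(m_outer):
        u = (i + 0.5) / m_outer                          # u = F(x)
        w = levy_cdf(levy_quantile(u, lam) + tau, lam)   # upper limit for v = F(y)
        h = max(w - u, 0.0) / m_inner                    # guard against round-off
        inner = 0.0
        for t in range(m_inner):
            v = u + (t + 0.5) * h
            inner += (v - u) ** (b - 2) * (1.0 - v) ** (n - b)
        total += c * inner * h
    return total / m_outer


def simulate_smallest_b_within(n, b, tau, lam=1.0, trials=50_000, seed=7):
    """Monte Carlo check: fraction of samples with X_(b) - X_(1) <= tau."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sorted(lam / rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
        hits += (s[b - 1] - s[0] <= tau)
    return hits / trials


if __name__ == "__main__":
    print(prob_smallest_b_within(5, 3, 0.5), simulate_smallest_b_within(5, 3, 0.5))
```

Working in $u$ rather than $x$ avoids truncating the heavy right tail of the Lévy distribution.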

The other approach I considered was to use the joint density of all order statistics:

$f_{X_{(1)},\ldots,X_{(n)}}(x_1,\ldots,x_n)\,dx_1\cdots dx_n=n!f_X(x_1)\cdots f_X(x_n)\,dx_1\cdots dx_n$

However, I can't determine how to express the region of integration in any meaningful way. Any thoughts or pointers would be appreciated!


The nonexistence of such an interval is equivalent to $X_{(i+b-1)} > X_{(i)}+\tau$ for $i=1,\ldots,n-b+1$. So, for the probability of the complement of your event, integrate $n! f(x_1) \cdots f(x_n)$ over the region defined by $ x_{b-1} > x_{b-2}> \cdots >x_2 > x_1>0 $ and $ x_b > \max(x_{b-1}, x_1 + \tau), \ldots, x_n > \max(x_{n-1}, x_{n-b+1} + \tau). $
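The equivalence can be checked mechanically. The $n$-dimensional integral over this region is the probability that a sorted sample satisfies the inequalities, so it can also be estimated by simulation. This is a minimal sketch. The names are hypothetical, and the sampler assumes $\lambda/Z^2$ is Lévy with scale $\lambda$ when $Z$ is standard normal.

```python
import random


def no_dense_interval(xs, tau, b):
    """Criterion X_(i+b-1) > X_(i) + tau for i = 1, ..., n-b+1.

    Python indices are 0-based, so s[i] plays the role of X_(i+1).
    """
    s = sorted(xs)
    return all(s[i + b - 1] > s[i] + tau for i in range(len(s) - b + 1))


def brute_force_dense(xs, tau, b):
    """Direct check: does some [x, x + tau] with x in xs hold >= b points?"""
    return any(sum(x <= y <= x + tau for y in xs) >= b for x in xs)


def complement_probability(n, b, tau, lam=1.0, trials=20_000, seed=1):
    """Monte Carlo value of the integral of n! f(x_1)...f(x_n) over the region,
    i.e. P(no interval of length tau holds b of the n points)."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        xs = [lam / rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]
        ok += no_dense_interval(xs, tau, b)
    return ok / trials


if __name__ == "__main__":
    print(complement_probability(5, 3, 0.2))
```

On random integer-valued samples, including ties, `no_dense_interval(xs, tau, b)` should always equal `not brute_force_dense(xs, tau, b)`.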

  • Thanks, that makes a lot of sense. This helps when using numerical approximations, but the bounds don't seem to lead to a closed-form solution; I suppose this is unlikely anyway. (2011-05-25)

As said before, there is no simple closed-form solution. But one can prove some upper and lower bounds, which yield a precise estimate in a semi-explicit range of values of $b$, $n$ and $\tau$.


We start with some notation. For $x\le y$, let $g(x,y)=P(x\le X_1\le y)$ denote the integral of the density $f$ of $X_1$ from $x$ to $y$. For any subset $I$ of $\{1,2,\ldots,n\}$ of size $b$, let $R_I$ be the range of the sample $(X_k)$ over $I$ and $A_I$ the event that this range is at most $\tau$, that is, $ A_I=[R_I\le\tau],\qquad R_I=\max\{X_k;k\in I\}-\min\{X_k;k\in I\}. $ The event $[x\le\min\{X_k;k\in I\},\ \max\{X_k;k\in I\}\le y]$ has probability $g(x,y)^b$. Differentiating in $x$ shows that the minimum over $I$ lies in $\mathrm{d}x$ and the maximum is at most $y$ with probability $bf(x)g(x,y)^{b-1}\,\mathrm{d}x$. Taking $y=x+\tau$ and integrating over $x$, one sees that $P(A_I)=\alpha_b$ for every $I$ of size $b$, with

$ \alpha_b=\int bf(x)g(x,x+\tau)^{b-1}\mathrm{d}x. $
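Since $\alpha_b$ is a one-dimensional integral, it is easy to evaluate numerically. This is a sketch for the Lévy case. It assumes the substitution $u=F(x)$, which gives $\alpha_b=\int_0^1 b\,\big(F(F^{-1}(u)+\tau)-u\big)^{b-1}\,\mathrm{d}u$ and avoids the heavy right tail. Here $F^{-1}(u)=\lambda/\Phi^{-1}(u/2)^2$, with $\Phi$ the standard normal CDF. Function names and the grid size `m` are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def levy_cdf(x, lam):
    """Lévy(0, lam) CDF: erfc(sqrt(lam / (2x))) for x > 0, else 0."""
    return 0.0 if x <= 0 else math.erfc(math.sqrt(lam / (2 * x)))


def levy_quantile(u, lam):
    """Inverse of levy_cdf, using F(x) = 2 * Phi(-sqrt(lam / x))."""
    z = _STD_NORMAL.inv_cdf(u / 2)
    return lam / (z * z)


def alpha(b, tau, lam=1.0, m=20_000):
    """alpha_b = int b f(x) g(x, x + tau)^(b-1) dx with g(x, y) = F(y) - F(x).

    After u = F(x) (so f(x) dx = du) this is the midpoint-rule sum of
    b * (F(F^{-1}(u) + tau) - u)^(b-1) over u in (0, 1).
    """
    total = 0.0
    for i in range(m):
        u = (i + 0.5) / m
        g = max(levy_cdf(levy_quantile(u, lam) + tau, lam) - u, 0.0)
        total += b * g ** (b - 1)
    return total / m


def simulate_alpha(b, tau, lam=1.0, trials=100_000, seed=3):
    """Monte Carlo check: P(range of b i.i.d. Lévy draws <= tau)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [lam / rng.gauss(0.0, 1.0) ** 2 for _ in range(b)]
        hits += (max(xs) - min(xs) <= tau)
    return hits / trials


if __name__ == "__main__":
    print(alpha(2, 0.5), simulate_alpha(2, 0.5))
```

For $b=1$ the sketch returns exactly $1$, and for $b=2$ it should match a simulated $P(|X_1-X_2|\le\tau)$ up to Monte Carlo noise.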

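The event $A$ that there exists an interval of length (at most) $\tau$ which contains (at least) $b$ values from the sample of size $n$ is $ A=\bigcup_IA_I, $ where the union is over the subsets $I$ of $\{1,2,\ldots,n\}$ of size $b$. By the inclusion-exclusion principle (Bonferroni inequalities), $ S-S'\le P(A)\le S,\qquad S=\sum_{I}P(A_I),\quad S'=\sum_{\{I,J\}}P(A_I\cap A_J), $ where $S'$ runs over the unordered pairs $\{I,J\}$ with $I\ne J$. For every $I$ of size $b$, $P(A_I)=\alpha_b$, hence $S={n\choose b}\alpha_b$. For every $I\ne J$ of size $b$, $A_I$ and $A_J$ are independent if $I\cap J=\emptyset$, so $P(A_I\cap A_J)=\alpha_b^2$. Otherwise they are positively correlated: conditionally on $(X_k)_{k\in I\cap J}$, the events $A_I$ and $A_J$ are independent with the same conditional probability $h$, so $P(A_I\cap A_J)=E[h^2]\ge E[h]^2=\alpha_b^2$. Finally,

$ {n\choose b}\alpha_b-S'\le P(A)\le{n\choose b}\alpha_b,\qquad S'\ge\frac12\left({n\choose b}^2-{n\choose b}\right)\alpha_b^2. $

The bound on $S'$ goes the wrong way to turn $S-S'$ into an explicit lower bound on $P(A)$, because each overlapping pair contributes more than $\alpha_b^2$. Neglecting that excess, this reads approximately as $p-\frac12p^2\le P(A)\le p$ with $p=\displaystyle{n\choose b}\alpha_b$. In any case $0\le p-P(A)\le S'$, hence the estimate $P(A)\approx p$ is as precise as $S'$ is small, which is the case when $p$ is small and the overlapping pairs contribute little.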
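To see how tight the upper bound $p={n\choose b}\alpha_b$ is in a concrete case, here is a sketch that compares it with a Monte Carlo estimate of $P(A)$. It defines its own helpers (hypothetical names), assumes the Lévy distribution with scale $\lambda=1$, and uses arbitrary parameter values in the usage lines. The value $p-\frac12p^2$ is printed only for comparison, as an approximation rather than a proven bound.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def levy_cdf(x, lam):
    """Lévy(0, lam) CDF: erfc(sqrt(lam / (2x))) for x > 0, else 0."""
    return 0.0 if x <= 0 else math.erfc(math.sqrt(lam / (2 * x)))


def levy_quantile(u, lam):
    """Inverse of levy_cdf, using F(x) = 2 * Phi(-sqrt(lam / x))."""
    z = _STD_NORMAL.inv_cdf(u / 2)
    return lam / (z * z)


def alpha(b, tau, lam=1.0, m=20_000):
    """alpha_b = P(range of b i.i.d. draws <= tau), via u = F(x) and the midpoint rule."""
    total = 0.0
    for i in range(m):
        u = (i + 0.5) / m
        g = max(levy_cdf(levy_quantile(u, lam) + tau, lam) - u, 0.0)
        total += b * g ** (b - 1)
    return total / m


def union_bound(n, b, tau, lam=1.0):
    """p = C(n, b) * alpha_b, the rigorous upper bound P(A) <= S."""
    return math.comb(n, b) * alpha(b, tau, lam)


def simulate_event(n, b, tau, lam=1.0, trials=20_000, seed=2):
    """Monte Carlo estimate of P(A): some b consecutive order statistics span <= tau."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sorted(lam / rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
        hits += any(s[i + b - 1] - s[i] <= tau for i in range(n - b + 1))
    return hits / trials


if __name__ == "__main__":
    n, b, tau = 5, 3, 0.2
    p = union_bound(n, b, tau)
    print("upper bound p         :", p)
    print("approximation p - p^2/2:", p - p * p / 2)
    print("Monte Carlo P(A)      :", simulate_event(n, b, tau))
```

Only the inequality $P(A)\le p$ is guaranteed, so that is the only relation one should expect the simulation to respect exactly (up to Monte Carlo noise).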