0
$\begingroup$

Suppose a set of $m$ integers from $0$ to $n-1$. The integers are uniformly distributed and unique in the set ($n \gg m$). Then, put all the integers into a list an sort that list: $$x_0 < x_1 < \dots{} < x_{m-1}$$

What is the probability distribution of $x_i$, and of $d_i := x_{i+1} - x_i$ ?

I stumbled on a very similar question, but it only discusses the minimal and average value of $d_i$. It does however present an interesting way to think of it:

Give a rope of size $n$, I cut it on $m$ places chosen uniformly at random (such that, in my case, no two cuts are done at the same spot). What is the probability distribution of the position of the $i^{th}$ cut ? What is the probability distribution of the size of the rope bits ?

I ran some simulations, and $d_i$ seems to have an exponential, maybe Gamma distribution (which is intuitive), and $x_i$ is bell-shaped (probably hypergeometric, which again is rather intuitive) but I cannot figure it out on paper. Any thoughts ?

  • 0
    It's a widespread misconception that whether a question is a duplicate depends on the *answers* to the question it duplicates. The *answers* to the question you linked to only discuss the minimal and average value of $d_i$, but the *question* is about the distribution. The reason your question is not a duplicate of that question is not that that question hasn't been fully answered, but that you also ask about the distribution of the $x_i$.2012-11-29
  • 0
    The elements are also unique in the other question; that's what "without replacement" means.2012-11-30

1 Answers 1

2

If $x_i=x$, then $x_0,\dotsc,x_{i-1}\in\{0,\dotsc,x-1\}$ and $x_{i+1},\dotsc,x_{m-1}\in\{x+1,\dotsc,n-1\}$. There are $\binom xi\binom{n-x-1}{m-i-1}$ different choices, so the probability is

$$ P(x_i=x)=\frac{\binom xi\binom{n-x-1}{m-i-1}}{\binom nm}\;. $$

For the distribution of $d_i$, add another option $n$ and another unique integer $x_m$, consider the options $0$ to $n$ arranged in a circle, cut the circle by removing $x_m$ and renumber the remaining options beginning with $0$ next to the cut and ending with $n-1$ on the other side of the cut.

What remains is a linear arrangement distributed exactly as in your scenario, but the symmetry we had before we broke it by removing $x_m$ allows us to conclude that all $m+1$ runs of unused options are on the same footing and thus have the same distribution.

But the distribution of the first run of unused options, the ones less than $x_0$, is easy to determine: The probability that $x_0=d-1$ and the remaining $m-1$ numbers are greater is

$$ P(d_i=d)=\frac{\binom{n-d}{m-1}}{\binom nm} $$

for $1\le d\le n-m+1$ and $0$ otherwise.

  • 0
    Thanks for the answer! I actually had that after I posted the question, but I was not able to check that $\sum_{i=0}^{N-1} \Pr(x_i = x) = 1$, and somehow started doubting on it. The way you put it makes sense though, and I did that sum numerically which seems to be right! A note on $d$: it seems that $\Pr(d_i = 0) = \frac m {N-m+1} > 0$ which makes no sense if elements are unique... Yet you did take that into account in the pdf of $x_i$. Any idea ?2012-11-30
  • 0
    @doc: I added the bounds for $d$. You can check (e.g. using Wolfram|Alpha) that the sum of $P(d_i=d)$ over those values of $d$ is $1$.2012-11-30
  • 0
    Indeed, thanks for this great answer :-)2012-11-30