0
$\begingroup$

Let $X_1, \ldots,X_k$ represent k integers between $1$ and $n$ drawn randomly and without replacement (i.e., there are never repeated numbers). What is the PMF of $Y$, the random variable representing the nearest neighbor distance between draws?

This kind of problem falls into the general setting of computing distances between random variables. Is there any general formalism to work with? I have no idea how to start.

Thanks in advance.

  • 0
    Yes. Uniformou, Butantã without juxtaposition or the results (not sure what the name for this is)2012-07-25

2 Answers 2

1

Update: I have modified the answer using a more sensible definition of $Y$. In terms of the gaps, we only care that $g_j\geq r$ for $2\leq j\leq k$ and we don't care about the size of $g_1$ or $g_{k+1}$. In this case, the number of suitable compositions is $n-(r-1)(k-1)\choose k$, and $\mathbb{P}(Y\geq r)={{n-(r-1)(k-1)\choose k}\over {n\choose k}}.$


I have interpreted the problem as follows: you are interested in $Y$, the smallest gap between the sampled values $X_1,\dots,X_k$. If this isn't what you want, you may want to clarify the question.

Let's look more closely at the gaps between the $X$'s. In addition to the gaps $G_1=X_{(1)}$ and $G_j=X_{(j)}−X_{(j−1)}$ for $2≤j≤k$, it is convenient to introduce the final gap $G_{k+1}=(n+1)−X_{(k)}$. Here the bracket notation refers to order statistics.

The random vector $(G_1,G_2,\dots,G_{k+1})$ is a random composition of the number $n+1$. That is, all outcomes $(g_1,g_2,\dots,g_{k+1})$ with $g_1+g_2+\cdots+g_{k+1}=n+1$, $g_j≥1$ are equally likely. There are $n\choose k$ such compositions, as found using stars and bars.

Similarly, the number of compositions with all the $g_j\geq r$ is $n-(r-1)(k+1)\choose k$. Therefore, $\mathbb{P}(Y\geq r)={{n-(r-1)(k+1)\choose k}\over {n\choose k}}.$ From this you can work out any property of the random variable $Y$.

See this question as well: What is the distribution of gaps?

  • 0
    I am defining $Y$ to be the *smallest* of all the $G$'s. I'm not quite sure if that is what you wanted.2012-07-25
0

The removal of numbers with uniform distribution leaves the marginal distribution for the remaining draws uniform, so all the nearest-neighbour distances have the same distribution as the first one, which is just

$p(|X_1-X_2|=d)=2\frac{n-d}{n(n-1)}$

for $d=0\ldots n-1$.

  • 0
    Hi @joriki. I really appreciate the answer, but not only did I not understand your argument, but computer simulations I have performed also show a clearly non-linear PMF. Also, in your explanation, where does the fact that there has been k draws enter? This is clearly relevant (taking, for instance, the extreme cases $k=1$ and $k=n$). Thanks again for the attention.2012-07-25