4
$\begingroup$

I have an oracle that samples uniformly from sets of size up to around n or n2.

How do I efficiently approximate a uniform distribution on the set of non-decreasing sequences of length k whose terms are from {1,2,…,n} (using the oracle)?

This set has about (n+k−1) choose k elements, O(n^k), and is too big to directly use the uniform oracle.

Sample values that give trouble are (n=31,k=26) and (n=31,k=100). In particular, k will always be a bit large compared to n, so there is a substantial difference between non-decreasing (what I want) and increasing.


Things that don't work:

Pick the first term uniformly, then pick the second term uniformly from the [a1…n], then pick the third term uniformly from [a2…n], etc. The resulting distribution is too heavily skewed to be useful (1≤1≤…≤1 has probability (1/n)k, while n≤n≤…≤n has probability 1/n).

Sampling uniformly from {1,2,…,n}k and then sort the result is also too heavily skewed. For n=k, 1≤2≤…≤n is n! times more likely than 1≤1≤…≤1.

Sampling uniformly from {1,2,…,n}k and discarding unsorted sequences is not efficient enough (something in the range of k! oracle calls per sample produced; for n=31, k=26, it is 89704535825984961898313 calls per sample).


Another approach that is fine by me:

What is an efficiently computable bijection between {1,2,…,Binomial(n+k−1,k)} and the set of non-decreasing sequences of length k whose terms are positive integers less than or equal to n?

I can sample uniformly at random from {1,2,…,m} for m up to nk using a different oracle. Hence if I can efficiently convert from positive integers to sequences, then that is a good solution.

  • 0
    I'm now fine with a solution that describes a solution for strictly increasing (or decreasing) sequences, as I can convert between non-decreasing and increasing easily.2011-03-21

2 Answers 2

3

Each increasing sequence

$1 \le a_1 < a_2 < \ldots < a_k \le n+k-1$

corresponds to a non-decreasing sequence

$1 \le a_1 \le a_2-1 \le a_3-2 \le \ldots a_k - (k-1) \le n$

and this correspondence is reversible -- in other words, it's a bijection between increasing sequences of length $k$ in $\{1 ,2, \ldots, n+k-1\}$ and nondecreasing sequences of length $k$ in $\{1, 2, \ldots n \}.$

So if you can generate increasing sequences, you're done.

A well-known way to generate increasing sequences or random subsets $S$ of $[n] = \{ 1, 2, \ldots, n \}$ of size $k$ is based on the observation that $n \in S$ with probability $k/n$ (by symmetry, or by explicit enumeration). So determine whether $n \in S$ according to this rule, and then generate a subset of $[n-1]$ of size $k$ or $k-1$ according to whether $n \not \in S$ or $n \in S$.

(I apologize for the fact that this doesn't explicitly use your oracle. I hope this is still helpful.)

  • 0
    To be honest, this is probably in Knuth.2011-03-21
3

Using your oracle, you find an integer $X$ uniformly distributed from $1$ to ${n+k-1 \choose k}$. You then find $Y$ such that ${Y+k-2 \choose k} < X \le {Y+k-1 \choose k}$. $Y$ is then the largest term in your desired sequence, and you are left trying to find a non-decreasing sequence with $k-1$ terms were the largest term is no more than $Y$.

So you do it again, and again, ... but you can avoid taking any new random numbers and instead use the difference between the random number you have and the binomial number you compared it with.

For example, if you want a random sequence with $k=6$ terms where the largest is no more than $n=9$, then ${6+9-1 \choose 9} = 3003$; your random number might be for example $957$ and ${12 \choose 6} = 924 < 957 \le 1716 = {13 \choose 6}$, so your largest term is 8. Now you want a random sequence with 5 terms where the largest is no more than 8; your remaining random number is $957-924 = 33$ and ${7 \choose 5} = 21 < 33 \le 56 = {8 \choose 5}$, so your next largest term is 4. Now you want a random sequence with 4 terms where the largest is no more than 4; your remaining random number is $33-21=12$ and ${5 \choose 4} = 5 < 12 \le 15 = {6 \choose 4}$, so your next largest term is 3. Now you want a random sequence with 3 terms where the largest is no more than 3; your remaining random number is $12-5=7$ and ${4 \choose 3} = 4 < 7 \le 10 = {5 \choose 3}$, so your next largest term is 3. Now you want a random sequence with 2 terms where the largest is no more than 3; your remaining random number is $7-4=3$ and ${2 \choose 2} = 1 < 3 \le 3 = {3 \choose 2}$, so your next largest term is 2. Now you want a random sequence with 1 term where the largest is no more than 2; your remaining random number is $3-1=2$ and ${1 \choose 1} = 1 < 2 \le 2 = {2 \choose 1}$, so your next largest term is 2. So your whole sequence would be $2,2,3,3,4,8.$

This is also a computable bijection. [Next time I will choose a shorter example - sorry]

  • 0
    Thanks! This is what I ended up doing after reading Knuth. It's working pretty well I think. How do you efficiently find Y? I'm starting at 0 and working up by 1, computing binomial coefficients afresh each time. It takes a little longer than I'd like.2011-03-21