4
$\begingroup$

I've asked a similar (but not identical) question on SO here https://stackoverflow.com/questions/5984102 but the answers aren't that helpful for I need very fast indexing (and maybe that my question wasn't clear enough).

I precomputed a lot of values that took days to compute (with the load spread on several computers). Now I've got a huge lookup table that I need to frequently access and that access has to be as fast as possible.

Basically, I've got a problem like this: I precomputed C(38,6) values (i.e. 2 760 681 values) that are now store in a big lookup table. Memory ain't an issue. Precomputing ain't an issue. The "order" ain't an issue and I can re-order in any way I like.

Also, C(38,6) is just an example, I could be needing C(42,7) as well, etc.

When I did the precomputing, I basically looped like this (the actual code is more complicated for the load needed to be spread across several machines and for the actual computation aren't here but here's the meat of it):

for (int i = 0; i < n; i++) {     for (int j = i + 1; j < n; j++) {         for (int k = j + 1; k < n; k++) {             for (int l = k + 1; l < n; l++) {                 for (int m = l + 1; m < n; m++) {                     for (int o = m + 1; o < n; o++) {                        ...                         }                 }             }         }     } } 

The only thing that matters is to have the fastest lookup time possible.

I've got i,j,k,l,m and n all in [0..37] (or [1..38], doesn't matter) and I need to access the corresponding precomputed element in my huge lookup array as fast as possible.

Is there a closed formula for this that is fast to compute?

What I've got now is a binary search: my i shall always be lower than j, j lower than k etc. I then created an arbitrary index doing:

(i + 1) * (n exp 5) + (j + 1) * (n exp 4) + (k + 1) * (n exp 3) + (l + 1) * (n exp 2) + (m + 1) * n + o 

This is probably overkill but guarantees no collisions. I then do a binary search on that index and find where the precomputed value is stored.

This is kinda fast but I was wondering if there was a faster way to "reach" my precomputed index?

Basically because I don't know enumerative combinatorics well enough I'm precomputing my own index and using that index to do a binary search. This is fast but I'm looking for something faster, like a closed formula (I doubt anything doing "enumerations" and i++ or things like that would be faster than those few multiplication to compute that index + the binary search).

I think the complexity of my method is log(n) but I'm just not sure.

  • 1
    @ToCat: I had to do this recently and used the explanation in Knuth's TAOCP. It is fairly efficient and is more or less the same as the other 3 answers here. Basically you want to read: http://en.wikipedia.org/wiki/Combinatorial_number_system and the chapters by Knuth referenced there.2011-05-23

4 Answers 4

4

The squashed order is a nice order for this sort of thing.

Clarification: This algorithm takes a $k$-subset $0\leq a_1 to a unique index number from $0$ to ${n \choose k}-1$.

Given $k$ integers, $0\leq a_1, the squashed order index of this sequence is ${a_1\choose 1} + {a_2\choose 2} + {a_3 \choose 3} ... + {a_k \choose k}$. If your maximum value for $a_k$ is $n$, you can precompute $m \choose i$ for $m\leq n$ and $i\leq k$ to increase performance of this computation.

If $A$ and $B$ are sets of $k$ natural numbers, then $A$ comes before $B$ in the squashed order if the largest number that is in the symmetric difference of $A$ and $B$ is in $B$.

Aside from the ease of computing the index of a set, the other nice thing about the order is that the index does not depend on $n$.

See, for example: Combinatorics of finite sets, by Ian Anderson, p. 113.

4

If we use 1 based, the critical observation is that of the $\binom{38}{6}=\frac{38!}{32!6!}$ combinations, $\binom{37}{5}$ of them will start with $1$, $\binom{36}{5}$ of them will start with 2, and so on. So given $(9,12,14,24,33,37)$, the combinations with $9$ in the first place will start after $\binom{37}{5}+\binom{36}{5}+\binom{35}{5}+\binom{34}{5}+\binom{33}{5}+\binom{32}{5}+\binom{31}{5}+\binom{30}{5}$, because that is how many start with $1$ through $8$. Then within the $9$'s, the ones with $10$ in the second position number $\binom{28}{4}$ and the ones with $11$ in the second position number $\binom{27}{4}$, so the ones starting with $(9,12)$ start after that. So this is after $\sum_{i=1}^{8}\binom{38-i}{5}+\sum_{i=10}^{11}\binom{38-i}{4}$. The ones with $13$ in the third position number $\binom{25}{3}=\binom{38-13}{3}$. The pattern should be clear. When you get to the end, you will have the index in the table. You can also go the other way, so you don't have to precompute your table-as you say next week you could need the same thing for combinations of $\binom{42}{7}$. To find the millionth combination, the ones with a leading $1$ are $\binom{37}{5}=435897$ and the ones with a leading $2$ are $\binom{36}{5}=376992$ so we want the $187111^{\text{st}}$ one starting with $3$. Then the second place $4$'s are $\binom{34}{4}=46736$ and we find the second item is $9$ and we want the $4625^{\text{th}}$ and so on giving (3,9,11,16,27,36)

1

One could make use of the concepts as explained here: http://msdn.microsoft.com/en-us/library/aa289166%28v=vs.71%29.aspx He goes from the lexicographic index to the combination; I think this question is how to go the other way. Example implementation in java...

// from http://introcs.cs.princeton.edu/java/96optimization/Binomial.java.html static long binomial(int n, int k) {     final long[][] binomial = new long[n + 1][k + 1];      for (int i = 1; i <= k; i++) binomial[0][i] = 0;     for (int j = 0; j <= n; j++) binomial[j][0] = 1;      for (int i = 1; i <= n; i++)         for (int j = 1; j <= k; j++)             binomial[i][j] = binomial[i - 1][j - 1] + binomial[i - 1][j];      return binomial[n][k]; }  static long lexIndex(int n, int... tuple) {     final int k = tuple.length;      // map tuple to its combinadic     final int[] c = new int[k];     for (int i = 0; i < k; ++i) c[i] = (n - 1) - tuple[i];      // calculate the lex index of that combinadic     Arrays.sort(c);     long result = 0;     for (int i = 0; i < c.length; i++) result += binomial(c[i], i+1);      // calculate that index's dual     return (binomial(n, k) - 1) - result; } 

Give it the size of the set and the combination in question, it returns the lexicographic index. From the example in the comments:

lexIndex(38, 9,12,14,24,33,37) = 2321663 

Also could make it faster by calculating the binomial array ahead of time

0

It is pretty absurd to store this at all! There are various ways of writing down functions $f:[0,\binom nk]\to\mathcal{P}(\{1,2,\dots,n\}$ which are bijections, and computable orders of magnitude faster than any look-up involving I/O (you can surely do it without even getting a caché miss…)

  • 0
    ok cool, thanks for your help, I'd +1 mod but I don't have enough point yet to upvote answers :(2011-05-22