4
$\begingroup$

I've asked a similar (but not identical) question on SO here https://stackoverflow.com/questions/5984102 but the answers aren't that helpful for I need very fast indexing (and maybe that my question wasn't clear enough).

I precomputed a lot of values that took days to compute (with the load spread on several computers). Now I've got a huge lookup table that I need to frequently access and that access has to be as fast as possible.

Basically, I've got a problem like this: I precomputed C(38,6) values (i.e. 2 760 681 values) that are now store in a big lookup table. Memory ain't an issue. Precomputing ain't an issue. The "order" ain't an issue and I can re-order in any way I like.

Also, C(38,6) is just an example, I could be needing C(42,7) as well, etc.

When I did the precomputing, I basically looped like this (the actual code is more complicated for the load needed to be spread across several machines and for the actual computation aren't here but here's the meat of it):

for (int i = 0; i < n; i++) {     for (int j = i + 1; j < n; j++) {         for (int k = j + 1; k < n; k++) {             for (int l = k + 1; l < n; l++) {                 for (int m = l + 1; m < n; m++) {                     for (int o = m + 1; o < n; o++) {                        ...                         }                 }             }         }     } } 

The only thing that matters is to have the fastest lookup time possible.

I've got i,j,k,l,m and n all in [0..37] (or [1..38], doesn't matter) and I need to access the corresponding precomputed element in my huge lookup array as fast as possible.

Is there a closed formula for this that is fast to compute?

What I've got now is a binary search: my i shall always be lower than j, j lower than k etc. I then created an arbitrary index doing:

(i + 1) * (n exp 5) + (j + 1) * (n exp 4) + (k + 1) * (n exp 3) + (l + 1) * (n exp 2) + (m + 1) * n + o 

This is probably overkill but guarantees no collisions. I then do a binary search on that index and find where the precomputed value is stored.

This is kinda fast but I was wondering if there was a faster way to "reach" my precomputed index?

Basically because I don't know enumerative combinatorics well enough I'm precomputing my own index and using that index to do a binary search. This is fast but I'm looking for something faster, like a closed formula (I doubt anything doing "enumerations" and i++ or things like that would be faster than those few multiplication to compute that index + the binary search).

I think the complexity of my method is log(n) but I'm just not sure.

  • 1
    Am I understanding correctly: you calculated the combinations of $\binom{38}{6}$, so the first is (1,2,3,4,5,6), the second is (1,2,3,4,5,7), and the last is (33,34,35,36,37,38). You can look up a particular one, say the millionth, quickly as it is just an array. You would like a function that given an ordered list like (9,12,14,24,33,37) would return the index in the list-this one might be entry 238388? You could write a function to go the other way as well, so you don't have to store the table.2011-05-21
  • 0
    @Ross Millikan: yup exactly (1,2,3,4,5,6) or (0,1,2,3,4,5) depending if it's one- or zero-based. But I don't care that much about storing the table, I care about very fast lookup. So I need to compute 238388 faster than the way I'm computing it now: basically I'm doing a binary search and I find 238388. This is already fast, but I need faster : ) Just to be clear: I'm doing a binary search on my arbitrary-but-sorted-and-non-colliding indices that I keep in an int[] (or long[]) and this would give me 238 388. Memory ain't that much an issue. Speed is.2011-05-21
  • 0
    @Ross Millikan: also, the code is not in the question (it takes hundreds if not thousands of lines) but the reason why I need this index is that because at this index I'm storing a value extremely complicated to compute which I need to "find back" as fast as I can.2011-05-21
  • 1
    If I understand right, when your code produced all the output, you put that output one value at a time into a one-dimensional array. But couldn't you have put it into a six-dimensional array instead? Then if you wanted the output corresponding to $i=9$, $j=12$, $k=14$, $l=24$, $m=33$, and $n=37$, you could just lookup $\mbox{output}[9][12][14][24][33][37]$.2011-05-21
  • 0
    I think you're asking the wrong question. The right question would be "What data structure should I be using?" I suspect that the answer to that is a trie.2011-05-21
  • 0
    @alex: A six dimensional array might have $38^6 \approx 3$ billion potential elements, over a thousand times the one dimensional array; mainly because of the ordering, but also because of duplicate values.2011-05-21
  • 0
    @alex.jordan: it's exactly as Henry said... C(n,k) is pretty small but n**k can get quite huge.2011-05-21
  • 0
    @Peter Taylor: I'd rather keep that thing relatively easy :-/ A trie ain't exactly easy. As of now I've got a huge *int[]* and a *log(n)* binary, which isn't bad but I'd like to do better :-/2011-05-21
  • 1
    @ToCat: I had to do this recently and used the explanation in Knuth's TAOCP. It is fairly efficient and is more or less the same as the other 3 answers here. Basically you want to read: http://en.wikipedia.org/wiki/Combinatorial_number_system and the chapters by Knuth referenced there.2011-05-23

4 Answers 4