Suppose I have a multiset of numbers. I'm interested in the number of unique n-tuples that can exist using the numbers from this multiset. Now of course a closed form is of interest here, but what I'm really after is some way to distinguish how "full" the set of unique n-tuples is, perhaps related to the information theory notion of entropy. The more repeats that exist in the multiset, the less full it is; the more unique numbers there are, the more unique n-tuples exist. For argument's sake, let's call this measure dimensionality.
Some motivation and explanation: I'm interested in tying parts of information theory back into run-time complexity analysis (and possibly reinventing the wheel along the way). When computer scientists think of algorithms for problems like sorting, they often consider questions like "How many comparisons need to be done to sort this multiset of numbers?" In that case, a comparison is analogous in some sense to a two-tuple. Many other problems, I believe, are also largely based on the number of unique n-tuples of a multiset. Often times these problems are approached with what I consider to be crude tools where worst-case and average-case are considered, and often little more detail is provided (or at least this what my limited college education has provided me with). Back to the sorting example for a second: there are certainly differences between sorting a multiset of 100000 unique numbers or 100000 identical numbers (to use a tad bit of hyperbole), but how do you measure the in-between cases? I believe that the measure provided here can potentially tie back into providing a more fine-grained approach to run-time complexity analysis.
In summary:
- I'll accept any answer that gives some pre-existing measure of "dimensionality" as used in this context.
- I'll give upvotes for closed forms of the number of unique n-tuples. I'm sure this is a well-studied problem by somebody, so I don't want to mark these as answers for a while in case a measure exists.
- I'd love to hear if anybody has applied some notion of entropy to analyzing run-time complexities for given datasets.