2
$\begingroup$

Recently I discussed an experiment with a friend. Assume we start a random experiment. At first there is an array with size $100,000$, all set to $0$. We calculate at each round a random number modulo $2$ and select one random position in that array. If the number in the array is $1$, nothing is changed and otherwise the pre-computed value is set. The question is: how many distinct hash values would we have added in $1$%, $5$%, $50$%, $95$%, $99$% of all cases?

Example: $4$ rounds with array of size $10$:

Array                     Position   random number [0,...,0]                    5              0 [0,...,0]                    7              1 [0,...0,1,0,0,0]             6              1 [0,..0,.1,1,0,0,0]           6              0 [0,..0,.1,1,0,0,0]           2              0 

First we considered this a somehow simple problem, but after thinking for some hours, searching the web, and asking some math students, we couldn't find a solution. Do you know a probability distribution for this problem?

Remark: Was also posted on Math Overflow and got its answer there.

  • 0
    I've quoted the MO answer with attribution as community wiki. I'd rather not see the only answer to a question be contained entirely in an offsite link. If you want to discuss what should be the correct procedure to deal with questions that get answered offsite, you can raise the question on meta, but I see little downside to this approach.2010-08-03

1 Answers 1

2

As answered by T. on MathOverflow.

This is equivalent to (among other names) the Coupon Collector problem. Your are asking about the distribution of the number of coupons collected after t steps, when the total number of possible coupons is n.

http://en.wikipedia.org/wiki/Coupon_collector%27s_problem

ADDED: this and related distributions are also studied under other names such as Birthday Problem, random mappings, and random hashing. Kolchin-Sevastyanov-Chistyakov Random Allocations, Knuth The Art Of Computer Programming, vol. 2, and Flajolet & Sedgwick Analytic Combinatorics all discuss these problems and may contain the precise asymptotics of the distribution you are looking for.

III.10 in Flajolet and Sedgewick gives the Poisson answer $1−\exp(−t/n)$ when the ratio is held constant, but other asymptotic regimes are also of interest especially in hashing problems. Birthday problem is when $t=O(n^{1/2})$ and one gets statistics of the number of collisions. For t=n^k with 1/2