
I have been thinking about this for more than two weeks now and have not found any help in the literature:

I have this experiment: There is an urn with 9 balls inside, numbered 1-9. You draw a ball and record the number. After that you put the ball back inside the urn. You do this until you have drawn one number 8 times (it does not matter which one). How many draws do you expect to need until you get one number 8 times?

The minimum number of draws is obviously 8 and the maximum is 64 (after 63 draws each number could have appeared exactly 7 times), but I could not find a known distribution for this kind of problem.

The only thing I know, from simulation, is that the expected value is about 39.309. Any ideas?

  • There's probably a more elegant way to do it, but one method that works is to let $E[n_1,n_2,\cdots,n_9]$ denote the answer conditioned on having seen each number $i$ exactly $n_i$ times. There is then an obvious backwards recursion. (2017-02-09)
  • I found out that the number of draws needed for each individual number follows a negative binomial distribution ($r=8$ and $p=1/9$), so the expected value for each number is 72. But I don't know how to get from this to the expected number of draws until any number has been drawn 8 times. (2017-02-09)
  • That is certainly true, but I don't see how it helps. Running the recursion shouldn't be too terrible. Granted, there are $8^9$ states, but A. that's manageable and B. there are a huge number of symmetries. Still, it kind of feels like there should be a better method. After all, the recursion wouldn't be feasible if you replaced $8$ with $1000$. (2017-02-09)
  • Well, I am looking for an analytical solution, since I already know the result from a simulation. Something like the expected value for the maximum of 9 negative binomial distributions to be 8. (2017-02-09)
  • I get it. But, well, let's look at $2$ instead of $8$. The non-trivial states are now determined just by counting the distinct numbers you've seen, so let $E[n]$ be the expected number of further draws it will take, given that you have seen $n$ distinct numbers. Then $E[9]=1$ and $E[n]=\frac {9-n}9\times (E[n+1]+1)+ \frac n9\times 1$. That is easy to resolve, and you get $E[0]=4.458315745$. That doesn't suggest anything to me (though maybe it does to someone else). (2017-02-09)
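The recursion for the "$2$ instead of $8$" case in the last comment can be run directly. The following is a minimal C sketch of it, not code from the thread; the function name `expected_draws_until_repeat` is made up for illustration, and the number of balls is left as a parameter `n` (an assumption, the comment only treats $n=9$).

```c
#include <assert.h>
#include <stdio.h>

/* Expected number of draws (with replacement, n equally likely numbers)
 * until some number has been drawn twice.
 * Implements E[n] = 1 and
 *   E[k] = (n-k)/n * (E[k+1] + 1) + k/n * 1
 * where k is the count of distinct numbers seen so far. */
double expected_draws_until_repeat(int n)
{
    assert(n >= 1);

    double e = 1.0;                       /* E[n]: the next draw must repeat */
    for (int seen = n - 1; seen >= 0; seen--) {
        double p_new = (double)(n - seen) / n;   /* draw an unseen number */
        double p_repeat = (double)seen / n;      /* draw a seen number: done */
        e = p_new * (e + 1.0) + p_repeat * 1.0;
    }
    return e;                             /* this is E[0] */
}
```

Calling `expected_draws_until_repeat(9)` from a `main` and printing the result with `printf("%.9f\n", ...)` should give a value close to the $4.4583$ quoted in the comment.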

1 Answer

This problem is quite simple combinatorially, but finding closed forms is difficult; indeed, the intermediate results indicate that there may not be any. Suppose we treat the case of $n$ coupons where we wait until some coupon has been seen $n-1$ times. We have from first principles that the probability that this happens at draw number $m$ is given by

$$P[T=m] = \frac{1}{n^m} (m-1)! [z^{m-1}] \frac{d}{du} \left.\left(\sum_{q=0}^{n-3} \frac{z^q}{q!} + u\frac{z^{n-2}}{(n-2)!} \right)^n\right|_{u=1}.$$

This is

$$P[T=m] = \frac{n}{n^m} (m-1)! [z^{m-1}] \frac{z^{n-2}}{(n-2)!} \left(\sum_{q=0}^{n-2} \frac{z^q}{q!}\right)^{n-1}.$$
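The step between the two formulas is the chain rule. With the shorthand $A(z)=\sum_{q=0}^{n-3} z^q/q!$ and $B(z)=z^{n-2}/(n-2)!$ (notation introduced here only for this remark), we have

$$\left.\frac{d}{du}\left(A(z)+uB(z)\right)^n\right|_{u=1} = n\,B(z)\left(A(z)+B(z)\right)^{n-1}, \qquad A(z)+B(z)=\sum_{q=0}^{n-2}\frac{z^q}{q!}.$$

One way to read this: the variable $u$ marks coupons that have been seen exactly $n-2$ times during the first $m-1$ draws, and differentiating at $u=1$ counts the choices of such a coupon for the final draw.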

We can now compute the expectation as follows.

F := n -> z^(n-2)/(n-2)!*add(z^q/q!, q=0..n-2)^(n-1);

X :=
proc(n)
local FF;
    option remember;

    FF := expand(F(n));
    add(m*n/n^m*(m-1)!*coeff(FF, z, m-1), m=n-1..1+(n-2)*n);
end;

For $n=9$ we thus obtain

> X(9);                                     
           96899089924114484187946852578422805520046700098386996168
           --------------------------------------------------------
           2465034704958067503996131453373943813074726512397600969

> evalf(%);
                                      39.30942219

The form of this result suggests that there may not be a simple closed-form answer. If any cancellation between numerator and denominator were possible, it would have appeared at this point.
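As a floating-point cross-check on the exact value, the formula for $P[T=m]$ above can also be evaluated numerically. The following is a minimal C sketch under stated assumptions, not a translation of the Maple session: the function name `expected_draws` and its `total_prob` out-parameter are chosen here for illustration, and double precision is assumed to be adequate for moderate $n$ such as $n=9$.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Expected draws until one of n equally likely coupons has been seen
 * n-1 times, using
 *   P[T=m] = n/n^m * (m-1)! * [z^(m-1)] z^k/k! * (sum_{q=0}^{k} z^q/q!)^(n-1)
 * with k = n-2. If total_prob is non-NULL it receives sum_m P[T=m]. */
double expected_draws(int n, double *total_prob)
{
    assert(n >= 2);
    int k = n - 2;               /* largest count allowed before the last draw */
    int deg = k * (n - 1);       /* degree of the (n-1)-th power */
    double *base = calloc(k + 1, sizeof *base);
    double *pw = calloc(deg + 1, sizeof *pw);
    double *tmp = calloc(deg + 1, sizeof *tmp);
    assert(base && pw && tmp);

    base[0] = 1.0;               /* base[q] = 1/q! */
    for (int q = 1; q <= k; q++)
        base[q] = base[q - 1] / q;

    pw[0] = 1.0;                 /* pw = base^(n-1) by repeated multiplication */
    int cur = 0;                 /* current degree of pw */
    for (int f = 0; f < n - 1; f++) {
        for (int i = 0; i <= cur + k; i++)
            tmp[i] = 0.0;
        for (int i = 0; i <= cur; i++)
            for (int q = 0; q <= k; q++)
                tmp[i + q] += pw[i] * base[q];
        cur += k;
        for (int i = 0; i <= cur; i++)
            pw[i] = tmp[i];
    }

    double expect = 0.0, total = 0.0;
    double weight = 1.0 / n;     /* (m-1)!/n^m for m = 1 */
    for (int m = 1; m <= deg + k + 1; m++) {
        if (m - 1 >= k) {        /* [z^(m-1)] of z^k/k! * pw */
            double p = n * weight * base[k] * pw[m - 1 - k];
            total += p;
            expect += m * p;
        }
        weight *= (double)m / n; /* step to m!/n^(m+1) */
    }

    free(base); free(pw); free(tmp);
    if (total_prob)
        *total_prob = total;
    return expect;
}
```

Calling `expected_draws(9, &tp)` from a `main` should agree with the Maple value up to rounding, and `tp` should come out close to 1, which is a useful sanity check on the probabilities. The loop bounds $m = n-1,\ldots,1+(n-2)n$ are the same as in the Maple procedure.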

C code for a simulation check.

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <time.h>

int main(int argc, char **argv)
{
  int n = 6, trials = 1000; 

  if(argc >= 2){
    n = atoi(argv[1]);
  }

  if(argc >= 3){
    trials = atoi(argv[2]);
  }

  assert(2 <= n);  /* with n = 1 the stopping test below is never met */
  assert(1 <= trials);

  srand48(time(NULL));
  long long data = 0;

  for(int tind = 0; tind < trials; tind++){
    int dist[n]; int steps = 0;

    for(int cind = 0; cind < n; cind++){
      dist[cind] = 0;
    }

    while(1){
      /* uniform coupon index in 0..n-1 */
      int coupon = drand48() * (double)n;

      steps++;

      /* this draw brings the coupon to n-1 occurrences: stop */
      if(dist[coupon] == n-2)
        break;
      dist[coupon]++;
    }

    data += steps;
  }

  long double expt = (long double)data/(long double)trials;
  printf("[n = %d, trials = %d]: %Le\n", 
         n, trials, expt);

  exit(0);
}