Consider a bag containing an infinite number of balls. Each ball has some color. The number of colors is finite but unknown. Balls are drawn from the bag one at a time and their colors are checked. We want to stop drawing when there is only a small probability of finding a color that has not already been drawn. More precisely, we want to be very sure (with high probability) that only a small fraction of the balls in the bag have colors that have not yet been drawn. Is there a statistical model that describes when to stop?
Bag with infinite number of colored balls
-
FYI, this situation is sometimes called the *palette* extreme. For example, see Mehlum's "Island problem revisited" (2009). Preprint: http://folk.uio.no/hmehlum/publications/TASMehlum.pdf – 2012-12-11
2 Answers
This problem often comes up in biology, when one wants to estimate the number of species in an area from a survey. A wide variety of methods have been developed to suit the specific application at hand. For example, it is very hard to come up with a good estimator for the number of biological species in a rain forest, since there are too many "extremely rare" species to take into account. No single estimator is globally optimal, so you would have to customize your own estimator based on your needs. A good starting point is this link and references therein.
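As one illustration of the kind of estimator used in species surveys, here is a minimal Python sketch. It is not taken from the reference above; the function name `unseen_summary` and the choice of estimators are assumptions made for illustration. It computes the Good-Turing estimate of the share of balls whose color has not been seen yet (the number of colors seen exactly once, divided by the number of draws) and the bias-corrected Chao1 estimate of the total number of colors.

```python
from collections import Counter


def unseen_summary(draws):
    """Summarize a sequence of observed colors (any hashable labels).

    Hypothetical helper for illustration only.
    """
    counts = Counter(draws)                  # color -> number of times seen
    n = len(draws)                           # total number of draws
    freq_of_freq = Counter(counts.values())  # how many colors were seen j times
    f1 = freq_of_freq.get(1, 0)              # colors seen exactly once
    f2 = freq_of_freq.get(2, 0)              # colors seen exactly twice
    s_obs = len(counts)                      # distinct colors observed

    # Good-Turing estimate of the share of balls with a still-unseen color.
    missing_mass = f1 / n if n else 1.0

    # Bias-corrected Chao1 estimate of the number of colors; this form
    # stays finite even when no color has been seen exactly twice.
    chao1 = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

    return {"n": n, "observed": s_obs,
            "missing_mass": missing_mass, "chao1": chao1}
```

A possible stopping rule, under these assumptions, is to keep drawing until `missing_mass` falls below the tolerated fraction. That is only a point estimate; turning it into a statement "with high probability" needs an additional confidence bound.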
-
@Dinesh, +1. Can you please provide key citations? – 2012-12-11
If you assume there are N colors, each with equal frequency p=1/N, then this should be pretty doable. Of course, your desired confidence level will affect the answer.
Without working out all the details, suppose you have drawn to the point where you have seen a few colors 2 or 3 times. From the number of balls drawn and the number of distinct colors seen, you can estimate N to within some confidence bounds.
I can give the details if you like, but if these assumptions are acceptable you can probably take it from here?
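A rough sketch of that estimate in Python, assuming equal color frequencies as above. The function names and the search cap `max_N` are hypothetical choices for illustration. With n draws and k distinct colors seen, the likelihood of N is proportional to N!/(N-k)!/N^n (the remaining factor does not depend on N), so N can be estimated by maximizing it over N ≥ k.

```python
import math


def log_likelihood(N, n, k):
    """Log-likelihood of N colors, up to a term that does not depend on N."""
    # log( N! / (N-k)! ) - n * log(N)
    return math.lgamma(N + 1) - math.lgamma(N - k + 1) - n * math.log(N)


def estimate_num_colors(n, k, max_N=10_000):
    """Maximum-likelihood estimate of N from n draws with k distinct colors.

    Returns None when no color has repeated, because the likelihood then
    keeps growing with N and there is no finite maximum.
    max_N is an arbitrary search cap (an assumption of this sketch).
    """
    if k >= n:
        return None
    return max(range(k, max_N + 1), key=lambda N: log_likelihood(N, n, k))


def estimated_unseen_fraction(n, k, max_N=10_000):
    """Estimated share of balls with an unseen color, under equal frequencies."""
    N_hat = estimate_num_colors(n, k, max_N)
    if N_hat is None:
        return None
    return 1 - k / N_hat
```

This gives only a point estimate. Confidence bounds would need the spread of the likelihood around its maximum, which is not worked out here.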