We have a box with $m=1\,000\,000$ cards. Each card contains one word. Words repeat across cards, so the number $n$ of unique words is relatively small, but $n$ is unknown.
We draw a sample of $k=5000$ cards and find $42$ unique words in it.
With this information, we know $P(n\geq42) = 1$.
How can we compute $P(n\geq43)$, $P(n\geq44)$, and so on?
Is this a common problem, and does it have a standard name?
PS: we also know how often each of the $42$ words occurs in the $5000$-card sample, and this can be used in the solution if it is relevant. Let's call these frequencies $f_1, f_2, \dots, f_{42}$.
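
To make the question concrete, here is a minimal sketch of the likelihood of seeing exactly $42$ unique words for a hypothetical $n$. It rests on two assumptions that are not part of the problem statement: all $n$ words are equally frequent in the box (so the $f_i$ are ignored), and because $k \ll m$ the sample is treated as drawn with replacement. The function name `prob_exactly_u_unique` is made up for illustration.

```python
from fractions import Fraction
from math import comb


def prob_exactly_u_unique(n: int, k: int, u: int) -> Fraction:
    """P(exactly u distinct words in k draws) for n equally likely words.

    Assumption: draws are with replacement (an approximation for k << m).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0 <= u <= n:
        return Fraction(0)
    # Inclusion-exclusion: count sequences of length k that use all u
    # words of a fixed u-word subset, then choose the subset.
    surjections = sum(
        (-1) ** j * comb(u, j) * (u - j) ** k for j in range(u + 1)
    )
    return Fraction(comb(n, u) * surjections, n ** k)


# Likelihood of observing 42 unique words for a few hypothetical n.
for n in (42, 43, 44):
    print(n, float(prob_exactly_u_unique(n, 5000, 42)))
```

Turning these likelihoods into $P(n\geq43)$ would additionally need a prior on $n$ and Bayes' rule, which is yet another assumption. If the observed frequencies $f_1, \dots, f_{42}$ are far from equal, the equal-frequency model above is probably a poor fit, since rare words can easily be missed by the sample.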