
If I have a known population ($N$ marbles, of which $M$ are black) and draw $n$ samples without replacement, the probability of drawing $x$ black marbles is given by the hypergeometric distribution.

Is there a way to get probabilities for the total number of blacks $M$ from the number of black marbles $x$ in my sample?

  • 0
    Under suitable assumptions one can. Assume, for example, that $N$ is fixed and that Alicia decided how many of the marbles would be black, from $0$ to $N$, using a *uniform* distribution or any other known distribution. Then, based on the sample, we can calculate the probability that Alicia decided to put in $k$ black marbles. 2012-12-05
  • 0
    Can you get any quantitative results without assumptions on the distribution of $M$ (i.e. Alicia's choice)? I want to find an upper bound on the number of remaining black marbles (i.e. $M-x$) that holds except with some small probability. 2012-12-05
  • 0
    You always have to make at least a tacit assumption, despite what Frequentist statisticians sometimes say. If I were doing it, I would set it up as a Bayesian problem with a conjugate [Beta-Binomial](http://en.wikipedia.org/wiki/Beta-binomial_distribution) prior on $M$. 2012-12-05
  • 0
    I do not have any insight on how to attack the problem without a prior. 2012-12-05

1 Answer

The probability of extracting exactly $x$ black marbles in a sample of size $n$ from a population of $N$ marbles of which $M$ are black can be calculated as:

$$P(X=x|n,N,M) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$$
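As a quick numerical check of this formula, here is a minimal Python sketch. The function name `hypergeom_pmf` and the example numbers ($N=20$, $M=7$, $n=5$) are my own illustrative choices, not part of the question.

```python
from math import comb

def hypergeom_pmf(x, n, N, M):
    """P(X = x | n, N, M): chance of x black marbles in n draws without replacement."""
    # Reject inputs outside the model's range instead of letting comb() raise.
    if not (0 <= x <= n <= N and 0 <= M <= N):
        return 0.0
    # math.comb(a, b) is 0 when b > a, so impossible outcomes get probability 0.
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Hypothetical example: 20 marbles, 7 of them black, 5 drawn.
probs = [hypergeom_pmf(x, 5, 20, 7) for x in range(6)]
print(probs)
print(sum(probs))  # should be 1 up to floating-point rounding
```
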

If you assume that all values of $M$ are equally likely a priori (this would be André's prior distribution assumption, I believe), you can use the exact same formula with a different twist: treat the total number of black marbles, $M$, as the variable and the number of black marbles in your sample, $x$, as a fixed parameter:

$$f(M) = P(X=x|n,N,M) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$$

The function above gives the relative likelihood of each value of $M$, and the value that maximizes it is the maximum likelihood estimate of $M$ for the population. If you evaluate it for all possible values, $M\in\{x,\dots,N-n+x\}$, and normalize so that the values sum to 1, you get a probability distribution over $M$ which can be used to compute the probabilities you are after.
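To make the normalization step concrete, here is a rough Python sketch under the uniform-prior assumption discussed above. The names `posterior_M` and `upper_bound_remaining`, the default `alpha=0.05`, and the example numbers are all hypothetical choices of mine. The second function addresses the follow-up in the comments: an upper bound on the remaining black marbles $M-x$ that holds with posterior probability at least $1-\alpha$.

```python
from math import comb

def posterior_M(x, n, N):
    """P(M | x) for M = x..N-n+x, ASSUMING a uniform prior on M over 0..N."""
    support = range(x, N - n + x + 1)
    # The constant factor 1 / comb(N, n) cancels in the normalization, so skip it.
    weights = {M: comb(M, x) * comb(N - M, n - x) for M in support}
    total = sum(weights.values())
    return {M: w / total for M, w in weights.items()}

def upper_bound_remaining(x, n, N, alpha=0.05):
    """Smallest r with P(M - x <= r | x) >= 1 - alpha under that posterior."""
    post = posterior_M(x, n, N)
    cumulative = 0.0
    for M in sorted(post):
        cumulative += post[M]
        if cumulative >= 1 - alpha:
            return M - x
    return N - n  # fallback in case rounding keeps the sum just below 1 - alpha

# Hypothetical example: 20 marbles, 5 drawn, 2 of the drawn ones black.
post = posterior_M(2, 5, 20)
print(max(post, key=post.get))          # most likely M (the maximum likelihood value)
print(upper_bound_remaining(2, 5, 20))  # bound on M - x at the 95% level
```

With a different prior (for example the Beta-Binomial suggested in the comments), each weight would be multiplied by the prior probability of that $M$ before normalizing; the rest stays the same.
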

  • 0
    Right, this corresponds to a uniform prior on $M$. 2012-12-05