
I have a bag containing $B$ balls, and I draw $k$ balls at a time. Each draw of $k$ balls is uniform over all the balls in the bag. I get $k_1 (\leq k)$ red balls in the first draw. What is the expected number of *new* red balls in the second draw?

Similarly, if I get $k_1, k_2, \ldots, k_n$ red balls in the first $n$ draws, what is the expected number of *new* red balls in the next draw, $k_{n+1}$? I am unable to analyse the different cases, in which $B$ is an important factor.

Clarification: The $k$ balls in a single draw are distinct, but the balls are returned to the bag before the next draw, so successive draws sample with replacement.

  • 1
    Your emphasis on *new* suggests to me this is drawing with replacement. Are you willing to make an assumption about the prior distribution (before any draws) of the proportion of the $B$ balls which are red? 2017-02-24
  • 0
    @Henry Thanks for pointing this out. Yes, the draws are with replacement. But the $k$ balls in a single draw are distinct. 2017-02-24
  • 0
    What is the significance of the "red"? Are the balls of different colors? If so, how many are red? Are the balls indistinguishable (save for their color)? 2017-02-24
  • 0
    @lulu Yes, the balls are of different colors. I am trying to estimate the number of red balls in the bag from the samples. 2017-02-25
  • 0
    The answer depends on whether you want a Bayesian or frequentist answer. See http://math.stackexchange.com/questions/40319/maximum-likelihood-estimate-of-hypergeometric-distribution-parameter for a maximum likelihood calculation, and see https://en.wikipedia.org/wiki/Conjugate_prior to construct your Bayesian prior and posterior. 2017-02-28
  • 0
    @learner Do you only care about estimating the total number of red balls in the bag? (Basing this on the comment: "I am trying to estimate the red colors in the bag via the samples.") 2017-03-03
  • 0
    @cgage Yes. I am only concerned with the red balls. 2017-03-03

1 Answer


I would go with an uninformative prior, giving every value in $\{0,1,\ldots,B\}$ the same a priori probability. Denote the prior probability function by $g(\cdot)$ and the unknown number of red balls actually in the bag by $S$; then $$g(s)=P\{S=s\}=P\{\text{There are }s\text{ red balls in the bag}\}=\frac{1}{B+1}.$$

Each trial follows a hypergeometric distribution. Denote by $f(r|S)$ the probability of drawing $r$ red balls in a single draw of $k$ balls, given that there are $S$ red balls in the bag: $$ f(r|S)=\frac{{{S}\choose{r}}{{B-S}\choose{k-r}}}{{{B}\choose{k}}} $$ Because the balls are returned to the bag between draws, the trials are independent given $S$, so the likelihood of $S=s$ is $$ L(s|k_1,\ldots,k_n) = \prod_{i=1}^{n}f(k_i|s) $$
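To make the two formulas above concrete, here is a minimal Python sketch of the hypergeometric probability and the likelihood. The function names `hypergeom_pmf` and `likelihood` and the argument `counts` (the observed list $k_1,\ldots,k_n$) are illustrative choices, not part of the original notation, and the sketch assumes $0 \leq s \leq B$ and $0 \leq k \leq B$.

```python
from math import comb


def hypergeom_pmf(r, s, B, k):
    """P(r red balls in one draw of k distinct balls | s red balls among B)."""
    # Impossible outcomes get probability zero.
    if r < 0 or r > k or r > s or k - r > B - s:
        return 0.0
    return comb(s, r) * comb(B - s, k - r) / comb(B, k)


def likelihood(s, counts, B, k):
    """Likelihood of S = s given the observed red counts k_1, ..., k_n.

    The draws are treated as independent given s, which relies on the
    balls being returned to the bag between draws.
    """
    result = 1.0
    for r in counts:
        result *= hypergeom_pmf(r, s, B, k)
    return result
```

For instance, `likelihood(4, [1, 1], 10, 3)` evaluates the likelihood of $S=4$ after two draws of $k=3$ balls from a bag of $B=10$, each showing one red ball.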

Denote by $h(s|k_1,k_2,\ldots,k_n)$ the posterior probability that the bag actually contains $s$ red balls (remember that this is unknown), given the results of the $n$ trials: $$ h(s|k_1,k_2,\ldots,k_n)=\frac{L(s|k_1,\ldots,k_n)\cdot g(s)}{\displaystyle{\sum_{s'=0}^{B}\left(L(s'|k_1,\ldots,k_n)\cdot g(s')\right)}} $$ Note that the sum in the denominator runs over every possible value $s'=0,1,\ldots,B$, including $0$.

BOTTOM LINE:

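Given $n$ trials with $k_1,k_2,\ldots,k_n$ red balls drawn, the probability of drawing $k_{n+1}=r$ red balls in the next draw is $$p(k_{n+1}=r|k_1,k_2,\ldots,k_n)=\sum_{s=0}^{B}f(r|s)\cdot h(s|k_1,\ldots,k_n)$$ The expected number of red balls in the next draw then follows as $\sum_{r=0}^{k} r\cdot p(k_{n+1}=r|k_1,\ldots,k_n)$, which equals $\frac{k}{B}E[S|k_1,\ldots,k_n]$ because a hypergeometric draw has mean $kS/B$.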
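The whole recipe can be sketched in a few lines of Python. This is only an illustration under the uniform prior chosen above. The names `pmf`, `posterior`, `predictive` and `expected_red` are made up for this example, and the code assumes the observed counts are actually possible for the given $B$ and $k$ (otherwise the normalising sum is zero). It counts all red balls in the next draw and does not separate previously seen red balls from unseen ones.

```python
from math import comb


def pmf(r, s, B, k):
    """Hypergeometric f(r|s): r red in a draw of k, with s red among B."""
    if r < 0 or r > k:
        return 0.0
    # math.comb returns 0 when the lower index exceeds the upper one.
    return comb(s, r) * comb(B - s, k - r) / comb(B, k)


def posterior(counts, B, k):
    """Posterior h(s | k_1..k_n) for s = 0..B under a uniform prior."""
    weights = []
    for s in range(B + 1):
        w = 1.0
        for r in counts:
            w *= pmf(r, s, B, k)
        weights.append(w)  # the constant prior 1/(B+1) cancels on normalising
    total = sum(weights)
    return [w / total for w in weights]


def predictive(counts, B, k):
    """Posterior predictive p(k_{n+1} = r | k_1..k_n) for r = 0..k."""
    post = posterior(counts, B, k)
    return [
        sum(pmf(r, s, B, k) * post[s] for s in range(B + 1))
        for r in range(k + 1)
    ]


def expected_red(counts, B, k):
    """Mean of the predictive distribution: expected red balls next draw."""
    return sum(r * p for r, p in enumerate(predictive(counts, B, k)))
```

Calling `expected_red([1, 2], 10, 3)` gives the predictive mean after two draws of three balls that showed one and two red balls. With no data at all (`counts=[]`) the result is $k/2$, as expected from the uniform prior.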