
I am working on a problem like this: Suppose that there are $N$ students in the class. $n_1$ students take course A, $n_2$ take course B, and $n_{12}$ take both courses A and B. What is the MLE of $N$?

I did it in the following way: assume that $p_1$ is the probability that a student takes course A, and $p_2$ is the probability that a student takes course B. Then $n_1\sim \mathrm{Bin}(N,p_1)$ and $n_2\sim \mathrm{Bin}(N,p_2)$, so that if $N$ is known we get $\hat p_1=n_1/N$ and $\hat p_2=n_2/N$.

Similarly for $\hat p_{12}=n_{12}/N$.

Assuming the two courses are chosen independently, $\hat p_{12}=\hat p_1 \hat p_2$, which gives the equation $(n_1/N)(n_2/N)=n_{12}/N$, from which we get $\hat N=n_1 n_2/n_{12}$.

But I am wondering whether this reasoning is correct, since I first assumed that $N$ is known.
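One way to sanity-check the candidate $\hat N=n_1 n_2/n_{12}$ is a small simulation. The sketch below assumes each student takes A with probability $p_1$ and B with probability $p_2$, independently; the function name `simulate_estimate` and the values $N=1000$, $p_1=0.3$, $p_2=0.2$ are made up for illustration and are not part of the problem.

```python
import math
import random
import statistics


def simulate_estimate(N, p1, p2, rng):
    """Simulate one class of N students and return n1 * n2 / n12.

    Assumption (illustrative): each student takes A with probability p1
    and B with probability p2, independently of each other.
    """
    n1 = n2 = n12 = 0
    for _ in range(N):
        takes_a = rng.random() < p1
        takes_b = rng.random() < p2
        n1 += takes_a
        n2 += takes_b
        n12 += takes_a and takes_b
    # With no overlap the estimate is undefined; report it as infinite.
    return n1 * n2 / n12 if n12 > 0 else math.inf


rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = [simulate_estimate(1000, 0.3, 0.2, rng) for _ in range(200)]
print("median estimate of N:", statistics.median(estimates))
```

If the independence assumption holds, the median of the simulated estimates should sit near the true $N$. This only checks that the estimator is reasonable; it does not by itself show that it is the MLE.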

  • 0
    @Sasha, as you mentioned, could we just set up the problem as a multinomial distribution with parameters $(N,p_1p_2,p_1(1-p_2),p_2(1-p_1),(1-p_1)(1-p_2))$? Then the observations are $n_1$, $n_2$, $n_{12}$. 2011-12-13

2 Answers

1

Suppose that $N$ students decide independently of each other to enroll in (i) class A only, (ii) class B only, (iii) both class A and class B, or (iv) neither class A nor class B, with probabilities $p$, $q$, $r$, and $s$ respectively, where $p+q+r+s=1$. It is observed that $n_1-n_{12}$ students are enrolled in class A only, $n_2-n_{12}$ students are enrolled in class B only, and $n_{12}$ students are in both classes. As noted by Sasha, we have a multinomial distribution, and the likelihood of this observation is thus

$q(N; n_1, n_2, n_{12}) = \frac{N!\,p^{n_1-n_{12}}q^{n_2-n_{12}}r^{n_{12}}s^{N-n_1-n_2+n_{12}}}{(n_1-n_{12})!\,(n_2-n_{12})!\,n_{12}!\,(N-n_1-n_2+n_{12})!}.$

To find the value of $N$ that maximizes $q(N; n_1, n_2, n_{12})$, we look at the ratio

$\frac{q(N; n_1, n_2, n_{12})}{q(N-1; n_1, n_2, n_{12})} = \frac{Ns}{N-n_1-n_2+n_{12}}$

and note that the ratio is greater than $1$ (that is, $q(N; n_1, n_2, n_{12}) > q(N-1; n_1, n_2, n_{12})$) if

$N < \frac{n_1+n_2-n_{12}}{1-s} = \frac{n_1+n_2-n_{12}}{p+q+r},$

and smaller than $1$ (that is, $q(N; n_1, n_2, n_{12}) < q(N-1; n_1, n_2, n_{12})$) if the above inequality is reversed. In other words, the maximum-likelihood estimate of $N$ is

$\hat{N} = \frac{n_1+n_2-n_{12}}{p+q+r} = \frac{\text{total enrollment in classes A and B}}{P(\text{student enrolls in A or B or both})},$

where floors and ceilings have been ignored for simplicity of exposition.

Of course, if we do not know $p+q+r$, this estimate is not very useful. Varying $p+q+r$ to find the maximum value of $\hat{N}$ is futile. It leads to an estimate of an infinite number of students, none of whom take courses A or B. So much for higher education!
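As a rough numerical check of the ratio argument, here is a minimal sketch that maximizes the log-likelihood over integer $N$ when $s$ is treated as known. Only the factors that depend on $N$ are kept, namely $N!/(N-m)!$ and $s^{N-m}$ with $m=n_1+n_2-n_{12}$. The function names, the search bound `n_max`, and the example numbers ($m=44$, $s=0.55$) are illustrative assumptions.

```python
import math


def loglik_known_s(N, m, s):
    """N-dependent part of the multinomial log-likelihood.

    m = n1 + n2 - n12 is the number of students seen in A or B;
    s is the (assumed known) probability of taking neither course.
    """
    if N < m:
        return -math.inf  # cannot have fewer students than were observed
    return math.lgamma(N + 1) - math.lgamma(N - m + 1) + (N - m) * math.log(s)


def mle_known_s(m, s, n_max):
    """Brute-force integer maximizer over m <= N <= n_max (n_max is an assumed bound)."""
    return max(range(m, n_max + 1), key=lambda N: loglik_known_s(N, m, s))


# Illustrative numbers: m / (1 - s) = 44 / 0.45 = 97.78, so the floor is 97.
print(mle_known_s(44, 0.55, 1000))  # 97 for these numbers
```

The brute-force maximizer agrees with the floor of $(n_1+n_2-n_{12})/(1-s)$ here. When that quotient is itself an integer, $N$ and $N-1$ have equal likelihood, so the search may return either.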

0

As requested, assuming that taking course A and taking course B are independent and that your binomial model is correct:

You are trying to maximize

$\binom{N}{n_1}p_1^{n_1}(1-p_1)^{N-n_1}\binom{n_1}{n_{12}}p_2^{n_{12}}(1-p_2)^{n_1-n_{12}}\binom{N-n_1}{n_2-n_{12}}p_2^{n_2-n_{12}}(1-p_2)^{N-n_1-(n_2-n_{12})}$

$=\frac{N!}{n_{12}!\,(n_1-n_{12})!\,(n_2-n_{12})!\,(N-n_1-n_2+n_{12})!}\,p_1^{n_1}(1-p_1)^{N-n_1}p_2^{n_2}(1-p_2)^{N-n_2}.$

As you say, for given $N$ this is maximised when $\hat p_1=\frac{n_1}{N}$ and $\hat p_2=\frac{n_2}{N}$, so you are trying to find $N$ to maximise

$\tfrac{N!}{n_{12}! ({n_1}-n_{12})! ({n_2}-n_{12})! (N-n_{1}-{n_2}+n_{12})! }\left(\tfrac{n_1}{N}\right)^{n_1}\left(1-\tfrac{n_1}{N}\right)^{{N}-n_1}\left(\tfrac{n_2}{N}\right)^{n_2}\left(1-\tfrac{n_2}{N}\right)^{{N}-n_2}.$

Personally I would then use numerical methods on actual data.
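A minimal sketch of such a numerical search, assuming a brute-force scan over integer $N$ is acceptable. It evaluates the logarithm of the profile likelihood above, dropping the factorials that do not depend on $N$. The function names, the upper bound `n_max`, and the counts in the example are hypothetical.

```python
import math


def xlogy(x, y):
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def profile_loglik(N, n1, n2, n12):
    """Log profile likelihood in N, with p1 = n1/N and p2 = n2/N plugged in."""
    m = n1 + n2 - n12  # students seen in at least one course
    if N < m:
        return -math.inf
    # log of N! / (N - m)!; the remaining factorials are constant in N
    ll = math.lgamma(N + 1) - math.lgamma(N - m + 1)
    for n in (n1, n2):
        ll += xlogy(n, n / N) + xlogy(N - n, 1 - n / N)
    return ll


def profile_mle(n1, n2, n12, n_max):
    """Integer N in [max(m, 1), n_max] with the largest profile likelihood."""
    m = max(n1 + n2 - n12, 1)
    return max(range(m, n_max + 1), key=lambda N: profile_loglik(N, n1, n2, n12))


# Hypothetical counts, for illustration only.
print(profile_mle(30, 20, 6, 1000))
```

For counts like these the maximizer lands close to $n_1 n_2/n_{12}$, though it need not equal it exactly. If $n_{12}=0$ the likelihood keeps increasing in $N$, so the search simply returns `n_max`.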

  • 0
    Yes, I got some ideas from your equations. As mentioned by Sasha, since $N=n_0+n_1+n_2-n_{12}$, we may model it as a multinomial distribution. 2011-12-13