A work team is made up of five Computer Engineers and nine Computer Technologists. If five team members are randomly selected and assigned a project, what is the probability that the project team will include exactly three Technologists?
Can I use a binomial distribution for this exercise?
- Seems you should use negative binomial, because the people you choose for the team are not replaced. – 2017-02-19
- But it does not say that they are independent. That is why I am in doubt. – 2017-02-19
- My mistake, I meant to say use the hypergeometric distribution, not negative binomial (I mix these up for some reason). The hypergeometric distribution accounts for drawing without replacement. As you are not replacing the team members after drawing, the trials are not independent (each draw depends on who has already been drawn). – 2017-02-19
- At some point I came up with hypergeometric, but erroneously discarded it. Thank you very much. – 2017-02-19
1 Answer
Not binomial. No, you cannot use a binomial distribution because sampling is without replacement. (No team member can be chosen to fill two places doing the project.) If a Technologist is chosen first (probability 9/14), then the probability of choosing another Technologist on the next draw is different (probability 8/13), and so on. So draws ('trials') are not independent.
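To make the dependence concrete, here is a minimal R sketch comparing the chance that the first three draws are all Technologists under sampling without replacement against what independent draws would give. The variable names are mine, chosen only for illustration.

```r
# Probability that the first three draws are all Technologists.
# Without replacement, the success probability changes after each draw.
without_repl <- (9/14) * (8/13) * (7/12)

# A binomial model would (wrongly) keep the probability fixed at 9/14.
with_repl <- (9/14)^3

round(c(without_repl = without_repl, with_repl = with_repl), 4)
## 0.2308 0.2657
```

The two values differ, which is exactly why the binomial model does not fit this problem.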
Use hypergeometric. As commented by @Dave, this can be solved using a hypergeometric distribution. Let $X$ be the number of Technologists chosen. Then
$$P(X = 3) = \frac{{9 \choose 3}{5 \choose 2}}{{14 \choose 5}} \approx 0.4196.$$
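For anyone checking by hand, the arithmetic works out as follows: there are 84 ways to pick three of the nine Technologists, 10 ways to pick two of the five Engineers, and 2002 ways to pick any five of the fourteen team members.

$${9 \choose 3} = \frac{9 \cdot 8 \cdot 7}{3!} = 84, \qquad {5 \choose 2} = \frac{5 \cdot 4}{2!} = 10, \qquad {14 \choose 5} = \frac{14 \cdot 13 \cdot 12 \cdot 11 \cdot 10}{5!} = 2002,$$

$$P(X = 3) = \frac{84 \cdot 10}{2002} = \frac{840}{2002} = \frac{60}{143} \approx 0.4196.$$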
Computational issues. Because the factorials involved are relatively small numbers, this probability can be evaluated using a basic calculator.
In R statistical software, the PMF of a hypergeometric distribution is dhyper. Its arguments are, in order, the number of Technologists of interest (here 3), the number of Technologists in the team (9), the number of non-Technologists in the team (5), and the number chosen for the project (5).
So the computation looks like this:
dhyper(3, 9, 5, 5)
## 0.4195804
Note: Alternatively, using binomial coefficients:
choose(9, 3) * choose(5, 2) / choose(14, 5)
## 0.4195804
However, when the numbers are large (for example, if the project team were chosen from a department of 100 computer scientists), the latter method may overflow the arithmetic capacity of the computer (or calculator). By contrast, dhyper is programmed to avoid overflow, if possible.
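As a rough illustration of the overflow issue, the sketch below uses a deliberately exaggerated, hypothetical population (3000 Technologists, 2000 Engineers, 2500 chosen), which is not part of the original problem. It assumes double-precision arithmetic, where the individual binomial coefficients are too large to represent. Working on the log scale with lchoose is one common workaround, shown here next to dhyper.

```r
# Hypothetical, much larger population (not from the original question).
n_tech <- 3000; n_eng <- 2000; n_chosen <- 2500; x <- 1500

# Direct ratio of binomial coefficients: each coefficient exceeds the
# largest representable double, so the result is not a usable number.
naive <- choose(n_tech, x) * choose(n_eng, n_chosen - x) /
  choose(n_tech + n_eng, n_chosen)

# Same ratio on the log scale, exponentiated only at the end.
via_logs <- exp(lchoose(n_tech, x) + lchoose(n_eng, n_chosen - x) -
                lchoose(n_tech + n_eng, n_chosen))

# Built-in hypergeometric PMF for comparison.
via_dhyper <- dhyper(x, n_tech, n_eng, n_chosen)

c(naive = naive, via_logs = via_logs, via_dhyper = via_dhyper)
```

The log-scale result and dhyper should agree closely, while the direct ratio breaks down.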
Graph of the distribution. Here is a bar graph of the distribution of $X$, which can take integer values between $0$ and $5$ inclusive. (Choosing no Technologists is a possibility, although unlikely. There are enough Engineers to cover all five people chosen for the project.)
We see that three is the most likely number of Technologists to be chosen (emphasized in red).
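A graph along these lines can be drawn with base R graphics. This is only a sketch: the variable names, colors, and axis labels are my assumptions and not necessarily those of the original figure.

```r
# All possible values of X and their hypergeometric probabilities.
x <- 0:5
pmf <- dhyper(x, 9, 5, 5)
round(pmf, 4)
## 0.0005 0.0225 0.1798 0.4196 0.3147 0.0629

# Bar graph with the most likely value (X = 3) highlighted in red.
barplot(pmf, names.arg = x,
        col = ifelse(x == 3, "red", "grey"),
        xlab = "Number of Technologists chosen",
        ylab = "Probability")
```

The probabilities sum to 1, and the tallest bar is at $X = 3$.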
