4
$\begingroup$

An Internet search engine looks for a keyword in 9 databases, searching them in random order. Only 5 of these databases contain the given keyword. What is the probability that it will be found in at least 2 of the first 4 searched databases?

What I tried is: Let X: the event that the keyword is to be found in at least 2 databases, and Y: the event that we search the first 4 databases, then we need to find a conditional probability $P(X\geq 2 | Y)=P(X= 2,3,\text{ or }4 | Y)=\frac{P(\{X\geq 2\} \cap Y)}{P(Y)}$ but then I don't know how to continue?

Any help...

3 Answers 3

3

Added: Henry's and my answers, while the approaches are different, actually give the same result! To see why, let's do the general case with $N$ databases, $M$ of which contain the keyword, and us choosing $K$ databases. Let $X$ be the number of the $K$ databases chosen that contain the keyword. For $P(X = x)$, Henry's and my logic give the following answers.

$\text{ Henry: } P(X = x) = \frac{\binom{M}{x}\binom{N-M}{K-x}}{\binom{N}{K}}; \text{ me: } P(X = x) = \frac{\binom{K}{x} \binom{N-K}{M-x}}{\binom{N}{M}}.$

But

$\frac{\binom{M}{x}\binom{N-M}{K-x}}{\binom{N}{K}} = \frac{M!}{x! (M-x)!} \frac{(N-M)!}{(K-x)! (N-M-K+x)!} \frac{K! (N-K)!}{N!}$ $= \frac{K!}{x! (K-x)!} \frac{(N-K)!}{(M-x)! (N-K-M+x)!} \frac{M! (N-M)!}{N!} = \frac{\binom{K}{x} \binom{N-K}{M-x}}{\binom{N}{M}}.$

Thus the two answers give the same result.


(Original answer.)

I would approach it differently. Let $X$ be the number of the first four databases in which the keyword is found. You want $P(X \geq 2)$.

$P(X = 2) = \frac{\binom{4}{2} \binom{5}{3}}{\binom{9}{5}}$ because there are $\binom{4}{2}$ ways that 2 of the first 4 databases can contain the keyword, $\binom{5}{3}$ ways that 3 of the last 5 databases can contain the keyword, and $\binom{9}{5}$ ways that 5 of the 9 total databases to contain the keyword.

I'll let you calculate $P(X = 3)$ and $P(X = 4)$.

(FYI, $X$ has what's called a hypergeometric distribution.)

  • 0
    @Jess: I was looking at the problem from the point of view of placing 5 keywords into 9 databases. So we are choosing 5 items (databases) in which to place the keywords.2011-09-10
3

You are going to have to work out the probabilities of getting the keyword exactly $n$ times, and either add up the probabilities for $n=2$, $3$ or $4$, or subtract from $1$ the probabilities for $n=1$ or $0$.

If $N$ is the number of dictionaries where the keyword is found then $\Pr(N=n) = \frac{{5 \choose n}{4 \choose 4-n}}{9 \choose 4}$ since you have to choose $n$ dictionaries from $5$ with the keyword and $4-n$ from $4$ without, choosing $4$ from $9$ overall.

So for $n=0,1,2,3,4$ this gives probabilities $\frac{1}{126},\frac{20}{126},\frac{60}{126},\frac{40}{126},\frac{5}{126}$.

Add up the last three (and change to lowest terms) and you have your solution.

  • 0
    I've added a proof that our answers give the same result.2011-09-10