
I've encountered an interesting probability question and need some guidance on how to approach it. It might be obvious and I'm just not seeing it. Thank you in advance.

Suppose $X = \{1,...,n, n+1,...,m+n\}$, i.e., a set with $m+n$ items. Assume:

(i) I randomly select a subset $M$ of $X$, with every subset equally likely; say it has $k$ items. What is the probability that $M$ contains at least one item from $\{1,...,n\}$ together with at least one item from $\{n+1,...,n+m\}$?

(ii) Suppose I now sample with replacement from $X$ to construct $M$, i.e., $M$ can contain duplicate items. Is the above probability the same or different?

  • 0
    Not sure the replacement case is well defined. After all, if you replace elements then there is no limit on the size of the "subset" $M$ you can create. For the first, for a fixed size $k$ the computation isn't bad, and then you can sum over $k$. I don't imagine that the sum simplifies in any nice way. (2017-01-07)

1 Answer


For $k>0$, the probability that all $k$ elements come from the first $n$ is $\binom nk / \binom {m+n}k$. Of course this is $0$ if $k>n$. Similarly, the probability that they all come from the final $m$ is $\binom mk / \binom {m+n}k$. These two events are disjoint for $k>0$, so the probability that a randomly chosen subset of size $k$ contains at least one element of each type is $$P_k=1-\frac {\binom nk +\binom mk}{ \binom {m+n}k}$$

And $P_0=0$ by inspection: the empty set contains neither type, and the formula above only applies for $k>0$.
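As a sanity check on $P_k$, here is a minimal Python sketch that compares the formula with a brute-force enumeration of all $k$-subsets. The function names are made up for illustration, and it assumes $0 \le k \le m+n$ and Python 3.8+ (for `math.comb`).

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def p_mixed_fixed_size(n, m, k):
    """P_k from the formula: a uniform k-subset of {1..n+m} hits both blocks.

    Assumes 0 <= k <= n + m. math.comb returns 0 when k exceeds its first
    argument, which matches the convention used in the text.
    """
    if k == 0:
        return Fraction(0)  # the empty set contains neither type
    return 1 - Fraction(comb(n, k) + comb(m, k), comb(n + m, k))


def p_mixed_fixed_size_brute(n, m, k):
    """Same probability, by listing every k-subset (small n + m only)."""
    subsets = list(combinations(range(1, n + m + 1), k))
    hits = sum(
        1
        for s in subsets
        if any(x <= n for x in s) and any(x > n for x in s)
    )
    return Fraction(hits, len(subsets))


if __name__ == "__main__":
    n, m = 3, 2
    for k in range(n + m + 1):
        print(k, p_mixed_fixed_size(n, m, k), p_mixed_fixed_size_brute(n, m, k))
```

Exact fractions are used instead of floats so the two computations can be compared with `==`.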

Of course there are $2^{m+n}$ subsets in total, of which $\binom {m+n}k$ have size $k$. Since $P_0=0$, the $k=0$ term contributes nothing, and the answer you want is $$\frac 1{2^{m+n}}\times \sum_{k=1}^{m+n} \binom {m+n}k P_k=\frac 1{2^{m+n}}\times \sum_{k=1}^{m+n}\left(\binom {m+n}k -\binom nk -\binom mk\right)$$
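The sum over all sizes can be checked numerically. The sketch below (hypothetical function names, Python 3.8+) evaluates the sum starting at $k=1$, since $P_0=0$, and compares it with a direct count over all $2^{m+n}$ subsets. It also compares against $(2^n-1)(2^m-1)/2^{m+n}$, which is what the binomial theorem gives for this sum: a favourable subset is a nonempty part of the first block together with a nonempty part of the second.

```python
from fractions import Fraction
from math import comb


def p_mixed_any_size(n, m):
    """Probability via the sum over subset sizes k = 1..n+m."""
    total = n + m
    favourable = sum(
        comb(total, k) - comb(n, k) - comb(m, k) for k in range(1, total + 1)
    )
    return Fraction(favourable, 2 ** total)


def p_mixed_any_size_brute(n, m):
    """Direct count: encode each subset as a bitmask (small n + m only)."""
    total = n + m
    low_bits = (1 << n) - 1  # bits 0..n-1 stand for items 1..n
    hits = 0
    for mask in range(2 ** total):
        has_first = (mask & low_bits) != 0
        has_second = (mask >> n) != 0
        if has_first and has_second:
            hits += 1
    return Fraction(hits, 2 ** total)


def p_mixed_closed_form(n, m):
    """Closed form obtained from the binomial theorem."""
    return Fraction((2 ** n - 1) * (2 ** m - 1), 2 ** (n + m))


if __name__ == "__main__":
    for n, m in [(1, 1), (3, 2), (4, 5)]:
        print(n, m, p_mixed_any_size(n, m), p_mixed_any_size_brute(n, m),
              p_mixed_closed_form(n, m))
```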

Not sure the "with replacement" case is well defined. After all, there is no limit on the size of the "subset" you can make if we allow replacement. Of course, if you fix the size then the same technique as above will work; you just need to refigure the probabilities. Clearly for $k>\max(m,n)$ the non-replacement case yields $1$ where the replacement case does not. In general, for $k>1$ the probability you seek is higher without replacement.
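To make the fixed-size comparison concrete, here is a rough sketch under one possible reading of "with replacement": $k$ independent uniform draws from $X$, so there are $(m+n)^k$ equally likely ordered samples and the chance of missing one block entirely is $(n^k+m^k)/(m+n)^k$. That model is an assumption, not something stated in the question, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def p_without_replacement(n, m, k):
    """P_k for a uniform k-subset; assumes 1 <= k <= n + m."""
    return 1 - Fraction(comb(n, k) + comb(m, k), comb(n + m, k))


def p_with_replacement(n, m, k):
    """Assumed model: k independent uniform draws from {1..n+m}, k >= 1."""
    return 1 - Fraction(n ** k + m ** k, (n + m) ** k)


def p_with_replacement_brute(n, m, k):
    """Same model, by listing all (n+m)^k ordered samples (small cases only)."""
    samples = list(product(range(1, n + m + 1), repeat=k))
    hits = sum(
        1
        for s in samples
        if any(x <= n for x in s) and any(x > n for x in s)
    )
    return Fraction(hits, len(samples))


if __name__ == "__main__":
    n, m = 3, 2
    for k in range(1, n + m + 1):
        print(k, p_without_replacement(n, m, k), p_with_replacement(n, m, k))
```

For $n=3$, $m=2$, $k=2$ this gives $3/5$ without replacement against $12/25$ with replacement, and for $k=1$ both are $0$, in line with the remark that the two cases only differ for $k>1$.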