1
$\begingroup$

I'm going to rewrite my original question to make it a bit clearer:

Assume I have some set $P$ with $||P|| = N$ unique elements. I also have $S$ multisets, $(m_1, ..., m_S)$, of cardinality $L$, consisting of elements in $P$ chosen with uniform probability. We call a multiset, $m_i$, 'distinguishable' if it contains at least $k$ elements, though not necessarily distinct elements, that exist in no other multiset.

What is the probability of all $M$ multisets being 'distinguishable' according to this definition?


While sampling $S$ times with replacement from the set $P$, we can state the probability of never choosing the same element twice as:

Prob( $S$ unique selections from $P$ ) = $\prod \frac{(N - i)}{N}$ for $i = 0$ to $(S - 1)$

Or equivalently, we can calculate the probability that the multiset of $S$ sampled elements contains all unique elements as:

Prob( $S$ unique selections from $P$ ) = $\prod ((1-(\frac{1}{N - i}))^{(S - 1 - i)})$ for $i = 0$ to $(S - 1)$

  • 0
    There was a question yesterday that was formulated differently, about "pruning" of multisets, but AFAIR was effectively the same question. I can't find it anymore; it may have been deleted. Was that question by you, too?2011-04-08
  • 1
    @joriki: didn't you remember the number of the user? ;-)2011-04-08
  • 0
    @joriki, yes, alas, that was me... number 8861. I wanted to spend more time thinking about the question before posting it here, so I put up the 'pruning' example, then decided to take it down after about 20 minutes. Sorry if that was a faux pas.2011-04-08
  • 0
    @user8861: I wouldn't go so far as to call it a faux pas, but you could have mentioned it briefly in the question so that people who'd seen the other question wouldn't go looking for it to mark this as a duplicate.2011-04-08
  • 0
    @joriki, fair enough, and thanks for looking at the original version! The reason I didn't mention it was mostly because I didn't want to confuse things, and the site told me that ~5 people looked at the original. I'm actually a bit surprised anyone noticed this as a rephrasing.2011-04-08
  • 0
    So, how do you *fill* $m$ bags with $p$ marbles by sampling *with* replacement? :)2011-04-09
  • 0
    @cardinal, I meant that the box 'N' can be approximated as having an inexhaustible supply of 'N' unique marbles. When you pick from the box, each marble is selected with probability 1/N.2011-04-10
  • 0
    I know. I was just teasing. :)2011-04-10
  • 0
    In your last equation, the last term in the product is still 0, as noted by Douglas Zare. The correct one would be $\prod (1-\frac{i}{N})$ with no exponent.2011-04-13
  • 0
    @Ross, how is the last term '0'? If i = (S-1) then the last term is '1' since (1 - (1/(N - i)))^0 = 1...2011-04-13
  • 0
    @Ross, when I accidentally dropped the a set of parentheses in my first writeup, the last term was [1 - (1/(N - i))^0], which, of course, does equal zero.2011-04-13

1 Answers 1

2

This seems like an interesting problem even for $k=1$, which I would solve before attacking the more general version.

Your first calculation for the probability that a multiset has distinct elements is correct, although you are using $P$ rather than $N$ to mean $\#P$.

Your second calculation is unjustified and incorrect. You can compare it with the correct one. Suppose $N=10$ and $\#S=4$. The probability that $4$ draws are distinct is $\frac{10}{10} \times \frac{9}{10} \times \frac{8}{10} \times \frac {7}{10} = \frac{504}{1000} = 0.504$. Your second expression says $(1-\frac{1}{10}^3) \times (1-\frac{1}{9}^2) \times (1-\frac{1}{8}^1) \times (1-\frac{1}{7}^0)$. That last term is $0$, which makes the whole product $0$. If we leave it out, we get $\frac{259}{300} = 0.86333...$ which is quite different from the correct value.

It looks like you use a product like the second one in the rest of your calculations, so whatever error led you to it seems to have propagated, although I am rather suspicious of some of the other steps, too.

  • 0
    "you are using P rather than N to mean #P." - fixed! Thanks for the heads up.2011-04-10
  • 0
    My original expression was missing a set of parentheses, and should now be fixed...2011-04-13