I need some help with a problem that I think involves combinatorics and integration.

Given: a number k of labels, a number n of objects, and a number c of labeling cycles. Each of the n objects receives one of the k labels by random assignment with replacement (pardon my R-like language). This is repeated c times (the number of labeling cycles). At the end, each object has a combination of c labels out of the k available labels. In the process, some of the objects end up with identical label combinations. Let's call these events barcode collisions.
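
To make the setup concrete, here is a minimal simulation sketch of the labeling process described above. It assumes that a barcode is the ordered sequence of labels an object receives over the c cycles and that each label is drawn uniformly and independently; the function names `simulate_barcodes` and `collision_fraction` are made up for illustration.

```python
import random
from collections import Counter


def simulate_barcodes(k, n, c, seed=None):
    """Give each of n objects one of k labels per cycle, over c cycles.

    Assumption: labels are drawn uniformly and independently, and a barcode
    is the ordered tuple of the c labels an object received.
    """
    rng = random.Random(seed)
    return [tuple(rng.randrange(k) for _ in range(c)) for _ in range(n)]


def collision_fraction(barcodes):
    """Fraction of objects whose barcode is shared with at least one other object."""
    counts = Counter(barcodes)
    collided = sum(m for m in counts.values() if m > 1)
    return collided / len(barcodes)
```

Calling `collision_fraction(simulate_barcodes(k, n, c))` for the values of interest gives an empirical collision rate that any derived formula can be checked against.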

My question is: how can I derive a formula for the barcode collision rate given k labels, n objects and c labeling cycles?

N.B. This applies to a real problem involving labeling biological cells for RNA sequencing. The known variables in this technique are the number of labels, the number of labeling cycles and the number of cells. HOWEVER, here the number of cells is not the same as n in the problem description. This number of cells includes barcode collisions, so the real number of cells that were analysed is actually n + (collision rate)/2.

1 Answer

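After all cycles, an object has one among $\binom{k}{c}$ label configurations, provided its c labels are all distinct and their order is ignored. (If the order of the cycles matters and a label can repeat, as random assignment with replacement implies, the count is $k^c$ instead.)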

There are $\binom{k}{c}^2$ ways to configure 2 objects. Among these there are $\binom{k}{c}$ cases where both objects have the same labels. So the probability that 2 given objects constitute a barcode collision is $$\binom{k}{c}\Big/\binom{k}{c}^2=\frac{1}{\binom{k}{c}}$$
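
As a rough numerical sketch of how the pairwise result extends to n objects: if every object independently receives one of M equally likely configurations, two given objects collide with probability 1/M, and a given object shares its configuration with at least one of the other n - 1 objects with probability 1 - (1 - 1/M)^(n-1). The uniformity and independence are assumptions here, the function names are illustrative, and the values of k, n and c below are hypothetical examples rather than numbers from the question.

```python
from math import comb


def pair_collision_probability(num_configs):
    """Probability that two given objects have the same configuration,
    assuming all num_configs configurations are equally likely."""
    return 1 / num_configs


def expected_collision_fraction(num_configs, n):
    """Expected fraction of n objects sharing a configuration with at least
    one other object, assuming independent uniform assignment."""
    return 1 - (1 - 1 / num_configs) ** (n - 1)


if __name__ == "__main__":
    k, n, c = 96, 10_000, 3  # hypothetical example values
    counts = {
        "unordered, distinct labels: C(k, c)": comb(k, c),
        "ordered, labels may repeat: k**c": k ** c,
    }
    for description, m in counts.items():
        print(description, m,
              pair_collision_probability(m),
              expected_collision_fraction(m, n))
```

Which count M is appropriate depends on whether the order of the cycles matters and whether a label can repeat, so both are printed side by side.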