2
$\begingroup$

A real problem from the hospital.

Suppose we have $n$ blood samples to test HIV. We don't want test them one by one, since the cost is huge and most people don't have a positive result. So we try to merge them into groups, and then we just use the machine to test groups. Assume we have $k$ positive in $n$ samples, but we don't know $k$. Does a algorithm of $O(k\log_2n)$ tests (times of using the machine) exist?

  • 0
    If this is a theoretical question fine. But if this procedure is really going to be used then there are clinical standards for blood tests which must be followed. The hospital violates the approved clinical standards at its on risk. To develop a new procedure and get it approved is a massive amount of work. Immediately the point comes to mind that there must be some lower limit of detectability. So if you dilute one positive blood sample with a million clean ones then it would seem unlikely that the whole pooled sample would test positive. So how many can you mix? That is why you need an appro2012-01-29
  • 0
    @MaxW: this does not directly answer the question, so it is better as a comment.2012-08-16

1 Answers 1

2

Hint: if $k=1$ a binary tree (split the samples in half, mix each half, test) works. If $k \ll n$, suppose you do a binary tree and quit testing any batch that shows negative?

  • 0
    Thanks, @Ross. k = 1 is obvious. And if k > 1, the binary tree trick might useful, but I cannot see how to get the result. The worst case is the fist k nodes are all positive? If we do it recursively, we cannot determine which step we have $T_{n/2}$ or $2T_{n/2}$2012-01-20
  • 0
    @user22997: IThe worst case is the $k$ samples are paired until they all separate. One easy case is if $k=2^p$. Then you have to test all the samples until stage $p+1$, when you have $k$ trees. But if $k \ll n$ that isn't so many, so you can ignore it. In fact, it is no more than $k$. Of course, if $k=n$, you wind up testing $2n-1$ samples instead of $n$.2012-01-20
  • 0
    Hi, @Ross: in the worst case, I got a complexity of 2k+2k($\log_2(n/k)$), it seems not O(k$\log_2n$).2012-01-20
  • 0
    @user22997: that is why you need $k \ll n$. If every sample is contaminated, you would have to test them all, doing $k=n$ tests. If you believe contaminated samples are rare, you will do the binary search and test twice that many. This is good if $k \lt \log_2 n$. If you have a feeling for the density of positive samples (and it is higher), you can improve this.2012-01-21
  • 0
    it's an interesting problem. i would like to know how you use binary tree trick if k << n.2012-01-26
  • 0
    @rose: split the samples in two groups, mix a bit from each within a group and test. If either test is negative, you have found n/2 clean samples. Otherwise, split each group in half and repeat. By $\log_2 k$ splits, you will have lots of clean groups. At each layer, you have at most $k$ samples to test, so the total of tests is less than $k \log_2 n$2012-01-26
  • 0
    @Ross: Thanks! So, you split n into two n/2 groups, then get a mix inside each group (for n/2 samples) and test it, and depending whether both are negative or not, you will continue with two or one recursive calls, right? And you finish when you have a group of 2 samples. At the very last level you will do at most k tests and you get klogn.But this is in case when k << n or if k=O(logn). If k is very close to n , we can do this in linear time. But what do you think about other cases? When k is between logn and n? Can we improve the linear algorithm?2012-01-26