1
$\begingroup$

I'm onto a problem about binary sequence similarity for which I have not found any existing solution. I want to share it and the approaches I have taken, although none of them convince me.

Consider a partition of a range from 1 to N, with K subsets represented as binary sequences: a value of 1 at index i in sequence j means that subset j contains the range's value i. For example, for N=7 and K=3, a possible partition could be k1=1001100, k2=0110000, k3=0000011.

Here, an appropriate distance function should regard as similar those pairs of sequences that contain 1's at close indexes.

Cases with (equally) similar pairs of sequences ("..." means many 0's)

-Case A

*1010000101000...0000101

*0101000010100...0001010

-Case B (both more sparse)

*100010001001000...0000101

*010001000100100...0001010

-Case C (less regions)

*1010000101000...0000000

*0101000010100...0000000

-Case D (different sparseness)

*1101100110110...0011011

*0010000001000...0000100

Case with not so similar sequences, but still quite similar:

-Case M

*1010000101000...1110000

*0101000010100...0001110

Case with rather different sequences:

-Case S

*1010000101000...1110...0000

*0101000010100...0000...0111

I have tried:

*Measuring the distance between each 1 in a sequence to the closest 1 in the other sequence, and summing them. Has the problem that sequences may have different amount of 1's. I tried correcting directly for the amount of 1's and more, but found no solution.

*For two sequences X and Y, using the variation in amount of information between X and xor(X,Y).

*The edit distance.

*Local alignment, where a pair of 1's in the same final position has a positive score.

*How probable it is that a model that generates sequence X also generates sequence Y, but I don't know how to model this.

I hope to hear any comments you may have.

Cheers,

David

  • 0
    I don't understand how you're using the term "partition". Usually a partition of a set contains subsets of that set; in your case it appears to contain elements of the set; I think you should add a definition to clarify that.2012-08-23
  • 0
    I mistakenly wrote partitions instead of subsets. Additionally, I added an example, although the key of the problem is better seen in the subsequent examples. Thanks for the comment.2012-08-23
  • 0
    If I understand correctly what you mean, then you forgot to change one of the occurrences of "partition" to "subset".2012-08-23
  • 0
    You are right. It is corrected now.2012-08-24

2 Answers 2