1
$\begingroup$

I'm working on a genetic algorithm and would like to map each function to a set of "codons". So + -> 011. Given this, I would like to figure out how easy it would be for any given codon set to mutate into another codon set.

If you were to take two combinations, say 012 and 111, they would have a distance of 2 since because it would take 2 mutations to move from one to the other. Eg. 012 - > 011 - > 111.

So, here's my question : given an (n, k) combination set, what is the average distance between any two combinations?

I think you might be able to model this as a small world network, but if there is a better approach I'm all ears.

  • 0
    Can you please add an explanation of what is an "$(n,k)$ combination set"?2012-07-27
  • 0
    Also, would you say that `002` and `000` have distance 1, or distance 2?2012-07-27
  • 0
    What I meant by (n, k) is a set with n possible values and k elements. So, a set using one 0 and 1, and with 3 elements / spots (Eg 000,100,101,etc) would be considered (2,3) combination. A (3,2) combination, on the other hand would be something like 12,00,22,02,etc.2012-07-27
  • 0
    002 and 000 would have a distance of 1 since you would only need to change one value to convert one to the other.2012-07-27
  • 1
    @caffein To clarify, does an $(n,k)$ combination set always contain exactly $n^k$ different strings? Also, it sounds like you are computing distance purely with mutation and not transpositions, etc? In this case you can use the term "Hamming distance".2012-07-27
  • 0
    @ErickWong For and $(n,k)$ set the number of combinations should be $(n+k-1, k) => (n+k-1)!/(k!((n+k-1)-k))!) = (n+k-1)!/(k!(n-1)!)$ However there WILL be $n^k$ number of strings, but because order doesn't matter, it would contain the above number of unique combinations. for instance a $(2/2)$ combination set would have $2^2 = 4$ unique strings, but only $(2+2-1)!/(2!(2-1)!) = 3!/(2!(1!)) = 6/2 = 3$ number of unique combinations. These are 00, (01/10) and 11.2012-07-28
  • 0
    Sorry if the last looks confusing, I'm new to latex, but will try to make it clearer. $$(n,k) => (n+k-1,k) => \frac{(n+k-1)!}{k!((n+k-1)-k)!} = \frac{(n+k-1)!}{k!(n-1)!}$$ $$\frac{(2+2-1)!}{2!(2-1)!} = \frac{3!}{2!(1!)} = \frac{6}{2} = 3$$2012-07-28

1 Answers 1

1

You seem to be talking about $k$-tuples with $n$ possible values. If two of those are randomly chosen, the probability that they differ in the $j$'th position is $1 - 1/n$, and the expected Hamming distance between them is $k(1-1/n)$.

  • 0
    That's not quite right for what I'm looking for. For instance, for an n = 2, k = 2 combination set, the above equation would give 2(1 - 1/2) = 1. However, the way I figure it, it should be more 4/3. Here's how: $00 <-> (01/10) <-> 11$ . Which gives distances of (1,2),(1,1) and (1,2) and averages of 3/2, 2/2, and 3/2. So, the overall average distance would be (3/2 + 2/2 + 3/2) / 3 = ( 8 / 2 ) / 3 = 4/3 . Does that seem right? Perhaps I'm missing something here, or maybe there's a way to alter the above equation to work with what I'm looking for.2012-07-28
  • 0
    The average distance in your example is $1$ if you take into account the possibility that the distance could be $0$. If you're asking for the average distance between two distinct $k$-tuples, you should multiply my answer by $n^k/(n^k-1)$.2012-07-29
  • 0
    Excellent! I think that I wasn't taking into account that some tuples can be represented in more ways than others. This works perfectly though. Thanks a bunch.2012-07-30