0
$\begingroup$

A DNA molecule can be represented as a string of symbols $A, C, G$ and $T$, such as $GGATAATTCTAG\ldots GACCGTACCC$.

For the purposes of this question, we will assume that all DNA molecules contain the same (large!) number $N$ of symbols. Thus, a DNA molecule is an $N$-tuple $x = (x_1,\ldots , x_N)$ where $x_i\in\{A,C,G,T\}$ for each $i$.

Define the distance between two DNA molecules $x, y$ as the number of elements $i$ in ${1, \ldots ,N}$ such that $x_i\neq y_i$.

Prove that this defines a metric on the set of DNA molecules.

Tutor defined a set $S(x,y)=\{1,3,6\}$. Why is this? The rest of the working after that is fine.

  • 0
    See [Hammi$n$g dista$n$ce](http://e$n$.wikipedia.org/wiki/Hamming_distance).2011-10-28

1 Answers 1

6

Hint: for a problem like this, where you are asked to show that some object (this measure of distance) belongs to some class (metrics) you are asked to verify the definition of the class. I found them very useful to check my understanding of a definition. So just go down the requirements for a metric and see if it works. Clearly the distance between two strings is $\ge 0$, is only $0$ when the strings are identical, and is symmetric. The only one that takes a bit of work is the triangle inequality.

  • 0
    Presumably that was the list of positions where they disagree. Then the distance would be the size of $S$, which is $3$. Does that work?2011-11-04