1
$\begingroup$

I have done a coding exercise where the problem was to compute the maximal length of a common substring given two strings. Consider strings as finite sequences with elements in the English alphabet and substrings as subsequences in the usual way. For two strings $a=a_1\cdots a_m$ and $b=b_1\cdots b_n$ it seems that $m + n = l + s$ holds, here $l$ is the length of a longest common substring and $s$ is the length of the shortest string having both $a$ and $b$ as substrings. I can prove $s \leq m + n - l$ but can't work out the reverse inequlity to get the equality. Feels like I am missing some really simple counting argument ?

Edit: I should not have written "as usual". For the string $x = x_1\cdots x_k$ a substring is any $x_{i_1}\cdots x_{i_l}$ with $i_1 < \ldots and the empty string.

  • 0
    @Henning: I agree, and I’d not use it that way myself, but it’s clearly what was intended.2011-12-22

1 Answers 1

1

Your conjecture is not true. Consider for example $a=$xyx and $b=$zyz. Then the longest common substring is y, and we have $m=3$, $n=3$, $l=1$, $s=6$, since the shortest strings that contain both xyx and zyz as substrings are xyxzyz and zyzxyx.

On the other hand, if (as suggested by @sibilant) substrings can consists of non-consecutive characters, such that ac counts as a "substring" of abc, then you can prove $l \ge m+n-s$ by considering a string $k$ of length $s$ having both $a$ and $b$ as substrings. Fix particular substring embeddings of $a$ and $b$ in $k$. Each letter in $k$ is part of either $a$ or $b$ (because a letter that is in neither can be omitted, contradicting the minimality of $s$). The letters of $k$ that are used in both $a$ and $b$ are seen to form a common substring of $a$ and $b$; the number of such letters is $a+b-s$.

  • 0
    The minimal string for which zyz and xyx and subsequence is not zyzxyx or xyxzyz but infact it is xzyzx. In this case the formula $m + n = l + s$ is true.2012-11-12