1
$\begingroup$

I have done a coding exercise where the problem was to compute the maximal length of a common substring given two strings. Consider strings as finite sequences with elements in the English alphabet and substrings as subsequences in the usual way. For two strings $a=a_1\cdots a_m$ and $b=b_1\cdots b_n$ it seems that $m + n = l + s$ holds, here $l$ is the length of a longest common substring and $s$ is the length of the shortest string having both $a$ and $b$ as substrings. I can prove $s \leq m + n - l$ but can't work out the reverse inequlity to get the equality. Feels like I am missing some really simple counting argument ?

Edit: I should not have written "as usual". For the string $x = x_1\cdots x_k$ a substring is any $x_{i_1}\cdots x_{i_l}$ with $i_1 < \ldots

  • 0
    I don't see why this should be true (unless a substring doesn't have to appear contiguously). For example, if $a$ = "AB" and $b$ = "CBD" then $m = n = 2$, the length of the longest common substring (that is, "B") is 1, but the length of the shortest string having both $a$ and $b$ as substrings is $5$ (that would be "ABCBD" or "CBDAB")....2011-12-22
  • 0
    @sibilant: It’s clear that they *don’t* have to appear contiguously: that’s the point of ‘[c]onsider ... substrings as subsequences in the usual way’.2011-12-22
  • 0
    @Brian, but it is confusing because that is _not_ the "usual way" to use the word "substring".2011-12-22
  • 0
    @Henning: I agree, and I’d not use it that way myself, but it’s clearly what was intended.2011-12-22

1 Answers 1

1

Your conjecture is not true. Consider for example $a=$xyx and $b=$zyz. Then the longest common substring is y, and we have $m=3$, $n=3$, $l=1$, $s=6$, since the shortest strings that contain both xyx and zyz as substrings are xyxzyz and zyzxyx.

On the other hand, if (as suggested by @sibilant) substrings can consists of non-consecutive characters, such that ac counts as a "substring" of abc, then you can prove $l \ge m+n-s$ by considering a string $k$ of length $s$ having both $a$ and $b$ as substrings. Fix particular substring embeddings of $a$ and $b$ in $k$. Each letter in $k$ is part of either $a$ or $b$ (because a letter that is in neither can be omitted, contradicting the minimality of $s$). The letters of $k$ that are used in both $a$ and $b$ are seen to form a common substring of $a$ and $b$; the number of such letters is $a+b-s$.

  • 0
    Thanks. How do you see that the letters in $k$ used in both $a$ and $b$ appear in the same order as they do in $a$ and $b$ making them form a common substring ?2011-12-22
  • 0
    @Andrew: That's part of what it means for $a$ and $b$ to be substrings of $k$, isn't it?2011-12-22
  • 0
    Yes it is. Right now I just don't see the steps clearly. If $k$ is of maxmal length with both $a$ and $b$ as subsequences we want $|k| \geq |a|+|b| - l\;$ where $l$ is the length of a longest common subsequence. Will try again tomorrow.2011-12-22
  • 0
    @Andrew: $k$ is of _minimal_ length, not maximal. And I don't actually show that the common subsequence is longest (it is, but that is not necessary for this direction of the proof).2011-12-22
  • 0
    The minimal string for which zyz and xyx and subsequence is not zyzxyx or xyxzyz but infact it is xzyzx. In this case the formula $m + n = l + s$ is true.2012-11-12