Fast String Correction with Levenshtein-Automata
This paper uses the terms descendent and stroke, and I can't find definitions for them by googling. Could you recommend a book, or page.
Grazie.
Fast String Correction with Levenshtein-Automata
This paper uses the terms descendent and stroke, and I can't find definitions for them by googling. Could you recommend a book, or page.
Grazie.
I don’t know whether this terminology is standard, so I can’t give you a reference, but I can explain how it’s used in the paper. I think that the easiest way to explain is by an example.
Consider the words $u=$ abcde and $v=$ aebbfg; the Levenshtein distance between them is $5$, as illustrated by this sequence of edit operations:
abcde
abde delete c
abbe substitute b for d
aebbe insert e
aebbf substitute f for e
aebbfg insert g
(Other sequences of length $5$ are possible.) The letters of $v$ are of three types.
Those that were in $u$ and were never changed: these are the a and the first b. These letters are the descendants of the a and b in $u$, respectively. (In the paper these letters are the ones ‘not touched by any edit operation’ in the discussion immediately following Lemma 2.0.2.)
Those that are the result of substituting a letter for a letter in $u$: these are the second b and the f. These letters are the descendants of the d and e in $u$, respectively.
Those that were inserted: these are the e and the g. These are not descendants of any letter of $u$.
Note also that the c of $u$ has no descendant in $v$, because it was deleted.
Now construct a bipartite graph $G(u,v)$. Its vertices are the letters of $u$ as one of the parts and the letters of $v$ as the other part. Connect a letter $x$ in $u$ to a letter $y$ in $v$ if and only if $y$ is the descendant of $x$. $G(u,v)$ is the paper’s trace representation, and a stroke in the paper’s terminology is simply one of these edges. In my example the picture looks like this:
a e b b f g
| / | |
a b c d e
(I put $u$ on the bottom and $v$ on the top in order to follow the convention used in Figure 2.1 of the paper.) $G(u,v)$ has four edges; in the paper’s terminology, this trace representation has four strokes. As noted in the discussion immediately following Lemma 2.0.2, no strokes cross, each letter of $u$ that has no attached edge was deleted, and each letter of $v$ that has no attached edge was inserted.