Fast String Correction with Levenshtein-Automata
This paper uses the terms descendent and stroke, and I can't find definitions for them by googling. Could you recommend a book, or page.
Grazie.
Fast String Correction with Levenshtein-Automata
This paper uses the terms descendent and stroke, and I can't find definitions for them by googling. Could you recommend a book, or page.
Grazie.
I don’t know whether this terminology is standard, so I can’t give you a reference, but I can explain how it’s used in the paper. I think that the easiest way to explain is by an example.
Consider the words $u=$ abcde
and $v=$ aebbfg
; the Levenshtein distance between them is $5$, as illustrated by this sequence of edit operations:
abcde abde delete c abbe substitute b for d aebbe insert e aebbf substitute f for e aebbfg insert g
(Other sequences of length $5$ are possible.) The letters of $v$ are of three types.
Those that were in $u$ and were never changed: these are the a
and the first b
. These letters are the descendants of the a
and b
in $u$, respectively. (In the paper these letters are the ones ‘not touched by any edit operation’ in the discussion immediately following Lemma 2.0.2.)
Those that are the result of substituting a letter for a letter in $u$: these are the second b
and the f
. These letters are the descendants of the d
and e
in $u$, respectively.
Those that were inserted: these are the e
and the g
. These are not descendants of any letter of $u$.
Note also that the c
of $u$ has no descendant in $v$, because it was deleted.
Now construct a bipartite graph $G(u,v)$. Its vertices are the letters of $u$ as one of the parts and the letters of $v$ as the other part. Connect a letter $x$ in $u$ to a letter $y$ in $v$ if and only if $y$ is the descendant of $x$. $G(u,v)$ is the paper’s trace representation, and a stroke in the paper’s terminology is simply one of these edges. In my example the picture looks like this:
a e b b f g | / | | a b c d e
(I put $u$ on the bottom and $v$ on the top in order to follow the convention used in Figure 2.1 of the paper.) $G(u,v)$ has four edges; in the paper’s terminology, this trace representation has four strokes. As noted in the discussion immediately following Lemma 2.0.2, no strokes cross, each letter of $u$ that has no attached edge was deleted, and each letter of $v$ that has no attached edge was inserted.