0
$\begingroup$

I have an alphabet A = {a,b,c} and a pattern P = "abcaab".

The task is to build a finite automaton of the transition function (delta) for {0,6} (the length of the pattern) and each element of the alphabet A.

My incorrect solution:

enter image description here

New correct solution:

enter image description here

I've highlighted the field that I am having trouble with in yellow. States run along the top and my alphabet along the side.

Although I believe my solution is correct I don't understand how that field is calculated.

The rest seem obvious since if there isn't a match return to the initial state 0. If 'a' is the next input return to 1 unless in states 3 or 4. However whilst in the accepting state if the next input is 'b' I need to return to state 2.

I know this is because 'a' is guaranteed to before the 'b' otherwise state 6 would not have been reached. What I don't know however is how this is pre-calculated in the COMPUTE-TRANSITION-FUNCTION algorithm.

More details can be found on page 996-1002 of Introduction to Algorithms Third Edition.

I used the table above to match strings in the following text:

enter image description here

  • 1
    I skimmed your initial comment and missed the point. My apologies for the confusion.2011-12-11

2 Answers 2

1

It might be easiest to think of this as minimizing a NFA that recognizes .*abcaab. The NFA has one state for each position in the search string you could have reached yet, and the states in your DFA will consist of sets of the NFA states. By coincidence (because you're only searching for a fixed string) each of the DFA states can be identified by the highest NFA state it contains, and you'll then only need to keep track of which other states you could also have reached.

Thus, in your state 4 you have already seen something that ends with abca, but that really mean that what you have seen so far ends with abca and a both. So If you see an a in this situation you move to state 5 (abcaa), but if you see a b in state 4, you need to discard abca and work on the NFA state b. So a b in state 4 should lead to state 2 rather than to state 0. Otherwise you won't find the match in abcabcaaba.

Also, if you get to state 6 you have already seen all of abcaab. If you move to state 2 for a b in this situation you will have seen abcaabb with two bs at the end, which is not what state 2 expects. Is that a typo in your transition table? I notice that your example moves to state 3 rather than state 2 here.

Depending on exactly how your automaton formalism works, I suspect that you don't actually want to have a state 6 at all. Instead seeing a b in state 5 should accept and move to state 2 because once you have matched abcaab you still have an ab at the end that could be the start of yet another match.

Alternatively, you might want to move to state 6 and have state 6 be an accepting state with exactly the same transitions out of it as for state 2.

  • 0
    @HenningMakholm You are correct. I've updated the question so that others looking will know the correct solution.2011-12-12
1

Figures are a log easier to look at then a table when analyzing finite automata's. I have taken the liberty of constructing the automata given into something more recognizable:

automata

Where state 0 and 6 are start and accepting states, respectively and all unlabeled transitions go back to state 0.

My version of Cormen, Leiserson and Rivest is older than yours (the algorithm you reference is on page 868 in mine) but I believe the content is the same. Here is the algorithm they present:

$\text{Compute-Transition-Function}(P, \Sigma)$
1 $\ \ \ \ m \leftarrow length[P] $
2$\ \ \ \ {\bf \text{for } } q \leftarrow 0 {\bf \text{ to }} m $
3$\ \ \ \ \ \ \ \ {\bf \text{do for }} \text{each character } a \in \Sigma $
4$\ \ \ \ \ \ \ \ \ \ \ \ {\bf \text{do }} k \leftarrow \text{min}(m+1, q+2)$
5$\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ {\bf \text{repeat }} k \leftarrow k - 1 $
6$\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ {\bf \text{until }} P_k \sqsupset P_q a $
7$\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \delta(q,a) \leftarrow k $
8$\ \ \ \ {\bf \text{return }} \delta $

They want to construct a DFA of length one longer than the pattern, $P$. The state transition is stored in $\delta$. $\delta(q,a) \leftarrow k$ adds an edge with label $a$ from state $q$ to state $k$. They loop through each state and for each state loop through each character in the alphabet to determine where that state should go.

I believe the problem you are having is with the suffix function on line 6, where the suffix function is '$\sqsupset$'.

The suffix function is defined as follows:

$ w \sqsupset x \rightarrow \exists y \ \text{ s.t. } x = yw $

meaning, $w$ can be found as a suffix of $x$.

In words, the loop that line 6 is contained in is saying "Find the maximum suffix of the strings shorter than the current state $q$ that start at $P[0]$". Once that is found, we know we can 'skip ahead' and transition to that state instead of the beginning. There are some fence post issues and that's why they have the min function there, but the intuition still holds.

By considering all strings, $P_k$, that start at the beginning of the pattern shorter than the current position of the read pattern, $P_q a$, we test each $P_k$ to be a suffix of $P_q a$ and choose the maximum. This is just a more formal and generalized way of expressing your intuition.

  • 1
    @Ash, graphviz. I'm happy to provide source but the dot file is pretty basic2011-12-12