|PREV PACKAGE NEXT PACKAGE||FRAMES NO FRAMES|
|a.yyInput||must be implemented by a scanner object to supply input to the parser.|
|b.yyInput||must be implemented by a scanner object to supply input to the parser.|
|many.yyInput||must be implemented by a scanner object to supply input to the parser.|
|a||direct error recovery.|
|b||indirect error recovery.|
|many||generated error recovery.|
|a.yyException||thrown for irrecoverable syntax errors and stack overflow.|
|b.yyException||thrown for irrecoverable syntax errors and stack overflow.|
|many.yyException||thrown for irrecoverable syntax errors and stack overflow.|
Step 5: Error Recovery.
An input error results when the current input does not match
Step 4 demonstrated that the parser will throw
yyException when it detects the
first input error thus terminating the parser.
This behavior is often undesirable. Therefore, there is a special reserved input symbol error which will match an input error (and which significantly changes the parsing algorithm as discussed below). As a first example, consider a.pj which defines a sequence of numbers and permits an input error within the sequence:
seq: // empty | seq Number | seq error;Some experimenting shows that this grammar will cope with any kind of input, because whenever there is an input error the grammar is already expecting it, thanks to the error symbol specified in the right place:
$ mkdir step5 $ java -jar pj.jar -jay -scanner:static - a.pj > step5/a.java $ javac step5/a.java $ java step5.a 1 - 2 - 3 4 5 - 6 (1) at "-": syntax error, expecting Number (1) at "-": syntax error, expecting Number
The input shown above contains three errors: the three minus signs. However, the parser only produces two error messages. Careful inspection of the trace shows why:
$ java step5.a 2>1 | > egrep 'left to recover|expecting|rule .3.' 1 - 2 - 3 4 5 - 6 shift from state 1 to 3 0 left to recover (1) at "-": syntax error, expecting Number reduce state 2 uncover 0 rule (3) seq : seq error shift from state 1 to 3 2 left to recover reduce state 2 uncover 0 rule (3) seq : seq error shift from state 1 to 3 2 left to recover shift from state 1 to 3 1 left to recover shift from state 1 to 3 0 left to recover (1) at "-": syntax error, expecting Number reduce state 2 uncover 0 rule (3) seq : seq errorThree input errors are duly found and matched to the grammar using rule 3. However, the parser maintains a counter yyErrorFlag which is set to three whenever an input error is detected and decremented by one back to zero whenever a real input is matched. A syntax error message is produced only if the counter is set from zero to three. This technique only reports errors that are separated by three or more valid input symbols and avoids annoying bursts of many similar messages.
error is useful if the exact position of an input error can be anticipated. However, as it stands one would have to add rules replacing just about every input symbol by error in order to make the parsing process reasonably robust. The resulting grammars would certainly be unreasonable and likely to be loaded with conflicts. Therefore, the parsing process is changed when an input error needs to be matched.
step 1 that
jay can produce
a file y.output describing the parser that it constructs.
y.output documents the parsing process: The parser starts
in state 0. Every input 'shifts' to a new state which is placed
on the state stack. y.output shows that conceptually a
marker is moved through the applicable rules. Once the marker
reaches the end of a rule, the rule is 'reduced', i.e., its states
are removed from the stack, and by 'goto' the parser reaches a
new state reflecting the nonterminal to which the rule is reduced.
Therefore, at any time the state stack contains all the marked
positions in all the applicable rules.
If an input error happens, the parser tries to match error not only in the top state on the stack, but also in all states, one after another, that are below the top state. As the parser searches for a match for error, the state stack is popped. Parsing can continue, with or without an error message, if a match is found before the stack is empty:
b.pj defines a sequence of numbers, each followed by a semicolon, and permits an input error within the sequence:
seq: // empty | seq item | seq error; item: Number ';';This grammar does not explicitly anticipate that a number is not followed by a semicolon. However, some experimenting shows that this grammar will still cope with any kind of input, because the error search will always eventually uncover rule 3 where the grammar expects an input error:
$ java -jar pj.jar -jay -scanner:static - b.pj > step5/b.java $ javac step5/b.java $ java step5.b 1; 2 + 3; 4 - 5; 6; * 7; (1) at "+": syntax error, expecting ';' (1) at "-": syntax error, expecting ';' (1) at "*": syntax error, expecting Number
The trace shows that for two of the three input errors the state stack has to be popped in order to match error:
(1) at "+": syntax error, expecting ';' pop state 3 on error (1) at "-": syntax error, expecting ';' pop state 3 on error (1) at "*": syntax error, expecting Number
The previous example suggests important principles for an efficient placement of error: If error is used close to the start symbol of the grammar it is likely that the parser will not terminate by throwing yyException. If error is used in iterations it is likely that the parser will recover from input errors within the iterated items.
Step 2 introduced the optional and iterative
|name ?||specifies that name appears at most once.|
|name *||specifies that name appears zero or more times.|
|name +||specifies that name appears one or more times.|
|name / item||specifies that name appears one or more times, separated by a (non-iterative) item such as a literal symbol.|
many.pj is a simple example that exercises each notation:
program: // zero or more | program many | program some | program opt | program list; many: "many" m*; // zero or more 'm' some: "some" s+; // one or more 's' opt: "opt" o?; // zero or one 'o' list: "list" l / ','; // one or more 'l', separated by commas m: "m"; s: "s"; o: "o"; l: "l";If the option -recover is specified
pjwill add error rules when the iterative constructs are expanded. Inspecting the
jayinput is a good way to see how typical iterations can be made robust:
$ java -jar pj.jar -recover many.pj
Zero or more items require one extra rule to be completely robust:
ms: // zero or more | ms m | ms error ;One or more items require two extra rules to catch an input error in either the first or a subsequent item:
ss: s // one or more | ss s | error | ss error ;A delimited list requires more rules:
ls: l // one or more, separated by ',' | ls ',' l | error // likely to introduce r/r conflicts // | ls error | ls error l | ls ',' error ;This shows the limits of automating error recovery: A trailing input error without a preceding delimeter is likely to cause reduce/reduce conflicts when iterative constructs are nested.
pjcannot provide exact control over the order of rules presented to
jay. It is not at all clear that these conflicts would be resolved in a useful fashion. Therefore,
pjdoes not provide totally robust recovery for delimeted lists.
The same argument, unfortunately, applies to optional items, too:
opt_o: // null or one | o // likely to introduce r/r conflicts // | error ;error is a reserved token that can be specified within rules where it has the usual effect, see . However, if a right hand side includes error explicitly its action will return null.
pjsupports a systematic way to use error within all iterations; therefore, explicit use of error should hardly be necessary.
If the property pj.recover is true
error tokens and manages yyErrorFlag so that
the iteration rules are very likely to recover from any input
error. Unfortunately, the technique tends to introduce shift/reduce
conflicts into the resulting grammar.
The conflicts should be investigated but they are likely to be harmless -- an input error is usually reduced in the innermost active iteration, i.e., most of the remaining parse process will still work as intended.
|PREV PACKAGE NEXT PACKAGE||FRAMES NO FRAMES|