|
|||||||||
| PREV PACKAGE NEXT PACKAGE | FRAMES NO FRAMES | ||||||||
See:
Description
| Interface Summary | |
|---|---|
| expr.yyInput | must be implemented by a scanner object to supply input to the parser. |
| If.yyInput | must be implemented by a scanner object to supply input to the parser. |
| inc.yyInput | must be implemented by a scanner object to supply input to the parser. |
| Class Summary | |
|---|---|
| expr | recognition for lines with arithmetic expressions. |
| expr.yyLex | |
| If | lexical analysis for lines with open if statements and arithmetic expressions with identifiers. |
| If.yyLex | |
| inc | recognition for lines with arithmetic expressions. |
| inc.yyLex | |
| Exception Summary | |
|---|---|
| expr.yyException | thrown for irrecoverable syntax errors and stack overflow. |
| If.yyException | thrown for irrecoverable syntax errors and stack overflow. |
| inc.yyException | thrown for irrecoverable syntax errors and stack overflow. |
Step 4: Checking Recognition.
Step 2 presented a typical grammar for lines
with arithmetic expressions. Step 3 showed how
pj can be used to generate and test a scanner that assembles
input characters to represent symbols such as a Number.
A parser takes input symbols from a scanner and matches them to
a grammar. It is now time to generate a parser and check if
plausible lines with expressions actually conform to the grammar.
The good news is that all it takes is to add a scanner to a grammar to be able to check if the grammar and plausible programs fit together. The bad news is that there could be very little information if things go wrong.
Parsing is the process of matching input symbols to a grammar.
The grammar has a start symbol; for pj this is the first
nonterminal in the input. A syntax tree uses the start symbol as
the root, the branch nodes are nonterminal symbols, and the leaves
are terminal symbols. The tree is ordered: the descendants of
each branch node must form one right hand side of the grammar
rule for the node's nonterminal.
Considering the grammar expr.pj from step 1, the rule
term: '+' term | '-' term | '(' sum ')' | Number;
permits the following tree fragments:
The sequence of leaves of a syntax tree is called a sentence; it consists only of terminal symbols. A parser is faced with a potential sentence, i.e., a sequence of terminal symbols provided by a scanner. The parser has to compute (but not necessarily store) a syntax tree to prove that the sequence is in fact a sentence. For example, for expr.pj the input
1 - 2
results in the following syntax tree:
As an aside: Depending on the grammar, a syntax tree might not be unique. The grammar with the single rule
sum: Number | sum '+' sum;
can produce the following different syntax trees for the same sentence:
Such a grammar is called ambiguous. Step 2 showed
that a similarly left- and right-recursive grammar badexpr.pj has lots of
shift/reduce conflicts.
If jay does not report conflicts for a grammar, a syntax
tree computed for a sentence will be unique; therefore, jay
can prove that a grammar is not ambiguous. (The converse is not
true: jay is not happy with the grammar badbits.pj from step 1 but the grammar does produce unique syntax trees.)
expr.pj is
the grammar for lines with arithmetic expressions developed
previously. If pj generates a scanner with
JLex and a parser with jay it also includes a main
program to test the parser.
$ mkdir step4
$ java -jar pj.jar -jay -scanner:static - expr.pj > step4/expr.java
The parser remains silent if the input is acceptable:
$ javac step4/expr.java $ java step4.expr - 10 + 20 * 30 / (40 % 7)
Incorrect input produces a more violent reaction:
$ java step4.expr
- 10 ** 2
(1) at "*": syntax error, expecting '(' '+' '-' Number
Exception in thread "main" step4.expr$yyException: irrecoverable syntax error
at step4.expr.yyparse(expr.java:348) ...
The generated parser produces a single, reasonably meaningful
error message, but then it throws a yyException. Step 5 will deal with discovering more then one error.
If a parser is checked interactively, the input line causing the error is usually easy to find. However, real grammars and real programs are a bit bigger and it is helpful if the parser error message contains some information to locate the position of the error. The scanner needs to be asked, because the parser has no idea where the unexpected input symbol came from.
Step 3 discussed the interface between the scanner
and the parser. To keep the scanner as simple as possible there
is no explicit provision for a position indication. If the scanner
reads, e.g., from a graphical user interface a position indication
might not even be a string value. However, step 3
illustrated that both, a scanner generated by pj and a
scanner derived from StreamTokenizer, provide a
good textual position indication as the result of
toString().
Such a position indication should be made part of the parser
error message which is always produced by the method yyerror(String,
String[]) which jay generates as part of the parser.
Therefore, if pj generates a scanner, it also generates
a main program to test the parser which overrides the error message
method:
public static void main (String[] args) throws yyException, IOException {
final yyInput scanner = new yyLex(System.in);
expr parser = new expr() {
public void yyerror (String message, String[] expecting) {
super.yyerror(scanner+": "+message, expecting);
}
};
parser.yyparse(scanner);
}
While the example above demonstrated an error reaction by the parser, the actual error was caused by defective input, not by a defective grammar. Using expr.pj there simply is no way to build a syntax tree for
-10 ** 2
because a single * is an input symbol and there is no rule or combination of rules that permits two * adjacent to each other.
inc.pj adds identifiers and pre- and postfix increment operations to the arithmetic expressions:
expr: expr '+' expr | ... | Number | Id | "++" Id | Id "++" ;
$ java -jar pj.jar -jay -scanner:static - inc.pj > step4/inc.java
A classic C puzzle is the following:
$ javac step4/inc.java $ java step4.inc a +++++ b (1) at "++": syntax error, expecting '\n' '%' '*' '+' '-' '/'
The problem is the greedy scanner, as a trace of the scanner shows:
$ java step4.inc'$'yyLex scanner token value a +++++ b (1) at "a" Id (1) at "++" _0 (1) at "++" _0 (1) at "+" '+' (1) at "b" Id (1) at "\n" '\n'
Scanners usually try to assemble the longest possible input symbols; therefore, the five + are assembled into two consecutive postfix increment operations, an arrangement that is not supported by the grammar. The following does not provoke an error:
$ java step4.inc a +++ ++ bA similar puzzle is
$ java step4.inc a +++ bThe input matches the grammar, but it might be useful to know if ++ is applied to a or to b. Tracing the scanner provides a clue
$ java step4.inc'$'yyLex scanner token value a +++ b (1) at "a" Id (1) at "++" _0 (1) at "+" '+' (1) at "b" Id (1) at "\n" '\n'The increment operator is assembled before the addition operator; therefore, it will have to be matched to the grammar following a and not preceding b.
However, there is a more reliable way to check the exact behavior of a parser:
$ java -jar pj.jar -jay -scanner:static - -trace -verbose inc.pj > step4/inc.java
If the -trace option is set for pj, the parse
method yyparse() can be called with an additional
parameter, an observer yyDebug that will be informed about every
action that the parser takes:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Compilation and execution of the
parser now requires the archive yydebug.jar which
contains the package jay.yydebug defining the observer
interface yyDebug. The package also contains
a simple implementation yyDebugAdapter which
writes messages to diagnostic output and is included by pj if the -trace option is set:
$ javac -classpath .:yydebug.jar step4/inc.java $ java -cp .:yydebug.jar step4.inc push state 0 value null reduce state 0 uncover 0 rule (3) lines : goto from state 0 to 2 push state 2 value null a +++ b lex state 2 reading Id value null shift from state 2 to 3 0 left to recover push state 3 value null lex state 3 reading _0 value null shift from state 3 to 12 0 left to recover push state 12 value null reduce state 12 uncover 2 rule (16) expr : Id _0 ...
There is a lot of output which provides the most information if it is carefully analyzed together with the parser description in y.output. Often it is sufficient to just extract the lines beginning with lex and reduce which show which input symbol the parser reads and which right-hand side of a rule is used to compute the syntax tree:
$ java -cp .:yydebug.jar step4.inc | > egrep '^(lex|reduce)' reduce state 0 uncover 0 rule (3) lines : a +++ b lex state 2 reading Id value null lex state 3 reading _0 value null reduce state 12 uncover 2 rule (16) expr : Id _0 lex state 11 reading '+' value null lex state 18 reading Id value null lex state 3 reading '\n' value null reduce state 3 uncover 18 rule (14) expr : Id reduce state 24 uncover 2 rule (5) expr : expr '+' expr ...Clearly, the increment operator (which has been given the name _0 by
pj) is connected to the first identifier,
a, using right-hand side (16) and the result
is connected into a sum using right-hand side (5) much
later.
Tracing the parser also provides the definitive answer to the
dangling else question. If.pj was originally introduced in
step 1 and meanwhile combines arithmetic expressions with
if statements without a trailing fi. The grammar
has the usual shift/reduce conflict:
$ java -jar pj.jar -jay -scanner:static - -trace -verbose If.pj > step4/If.java
jay: 1 shift/reduce conflict.
Tracing demonstrates how an else is attached in a nest of if statements:
$ javac -classpath .:yydebug.jar step4/If.java $ java -cp .:yydebug.jar step4.If | > grep '^reduce.*stmt' if 1 then if 2 then 3; else 4; reduce ... rule (7) stmt : expr ';' reduce ... rule (7) stmt : expr ';' reduce ... rule (6) stmt : __if expr __then stmt __else stmt reduce ... rule (5) stmt : __if expr __then stmt reduce ... rule (9) opt_stmt : stmt reduce ... rule (2) line : opt_stmt '\n'
The parser always computes the syntax tree bottom-up, i.e., beginning with the leaves and ending with the root. The fact that right-hand side (6) is used before (5) documents that the else is attached to the innermost if statement.
yyAnim is another implementation of the
observer interface for the parser which provides animation. pj includes animation if a flag value is attached to -trace.
The value includes 1 if standard input and 2
if standard output should be simulated in a text area in the
animation window.
$ java -jar pj.jar -jay -scanner:static - -trace:3 If.pj > step4/expr.java
3 allows for interactive use of the parser animation:
$ javac -classpath .:yydebug.jar step4/expr.java
$ java -cp .:yydebug.jar step4.expr
In this case input can be typed into the text area at the bottom. A line can be edited using backspace and control-U; once newline or control-D is typed the line is made available to the scanner. Following Unix conventions, control-D at the beginning of a line acts as end of file.
The remaining areas show the operation of the parser. The comment area shows more or less the same information as the trace described earlier. The animation stops if something changes in an area where the checkbutton is set. A click on the continue button continues the animation.
Once a scanner is generated or hand-crafted, pj can
be used to test if plausible programs match a proposed grammar.
If there are error messages, scanner and parser can be traced to
see exactly how the grammar and the program fit together. It is
particularly helpful to compare the trace and y.output to
follow the parser operation.
If a grammar has conflicts, they should be carefully checked. It might be advisable to write little programs and trace their recognition by the parser to verify that conflict resolution works as intended.
|
|||||||||
| PREV PACKAGE NEXT PACKAGE | FRAMES NO FRAMES | ||||||||