Package step6

Step 6: Building Trees.

See:
          Description

Interface Summary
a.yyActions interface between grammar and actions.
a.yyInput must be implemented by a scanner object to supply input to the parser.
b.yyActions interface between grammar and actions.
b.yyInput must be implemented by a scanner object to supply input to the parser.
expr.yyActions interface between grammar and actions.
expr.yyInput must be implemented by a scanner object to supply input to the parser.
nest.yyActions interface between grammar and actions.
nest.yyInput must be implemented by a scanner object to supply input to the parser.
typed.yyActions interface between grammar and actions.
typed.yyInput must be implemented by a scanner object to supply input to the parser.
 

Class Summary
a collecting a list of numbers.
a.yyLex  
a.yyTree tree factory.
b collecting a list of numbers.
b.yyLex  
b.yyTree tree factory.
expr collecting an arithmetic expression as a nested list which can be displayed as an indented tree.
expr.yyLex  
expr.yyTree tree factory.
expr.yyTree.Add  
expr.yyTree.Div  
expr.yyTree.Lit  
expr.yyTree.Minus  
expr.yyTree.Mod  
expr.yyTree.Mul  
expr.yyTree.Sub  
nest collecting an arithmetic expression as a nested list.
nest.yyLex  
nest.yyTree tree factory.
Node base class for tree nodes, instrumented to display an indented tree.
typed collecting an arithmetic expression as a nested list.
typed.yyLex  
typed.yyTree tree factory.
typed.yyTree.Add  
typed.yyTree.Div  
typed.yyTree.Lit  
typed.yyTree.Minus  
typed.yyTree.Mod  
typed.yyTree.Mul  
typed.yyTree.Sub  
 

Exception Summary
a.yyException thrown for irrecoverable syntax errors and stack overflow.
b.yyException thrown for irrecoverable syntax errors and stack overflow.
expr.yyException thrown for irrecoverable syntax errors and stack overflow.
nest.yyException thrown for irrecoverable syntax errors and stack overflow.
typed.yyException thrown for irrecoverable syntax errors and stack overflow.
 

Package step6 Description

Step 6: Building Trees.

Typing Tokens

A token directive can describe many possible input symbols. If a program is to be processed further the scanner must provide information about each symbol that is actually recognized. Therefore, if a classname is specified as part of a token directive the scanner uses the recognized string to construct an object of the class and sends it on to the parser.

For example, a.pj describes a comma-separated list of numbers. Once the token directive is typed

%token <Integer> Number {digits}
digits  = [0-9]+
the scanner will (in this case) represent the individual numbers as Integer objects. The only restriction is that the class must have a constructor with a String as the only parameter.
$ mkdir step6
$ java -jar pj.jar -jay -scanner:static - a.pj > step6/a.java
$ javac step6/a.java
$ java step6.a'$'yyLex
1, 2
scanner         token   value
(1) at "1"      Number  1
(1) at ","      ','
(1) at "2"      Number  2

Collecting Tokens

As soon as the input file is annotated with a classname pj adds actions to all grammar rules and jay arranges that an action is executed whenever a rule is reduced:

list: Number
    { $$ = $1; }
  | list ',' Number
    {{ java.util.ArrayList list = new java.util.ArrayList();
       list.add($1);
       list.add($3);
       $$ = list;
    }}
  ;
By default, the actions use ArrayList objects to collect the typed tokens and the objects returned from other nonterminals whenever the right-hand side of a rule contains more then just one such symbol.
$ java step6.a
1, 2
[1, 2]

As a result, a program is represented as nested lists eventually containing the objects created because of the typed token directives. The graphical trace of a few successive test cases clearly demonstrates that the nest of lists is constructed bottom-up with the intermediate stages stored on the stack maintained by the parser.

$ java step6.a
1
1
$ java step6.a
1, 2
[1, 2]
$ java step6.a
1, 2, 3
[[1, 2], 3]
Of course, left recursion leads to short, deeply nested lists. b.pj demonstrates that an iterative construct can be used to avoid this:
$ java -jar pj.jar -jay -scanner:static - b.pj > step6/b.java
$ javac step6/b.java
$ java step6.b
1
[1]
$ java step6.b
1, 2
[1, 2]
$ java step6.b
1, 2, 3
[1, 2, 3]

Typing Branch Nodes

Nested lists can be used to represent trees. nest.pj illustrates how the usual grammar for arithmetic expressions can be used to construct a nested list for an expression which contains the number operands:

$ java -jar pj.jar -jay -scanner:static - nest.pj > step6/nest.java
$ javac step6/nest.java
$ java step6.nest
- 10 + 20 * 30 / (40 % 7)
[10, [[20, 30], [40, 7]]]

The structure of the nest closely matches the structure of the expression represented as a tree; each individual list corresponds to an operator. Unfortunately, the nature of the operator is lost.

One way to capture the operators is to represent them by subclassing the lists that correspond to the operators. typed.pj contains rules annotated with what amounts to classnames:

<ArrayList> expr: expr '+' expr <Add>
              | expr '-' expr   <Sub>
              | expr '*' expr   <Mul>
              | expr '/' expr   <Div>
              | expr '%' expr   <Mod>
              | '+' expr
              | '-' expr        <Minus>
              | '(' expr ')'
              | Number          <Lit>
              ;
The classname preceding a rule specifies the common base class for all alternatives. pj uses the classname following an alternative to generate a subclass to collect the symbols for that particular alternative. Unless specified otherwise, single symbols are just passeed through; iterative constructs always result in ArrayList objects.

Unfortunately, the result does not immediately look like a big improvement:

$ java -jar pj.jar -jay -scanner:static - -tree:static - typed.pj > step6/typed.java
$ javac step6/typed.java
$ java step6.typed
- 10 + 20 * 30 / (40 % 7)
[[[10]], [[[20], [30]], [[40], [7]]]]
$ java step6.typed
+ 10 + 20 * 30 / (40 % 7)
[[10], [[[20], [30]], [[40], [7]]]]

The two examples above differ in the sign which is applied to 10. The output does indicate a difference in representation: - is represented by an additional list; in fact, a Minus object.

Node.java and expr.pj show how a common base class for all nodes in the tree can be instrumented to provide an indented listing of the tree:

$ java -jar pj.jar -jay -scanner:static - -tree:static - expr.pj > step6/expr.java
$ javac step6/Node.java step6/expr.java
$ java step6.expr
- 10 + 20 * 30 / (40 % 7)
Add
  Minus
    Lit 10
  Div
    Mul
      Lit 20
      Lit 30
    Mod
      Lit 40
      Lit 7

Factory Pattern

As soon as a classname is added to a token directive or a rule, pj creates an interface yyActions with methods which the parser calls whenever an annotated rule is reduced. For example, for expr.pj

%token <Integer> Number ...
%%
<Node> expr: expr '+' expr <Add>
         | ...
         | '(' expr ')'
         | Number          <Lit>
         ;
the interface is
public class expr {
    public interface yyActions {
      Node Add (Node expr, Node expr2);
      ...
      Node Lit (Integer Number);
    }

    public Node yyparse (yyInput yyLex, yyActions yyAct)
        throws java.io.IOException, yyException {
      ...

and the parser should be called with an implementation of the interface. yyActions can be implemented, e.g., to support immediate evaluation.

pj can be instructed to implement the interface as an inlined (optionally static) tree factory

$ java -jar pj.jar -jay -scanner:static - -tree:static - expr.pj > step6/expr.java

or to replace a line containing only %% with the tree factory:

$ cat Tree.template
package step6;
public class Tree implements expr.yyActions {
%%
}
$ java -jar pj.jar -tree Tree.template expr.pj
package step6;
public class Tree implements expr.yyActions {
  public Node Add (Node expr, Node expr2) { // factory method
    return new Add(expr, expr2);
  }

  protected static class Add extends Node { // class for branch node
    protected Add (Node expr, Node expr2) {
      add(expr);
      add(expr2);
    }
    public Node expr () { return (Node)get(0); }
    public Node expr2 () { return (Node)get(1); }
  } ...

A base class for each generated class is expected to have a trivial constructor and to support the methods void add(Object) and Object get(int) to add and inspect descendants, respectively; any class implementing List is suitable as a base class.

Either tree factory can be subclassed and one or all factory methods can be overridden to build other trees, e.g., from subclasses of the inner classes, etc. A tree can be interpreted, or a subclassed tree can be extended, for type checking or evaluation, etc. jag was designed specifically to generate visitors which operate on trees based on the List interface.