Language Processing
v2.0

Package vig

Visitor generator.

See:
          Description

Interface Summary
Parser.Tree.Visit what each tree class will do.
Parser.Tree.Visitor what a visitor must do: receive each tree class separately.
 

Class Summary
Coder visitor code generator.
Control visitor code generator: manage match structures.
Parser parser and main program for visitor generator.
Parser.Tree tree factory
Parser.Tree.Actions tree node class.
Parser.Tree.Braces tree node class.
Parser.Tree.Group tree node class.
Parser.Tree.Gt tree node class.
Parser.Tree.Index tree node class.
Parser.Tree.Lt tree node class.
Parser.Tree.Match tree node class.
Parser.Tree.Null tree node class.
Parser.Tree.Parens tree node class.
Parser.Tree.Ref tree node class.
Parser.Tree.Rhs tree node class.
Parser.Tree.Rule tree node class.
Parser.Tree.Rules tree node class.
Parser.Tree.Type tree node class.
Parser.Tree.Var tree node class.
yyLex scanner for visitor generator input.
 

Package vig Description

Visitor generator.

This is the homepage of vig, a Java preprocessor implementing a visitor design pattern with inheritance-based pattern matching for List based trees.

Overview

vig was designed to simplify the implementation of visitors in Java and to move from an interface- to a rule-oriented architecture. vig reads tables of patterns to be recognized in a tree and actions to be executed at the appropriate points in the tree. The actions can, among other things, decide whether and how to visit subtrees.

vig's ultimate ancestor is Wilcox' template coder which was developed around 1970 as part of Cornells implementation of PL/I. The key difference is that vig only uses classes to specify patterns.

vig is implemented with oops3 as a preprocessor emitting visitors as self-contained Java code. Earlier implementations were interpreters, but compilation has the advantage that powerful actions can be written in Java while the selection mechanism still remains very transparent. Effectively, preprocessing extends Java with a control structure for tree processing, but is substantially simpler to implement then a language extension inside a Java compiler.

vig is a relatively simple preprocessor that facilitates the coding of a visitor with a pattern-based table search. vig performs a few semantic checks, e.g., it only permits one table entry per class and does not permit references to subtrees that were not mentioned in a pattern. The scanner is a bit tricky: states are introduced to use different regular expressions to analyze Java code between rule tables, rules, and Java code in actions.

vig is still an experimental tool. It is a successor to jag which has been used to present the principles of reasonably efficient code generation for typical machine architectures in a compact fashion, for type checking, and tree transformation and interpretation.

Usage

vig visitors operate on trees. Leaves may belong to arbitrary classes but all other nodes must implement the List interface. As an example, here is an annotated grammar expr.rfc which oops3 compiles into a frontend which converts an arithmetic expression into a serialized tree suitable for processing with visitors generated with vig:

package eg; public class Parser { %% # skip patterns <void> Comment = '#.*'; <void> WhiteSpace = '[ \r\t\n]+'; # token patterns <Integer> Number = '[0-9]+'; # phrases <> sum: product (add | subtract)*; <Add:Commutative> add: '+' product; <Subtract> subtract: '-' product; <> product: term (multiply | divide)*; <Multiply:Commutative> multiply: '*' term; <Divide> divide: '/' term; term: Number | '(' sum ')'; %% public interface Commutative { } // label commutative operations. }

postfix.vig can be compiled into a visitor which presents the values of leaf nodes and the class names of branch nodes of any List-based tree in postorder:

package eg; import java.io.ObjectInputStream; import java.util.List; public class Postfix { public void postfix (Object node) { %% List: ... { for (Object sub : $0) postfix(sub); // visit subtrees System.out.print(" "+$0.getClass().getSimpleName()); // visit branch node }; Object: { System.out.print(" "+$0); }; // visit leaf node %% } /** tries to dump tree presented as standard input. */ public static void main (String... args) throws Exception { new Postfix().postfix(new ObjectInputStream(System.in).readObject()); System.out.println(); } }

The source file defines a class Postfix with a main program and a visitor method postfix which must have a parameter named node -- the object to be visited. The body of the visitor is a table of rules, enclosed in %%, which vig translates into a switch and a data structure for the selection mechanism. Here, the first rule applies to classes implementing List and objects with an arbitrary number of descendants. The second rule applies to objects of any class descending from Object.

Rules usually include actions, Java code to be executed when an object is visited. vig will replace items involving $ with references to the visited object or descendants -- this is why the visited object must be named node. Here, $0 is used to refer to the visited object, i.e., node; unless instructed otherwise vig supplies appropriate casts.

The example can be compiled and executed as follows:

$ mkdir eg $ java -jar oops3.jar -generate expr.rfc > eg/Parser.java $ javac -cp .:oops3.jar eg/Parser.java $ java -cp vig.jar:oops3.jar < postfix.vig > eg/Postfix.java $ javac eg/Postfix.java $ java -cp .:oops3.jar eg.Parser | > java eg.Postfix 1 + 2 * 3 / ( 4 - 5 ) 1 2 3 Multiply 4 5 Subtract Divide Add

A source file can contain several visitor methods. Stack is another visitor for the same kind of tree which generates pseudo code for a stack machine:

package eg; import java.io.ObjectInputStream; import java.util.List; import eg.Parser.Tree.*; public class Stack { public void visit (Object node) { // postorder traversal of binary tree %% List: Object Object { visit($1); visit($2); code($0); }; Integer: { code($0); }; %% } public void code (Object node) { // stack machine operation %% Add: ... { System.out.println("add"); }; Subtract: ... { System.out.println("sub"); }; Multiply: ... { System.out.println("mul"); }; Divide: ... { System.out.println("div"); }; Number: { System.out.println("load "+$0); }; %% } public static void main (String... args) throws Exception { new Stack().visit(new ObjectInputStream(System.in).readObject()); } }

Stack has two visitor methods. visit arranges for a postorder traversal of the tree; the rules only apply to branch nodes with two descendants and Integer leaves, but the frontend guarantees that. Nodes are visited again using code and that method emits a machine instruction corresponding to the node class. For the input shown above:

load 1 load 2 load 3 mul load 4 load 5 sub div add

Actions are selected by matching node and descendant classes with inheritance acting as a wildcard mechanism. The example can be coded to require less stack space during execution by taking advantage of addition being commutative. Therefore, the frontend was instructed to generate the Add and Multiply classes as implementing a trivial interface Commutative. If the rule

Commutative: Number List { visit($2); visit($1); code($0); };

is added to visit above, the traversal order is changed for addition and multiplication operations with a leaf as a left operand and the generated code is more efficient:

load 2 load 3 mul load 4 load 5 sub div load 1 add

While Stack is certainly a trivial example, it illustrates how vig supports a rule-based approach to visitor design: a visitor can be implemented with a few general rules initially and refined with more rules covering special cases and optimizations later. A visitor method will throw a RuntimeException if it cannot find any rule for a particular node value.

Input Format

Input to vig is a Java source file which may include one or more rule tables, each surrounded by %%. A rule table is intended to form the body of a method which should have a parameter named node. White space is immaterial, comments follow Java conventions.

A rule table entry consists of a class name followed by a colon and a sequence of one or more right-hand sides separated by |; each table entry must be terminated with a semicolon. When an object, i.e., the value of the parameter named node, is visited, the class of node, implemented interfaces, and superclasses, each with increasing distance from the initial class, are looked up in the rule table. The order of rule table entries in the input does not influence the rule lookup. Once a rule with a matching class or interface is found for the visited node, the right-hand sides of the rule are considered in order.

A right-hand side consists of a pattern and an action; either, but not both, can be omitted. A pattern consists of a list of zero or more class or interface names, optionally followed by three periods. The visited node matches the pattern if one of the following conditions is met:

Once the first right-hand side with a matching pattern is found, its action is executed. If there is no action following a pattern the search for a matching table entry continues by considering another class related to the visited node, i.e., the present sequence of right-hand sides is abandoned and a more general class or interface with a new sequence of right-hand sides is looked up. If the rule search ends without executing an action a RuntimeException is thrown. Note that patterns are searched sequentially within a rule table entry, i.e., patterns with more specific classes must be placed near the beginning of the rule table entry for a node class.

An action is Java code enclosed in braces. Within an action references to the visited node or its descendants (if any) can be specified as follows:

$0 references the visited node, cast to the class or interface name of the rule table entry.
$n references the nth descendant of the node, where n is a positive number; the reference is cast to the corresponding class or interface name in the pattern if available.
$id
$(expr)
references the descendant of the node selected by the value of id or expr (which should be non-negative).
$<type>0
$<type>n
$<type>id
$<type>(expr)
this notation specifies an explicit cast to type. Any cast is suppressed if type is omitted.

Code Architecture

vig copies the Java code between the rules tables. Each rule table is converted into a switch which contains the actions with each action's source line number as a case label -- two actions should not appear on the same source line. The Java code from the actions is copied as well, with references replaced as described above.

Following all Java code from the source, vig generates a non-public class with a unique name based on a UUID. This class contains the information from the rule patterns, represented with two inner classes. It also contains a static method which is called from each switch to perform the rule search described above.

This architecture results in self-contained visitors, but each file generated by vig does contain a physical copy of the search algorithm. vig uses templates from a Properties file to perform code generation; the property vig.properties can be defined to change the stem which vig uses to load this file. This makes it relatively easy to change the search algorithm and access to it.

The search algorithm will output a trace as diagnostic output if the property vig.verbose is set to true when a visitor is executed. In the previous, optimized example:

select Add Serializable Commutative@15: 15 select Divide Serializable AbstractList List@12: 12 select Multiply Serializable Commutative@15 AbstractList List@12: 12 select Integer@13: 13 select Integer Comparable Number@27: 27 load 2 ...

This shows each search, the class and interface names which are tried, the source line of the rule which is selected, and the source line of the action which is returned.

Projects

Downloads

Version:
2.0.0, January 2008.


(c) 2008 Axel T. Schreiner