Department of Mathematics and Computer Science
University of Osnabrück, Germany
http://www.inf.uos.de/axel/talks/cb-tools/ [ pdf ] [ code ] [ code.zip ]
Object-oriented approaches can be used to advantage in numerous places in compiler construction. This talk gives an overview and then demonstrates some freely available tools:
a Java-based implementation of lex for lexical analysis
two implementations of yacc supporting bottom-up syntax analysis
a Java-based generator supporting top-down syntax analysis
an EBNF-based generator supporting top-town syntax analysis
The tools should come in handy even if you are not planning to build a compiler next week.
symbol table and symbols can be objects
processor is generated from specification, takes sentence to tree
tree can consist of objects --
messages for semantic analysis and code generation
image can consist of (persistent) objects --
messages for interpretation and/or code generation
JLex is a largely compatible implementation of lex / flex in Java and targeted to Java.
JLex accepts a table of regular expressions and actions -- expressed in Java -- and produces a scanner class with a method yylex() that reads the longest input to match one of the regular expressions and executes the corresponding action.
Many, at times cryptic options preceding the table configure the scanner class, the return type of yylex() , etc. There are some problems in dealing with full Unicode.
JLex can be used to write filters and to create scanners for compilers.
CUP is a somewhat compatible implementation of yacc/bison in Java and targeted to Java.
CUP accepts an LALR(1) grammar -- expressed in BNF -- and actions -- expressed in Java -- and produces a parser class with a method parse() that matches a sentence over the grammar and executes the actions as the phrases are reduced.
Many, at times cryptic and ordered options preceding the grammar configure the parser class, etc. The action syntax is quite baroque. Error recovery is not implemented correctly.
CUP can be used to write parsers for compilers.
|
library for building Number trees for arithmetic expressions |
|
$ java -classpath java_cup.jar java_cup.Main \
-parser Expression -interface < Expression.cup
$ java -classpath JLex.jar JLex.Main Scanner.lex
$ mv Scanner.lex.java Scanner.java
$ javac -classpath .:ck.jar:java_cup.jar Scanner.java Expression.java
$ java -classpath .:ck.jar:java_cup.jar Expression -c |
byte short int long float double
jay is Berkeley yacc implemented in C and retargeted to Java.
jay accepts an LALR(1) grammar -- expressed in BNF -- and actions -- expressed in Java -- and produces a parser class with a method yyparse() that matches a sentence over the grammar and executes the actions as the phrases are reduced.
jay can be used just like CUP but avoids numerous shortcomings. jay acts as a filter and many aspects of the Java targeting can be modified. There is a debug interface which is implemented to provide animation and tracing without changes to the source.
$ jay < skeleton Expression.jay > Expression.java
$ java -classpath JLex.jar JLex.Main Scanner.lex
$ mv Scanner.lex.java Scanner.java
$ javac -classpath .:ck.jar javac Scanner.java Expression.java
$ java -classpath .:ck.jar Expression -c | java -classpath ck.jar ck.Go
byte short int long float double
JavaCC is a lexical analyzer and parser generator implemented in Java and targeted to Java. It was written at Sun and is distributed by Metamata.
JavaCC accepts an LL(n) grammar and actions -- expressed in a mixture of EBNF and Java -- and produces a scanner and a parser class with recursive descent parsing methods that match phrases over the grammar and execute the actions as the phrases are reduced.
JJDoc attempts to extract the EBNF grammar from the JavaCC source. This effort tends to be futile, because Java actions and error recovery can interoperate with the parsing process.
JJTree is a tree builder implemented in Java and targeted to JavaCC . It is distributed with JavaCC .
JJTree accepts JavaCC input annotated for tree building and optionally produces the sources for problem-specific tree node classes and a Visitor interface.
Many options preceding the grammar configure the system. The syntax is incredibly baroque as it extends Java by EBNF and tree annotations with very context-dependent environments for actions, etc. Only very coarse error recovery is implemented, based on exceptions.
JJTree can be used to translate character sequences into parse trees.
|
library for building Number trees for arithmetic expressions |
|
$ java -classpath JavaCC.zip COM.sun.labs.jjtree.Main Expression.jjt
$ sed '/^public/s/{.*/{ Number val;/' Lit.java > xx && mv xx Lit.java
$ java -classpath JavaCC.zip COM.sun.labs.javacc.Main Expression.jj
$ java -classpath .:ck.jar Expression -c |
byte short int long float double
-128 128 128 70368744177792 7.0368744E13 7.0368744177792E13
oops is a parser generator implemented in Java and targeted to Java.
oops accepts an LL(1) grammar -- expressed in EBNF -- and actions -- expressed as Java classes implementing a Goal or Reduce interface -- and produces presistent objects with recursive descent parsing methods that match a sentence over the grammar and message action objects as symbols are accepted and phrases reduced.
Rule receives parse() and creates object implementing Goal from non-terminal's class. Goal receives shift() with input symbol or completed Rule . Finally, Rule sends reduce() and Goal 's result is shifted for Rule to superior Goal .
User must provide
Goal
classes for non-terminals. Retargeting is possible by changing packages.
GoalAdapter
and
GoalDebugger
are trivial default implementations;
Reduce
is an alternative interface providing only an array of shifted values.
|
oops/lines.java |
|
|
library for building Number trees for arithmetic expressions |
|
$ java -classpath JLex.jar JLex.Main Scanner.lex
$ mv Scanner.lex.java Scanner.java
$ javac -classpath ..:oops.jar:ck.jar Scanner.java \
lines.java sum.java product.java term.java
$ java -classpath ..:oops.jar:ck.jar oops.boot.Oops oops.ebnf > oops.ser
$ java -classpath ..:oops.jar:ck.jar -Dlines.gen=true \
oops.Compile oops.ser 'oops.Scanner$Input' |
byte short int long float double
-128 128 128 70368744177792 7.0368744E13 7.0368744177792E13
$ java -classpath ..:oops.jar:ck.jar \
oops.Compile -d oops.ser 'oops.Scanner$Input'
lines : [{ ( [ sum ] "\n" ) }] .
lookahead {[empty], "-", "+", "(", "\n", Number}
sum : ( product [{ ( sum.add | sum.sub ) }] ) .
lookahead {"-", "+", "(", Number}
product : ( term [{ ( product.mul | product.div | product.mod ) }] ) .
lookahead {"-", "+", "(", Number}
term : ( Number | ( "+" term ) | term.minus | ( "(" sum ")" ) ) .
Two newer, commercially supported tools should be considered: Metamata's Parse and Terence Pratt's ANTLR .