Java, Objects, and Tools
in Compiler Construction

Axel-Tobias Schreiner

Department of Mathematics and Computer Science

University of Osnabrück, Germany

http://www.inf.uos.de/axel/talks/cb-tools/ [ pdf ] [ code ] [ code.zip ]

Object-oriented approaches can be used to advantage in numerous places in compiler construction. This talk gives an overview and then demonstrates some freely available tools:

a Java-based implementation of lex for lexical analysis

two implementations of yacc supporting bottom-up syntax analysis

a Java-based generator supporting top-down syntax analysis

an EBNF-based generator supporting top-town syntax analysis

The tools should come in handy even if you are not planning to build a compiler next week.

Contents

See Objects in compiler construction.

See Lexical analysis with patterns: JLex.

See Bottom-up syntax analysis: CUP.

See Bottom-up syntax analysis: jay.

See Top-down syntax analysis and tree building: JavaCC, JJTree.

See Top-down syntax analysis with objects: oops.

See Summary.

Links

JLex homepage

1.2.5 5/16/00

CUP homepage

0.10j 7/99

jay homepage

0.7 1998

JavaCC homepage

2.0 2000 (fake)

oops homepage

2000

German lecture notes

1998

Objects in compiler construction

 

symbol table and symbols can be objects

 

processor is generated from specification, takes sentence to tree

tree can consist of objects --
messages for semantic analysis and code generation

image can consist of (persistent) objects --
messages for interpretation and/or code generation

Lexical analysis with patterns: JLex

JLex is a largely compatible implementation of lex / flex in Java and targeted to Java.

JLex accepts a table of regular expressions and actions -- expressed in Java -- and produces a scanner class with a method yylex() that reads the longest input to match one of the regular expressions and executes the corresponding action.

Many, at times cryptic options preceding the table configure the scanner class, the return type of yylex() , etc. There are some problems in dealing with full Unicode.

JLex can be used to write filters and to create scanners for compilers.

Examples

JLex/Num.lex

numbers the lines of a source file

JLex/Pat.lex

shows typical patterns

Incantations

$ java -classpath JLex.jar JLex.Main Num.lex

$ mv Num.lex.java Num.java

$ javac Num.java

$ java Num

hello, world

1 hello, world

Bottom-up syntax analysis: CUP

CUP is a somewhat compatible implementation of yacc/bison in Java and targeted to Java.

CUP accepts an LALR(1) grammar -- expressed in BNF -- and actions -- expressed in Java -- and produces a parser class with a method parse() that matches a sentence over the grammar and executes the actions as the phrases are reduced.

 

 

n : a b { $$ = $1 + $2; }

Many, at times cryptic and ordered options preceding the grammar configure the parser class, etc. The action syntax is quite baroque. Error recovery is not implemented correctly.

CUP can be used to write parsers for compilers.

Example

CUP/Expression.cup

builds a tree for arithmetic expressions

CUP/Scanner.lex

JLex -based scanner frontend

ck/Node.java

library for building Number trees for arithmetic expressions

ck/Go.java

executes serialized trees

Incantations

$ java -classpath java_cup.jar java_cup.Main \

-parser Expression -interface < Expression.cup

$ java -classpath JLex.jar JLex.Main Scanner.lex

$ mv Scanner.lex.java Scanner.java

$ javac -classpath .:ck.jar:java_cup.jar Scanner.java Expression.java

$ java -classpath .:ck.jar:java_cup.jar Expression -c |

java -classpath ck.jar ck.Go

32768 * 2147483648L + 128

4 * 9223372036854775807L

byte short int long float double

-128 128 128 70368744177792 7.0368744E13 7.0368744177792E13

-4 -4 -4 -4 3.6893488E19 3.6893488147419103E19

Bottom-up syntax analysis: jay

jay is Berkeley yacc implemented in C and retargeted to Java.

jay accepts an LALR(1) grammar -- expressed in BNF -- and actions -- expressed in Java -- and produces a parser class with a method yyparse() that matches a sentence over the grammar and executes the actions as the phrases are reduced.

jay can be used just like CUP but avoids numerous shortcomings. jay acts as a filter and many aspects of the Java targeting can be modified. There is a debug interface which is implemented to provide animation and tracing without changes to the source.

 

Example

jay/skeleton

Java targeting template

jay/Expression.jay

builds a tree for arithmetic expressions

jay/Scanner.lex

JLex -based scanner frontend

Incantations

$ jay < skeleton Expression.jay > Expression.java

$ java -classpath JLex.jar JLex.Main Scanner.lex

$ mv Scanner.lex.java Scanner.java

$ javac -classpath .:ck.jar javac Scanner.java Expression.java

$ java -classpath .:ck.jar Expression -c | java -classpath ck.jar ck.Go

32768 * 2147483648L + 128

4 * 9223372036854775807L

byte short int long float double

-128 128 128 70368744177792 7.0368744E13 7.0368744177792E13

-4 -4 -4 -4 3.6893488E19 3.6893488147419103E19

Top-down syntax analysis and tree building: JavaCC, JJTree

JavaCC is a lexical analyzer and parser generator implemented in Java and targeted to Java. It was written at Sun and is distributed by Metamata.

JavaCC accepts an LL(n) grammar and actions -- expressed in a mixture of EBNF and Java -- and produces a scanner and a parser class with recursive descent parsing methods that match phrases over the grammar and execute the actions as the phrases are reduced.

JJDoc attempts to extract the EBNF grammar from the JavaCC source. This effort tends to be futile, because Java actions and error recovery can interoperate with the parsing process.

JJTree is a tree builder implemented in Java and targeted to JavaCC . It is distributed with JavaCC .

JJTree accepts JavaCC input annotated for tree building and optionally produces the sources for problem-specific tree node classes and a Visitor interface.

Many options preceding the grammar configure the system. The syntax is incredibly baroque as it extends Java by EBNF and tree annotations with very context-dependent environments for actions, etc. Only very coarse error recovery is implemented, based on exceptions.

JJTree can be used to translate character sequences into parse trees.

Example s

JavaCC/Expression.jj

build trees for arithmetic expressions

JJTree/Expression.jjt

JJTree/Eval.java

visitor for evaluation

JJTree/Gen.java

visitor for persistent tree generation

ck/Node.java

library for building Number trees for arithmetic expressions

ck/Go.java

executes serialized trees

Incantations

$ java -classpath JavaCC.zip COM.sun.labs.jjtree.Main Expression.jjt

$ sed '/^public/s/{.*/{ Number val;/' Lit.java > xx && mv xx Lit.java

$ java -classpath JavaCC.zip COM.sun.labs.javacc.Main Expression.jj

$ javac Expression.java

$ java -classpath .:ck.jar Expression -c |

java -classpath ck.jar ck.Go

32768 * 2147483648L + 128

byte short int long float double

-128 128 128 70368744177792 7.0368744E13 7.0368744177792E13

$ java -classpath .:../ck/ck.jar Expression

2.1 + 3.2 * 4.3

Add

Lit

Mul

Lit

Lit

15.860001

Top-down syntax analysis with objects: oops

 

oops is a parser generator implemented in Java and targeted to Java.

oops accepts an LL(1) grammar -- expressed in EBNF -- and actions -- expressed as Java classes implementing a Goal or Reduce interface -- and produces presistent objects with recursive descent parsing methods that match a sentence over the grammar and message action objects as symbols are accepted and phrases reduced.

 

Rule receives parse() and creates object implementing Goal from non-terminal's class. Goal receives shift() with input symbol or completed Rule . Finally, Rule sends reduce() and Goal 's result is shifted for Rule to superior Goal .

User must provide Goal classes for non-terminals. Retargeting is possible by changing packages. GoalAdapter and GoalDebugger are trivial default implementations;
Reduce is an alternative interface providing only an array of shifted values.

Example

oops/oops.ebnf

grammar for arithmetic expressions

oops/Scanner.lex

JLex -based scanner frontend

oops/lines.java
oops/sum.java
oops/product.java
oops/term.java

Goal classes for tree building or evaluation

ck/Node.java

library for building Number trees for arithmetic expressions

ck/Go.java

executes serialized trees

Incantations

$ java -classpath JLex.jar JLex.Main Scanner.lex

$ mv Scanner.lex.java Scanner.java

$ javac -classpath ..:oops.jar:ck.jar Scanner.java \

lines.java sum.java product.java term.java

$ java -classpath ..:oops.jar:ck.jar oops.boot.Oops oops.ebnf > oops.ser

$ java -classpath ..:oops.jar:ck.jar -Dlines.gen=true \

oops.Compile oops.ser 'oops.Scanner$Input' |

java -classpath ck.jar ck.Go

32768 * 2147483648L + 128

byte short int long float double

-128 128 128 70368744177792 7.0368744E13 7.0368744177792E13

$ java -classpath ..:oops.jar:ck.jar \

oops.Compile -d oops.ser 'oops.Scanner$Input'

2.1 + 3.2 * 4.3

oops.parser.Parser {

lines : [{ ( [ sum ] "\n" ) }] .

lookahead {[empty], "-", "+", "(", "\n", Number}

sum : ( product [{ ( sum.add | sum.sub ) }] ) .

lookahead {"-", "+", "(", Number}

sum.add : ( "+" product ) .

lookahead {"+"}

sum.sub : ( "-" product ) .

lookahead {"-"}

product : ( term [{ ( product.mul | product.div | product.mod ) }] ) .

lookahead {"-", "+", "(", Number}

product.mul : ( "*" term ) .

lookahead {"*"}

product.div : ( "/" term ) .

lookahead {"/"}

product.mod : ( "%" term ) .

lookahead {"%"}

term : ( Number | ( "+" term ) | term.minus | ( "(" sum ")" ) ) .

lookahead {"-", "+", "(", Number}

term.minus : ( "-" term ) .

lookahead {"-"}

}

15.86

Summary

 

JLex

CUP

jay

JavaCC

oops

purpose

scanner

parser

parser

scanner parser builder

parser

technique

FSA

LALR(1)

LALR(1)

LL(n)

LL(1)

specification

regular expressions

BNF and actions

BNF and actions

EBNF mixed with Java

EBNF, Java separated

error recovery

 

(broken)

error , yyerrok

rudimentary

automatic

debugging

 

 

trace, animation

builder: dump()

trace

 

 

 

 

 

 

platform

Java

Java

C

Java

Java

learning curve

lex

yacc, baroque

yacc

very baroque

EBNF, Goal interface

status

stable

errors

stable

stable, no error recovery

stable, but experimental

support

maybe

unlikely

uos.de

metamata.
com

uos.de

Two newer, commercially supported tools should be considered: Metamata's Parse and Terence Pratt's ANTLR .