Copyright ©1996-1998 by Axel T. Schreiner.  All Rights Reserved.



8
The parser generator JavaCC

javacc is a LL(k) parser generator for Java. It has similar goals as yacc and lex but there are significant differences:

It generates a parser class with methods that recognize phrases. The representation of the grammar follows this model.

Tokens are recognized by regular expressions which are specified locally or globally within the grammar.

Errors are not reported relative to the original source.

Normal operation is LL(1); ambiguities are eliminated by local or global, syntactic or semantic measures.

Parsers can be generated so that there can be multiple instances and they can be operated in parallel.

Installation

$ java JavaCC0_7pre7 -o path/JavaCC
$ java Docs -o path/JavaCC/doc

Without the switch -o the installer will run a graphical interface to select the installation parameters.

Documentation

$ netscape path/JavaCC/doc/DOC/index.html

this produces a page which references the documentation of the various components:

javacc the parser generator
jjtree a preprocessor to generate parse trees
jjdoc a primitive preprocessor to create documentation

Many complete examples in path/JavaCC/examples are very helpful. This section explains a javacc source with a number of small examples.



Example

Calc* reads lines with arithmetic expressions from standard input and evaluates them with int arithmetic.

Calc* is developed in stages, it is supposed to demonstrate the important aspects of the parser generator, and it is patterned after examples by Chuck McManis and Sun which are delivered with javacc.

Recognition -- Calc0.jj

Calc0 reads lines with arithmetic expressions from standard input. Output is only produced if there are errors. This step mostly serves to introduce techniques and tools.

$ javacc Calc0.jj
Java Compiler Compiler Version 0.7pre7 (Parser Generator)
Copyright (c) 1996, 1997 Sun Microsystems Inc.
(type "javacc" with no arguments for help)
Reading from file Calc0.jj . . .
File "TokenMgrError.java" does not exist.  Will create one.
File "ParseException.java" does not exist.  Will create one.
File "Token.java" does not exist.  Will create one.
File "ASCII_CharStream.java" does not exist.  Will create one.
Parser generated successfully.
$ javac Calc0.java
Note: ./ASCII_CharStream.java uses a deprecated API.
Recompile with "-deprecation" for details.
1 warning

javacc must be on PATH; the generated code does not depend on the generator environment.

$ java Calc0
32/5 + 3
32//
ParseException: Encountered "/" at line 1, column 4.
Was expecting one of:
<CONSTANT> ...
"+" ...
"-" ...
"(" ...

at Calc0.generateParseException(Calc0.java:258)
at Calc0.jj_consume_token(Calc0.java:197)
at Calc0.term(Calc0.java:124)
at Calc0.product(Calc0.java:100)
at Calc0.sum(Calc0.java:43)
at Calc0.expr(Calc0.java:22)
at Calc0.main(Calc0.java:9)



{programs/javacc/Calc0.jj}
// based on examples by Chuck McManis
// recognize arithmetic expressions

PARSER_BEGIN(Calc0)  // must define parser class
public class Calc0 {
public static void main (String args []) {
Calc0 parser = new Calc0(System.in);
for (;;)
try {
if (parser.expr() == -1)
System.exit(0);
} catch (Exception e) {
e.printStackTrace(); System.exit(1);
}
}
}
PARSER_END(Calc0)

SKIP:    // defines input to be ignored
{ " " | "\r" | "\t"
}

TOKEN:    // defines token names
{ < EOL: "\n" >
| < CONSTANT: ( <DIGIT> )+ > // re: 1 or more
| < #DIGIT: ["0" - "9"] > // private re
}

int expr():   // expr: sum \n
{}    // -1 at eof, 0 at eol
{ sum() <EOL> { return 1; }
| <EOL> { return 0; }
| <EOF> { return -1; }
}

void sum():   // sum: product { +- product }
{}
{ product() ( ( "+" | "-" ) product() )*
}

void product():   // product: term { *%/ term }
{}
{ term() ( ( "*" | "%" | "/" ) term() )*
}

void term():   // term: +term | -term | (sum) | number
{}
{ "+" term()
| "-" term()
| "(" sum() ")"
| <CONSTANT>
}
{}



File format

options { optional, see documentation
name = value ; ...
}
PARSER_BEGIN(name) specifies parser class and file names
package path; optional, defines package
import path; ... optional, copied as is
public class name ... { must define parser class
...
}
PARSER_END(name)
rule ...

Comments, names, constants, and operators (except for shift combinations) follow the rules of Java.

Syntax rules

type name ( ... ) throws ... : defines method and nonterminal (no overloading)
{ locals } defines local variables
{ formula } explains nonterminal

JAVACODE
type name ( ... ) throws ... { defines nonterminal
... by Java code
}

formula is a sequence of patterns:

re regular expression, recognizes Token
lval = re and assigns Token
name ( ... ) recognizes nonterminal
lval = name ( ... ) and assigns result
... | ... alternative
( ... ) group
( ... )* zero or more times
( ... )?  or  [ ... ] optional
( ... )+ one or more times
{ ... } Java code, e.g., with return
LOOKAHEAD( ... ) controls local lookahead, see documentation



Lexical rules

< state , ... > optional, defines name for state
TOKEN recognition sends Token to parser
[ IGNORE_CASE ] optional, prevents differentiating upper/lower case
:
{ re recognizes input
{ ... } optional, executed during recognition
: state optional, defines next state
| ...
}

Similarly, SKIP can be used to ignore input, MORE collects input for the next TOKEN and SPECIAL_TOKEN collects input and combines it with the next TOKEN, see documentation.

re is a regular expression:

"..." Java string, represents itself
< name : pattern > defines pattern
< #name : pattern > defines local pattern
< name > references pattern
< EOF > represents end of file

pattern is a pattern:

"..." Java string, represents itself
< name > references pattern
... | ... alternative
( ... ) group
( ... )* zero or more times
( ... )? optional
( ... )+ one or more times
[ ... , ... ] list of characters
~ [ ... , ... ] complement of a list of characters
"x" one character in the list
"x" - "y" a range of characters in the list



Error recovery -- Calc1.jj

Calc1 reads lines with arithmetic expressions from standard input. Output is only produced if there are errors. This step shows how to continue recognition in the presence of errors.

$ java Calc0
32/5 + 3
32//
ParseException: Encountered "/" at line 1, column 4.
... first error terminates recognition

An error is reported with a ParseException which can be caught:

$ java Calc1
32/5 + 3
32//
ParseException: Encountered "/" at line 1, column 4.
Was expecting one of:
<CONSTANT> ...
"+" ...
"-" ...
"(" ...

37/4

For a new InputStream a parser should be reset using ReInit(InputStream). Alternatively the option STATIC = false can be set and a new parser instantiated, see documentation.



{programs/javacc/Calc1.jj}
PARSER_BEGIN(Calc1)
public class Calc1 {
public static void main (String args []) {
Calc1 parser = new Calc1(System.in);
for (;;)
try {
if (parser.expr() == -1)
System.exit(0);
} catch (Exception e) {
e.printStackTrace(); System.exit(1);
}
}
}
PARSER_END(Calc1)
SKIP:    // defines input to be ignored
{ " " | "\r" | "\t"
}
TOKEN:    // defines token names
{ < EOL: "\n" >
| < CONSTANT: ( <DIGIT> )+ > // re: 1 or more
| < #DIGIT: ["0" - "9"] > // private re
}
int expr():   // expr: sum \n
{}    // -1 at eof, 0 at eol/error
{ try {
( sum() <EOL> { return 1; }
| <EOL>  { return 0; }
| <EOF>  { return -1; }
)
} catch (ParseException pe) {
System.err.println(pe);
for (;;)
switch (getNextToken().kind) {
case EOF: return -1;
case EOL: return 0;
}
}
}
void sum(): {}   // sum: product { +- product }
{ product() ( ( "+" | "-" ) product() )*
}
void product(): {}  // product: term { *%/ term }
{ term() ( ( "*" | "%" | "/" ) term() )*
}
void term(): {}   // term: +term | -term | (sum) | number
{ "+" term()
| "-" term()
| "(" sum() ")"
| <CONSTANT>
}
{}


Evaluation -- Calc2.jj

Calc2 reads lines with arithmetic expressions from standard input and produces their value using integer arithmetic.

$ java Calc2
12
12
12-34
-22
1/0
java.lang.ArithmeticException: / by zero
12-4*5
-8
12-(-6+3)
15

This step illustrates a use for the results of the nonterminal functions.
{programs/javacc/Calc2.jj}
PARSER_BEGIN(Calc2)
public class Calc2 {
public static void main (String args []) {
Calc2 parser = new Calc2(System.in);
for (;;)
try {
if (parser.expr() == -1)
System.exit(0);
} catch (Exception e) {
e.printStackTrace(); System.exit(1);
}
}
}
PARSER_END(Calc2)

SKIP:    // defines input to be ignored
{ " " | "\r" | "\t"
}

TOKEN:    // defines token names
{ < EOL: "\n" >
| < CONSTANT: ( <DIGIT> )+ > // re: 1 or more
| < #DIGIT: ["0" - "9"] > // private re
}



int expr() throws Exception: // expr: sum \n
{ int e; }   // prints result; -1 at eof, 0 at eol/error
{ try {
( e = sum() <EOL> { System.out.println("\t"+e); return 1; }
| <EOL>  { return 0; }
| <EOF>  { return -1; }
)
} catch (Exception err) {
if (err instanceof ParseException || err instanceof ArithmeticException
|| err instanceof NumberFormatException) {
System.err.println(err);
for (;;)
switch (getNextToken().kind) {
case EOF: return -1;
case EOL: return 0;
}
}
throw err;
}
}

int sum() throws NumberFormatException:  // sum: product { +- product }
{ int s, r; }     // returns value
{ s = product()
( "+" r = product() { s += r; }
| "-" r = product() { s -= r; }
)*   { return s; }
}

int product() throws NumberFormatException: // product: term { *%/ term }
{ int p, r;}     // returns value
{ p = term()
( "*" r = term() { p *= r; }
| "%" r = term() { p %= r; }
| "/" r = term() { p /= r; }
)*   { return p; }
}

int term() throws NumberFormatException:
// term: +term | -term | (sum) | number
{ int t; }   // returns value
{ "+" t = term() { return t; }
| "-" t = term() { return -t; }
| "(" t = sum() ")" { return t; }
| <CONSTANT>  { return Integer.parseInt(token.image); }
}
{}



Storing -- Calc3.jjt

Calc3 reads lines with arithmetic expressions from standard input and produces the full syntax tree. This step introduces the tree generator jjtree.

$ jjtree Calc3.jjt
Java Compiler Compiler Version 0.7pre7 (Tree Builder Version 0.3pre4)
Copyright (c) 1996, 1997 Sun Microsystems Inc.
(type "jjtree" with no arguments for help)
Reading from file Calc3.jjt . . .
File "Node.java" does not exist.  Will create one.
File "SimpleNode.java" does not exist.  Will create one.
Annotated grammar generated successfully in Calc3.jj
$ javacc Calc3.jj
Java Compiler Compiler Version 0.7pre7 (Parser Generator)
Copyright (c) 1996, 1997 Sun Microsystems Inc.
(type "javacc" with no arguments for help)
Reading from file Calc3.jj . . .
Parser generated successfully.
$ javac Calc3.java
$ java Calc3
1
expr
sum
product
term
1*(3+4)
expr
sum
product
term
term
sum
product
term
product
term

jjtree changes a javacc grammar so that a more or less complete syntax tree is built from Node objects for the nonterminals. By default, all nodes belong to SimpleNode and there is a method dump() which can output the names of the nonterminals. If the parser is successful the root of the tree can be obtained from the generator state jjtree using rootNode().



{programs/javacc/Calc3.jjt}
PARSER_BEGIN(Calc3)
public class Calc3 {
public static void main (String args []) {
Calc3 parser = new Calc3(System.in);
for (;; jjtree.reset()) // restart tree builder
try {
switch (parser.expr()) {
default: ((SimpleNode)jjtree.rootNode()).dump("\t");
case 0: break;
case -1: System.exit(0);
}
} catch (Exception e) {
e.printStackTrace(); System.exit(1);
}
}
}
PARSER_END(Calc3)
SKIP:    // defines input to be ignored
{ " " | "\r" | "\t"
}
TOKEN:    // defines token names
{ < EOL: "\n" >
| < CONSTANT: ( <DIGIT> )+ > // re: 1 or more
| < #DIGIT: ["0" - "9"] > // private re
}
int expr():   // expr: sum \n
{}    // -1 at eof, 0 at eol/error
{ try {
( sum() <EOL> { return 1; }
| <EOL>  { return 0; }
| <EOF>  { return -1; }
)
} catch (ParseException pe) {
System.err.println(pe);
for (;;)
switch (getNextToken().kind) {
case EOF: return -1;
case EOL: return 0;
}
}
}
void sum(): {}   // sum: product { +- product }
{ product() ( ( "+" | "-" ) product() )*
}
void product(): {}  // product: term { *%/ term }
{ term() ( ( "*" | "%" | "/" ) term() )*
}
void term(): {}   // term: +term | -term | (sum) | number
{ "+" term()
| "-" term()
| "(" sum() ")"
| <CONSTANT>
}
{}


Operator tree -- Calc4/Calc4.jjt

Calc4 reads lines with arithmetic expressions from standard input and produces a simplified syntax tree.

$ java Calc4
2/3 - 4*5
Sub
Div
IntLit
IntLit
Mul
IntLit
IntLit

This step shows how to use jjtree to build a tree that can be used for an interpreter or code generator. This solution resides in it's own directory because it cannot coexist with Calc3 -- unfortunately, SimpleNode depends on the parser class and is the base class of the individual node classes.
{programs/javacc/Calc4/Calc4.jjt}
options {
NODE_DEFAULT_VOID = true; // don't generate nodes by default
NODE_PREFIX = "";  // no prefix to node class names
MULTI = true;   // many class names
}
PARSER_BEGIN(Calc4)
public class Calc4 {
public static void main (String args []) {
Calc4 parser = new Calc4(System.in);
for (;; jjtree.reset())
try {
switch (parser.expr()) {
default: ((SimpleNode)jjtree.rootNode()).dump("\t");
case 0: break;
case -1: System.exit(0);
}
} catch (Exception e) {
e.printStackTrace(); System.exit(1);
}
}
}
PARSER_END(Calc4)

SKIP:    // defines input to be ignored
{ " " | "\r" | "\t"
}



TOKEN:    // defines token names
{ < EOL: "\n" >
| < CONSTANT: ( <DIGIT> )+ > // re: 1 or more
| < #DIGIT: ["0" - "9"] > // private re
}

int expr():   // expr: sum \n
{}    // -1 at eof, 0 at eol/error
{ try {
( sum() <EOL> { return 1; }
| <EOL>  { return 0; }
| <EOF>  { return -1; }
)
} catch (ParseException pe) {
System.err.println(pe);
for (;;)
switch (getNextToken().kind) {
case EOF: return -1;
case EOL: return 0;
}
}
}

void sum():   // sum: product { +- product }
{}
{ product()
( "+" product() #Add(2) // Add with 2 descendants
| "-" product() #Sub(2)
)*
}

void product():   // product: term { *%/ term }
{}
{ term()
( "*" term() #Mul(2)
| "%" term() #Rem(2)
| "/" term() #Div(2)
)*
}

void term():   // term: +term | -term | (sum) | number
{}
{ "+" term()  // no need to make node
| "-" term() #Neg(1) // insert negation node
| "(" sum() ")"
| ( <CONSTANT> ) #IntLit // create terminal node
}
{}



jjtree operations

jjtree places nodes for nonterminals(!) onto an internal stack and typically collects them for each recognized nonterminal:

typ name ( ... ) throws ... : creates ASTname node.
typ name ( ... ) throws ... #class : creates ASTclass node.
typ name ( ... ) throws ... #void : does not create a node, just as per option NODE_DEFAULT_VOID = true.

If a nonterminal is called, it's scope is opened. Nodes for recognized nonterminals are then stored in the scope. Postfix operations inline and at groups can decide how many nodes should be combined and stored as a new node.

#class(number) creates ASTclass node with number descendants.
#class(condition) if condition is true, creates ASTclass node with all descendants in the current scope.
#class(>1) special case: creates ASTclass node if there is more than one descendant.
#class special case: creates ASTclass node with all descendants in the scope.

The nodes can be accessed: jjtThis refers to the node that is currently created and has the proper class, possibly set by a postfix specification.

In the last action in a scope the node is already complete so that methods such as jjtGetChild() can be used. If the node is created subject to a condition nodeCreated() can be used to test if the condition was true.



Infix creation and evaluation -- Calc5

Calc5 reads lines with arithmetic expressions from standard input, outputs them with a minimal number of parentheses, and shows the values using integer arithmetic.

$ java Calc5
--1 - (-1)
1 - - 1
2
(32/5) + 3
32 / 5 + 3
9
1+(32/5)
1 + 32 / 5
7
1/(32/5)
1 / (32 / 5)
0

This step illustrates an interface that jjtree optionally inserts into the node classes. It can be used to implement the VISITOR design pattern.

In Calc5.jjt the option VISITOR = true is set and traversals are started in main():

switch (parser.expr()) {
default: System.out.println("\t"+
jjtree.rootNode().jjtAccept(new Infix(), null));
System.out.println("\t"+ ((int[])
jjtree.rootNode().jjtAccept(new Value(), null))[0]);

Furthermore, a component Integer val is added to IntLit and in term() the value of a CONSTANT is stored there:

void term():   // term: +term | -term | (sum) | number
{ Token t; }
{ ...
| ( t = <CONSTANT> { jjtThis.val = new Integer(t.image); }
) #IntLit   // requires val in IntLit
}



Infix has to implement the interface Calc5Visitor . Each visit() must return it's partial tree as a String. For a leaf a Number is sufficient:
{programs/javacc/Calc5/Infix.java}
public class Infix implements Calc5Visitor {
public Object visit (IntLit node, Object data) {
return node.val; // package accessible...
}
{}
For binary operators the result is pieced together by traversing the subtrees:
{programs/javacc/Calc5/Infix.java}
public Object visit(Add node, Object data) {
Node a = node.jjtGetChild(0);
Node b = node.jjtGetChild(1);
Object aVal = a.jjtAccept(this, data);
Object bVal = b.jjtAccept(this, data);

return aVal +" + "+ bVal;
}
public Object visit(Sub node, Object data) {
Node a = node.jjtGetChild(0);
Node b = node.jjtGetChild(1);
Object aVal = a.jjtAccept(this, data);
Object bVal = b.jjtAccept(this, data);

if (b instanceof Add || b instanceof Sub)
bVal = "(" + bVal + ")";

return aVal +" - "+ bVal;
}
{}
Parentheses are inserted based on precedence, associativity, and commutativity. This gets more complicated with more precedence levels:
{programs/javacc/Calc5/Infix.java}
public Object visit(Mul node, Object data) {
Node a = node.jjtGetChild(0);
Node b = node.jjtGetChild(1);
Object aVal = a.jjtAccept(this, data);
Object bVal = b.jjtAccept(this, data);

if (a instanceof Add || a instanceof Sub)
aVal = "(" + aVal + ")";
if (b instanceof Add || b instanceof Sub)
bVal = "(" + bVal + ")";

return aVal +" * "+ bVal;
}



public Object visit(Rem node, Object data) {
Node a = node.jjtGetChild(0);
Node b = node.jjtGetChild(1);
Object aVal = a.jjtAccept(this, data);
Object bVal = b.jjtAccept(this, data);

if (a instanceof Add || a instanceof Sub)
aVal = "(" + aVal + ")";
if (! (b instanceof Neg || b instanceof IntLit))
bVal = "(" + bVal + ")";

return aVal +" % "+ bVal;
}
public Object visit(Div node, Object data) {
Node a = node.jjtGetChild(0);
Node b = node.jjtGetChild(1);
Object aVal = a.jjtAccept(this, data);
Object bVal = b.jjtAccept(this, data);

if (a instanceof Add || a instanceof Sub)
aVal = "(" + aVal + ")";
if (! (b instanceof Neg || b instanceof IntLit))
bVal = "(" + bVal + ")";

return aVal +" / "+ bVal;
}
public Object visit(Neg node, Object data) {
Node operand  = node.jjtGetChild(0);
if (operand instanceof IntLit)
return "- " + operand.jjtAccept(this, data);
else if (operand instanceof Neg)
return operand.jjtGetChild(0).jjtAccept(this, data);
else
return "- (" + operand.jjtAccept(this, data) + ")";
}
public Object visit(SimpleNode node, Object data) {
System.err.println("visit simple node");
return null;
}
}
{}
A general solution would have to employ a table.



A Calc5Visitor can serve as an interpreter. int[1] is used as a result to circumvent the inefficiency of the Number objects:
{programs/javacc/Calc5/Value.java}
public class Value implements Calc5Visitor {
public Object visit(SimpleNode node, Object data) {
System.err.println("visit simple node");
return null;
}
public Object visit(Add node, Object data) {
int[] a = (int[])node.jjtGetChild(0).jjtAccept(this, data),
b = (int[])node.jjtGetChild(1).jjtAccept(this, data);
a[0] += b[0];
return a;
}
public Object visit(Sub node, Object data) {
int[] a = (int[])node.jjtGetChild(0).jjtAccept(this, data),
b = (int[])node.jjtGetChild(1).jjtAccept(this, data);
a[0] -= b[0];
return a;
}
public Object visit(Mul node, Object data) {
int[] a = (int[])node.jjtGetChild(0).jjtAccept(this, data),
b = (int[])node.jjtGetChild(1).jjtAccept(this, data);
a[0] *= b[0];
return a;
}
public Object visit(Rem node, Object data) {
int[] a = (int[])node.jjtGetChild(0).jjtAccept(this, data),
b = (int[])node.jjtGetChild(1).jjtAccept(this, data);
a[0] %= b[0];
return a;
}
public Object visit(Div node, Object data) {
int[] a = (int[])node.jjtGetChild(0).jjtAccept(this, data),
b = (int[])node.jjtGetChild(1).jjtAccept(this, data);
a[0] /= b[0];
return a;
}
public Object visit(Neg node, Object data) {
int[] operand  = (int[])node.jjtGetChild(0).jjtAccept(this, data);
operand[0] = - operand[0];
return operand;
}
public Object visit (IntLit node, Object data) {
return new int[] { node.val.intValue() };
}
}
{}
Note that an array can be initialized with an array ``constant'' during creation.


Semantic analysis and evaluation -- Calc6

Calc6 reads lines with constant Java expressions from standard input, checks the semantics, and outputs their values. Calc6 implements all operations for integer, floating point, and boolean constants.

$ java Calc6
2/3 + 7
7
3./2 > 1
true

Calc6 illustrates a specialized class hierarchy that implements Node and adds more methods for semantic analysis and interpretation. This is the foundation for very modular interpreters and compilers.

Node node interface
Lit constants
BoolLit FloatLit IntLit operations on various values
Unary unary operators
Cpl Neg Not Pls
Binary binary operators
ArithBinary arithmetic operators
Add Div Mul Rem Sub
EqBinary identity comparisons
Eq Ne
IfBinary logical operations with termination
AndIf OrIf
IntBinary integer operations
Ash Lsh Rsh
LogicBinary logical and bit operations
And Or Xor
RelBinary comparisons
Ge Gt Le Lt

SemanticException

jjtree catches even user-defined exceptions; this makes it difficult to report the position of sematic errors.

29/Apr/1998