8
The parser generator JavaCC
javacc is a LL(k) parser generator for Java. It has similar goals as yacc and lex but there are significant differences:
| It generates a parser class with methods that recognize phrases. The representation of the grammar follows this model.
Tokens are recognized by regular expressions which are specified locally or globally within the grammar. Errors are not reported relative to the original source. Normal operation is LL(1); ambiguities are eliminated by local or global, syntactic or semantic measures. Parsers can be generated so that there can be multiple instances and they can be operated in parallel. |
Installation
$ java JavaCC0_7pre7 -o path/JavaCC
$ java Docs -o path/JavaCC/doc
Without the switch -o the installer will run a graphical interface to select the installation parameters.
Documentation
$ netscape path/JavaCC/doc/DOC/index.html
this produces a page which references the documentation of the various components:
| javacc | the parser generator | |
| jjtree | a preprocessor to generate parse trees | |
| jjdoc | a primitive preprocessor to create documentation |
Many complete examples in path/JavaCC/examples are very helpful. This section explains a javacc source with a number of small examples.
Calc* reads lines with arithmetic expressions from standard input and evaluates them with int arithmetic.
Calc* is developed in stages, it is supposed to demonstrate the important aspects of the parser generator, and it is patterned after examples by Chuck McManis and Sun which are delivered with javacc.
Calc0 reads lines with arithmetic expressions from standard input. Output is only produced if there are errors. This step mostly serves to introduce techniques and tools.
$ javacc Calc0.jj
Java Compiler Compiler Version 0.7pre7 (Parser Generator)
Copyright (c) 1996, 1997 Sun Microsystems Inc.
(type "javacc" with no arguments for help)
Reading from file Calc0.jj . . .
File "TokenMgrError.java" does not exist. Will create one.
File "ParseException.java" does not exist. Will create one.
File "Token.java" does not exist. Will create one.
File "ASCII_CharStream.java" does not exist. Will create one.
Parser generated successfully.
$ javac Calc0.java
Note: ./ASCII_CharStream.java uses a deprecated API.
Recompile with "-deprecation" for details.
1 warning
javacc must be on PATH; the generated code does not depend on the generator environment.
$ java Calc0
32/5 + 3
32//
ParseException: Encountered "/" at line 1, column 4.
Was expecting one of:
<CONSTANT> ...
"+" ...
"-" ...
"(" ...
at Calc0.generateParseException(Calc0.java:258)
at Calc0.jj_consume_token(Calc0.java:197)
at Calc0.term(Calc0.java:124)
at Calc0.product(Calc0.java:100)
at Calc0.sum(Calc0.java:43)
at Calc0.expr(Calc0.java:22)
at Calc0.main(Calc0.java:9)
PARSER_BEGIN(Calc0) // must define parser class
public class Calc0 {
public static void main (String args []) {
Calc0 parser = new Calc0(System.in);
for (;;)
try {
if (parser.expr() == -1)
System.exit(0);
} catch (Exception e) {
e.printStackTrace(); System.exit(1);
}
}
}
PARSER_END(Calc0)
SKIP: // defines input to be ignored
{ " " | "\r" | "\t"
}
TOKEN: // defines token names
{ < EOL: "\n" >
| < CONSTANT: ( <DIGIT> )+ > // re: 1 or more
| < #DIGIT: ["0" - "9"] > // private re
}
int expr(): // expr: sum \n
{} // -1 at eof, 0 at eol
{ sum() <EOL> { return 1; }
| <EOL> { return 0; }
| <EOF> { return -1; }
}
void sum(): // sum: product { +- product }
{}
{ product() ( ( "+" | "-" ) product() )*
}
void product(): // product: term { *%/ term }
{}
{ term() ( ( "*" | "%" | "/" ) term() )*
}
void term(): // term: +term | -term | (sum) | number
{}
{ "+" term()
| "-" term()
| "(" sum() ")"
| <CONSTANT>
}
{}
| options { | optional, see documentation | |
| } | ||
| PARSER_BEGIN(name) | specifies parser class and file names | |
| optional, defines package | ||
| optional, copied as is | ||
| must define parser class | ||
| PARSER_END(name) | ||
| rule ... |
Comments, names, constants, and operators (except for shift combinations) follow the rules of Java.
Syntax rules
| type name ( ... ) throws ... : | defines method and nonterminal (no overloading) | |
| defines local variables | ||
| explains nonterminal |
JAVACODE
| type name ( ... ) throws ... { | defines nonterminal | |
| } |
formula is a sequence of patterns:
| re | regular expression, recognizes Token | |
| lval = re | and assigns Token | |
| name ( ... ) | recognizes nonterminal | |
| lval = name ( ... ) | and assigns result | |
| ... | ... | alternative | |
| ( ... ) | group | |
| ( ... )* | zero or more times | |
| ( ... )? or [ ... ] | optional | |
| ( ... )+ | one or more times | |
| { ... } | Java code, e.g., with return | |
| LOOKAHEAD( ... ) | controls local lookahead, see documentation |
| < state , ... > | optional, defines name for state | |
| TOKEN | recognition sends Token to parser | |
| [ IGNORE_CASE ] | optional, prevents differentiating upper/lower case | |
| : | ||
| { re | recognizes input | |
| optional, executed during recognition | ||
| optional, defines next state | ||
| | ... | ||
| } |
Similarly, SKIP can be used to ignore input, MORE collects input for the next TOKEN and SPECIAL_TOKEN collects input and combines it with the next TOKEN, see documentation.
re is a regular expression:
| "..." | Java string, represents itself | |
| < name : pattern > | defines pattern | |
| < #name : pattern > | defines local pattern | |
| < name > | references pattern | |
| < EOF > | represents end of file |
pattern is a pattern:
| "..." | Java string, represents itself | |
| < name > | references pattern | |
| ... | ... | alternative | |
| ( ... ) | group | |
| ( ... )* | zero or more times | |
| ( ... )? | optional | |
| ( ... )+ | one or more times | |
| [ ... , ... ] | list of characters | |
| ~ [ ... , ... ] | complement of a list of characters | |
| "x" | one character in the list | |
| "x" - "y" | a range of characters in the list |
Calc1 reads lines with arithmetic expressions from standard input. Output is only produced if there are errors. This step shows how to continue recognition in the presence of errors.
$ java Calc0
32/5 + 3
32//
ParseException: Encountered "/" at line 1, column 4.
| ... | first error terminates recognition |
An error is reported with a ParseException which can be caught:
$ java Calc1
32/5 + 3
32//
ParseException: Encountered "/" at line 1, column 4.
Was expecting one of:
<CONSTANT> ...
"+" ...
"-" ...
"(" ...
37/4
For a new InputStream a parser should be reset using ReInit(InputStream). Alternatively the option STATIC = false can be set and a new parser instantiated, see documentation.
Calc2 reads lines with arithmetic expressions from standard input and produces their value using integer arithmetic.
$ java Calc2
12
12
12-34
-22
1/0
java.lang.ArithmeticException: / by zero
12-4*5
-8
12-(-6+3)
15
This step illustrates a use for the results of the nonterminal functions.
{programs/javacc/Calc2.jj}
PARSER_BEGIN(Calc2)
public class Calc2 {
public static void main (String args []) {
Calc2 parser = new Calc2(System.in);
for (;;)
try {
if (parser.expr() == -1)
System.exit(0);
} catch (Exception e) {
e.printStackTrace(); System.exit(1);
}
}
}
PARSER_END(Calc2)
SKIP: // defines input to be ignored
{ " " | "\r" | "\t"
}
TOKEN: // defines token names
{ < EOL: "\n" >
| < CONSTANT: ( <DIGIT> )+ > // re: 1 or more
| < #DIGIT: ["0" - "9"] > // private re
}
int sum() throws NumberFormatException: // sum: product { +- product }
{ int s, r; } // returns value
{ s = product()
( "+" r = product() { s += r; }
| "-" r = product() { s -= r; }
)* { return s; }
}
int product() throws NumberFormatException: // product: term { *%/ term }
{ int p, r;} // returns value
{ p = term()
( "*" r = term() { p *= r; }
| "%" r = term() { p %= r; }
| "/" r = term() { p /= r; }
)* { return p; }
}
int term() throws NumberFormatException:
// term: +term | -term | (sum) | number
{ int t; } // returns value
{ "+" t = term() { return t; }
| "-" t = term() { return -t; }
| "(" t = sum() ")" { return t; }
| <CONSTANT> { return Integer.parseInt(token.image); }
}
{}
Calc3 reads lines with arithmetic expressions from standard input and produces the full syntax tree. This step introduces the tree generator jjtree.
$ jjtree Calc3.jjt
Java Compiler Compiler Version 0.7pre7 (Tree Builder Version 0.3pre4)
Copyright (c) 1996, 1997 Sun Microsystems Inc.
(type "jjtree" with no arguments for help)
Reading from file Calc3.jjt . . .
File "Node.java" does not exist. Will create one.
File "SimpleNode.java" does not exist. Will create one.
Annotated grammar generated successfully in Calc3.jj
$ javacc Calc3.jj
Java Compiler Compiler Version 0.7pre7 (Parser Generator)
Copyright (c) 1996, 1997 Sun Microsystems Inc.
(type "javacc" with no arguments for help)
Reading from file Calc3.jj . . .
Parser generated successfully.
$ javac Calc3.java
$ java Calc3
1
expr
sum
product
term
1*(3+4)
expr
sum
product
term
term
sum
product
term
product
term
jjtree changes a javacc grammar so that a more or less complete syntax tree is built from Node objects for the nonterminals. By default, all nodes belong to SimpleNode and there is a method dump() which can output the names of the nonterminals. If the parser is successful the root of the tree can be obtained from the generator state jjtree using rootNode().
Calc4 reads lines with arithmetic expressions from standard input and produces a simplified syntax tree.
$ java Calc4
2/3 - 4*5
Sub
Div
IntLit
IntLit
Mul
IntLit
IntLit
This step shows how to use jjtree to build a tree that can be used for an interpreter or code generator. This solution resides in it's own directory because it cannot coexist with Calc3 -- unfortunately, SimpleNode depends on the parser class and is the base class of the individual node classes.
{programs/javacc/Calc4/Calc4.jjt}
options {
NODE_DEFAULT_VOID = true; // don't generate nodes by default
NODE_PREFIX = ""; // no prefix to node class names
MULTI = true; // many class names
}
PARSER_BEGIN(Calc4)
public class Calc4 {
public static void main (String args []) {
Calc4 parser = new Calc4(System.in);
for (;; jjtree.reset())
try {
switch (parser.expr()) {
default: ((SimpleNode)jjtree.rootNode()).dump("\t");
case 0: break;
case -1: System.exit(0);
}
} catch (Exception e) {
e.printStackTrace(); System.exit(1);
}
}
}
PARSER_END(Calc4)
SKIP: // defines input to be ignored
{ " " | "\r" | "\t"
}
int expr(): // expr: sum \n
{} // -1 at eof, 0 at eol/error
{ try {
( sum() <EOL> { return 1; }
| <EOL> { return 0; }
| <EOF> { return -1; }
)
} catch (ParseException pe) {
System.err.println(pe);
for (;;)
switch (getNextToken().kind) {
case EOF: return -1;
case EOL: return 0;
}
}
}
void sum(): // sum: product { +- product }
{}
{ product()
( "+" product() #Add(2) // Add with 2 descendants
| "-" product() #Sub(2)
)*
}
void product(): // product: term { *%/ term }
{}
{ term()
( "*" term() #Mul(2)
| "%" term() #Rem(2)
| "/" term() #Div(2)
)*
}
void term(): // term: +term | -term | (sum) | number
{}
{ "+" term() // no need to make node
| "-" term() #Neg(1) // insert negation node
| "(" sum() ")"
| ( <CONSTANT> ) #IntLit // create terminal node
}
{}
jjtree places nodes for nonterminals(!) onto an internal stack and typically collects them for each recognized nonterminal:
| typ name ( ... ) throws ... : | creates ASTname node. | |
| typ name ( ... ) throws ... #class : | creates ASTclass node. | |
| typ name ( ... ) throws ... #void : | does not create a node, just as per option NODE_DEFAULT_VOID = true. |
If a nonterminal is called, it's scope is opened. Nodes for recognized nonterminals are then stored in the scope. Postfix operations inline and at groups can decide how many nodes should be combined and stored as a new node.
| #class(number) | creates ASTclass node with number descendants. | |
| #class(condition) | if condition is true, creates ASTclass node with all descendants in the current scope. | |
| #class(>1) | special case: creates ASTclass node if there is more than one descendant. | |
| #class | special case: creates ASTclass node with all descendants in the scope. |
The nodes can be accessed: jjtThis refers to the node that is currently created and has the proper class, possibly set by a postfix specification.
In the last action in a scope the node is already complete so that methods such as jjtGetChild() can be used. If the node is created subject to a condition nodeCreated() can be used to test if the condition was true.
Calc5 reads lines with arithmetic expressions from standard input, outputs them with a minimal number of parentheses, and shows the values using integer arithmetic.
$ java Calc5
--1 - (-1)
1 - - 1
2
(32/5) + 3
32 / 5 + 3
9
1+(32/5)
1 + 32 / 5
7
1/(32/5)
1 / (32 / 5)
0
This step illustrates an interface that jjtree optionally inserts into the node classes. It can be used to implement the VISITOR design pattern.
![]()
In Calc5.jjt
the option VISITOR = true is set and traversals are started in main():
switch (parser.expr()) {
default: System.out.println("\t"+
jjtree.rootNode().jjtAccept(new Infix(), null));
System.out.println("\t"+ ((int[])
jjtree.rootNode().jjtAccept(new Value(), null))[0]);
Furthermore, a component Integer val is added to IntLit and in term() the value of a CONSTANT is stored there:
void term(): // term: +term | -term | (sum) | number
{ Token t; }
{ ...
| ( t = <CONSTANT> { jjtThis.val = new Integer(t.image); }
) #IntLit // requires val in IntLit
}
return aVal +" + "+ bVal;
}
public Object visit(Sub node, Object data) {
Node a = node.jjtGetChild(0);
Node b = node.jjtGetChild(1);
Object aVal = a.jjtAccept(this, data);
Object bVal = b.jjtAccept(this, data);
if (b instanceof Add || b instanceof Sub)
bVal = "(" + bVal + ")";
return aVal +" - "+ bVal;
}
{}
Parentheses are inserted based on precedence, associativity, and commutativity. This gets more complicated with more precedence levels:
{programs/javacc/Calc5/Infix.java}
public Object visit(Mul node, Object data) {
Node a = node.jjtGetChild(0);
Node b = node.jjtGetChild(1);
Object aVal = a.jjtAccept(this, data);
Object bVal = b.jjtAccept(this, data);
if (a instanceof Add || a instanceof Sub)
aVal = "(" + aVal + ")";
if (b instanceof Add || b instanceof Sub)
bVal = "(" + bVal + ")";
return aVal +" * "+ bVal;
}
if (a instanceof Add || a instanceof Sub)
aVal = "(" + aVal + ")";
if (! (b instanceof Neg || b instanceof IntLit))
bVal = "(" + bVal + ")";
return aVal +" % "+ bVal;
}
public Object visit(Div node, Object data) {
Node a = node.jjtGetChild(0);
Node b = node.jjtGetChild(1);
Object aVal = a.jjtAccept(this, data);
Object bVal = b.jjtAccept(this, data);
if (a instanceof Add || a instanceof Sub)
aVal = "(" + aVal + ")";
if (! (b instanceof Neg || b instanceof IntLit))
bVal = "(" + bVal + ")";
return aVal +" / "+ bVal;
}
public Object visit(Neg node, Object data) {
Node operand = node.jjtGetChild(0);
if (operand instanceof IntLit)
return "- " + operand.jjtAccept(this, data);
else if (operand instanceof Neg)
return operand.jjtGetChild(0).jjtAccept(this, data);
else
return "- (" + operand.jjtAccept(this, data) + ")";
}
public Object visit(SimpleNode node, Object data) {
System.err.println("visit simple node");
return null;
}
}
{}
A general solution would have to employ a table.
Calc6 reads lines with constant Java expressions from standard input, checks the semantics, and outputs their values. Calc6 implements all operations for integer, floating point, and boolean constants.
$ java Calc6
2/3 + 7
7
3./2 > 1
true
Calc6 illustrates a specialized class hierarchy that implements Node and adds more methods for semantic analysis and interpretation. This is the foundation for very modular interpreters and compilers.
jjtree catches even user-defined exceptions; this makes it difficult to report the position of sematic errors.