|
|||||||||
| PREV PACKAGE NEXT PACKAGE | FRAMES NO FRAMES | ||||||||
jay
and JLex
which largely automates the generation of parse trees from programs.
See:
Description
| Interface Summary | |
|---|---|
| Dump.Helper | can be implemented if a class wants to do it's own dump. |
| Leaf | what a leaf in a pj tree can do. |
| Rfc.yyActions | interface between grammar and actions. |
| Rfc.yyInput | must be implemented by a scanner object to supply input to the parser. |
| Class Summary | |
|---|---|
| Dump | output an indented tree. |
| PJ | preprocessor for jay and JLex. |
| PJServlet | Deprecated. Replaced by Servlet. |
| Rfc | RFC-style grammar for pj. |
| Rfc.Scanner | embedded pj scanner. |
| SubProcess | arranges to store text and process it through a native subprocess. |
| SubProcess.Copy | daemon thread to copy a stream to and from the subprocess. |
| Exception Summary | |
|---|---|
| Log | thrown to back out of a servlet with a report. |
| Rfc.yyException | thrown for irrecoverable syntax errors and stack overflow. |
This is the homepage of pj, a preprocessor for jay
and JLex
which largely automates the generation of parse trees from programs.
pj reads a grammar
specified in BNF, extended with optional and iteration constructs
and annotated only with class names, and generates a scanner with
JLex, jay grammar rules for error recovery, an
interface between the parser constructed by jay and actions
carried out during recognition, and an implementation for this
interface which can build parse trees that are suitable for
processing with jag.
pj is available through wcs and for binary
and source download.
java -jar pj.jar [options] [--] [source|-]
By default pj will either process an input file specified
on the command line or standard input and will produce a grammar
for jay as standard output. If the input is annotated with types,
the output will include an action interface. If the input is
correct, the exit code is zero.
The following options can be specified in any order:
| -dump | dump the parser tree. |
| -flatten | squeeze out a single ArrayList parameter. |
| -jay[=file] | postprocess with jay, keep jay input file.
|
| -packages list | package names to search for base classes; the first character of the list is the delimeter between names. The default is :java.lang:java.util. |
| -recover | generate error recovery for iterations. |
| -scanner[:static][=file] - | generate nested scanner class, static, keep JLex input file. |
| -scanner[:static] file | insert nested scanner class from JLex source file, static. |
| -trace[:0|1|2|3] | postprocess with jay and compile jay trace,
with animation simulating nothing, input, and/or output.
|
| -tree[:static] - | generate nested tree factory class, static. |
| -tree file | only generate tree factory and insert into source file to replace a line containing %%. This option can only be combined with -dump. |
| -verbose | postprocess with jay and create parser description.
|
| -wcs | allow System spoofing.
|
pj borrows heavily from jay. Input is mostly
free format with Java-style comments. However, the input file
consists of several sections which are separated by symbols that
must appear at the beginning of lines which are otherwise empty.
The sections and the symbols are as follows:
prolog
directives
macro definitions
%%
rules
%%
epilog
prolog and epilog together with their delimiting symbols are optional. However, the typical use of pj is to create a Java source file which starts with the prolog and ends with the epilog. Each is copied verbatim without the delimiting symbols; in this case the prolog must at least contain a class header and the epilog must terminate this class.
The class name from the first class header in the prolog is assumed to be the class name of the parser, i.e., the parser should not be an inner class.
directives are optional. They are useful to control the
behaviour of a JLex scanner generated by pj and to
simplify a grammar which pj forwards to jay. Each
kind of directive can be specified multiple times and their effect
is cumulative. Each directive starts with a specific symbol at
the beginning of a line.
Directives can be commented in the style of Java.
In strings, \ is used in the style of Java to escape a backslash, single and double quote, backspace, formfeed, newline, return and tab character.
%scan "string" ...
scan can specify (double-quoted) strings. This is only useful to influence the order of JLex rules if pj is used to generate a scanner.
%screen {macro-name} ...
screen is used to specify patterns to use for screening for literals if pj is used to generate a scanner.
screen designates one or more macros which must be defined later. Input matching the regular expression of any such macro will be checked against a hashtable of all literals (single quoted strings) used in the grammar specification. Screening literals can significantly reduce the size of a scanner generated by pj.
screen must be specified if multi-character literals are used. Unfortunately, at this point there is no check that the macro definitions actually find all literals. Single-character literals will at least be found by the default rule generated by pj for JLex.
%skip {macro-name} ...
skip is used to describe comments and other ignorable text if pj is used to generate a scanner.
skip designates one or more macros which must be defined later. Input matching the regular expression of any such macro will be ignored by the JLex scanner generated by pj.
%token <classname> name representation ...
token is used to provide information about input symbols. Typically, it will at least be used to define names for symbol categories such as numbers, strings, and user-defined identifiers.
token can start with one (qualified) Java classname. In this case the scanner's value() method must provide an object of this class whenever the token() method returns one of the tokens defined in this directive. If pj is used to generate the scanner, the class must have a constructor that accepts a string; this is true for most of the wrapper classes.
If token does not start with a classname the input symbols will later not be made available to actions. The directive is still useful, however, to combine a symbol name with a representation.
token contains one or more unique, alphanumeric input symbol names, which must start with a letter. Each name can then be used in the rules as a terminal symbol. Each name can be followed by a representation: a double- or single-quoted literal string (the latter may require a screen directive), or a macro defined later to designate a regular expression that describes how the input symbol is represented, i.e., in this case a name can represent many similar inputs in the rules.
If a name is not followed by a representation, it is not known how this input symbol would be represented. This can still be useful, as long as the scanner is not generated using pj.
%left symbol ...
%right symbol ...
%nonassoc symbol ...
These directives are used to designate symbols as left-, right-, or non-associative operators and to assign a precedence to a group of such symbols. The precedence increases with each successive such directive.
Each directive contains one or more symbols. A symbol is a name which may be defined with token or a literal string in single or double quotes.
These directives are useful to specify more compact grammars
for %prec symbol ...jay. The tutorial Step 2 demonstrates how
they simplify the specification of arithmetic expressions.
This precedence directive follows the items on the right hand side of a rule and references the precedence table created by the aforementioned directives. It uses the referenced precedence to resolve conflicts involving the right hand side.
macro definitions are optional and can only follow the directives. They are useful to assign names to regular expressions when pj is used to generate a scanner with JLex. Each macro definition consists of a single line:
name = regular expression
The name must be unique, alphanumeric, and start with a letter. The regular expression must conform to the requirements of JLex as described by the link above. It is usually advisable to enclose a regular expression in parentheses.
Comments can only appear on lines between macro definitions.
The rules section is preceded by the symbol %% (which is not optional) and contains one or more rules:
<classname> name : right-hand-side <actionname> |... ;
A rule can start with a Java classname. The default is java.lang.Object. Each action in this rule has to return an object from this class. It is advisable to fully qualify this classname or to use the -packages option on the command line. Each right-hand side of a rule can end with an actionname in angle brackets.
A rule defines a unique, alphanumeric non-terminal name which must start with a letter. The definition consists of one or more right-hand-sides, separated by |, and terminated with a semicolon. The first rule defines the start symbol of the grammar.
A right-hand-side is a sequence of zero or more items,
optionally followed by an inline
precedence directive and/or an action.
When the grammar is processed by jay at most one right
hand side should be empty.
An item can be a literal string in single or double quotes, a name defined as a token, a name that has already or will be defined as a non-terminal, or finally an otherwise undefined name which will be treated as if it had been defined using a token directive.
An item can also be an optional or iterative construct:
| name ? | specifies that name appears at most once and produces null or the corresponding value(). |
| name * | specifies that name appears zero or more times and produces a (potentially empty) java.util.ArrayList of all occurrences. |
| name + | specifies that name appears one or more times and produces a java.util.ArrayList of all occurrences. |
| name / item | specifies that name appears one or more times, separated by a (non-iterative) item as described above, and produces a java.util.ArrayList of all occurrences. |
For iterative constructs pj generates additional non-terminals with appropriate rules and optionally right-hand-sides to support a relatively robust error recovery mechanism. As indicated, most of these rules use java.lang.ArrayList as a classname and provide actions that collect values as required.
Explicit error recovery as implemented by jay is also
possible because error is a reserved word that can be
used as an item.
If requested pj will generate a scanner using JLex and insert the result as a statically nested or member class of the parser.
This scanner class is called yyLex and satisfies the
yyInput interface required by the parser generated by
jay. macro definitions
are simply passed to JLex.
The central part of the scanner is a table of regular expressions and actions. JLex permits the regular expressions to be ambiguous; if there is a conflict the longest possible input string is given to the first regular expression that can recognize it.
pj emits literal strings in double quotes as regular expressions before macro references. Within each group the elements appear in order of appearance in the pj input. As a consequence, the screen and skip expressions will appear relatively early in the second group. If there is no skip directive, white space will be ignored with a very late regular expression. The various directives can be specified in any order in order to influence the order of the regular expressions in the scanner.
A default rule at the end of the table recognizes each single character that has not yet been recognized; these characters correspond to the single quoted characters in the input of pj unless these are found by screening. As a consequence, a single character in double quotes is recognized rather then the same single character in single quotes; this may have unfortunate results if both are specified in the same grammar.
The JLex actions are generated to return appropriate token() values to the parser. For names that were typed using token directives value() will return appropriate objects as long as a constructor with a string argument exists.
A test driver for a scanner is quite trivial to write. If pj generates the scanner with the option -scanner:static - a test driver like the following is included into yyLex:
public static void main (String[] args) throws Exception {
yyLex scanner = new yyLex(System.in);
while (scanner.advance())
System.out.println(scanner.token()+"\t\""+scanner.value()+"\"");
}
pj uses jay to check the grammar and generate
a parser. The relevant input to jay consists of some of
the precedence directives and of the
rules, each modified to accommodate the
syntax requirements of jay.
A number of symbols have to be replaced throughout: Single-quoted
single characters are acceptable to jay. All other single-
and double-quoted literals are replaced by identifiers and mentioned
in token directives. The identifiers start with one or two
underscores and use the text of an alphanumeric literal or a
unique number.
Left-recursive rules are generated for the iterative constructs. If requested with -recover, these rules contain right-hand sides involving error; this may introduce otherwise harmless shift/reduce conflicts. Rules with an empty alternative are generated for optional constructs. The prefix opt_ and the suffix s is used for the new nonterminals; unique numbers are appended to avoid conflicts.
The classname at the beginning of each rule, if any,
is mentioned in a type directive. Iterative constructs
specify ArrayList. Optional constructs specify
the type of the optional item, if any.
Each right-hand side of a rule can end with an action consisting of an actionname enclosed in angle brackets:
<classname> name : right-hand-side <actionname> |... ;
pj generates an interface yyActions as a member
of the parser class. yyActions contains each actionname
as a method returning the type specified as classname for
the rule. The parameter types of the method result from the typed
symbols in the right-hand side, i.e., nonterminals and optional
symbols contribute their type, iterative constructs contribute
ArrayList, and terminals contribute their type
if one was specified with a token directive.
pj also generates a jay action for almost every
right-hand side. If there is an actionname, the action is
a call on the method declared in the interface yyActions.
Otherwise the action collects the values corresponding to the
typed symbols in the right-hand side, if any: if there is only a
single such symbol it's value is returned by the action, if there
are two or more, their values are returned in a ArrayList. Only a right-hand side without any typed
symbols has no action.
An action for an optional construct returns null or
the value corresponding to the optional symbol. An action for an
iterative construct returns a ArrayList of
values corresponding to the symbols actually found; for the
* and / constructs this list may be empty. It
should be noted that only typed symbols can be subject to optional
and iterative constructs; if a seperator in the / construct
is typed, it is also collected.
It is the user's responsibility to maintain matching types. While actions involving yyActions are declared and generated correctly, pj is not able to check semantic correctness of collection actions. Mismatches will show up as errors once the generated code is submitted to a Java compiler.
yyActions is intended to specify a factory for a parse
tree. If requested with -tree pj will generate
the factory and either insert it into an explicitly specified
source file or into the parser class. In the latter case, the
parser's test driver will use the factory, create a parse tree,
and display it with Dump.
Each actionname is generated as an inner class name and
the action generated for jay calls on a public factory
method which in turn calls on a suitable protected constructor
generated for the class. In this fashion the factory can be
subclassed and the method overridden to return an object from a
different class.
The generated inner class is derived from the return type of the action. It contains access methods named after the symbols which contributed types to the action method; the methods return the values corresponding to the symbols.
The intention is to create parse trees which use List in their branch nodes for the benefit of jag. Again, it is up to the user to maintain suitable types
because pj cannot completely anticipate the results.
pj creates temporary and permanent files if any of the
-scanner or -jay options are specified. Before
these options can be used, however, pj.properties must contain
properties that locate the processors. The property
pj.properties can be specified, e.g., on the command
line, to point to an alternate properties file which would be
located as a resource. jay and jay.skeleton
are the full paths to jay and the default skeleton,
respectively. java and jlex.jar are the full
paths to the Java executor and an executable jar file
containing JLex, respectively.
| -jay
-trace -verbose | input for jay is stored in a temporary file;
jay creates other temporary files and, if requested,
stores a parser description in y.output.
|
| -jay=file | input for jay is stored in file.
|
| -scanner[:static] - | input for JLex is stored in a temporary file;
Java code generated by JLex is stored in another temporary file
before it becomes part of the standard output or is sent to jay.
|
| -scanner[:static]=file - | input for JLex is stored in file;
Java code generated by JLex is stored in a temporary file
before it becomes part of the standard output or is sent to jay.
|
| -scanner[:static] file | input for JLex is read from file;
Java code generated by JLex is stored in a temporary file
before it becomes part of the standard output or is sent to jay.
|
| -tree file | file is read and should contain code so that a single line %% can be replaced by the classes and methods of the tree factory, the result is written as standard output. |
Depending on the command line options, pj runs JLex
and jay as subprocesses. The file pj/pj.properties inside pj.jar contains
four properties with paths which have to be adjusted for the local system:
jay executable, and
jay.skeleton points to the default Java skeleton for jay,
To produce the executable JLex archive, download the file Main.java into a directory JLex, apply this patch, compile, create a file manifest containing only Main-Class: JLex.Main and
jar cfm jlex.jar manifest JLex/*.class
To produce the jay executable,
download and compile the C sources with an ANSI C compiler.
|
|||||||||
| PREV PACKAGE NEXT PACKAGE | FRAMES NO FRAMES | ||||||||