Language Processing

Language Processing is a tutorial introducing a few tools which should facilitate the object-oriented implementation of language processors.


Step 1
step1 Step 1: Specifying and Checking a Grammar.


Step 2
step2 Step 2: Precedence, Associativity, and Iterative Constructs.


Step 3
step3 Step 3: Generating the Scanner.


Step 4
step4 Step 4: Checking Recognition.


Step 5
step5 Step 5: Error Recovery.


Step 6
step6 Step 6: Building Trees.


Step 7
step7 Step 7: Traversing Trees.


Step 8
step8 Step 8: Code Generation Principles.


Step 9
step9 Step 9: Typing.


Step 10
step10 Step 10: Type Checking.


pj This is the homepage of pj, a preprocessor for jay and JLex which largely automates the generation of parse trees from programs.


Parser Generator
jay This is the homepage of jay, a LALR(1) parser generator: Berkeley yacc © retargeted to C# and Java.


jay.yydebug jay/yydebug supports tracing and animation for a Java-based parser generated by jay.


jag This is the homepage of jag, a Java preprocessor supporting the visitor pattern and inheritance-based pattern matching for List based trees.


Web Compiler Service
wcs This is the homepage of wcs, a servlet to provide preprocessing, Java-based compilation, and execution in an applet.


Java Compiler Service
js Deprecated, replaced by wcs.


Jag Service
gs Deprecated, replaced by wcs.


Other Packages


Language Processing is a tutorial introducing a few tools which should facilitate the object-oriented implementation of language processors.

The tutorial examples can be downloaded from here. Additionally, the tools need to be downloaded from their individual homepages. They should be installed as siblings of the tutorial steps.

The Compilation Process

Here is an overview of what goes on inside a compiler:

In several steps input characters are turned into a sequence of terminal symbols and a symbol table and then into a tree representing the program expressed by the input characters. The tree can then be converted into something that is more or less executable and will produce the effect expected by the program.

The compilation phases run more or less in parallel, i.e., the intermediate data may or may not be available explicitly. How compilation progresses depends on the implemented language, e.g., if declare before use is required (as in C) or not (as in Java, mostly).

The Role of pj

pj helps implementing lexical analysis, syntax analysis, and the parse tree, all just based on an annotated grammar.

If requested, pj creates input for and executes JLex, a program that generates a scanner from a table of regular expressions and actions programmed in Java. The actions assemble symbols from input characters and may include maintaining symbol information in a Map-based datastructure.

pj creates input for and executes jay, Berkeley yacc retargeted to Java, a program that generates a parser from a table of grammar rules expressed in BNF and actions programmed in Java. pj creates an interface describing the actions. The actions are executed once the input symbols from the scanner match rules of the grammar and they usually create a parse tree describing a user program.

While jay requires BNF input, pj also supports notations for iterative and optional constructs and expands them as BNF. If requested, the iterative constructs will include robust error recovery.

If requested, pj implements the action interface with a factory and classes that implement a parse tree and a structured dump.

Finally, pj generates test drivers for the scanner and the parser which can include the graphical animation jay.yydebug.

The Role of jag

A compiler consists of a frontend and a backend. The frontend analyzes the source program and creates an intermediate representation. The backend processes the intermediate representation, e.g., to perform further analysis, determine runtime memory allocation, generate machine code, or interpret the program.

pj facilitates writing frontends that represent a source program as a tree using subclasses of ArrayList. A backend then has to traverse such a tree and make decisions based on the branch node classes and leaf values. jag supports a pattern/action paradigm to implement the tree traversal.

jag is a preprocessor; actions are programmed in Java. In the actions jag expands shorthand notations for printing and for recursive calls on the pattern selection mechanism.

A pattern investigates the classes of a branch node and its descendants; inheritance is used as a wildcard mechanism that encourages a programming style where patterns for more specific classes override patterns for more general classes. A tree traversal can first be implemented with a few general patterns and later improved by adding patterns for special cases.

Future Directions

jag could generate several visitors from a single input file.

jag might benefit from generic node classes which pj would have to create.

It is quite simple to convert between a tree and an XML document. This can support near automatic generation of deeply nested XML documents from structured data, i.e., a return of little languages for structured data. It also poses the question how XSL could be leveraged in language processing -- after all, XSL programming can be based on a pattern/action paradigm.

While wcs can spoof System in every package using a delegate to a singleton in a well-known package, the implementation should first attempt to take advantage of a security policy, if any, which might permit things like System.setIn(

In pj /+ should mark a list of one or more elements with some delimeter and /* should mark a delimited list which can be empty.

jag should support null correctly: if not specified in the input, an optional item will be represented by pj as null.

1.3, December 2005.
Axel T. Schreiner.