Language Processing
v2.0

Language Processing v2.0

Language Processing is an interactive tutorial about compiler construction and describes tools and techniques which facilitate the object-oriented implementation of language processors.

See:
          Description

Tutorial
step01 1: How to structure a compiler.
step02 2: How to define languages.
step03 Step 2: How to check grammars.
step04 Step 3: How to recognize symbols.
step05 Step 4: How to recognize phrases.
step06 Step 5: How to build trees.
step07 Step 6: How to interpret programs.
step08 Step 7: How to check types.

 

Tools
pj2 Compiler frontend generator.
vig Visitor generator.
wcs Web compiler service.
xv XML Visitor generator.

 

Language Processing is an interactive tutorial about compiler construction and describes tools and techniques which facilitate the object-oriented implementation of language processors.

This introduction briefly discusses the structure of a compiler and what role programming tools play in the construction of a compiler. Step 1 describes the implementation of an adding machine as a rudimentary example for hand-coding a compiler. The remaining steps look at the tools and programming techniques in more detail. The goal of the tutorial is to enable the reader to use professional compiler tools to at least write interpreters for small languages.

The examples in this tutorial can be compiled using the web compiler service and executed as applets in a Java-enabled browser. Alternatively, they can be downloaded from here. For local compilation the tools need to be downloaded from their individual homepages and installed as siblings of the tutorial steps.

The compilation process

Here is an overview of what goes on inside a compiler:

compiler overview

In several steps input characters are partitioned into a sequence of tokens and a symbol table and then into a tree representing the program expressed by the input characters. The tree can then be converted into something that is more or less executable and will produce the effect expected of the program.

The steps of the compilation process run more or less in parallel, i.e., the intermediate data may or may not be available explicitly. How the compilation progresses depends on the implemented language, e.g., whether declare before use is required (as in C) or not (as in Java, for the most part).

The role of a frontend generator

A compiler consists of a frontend and a backend. The frontend is responsible for analysis, i.e., it must decide if the input characters express a legitimate program. If so, the backend is responsible for synthesis, i.e., it must represent the program in some executable or interpretable format.

Frontend generators such as pj2 or oops3 help in implementing input partitioning (lexical analysis), checking program structure (syntax analysis), and representing the program as a tree, all just based on an annotated grammar of the programming language:

frontend generator

If requested, pj2 or oops3 creates input for and executes JLex, a Java program which generates a scanner (class for lexical analysis) from a table of regular expressions and actions written in Java. The actions discover tokens within input characters and maintain information in a Map-based datastructure.

pj2 creates input for and executes jay, Berkeley yacc retargeted to Java, a program which generates a parser (class for syntax analysis) from a table of grammar rules and actions written in Java. pj2 creates an interface describing the actions. The actions are executed once the tokens from the scanner match rules of the grammar.

The actions often represent a user program as a parse tree for processing by the compiler backend. If requested, pj2 implements the action interface with a factory and List-based classes to represent the parse tree and provide a structured dump.

oops3 represents grammar rules as a tree and provides visitors for the tree which implement the same functionality as pj2. Either tool can implement test scaffolding for the generated scanner, parser, and tree builder. Many other frontend generators exist; wcs can be used to experiment with some of them in a web browser.

The role of a visitor generator

A compiler consists of a frontend and a backend. Usually, the frontend analyzes the source program and creates an intermediate representation. The backend processes the intermediate representation, e.g., to perform further analysis, determine runtime memory allocation, generate machine code, or interpret the program.

Frontend generators such as pj2 and oops3 facilitate writing frontends which represent a source program as a tree using List-based classes. A backend then has to traverse such a tree and make decisions based on the branch node classes and leaf values. The backend if often implemented using the visitor pattern; however, the visitor pattern tends to be cumbersome to extend once more classes are added to the tree representation.

A visitor generator such as vig facilitates implementing a backend. vig is a preprocessor with a pattern/action paradigm; actions are programmed in Java. In the actions vig expands shorthand notations for printing and for recursive calls on the pattern selection mechanism.

A pattern investigates the classes of a branch node and its descendants; inheritance is used as a wildcard mechanism that encourages a programming style where patterns for more specific classes override patterns for more general classes. A tree traversal can first be implemented with a few general patterns and later improved by adding patterns for special cases.



(c) 2008 Axel T. Schreiner