An Object-Oriented Parser Generator

Axel-Tobias Schreiner

Department of Mathematics and Computer Science

University of Osnabrück, Germany

http://www.inf.uos.de/axel/talks/oops/ [ pdf ]

oops is a language-independent parser generator implemented in Java. It produces parsers from LL(1) grammars expressed in Extended Backus-Naur Form and stores the parsers as persistent objects. The talk discusses the design and implementation of oops and shows how oops was used in a compiler construction course to discover algorithms for grammar analysis, parsing, and semantic actions.

Contents

See Objects in compiler construction.

See Objects as a parser.

See Grammar analysis.

See Semantic actions.

See Conclusions.

Links

oops Homepage

Java, Objects, and Tools in Compiler Construction

German lecture notes

Objects in compiler construction

 

symbol table and symbols can be objects

tree can consist of objects --
messages for semantic analysis and code generation

image can consist of (portable) persistent objects --
messages for interpretation and/or code generation

Objects as a parser

 

processor is based on specification, takes sentence to tree

execute tree to do what sentence intends

 

intent of a grammar ought to be language recognition

Wirth's generalparser traverses a grammar graph to attempt recognition;
generalparser does not know if grammar is suitable for it's algorithm

Classes for EBNF

grammar

 

Parser

a b

 

Seq

a : b

 

Rule

a | b | c

 

Alt

"begin"

 

Lit

[ a ]

 

Opt

ID

 

Token

{ a }

 

Some

term

 

Id

[{ a }]

 

Many

Wirth pioneered flowchart-style syntax graphs to depict EBNF;
EBNF is rather restrictive -- this is better modeled with Nassi-Shneidermann flowcharts

classes must model the building blocks of a syntax graph;
classes must provide methods for checking (grammar analysis) and parsing (execution)

Grammar analysis

parsing by recursive descent amounts to traversing a syntax graph to perform recognition;
when is the traversal deterministic, i.e., when is the grammar not ambiguous?

Lookahead sets

 

 

 

 

 

 

Some

Alt

Opt

Many

Seq

 

lookahead sets are persistent and control parsing

Follow sets

 

 

 

 

 

 

Some

Alt

Opt

Many

Seq

 

follow sets are transient

a Rule starts out with an empty follow set;
computation iterates while non-terminals acquire larger follow sets (closure) within Seq

Checking

 

 

 

 

 

 

Some

Alt

Opt

Many

Seq

 

grammar is suitable if all decisions are deterministic;
decided by comparing lookahead and follow sets as required

marking algorithms decide if grammar is connected and if there is no infinite recursion;
left recursion is detected when lookahead sets are computed

Semantic actions

yacc

 

 

n : a b { $$ = $1 + $2; }

oops

yacc 's actions access value stack when phrase is reduced;
values are pushed onto the stack when input is shifted or after phrase is reduced

 

Rule receives parse() and creates object implementing Goal from non-terminal's class;
Goal receives shift() from Lit , Token , and completed Rule ;
finally, Rule sends reduce() and Goal 's result is shifted for Rule to superior Goal

user must provide Goal classes for non-terminals;
retargeting is possible by changing packages

GoalAdapter and GoalDebugger are trivial default implementations;
Reduce is an alternative interface providing only an array of shifted values

Conclusions

oops in teaching

yacc has the better algorithm but oops can be understood

algorithms for checking and parsing can be discovered;
they rely entirely on localized reasoning

oops in production

EBNF (iteration) seems more natural than BNF (recursion)

oops completely separates grammar and actions -- easier documentation, no new syntax

lexical analysis interface needs some more design work

research questions

automatic error recovery based on lookahead and follow sets and a Java programming technique to pass exceptions up and back down the activation stack

objects rather than regular expressions for lexical analysis

targeting to our CompilerKit, an extensible class hierarchy for semantic analysis and code and interpreter generation