Package step1

Step 1: Specifying and Checking a Grammar.


Class Summary
dummy fake class, triggers javadoc.

Package step1 Description

Step 1: Specifying and Checking a Grammar.

Arithmetic Expressions

expr.pj is an input file for pj and contains a typical grammar for constant arithmetic expressions.

The input file is minimal. It contains just enough information so that the grammar can be checked with jay. The first part of the file is empty but for a comment. A line consisting only of %% precedes the grammar rules. Each rule is terminated with a semicolon and defines a unique non-terminal which appears as left-hand side of the rule. Each rule has several right-hand sides which are separated by | and consist of a sequence of terminal and non-terminal symbols. jay counts each right-hand side individually as a rule.

A grammar is read top-down and it explains what a sequence of input symbols should look like. In this case, a sum can involve addition and subtraction or simply be a prod. A prod in turn can involve multiplication operators or be a term. A term, finally, can employ signs or parentheses, but the simplest case is just a Number. Therefore, the simplest sum is just a Number, but the grammar also covers expressions with properly nested parentheses, signs, and five different binary operators.

The grammar uses left recursion to define all operators to be left-associative. Recursion also guarantees that parentheses have to be properly nested. The grammar uses one non-terminal for each precedence level: additive, multiplicative, and sign operators. All operators are specified as single character literal terminals.

Number remains undefined because it never appears at the beginning of a rule. pj implicitly defines this identifier as a terminal and it will be discussed later how a Number is actually represented as a sequence of characters.

Grammar Check

The grammar can be checked with the following command:

$ java -jar pj.jar -jay=expr.jay -verbose expr.pj >/dev/null

This command generates two files: expr.jay contains the input which is processed by jay and y.output describes the analysis of the grammar which jay performs.

expr.pj contains a grammar that is suitable for processing with jay; therefore, the command above will not generate any error messages.

Web Compiler Service

pj can be executed through a web interface.

$ java -jar pj.jar -jay=expr.jay -verbose expr.pj >/dev/null

If the current page is viewed over a network connection each button produces a web page with (editable) input for pj. Useful processing options are preselected and the appropriate submission button is highlighted on the page.

Unsuitable Grammars

badbits.pj is another input file for pj with a grammar which is not suitable for jay.

$ java -jar pj.jar -jay -verbose badbits.pj >/dev/null
jay: 1 rule never reduced
jay: 2 shift/reduce conflicts.

This grammar describes sequences of 0 and 1 which have two or more elements. However, a parser generated by jay can only look one input symbol ahead and the grammar requires two bits at the end of the sequence. This provokes some messages from jay as shown above and y.output contains more information.

rule never reduced usually is a serious mistake. In this case jay does not use the recursive rule that allows more then two bits. This is clearly not what was intended and bits.pj contains another grammar that describes the same bit sequences and is acceptable to jay.

$ java -jar pj.jar -jay bits.pj >/dev/null

Sometimes, as in this case, a grammar can be rewritten so that it still describes the same language, i.e., the same set of sequences of terminals, and is acceptable to jay. Sometimes a grammar can be used even if jay complains.

Shift/Reduce Conflict

if.pj contains a grammar for a Java-style if statement. In this grammar, only statement is a non-terminal, pj will default condition and expression to be terminals, and the remaining symbols are quoted literals and as such terminals, too. This grammar is not quite suitable for jay:

$ java -jar pj.jar -jay -verbose if.pj >/dev/null
jay: 1 shift/reduce conflict.

Searching y.output reveals the problem:

7: shift/reduce conflict (shift 8, reduce 1) on __else
state 7
        statement : __if condition __then statement .  (1)
        statement : __if condition __then statement . __else statement  (2)

        __else  shift 8
        $end  reduce 1
jay's analysis simulates how input can be matched to the grammar. The period above marks how far a match has progressed; in this situation the input thus far is a complete if statement without else.

The message indicates that there is a problem if the next input symbol is else: one could reduce the if statement already as a complete statement and hope that the else matches an encompassing if statement, or one could add the else to the current if statement. This situation, reduce using a completed rule or shift, i.e., add more input to satisfy a longer rule, is a typical grammar problem.

It turns out that this particular conflict and it's resolution by jay is in fact acceptable. The table above shows that jay will arrange to accept the else and therefore add it to the innermost if statement; that is exactly what a typical language definition such as Java demands.

Reduce/Reduce Conflict

Another kind of conflict is likely to cause more grief. reduce.pj uses the grammar for arithmetic expressions introduced above and adds a rule intended to describe a simplified expression:

statement: sum
  | Number;

sum: ...

term: ...
  | Number
This provokes another message from jay:

$ java -jar pj.jar -jay -verbose reduce.pj >/dev/null
jay: 1 reduce/reduce conflict.

Searching y.output, again, provides a more detailed explanation:

1: reduce/reduce conflict (reduce 2, reduce 13) on $end
state 1
        statement : Number .  (2)
        term : Number .  (13)

        $end  reduce 2
        '+'  reduce 13
If the input consists only of a Number, jay can arrange that it is recognized as a term or immediatley as a statement. Such a situation is known as a reduce/reduce conflict and jay will resolve it in favour of the first rule in the grammar that fits.


In summary, grammars can easily be checked using pj with the options -jay and -verbose. A grammar may or may not be acceptable to jay. If it is not one should try to rewrite the grammar to avoid complaints from jay. However, shift/reduce conflict resolution is likely to be acceptable because it arranges for the longest possible input sequence to be accepted whereas reduce/reduce conflict resolution needs to be inspected very carefully because it relies on the order of the rules in the grammar which may be somewhat arbitrary.