Parsing

The first step towards compiling a CArray program is to implement a parser that recognizes CArray code. In particular, the parser must recognize the additional const keyword introduced for CArray.
Copying an existing parser and modifying its code to add a new grammar rule is a direct violation of scalable extensibility. Recall that, to respect scalable extensibility, we would like to augment the existing parser as little as possible, the amount of addition proportional to the amount of syntactic difference, namely, the addition of const keyword.
Polyglot makes this proportional addition possible by providing PPG (Polyglot Parser Generator), a parser generator for extensible grammars. PPG is based on the CUP LALR parser generator for Java and provides the ability to extend an existing language grammar written in CUP or PPG. Polyglot also provides an extended version of CUP that improves debugging support, giving counterexamples when shift/reduce or reduce/reduce conflicts arise in a grammar.
This section gives an overview of the structure of a PPG grammar file and shows how to write one for our CArray extension. A more detailed description of PPG is available at the Polyglot web site.

Overview of a PPG file

The generated code for the empty extension includes a template PPG file, which has the following structure:

Extending a parser

For CArray, the const keyword signifies a different array type. Therefore, we would like to modify the parser to recognize this new type by extending the grammar to handle constant-array types. First, let us look at the existing grammar for traditional array types, defined in polyglot/parse/java12.cup in the src directory of the Polyglot code base:
 1 array_type ::=
 2     // array of primitive types such as int[] and char[][]
 3     primitive_type:a dims:b {:
 4         // ...
 5     :}
 6     // array of object types such as String[] and Object[][]
 7   | name:a dims:b {:
 8         // ...
 9     :}
10 ;
At parsing, array types can be classified into two categories, depending on the ultimate base type, i.e., the type before all the square brackets. For example, the ultimate base of char[][] is char. When extending the grammar to handle the const keyword, we will also follow this classification.
For each of the possible productions, i.e., the right-hand side of a grammar rule, an action, specified between a pair of decorated braces, is a piece of Java code that turns the production into a desired object corresponding to the nonterminal being defined. The actions for array types are listed below:
 1 array_type ::=
 2     // array of primitive types such as int[] and char[][]
 3     primitive_type:a dims:b {:
 4         RESULT = parser.array(a, b.intValue());
 5     :}
 6     // array of object types such as String[] and Object[][]
 7   | name:a dims:b {:
 8         RESULT = parser.array(a.toType(), b.intValue());
 9     :}
10 ;
Each token in a production can be given a name following a colon after the token. For example, primitive_type:a means that the result of parsing primitive_type is stored in a variable named a. The scope of a token variable is the action block following its corresponding production. Each action can refer to these token variables. The result of the action, which corresponds to the nonterminal being defined, is assigned to variable RESULT. For example, the second production of array_type contains an action that converts a name to a type (a.toType()), unboxes dimensions, which is an Integer, into an int (unnecessary as of Java 5), and then uses an auxiliary method array defined within the parser to construct the desired TypeNode, which is the declared type of array_type.
The auxiliary method array is used to handle multidimensional arrays properly. As opposed to the ultimate base of an array type, the base of an array type has one fewer dimension than the array type itself. For example, the base of char[][] is char[]. The auxiliary method creates appropriate array types in these cases. When dealing with constant arrays, a similar auxiliary method is also necessary. Let us implement this method now:

Notice that the CArrayNodeFactory must define a factory method that
constructs a constant-array type node.  We will do this in the next section.
For now, we have enough machinery to parse constant-array types.  Let us define
these productions by extending the existing production rules for
array_type:
extend array_type ::= // RESULT of array_type is a TypeNode
    primitive_type:a CONST dims:b {:
        RESULT = parser.constArray(a, b);
    :}
  | name:a CONST dims:b {:
        RESULT = parser.constArray(a.toType(), b);
    :}
;
Notice the addition of terminal token CONST, which represents the keyword const, to differentiate the new productions from the old. The actions also take advantage of autounboxing for the dimensions.

Exercise

The grammar for casting does not use nonterminal array_type to circumvent ambiguity. Here is the definition of cast_expression from the Java grammar:
cast_expression ::= // RESULT of cast_expression is a Cast
    LPAREN:p primitive_type:a dims_opt:b RPAREN unary_expression:c {:
        RESULT = parser.nf.Cast(parser.pos(p, c,a),
                parser.array(a, b.intValue()), c);
    :}
  | LPAREN:p expression:a RPAREN unary_expression_not_plus_minus:b {:
        RESULT = parser.nf.Cast(parser.pos(p, b,a), parser.exprToType(a), b);
    :}
  | LPAREN:p name:a dims:b RPAREN unary_expression_not_plus_minus:c {:
        RESULT = parser.nf.Cast(parser.pos(p, c,a),
                parser.array(a.toType(), b.intValue()), c);
    :}
;
Extend cast_expression to enable casting to constant array types.
Hint: + Reveal...
Solution: + Reveal...
The final PPG file for CArray is as follows: + Reveal...
Rebuilding CArray now will fail, because we have not defined appropriate methods in the node factory. We will do this next.

Valid HTML 4.01 Transitional Valid CSS!