Parsing
The first step towards compiling a
CArray
program is to implement
a parser that recognizes CArray
code. In particular, the parser
must recognize the additional const
keyword introduced for
CArray
.
Copying an existing parser and modifying its code to add a new grammar rule is
a direct violation of scalable extensibility. Recall that, to respect scalable
extensibility, we would like to augment the existing parser as little
as possible, the amount of addition proportional to the amount of syntactic
difference, namely, the addition of
const
keyword.
Polyglot makes this proportional addition possible by providing
PPG (Polyglot Parser Generator),
a parser generator for extensible grammars.
PPG is based on
the CUP LALR
parser generator for Java and provides the ability to extend an existing
language grammar written in CUP or PPG. Polyglot also provides an extended
version of CUP that improves debugging support, giving counterexamples when
shift/reduce or reduce/reduce conflicts arise in a grammar.
This section gives an overview of the structure of a PPG grammar file and
shows how to write one for our
CArray
extension.
A more detailed
description of PPG is available at the Polyglot web site.
Overview of a PPG file
The generated code for the empty extension includes a template PPG file, which
has the following structure:
Extending a parser
For
CArray
, the const
keyword signifies a different
array type. Therefore, we would like to modify the parser to recognize this
new type by extending the grammar to handle constant-array types. First, let us
look at the existing grammar for traditional array types, defined in
polyglot/parse/java12.cup
in the src
directory of the
Polyglot code base:
1 array_type ::= 2 // array of primitive types such as int[] and char[][] 3 primitive_type:a dims:b {: 4 // ... 5 :} 6 // array of object types such as String[] and Object[][] 7 | name:a dims:b {: 8 // ... 9 :} 10 ;At parsing, array types can be classified into two categories, depending on the ultimate base type, i.e., the type before all the square brackets. For example, the ultimate base of
char[][]
is char
. When
extending the grammar to handle the const
keyword, we will also
follow this classification.
For each of the possible productions, i.e., the right-hand side of a
grammar rule, an action, specified between a pair of decorated braces,
is a piece of Java code that turns the production into a desired object
corresponding to the nonterminal being defined. The actions for array types are
listed below:
1 array_type ::= 2 // array of primitive types such as int[] and char[][] 3 primitive_type:a dims:b {: 4 RESULT = parser.array(a, b.intValue()); 5 :} 6 // array of object types such as String[] and Object[][] 7 | name:a dims:b {: 8 RESULT = parser.array(a.toType(), b.intValue()); 9 :} 10 ;Each token in a production can be given a name following a colon after the token. For example,
primitive_type:a
means that the result of
parsing primitive_type
is stored in a variable named
a
. The scope of a token variable is the action block following its
corresponding production. Each action can refer to these token variables. The
result of the action, which corresponds to the nonterminal being defined, is
assigned to variable RESULT
.
For example, the second production of array_type
contains an action
that converts a name to a type (a.toType()
), unboxes dimensions,
which is an Integer
, into an int
(unnecessary as of
Java 5), and then uses an auxiliary method array
defined within the
parser to construct the desired TypeNode
, which is the declared
type of array_type
.
The auxiliary method
array
is used to handle multidimensional
arrays properly. As opposed to the ultimate base of an array type, the
base of an array type has one fewer dimension than the array type
itself. For example, the base of char[][]
is char[]
.
The auxiliary method creates appropriate array types in these cases. When
dealing with constant arrays, a similar auxiliary method is also necessary. Let
us implement this method now:
Notice that the
CArrayNodeFactory
must define a factory method that
constructs a constant-array type node. We will do this in the next section.
For now, we have enough machinery to parse constant-array types. Let us define
these productions by extending the existing production rules for
array_type
:
extend array_type ::= // RESULT of array_type is a TypeNode primitive_type:a CONST dims:b {: RESULT = parser.constArray(a, b); :} | name:a CONST dims:b {: RESULT = parser.constArray(a.toType(), b); :} ;Notice the addition of terminal token
CONST
, which represents the
keyword const
, to differentiate the new productions from the old.
The actions also take advantage of autounboxing for the dimensions.
Exercise
The grammar for casting does not use nonterminal
array_type
to
circumvent ambiguity. Here is the definition of cast_expression
from the Java grammar:
cast_expression ::= // RESULT of cast_expression is a Cast LPAREN:p primitive_type:a dims_opt:b RPAREN unary_expression:c {: RESULT = parser.nf.Cast(parser.pos(p, c,a), parser.array(a, b.intValue()), c); :} | LPAREN:p expression:a RPAREN unary_expression_not_plus_minus:b {: RESULT = parser.nf.Cast(parser.pos(p, b,a), parser.exprToType(a), b); :} | LPAREN:p name:a dims:b RPAREN unary_expression_not_plus_minus:c {: RESULT = parser.nf.Cast(parser.pos(p, c,a), parser.array(a.toType(), b.intValue()), c); :} ;Extend
cast_expression
to enable casting to constant array types.
Hint:
+ Reveal...
Here is the documentation for the extended definition:
Solution:
+ Reveal...
1 extend cast_expression ::= // RESULT of cast_expression is a Cast 2 LPAREN:p primitive_type:a CONST dims:b RPAREN unary_expression:c {: 3 RESULT = parser.nf.Cast(parser.pos(p, c, a), parser.constArray(a, b), c); 4 :} 5 | LPAREN:p name:a CONST dims:b RPAREN unary_expression_not_plus_minus:c {: 6 RESULT = parser.nf.Cast(parser.pos(p, c, a), 7 parser.constArray(a.toType(), b), c); 8 :} 9 ;
The final PPG file for
CArray
is as follows:
+ Reveal...
Rebuilding
CArray
now will fail, because we have not defined
appropriate methods in the node factory. We will do this next.