Overriding Operations on the AST
In addition to syntactic changes, programming language extensions also modify
the semantics of the existing syntactic constructs. For example, in
the
CArray extension, assignments to an element of an array is
prohibited if the array is constant. This section describes how extension
objects accommodate semantic changes.
Changing the language semantics involves the following modifications to the
existing compiler:
- Override methods that enforce semantics in the base language by creating an extension object class for each AST node with semantic changes.
- Modify the extension factory to override an auxiliary method that returns an instance of the extension object class created.
When
ConstArrayTypeNode was added in an earlier section, two
methods are overridden to associate a type object with the node and to determine
when the type object is canonical. These methods are invoked by a
node visitor that traverses the AST to perform a desired operation.
For example, buildTypes is called by a TypeBuilder,
and disambiguate is called by an AmbiguityRemover.
Package polyglot.visit contains many node visitors, most used by
the core Polyglot compiler passes (such as the two mentioned above), and others
primarily for optimizations (such as ExpressionFlattener, which
rewrites expressions in the AST to reduces the complexity of programs).
Each node visitor is a subclass of
NodeVisitor in package
polyglot.visit. This class defines these core methods for visiting
the AST:
begin, called before the entire tree is visited. This method allows the visitor to perform any initialization that cannot be done when the visitor is created.end, called after the entire tree has been visited. This method allows the visitor to perform any last minute cleanup, including flushing buffers and I/O connections.entergives the visitor the option of changing internal state or returning a new visitor which will be used to visit the children of a given AST node.leave, called after all of the children of a given AST node have been visited to provide the visitor with the last opportunity to modify the AST node and all its descendants.overridegives the visitor an option of overriding all traversal of a given AST node and all its descendants.
begin- Given a node
nat the root of the current AST, one of these two cases:- Overriding traversal of
n. This is achieved by having the visitor'soverridemethod return a non-null value, which becomes the new node replacingn - Normal traversal, consisting of these steps:
enter, preparing to visit children ofn- Recursively visit (step 2) each child of
n leave, finishing visitingn
- Overriding traversal of
end
n visited in step 2 is determined by
the method visitChildren(), which every AST node class must implement.
For example, ConstArrayTypeNode_c inherits this implementation from
ArrayTypeNode_c:
@Override
public Node visitChildren(NodeVisitor v) {
TypeNode base = visitChild(this.base, v);
return reconstruct(this, base);
}
Each constant-array type node has a position, the location of the text
corresponding to the type node in the input source file, and the type node for
the base type of the array. This implementation of visitChildren
elects not to visit the position, because it never changes. In fact, positions
are never visited by any AST node by default. On the other hand, the type node
for the base type (this.base) is visited using the visitor
v; the result of the visit is stored in local variable
base. Finally, the method returns the reconstruction of
the AST node, potentially replacing the initial base with the visited base,
making a copy if there is a change to maintain the invariant that AST nodes are
immutable.
Each implementation of a visitor overrides methods
enter,
leave, and possibly override to invoke different
methods defined in AST classes to fulfill the specific purposes of these
visitors. For example, in TypeBuilder, enter calls
buildTypesEnter, and leave calls
buildTypes; in AmbiguityRemover enter
calls disambiguateEnter, leave calls
disambiguate, and override calls
disambiguateOverride.
We have seen that buildTypes and disambiguate are
dispatched from two different visitors. The following table lists some notable
methods:
| Visitor | enter |
leave |
override |
|---|---|---|---|
TypeBuilder |
buildTypesEnter |
buildTypes |
N/A |
AmbiguityRemover |
disambiguateEnter |
disambiguate |
disambiguateOverride |
TypeChecker |
typeCheckEnter |
typeCheck |
typeCheckOverride |
ExceptionChecker |
exceptionCheckEnter |
exceptionCheck |
N/A |
This approach of dispatching operations back to AST nodes
avoids an extensibility problem with Gang-of-4 visitors, which do not allow
new node types to be added in a scalable, modular way. Because operations are implemented
with AST nodes, the code associated with a given AST node is all found in that
node's implementation or in the implementations of its extension objects.
However, the code associated with a given compiler pass is distributed across
the various nodes that participate in an interesting way in that pass.
Nodes that do not participate in a compiler pass in a way more
interesting than allowing their children to be rewritten by the pass can
simply inherit the default implementation of that pass's operations.
Polyglot implements its own mechanism for dispatching operations to either
node objects or their extension objects.
When an operation defined in an AST node class must be redefined, the
overriding definition is implemented in an extension class, whose instances are
attached to the base AST node to enable the override. Java's inheritance
mechanism is insufficient to both supply the override and enable scalable
extensibility. For instance, suppose we would like to implement a language
extension that skips type checking. The standard inheritance mechanism would
require all existing node classes be overridden with a type-checking method that does
nothing, whereas Polyglot requires that the default extension class for the language
override the type-checking method with one that does nothing, and attaches
an extension object created from this class to every AST node. In short, using
inheritance to mix in new functionality requires changing many classes, while the
extension object design pattern means Polyglot requires just one.
Implementing extension object classes
We are now ready to detect and flag assignments to an element of a constant
array as errors. This check is performed during type checking, when the AST
is visited by a
TypeChecker. In Polyglot, assignments to array
elements are represented by interface ArrayAccessAssign.
Therefore, we need to override the implementation in
ArrayAccessAssign_c. To do this, we will create extension class
CArrayArrayAccessAssignExt whose instances will be attached to
instances of ArrayAccessAssign_c, starting with the following
skeleton:
package carray.ast;
import polyglot.util.SerialVersionUID;
/**
* This class provides an extension object to ArrayAssign to implement the
* restriction that elements of a const array cannot be modified.
*/
public class CArrayArrayAccessAssignExt extends CArrayExt {
// Extension objects are also serializable.
private static final long serialVersionUID = SerialVersionUID.generate();
}
The naming convention for extension classes is to begin with the extension name
(e.g., CArray), followed by the name of the overridden interface
(e.g., ArrayAccessAssign), and end with Ext.
To implement the check for illegal assignments, we will override method
typeCheck in this extension class. Overriding this method, rather
than other methods, ensures that all the children of the assignments are
assigned appropriate type objects, which are crucial for the check itself. To
start, let us begin with the method skeleton:
/* Also add these imports:
* import polyglot.ast.Node;
* import polyglot.types.SemanticException;
* import polyglot.visit.TypeChecker;
*/
@Override
public Node typeCheck(TypeChecker tc) throws SemanticException {
// TODO: Implement this method.
}
The method throws a SemanticException when there is a semantic
error. In particular, the error will be thrown in this exception. When the
AST node is free of errors, this method returns the type-checked node, namely,
the AST node with type information attached to it.
Checking illegal assignments entails detecting that the type of the left-hand
side of the assignments, which is always an
ArrayType, is in fact a
ConstArrayType.
First, we need to retrieve the AST node associated with this extension. Use
instance method node to do this:
ArrayAccessAssign n = (ArrayAccessAssign) this.node();
Second, we retrieve the left-hand side of the assignment, which is an
ArrayAccess:
ArrayAccess left = n.left();
Third, we retrieve the array whose element is being assigned to:
Expr array = left.array();
Now, if the type of the array is ConstArrayType, issue an error:
if (array.type() instanceof ConstArrayType) {
throw new SemanticException("Cannot assign a value to an element of a const array.",
n.position());
}
Otherwise, type checking remains the same as in the base language, so we
delegate the rest of type checking to the base language. Method
superLang returns a language dispatcher of the language
extended by this extension. The language dispatcher has methods for operations
on the AST similar to what we have seen, but with the additional parameter of
the AST node to perform the operation on. In our case, it is the assignment
itself:
return superLang().typeCheck(n, tc);
Language dispatchers select AST nodes or extension objects thereof containing
the appropriate implementation of AST operations for a given language extension.
We will explore language dispatchers in more detail in a later section. For
now, here is the full implementation of typeCheck:
/* Also add these imports:
* import polyglot.ast.ArrayAccess;
* import polyglot.ast.ArrayAccessAssign;
* import polyglot.ast.Expr;
* import polyglot.ast.Node;
* import polyglot.types.SemanticException;
* import polyglot.visit.TypeChecker;
* import carray.types.ConstArrayType;
*/
@Override
public Node typeCheck(TypeChecker tc) throws SemanticException {
// Suppose n is a[2] = 3;
ArrayAccessAssign n = (ArrayAccessAssign) this.node();
// Then left is a[2].
ArrayAccess left = n.left();
// And array is a.
Expr array = left.array();
// If the type of the array is a ConstArrayType, then this assignment
// is illegal.
if (array.type() instanceof ConstArrayType) {
throw new SemanticException("Cannot assign a value to an element of a const array.",
n.position());
}
// Let the base language deal with the default type checking.
return superLang().typeCheck(n, tc);
}
Exercise
Method
throwTypes returns a list of Types of
exceptions that might get thrown from an AST node at run time. This method is
called when the ExceptionChecker verifies that a method or a
constructor throws only exceptions it declares to throw. Here is the
implementation of throwTypes for ArrayAccessAssign in
Java 1.4:
Override method
throwTypes in
CArrayArrayAccessAssignExt to indicate that
ArrayStoreException is no longer possible to get thrown when
assigning to an array element.
The final implementation of
CArrayArrayAccessAssignExt is as
follows:
+ Reveal...
Modifying extension factory
Once an extension class has been implemented, its instance must be created and
attached to an AST node for the overridden AST operations to take effect.
As we have seen, the extension factory is responsible for creating extension
objects, and the node factory is responsible for attaching extension objects to
AST nodes. The abstract extension factory deals with arranging extension
objects in a correct order, leaving two auxiliary methods overridable by its
subclasses. Therefore, we must modify the concrete extension factory
to instantiate the extension class.
The extension generator already created the template concrete extension factory
for
CArray:
CArrayExtFactory_c.javaThe class is declared final because a concrete extension factory is proprietary to a specific language extension, so it should not be extended or reused by further extensions of the language extension. The two constructors are standard, following ones declared in the abstract extension factory. The auxiliary method
extNodeImpl supplies the
default extension object for CArray. The default extension object
simply forwards all the AST operations to the language being extended, unless
the implementation of the default extension class is overridden otherwise. Now,
we want operations on ArrayAccessAssign, namely, type checking, to
not default to the base language. We can do this by supplying a different
extension object for ArrayAccessAssign, namely, an instance of
CArrayArrayAccessAssignExt we just created. As a result, we
override the auxiliary method that supplies extension objects for assignments
to array elements:
@Override
protected Ext extArrayAccessAssignImpl() {
// We want to override some operations for ArrayAccessAssign, so
// return an extension object specific to carray.
return new CArrayArrayAccessAssignExt();
}
Here is the implementation of
CArrayExtFactory_c so far:
package carray.ast;
import polyglot.ast.Ext;
import polyglot.ast.ExtFactory;
/**
* This class defines extension factory methods specifically for carray
* extension (and not for extensions of carray).
*/
public final class CArrayExtFactory_c extends CArrayAbstractExtFactory_c {
public CArrayExtFactory_c() {
super();
}
public CArrayExtFactory_c(ExtFactory nextExtFactory) {
super(nextExtFactory);
}
@Override
protected Ext extNodeImpl() {
// Return a default extension object for this extension, which simply
// forward any operation to the base language.
return new CArrayExt();
}
@Override
protected Ext extArrayAccessAssignImpl() {
// We want to override some operations for ArrayAccessAssign, so
// return an extension object specific to carray.
return new CArrayArrayAccessAssignExt();
}
}
Later on, this class will be modified to handle more operations for other
compiler passes in CArray. Running the test harness should yield a
better result, but more is to be done, especially for operations on types. We
will do this in the next section.