Overriding Operations on the AST
In addition to syntactic changes, programming language extensions also modify
the semantics of the existing syntactic constructs. For example, in
the
CArray
extension, assignments to an element of an array is
prohibited if the array is constant. This section describes how extension
objects accommodate semantic changes.
Changing the language semantics involves the following modifications to the
existing compiler:
- Override methods that enforce semantics in the base language by creating an extension object class for each AST node with semantic changes.
- Modify the extension factory to override an auxiliary method that returns an instance of the extension object class created.
When
ConstArrayTypeNode
was added in an earlier section, two
methods are overridden to associate a type object with the node and to determine
when the type object is canonical. These methods are invoked by a
node visitor that traverses the AST to perform a desired operation.
For example, buildTypes
is called by a TypeBuilder
,
and disambiguate is called by an AmbiguityRemover
.
Package polyglot.visit
contains many node visitors, most used by
the core Polyglot compiler passes (such as the two mentioned above), and others
primarily for optimizations (such as ExpressionFlattener
, which
rewrites expressions in the AST to reduces the complexity of programs).
Each node visitor is a subclass of
NodeVisitor
in package
polyglot.visit
. This class defines these core methods for visiting
the AST:
begin
, called before the entire tree is visited. This method allows the visitor to perform any initialization that cannot be done when the visitor is created.end
, called after the entire tree has been visited. This method allows the visitor to perform any last minute cleanup, including flushing buffers and I/O connections.enter
gives the visitor the option of changing internal state or returning a new visitor which will be used to visit the children of a given AST node.leave
, called after all of the children of a given AST node have been visited to provide the visitor with the last opportunity to modify the AST node and all its descendants.override
gives the visitor an option of overriding all traversal of a given AST node and all its descendants.
begin
- Given a node
n
at the root of the current AST, one of these two cases:- Overriding traversal of
n
. This is achieved by having the visitor'soverride
method return a non-null value, which becomes the new node replacingn
- Normal traversal, consisting of these steps:
enter
, preparing to visit children ofn
- Recursively visit (step 2) each child of
n
leave
, finishing visitingn
- Overriding traversal of
end
n
visited in step 2 is determined by
the method visitChildren()
, which every AST node class must implement.
For example, ConstArrayTypeNode_c
inherits this implementation from
ArrayTypeNode_c
:
@Override public Node visitChildren(NodeVisitor v) { TypeNode base = visitChild(this.base, v); return reconstruct(this, base); }Each constant-array type node has a position, the location of the text corresponding to the type node in the input source file, and the type node for the base type of the array. This implementation of
visitChildren
elects not to visit the position, because it never changes. In fact, positions
are never visited by any AST node by default. On the other hand, the type node
for the base type (this.base
) is visited using the visitor
v
; the result of the visit is stored in local variable
base
. Finally, the method returns the reconstruction of
the AST node, potentially replacing the initial base with the visited base,
making a copy if there is a change to maintain the invariant that AST nodes are
immutable.
Each implementation of a visitor overrides methods
enter
,
leave
, and possibly override
to invoke different
methods defined in AST classes to fulfill the specific purposes of these
visitors. For example, in TypeBuilder
, enter
calls
buildTypesEnter
, and leave
calls
buildTypes
; in AmbiguityRemover
enter
calls disambiguateEnter
, leave
calls
disambiguate
, and override
calls
disambiguateOverride
.
We have seen that buildTypes
and disambiguate
are
dispatched from two different visitors. The following table lists some notable
methods:
Visitor | enter |
leave |
override |
---|---|---|---|
TypeBuilder |
buildTypesEnter |
buildTypes |
N/A |
AmbiguityRemover |
disambiguateEnter |
disambiguate |
disambiguateOverride |
TypeChecker |
typeCheckEnter |
typeCheck |
typeCheckOverride |
ExceptionChecker |
exceptionCheckEnter |
exceptionCheck |
N/A |
This approach of dispatching operations back to AST nodes
avoids an extensibility problem with Gang-of-4 visitors, which do not allow
new node types to be added in a scalable, modular way. Because operations are implemented
with AST nodes, the code associated with a given AST node is all found in that
node's implementation or in the implementations of its extension objects.
However, the code associated with a given compiler pass is distributed across
the various nodes that participate in an interesting way in that pass.
Nodes that do not participate in a compiler pass in a way more
interesting than allowing their children to be rewritten by the pass can
simply inherit the default implementation of that pass's operations.
Polyglot implements its own mechanism for dispatching operations to either
node objects or their extension objects.
When an operation defined in an AST node class must be redefined, the
overriding definition is implemented in an extension class, whose instances are
attached to the base AST node to enable the override. Java's inheritance
mechanism is insufficient to both supply the override and enable scalable
extensibility. For instance, suppose we would like to implement a language
extension that skips type checking. The standard inheritance mechanism would
require all existing node classes be overridden with a type-checking method that does
nothing, whereas Polyglot requires that the default extension class for the language
override the type-checking method with one that does nothing, and attaches
an extension object created from this class to every AST node. In short, using
inheritance to mix in new functionality requires changing many classes, while the
extension object design pattern means Polyglot requires just one.
Implementing extension object classes
We are now ready to detect and flag assignments to an element of a constant
array as errors. This check is performed during type checking, when the AST
is visited by a
TypeChecker
. In Polyglot, assignments to array
elements are represented by interface ArrayAccessAssign
.
Therefore, we need to override the implementation in
ArrayAccessAssign_c
. To do this, we will create extension class
CArrayArrayAccessAssignExt
whose instances will be attached to
instances of ArrayAccessAssign_c
, starting with the following
skeleton:
package carray.ast; import polyglot.util.SerialVersionUID; /** * This class provides an extension object to ArrayAssign to implement the * restriction that elements of a const array cannot be modified. */ public class CArrayArrayAccessAssignExt extends CArrayExt { // Extension objects are also serializable. private static final long serialVersionUID = SerialVersionUID.generate(); }The naming convention for extension classes is to begin with the extension name (e.g.,
CArray
), followed by the name of the overridden interface
(e.g., ArrayAccessAssign
), and end with Ext
.
To implement the check for illegal assignments, we will override method
typeCheck
in this extension class. Overriding this method, rather
than other methods, ensures that all the children of the assignments are
assigned appropriate type objects, which are crucial for the check itself. To
start, let us begin with the method skeleton:
/* Also add these imports: * import polyglot.ast.Node; * import polyglot.types.SemanticException; * import polyglot.visit.TypeChecker; */ @Override public Node typeCheck(TypeChecker tc) throws SemanticException { // TODO: Implement this method. }The method throws a
SemanticException
when there is a semantic
error. In particular, the error will be thrown in this exception. When the
AST node is free of errors, this method returns the type-checked node, namely,
the AST node with type information attached to it.
Checking illegal assignments entails detecting that the type of the left-hand
side of the assignments, which is always an
ArrayType
, is in fact a
ConstArrayType
.
First, we need to retrieve the AST node associated with this extension. Use
instance method node
to do this:
ArrayAccessAssign n = (ArrayAccessAssign) this.node();Second, we retrieve the left-hand side of the assignment, which is an
ArrayAccess
:
ArrayAccess left = n.left();Third, we retrieve the array whose element is being assigned to:
Expr array = left.array();Now, if the type of the array is
ConstArrayType
, issue an error:
if (array.type() instanceof ConstArrayType) { throw new SemanticException("Cannot assign a value to an element of a const array.", n.position()); }Otherwise, type checking remains the same as in the base language, so we delegate the rest of type checking to the base language. Method
superLang
returns a language dispatcher of the language
extended by this extension. The language dispatcher has methods for operations
on the AST similar to what we have seen, but with the additional parameter of
the AST node to perform the operation on. In our case, it is the assignment
itself:
return superLang().typeCheck(n, tc);Language dispatchers select AST nodes or extension objects thereof containing the appropriate implementation of AST operations for a given language extension. We will explore language dispatchers in more detail in a later section. For now, here is the full implementation of
typeCheck
:
/* Also add these imports: * import polyglot.ast.ArrayAccess; * import polyglot.ast.ArrayAccessAssign; * import polyglot.ast.Expr; * import polyglot.ast.Node; * import polyglot.types.SemanticException; * import polyglot.visit.TypeChecker; * import carray.types.ConstArrayType; */ @Override public Node typeCheck(TypeChecker tc) throws SemanticException { // Suppose n is a[2] = 3; ArrayAccessAssign n = (ArrayAccessAssign) this.node(); // Then left is a[2]. ArrayAccess left = n.left(); // And array is a. Expr array = left.array(); // If the type of the array is a ConstArrayType, then this assignment // is illegal. if (array.type() instanceof ConstArrayType) { throw new SemanticException("Cannot assign a value to an element of a const array.", n.position()); } // Let the base language deal with the default type checking. return superLang().typeCheck(n, tc); }
Exercise
Method
throwTypes
returns a list of Type
s of
exceptions that might get thrown from an AST node at run time. This method is
called when the ExceptionChecker
verifies that a method or a
constructor throws only exceptions it declares to throw. Here is the
implementation of throwTypes
for ArrayAccessAssign
in
Java 1.4:
Override method
throwTypes
in
CArrayArrayAccessAssignExt
to indicate that
ArrayStoreException
is no longer possible to get thrown when
assigning to an array element.
The final implementation of
CArrayArrayAccessAssignExt
is as
follows:
+ Reveal...
Modifying extension factory
Once an extension class has been implemented, its instance must be created and
attached to an AST node for the overridden AST operations to take effect.
As we have seen, the extension factory is responsible for creating extension
objects, and the node factory is responsible for attaching extension objects to
AST nodes. The abstract extension factory deals with arranging extension
objects in a correct order, leaving two auxiliary methods overridable by its
subclasses. Therefore, we must modify the concrete extension factory
to instantiate the extension class.
The extension generator already created the template concrete extension factory
for
CArray
:
CArrayExtFactory_c.javaThe class is declared final because a concrete extension factory is proprietary to a specific language extension, so it should not be extended or reused by further extensions of the language extension. The two constructors are standard, following ones declared in the abstract extension factory. The auxiliary method
extNodeImpl
supplies the
default extension object for CArray
. The default extension object
simply forwards all the AST operations to the language being extended, unless
the implementation of the default extension class is overridden otherwise. Now,
we want operations on ArrayAccessAssign
, namely, type checking, to
not default to the base language. We can do this by supplying a different
extension object for ArrayAccessAssign
, namely, an instance of
CArrayArrayAccessAssignExt
we just created. As a result, we
override the auxiliary method that supplies extension objects for assignments
to array elements:
@Override protected Ext extArrayAccessAssignImpl() { // We want to override some operations for ArrayAccessAssign, so // return an extension object specific to carray. return new CArrayArrayAccessAssignExt(); }
Here is the implementation of
CArrayExtFactory_c
so far:
package carray.ast; import polyglot.ast.Ext; import polyglot.ast.ExtFactory; /** * This class defines extension factory methods specifically for carray * extension (and not for extensions of carray). */ public final class CArrayExtFactory_c extends CArrayAbstractExtFactory_c { public CArrayExtFactory_c() { super(); } public CArrayExtFactory_c(ExtFactory nextExtFactory) { super(nextExtFactory); } @Override protected Ext extNodeImpl() { // Return a default extension object for this extension, which simply // forward any operation to the base language. return new CArrayExt(); } @Override protected Ext extArrayAccessAssignImpl() { // We want to override some operations for ArrayAccessAssign, so // return an extension object specific to carray. return new CArrayArrayAccessAssignExt(); } }Later on, this class will be modified to handle more operations for other compiler passes in
CArray
. Running the test harness should yield a
better result, but more is to be done, especially for operations on types. We
will do this in the next section.