Overriding Operations on the AST

In addition to syntactic changes, programming language extensions also modify the semantics of the existing syntactic constructs. For example, in the CArray extension, assignments to an element of an array is prohibited if the array is constant. This section describes how extension objects accommodate semantic changes.

Changing the language semantics involves the following modifications to the existing compiler:

Override methods that enforce semantics in the base language by creating an extension object class for each AST node with semantic changes.
Modify the extension factory to override an auxiliary method that returns an instance of the extension object class created.

When ConstArrayTypeNode was added in an earlier section, two methods are overridden to associate a type object with the node and to determine when the type object is canonical. These methods are invoked by a node visitor that traverses the AST to perform a desired operation. For example, buildTypes is called by a TypeBuilder, and disambiguate is called by an AmbiguityRemover. Package polyglot.visit contains many node visitors, most used by the core Polyglot compiler passes (such as the two mentioned above), and others primarily for optimizations (such as ExpressionFlattener, which rewrites expressions in the AST to reduces the complexity of programs).

Each node visitor is a subclass of NodeVisitor in package polyglot.visit. This class defines these core methods for visiting the AST:

begin, called before the entire tree is visited. This method allows the visitor to perform any initialization that cannot be done when the visitor is created.
end, called after the entire tree has been visited. This method allows the visitor to perform any last minute cleanup, including flushing buffers and I/O connections.
enter gives the visitor the option of changing internal state or returning a new visitor which will be used to visit the children of a given AST node.
leave, called after all of the children of a given AST node have been visited to provide the visitor with the last opportunity to modify the AST node and all its descendants.
override gives the visitor an option of overriding all traversal of a given AST node and all its descendants.

Given an AST, a node visitor invokes these methods in the following fashion:

begin
Given a node n at the root of the current AST, one of these two cases:
- Overriding traversal of n. This is achieved by having the visitor's override method return a non-null value, which becomes the new node replacing n
- Normal traversal, consisting of these steps:
  - enter, preparing to visit children of n
  - Recursively visit (step 2) each child of n
  - leave, finishing visiting n
end

The order of the children of n visited in step 2 is determined by the method visitChildren(), which every AST node class must implement. For example, ConstArrayTypeNode_c inherits this implementation from ArrayTypeNode_c:

    @Override
    public Node visitChildren(NodeVisitor v) {
        TypeNode base = visitChild(this.base, v);
        return reconstruct(this, base);
    }

Each constant-array type node has a position, the location of the text corresponding to the type node in the input source file, and the type node for the base type of the array. This implementation of visitChildren elects not to visit the position, because it never changes. In fact, positions are never visited by any AST node by default. On the other hand, the type node for the base type (this.base) is visited using the visitor v; the result of the visit is stored in local variable base. Finally, the method returns the reconstruction of the AST node, potentially replacing the initial base with the visited base, making a copy if there is a change to maintain the invariant that AST nodes are immutable.

Each implementation of a visitor overrides methods enter, leave, and possibly override to invoke different methods defined in AST classes to fulfill the specific purposes of these visitors. For example, in TypeBuilder, enter calls buildTypesEnter, and leave calls buildTypes; in AmbiguityRemover enter calls disambiguateEnter, leave calls disambiguate, and override calls disambiguateOverride.

We have seen that buildTypes and disambiguate are dispatched from two different visitors. The following table lists some notable methods:

Visitor	`enter`	`leave`	`override`
`TypeBuilder`	`buildTypesEnter`	`buildTypes`	N/A
`AmbiguityRemover`	`disambiguateEnter`	`disambiguate`	`disambiguateOverride`
`TypeChecker`	`typeCheckEnter`	`typeCheck`	`typeCheckOverride`
`ExceptionChecker`	`exceptionCheckEnter`	`exceptionCheck`	N/A

This approach of dispatching operations back to AST nodes avoids an extensibility problem with Gang-of-4 visitors, which do not allow new node types to be added in a scalable, modular way. Because operations are implemented with AST nodes, the code associated with a given AST node is all found in that node's implementation or in the implementations of its extension objects. However, the code associated with a given compiler pass is distributed across the various nodes that participate in an interesting way in that pass. Nodes that do not participate in a compiler pass in a way more interesting than allowing their children to be rewritten by the pass can simply inherit the default implementation of that pass's operations.

Polyglot implements its own mechanism for dispatching operations to either node objects or their extension objects. When an operation defined in an AST node class must be redefined, the overriding definition is implemented in an extension class, whose instances are attached to the base AST node to enable the override. Java's inheritance mechanism is insufficient to both supply the override and enable scalable extensibility. For instance, suppose we would like to implement a language extension that skips type checking. The standard inheritance mechanism would require all existing node classes be overridden with a type-checking method that does nothing, whereas Polyglot requires that the default extension class for the language override the type-checking method with one that does nothing, and attaches an extension object created from this class to every AST node. In short, using inheritance to mix in new functionality requires changing many classes, while the extension object design pattern means Polyglot requires just one.

Implementing extension object classes

We are now ready to detect and flag assignments to an element of a constant array as errors. This check is performed during type checking, when the AST is visited by a TypeChecker. In Polyglot, assignments to array elements are represented by interface ArrayAccessAssign. Therefore, we need to override the implementation in ArrayAccessAssign_c. To do this, we will create extension class CArrayArrayAccessAssignExt whose instances will be attached to instances of ArrayAccessAssign_c, starting with the following skeleton:

package carray.ast;

import polyglot.util.SerialVersionUID;

/**
 * This class provides an extension object to ArrayAssign to implement the
 * restriction that elements of a const array cannot be modified.
 */
public class CArrayArrayAccessAssignExt extends CArrayExt {
    // Extension objects are also serializable.
    private static final long serialVersionUID = SerialVersionUID.generate();
}

The naming convention for extension classes is to begin with the extension name (e.g., CArray), followed by the name of the overridden interface (e.g., ArrayAccessAssign), and end with Ext.

To implement the check for illegal assignments, we will override method typeCheck in this extension class. Overriding this method, rather than other methods, ensures that all the children of the assignments are assigned appropriate type objects, which are crucial for the check itself. To start, let us begin with the method skeleton:

    /* Also add these imports:
     * import polyglot.ast.Node;
     * import polyglot.types.SemanticException;
     * import polyglot.visit.TypeChecker;
     */
    @Override
    public Node typeCheck(TypeChecker tc) throws SemanticException {
        // TODO: Implement this method.
    }

The method throws a SemanticException when there is a semantic error. In particular, the error will be thrown in this exception. When the AST node is free of errors, this method returns the type-checked node, namely, the AST node with type information attached to it.

Checking illegal assignments entails detecting that the type of the left-hand side of the assignments, which is always an ArrayType, is in fact a ConstArrayType. First, we need to retrieve the AST node associated with this extension. Use instance method node to do this:

        ArrayAccessAssign n = (ArrayAccessAssign) this.node();

Second, we retrieve the left-hand side of the assignment, which is an ArrayAccess:

        ArrayAccess left = n.left();

Third, we retrieve the array whose element is being assigned to:

        Expr array = left.array();

Now, if the type of the array is ConstArrayType, issue an error:

        if (array.type() instanceof ConstArrayType) {
            throw new SemanticException("Cannot assign a value to an element of a const array.",
                                        n.position());
        }

Otherwise, type checking remains the same as in the base language, so we delegate the rest of type checking to the base language. Method superLang returns a language dispatcher of the language extended by this extension. The language dispatcher has methods for operations on the AST similar to what we have seen, but with the additional parameter of the AST node to perform the operation on. In our case, it is the assignment itself:

        return superLang().typeCheck(n, tc);

Language dispatchers select AST nodes or extension objects thereof containing the appropriate implementation of AST operations for a given language extension. We will explore language dispatchers in more detail in a later section. For now, here is the full implementation of typeCheck:

    /* Also add these imports:
     * import polyglot.ast.ArrayAccess;
     * import polyglot.ast.ArrayAccessAssign;
     * import polyglot.ast.Expr;
     * import polyglot.ast.Node;
     * import polyglot.types.SemanticException;
     * import polyglot.visit.TypeChecker;
     * import carray.types.ConstArrayType;
     */
    @Override
    public Node typeCheck(TypeChecker tc) throws SemanticException {
        // Suppose n is a[2] = 3;
        ArrayAccessAssign n = (ArrayAccessAssign) this.node();
        // Then left is a[2].
        ArrayAccess left = n.left();
        // And array is a.
        Expr array = left.array();

        // If the type of the array is a ConstArrayType, then this assignment
        // is illegal.
        if (array.type() instanceof ConstArrayType) {
            throw new SemanticException("Cannot assign a value to an element of a const array.",
                                        n.position());
        }

        // Let the base language deal with the default type checking.
        return superLang().typeCheck(n, tc);
    }

Exercise

Method throwTypes returns a list of Types of exceptions that might get thrown from an AST node at run time. This method is called when the ExceptionChecker verifies that a method or a constructor throws only exceptions it declares to throw. Here is the implementation of throwTypes for ArrayAccessAssign in Java 1.4:

Override method throwTypes in CArrayArrayAccessAssignExt to indicate that ArrayStoreException is no longer possible to get thrown when assigning to an array element.

Solution: + Reveal...

The final implementation of CArrayArrayAccessAssignExt is as follows: + Reveal...

CArrayArrayAccessAssignExt.java

Modifying extension factory

Once an extension class has been implemented, its instance must be created and attached to an AST node for the overridden AST operations to take effect. As we have seen, the extension factory is responsible for creating extension objects, and the node factory is responsible for attaching extension objects to AST nodes. The abstract extension factory deals with arranging extension objects in a correct order, leaving two auxiliary methods overridable by its subclasses. Therefore, we must modify the concrete extension factory to instantiate the extension class.

The extension generator already created the template concrete extension factory for CArray:

CArrayExtFactory_c.java

The class is declared final because a concrete extension factory is proprietary to a specific language extension, so it should not be extended or reused by further extensions of the language extension. The two constructors are standard, following ones declared in the abstract extension factory. The auxiliary method extNodeImpl supplies the default extension object for CArray. The default extension object simply forwards all the AST operations to the language being extended, unless the implementation of the default extension class is overridden otherwise. Now, we want operations on ArrayAccessAssign, namely, type checking, to not default to the base language. We can do this by supplying a different extension object for ArrayAccessAssign, namely, an instance of CArrayArrayAccessAssignExt we just created. As a result, we override the auxiliary method that supplies extension objects for assignments to array elements:

    @Override
    protected Ext extArrayAccessAssignImpl() {
        // We want to override some operations for ArrayAccessAssign, so
        // return an extension object specific to carray.
        return new CArrayArrayAccessAssignExt();
    }

Here is the implementation of CArrayExtFactory_c so far:

package carray.ast;

import polyglot.ast.Ext;
import polyglot.ast.ExtFactory;

/**
 * This class defines extension factory methods specifically for carray
 * extension (and not for extensions of carray).
 */
public final class CArrayExtFactory_c extends CArrayAbstractExtFactory_c {

    public CArrayExtFactory_c() {
        super();
    }

    public CArrayExtFactory_c(ExtFactory nextExtFactory) {
        super(nextExtFactory);
    }

    @Override
    protected Ext extNodeImpl() {
        // Return a default extension object for this extension, which simply
        // forward any operation to the base language.
        return new CArrayExt();
    }

    @Override
    protected Ext extArrayAccessAssignImpl() {
        // We want to override some operations for ArrayAccessAssign, so
        // return an extension object specific to carray.
        return new CArrayArrayAccessAssignExt();
    }
}

Later on, this class will be modified to handle more operations for other compiler passes in CArray. Running the test harness should yield a better result, but more is to be done, especially for operations on types. We will do this in the next section.