Adding Compiler Passes

Most compilers for language extensions require additional compiler passes to perform an analysis that does not exist in the base language, or to prepare the AST for later phases of the compiler, such as translation to other languages. This section explores how new compiler passes can be added to the Polyglot framework.
In the upcoming section, we will add two translation passes to the CArray extension. One of these passes translates a CArray program to Java 5 in a way that preserves the semantics of constant arrays. This translation becomes easier if the AST contains no array initializers. Therefore, in this section, we will add a pass to rewrite all array initializers to array creation expressions to prepare the AST for this translation.
Several steps are involved to have new compiler passes up and running:

Implementing a compiler pass

Adding a compiler pass involves implementing several mutually dependent components at once. We will proceed from the least dependent component to the most: For our CArray extension, we will create a pass that rewrites all array initializers to array creation expressions. A node visitor called ArrayInitRemover will be created for AST traversal. This visitor invokes methods removeArrayInitEnter and removeArrayInit that we will define for the AST. The first method prepares the visitor prior to visiting the node itself. The second method transforms the AST node according to the desired translation. In other words, we are adding another row to the table of visitors and methods they dispatch to:
Visitor enter leave override
ArrayInitRemover removeArrayInitEnter removeArrayInit N/A

Adding operations on the AST

To add operations on the AST, we first create an interface defining their method signatures. For CArray, this is interface CArrayOps:

Because these operations are applicable to every AST node in the
CArray extension, we modify the default extension class for
CArray, namely, CArrayExt, to implement
CArrayOps and provide the default implementation of these
operations:

For removeArrayInitEnter, the default is to leave the visitor as
is; the method simply returns the given visitor.  For
removeArrayInit, the default is to leave the AST node unmodified;
the method simply returns the AST node associated with the extension object by
invoking method node.
We will need to override these methods for AST nodes more involved in the translation. For now, let us turn to the other components of the compiler pass to make the code compile.

Node visitor

Every node visitor is a subclass of NodeVisitor, defined in package polyglot.visit. For ArrayInitRemover, we will extend ContextVisitor, which maintains a context throughout the visitor's pass, and is also the base class of the disambiguation and type checking visitors. Here is the skeleton:
package carray.visit;

import polyglot.visit.ContextVisitor;

/**
 * A visitor that translates array initializers to array creation expression.
 * Useful for translating carray to Java 5.
 */
public class ArrayInitRemover extends ContextVisitor {
    // TODO: Implement this class.
}
First, we create a constructor that simply invokes the superclass constructor:
    /* Also add these imports:
     * import polyglot.ast.NodeFactory;
     * import polyglot.frontend.Job;
     * import polyglot.types.TypeSystem;
     */
    public ArrayInitRemover(Job job, TypeSystem ts, NodeFactory nf) {
        super(job, ts, nf);
    }
Next, we override method lang, which returns the language dispatcher associated with this visitor, to cast the dispatcher type, as we know that the dispatcher associated with ArrayInitRemover is always CArrayLang, the language dispatcher for CArray and its extensions:
    /* Also add these imports:
     * import carray.ast.CArrayLang;
     */
    @Override
    public CArrayLang lang() {
        return (CArrayLang) super.lang();
    }
Now, we override variants of methods enter and leave defined in NodeVisitor. The error-handling visitor, extended by the context visitor overrides the above two methods with error-handling code, and invokes methods enterCall and leaveCall, which replace the overridden enter and leave. We will override these two methods accordingly:
    /* Also add these imports:
     * import polyglot.ast.Node;
     * import polyglot.visit.NodeVisitor;
     */

    @Override
    public ArrayInitRemover enterCall(Node n) {
        // Prepare the visitor for before visiting n by letting each AST node
        // decide what to do with the visitor.
        return lang().removeArrayInitEnter(n, this);
    }

    @Override
    public Node leaveCall(Node old, Node n, NodeVisitor v) {
        // After visiting old's children, the results of which are now children
        // of n, visit n itself to perform the operation that n's class defines
        // for this compiler pass.
        return lang().removeArrayInit(n, this);
    }
Node visitors use the language dispatcher to redirect the invocation of the AST operation to the appropriate AST node or extension object. We will explore in more detail how language dispatchers work below.
The implementation of ArrayInitRemover is not yet complete due to the structure of the AST. We only want to rewrite array initializers that are not part of an array creation expression. For example, we want to rewrite the right-hand side of the declaration int[] a = { 1, 2, 3 };, but not of the declaration int[] a = new int[]{ 1, 2, 3 };. In fact, the rewriting pass we are implementing translates the first declaration to the second.
To keep track whether to translate a given array initializer, ArrayInitRemover maintains a boolean state whether the current AST node being visited is an array creation expression, an instance of NewArray. If so, when visiting an array initializer, an instance of ArrayInit, the rewriting is skipped. Nevertheless, an array initializer that is a child of an array creation expression could contain an array initializer, so when visiting the children of an array initializer, the boolean state must be reset. AST nodes will query this state to determine whether rewriting is required.
We implement this state using instance variable inNewArray. We also add an accessor method:
    protected boolean inNewArray = false;

    public boolean inNewArray() {
        return inNewArray;
    }
The state can be set by the following method, which makes a copy of the visitor first to maintain immutability of the visitor. (Visitors can be mutable; in fact, ArrayInitRemover can. This tutorial illustrates use of a utility class that copies an object.) The state is then set accordingly, and the updated visitor is returned:
    /* Also add these imports:
     * import polyglot.util.Copy;
     */
    public ArrayInitRemover inNewArray(boolean inNewArray) {
        ArrayInitRemover v = Copy.Util.copy(this);
        v.inNewArray = inNewArray;
        return v;
    }
Methods in the AST will invoke the above mutator as appropriate, as shown below.
Here is the final implementation of ArrayInitRemover:

Language dispatcher

The data structure that represents an AST node is a chain of the base AST node and zero or more extension objects. For a given language, one of these instances in the chain contains the correct implementation of AST operations for that language. Polyglot must be able to locate this instance correctly, or language extensions will not work as expected.
A language dispatcher is responsible for locating an instance containing the correct implementation of AST operations. For convenience, let a node instance denote one of the instances in a chain of objects that represents an AST node. Each node instance belongs to a unique language extension within the chain. A language dispatcher traverses the chain to find the node instance matching its language extension and invokes the requested AST operation on it.
The extension generator already created a template language dispatcher for CArray:

Given an AST node, finding the correct node instance is the same.  Therefore,
only one instance of a language dispatcher is required per language extension.
In other words, language dispatchers are singleton.  The static method
lang is for future developments and can be ignored for now.  Method
carrayExt finds the node instance for the CArray
extension.  Method NodeOps overrides one in JLang_c,
the superclass of CArrayLang_c, to ensure that existing AST
operations are redirected to the implementation in CArray.
To add dispatching methods for new AST operations, we first modify the template interface CArrayLang with the method signatures:

Each dispatch method is like the method for AST, but with the addition of the
first parameter being the AST node to be dispatched on.
Next, we implement these methods in CArrayLang_c. The implementation of removeArrayInitEnter and removeArrayInit are in instances of CArrayOps, which is implemented by CArrayExt. Therefore, we need a method similar to NodeOps that finds a node instance of CArrayOps:
    /**
     * Return an object that implements CArray operations appropriate for this
     * language.
     * @param n
     * @return
     */
    protected CArrayOps CArrayOps(Node n) {
        return carrayExt(n);
    }
It turns out the implementation of NodeOps and CArrayOps are the same because an instance of CArrayExt contains the correct implementation of operations defined in both NodeOps and CArrayOps.
Finally, we are ready to implement the dispatch methods. For each dispatch method, the dispatching pattern is identical regardless of the language extension; that is, we always dispatch to the same method with the same arguments. The only difference is the node instance we dispatch to; that is, the receiver object of the method depends on the language extension. Since we have already refactored the determination of the receiver object into method CArrayOps, the body of dispatch methods are always the same, and hence they can be declared final:
    @Override
    public final ArrayInitRemover removeArrayInitEnter(Node n,
            ArrayInitRemover v) {
        return CArrayOps(n).removeArrayInitEnter(v);
    }

    @Override
    public final Node removeArrayInit(Node n, ArrayInitRemover v) {
        // We always dispatch to the same method regardless of the language, but
        // the method _implementation_ we dispatch to will vary depending on the
        // language.  The receiver object, i.e., CArrayOps(n), will determine
        // the appropriate implementation.  Overriding method CArrayOps will
        // change the appropriate receiver object, so this method can be
        // declared final.
        return CArrayOps(n).removeArrayInit(v);
    }
Extensions of CArray will override node instance finders such as methods NodeOps and CArrayOps, which is enough for Polyglot to locate the correct node instance.
Here is the completed implementation of CArrayLang_c:

Putting it all together

So far, we have implemented the minimally required infrastructure for a new compiler pass without actually transforming the AST. Now we will override the AST operations we defined for certain AST nodes that require rewriting. In particular, for CArray, the AST nodes involved are array initializers and array creation expressions.
First, we will deal with array creation expressions, as the steps involved are easier. All the node instance for NewArray has to do is to notify ArrayInitRemover that it is about to visit a NewArray. We implement this notification by overriding method removeArrayInitEnter in extension class CArrayNewArrayExt, resulting in the following implementation:
 1 package carray.ast;
 2 
 3 import polyglot.util.SerialVersionUID;
 4 import carray.visit.ArrayInitRemover;
 5 
 6 public class CArrayNewArrayExt extends CArrayExt {
 7     private static final long serialVersionUID = SerialVersionUID.generate();
 8 
 9     @Override
10     public ArrayInitRemover removeArrayInitEnter(ArrayInitRemover v) {
11         // Update the visitor to record that we are entering a
12         // NewArray node.
13         return v.inNewArray(true);
14     }
15 }
Next, we will deal with array initializers. In contrast with array creation expressions, the node instance for ArrayInit that is a child of a NewArray has to notify ArrayInitRemover that it should reset the state of being in a NewArray when visiting children of the ArrayInit. This is to ensure correctness of rewriting multidimensional array declarations such as
int[][] m = new int[][]{ { 1 }, { 2 }, { 3 } };
which should be rewritten to
int[][] m = new int[][]{ new int[]{ 1 },
                         new int[]{ 2 },
                         new int[]{ 3 } };
The method removeArrayInitEnter for CArrayArrayInitExt is as follows:
    @Override
    public ArrayInitRemover removeArrayInitEnter(ArrayInitRemover v) {
        if (v.inNewArray()) {
            // If this ArrayInit is a direct child of NewArray, undo this fact
            // as we are entering the scope of ArrayInit.
            return v.inNewArray(false);
        }
        return v;
    }
Now, onto the actual rewriting. If the ArrayInit currently visited is a direct child of a NewAray, skip the rewriting and invoke the superclass functionality, which currently does nothing but could be overridden by extensions of CArray. This results in the following skeletion of method removeArrayInit:
    @Override
    public Node removeArrayInit(ArrayInitRemover v) {
        if (!v.inNewArray()) {
            // TODO: Rewrite.
        }
        return super.removeArrayInit(v);
    }
The rewriting creates a new NewArray node with the ArrayInit being visited as its initializer. The type of the array creation expression is the type of the array initializer. Here is the signature of the node factory method for NewArray:
    NewArray NewArray(Position pos, TypeNode base, int addDims, ArrayInit init);
where base is the type node for the ultimate base of the array type, namely, the type before all the pairs of square brackets, and addDims is the number of pairs of square brackets. For example, the ultimate base of int[][] is int, and its additional dimensions is 2.
First, we will obtain all of these arguments:
            ArrayInit n = (ArrayInit) node();
            ArrayType type = n.type().toArray();
            Type ultimateBase = type.ultimateBase();
            int dims = type.dims();
Next, we create the desired NewArray using the node factory provided by the node visitor. We also wrap ultimateBase with a canonical type node:
            NodeFactory nf = v.nodeFactory();
            NewArray result =
                    nf.NewArray(n.position(),
                                nf.CanonicalTypeNode(Position.COMPILER_GENERATED,
                                                     ultimateBase),
                                dims,
                                n);
Finally, we set the type of the NewArray we just created to the type of the array initializer from which we rewrote:
            return result.type(type);
This results in the following complete implementation of CArrayArrayInitExt:

Exercise

Make sure these extension classes are recognized by the Polyglot frontend when the CArray extension is run.
Hint: + Reveal...
Solution: + Reveal...

Scheduler

We have implemented a new compiler pass above, but we have yet to include it in the list of passes to be run by Polyglot. To do so, we must modify the scheduler so that Polyglot is able to determine when the new pass should be run. First, let us create the skeletal scheduler for CArray. This scheduler extends the scheduler for compiling Java 1.4 programs:
package carray;

import polyglot.frontend.ExtensionInfo;
import polyglot.frontend.JLScheduler;

/**
 * {@code CArraySchedule} extends the base scheduler to handle translations of
 * CArray programs to Java.
 */
public class CArrayScheduler extends JLScheduler {
    public CArrayScheduler(ExtensionInfo extInfo) {
        super(extInfo);
    }

    // TODO: Add the rewriting pass.
}
Polyglot completes all the compiler passes by reaching all the goals defined in the scheduler. Each goal defines the pass to be run and other goals that are its prerequisites. That is, the prerequisite goals must be completed first before the current goal can complete. The list of goals that must be run for a compilation unit constitutes a job. A job encapsulates work done by the compiler for a single compilation unit, and contains all information for a particular compilation unit carried between phases of the compiler. Only one pass should be run over a job at a time.
To add a pass for the job, create a method that returns a goal that the job must achieve. For CArray, we will create a goal that is achieved when the array initializers have been removed from the AST. We start with the skeleton of this goal factory method:
    /**
     * Return a goal that removes all array initializers (ArrayInit) that are
     * not part of an array instance creation expression (NewArray).
     * @param job
     * @return
     */
    public Goal RemoveArrayInit(Job job) {
        // TODO: Implement this method.
    }
We are creating a VisitorGoal for the given job, where this goal can be achieved by running a specified node visitor. In our case, the node visitor that will achieve the goal is ArrayInitRemover:
        ExtensionInfo extInfo = job.extensionInfo();
        TypeSystem ts = extInfo.typeSystem();
        NodeFactory nf = extInfo.nodeFactory();
        Goal g = new VisitorGoal(job, new ArrayInitRemover(job, ts, nf));
(In general, a Goal is achieved by the creation and running of a Pass; see the createPass(ExtensionInfo) method of Goal and the class polyglot.frontend.Pass. Many goals are achieved by running a visitor over an AST, and so the VisitorGoal class provides a convenient way to specify the visitor that will achieve the goal.)
Next, we add the prerequisite goals that must be achieved before the pass defined in this goal can be run. As seen above, ArrayInitRemover requires that the AST have type information so that the type of array initializers can be determined when being rewritten. As a result, the AST must have been type checked. We must also make sure that there is no circular dependency; if such a cycle occurs, a CyclicDependencyException is thrown:
        try {
            // Make sure we have type information before we translate things.
            g.addPrerequisiteGoal(TypeChecked(job), this);
        }
        catch (CyclicDependencyException e) {
            throw new InternalCompilerError(e);
        }
Finally, the created goal must be interned before returned. This is because multiple different goals might have the same goal as a prerequisite. Interning ensures that the goal factory methods always return the same goal for the same job:
        return internGoal(g);
Here is the implementation of CArrayScheduler so far:
 1 package carray;
 2 
 3 import polyglot.ast.NodeFactory;
 4 import polyglot.frontend.CyclicDependencyException;
 5 import polyglot.frontend.ExtensionInfo;
 6 import polyglot.frontend.JLScheduler;
 7 import polyglot.frontend.Job;
 8 import polyglot.frontend.goals.Goal;
 9 import polyglot.frontend.goals.VisitorGoal;
10 import polyglot.types.TypeSystem;
11 import polyglot.util.InternalCompilerError;
12 import carray.visit.ArrayInitRemover;
13 
14 /**
15  * {@code CArraySchedule} extends the base scheduler to handle translations of
16  * CArray programs to Java.
17  */
18 public class CArrayScheduler extends JLScheduler {
19     public CArrayScheduler(ExtensionInfo extInfo) {
20         super(extInfo);
21     }
22 
23     /**
24      * Return a goal that removes all array initializers (ArrayInit) that are
25      * not part of an array instance creation expression (NewArray).
26      * @param job
27      * @return
28      */
29     public Goal RemoveArrayInit(Job job) {
30         ExtensionInfo extInfo = job.extensionInfo();
31         TypeSystem ts = extInfo.typeSystem();
32         NodeFactory nf = extInfo.nodeFactory();
33         Goal g = new VisitorGoal(job, new ArrayInitRemover(job, ts, nf));
34         try {
35             // Make sure we have type information before we translate things.
36             g.addPrerequisiteGoal(TypeChecked(job), this);
37         }
38         catch (CyclicDependencyException e) {
39             throw new InternalCompilerError(e);
40         }
41         return internGoal(g);
42     }
43 }
We are almost done with adding a compiler pass. The only missing link is that the Polyglot frontend does not know about the scheduler we just created yet. The extension information will complete this missing link.

Extension information

Extension information is the main interface for defining language extensions. It defines the type system, AST node factory, parser, and other parameters of a language extension. Each extension information is a subclass of ExtensionInfo defined in package polyglot.frontend. The Polyglot frontend will load an ExtensionInfo at startup.
The extension generator already created a template ExtensionInfo:

In addition to the parser, node factory, and type system, the template extension
information also defines the default source file extension and the compiler
name that can be used in a command line as the entry point to the Polyglot
frontend.
One of the parameters an ExtensionInfo defines is the scheduler associated with the language extension. Method createScheduler instantiates the scheduler, and must be overridden for any new scheduler to be recognized by the Polyglot frontend when running the language extension. For CArray, we supply a CArrayScheduler created above:
    @Override
    public Scheduler createScheduler() {
        return new CArrayScheduler(this);
    }
For the next and final two sections, scheduler and extension information will be modified further to accommodate more changes to the language extension.

Valid HTML 4.01 Transitional Valid CSS!