Adding Compiler Passes
Most compilers for language extensions require additional compiler passes to
perform an analysis that does not exist in the base language, or to prepare the
AST for later phases of the compiler, such as translation to other languages.
This section explores how new compiler passes can be added to the Polyglot
framework.
In the upcoming section, we will add two translation passes to the
CArray
extension. One of these passes translates a
CArray
program to Java 5 in a way that preserves the semantics of
constant arrays. This translation becomes easier if the AST contains no
array initializers. Therefore, in this section, we will add a pass to rewrite
all array initializers to array creation expressions to prepare the AST for this
translation.
Several steps are involved to have new compiler passes up and running:
- Implement components that make up a compiler pass.
- Modify the scheduler to incorporate the new compiler pass into the list of passes to be run.
- Notify the extension information of the modified scheduler.
Implementing a compiler pass
Adding a compiler pass involves implementing several mutually dependent
components at once. We will proceed from the least dependent component to the
most:
- Define a new operation on the AST corresponding to the new compiler pass.
- Define a node visitor for traversing the AST to perform the pass.
- Extend the language dispatcher to select appropriate AST node or extension object containing the suitable implementation of the defined AST operation.
CArray
extension, we will create a pass that rewrites all
array initializers to array creation expressions. A node visitor called
ArrayInitRemover
will be created for AST traversal. This visitor
invokes methods removeArrayInitEnter
and
removeArrayInit
that we will define for the AST. The first method
prepares the visitor prior to visiting the node itself. The second method
transforms the AST node according to the desired translation. In other words,
we are adding another row to the table of visitors and methods they dispatch to:
Visitor | enter |
leave |
override |
---|---|---|---|
ArrayInitRemover |
removeArrayInitEnter |
removeArrayInit |
N/A |
Adding operations on the AST
To add operations on the AST, we first create an interface defining their method
signatures. For
CArray
, this is interface CArrayOps
:
Because these operations are applicable to every AST node in the
CArray
extension, we modify the default extension class for
CArray
, namely, CArrayExt
, to implement
CArrayOps
and provide the default implementation of these
operations:
For
removeArrayInitEnter
, the default is to leave the visitor as
is; the method simply returns the given visitor. For
removeArrayInit
, the default is to leave the AST node unmodified;
the method simply returns the AST node associated with the extension object by
invoking method node
.
We will need to override these methods for AST nodes more involved in the
translation. For now, let us turn to the other components of the compiler pass
to make the code compile.
Node visitor
Every node visitor is a subclass of
NodeVisitor
, defined in package
polyglot.visit
. For ArrayInitRemover
, we will extend
ContextVisitor
, which maintains a context throughout the visitor's
pass, and is also the base class of the disambiguation and type checking
visitors. Here is the skeleton:
package carray.visit; import polyglot.visit.ContextVisitor; /** * A visitor that translates array initializers to array creation expression. * Useful for translating carray to Java 5. */ public class ArrayInitRemover extends ContextVisitor { // TODO: Implement this class. }First, we create a constructor that simply invokes the superclass constructor:
/* Also add these imports: * import polyglot.ast.NodeFactory; * import polyglot.frontend.Job; * import polyglot.types.TypeSystem; */ public ArrayInitRemover(Job job, TypeSystem ts, NodeFactory nf) { super(job, ts, nf); }
Next, we override method
lang
, which returns the language
dispatcher associated with this visitor, to cast the dispatcher type, as we know
that the dispatcher associated with ArrayInitRemover
is always
CArrayLang
, the language dispatcher for CArray
and its
extensions:
/* Also add these imports: * import carray.ast.CArrayLang; */ @Override public CArrayLang lang() { return (CArrayLang) super.lang(); }Now, we override variants of methods
enter
and leave
defined in NodeVisitor
. The error-handling visitor, extended by
the context visitor overrides the above two methods with error-handling code,
and invokes methods enterCall
and leaveCall
, which
replace the overridden enter
and leave
. We will
override these two methods accordingly:
/* Also add these imports: * import polyglot.ast.Node; * import polyglot.visit.NodeVisitor; */ @Override public ArrayInitRemover enterCall(Node n) { // Prepare the visitor for before visiting n by letting each AST node // decide what to do with the visitor. return lang().removeArrayInitEnter(n, this); } @Override public Node leaveCall(Node old, Node n, NodeVisitor v) { // After visiting old's children, the results of which are now children // of n, visit n itself to perform the operation that n's class defines // for this compiler pass. return lang().removeArrayInit(n, this); }Node visitors use the language dispatcher to redirect the invocation of the AST operation to the appropriate AST node or extension object. We will explore in more detail how language dispatchers work below.
The implementation of
ArrayInitRemover
is not yet complete due to
the structure of the AST. We only want to rewrite array initializers that are
not part of an array creation expression. For example, we want to rewrite the
right-hand side of the declaration int[] a = { 1, 2, 3 };
, but not
of the declaration int[] a = new int[]{ 1, 2, 3 };
. In fact, the
rewriting pass we are implementing translates the first declaration to the
second.
To keep track whether to translate a given array initializer,
ArrayInitRemover
maintains a boolean state whether the current AST
node being visited is an array creation expression, an instance of
NewArray
. If so, when visiting an array initializer, an instance
of ArrayInit
, the rewriting is skipped. Nevertheless, an array
initializer that is a child of an array creation expression could contain
an array initializer, so when visiting the children of an array initializer,
the boolean state must be reset. AST nodes will query this state to determine
whether rewriting is required.
We implement this state using instance variable
inNewArray
. We
also add an accessor method:
protected boolean inNewArray = false; public boolean inNewArray() { return inNewArray; }The state can be set by the following method, which makes a copy of the visitor first to maintain immutability of the visitor. (Visitors can be mutable; in fact,
ArrayInitRemover
can. This tutorial illustrates use of a
utility class that copies an object.) The state is then set accordingly, and
the updated visitor is returned:
/* Also add these imports: * import polyglot.util.Copy; */ public ArrayInitRemover inNewArray(boolean inNewArray) { ArrayInitRemover v = Copy.Util.copy(this); v.inNewArray = inNewArray; return v; }Methods in the AST will invoke the above mutator as appropriate, as shown below.
Here is the final implementation of
ArrayInitRemover
:
Language dispatcher
The data structure that represents an AST node is a chain of the base AST node
and zero or more extension objects. For a given language, one of these
instances in the chain contains the correct implementation of AST operations for
that language. Polyglot must be able to locate this instance correctly, or
language extensions will not work as expected.
A language dispatcher is responsible for locating an instance
containing the correct implementation of AST operations. For convenience, let
a node instance denote one of the instances in a chain of objects that
represents an AST node. Each node instance belongs to a unique language
extension within the chain. A language dispatcher traverses the chain to find
the node instance matching its language extension and invokes the requested AST
operation on it.
The extension generator already created a template language dispatcher for
CArray
:
Given an AST node, finding the correct node instance is the same. Therefore, only one instance of a language dispatcher is required per language extension. In other words, language dispatchers are singleton. The static method
lang
is for future developments and can be ignored for now. Method
carrayExt
finds the node instance for the CArray
extension. Method NodeOps
overrides one in JLang_c
,
the superclass of CArrayLang_c
, to ensure that existing AST
operations are redirected to the implementation in CArray
.
To add dispatching methods for new AST operations, we first modify the template
interface
CArrayLang
with the method signatures:
Each dispatch method is like the method for AST, but with the addition of the first parameter being the AST node to be dispatched on.
Next, we implement these methods in
CArrayLang_c
. The
implementation of removeArrayInitEnter
and
removeArrayInit
are in instances of CArrayOps
, which
is implemented by CArrayExt
. Therefore, we need a method similar
to NodeOps
that finds a node instance of CArrayOps
:
/** * Return an object that implements CArray operations appropriate for this * language. * @param n * @return */ protected CArrayOps CArrayOps(Node n) { return carrayExt(n); }It turns out the implementation of
NodeOps
and
CArrayOps
are the same because an instance of
CArrayExt
contains the correct implementation of operations defined
in both NodeOps
and CArrayOps
.
Finally, we are ready to implement the dispatch methods. For each dispatch
method, the dispatching pattern is identical regardless of the language
extension; that is, we always dispatch to the same method with the same
arguments. The only difference is the node instance we dispatch to; that is,
the receiver object of the method depends on the language extension. Since we
have already refactored the determination of the receiver object into method
CArrayOps
, the body of dispatch methods are always the same, and
hence they can be declared final:
@Override public final ArrayInitRemover removeArrayInitEnter(Node n, ArrayInitRemover v) { return CArrayOps(n).removeArrayInitEnter(v); } @Override public final Node removeArrayInit(Node n, ArrayInitRemover v) { // We always dispatch to the same method regardless of the language, but // the method _implementation_ we dispatch to will vary depending on the // language. The receiver object, i.e., CArrayOps(n), will determine // the appropriate implementation. Overriding method CArrayOps will // change the appropriate receiver object, so this method can be // declared final. return CArrayOps(n).removeArrayInit(v); }Extensions of
CArray
will override node instance finders such as
methods NodeOps
and CArrayOps
, which is enough for
Polyglot to locate the correct node instance.
Here is the completed implementation of
CArrayLang_c
:
Putting it all together
So far, we have implemented the minimally required infrastructure for a new
compiler pass without actually transforming the AST. Now we will override
the AST operations we defined for certain AST nodes that require rewriting. In
particular, for
CArray
, the AST nodes involved are array
initializers and array creation expressions.
First, we will deal with array creation expressions, as the steps involved are
easier. All the node instance for
NewArray
has to do is to notify
ArrayInitRemover
that it is about to visit a NewArray
.
We implement this notification by overriding method
removeArrayInitEnter
in extension class
CArrayNewArrayExt
, resulting in the following implementation:
1 package carray.ast; 2 3 import polyglot.util.SerialVersionUID; 4 import carray.visit.ArrayInitRemover; 5 6 public class CArrayNewArrayExt extends CArrayExt { 7 private static final long serialVersionUID = SerialVersionUID.generate(); 8 9 @Override 10 public ArrayInitRemover removeArrayInitEnter(ArrayInitRemover v) { 11 // Update the visitor to record that we are entering a 12 // NewArray node. 13 return v.inNewArray(true); 14 } 15 }
Next, we will deal with array initializers. In contrast with array creation
expressions, the node instance for
ArrayInit
that is a child of
a NewArray
has to notify ArrayInitRemover
that it
should reset the state of being in a NewArray
when visiting
children of the ArrayInit
. This is to ensure correctness of
rewriting multidimensional array declarations such as
int[][] m = new int[][]{ { 1 }, { 2 }, { 3 } };which should be rewritten to
int[][] m = new int[][]{ new int[]{ 1 }, new int[]{ 2 }, new int[]{ 3 } };The method
removeArrayInitEnter
for CArrayArrayInitExt
is as follows:
@Override public ArrayInitRemover removeArrayInitEnter(ArrayInitRemover v) { if (v.inNewArray()) { // If this ArrayInit is a direct child of NewArray, undo this fact // as we are entering the scope of ArrayInit. return v.inNewArray(false); } return v; }
Now, onto the actual rewriting. If the
ArrayInit
currently visited
is a direct child of a NewAray
, skip the rewriting and invoke the
superclass functionality, which currently does nothing but could be overridden
by extensions of CArray
. This results in the following skeletion
of method removeArrayInit
:
@Override public Node removeArrayInit(ArrayInitRemover v) { if (!v.inNewArray()) { // TODO: Rewrite. } return super.removeArrayInit(v); }The rewriting creates a new
NewArray
node with the
ArrayInit
being visited as its initializer. The type of the array
creation expression is the type of the array initializer. Here is the signature
of the node factory method for NewArray
:
NewArray NewArray(Position pos, TypeNode base, int addDims, ArrayInit init);
where base
is the type node for the ultimate base of the
array type, namely, the type before all the pairs of square brackets, and
addDims
is the number of pairs of square brackets. For example,
the ultimate base of int[][]
is int
, and its
additional dimensions is 2.
First, we will obtain all of these arguments:
ArrayInit n = (ArrayInit) node();
ArrayType type = n.type().toArray();
Type ultimateBase = type.ultimateBase();
int dims = type.dims();
Next, we create the desired NewArray
using the node factory
provided by the node visitor. We also wrap ultimateBase
with a
canonical type node:
NodeFactory nf = v.nodeFactory(); NewArray result = nf.NewArray(n.position(), nf.CanonicalTypeNode(Position.COMPILER_GENERATED, ultimateBase), dims, n);Finally, we set the type of the
NewArray
we just created to the
type of the array initializer from which we rewrote:
return result.type(type);
This results in the following complete implementation of
CArrayArrayInitExt
:
Exercise
Make sure these extension classes are recognized by the Polyglot frontend when
the
CArray
extension is run.
Hint:
+ Reveal...
We just overrode operations on the AST. What needs to be done next?
Solution:
+ Reveal...
Add these two methods to
CArrayExtFactory_c
:
1 @Override 2 protected Ext extArrayInitImpl() { 3 // We want to remove top-level ArrayInit nodes before translation, so 4 // return an extension object specific to carray that defines the 5 // removal procedure. 6 return new CArrayArrayInitExt(); 7 } 8 9 @Override 10 protected Ext extNewArrayImpl() { 11 return new CArrayNewArrayExt(); 12 }
Scheduler
We have implemented a new compiler pass above, but we have yet to include it in
the list of passes to be run by Polyglot. To do so, we must modify the
scheduler so that Polyglot is able to determine when the new pass
should be run. First, let us create the skeletal scheduler for
CArray
. This scheduler extends the scheduler for compiling Java
1.4 programs:
package carray; import polyglot.frontend.ExtensionInfo; import polyglot.frontend.JLScheduler; /** * {@code CArraySchedule} extends the base scheduler to handle translations of * CArray programs to Java. */ public class CArrayScheduler extends JLScheduler { public CArrayScheduler(ExtensionInfo extInfo) { super(extInfo); } // TODO: Add the rewriting pass. }
Polyglot completes all the compiler passes by reaching all the goals
defined in the scheduler. Each goal defines the pass to be run and other goals
that are its prerequisites. That is, the prerequisite goals must be completed
first before the current goal can complete. The list of goals that must be run
for a compilation unit constitutes a job. A job encapsulates work done
by the compiler for a single compilation unit, and contains all information for
a particular compilation unit carried between phases of the compiler. Only one
pass should be run over a job at a time.
To add a pass for the job, create a method that returns a goal that the job must
achieve. For
CArray
, we will create a goal that is achieved when
the array initializers have been removed from the AST. We start with the
skeleton of this goal factory method:
/** * Return a goal that removes all array initializers (ArrayInit) that are * not part of an array instance creation expression (NewArray). * @param job * @return */ public Goal RemoveArrayInit(Job job) { // TODO: Implement this method. }We are creating a
VisitorGoal
for the given job, where this goal
can be achieved by running a specified node visitor. In our case,
the node visitor that will achieve the goal is
ArrayInitRemover
:
ExtensionInfo extInfo = job.extensionInfo(); TypeSystem ts = extInfo.typeSystem(); NodeFactory nf = extInfo.nodeFactory(); Goal g = new VisitorGoal(job, new ArrayInitRemover(job, ts, nf));
(In general, a
Goal
is achieved by the creation and
running of a Pass
; see the
createPass(ExtensionInfo)
method of Goal
and the class polyglot.frontend.Pass
. Many goals are achieved by running
a visitor over an AST, and so the VisitorGoal
class
provides a convenient way to specify the visitor that will achieve
the goal.)
Next, we add the prerequisite goals that must be achieved before the pass
defined in this goal can be run. As seen above,
ArrayInitRemover
requires that the AST have type information so that the type of array
initializers can be determined when being rewritten. As a result, the AST must
have been type checked. We must also make sure that there is no circular
dependency; if such a cycle occurs, a CyclicDependencyException
is
thrown:
try { // Make sure we have type information before we translate things. g.addPrerequisiteGoal(TypeChecked(job), this); } catch (CyclicDependencyException e) { throw new InternalCompilerError(e); }Finally, the created goal must be interned before returned. This is because multiple different goals might have the same goal as a prerequisite. Interning ensures that the goal factory methods always return the same goal for the same job:
return internGoal(g);
Here is the implementation of
CArrayScheduler
so far:
1 package carray; 2 3 import polyglot.ast.NodeFactory; 4 import polyglot.frontend.CyclicDependencyException; 5 import polyglot.frontend.ExtensionInfo; 6 import polyglot.frontend.JLScheduler; 7 import polyglot.frontend.Job; 8 import polyglot.frontend.goals.Goal; 9 import polyglot.frontend.goals.VisitorGoal; 10 import polyglot.types.TypeSystem; 11 import polyglot.util.InternalCompilerError; 12 import carray.visit.ArrayInitRemover; 13 14 /** 15 * {@code CArraySchedule} extends the base scheduler to handle translations of 16 * CArray programs to Java. 17 */ 18 public class CArrayScheduler extends JLScheduler { 19 public CArrayScheduler(ExtensionInfo extInfo) { 20 super(extInfo); 21 } 22 23 /** 24 * Return a goal that removes all array initializers (ArrayInit) that are 25 * not part of an array instance creation expression (NewArray). 26 * @param job 27 * @return 28 */ 29 public Goal RemoveArrayInit(Job job) { 30 ExtensionInfo extInfo = job.extensionInfo(); 31 TypeSystem ts = extInfo.typeSystem(); 32 NodeFactory nf = extInfo.nodeFactory(); 33 Goal g = new VisitorGoal(job, new ArrayInitRemover(job, ts, nf)); 34 try { 35 // Make sure we have type information before we translate things. 36 g.addPrerequisiteGoal(TypeChecked(job), this); 37 } 38 catch (CyclicDependencyException e) { 39 throw new InternalCompilerError(e); 40 } 41 return internGoal(g); 42 } 43 }
We are almost done with adding a compiler pass. The only missing link is that
the Polyglot frontend does not know about the scheduler we just created yet.
The extension information will complete this missing link.
Extension information
Extension information is the main interface for defining language extensions.
It defines the type system, AST node factory, parser, and other parameters of a
language extension. Each extension information is a subclass of
ExtensionInfo
defined in package polyglot.frontend
.
The Polyglot frontend will load an ExtensionInfo
at startup.
The extension generator already created a template
ExtensionInfo
:
In addition to the parser, node factory, and type system, the template extension information also defines the default source file extension and the compiler name that can be used in a command line as the entry point to the Polyglot frontend.
One of the parameters an
ExtensionInfo
defines is the scheduler
associated with the language extension. Method createScheduler
instantiates the scheduler, and must be overridden for any new scheduler to be
recognized by the Polyglot frontend when running the language extension. For
CArray
, we supply a CArrayScheduler
created above:
@Override public Scheduler createScheduler() { return new CArrayScheduler(this); }
For the next and final two sections, scheduler and extension information will be
modified further to accommodate more changes to the language extension.