Translating Language Constructs

Polyglot generates Java files as the output of a compilation. Consequently, any new language construct must be either properly translated to Java features or serialized from a type object in order to retain the original features. For the CArray extension, the feature of interest is the const keyword.
Readers might wonder why our implementation of CArray has so far passed all the test cases, which means that Polyglot is able to generate Java code for the test cases containing no errors. The truth is that constant array types have been silently translated to traditional array types, because the const keyword is never printed during code generation. This behavior occurs because ConstArrayType_c inherits the method that translates types to strings used by the code generator. While this translation is acceptable, silent translations often result in unexpected errors during code generation. This section shows a more general way to translate language features.
There are two possible translations from CArray to Java:
  1. Type erasure: Effectively, the const keyword is removed from the language, coercing constant arrays to traditional arrays. This is in fact safe because any invalid CArray program will have resulted in a compile-time error before the erasure, so no code is generated in that case. One needs to be careful, however, when mixing Java files with translated CArray files.
  2. Type preservation: A possibly more complicated representation of the original features is devised in the target language. It turns out that constant arrays can be represented, more verbosely, in Java 5 using generics. The semantics of constant arrays remain the same, e.g., constant arrays cannot be assigned to nonconstant arrays.
This section walks through both translations. In addition, the second, more involved translation introduces the quasi-quoter, which simplifies the translation by creating AST nodes from parsed code fragments and existing AST nodes.
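The following subsections describe the general changes to the compiler implementation required to translate language features: adding a translation pass, modifying the scheduler, and setting the output extension information.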

Type erasure translation

First, let us correct the translation method for constant-array types. The method translate must be overridden so that the keyword const is output when a constant-array type is encountered during code generation. The implementation follows that of toString, except that we need to invoke translate on the base type. This difference is similar to that of the implementations of equalsImpl and typeEqualsImpl:
    @Override
    public String translate(Resolver c) {
        String result = base.translate(c);
        if (base instanceof ConstArrayType) {
            // If base is also a ConstArrayType, the keyword "const" would
            // have been displayed, so just add additional dimension here.
            result += "[]";
        }
        else result += " const[]";
        return result;
    }
This results in the following complete and correct implementation of ConstArrayType_c:
ConstArrayType_c.java
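The string-building logic of translate can be tried outside Polyglot to see why nested constant arrays need the special case. The following is only a minimal sketch: SketchType, Named, and ConstArr are hypothetical stand-ins for Polyglot's type classes, and the Resolver argument is omitted.

```java
final class TranslateSketch {
    // Hypothetical stand-in for a Polyglot type; only translate is modeled.
    interface SketchType {
        String translate();
    }

    // A non-array type such as int or String.
    static final class Named implements SketchType {
        private final String name;
        Named(String name) { this.name = name; }
        @Override public String translate() { return name; }
    }

    // A constant-array type over some base type.
    static final class ConstArr implements SketchType {
        private final SketchType base;
        ConstArr(SketchType base) { this.base = base; }

        @Override public String translate() {
            String result = base.translate();
            if (base instanceof ConstArr) {
                // The base already printed "const"; only add a dimension.
                result += "[]";
            } else {
                result += " const[]";
            }
            return result;
        }
    }

    public static void main(String[] args) {
        SketchType oneDim = new ConstArr(new Named("int"));
        SketchType twoDim = new ConstArr(oneDim);
        System.out.println(oneDim.translate()); // prints: int const[]
        System.out.println(twoDim.translate()); // prints: int const[][]
    }
}
```

Without the instanceof check, the two-dimensional case would print the keyword const once per dimension.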
Most of the test cases should now fail, because const now appears in the output Java files. Now we will add a compiler pass that removes constant-array types from the AST to make the output files compilable again.

Adding translation pass

Translating language features essentially changes the language extension of the program being translated. Polyglot's class ExtensionRewriter, in package polyglot.translate, provides basic infrastructure for converting ASTs to a different language extension.
ExtensionRewriter defines method typeToJava that converts a canonical type in the source extension to an ambiguous type node in the target extension. We will need to override this method to translate constant-array types. The overriding implementation will reside in class CArrayRewriter in package carray.translate. Let us begin with its skeleton:
package carray.translate;

import polyglot.ast.TypeNode;
import polyglot.frontend.ExtensionInfo;
import polyglot.frontend.Job;
import polyglot.translate.ExtensionRewriter;
import polyglot.types.SemanticException;
import polyglot.types.Type;
import polyglot.util.Position;

/**
 * {@code CArrayRewriter} extends {@code ExtensionRewriter} to provide
 * translations for array types when a CArray program is translated into Java.
 */
public class CArrayRewriter extends ExtensionRewriter {
    public CArrayRewriter(Job job, ExtensionInfo from_ext, ExtensionInfo to_ext) {
        super(job, from_ext, to_ext);
    }

    @Override
    public TypeNode typeToJava(Type t, Position pos) throws SemanticException {
        // TODO: Implement this method.
    }
}
typeToJava is invoked for every canonical type node encountered in the source AST. For CArray, whenever we encounter a constant-array type, we must translate it to an ArrayTypeNode of the target language, whose base type node is the result of a recursive call to typeToJava on the base type of the constant-array type. The node factory of the target language is accessible via method to_nf.
The translation definition above results in the following implementation of typeToJava:
    @Override
    public TypeNode typeToJava(Type t, Position pos) throws SemanticException {
        // We coerce constant arrays to traditional arrays by
        // translating to Java 1.4.
        if (t instanceof ConstArrayType) {
            // X const[] → X[]
            ConstArrayType at = (ConstArrayType) t;
            Type base = at.base();
            NodeFactory nf = to_nf();
            return nf.ArrayTypeNode(pos, typeToJava(base, base.position()));
        }
        return super.typeToJava(t, pos);
    }
For types other than constant-array types, the method invokes the superclass implementation, which deals with them.
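The recursion in this erasure can be illustrated on a toy type model. This is a sketch under stated assumptions, not Polyglot code: Ty, Named, Arr, and ConstArr are hypothetical classes, and erase plays the role that typeToJava plays in the rewriter.

```java
final class ErasureSketch {
    // Hypothetical miniature type model; not Polyglot's Type hierarchy.
    static abstract class Ty { }

    static final class Named extends Ty {
        final String name;
        Named(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    // A traditional (mutable) array type.
    static final class Arr extends Ty {
        final Ty base;
        Arr(Ty base) { this.base = base; }
        @Override public String toString() { return base + "[]"; }
    }

    // A constant-array type.
    static final class ConstArr extends Ty {
        final Ty base;
        ConstArr(Ty base) { this.base = base; }
    }

    // X const[] -> X[], applied recursively to the base type.
    static Ty erase(Ty t) {
        if (t instanceof ConstArr) {
            return new Arr(erase(((ConstArr) t).base));
        }
        if (t instanceof Arr) {
            return new Arr(erase(((Arr) t).base));
        }
        return t; // Non-array types are left unchanged.
    }

    public static void main(String[] args) {
        Ty t = new ConstArr(new ConstArr(new Named("String")));
        System.out.println(erase(t)); // prints: String[][]
    }
}
```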

Modifying scheduler

The scheduler must be modified so that the translator is one of the passes. First, we must preserve all the constructs that we are translating so that they can be recovered when the compiler reads a CArray program from bytecode format. That is, language constructs must be serialized before the translation. We add an empty goal PreRemoveCArray which has the goal Serialized as a prerequisite:
    /**
     * Return a goal that must be accomplished before CArray features are
     * translated.
     * @param job
     * @return
     */
    public Goal PreRemoveCArray(Job job) {
        Goal g = new EmptyGoal(job, "PreRemoveCArray");
        try {
            // Make sure we serialize before changing things.
            g.addPrerequisiteGoal(Serialized(job), this);
        }
        catch (CyclicDependencyException e) {
            throw new InternalCompilerError(e);
        }
        return internGoal(g);
    }
Next, we create a translation goal named RemoveCArray, i.e., a visitor goal achieved by running CArrayRewriter. This goal has PreRemoveCArray as a prerequisite:
    /**
     * Return a goal that wraps up the translation of CArray features by
     * converting AST nodes to the target language.
     * @param job
     * @return
     */
    public Goal RemoveCArray(Job job) {
        Goal g =
                new VisitorGoal(job,
                                new CArrayRewriter(job,
                                                   extInfo,
                                                   extInfo.outputExtensionInfo()));
        try {
            g.addPrerequisiteGoal(PreRemoveCArray(job), this);
        }
        catch (CyclicDependencyException e) {
            throw new InternalCompilerError(e);
        }

        return internGoal(g);
    }
After the translation, we would like to pass the translated AST to the compiler of the target language to check that the translation is indeed free of errors. That is, the AST should not be output yet. We override the goal CodeGenerated to be an empty goal without outputting any code. RemoveCArray is a prerequisite of this goal:
    @Override
    public Goal CodeGenerated(Job job) {
        // Because we want the target language to compile our
        // translation, do not generate code now.
        Goal g = new EmptyGoal(job, "CodeGenerated");
        // Add a prerequisite goal to translate CArray features.
        try {
            g.addPrerequisiteGoal(RemoveCArray(job), this);
        }
        catch (CyclicDependencyException e) {
            throw new InternalCompilerError(e);
        }
        return internGoal(g);
    }
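The three goals above form a chain of prerequisites. As a rough illustration of how such a chain fixes the order of passes, here is a toy model that is independent of Polyglot: the class GoalOrderSketch and its methods are hypothetical, and the real scheduler is considerably more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GoalOrderSketch {
    // Goal name -> names of its prerequisite goals.
    private final Map<String, List<String>> prereqs = new LinkedHashMap<>();

    void addPrerequisite(String goal, String prerequisite) {
        prereqs.computeIfAbsent(goal, k -> new ArrayList<>()).add(prerequisite);
    }

    // Returns the goals in an order that runs every prerequisite first.
    List<String> orderFor(String goal) {
        Set<String> done = new LinkedHashSet<>();
        visit(goal, done, new LinkedHashSet<>());
        return new ArrayList<>(done);
    }

    private void visit(String goal, Set<String> done, Set<String> active) {
        if (done.contains(goal)) {
            return;
        }
        if (!active.add(goal)) {
            throw new IllegalStateException("cyclic dependency at " + goal);
        }
        for (String p : prereqs.getOrDefault(goal, new ArrayList<>())) {
            visit(p, done, active);
        }
        active.remove(goal);
        done.add(goal);
    }

    // Mirrors the prerequisite edges added in the goals above.
    static GoalOrderSketch carrayGoals() {
        GoalOrderSketch s = new GoalOrderSketch();
        s.addPrerequisite("PreRemoveCArray", "Serialized");
        s.addPrerequisite("RemoveCArray", "PreRemoveCArray");
        s.addPrerequisite("CodeGenerated", "RemoveCArray");
        return s;
    }

    public static void main(String[] args) {
        // prints: [Serialized, PreRemoveCArray, RemoveCArray, CodeGenerated]
        System.out.println(carrayGoals().orderFor("CodeGenerated"));
    }
}
```

The cycle check in visit plays a role loosely analogous to the CyclicDependencyException handled in the goal methods.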
Finally, we invoke the compiler of the target language to compile the translated AST by overriding method runToCompletion. After a successful run of all passes in the current extension, create a compilation job for each source file the current extension has created, and run all the passes defined by the output extension on these jobs:
    @Override
    public boolean runToCompletion() {
        boolean complete = super.runToCompletion();
        if (complete) {
            // Call the compiler for output files to compile our translated
            // code.
            ExtensionInfo outExtInfo = extInfo.outputExtensionInfo();
            Scheduler outScheduler = outExtInfo.scheduler();

            // Create a goal to compile every source file.
            for (Job job : outScheduler.jobs()) {
                Job newJob = outScheduler.addJob(job.source(), job.ast());
                outScheduler.addGoal(outExtInfo.getCompileGoal(newJob));
            }
            return outScheduler.runToCompletion();
        }
        return complete;
    }
Notice that this method depends on method outputExtensionInfo defined in the extension information. We will provide a correct implementation of this method below to complete the translation.

Setting output extension information

The extension information defines method outputExtensionInfo, which produces an extension information object for the output language this extension translates to. By default, this method returns null because the output language is Java; in this case, the Java compiler can compile the output files. Nevertheless, whenever a language extension performs a translation, it is good practice to define the output extension information so that all compilation jobs are contained within the Polyglot framework.
First, we will define a field that will host the output extension information object:
    /**
     * The ExtensionInfo for the target language when we are translating CArray
     * features.
     */
    protected polyglot.frontend.ExtensionInfo outputExtensionInfo;
Now, we override method outputExtensionInfo to return the Java 1.4 output extension information (JLOutputExtensionInfo), which differs from the Java 1.4 extension information (JLExtensionInfo) in that parsing is skipped, as the compiler is already given the AST from the current extension. The field created above caches this output extension information so it is created only once:
    @Override
    public polyglot.frontend.ExtensionInfo outputExtensionInfo() {
        if (this.outputExtensionInfo == null) {
            this.outputExtensionInfo = new JLOutputExtensionInfo(this) {
                @Override
                protected Options createOptions() {
                    Options options = super.createOptions();
                    // We already serialized when erasing constant arrays,
                    // so don't do it again.
                    options.serialize_type_info = false;
                    return options;
                }
            };
        }
        return outputExtensionInfo;
    }
Notice that the output extension information object is in fact an anonymous subclass of JLOutputExtensionInfo, overriding method createOptions so that serialization in the target-language compiler is skipped. This is because type information can be serialized only once, and the CArray compiler has already done that prior to the translation.
This completes the erasure translation. All the test cases should pass again.
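The pattern used in outputExtensionInfo (create the object lazily, cache it in a field, and override a factory method in an anonymous subclass) can be seen in isolation in the following sketch. Opts and OutInfo are hypothetical stand-ins for Options and JLOutputExtensionInfo; nothing here is Polyglot API.

```java
final class OutputInfoSketch {
    // Hypothetical stand-in for an options object.
    static class Opts {
        boolean serializeTypeInfo = true;
    }

    // Hypothetical stand-in for an output extension information class.
    static class OutInfo {
        private Opts options;

        protected Opts createOptions() {
            return new Opts();
        }

        Opts getOptions() {
            if (options == null) {
                options = createOptions();
            }
            return options;
        }
    }

    // Cache: the output information is created on first request only.
    private OutInfo outputInfo;

    OutInfo outputInfo() {
        if (outputInfo == null) {
            outputInfo = new OutInfo() {
                @Override
                protected Opts createOptions() {
                    Opts o = super.createOptions();
                    // Serialization already happened before the translation.
                    o.serializeTypeInfo = false;
                    return o;
                }
            };
        }
        return outputInfo;
    }

    public static void main(String[] args) {
        OutputInfoSketch ext = new OutputInfoSketch();
        System.out.println(ext.outputInfo() == ext.outputInfo()); // prints: true
        System.out.println(ext.outputInfo().getOptions().serializeTypeInfo); // prints: false
    }
}
```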

Type preservation translation

The translation by type erasure above might not be desirable because a Java program could refer to the translated program and manipulate formerly constant arrays in an unexpected way, effectively making their elements mutable. We now present another translation that preserves semantics, maintaining a difference between constant and traditional arrays.
We start by describing the translation to Java 5. Then, we show how to provide the users of the compiler with an option to choose the desired translation. Further steps are required to change the scheduler and output extension information to accommodate the new translation. Finally, we show how to implement the translation using quasi-quoting, which parses code fragments and saves us from creating AST nodes manually with the node factory.

Translation definition

We will translate arrays to instances of one of the static nested classes in class CArray shown here:
CArray.java
Download this file and save it to directory runtime/src/carray. CArray's Ant build will include this file automatically during the build. Also, add directory runtime/src to the build path in Eclipse to make the test harness work with the translation.
A constant array of non-primitive base type T is translated to an instance of class CArray.ConstArray<? extends T>. A traditional array of non-primitive base type T is translated to an instance of class CArray.Array<T>. For example, String const[][] is translated to ConstArray<? extends ConstArray<? extends String>>, and Object[] is translated to Array<Object>. For primitive base types, the translation is similar, but uses different classes having the suffix of primitive types. For example, int const[] is translated to ConstArray_int.
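The mapping just described is recursive in the base type, which a small sketch can make concrete. The classes below are a hypothetical toy model, not the rewriter's code; the name produced for a non-constant primitive array (for example Array_int) is an assumption made by analogy with ConstArray_int.

```java
final class PreserveNamesSketch {
    // Hypothetical miniature type model; not Polyglot's Type hierarchy.
    static abstract class Ty { }

    static final class Named extends Ty {
        final String name;
        final boolean primitive;
        Named(String name, boolean primitive) {
            this.name = name;
            this.primitive = primitive;
        }
    }

    static final class Arr extends Ty {
        final Ty base;
        final boolean isConst;
        Arr(Ty base, boolean isConst) {
            this.base = base;
            this.isConst = isConst;
        }
    }

    // Returns the Java 5 type name that represents t.
    static String toJava5(Ty t) {
        if (t instanceof Named) {
            return ((Named) t).name;
        }
        Arr a = (Arr) t;
        String cls = a.isConst ? "ConstArray" : "Array";
        if (a.base instanceof Named && ((Named) a.base).primitive) {
            // Primitive base types use classes with a primitive-type suffix.
            // (The non-constant variant of this name is an assumption.)
            return cls + "_" + ((Named) a.base).name;
        }
        String arg = toJava5(a.base);
        return a.isConst ? cls + "<? extends " + arg + ">" : cls + "<" + arg + ">";
    }

    public static void main(String[] args) {
        Ty string = new Named("String", false);
        // prints: ConstArray<? extends ConstArray<? extends String>>
        System.out.println(toJava5(new Arr(new Arr(string, true), true)));
        // prints: Array<Object>
        System.out.println(toJava5(new Arr(new Named("Object", false), false)));
        // prints: ConstArray_int
        System.out.println(toJava5(new Arr(new Named("int", true), true)));
    }
}
```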
Array accesses are translated to invocations of method get, and assignments to array elements are translated to invocations of method set. Array creation expressions are translated to invocations of method init.
As shown, this class is incomplete but is sufficient for translating all the test cases. The readers are invited to implement the remaining classes as an exercise.
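For readers without the file at hand, the following sketch shows one plausible shape for such classes. It is not the contents of CArray.java: the class bodies, and the choice to make Array a subclass of ConstArray, are assumptions chosen so that a mutable array can be used where a constant array is expected but not the reverse.

```java
final class ArraySketch {
    // Read-only view: exposes get but no set.
    static class ConstArray<T> {
        protected final T[] elems;

        ConstArray(T[] elems) {
            this.elems = elems;
        }

        T get(int i) {
            return elems[i];
        }
    }

    // Mutable array: adds set on top of the read-only operations.
    static final class Array<T> extends ConstArray<T> {
        Array(T[] elems) {
            super(elems);
        }

        void set(int i, T value) {
            elems[i] = value;
        }

        // Wraps an ordinary Java array.
        static <T> Array<T> init(T[] elems) {
            return new Array<>(elems);
        }
    }

    public static void main(String[] args) {
        Array<String> a = Array.init(new String[] { "Hello", "World" });
        a.set(1, "Kitty");
        // A mutable array may be viewed as a constant array ...
        ConstArray<? extends Object> c = a;
        System.out.println(c.get(1)); // prints: Kitty
        // ... but c offers no set method, and assigning c back to an
        // Array<String> variable would be rejected by the Java compiler.
    }
}
```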

Adding command-line options

Polyglot accepts command-line options as implemented in a subclass of Options defined in package polyglot.main. To let users choose the desired translation, we will add an option flag translateCArray in class CArrayOptions, defined in package carray:
CArrayOptions.java
This class defines a new field translateCArray to indicate the user's selection. Method populateFlags adds the new flag option by providing a list of possible strings that turn on the flag, along with the description of the flag. Method handleArg sets the declared field whenever one of the possible strings is mentioned in the command line.
Notice that the field is declared public so that the option is accessible throughout the extension. In fact, we will access this field in the scheduler to set the appropriate prerequisite goals, in the extension information to set the appropriate output extension information, and in the rewriter to return type nodes translated to the correct output language.

Modifying scheduler

The only modification to the scheduler is in goal PreRemoveCArray: if the flag translateCArray is enabled, the goal RemoveArrayInit, which we implemented in the last section, is added as a prerequisite:
    /**
     * Return a goal that must be accomplished before CArray features are
     * translated.
     * @param job
     * @return
     */
    public Goal PreRemoveCArray(Job job) {
        Goal g = new EmptyGoal(job, "PreRemoveCArray");
        try {
            // Make sure we serialize before changing things.
            g.addPrerequisiteGoal(Serialized(job), this);
            Options opts = extInfo.getOptions();
            if (opts instanceof CArrayOptions) {
                CArrayOptions options = (CArrayOptions) opts;
                if (options.translateCArray) {
                    // Remove array initializers only if we are preserving type
                    // tests, i.e., translating to Java 5.
                    g.addPrerequisiteGoal(RemoveArrayInit(job), this);
                }
            }
        }
        catch (CyclicDependencyException e) {
            throw new InternalCompilerError(e);
        }
        return internGoal(g);
    }
This results in the final implementation of CArrayScheduler:
CArrayScheduler.java

Modifying output extension information

Method outputExtensionInfo is modified to return the output extension information object for Java 5 instead if the flag translateCArray is enabled. Notice that we can serialize type information again, because the JL5 extension uses a different serializer:
    @Override
    public polyglot.frontend.ExtensionInfo outputExtensionInfo() {
        if (this.outputExtensionInfo == null) {
            CArrayOptions options = (CArrayOptions) this.getOptions();
            if (options.translateCArray)
                this.outputExtensionInfo = new JL5OutputExtensionInfo(this);
            else this.outputExtensionInfo = new JLOutputExtensionInfo(this) {
                @Override
                protected Options createOptions() {
                    Options options = super.createOptions();
                    // We already serialized when erasing constant arrays,
                    // so don't do it again.
                    options.serialize_type_info = false;
                    return options;
                }
            };
        }
        return outputExtensionInfo;
    }
This results in the final implementation of ExtensionInfo:
ExtensionInfo.java

Quasi-quoting

The quasiquoter for Java 1.4 is implemented in class polyglot.qq.QQ, which provides methods for parsing strings into ASTs. To use the class, invoke one of the parseT methods to create an AST node of type T.
Each parseT method takes a format string as its first argument, followed by zero or more additional Object arguments. Each pattern in the format string is matched with its corresponding Object. The format string may contain the following patterns:
Pattern   Expected substitution
%s        String (parsed as an identifier)
%T        Type or TypeNode
%E        Expr
%S        Stmt
%D        ClassDecl
%M        ClassMember
%F        Formal
%LT       List<Type> or List<TypeNode>
%LE       List<Expr>
%LS       List<Stmt>
%LD       List<ClassDecl>
%LM       List<ClassMember>
%LF       List<Formal>
Because these patterns are recognized as tokens by the lexer, surrounding a pattern with whitespace or parentheses may be needed for the string to parse.
For example:
    Expr e;
    TypeNode t;
    Stmt s = qq.parseStmt("%T %s = new %T(%E);", t, "tmp", t, e);
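The positional pairing of patterns with arguments can be mimicked with plain strings. The sketch below is not how polyglot.qq.QQ works internally (QQ parses the format string and produces AST nodes); it only substitutes text, and the class PatternMatchSketch and method fill are hypothetical names used for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatternMatchSketch {
    // Recognizes the patterns listed above: %s, %T, %E, ... and %LT, %LE, ...
    private static final Pattern PATTERN =
            Pattern.compile("%(L[TESDMF]|[sTESDMF])");

    // Replaces the i-th pattern in fmt with the text of the i-th argument.
    static String fill(String fmt, Object... subs) {
        Matcher m = PATTERN.matcher(fmt);
        StringBuffer out = new StringBuffer();
        int next = 0;
        while (m.find()) {
            if (next >= subs.length) {
                throw new IllegalArgumentException("too few substitutions");
            }
            String text = String.valueOf(subs[next++]);
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        if (next != subs.length) {
            throw new IllegalArgumentException("too many substitutions");
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Foo tmp = new Foo(x + 1);
        System.out.println(fill("%T %s = new %T(%E);", "Foo", "tmp", "Foo", "x + 1"));
    }
}
```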
Java 5 also has a quasiquoter located in package polyglot.ext.jl5.qq. As we will see below, this quasiquoter can only handle a handful of Java 5 features; unsupported features must be constructed using the node factory for Java 5.

Implementing translation

Modifying the rewriter
To implement the translation, we again return to CArrayRewriter and modify method typeToJava to return different type nodes depending on the option flag:
CArrayRewriter.java
Notice the use of the quasiquoter to parse translated array class types. The rewriter also mirrors the option flag so that extension classes can access it when determining whether to preserve semantics when translating AST nodes.
Adding translation methods in extension classes
ExtensionRewriter invokes method extRewriteEnter before visiting children of an AST node, and method extRewrite when leaving the AST node. We will override these methods in several extension classes to complete the translation.
First, we must import the translated representation class CArray. This is done by creating extension class CArraySourceFileExt:
CArraySourceFileExt.java
Next, we translate array creation expressions by overriding method extRewrite in extension class CArrayNewArrayExt:
CArrayNewArrayExt.java
In the code above, initArg is the argument to method init defined in the translated representation class CArray. A complication of the translation of array creation expressions is that generic arrays cannot be instantiated when multidimensional arrays are translated. As a result, arrays of erasure types are created first and then cast to the desired type. For example, translating
String const[][] csm = { { "Hello", "World" },
                         { "Kitty", "Cup" } };
gives
ConstArray<? extends ConstArray<? extends String>> csm =
        Array.init((ConstArray<? extends String>[]) new ConstArray[]{ Array.init(new String[]{ "Hello", "World" }),
                                                                      Array.init(new String[]{ "Kitty", "Cup" }) });
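The same restriction and workaround can be reproduced in plain Java with a standard generic type. The sketch below uses java.util.List in place of ConstArray purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class GenericArraySketch {
    @SuppressWarnings("unchecked")
    static List<? extends String>[] rows() {
        // "new List<? extends String>[] { ... }" is rejected by the Java
        // compiler as generic array creation, so an array of the raw type
        // is created first and then cast to the desired type.
        return (List<? extends String>[]) new List[] {
            Arrays.asList("Hello", "World"),
            Arrays.asList("Kitty", "Cup")
        };
    }

    public static void main(String[] args) {
        List<? extends String>[] r = rows();
        System.out.println(r[1].get(0)); // prints: Kitty
    }
}
```

The cast is unchecked, which is why the sketch suppresses the corresponding compiler warning.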

Exercise

Translate array accesses to invocations of method get in class CArray. For example, ak[47] should translate to ak.get(47).

Exercise

Modify the extension factory to create instances of extension classes we have created in this section.
Completing this exercise results in the final extension factory for CArray.
Now, all the translation test cases should pass.