Translating Language Constructs
Polyglot generates Java files as the output of a compilation. Consequently,
any new language construct must be either properly translated to Java features
or serialized from a type object in order to retain the original features. For
the CArray extension, the feature of interest is the const keyword.
Readers might wonder why our implementation of CArray has so far managed to
pass all the test cases, which means that Polyglot is able to generate Java
code for the test cases containing no errors. The truth is that constant array
types have been silently translated to traditional array types, because the
const keyword is never printed during code generation. This behavior occurs
because ConstArrayType_c inherits the method that translates types to strings
used by the code generator. While this translation is acceptable, silent
translations often result in unexpected errors during code generation. This
section shows a more general way to translate language features.
There are two possible translations from CArray to Java:
- Type erasure: Effectively, the const keyword is removed from the language, coercing constant arrays to traditional arrays. This is in fact safe because any invalid CArray program will have resulted in a compile-time error before the erasure, so no code is generated in that case. One needs to be careful, however, when mixing Java files with translated CArray files.
- Type preservation: A possibly more complicated representation of the original features is devised in the target language. It turns out that constant arrays can be represented more verbosely in Java 5 using generics. The semantics of constant arrays remains the same, e.g., constant arrays cannot be assigned to nonconstant arrays.
The following are general changes to the compiler implementation required to
translate language features:
- A compiler pass performing the translation must be added.
- The scheduler must be modified to accommodate the new compiler pass.
- Optionally, the translated AST should be recompiled to ensure the correctness of the translation. This entails bypassing code generation from the current extension, and forwarding the translated AST to the target language extension for recompilation.
Type erasure translation
First, let us correct the translation method for constant-array types. The
method translate must be overridden so that the keyword const is output when a
constant-array type is encountered during code generation. The implementation
follows that of toString, except that we need to invoke translate on the base
type. This difference is similar to that between the implementations of
equalsImpl and typeEqualsImpl:

    @Override
    public String translate(Resolver c) {
        String result = base.translate(c);
        if (base instanceof ConstArrayType) {
            // If base is also a ConstArrayType, the keyword "const" would
            // have been displayed, so just add additional dimension here.
            result += "[]";
        }
        else result += " const[]";
        return result;
    }

This results in the following complete and correct implementation of
ConstArrayType_c:

ConstArrayType_c.java

Most of the test cases should now fail, because const now appears in the
output Java files. Now we will add a compiler pass that removes constant-array
types from the AST to make the output files compilable again.
Adding translation pass
Translating language features is essentially changing the language extension
of the program being translated. Polyglot provides class ExtensionRewriter in
package polyglot.translate, which supplies basic infrastructure for converting
ASTs to a different language extension.

ExtensionRewriter defines method typeToJava, which converts a canonical type in
the source extension to an ambiguous type node in the target extension. We will
need to override this method to translate constant-array types. The overriding
implementation will reside in class CArrayRewriter in package
carray.translate. Let us begin with its skeleton:

    package carray.translate;

    import polyglot.ast.TypeNode;
    import polyglot.frontend.ExtensionInfo;
    import polyglot.frontend.Job;
    import polyglot.translate.ExtensionRewriter;
    import polyglot.types.SemanticException;
    import polyglot.types.Type;
    import polyglot.util.Position;

    /**
     * {@code CArrayRewriter} extends {@code ExtensionRewriter} to provide
     * translations for array types when a CArray program is translated into Java.
     */
    public class CArrayRewriter extends ExtensionRewriter {
        public CArrayRewriter(Job job, ExtensionInfo from_ext,
                ExtensionInfo to_ext) {
            super(job, from_ext, to_ext);
        }

        @Override
        public TypeNode typeToJava(Type t, Position pos)
                throws SemanticException {
            // TODO: Implement this method. Until then, defer to the
            // superclass so that the skeleton compiles.
            return super.typeToJava(t, pos);
        }
    }
typeToJava is invoked for every canonical type node encountered in the source
AST. For CArray, whenever we encounter a constant-array type, we must translate
it to an ArrayTypeNode of the target language, where the base type of the
resulting array type node is obtained by a recursive call to typeToJava on the
base type of the constant-array type. The node factory of the target language
is accessible via method to_nf.

The above translation definition results in the following implementation of
typeToJava:

    // Requires importing polyglot.ast.NodeFactory and the extension's
    // ConstArrayType in addition to the imports of the skeleton.
    @Override
    public TypeNode typeToJava(Type t, Position pos) throws SemanticException {
        // We coerce constant arrays to traditional arrays by
        // translating to Java 1.4.
        if (t instanceof ConstArrayType) {
            // X const[] → X[]
            ConstArrayType at = (ConstArrayType) t;
            Type base = at.base();
            NodeFactory nf = to_nf();
            return nf.ArrayTypeNode(pos, typeToJava(base, base.position()));
        }
        return super.typeToJava(t, pos);
    }

For types other than constant-array types, the method invokes the superclass
implementation, which deals with them.
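The recursion in typeToJava can be tried out in isolation. The following is a minimal sketch on a toy type model, not on Polyglot's type system: the classes ToyType, Named, ArrayOf, ConstArrayOf, and Erasure are hypothetical names introduced only for this illustration.

```java
// Toy model for illustration only; none of these classes are Polyglot API.
abstract class ToyType {
    abstract String show();
}

final class Named extends ToyType {
    final String name;
    Named(String name) { this.name = name; }
    @Override String show() { return name; }
}

final class ArrayOf extends ToyType {
    final ToyType base;
    ArrayOf(ToyType base) { this.base = base; }
    @Override String show() { return base.show() + "[]"; }
}

final class ConstArrayOf extends ToyType {
    final ToyType base;
    ConstArrayOf(ToyType base) { this.base = base; }
    // Like the translate method shown earlier: "const" is printed once,
    // and nested constant arrays only add a dimension.
    @Override String show() {
        return base.show() + (base instanceof ConstArrayOf ? "[]" : " const[]");
    }
}

final class Erasure {
    // X const[] -> X[], applied recursively to the base type.
    static ToyType erase(ToyType t) {
        if (t instanceof ConstArrayOf) {
            return new ArrayOf(erase(((ConstArrayOf) t).base));
        }
        if (t instanceof ArrayOf) {
            return new ArrayOf(erase(((ArrayOf) t).base));
        }
        return t; // Named types are left untouched.
    }

    public static void main(String[] args) {
        ToyType t = new ConstArrayOf(new ConstArrayOf(new Named("String")));
        // Prints: String const[][] -> String[][]
        System.out.println(t.show() + " -> " + Erasure.erase(t).show());
    }
}
```

In the real rewriter the non-constant cases are handled by the superclass implementation rather than by an explicit branch.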
Modifying scheduler
The scheduler must be modified so that the translator is one of the passes.
First, we must preserve all the constructs that we are translating so that they
can be recovered when the compiler reads a CArray program from bytecode format.
That is, language constructs must be serialized before the translation. We add
an empty goal PreRemoveCArray, which has the goal Serialized as a prerequisite:

    /**
     * Return a goal that must be accomplished before CArray features are
     * translated.
     * @param job
     * @return
     */
    public Goal PreRemoveCArray(Job job) {
        Goal g = new EmptyGoal(job, "PreRemoveCArray");
        try {
            // Make sure we serialize before changing things.
            g.addPrerequisiteGoal(Serialized(job), this);
        }
        catch (CyclicDependencyException e) {
            throw new InternalCompilerError(e);
        }
        return internGoal(g);
    }

Next, we create a translation goal named
RemoveCArray, i.e., a visitor goal achieved by running CArrayRewriter. This
goal has PreRemoveCArray as a prerequisite:

    /**
     * Return a goal that wraps up the translation of CArray features by
     * converting AST nodes to the target language.
     * @param job
     * @return
     */
    public Goal RemoveCArray(Job job) {
        Goal g = new VisitorGoal(job,
                new CArrayRewriter(job, extInfo,
                        extInfo.outputExtensionInfo()));
        try {
            g.addPrerequisiteGoal(PreRemoveCArray(job), this);
        }
        catch (CyclicDependencyException e) {
            throw new InternalCompilerError(e);
        }
        return internGoal(g);
    }

After the translation, we would like to pass the translated AST to the compiler
of the target language to check that the translation is indeed free of errors.
That is, the AST should not be output yet. We override the goal
CodeGenerated to be an empty goal that outputs no code. RemoveCArray is a
prerequisite of this goal:

    @Override
    public Goal CodeGenerated(Job job) {
        // Because we want the target language to compile our
        // translation, do not generate code now.
        Goal g = new EmptyGoal(job, "CodeGenerated");
        // Add a prerequisite goal to translate CArray features.
        try {
            g.addPrerequisiteGoal(RemoveCArray(job), this);
        }
        catch (CyclicDependencyException e) {
            throw new InternalCompilerError(e);
        }
        return internGoal(g);
    }

Finally, we invoke the compiler of the target language to compile the
translated AST by overriding method
runToCompletion. After a successful run of all passes in the current extension,
we create a compilation job for each source file the current extension has
created, and run all the passes defined by the output extension on these jobs:

    @Override
    public boolean runToCompletion() {
        boolean complete = super.runToCompletion();
        if (complete) {
            // Call the compiler for output files to compile our translated
            // code.
            ExtensionInfo outExtInfo = extInfo.outputExtensionInfo();
            Scheduler outScheduler = outExtInfo.scheduler();

            // Create a goal to compile every source file.
            for (Job job : outScheduler.jobs()) {
                Job newJob = outScheduler.addJob(job.source(), job.ast());
                outScheduler.addGoal(outExtInfo.getCompileGoal(newJob));
            }
            return outScheduler.runToCompletion();
        }
        return complete;
    }

Notice that this method depends on method
outputExtensionInfo
defined in the extension information. We will provide a correct implementation
of this method below to complete the translation.
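Taken together, the goals defined in this section form a chain of prerequisites. The sketch below models that chain with a hypothetical ToyGoal class (not Polyglot's Goal) to show the order in which the goals would be reached. It performs no cycle detection, whereas the real scheduler reports cycles through CyclicDependencyException.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a scheduler goal; not Polyglot API.
final class ToyGoal {
    final String name;
    final List<ToyGoal> prerequisites = new ArrayList<>();

    ToyGoal(String name) { this.name = name; }

    // Registers a prerequisite and returns this goal for chaining.
    ToyGoal after(ToyGoal prerequisite) {
        prerequisites.add(prerequisite);
        return this;
    }

    // Depth-first walk: every prerequisite is recorded before the goal itself.
    void collect(Set<String> order) {
        for (ToyGoal p : prerequisites) p.collect(order);
        order.add(name);
    }
}

final class GoalOrderDemo {
    static List<String> order() {
        ToyGoal serialized = new ToyGoal("Serialized");
        ToyGoal pre = new ToyGoal("PreRemoveCArray").after(serialized);
        ToyGoal remove = new ToyGoal("RemoveCArray").after(pre);
        ToyGoal codeGenerated = new ToyGoal("CodeGenerated").after(remove);

        Set<String> visited = new LinkedHashSet<>();
        codeGenerated.collect(visited);
        return new ArrayList<>(visited);
    }

    public static void main(String[] args) {
        // Prints: [Serialized, PreRemoveCArray, RemoveCArray, CodeGenerated]
        System.out.println(order());
    }
}
```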
Setting output extension information
The extension information defines method outputExtensionInfo, which produces an
extension information object for the output language this extension translates
to. By default, this method returns null because the output language is Java;
in this case, the Java compiler can compile the output files.
Nevertheless, whenever a language extension does some kind of a translation, it
is a good practice to define the output extension information so that all
compilation jobs are contained within the Polyglot framework.
First, we will define a field that will hold the output extension information
object:

    /**
     * The ExtensionInfo for the target language when we are translating CArray
     * features.
     */
    protected polyglot.frontend.ExtensionInfo outputExtensionInfo;

Now, we override method
outputExtensionInfo to return the Java 1.4 output extension information
(JLOutputExtensionInfo), which differs from the Java 1.4 extension information
(JLExtensionInfo) in that parsing is skipped, as the compiler is already given
the AST from the current extension. The field created above caches this output
extension information so it is created only once:

    @Override
    public polyglot.frontend.ExtensionInfo outputExtensionInfo() {
        if (this.outputExtensionInfo == null) {
            this.outputExtensionInfo = new JLOutputExtensionInfo(this) {
                @Override
                protected Options createOptions() {
                    Options options = super.createOptions();
                    // We already serialized when erasing constant arrays,
                    // so don't do it again.
                    options.serialize_type_info = false;
                    return options;
                }
            };
        }
        return outputExtensionInfo;
    }

Notice that the output extension information object is in fact an anonymous
subclass of
JLOutputExtensionInfo, overriding method createOptions so that serialization in
the target-language compiler is skipped. This is because type information can
be serialized only once, and the CArray compiler has already done that prior to
the translation.
This completes the erasure translation. All the test cases should pass again.
Type preservation translation
The translation by type erasure above might not be desirable because a Java
program could refer to the translated program and manipulate formerly constant
arrays in an unexpected way, effectively making their elements mutable. We now
present another translation that preserves semantics, maintaining a difference
between constant and traditional arrays.
We start by describing the translation to Java 5. Then, we show how to provide
the users of the compiler with an option to choose the desired translation.
Further steps are required to change the scheduler and output extension
information to accommodate the new translation. Finally, we show how to
implement the translation using quasi-quoting, which parses code fragments and
saves us from creating AST nodes manually with the node factory.
Translation definition
We will translate arrays to instances of one of the static nested classes in
class CArray shown here:

CArray.java

Download this file and save it to directory runtime/src/carray. CArray's Ant
build will include this file automatically during the build. Also, add
directory runtime/src to the build path in Eclipse to make the test harness
work with the translation.
A constant array of non-primitive base type T is translated to an instance of
class CArray.ConstArray<? extends T>. A traditional array of non-primitive base
type T is translated to an instance of class CArray.Array<T>. For example,
String const[][] is translated to
ConstArray<? extends ConstArray<? extends String>>, and Object[] is translated
to Array<Object>. For primitive base types, the translation is similar, but
uses different classes whose names carry the primitive type as a suffix. For
example, int const[] is translated to ConstArray_int.
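The naming rules above can be written down as a small recursive function over type names. This is only a sketch on strings, not the rewriter itself: the class PreservingNames and method toJava5 are hypothetical, and the name Array_int for a traditional int[] is an assumption extrapolated from the ConstArray_int example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustration on plain strings; the real translation works on type objects.
final class PreservingNames {
    private static final Set<String> PRIMITIVES = new HashSet<>(Arrays.asList(
            "boolean", "byte", "char", "short", "int", "long", "float", "double"));

    /**
     * Returns the Java 5 type name for an array type with the given element
     * type name, constness, and number of dimensions.
     */
    static String toJava5(String base, boolean isConst, int dims) {
        if (dims == 0) return base;
        String cls = isConst ? "ConstArray" : "Array";
        if (dims == 1 && PRIMITIVES.contains(base)) {
            // Primitive base types use dedicated classes, e.g. ConstArray_int.
            // (Array_int for the non-constant case is an assumption.)
            return cls + "_" + base;
        }
        // Translate the base type first, then wrap it.
        String elem = toJava5(base, isConst, dims - 1);
        return isConst ? cls + "<? extends " + elem + ">" : cls + "<" + elem + ">";
    }

    public static void main(String[] args) {
        System.out.println(toJava5("String", true, 2));
        System.out.println(toJava5("Object", false, 1));
        System.out.println(toJava5("int", true, 1));
    }
}
```

The three calls in main reproduce the three examples given in the text.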
Array accesses are translated to invocations of method get, and assignments to
array elements are translated to invocations of method set. Array creation
expressions are translated to invocations of method init.
As shown, this class is incomplete but is sufficient for translating all the
test cases. The readers are invited to implement the remaining classes as an
exercise.
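For readers without the file at hand, the following minimal sketch shows one way such a representation class could be organized. It is not the CArray.java of the tutorial: the outer class name CArraySketch, the field names, and the choice to have set return the assigned value are assumptions made for illustration, and the primitive variants are omitted.

```java
// Hypothetical sketch; the actual CArray.java may differ in every detail.
final class CArraySketch {
    /** Constant array: exposes get but offers no way to change elements. */
    static class ConstArray<T> {
        protected final T[] elems;

        ConstArray(T[] elems) { this.elems = elems; }

        public T get(int i) { return elems[i]; }

        public int length() { return elems.length; }
    }

    /** Traditional array: inherits get and adds set. */
    static class Array<T> extends ConstArray<T> {
        Array(T[] elems) { super(elems); }

        // Returns the assigned value so that a call can stand in for an
        // assignment expression.
        public T set(int i, T value) {
            elems[i] = value;
            return value;
        }

        /** Wraps a Java array; the target of translated array creations. */
        public static <T> Array<T> init(T[] elems) {
            return new Array<>(elems);
        }
    }

    public static void main(String[] args) {
        Array<String> a = Array.init(new String[] { "Hello", "World" });
        a.set(1, "Kitty");
        // An Array<String> may be viewed as a constant array ...
        ConstArray<? extends String> c = a;
        System.out.println(c.get(0) + " " + c.get(1)); // prints: Hello Kitty
        // ... but the view has no set method, so c.set(0, "x") is rejected
        // at compile time, and c cannot be assigned back to an Array<String>.
    }
}
```

Making Array<T> a subclass of ConstArray<T> is what lets a traditional array be used where a ConstArray<? extends T> is expected, while the reverse assignment is a type error.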
Adding command-line options
Polyglot accepts command-line options as implemented in a subclass of Options
defined in package polyglot.main. To provide the users with an option of
choosing a desired translation, we will add an option flag translateCArray in
class CArrayOptions, defined in package carray:

CArrayOptions.java

This class defines a new field translateCArray to indicate the user's
selection. Method populateFlags adds the new flag option by providing a list of
possible strings that turn on the flag, along with the description of the flag.
Method handleArg sets the declared field whenever one of the possible strings
is mentioned in the command line.
Notice that the field is declared public so that the option is accessible
throughout the extension. In fact, we will access this field in the scheduler
to set the appropriate prerequisite goals, in the extension information to
set the appropriate output extension information, and in the rewriter to return
type nodes translated to the correct output language.
Modifying scheduler
The only modification to the scheduler is in goal PreRemoveCArray, which gains
RemoveArrayInit, implemented in the last section, as a prerequisite goal if the
flag translateCArray is enabled:

    /**
     * Return a goal that must be accomplished before CArray features are
     * translated.
     * @param job
     * @return
     */
    public Goal PreRemoveCArray(Job job) {
        Goal g = new EmptyGoal(job, "PreRemoveCArray");
        try {
            // Make sure we serialize before changing things.
            g.addPrerequisiteGoal(Serialized(job), this);
            Options opts = extInfo.getOptions();
            if (opts instanceof CArrayOptions) {
                CArrayOptions options = (CArrayOptions) opts;
                if (options.translateCArray) {
                    // Remove array initializers only if we are preserving type
                    // tests, i.e., translating to Java 5.
                    g.addPrerequisiteGoal(RemoveArrayInit(job), this);
                }
            }
        }
        catch (CyclicDependencyException e) {
            throw new InternalCompilerError(e);
        }
        return internGoal(g);
    }

This results in the final implementation of CArrayScheduler:

CArrayScheduler.java
Modifying output extension information
Method outputExtensionInfo is modified to return the output extension
information object for Java 5 instead if the flag translateCArray is enabled.
Notice that we can serialize type information again, because the JL5 extension
uses a different serializer:

    @Override
    public polyglot.frontend.ExtensionInfo outputExtensionInfo() {
        if (this.outputExtensionInfo == null) {
            CArrayOptions options = (CArrayOptions) this.getOptions();
            if (options.translateCArray)
                this.outputExtensionInfo = new JL5OutputExtensionInfo(this);
            else this.outputExtensionInfo = new JLOutputExtensionInfo(this) {
                @Override
                protected Options createOptions() {
                    Options options = super.createOptions();
                    // We already serialized when erasing constant arrays,
                    // so don't do it again.
                    options.serialize_type_info = false;
                    return options;
                }
            };
        }
        return outputExtensionInfo;
    }

This results in the final implementation of ExtensionInfo:

ExtensionInfo.java
Quasi-quoting
The quasiquoter for Java 1.4 is implemented in class polyglot.qq.QQ, which
provides methods for parsing strings into ASTs. To use the class, invoke one of
the parseT methods to create an AST node of type T.

Each parseT method takes a format string as its first argument, followed by
zero or more additional Object arguments. Each pattern in the format string is
matched with its corresponding Object. Because patterns are recognized as
tokens by the lexer, surrounding a pattern with whitespace or parentheses may
be needed for the string to parse. The format string may contain the following
patterns:
Pattern | Expected substitution |
---|---|
%s | String (parsed as an identifier) |
%T | Type or TypeNode |
%E | Expr |
%S | Stmt |
%D | ClassDecl |
%M | ClassMember |
%F | Formal |
%LT | List<Type> or List<TypeNode> |
%LE | List<Expr> |
%LS | List<Stmt> |
%LD | List<ClassDecl> |
%LM | List<ClassMember> |
%LF | List<Formal> |
For example:
    // e, t, and the quasiquoter qq are assumed to have been obtained earlier.
    Expr e;
    TypeNode t;
    Stmt s = qq.parseStmt("%T %s = new %T(%E);", t, "tmp", t, e);

Java 5 also has a quasiquoter located in package polyglot.ext.jl5.qq. As we
will see below, this quasiquoter can only handle a handful of Java 5 features;
unsupported features must be constructed using the node factory for Java 5.
Implementing translation
Modifying the rewriter
To implement the translation, we again return to CArrayRewriter and modify
method typeToJava to return different type nodes depending on the option flag:

CArrayRewriter.java

Notice the use of the quasiquoter to parse translated array class types. The
rewriter also mirrors the option flag so that extension classes can access it
when determining whether to preserve semantics when translating AST nodes.
Adding translation methods in extension classes
ExtensionRewriter invokes method extRewriteEnter before visiting children of an
AST node, and method extRewrite when leaving the AST node. We will override
these methods in several extension classes to complete the translation.

First, we must import the translated representation class CArray. This is done
by creating extension class CArraySourceFileExt:

CArraySourceFileExt.java
Next, we translate array creation expressions by overriding method extRewrite
in extension class CArrayNewArrayExt:

CArrayNewArrayExt.java

In the code above, initArg is the argument to method init defined in the
translated representation class CArray. A complication of the translation of
array creation expressions is that generic arrays cannot be instantiated when
multidimensional arrays are translated. As a result, arrays of erasure types
are created first and then cast to the desired type. For example, translating

    String const[][] csm = { { "Hello", "World" }, { "Kitty", "Cup" } };

gives

    ConstArray<? extends ConstArray<? extends String>> csm =
        Array.init((ConstArray<? extends String>[]) new ConstArray[]{
            Array.init(new String[]{ "Hello", "World" }),
            Array.init(new String[]{ "Kitty", "Cup" })
        });
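The restriction on generic arrays is a rule of Java itself and can be reproduced without Polyglot. The sketch below uses a hypothetical class Box<T> as a stand-in for ConstArray to show the rejected form and the create-then-cast workaround that the translation relies on.

```java
final class GenericArrayDemo {
    // Hypothetical stand-in for a generic class such as ConstArray.
    static final class Box<T> {
        final T value;
        Box(T value) { this.value = value; }
    }

    // The cast from the raw array type is unchecked, hence the annotation.
    @SuppressWarnings("unchecked")
    static Box<? extends String>[] make() {
        // Writing "new Box<? extends String>[2]" here is rejected by javac
        // as a generic array creation. Creating an array of the raw type
        // and casting it to the desired type is permitted.
        Box<? extends String>[] boxes = (Box<? extends String>[]) new Box[] {
            new Box<String>("Hello"), new Box<String>("World")
        };
        return boxes;
    }

    public static void main(String[] args) {
        Box<? extends String>[] boxes = make();
        System.out.println(boxes[0].value + " " + boxes[1].value); // prints: Hello World
    }
}
```

The cast only changes the static type; at run time the array is still an array of the erased element type, which is why the compiler issues an unchecked warning instead of an error.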
Exercise
Translate array accesses to invocations of method get in class CArray. For
example, ak[47] should translate to ak.get(47).
Exercise
Modify the extension factory to create instances of extension classes we have
created in this section.
Solution:
Add these methods to CArrayExtFactory_c:

    @Override
    protected Ext extArrayAccessImpl() {
        return new CArrayArrayAccessExt();
    }

    @Override
    protected Ext extSourceFileImpl() {
        return new CArraySourceFileExt();
    }
Now, all the translation test cases should pass.