Programming Assignment 3: Intermediate Code Generation
due Wednesday, March 8, 10AM
In this programming assignment, you will build an intermediate code
generator for the
language Iota. The latest version of the Iota language definition is at
http://courses.cs.cornell.edu/cs412/2000sp/iota/iota.html.
We expect you to build upon the code that you wrote for Programming
Assignments 1 and 2, and to fix any problems with your lexical analysis,
parsing, or semantic analysis that interfere with IR generation for correct
Iota programs.
Requirements
Your program will have a main class, Iota.IRGen, that translates
Iota code to an intermediate representation. When invoked from the command
line, the translator will implement at least the following minimal specification:
it should expect to receive a single argument that is the name of a source
file containing a module specification, with an extension of ".mod".
The translator will parse and type-check the program as in Programming Assignments
1 and 2. It will next
convert the AST for each function into an intermediate representation (IR)
and then put the IR into canonical form.
Your translator must support a number of dump options for debugging purposes:
-
It must continue to support the "-dump_ast" option from Programming
Assignment 2.
-
The "-dump_cir" option prints to System.out a textual
representation of the canonical IR of each of the functions in the
module.
-
The "-dump_ir" option prints to System.out a textual
representation of the (non-canonical) IR of each of the functions in the
module.
You may choose to add additional options not mentioned here as long as
they do not interfere with the required functionality. For example, your
IR generator might accept more than one source file on the command line
or support more debugging options than those listed. You may allow multiple
dump options to be specified; we require support for only a single dump
option on a particular command invocation.
Extra credit
Certain language features need not be implemented in this assignment. However,
you will be required to implement them for a later programming assignment, and
you may do them now for extra credit. If these language features are encountered
in the program, the compiler should print a message: "Language feature not
implemented: <description>" and then exit.
Logic Operators. Iota has the short-circuiting logical operators
"&" and "|".
You are not required to implement these operations.
Bounds checks. The Iota language specification requires that arrays and strings be bounds-checked.
For this programming assignment, we do not require implementation of bounds
checking.
String comparisons. The operators ==
, <
, and
>
all may be used to compare strings to determine their
lexicographic equality or their lexicographic ordering. You need not implement
these operators for this assignment.
Intermediate Code
IR model
We expect that you will use a tree-based IR similar to the one described
in class or the slightly different IR described by Appel. You may choose
to add more IR nodes or modify some of the IR nodes; if you do so, you
must describe the meaning of these new IR nodes and explain why you added
them. We recommend that before coding you carefully design and document all of the IR
translation functions and the transformations that you use to convert the
IR to canonical form. Canonical form for the IR must have
the properties described in lecture. In other words, all side-effects,
including function calls, are moved to top level as statements. All
SEQ and ESEQ nodes must be removed as part of this process. Also, two-way
conditional
jumps must be converted to one-way jumps by basic block reordering and
the possible introduction of new jump statements.
Suggestions for how to proceed
You are free to implement this project in any order you see fit. However,
we suggest that the first step is to carefully design your IR tree nodes
and write the interfaces defining them. This will allow the IR translation
and canonicalization phases to be attacked independently.
For IR translation, define a method on AST nodes that creates a corresponding
IR node, as discussed in lecture. Write a placeholder method that generates
an empty IR node or, even better, a special IR node that can contain the
entire AST node that was to be transformed. When dumped, the special IR
node uses your existing AST dumping facility. This placeholder method
can then be implemented for each of the Iota language constructs in turn, so
that you can test IR translation on simple programs and gradually add the
functionality for translating all of the AST nodes to IR.
For canonicalization, you can take the same approach. First, attack
the problem of bringing side-effecting nodes up to the top level. Elimination
of two-way jumps can be worked on as a parallel problem. For the hoisting
of side-effecting nodes, define a transformation method for each IR node
that reduces it to a sequence of statement nodes plus an expression node
in the case of canonicalizing expressions, as described in class. Again,
you can use a placeholder method to allow incremental development and testing.
The placeholder transformation method simply returns the statement node
or expression node it is invoked on. This approach will allow you to test
out your canonicalization functionality without having implemented all
of the relevant code.
Output format
As in Programming Assignment 2, the printed IR representation should use
parentheses to indicate tree structure. The output will be simpler because
there is no need to print the types of the various nodes. Binary operation
nodes may be indicated using the corresponding symbol. The non-canonical
IR for each function will be a single item in the print-out; the canonical
IR will be a list of parenthesized items, one for each statement in the
canonical IR. For example, the Iota function
f(x: int) = x + g(x)
might be compiled to IR and printed out as follows:
MOVE(RV, MEM(FP + 8) + CALL(NAME(g), MEM(FP + 8)))
For compactness, + nodes in the trees
are printed out in infix notation here; CONST
nodes are printed simply as the corresponding constant, and FP
and RV denote the frame pointer and return
value registers, respectively. During canonicalization, the CALL
node must
be hoisted to the top of the tree, resulting in the following output:
MOVE(TEMP(t1), CALL(NAME(g), MEM(FP + 8)))
MOVE(RV, MEM(FP + 8) + TEMP(t1))
Pretty-printing the IR output
To help you generate reasonable dumps of these IR trees, we are providing
a pretty-printing class CodeWriter that you should use. The
source code for this class is online.
Your IR nodes must implement the Unparse interface from Programming
Assignment 2:
interface Unparse {
public void unparse(Iota.util.CodeWriter cw);
// Write a human-readable representation of the node to the given
// CodeWriter cw.
}
CodeWriter automatically
introduces newlines to avoid excessively long lines, while also maintaining
indentation levels.
In expected use, you will call begin(0) whenever you write
a parenthesis and end() whenever you write its closing parenthesis.
allowBreak(0) is called at every comma and after binary operators.
With this usage, the IR expression above will be formatted as follows with
a narrow formatting width, resulting in a more readable output than simply
wrapping lines:
MOVE(TEMP(t1),
CALL(NAME(g),
MEM(FP + 8)))
MOVE(RV, MEM(FP + 8) +
TEMP(t1))
Do not use the newline() method to insert newlines except at the
end of each statement in canonicalized IR.
As an example, the ESEQ node described in Appel
might be dumped with the following code:
class ExpSeqNode extends IRNode implements Unparse {
StmNode stm;
ExpNode exp;
public void unparse(CodeWriter cw) {
cw.write("ESEQ(");
cw.begin(0);
stm.unparse();
cw.write(",");
cw.allowBreak(0);
exp.unparse();
cw.write(")");
cw.end();
}
}
Implementing Iota
Word size
In the next assignment, you will be generating 32-bit Pentium code, so
all variables and registers will take a full 32 bits of storage. Although
it would be possible to generate code that stores characters and booleans
more compactly, these data types will be stored in 32-bit words as well.
All stack locations and dynamically allocated objects will be aligned to
32-bit boundaries: the address of a variable or dynamically allocated object
always will be divisible by four.
Iota modules and name mangling
An Iota program may comprise several Iota modules that must be linked
together. When you generate code for an Iota module, you must mangle
the names used for publicly visible symbols in the
Iota program so that the linker can link external references from other
modules to these global variables or functions. The public variables and
functions in the module are, of course, those declared in the interface
file of the module. External references in the IR into the form required
by the linker.
Global variables and functions names in Iota are translated to assembler
names as follows. A global identifier I in module M is translated as an
assembler symbol _M__I; that is, two underscore characters are placed
after the name of the module, and one before. For example, if a call
is made to a function F in another module M, the following IR might
be generated for the call, even though
the Iota program expression will look like F(...) :
CALL(NAME(_M__F), ...)
These fully-expanded names are used even when the global function or
variable is found within the same module as the call itself. This name mangling
scheme is chosen for simplicity. Obviously you can create confusion by writing
Iota programs in which double underscores appear in the identifiers. A
better mangling would avoid this problem but be more complex to implement.
The only exception to this name-mangling strategy is the function main,
which is the first function executed in the program. To allow the run-time
system to find this function, it must be mangled as _iota__main
regardless of what module it occurs in.
Iota runtime
Various Iota language constructs require runtime support. When you
generate assembly code in the next programming assignment, we will
provide a library of runtime support routines.
These functions are available to the assembler code you will
generate but are not exposed to the
Iota programmer. Your IR should call these functions.
- iota__newstring. Takes one argument, the number of non-NULL
characters to allocate for the new string. The return value is a
pointer in memory to the allocated storage.
- iota__newarray. Takes two arguments, the first argument is
the desired number of elements in the array, the second is a value to
which all elements will be initialized. Be careful! Recall
that in Iota the expression used to initialize arrays is evaluated
once for each element.
- iota__abort. No arguments. Causes your program to crash.
Dynamically allocated objects
Arrays and strings will be stored in memory in the manner described in
class. To create new arrays and strings, you will use the iota__newstring
and iota__newarray routines described above.
Note that these routines allocate space to store
the length of the array or string. For both arrays and strings, the array
or string variable on the stack, and the addresses returned by the creation
routines, actually are pointers to the 32-bit word following the length;
that is, to the array elements or string characters themselves. Strings
are stored in a packed representation in which every character is placed
in a consecutive byte. The string is always terminated by an extra null
character (0), and the string is also padded out with nulls until it fills
out a 32-bit word completely.
a : array[T]
|
a.length |
a � |
a[0] |
|
a[1] |
|
a[2] |
|
... |
|
a[a.length-1] |
The Iota language specification requires that arrays and strings be
bounds-checked, although it is not required for this assignment. To implement
such a feature, it is necessary to have a way to halt the program in mid-execution.
The procedure iota__abort() can be used to accomplish this.
Return
The Iota return statement can be converted to IR in a number of ways. We
recommend that a return statement be converted to a sequence of two IR
statements. The first statement moves the returned expression (if any)
into the return-value register (RV), and the second statement jumps to
the label of the current function epilogue. This strategy allows the function
epilogue to be shared by all possible return paths from the function. Of
course, the epilogue must have a label for this to work.
Creating the function prologue and epilogue can be postponed
until code generation.
Consequently, the function epilogue in the IR is empty except for a single
labeled statement; we would like you to give it the label "F_epilogue"
where F is the name of the function.
What to turn in
Turn in on paper:
-
A brief document (max 3 pages) describing your IR generator
and its structure, and your testing strategy. This document should be clear,
concise, and convincing. Justify any changes made to the problem set specifications.
-
A design document (max 5 pages) describing the translations and transformations
your program implements. This document does not need to be a complete
description, but it should describe any interesting translation or transformation
problems you encountered, and how you resolved them. The use of the notations
introduced in class for describing code transformations is encouraged but
not required.
-
Include the method signatures from the classes and interfaces you have
written. These signatures may be generated by using either javadoc or javap;
preferably, the former.
Turn in electronically:
-
Source code
-
Compiled .class files properly located in their directory hierarchy
-
The test cases you mentioned in the paper document.
Electronic submission instructions
Your electronic submission is expected at the same time as your written
submission: at the beginning of class on the due date. Electronic submissions
after 10AM will be considered a day late. To submit Programming Assignment 3, please drop your files in \\goose\courses\cs412-sp00\grpX\pa3
,
where grpX
is your group identifier. Please organize your
top-level directory structure as follows :
src\
- all of your source code and class files. For example,
we expect to find a class file for your main program in src\Iota\IRGen.class,
and a companion .java file in the same directory.
doc\
- documentation, including your write-up and a README.TXT
containing information on how to compile and run your project, a description
of the class hierarchy in your src\
directory, brief
descriptions of the major classes, any known bugs, and any other information
that we might find useful when grading your assignment.
test\
- any test cases you used in testing your project.
Note: Failure to submit your assignment in the proper format may
result in deductions from your grade.