Due Friday, March 31
In this programming assignment, you will build a simple code generator for the language Iota. At the end of this assignment, you will have a working compiler for most of the Iota language. We expect you to build upon the code that you wrote for Programming Assignments 1-3, and to fix any problems with your lexical analysis, parsing, or semantic analysis that interfere with code generation for correct Iota programs.
Your program will have a main class, Iota.Compiler, that compiles Iota code to assembly code. When invoked from the command line, the compiler will implement at least the following minimal specification: it should expect to receive a single argument that is the name of a source file containing a module specification, with an extension of ".mod". The compiler will parse and type-check the program as in programming assignments 1 and 2, and convert the program to canonical IR as in programming assignment 3. It will then perform the following steps:
".asm
" fileobj
"
file.When invoked on a source code with the command line
"jview Iota.Compiler foo.mod
", the compiler
will generate an assembly file "foo.asm
" that
can be assembled using MASM to produce an object file
"foo.obj
". We are supplying a library file
"iota.lib
" that implements the io
and conv
interfaces described in the language definition,
as well as some other useful utility functions that you will need for
code generation.
Your compiler must support a number of dump options for debugging purposes:
-dump_aa
" option prints the abstract assembly for each
function (separately).You may choose to add additional options not mentioned here as long as they do not interfere with the required functionality. For example, your compiler might accept more than one source file on the command line or support more debugging options than those listed. You may allow multiple dump options to be specified; we require support for only a single dump option on a particular command invocation.
Your compiler will produce assembly files (with an
.asm
extension) from correspondingly named files with a
.mod
extension. You then can use the assembler to convert
these assembly files into object files (.obj
files) and
the linker to convert a set of such object files into an executable
file (.exe
extension).
To invoke the assembler, the following command line is used for an
assembly file foo.asm
:
ml /Zd /Zi /c /coff foo.asm
Supposing that this module is the only module in the program, it can be linked as follows:
link /pdb:none /debug foo.obj iotaextra.lib iota.lib
The library file iota.lib is a collection of
.obj files bundled together, containing the code for the
standard modules io
and conv
that are
defined in the language specification, along with run-time support for
garbage collection and a few useful utility functions that can be
called from your generated code. The file iotaextra.lib is an
additional library that you may modify to include additional
utility functions.
The program ml
can be found in the directory \\goose\courses\cs412-sp00\masm611.
The program link
is from Microsoft Visual Studio VC++ 98
(C:\Program Files\Microsoft Visual Studio\VC98\Bin\ in the
undergraduate lab). However, to simplify the assembly and linking
process, we are providing scripts newpath.bat,
asm.bat and ilink.bat that package up these
operations for you, so that the lines above can be typed simply as
newpath asm foo ilink foo
The first argument to ilink
is the name of the module
containing the program's main
function. More module names
can be specified on the command line following this first module,
causing the linker to use the corresponding object files.
Examples of how to compile various Iota constructs are given in a companion handout, Pentium Code Samples. These examples show the extra assembler directives necessary to instruct the assembler to generate 32-bit code, and assembly code for some of the example programs found in the Iota language definition. You can work backward from this assembly code to help figure out what IR to generate for various programming constructs. You will also find the Intel Architecture Software Developer Manuals, volumes 1 and 2, to be helpful.
You will be generating 32-bit Pentium code, so all variables and registers will take a full 32 bits of storage. Although it would be possible to generate code that stores characters and booleans more compactly, these data types will be stored in 32-bit words as well. All stack locations and dynamically allocated objects will be aligned to 32-bit boundaries: the address of a variable or dynamically allocated object always will be divisible by four.
The library file iota.lib
contains the code for the
standard modules io
and conv
that are
defined in the language specification. It also contains several
additional routines that are useful when generating code. These
routines are described in the Pentium Code Samples
handout.
Arrays and strings will be stored in memory in the manner described in class.
String constants will be allocated statically in the data segment. To
create new arrays and strings, you will use the newarray
and strconcat
routines described in the Pentium Code
Samples handout. These routines allocate heap storage for arrays and
strings. The memory layout of arrays and strings is as
described in the Programming Assignment 3 handout.
You may find it useful to write other utility functions in a
language like C, and generate IR to call these functions rather than
generating IR to do the actual work. This is an acceptable
solution. If you do this, we expect you to explain what you did, to
supply the C source code, and to put an extended
iotaextra.lib
file in your top-level directory that
includes all the added functions in addition to the ones we
supplied. Visual Studio, gcc, or any other compiler
or language you like may be used to produce the
iotaextra.lib
file. A skeleton iotaextra.lib
with a Visual Studio workspace is located at \\goose\courses\cs412-sp00\iotalib\iotaextra\.
The canonical IR trees generated in programming assignment 3 will be converted to assembly code using tiling and the simple stack-frame register allocation scheme described in lecture. When the canonical IR code for f(x:int, y:int):int = g(y) + y:
MOVE(TEMP(t1), CALL(NAME(g), MEM(FP + 12))) MOVE(RV, MEM(FP + 12) + TEMP(t1))is converted to abstract assembly by tiling, the following code might result:
push [ebp + 12]
call g
mov t1, eax
mov t2, t1
add t2, [ebp + 12]
mov eax, t2
After the register allocation (t1
= t2
= eax
) and
elimination of unnecessary mov instructions, we might obtain the following efficient code:
push [ebp + 12]
call g
add eax, [ebp + 12]
However, for this programming assignment you will not make this final step. Instead,
all temporaries will be allocated on the stack. The registers eax
, ebx
,
and ecx
will be used for all instructions involving temporaries. Before the instruction, one or
both of these registers will be loaded from the appropriate stack location. After the
instruction, if necessary, the stack location will be updated from the corresponding
register. This register allocation strategy is not particularly efficient, but it is
simple and will result in working code. Assigning the temporary t1
to the
location [fp-4]
, and the temporary t2
to the location [fp-8]
,
the resulting code will look like the following:
push [ebp + 12]
call g
mov [ebp-4], eax ; mov t1, eax
mov eax, [fp-4]
mov [ebp-8], eax ; mov t2, t1
mov eax, [ebp-8]
add eax, [ebp + 12] ; add t2, [fp + 12]
mov [ebp-8], eax
mov eax, [ebp-8] ; mov eax, t2
Obviously there are many opportunities for even simple-minded optimization in this code, but the goal here is just to generate working code. In addition to temporaries, local variables and function arguments should be allocated on the stack relative to the frame pointer register, as discussed in lecture.
An Iota program may comprise several Iota modules that must be linked together. When you generate code for an Iota module, you will generate assembler PUBLIC directives that expose publicly visible symbols in the Iota program so that the linker can link external references from other modules to these global variables or functions. The public variables and functions in the module are, of course, those declared in the interface file of the module.
Global variables and function names in Iota are translated to assembler names using the name mangling scheme described in the Programming Assignment 3 handout.
Certain assembly instructions are generated at the beginning and end of every
function; these instructions are known as the prologue and epilogue,
respectively. The prologue and epilogue are not explicitly represented in the IR
for the function body. In the prologue, the current frame pointer (the ebp
register) is pushed onto the stack, and the stack pointer is adjusted to make
room for the local variables and temporaries used by the function. In the
epilogue, the return value is placed in the eax
register, the stack
pointer is restored to point to the control link, the previous frame pointer is
popped off the stack, and the function returns with a ret
instruction.
Although this should not be an issue for this programming assignment, the various
general-purpose registers on the Pentium architecture are by convention considered to be
caller-save or callee-save as follows: eax, ecx
, and edx
are caller-save, and
ebx
, esi
, edi
, ebp
, esp
are
callee-save. This means that a function can freely modify the registers eax,
ecx
, and edx
, but must
assume that their contents have been destroyed if it in turns calls a function.
Conversely, it may call another function and know that the callee-save registers have not
been modified; however, if it modifies these registers itself, it must restore them to
their original values before returning. Saving callee-save registers is usually done by
adding appropriate push
instructions to the function prologue, and
corresponding pop
instructions to the function epilogue. Caller-save
registers are usually saved and restored around every function call; before the arguments
are pushed onto the stack, any caller-save registers containing useful data are pushed
onto the stack. After the call, the stack pointer is popped to remove the arguments, and
the caller-save registers are then restored by popping them from the stack.
Your electronic submission is expected at the same time as your written
submission: at the beginning of class on the due date. Electronic submissions
after 10AM will be considered a day late. To submit Programming Assignment
4, please drop your files in \\goose\courses\cs412-sp00\grpX\pa4
,
where grpX
is your group identifier. Please organize your
top-level directory structure as follows :
src\
- all of your source code and class files. For example,
we expect to find a class file for your main program in src\Iota\Compiler.class,
and a companion .java file in the same directory.doc\
- documentation, including your write-up and a README.TXT
containing information on how to compile and run your project, a description
of the class hierarchy in your src\
directory, brief
descriptions of the major classes, any known bugs, and any other information
that we might find useful when grading your assignment.test\
- any test cases you used in testing your project.Note: Failure to submit your assignment in the proper format may result in deductions from your grade.