CS 412/413
Introduction to Compilers
Cornell University Computer Science Department, Spring 2001

Programming Assignment 5: Iota+


Due Friday, April 27


In this programming assignment, you will build a static checker and code generator for the language Iota+, which is defined in the handout http://www.cs.cornell.edu/courses/cs412/2001sp/iota/iota+.html.  Your program will determine whether the input source file is a legal module, in conjunction with any interface files that it depends on, and report any lexical, syntactic, or semantic errors in the module or interfaces.  If the module is legal, it will be compiled to assembly code in the same fashion as for Programming Assignment 4.  We expect you to build upon the code that you wrote for Programming Assignments 1-4, and to fix any problems with your code generation that interferes with code generation for correct Iota+ programs.

To help you complete this assignment in a timely fashion, we are providing parsing code on which you may base your development.  We recommend, but do not require, that you implement the programming assignment using the provided code, which includes lexical analysis and most of the syntactic analysis.  The parsing code is written using Java CUP, an LR parser generator, which will make extending the grammar much simpler than it would be in a recursive-descent parser.  To use this code, you will need to integrate your implementation with the provided code and do the following pieces of work:

Few, if any, modifications to assembly code generation should be required, unless there are bugs in your assembly code generation that need to be fixed.  We still do not ask you to implement optimal register allocation.

Compiler requirements

The name of the compiler class and its syntax for invocation remain the same as in Programming Assignment 4. You are still required to implement the -dump_ast, -dump_ir, -dump_cir, and -dump_aa options from previous programming assignments

Global variable initialization is now supported, which leads to the question: where does the initialization code go?  Initialization of the global variables in each module will be performed by a special function that is invoked when the program starts.  We have provided some extra support to make this feasible:

  1. You need to download the new version of the Iota Standard Library, and a new jar file: Iota0.jar. The jar file contains the program Iota.Prelinker that will take object file names, extract the module name from them and generate assembler code for a function named _iota__init, which simply calls each of the modules' _foo$init functions.  The Iota Standard Library main function will now call _iota__init before calling your _iota__main.
  2. In each module foo.im, your compiler will generate a function named _foo$init that will perform all global variable initialization.  It should look like the following:
                  PUBLIC _foo$init
                  _foo$init PROC NEAR
                      push ebp
                      mov ebp, esp
                      ; global initialization code here
                      pop ebp
                      ret
                  _foo$init ENDP
    

    We are not worried about any naming clashes because the '$' character, while invalid as an Iota identifier character, is valid as a MASM identifier character.

  3. You will link your programs using the new iplink script.  It is the ilink script, modified to invoke Iota.Prelinker before linking your modules together to ensure that _iota__init is generated properly each time.

Runtime support

The library file iota.lib contains the code for the standard modules io, file and conv that are defined in the language specification.  This library file has been extended to include support for the new io objects described in the io and file interfaces.

Object representation

Iota+ adds class types to the set of types supported.  To allow standardization, we have selected a mandatory representation for values of class type, which is the format described in class for class hierarchies with single inheritance.  An example of this representation is depicted below.  An object begins with a single word pointing to the dispatch vector for the object's class.  The fields of the object follow in the order they are declared in the class.  All fields, even booleans, take up a single 4-byte word.  The dispatch vector contains a field at offset 0 that is used for implementing safe run-time casting (see below).  The remainder of the dispatch vector contains pointers to the code for each of the methods of the object.  The methods include not only the methods explicitly declared in the class of the object, but also methods from each of the interface types, if any, that the class declares.  The methods are numbered starting from one, with each method having an index one greater than the previous one in the same class or interface type.  The first method in a class or interface has an index one greater than the last method in the immediate supertype, or one if there is no immediate supertype.

[Object Layout Image]

For example, consider the following class and interface declarations, assuming that sqrt and atan2 are defined in some appropriate fashion, and that the code is in the file Points.im:

       interface Point {
           x() : int
           y() : int
       }
       class CartesianPoint implements Point {
           x_, y_ : int
           x(): int = x_
           y(): int = y_
           r(): int = sqrt(x*x + y*y)
           theta(): int = atan2(y, x)
       }
These definitions result in an object of type CartesianPoint having the memory layout shown in the picture above.

To create storage for a real object, it will be necessary to call the routine _iota__malloc, passing the size of the object in bytes.  This routine returns the address of the allocated storage.  The fields of the object then can be zeroed to give them all their default values, and the pointer to the dispatch vector must be installed.  Since all objects of a particular class share the same dispatch vector, there is no need to allocate space for the dispatch vector dynamically.  The dispatch vector is global storage and must be placed in the data segment along with other global variables. 

Methods are implemented in a manner similar to functions.  They take an additional first argument that is the object this, on which the method was invoked.  Another difference is that method names are mangled differently.  In the generated assembly code, the label beginning the code of a method m in a class C is C__m.  Methods are not exported as public identifiers, so the name of the containing module does not need to be part of the mangled name.

The special value null may be used as a value of any object type.  It is represented by a word containing the address 0.  

The special cast expression performs a dynamic cast of an expression to a new object type.  The Iota+ language definition requires the cast be checked for validity. If the check fails, the cast operation should evaluate to null.  To implement casting to a class type, the dispatch vector pointer should be compared against the address of the class dispatch vector; if they are equal, the cast succeeds. Implementing a cast to an interface type is more complex and uses the special dispatch vector slot at index 0. This slot stores a pointer to a run-time representation of the interface that the class implements. The run-time representation of an interface type I that is declared in module M is a public word of memory at the label _M__I  in the data segment of module M. The address of this word of memory is used to uniquely signify the interface in the same way that the address of the class dispatch vector signifies the class. If an interface extends another interface, its word of memory contains a pointer to the run-time representation of the extended interface; otherwise, it contains 0. By walking up the chain of pointers, it is possible to find all supertypes of an object. An object implements an interface if the address of the interface representation occurs anywhere in the chain of pointers.

The language definition gives special rules for casting from and to the types string and array[T].  These rules are needed because the implementation of casting just described does not work for objects lacking dispatch vectors.  To implement the language correctly, you can implement a cast to these types without any run-time check. Also, because casting from a string or array to any other type is allowed to have arbitrary behavior, the implementation above is sufficient. Providing more complete run-time checking of casts from and to strings and arrays would require a modification to their runtime representation.

Extra credit

For up to 5% extra credit, you may implement the run-time cast operation in a manner that is more efficient than simply walking up the chain of supertype pointers. In doing so, you must make sure that run-time casting still works even for objects defined in code that we give you (you can't change the Iota standard library or the MMII library to reorganize the object layouts defined there).

Unlike in Programming Assignments 3 and 4, you are required to implement array and string bounds checking and short-circuit boolean operations. There is no extra credit for these items, of course.

Partial implementation

The supplied code that partially implements Programming Assignment 5 is available here.  Please read the file README-pa5.txt for information about this code.  We will announce any updates to this code on the newsgroup.

What to turn in

Turn in on paper:

Turn in electronically:

Electronic submission instructions

Your electronic submission is expected at the same time as your written submission: at the beginning of class on the due date.  Electronic submissions after 10AM will be considered a day late.  To submit Programming Assignment 5, please drop your files in \\goose\courses\cs412-sp01\grpX\pa5, where grpX is your group identifier.  Please organize your top-level directory structure as follows :

Note: Failure to submit your assignment in the proper format may result in deductions from your grade.