Project, Part 4

Compiling Arrays and Other Objects

CS 212 - Fall 2007

Due

Objectives

Extend your compiler to handle arrays (one-dimensional) and classes (without inheritance).

Bali Part 4

As in previous assignments, we are going to be using a subset of Bali --- additional parts of Bali can be implemented for bonus points (see below). The Part 4 subset includes arrays and classes. A single program can contain multiple classes and multiple functions. Unlike Java, Bali can have functions that are not part of any class.  As before, there must be a main function; this function is called automatically when a Bali program is run.  Here's a sample program; this program has not yet been tested so there may be some errors.

# Sample Part 4 program that uses a Queue and a Stack
:
int main ( ) :
  int n, Stack s, Queue q :
  n = 0; s = Stack(); q = Queue();
  loop while n < 5;
    s.put(n); q.put(n);
    n = n + 1;
  endloop
  n = 0;
  loop while n < 5;
    print s.get(), q.get();
    n = n + 1;
  endloop
  return 0;
end

class Node :
  int data, Node link :
  
  # Constructor
  Node Node (int data, Node link) ::
      this.data = data;
      this.link = link;
  end
  
endclass

class Queue :
   Node head, Node last :
   
   void put (int i) :
       Node n :
       n = Node(i, n);
       if head == null then head = n;
       else last.link = n;
       endif
       last = n;
       return;
    end
    
    int get () :
       Node n :
       n = head;
       head = head.link;
       return n.data;
    end
endclass

class Stack :
   Node top :

   void put (int i) : :
       top = Node(i, top);
       return;
   end

   int get () :
       Node n :
       n = top;
       top = top.link;
       return n.data;
   end
endclass

The specifications for the part of Bali being used for this assignment appear in Bali Specs for Part 4.

Test Cases

You must also include five to ten Bali-Part4 programs that we can use as input to your compiler. The idea here is (1) to encourage you to think about test-cases before you complete your coding and (2) to give you a chance to show the graders, "Look, my compiler may not work on all your test cases, but it does work on these."  Each of your Bali programs should be clearly labeled to show what it's testing and the result expected when the program is run.

Implementing Arrays

In Bali, an array type is represented by a type-name followed by brackets. Thus the following Bali declaration declares an array of integers.

    int[ ] myIntegers;

After this declaration, myIntegers has the value null. To use an array it has to be initialized. This can be done by assigning an expression of the form type[size] to an array variable; this creates an array of the given size where each element has the default value for type.  (Each type has a default value; see the Bali Specs for Part 4.) The following example creates an array of size 4 all of whose elements are zero. 

    myIntegers = int[4];

You can also initialize an array by listing its elements. The following example creates an array of size 4 whose elements are 1, 2, 3, and 4.  (Note the use of curly brackets {}.)

    myIntegers = int{1, 2, 3, 4};

Once the array is initialized, you can use it as you would expect.  Array indices start at 0.  Note that every  initialized array has a "field" called size that reports the number of element in the array.  For example, myIntegers.size would have the value 4.

To implement an array, the instruction MALLOC is used to reserve space in the Heap. To reserve space for an array of size 4, use the following sam-code:

    PUSHIMM 5
    MALLOC

These instructions will reserve a block of size 5 in SaM's Heap, 4 words for the array and 1 word to indicate the size of the array; you'll need the size information to do bounds-checking for array subscripts. MALLOC leaves the block's address on top of the Stack. Typically,this address is the address of the word that holds the array size. The array itself is located at address+1, address+2, address+3, and address+4.

Each variable has a location (i.e., a single word at some known offset from the FBR). For a variable of type int or of type boolean, the value stored in this location is the actual value of the variable. For a variable of type int[] or of type boolean[], the value stored in this location is the address of the array. A subscript implies that some simple address-arithmetic is done to calculate the address of a particular array element. Once the address of a particular array element is determined STOREIND (store indirect) and PUSHIND (push indirect) can be used to access the value of the array element. Some example sam-code appears below.

An array that has not been initialized does not (yet) have an address for the elements of the array. One way to indicate this is to store zero as the array's address; zero cannot be the address of a valid heap-block. The zero that you use in your sam-code corresponds to the null that appears in the Bali-code. In other words, a Bali condition that checks for null (such as myArray == null) compiles into sam-code that checks for an address that is zero.

Here's an example showing some simple Bali code and some equivalent sam-code. Note the use of PUSHIND (push indirect) to retrieve the value of an array element and the use of STOREIND (store indirect) to store a value into an array element.

Bali Code Sam Code Comment
myIntegers = int[4]; PUSHIMM 5 Push (array's size + 1) onto Stack
MALLOC Create heap-block of size 5 (array size plus one more word); push block's address onto Stack
PUSHIMM 1  
ADD Address arithmetic; address of myIntegers[0] is now on top of Stack
STOREOFF 13 Store address of array into myIntegers; we arbitrarily assume that myIntegers is at offset 13 from the FBR.
     
myIntegers[2] = 44; PUSHOFF 13 Get array's address from myIntegers and push it onto Stack
PUSHIMM 2 Subscript
ADD Address arithmetic; top of stack now holds address of myIntegers[2]
PUSHIMM 44  
STOREIND Stores 44 into myIntegers[2]
     
x = myIntegers[2]; PUSHOFF 13 Get array's address from myIntegers and push it onto Stack
PUSHIMM 2 Subscript
ADD Address arithmetic; top of stack now holds address of myIntegers[2]
PUSHIND Value (i.e., 44) stored in myIntegers[2] is placed on top of Stack
STOREOFF 9 Store value at top of stack into x; we arbitrarily assume that x is at offset 9 from the FBR.

Implementing Classes

For Bali Part 4, we are implementing classes and objects, but we are not allowing inheritance.

A description of how classes are implemented appeared in lecture (Lecture 10 - Implementing Objects). Please feel free to ask questions on the newsgroup (cornell.class.cs212), via email, during lecture, or during Office Hours.

Recall that to implement a class, you need a dispatch vector to indicate the code-location for each method. Every instance of a class uses the same dispatch vector. The simplest way build a dispatch vector is to create it as part of the program code:

“DV$Queue”: 
  JUMP “M$Queue$put$i” 
  JUMP “M$Queue$get$” 
“DV$Stack”:
  JUMP “M$Stack$put$i” 
  JUMP “M$Stack$get$”

The idea is that the object-instance itself stores the location of its dispatch vector. For example, a Queue object-instance stores the address “DV$Queue”. You will need to manipulate program addresses (i.e., labels within the sam-code). To do this, use the instruction
            PUSHIMMPA label
which pushes a Program Address (indicated by label) onto the stack.

To jump to the start of a method, you (1) push the method’s offset, (2) push the address of the object’s dispatch vector, and (3) add (offset to addressOfDV).  At this point, the address of the correct JUMP instruction is on top of the stack.

The one remaining difficulty is that you don't want just to jump to the method's code; instead you need something like the JSR instruction so that the return address is preserved (i.e., the return address is the stored PC-value that indicates where you want to resume execution once the method finishes).  You need the JSRIND instruction (i.e., jump to a subroutine whose address is on top of the stack).  This instruction jumps to the program-address on top of the stack while placing the return address (PC+1) on top of the stack. In this case, since you are using a dispatch vector, you end up jumping to a JUMP instruction that is part of the dispatch vector; this JUMP instruction then takes you to the method's actual code.

Garbage Collection

We are pretending that Bali has automatic garbage collection just like Java.  Unfortunately, the SaM Simulator does not actually have automatic garbage collection, so a typical program will leave stuff on the heap. The SaM Simulator helpfully warns you that "Your program leaks memory".  You can ignore this warning.

Symbol Tables

Recall that, for a function, once your compiler has finished generating the code for a function, you no longer need to keep track of that function's local variables.  This is because the function's local variables are inaccessible outside the function.  

Classes work differently.  Each class needs its own (typically small) symbol table indicating the fields and methods of the class. In you compiler, a class's symbol table must be available wherever an instance of that class might be used. This symbol table is used to determine the correct code to generate for constructs such as myInstance.value and myInstance.resetValue().

Error Handling

The only change from Part 3 is that your compiler-generated code must respond reasonably to certain runtime errors (see below). Of course, there are also additional kinds of syntax and semantic errors that can occur because we are using a larger subset of Bali. 

For Part 4, you are responsible for handling 2 kinds of runtime errors: (1) array index out of bounds and (2) null pointer "exception".  There is no way to recover from these errors, but you need to make sure that when such an error occurs, your compiler-generated code responds correctly. For both errors, your compiler-generated code should clear the Stack, place an error code as the only item on the Stack, and then STOP.  We'll use error code -1 to indicate array index out of bounds and error code -2 to indicate an attempt to use a null pointer.

Syntax errors and semantic errors should be handled as in Part 3. We supply the same exception classes for your use; these are unchanged from Part 3.

Packages

As in Part 3, we encourage you to organize your compile project into appropriate packages.  You are welcome to use the bali package and you may want to consider creating subpackages.

Provided Files

These files are unchanged from Part 3.

These provided files should not be changed.  The exception is BaliCompiler.java; it is expected that you will replace this file with your own version.

Your Tasks: Bali Part 4 Compiler

Write a Bali compiler that translates Bali code (satisfying the Bali Specs for Part 4) into valid sam-code. You may use any of the sam-code instructions; these are summarized on the SaM Simulator page

Bonus Work

Warning: Do not spend time on the bonus work unless your compiler already performs as required for Part 4.

The complete Bali Specifications are available on the Bali page. This includes features that we have not asked you to implement such as multidimensional arrays and single inheritance.  For the bonus work, it is up to you to convince the CS212 staff that your compiler does what you claim it does. This means that, for bonus credit, you must provide sample Bali programs that exhibit your bonus work, and a complete description of what kinds of programs your compiler can handle.

Multidimensional Arrays (up to 10 bonus points)

These work more-or-less like Java's multidimensional arrays.

Inheritance (up to 20 bonus points)

Syntax for single inheritance and for upcasting and downcasting is included in the Bali Specifications.

Heap Clean-Up (up to 10 bonus points)

Free all heap allocations as your compiled code reaches the end of its execution, thus avoiding the warning message, "Your program leaks memory".  You'll need to somehow keep track of each allocation on the heap as it occurs and you'll need to use the sam-code instruction FREE.