# Compiling Arrays and Other Objects

## CS 212 - Fall 2007

### Due

• Friday, Nov 30, 11:59PM
• Friday, Nov 30 is the last day of regular classes --- there is no final exam for this course

### Objectives

Extend your compiler to handle arrays (one-dimensional) and classes (without inheritance).

### Bali Part 4

As in previous assignments, we are going to be using a subset of Bali --- additional parts of Bali can be implemented for bonus points (see below). The Part 4 subset includes arrays and classes. A single program can contain multiple classes and multiple functions. Unlike Java, Bali can have functions that are not part of any class.  As before, there must be a main function; this function is called automatically when a Bali program is run.  Here's a sample program; this program has not yet been tested so there may be some errors.

```# Sample Part 4 program that uses a Queue and a Stack
:
int main ( ) :
int n, Stack s, Queue q :
n = 0; s = Stack(); q = Queue();
loop while n < 5;
s.put(n); q.put(n);
n = n + 1;
endloop
n = 0;
loop while n < 5;
print s.get(), q.get();
n = n + 1;
endloop
return 0;
end

class Node :

# Constructor
Node Node (int data, Node link) ::
this.data = data;
end

endclass

class Queue :

void put (int i) :
Node n :
n = Node(i, n);
endif
last = n;
return;
end

int get () :
Node n :
return n.data;
end
endclass

class Stack :
Node top :

void put (int i) : :
top = Node(i, top);
return;
end

int get () :
Node n :
n = top;
return n.data;
end
endclass```

The specifications for the part of Bali being used for this assignment appear in Bali Specs for Part 4.

### Test Cases

You must also include five to ten Bali-Part4 programs that we can use as input to your compiler. The idea here is (1) to encourage you to think about test-cases before you complete your coding and (2) to give you a chance to show the graders, "Look, my compiler may not work on all your test cases, but it does work on these."  Each of your Bali programs should be clearly labeled to show what it's testing and the result expected when the program is run.

### Implementing Arrays

In Bali, an array type is represented by a type-name followed by brackets. Thus the following Bali declaration declares an array of integers.

`    int[ ] myIntegers;`

After this declaration, myIntegers has the value null. To use an array it has to be initialized. This can be done by assigning an expression of the form type[size] to an array variable; this creates an array of the given size where each element has the default value for type.  (Each type has a default value; see the Bali Specs for Part 4.) The following example creates an array of size 4 all of whose elements are zero.

`    myIntegers = int[4];`

You can also initialize an array by listing its elements. The following example creates an array of size 4 whose elements are 1, 2, 3, and 4.  (Note the use of curly brackets {}.)

`    myIntegers = int{1, 2, 3, 4};`

Once the array is initialized, you can use it as you would expect.  Array indices start at 0.  Note that every  initialized array has a "field" called size that reports the number of element in the array.  For example, myIntegers.size would have the value 4.

To implement an array, the instruction MALLOC is used to reserve space in the Heap. To reserve space for an array of size 4, use the following sam-code:

```    PUSHIMM 5
MALLOC```

These instructions will reserve a block of size 5 in SaM's Heap, 4 words for the array and 1 word to indicate the size of the array; you'll need the size information to do bounds-checking for array subscripts. MALLOC leaves the block's address on top of the Stack. Typically,this address is the address of the word that holds the array size. The array itself is located at address+1, address+2, address+3, and address+4.

Each variable has a location (i.e., a single word at some known offset from the FBR). For a variable of type int or of type boolean, the value stored in this location is the actual value of the variable. For a variable of type int[] or of type boolean[], the value stored in this location is the address of the array. A subscript implies that some simple address-arithmetic is done to calculate the address of a particular array element. Once the address of a particular array element is determined STOREIND (store indirect) and PUSHIND (push indirect) can be used to access the value of the array element. Some example sam-code appears below.

An array that has not been initialized does not (yet) have an address for the elements of the array. One way to indicate this is to store zero as the array's address; zero cannot be the address of a valid heap-block. The zero that you use in your sam-code corresponds to the null that appears in the Bali-code. In other words, a Bali condition that checks for null (such as myArray == null) compiles into sam-code that checks for an address that is zero.

Here's an example showing some simple Bali code and some equivalent sam-code. Note the use of PUSHIND (push indirect) to retrieve the value of an array element and the use of STOREIND (store indirect) to store a value into an array element.

Bali Code Sam Code Comment
myIntegers = int[4]; PUSHIMM 5 Push (array's size + 1) onto Stack
MALLOC Create heap-block of size 5 (array size plus one more word); push block's address onto Stack
PUSHIMM 1
STOREOFF 13 Store address of array into myIntegers; we arbitrarily assume that myIntegers is at offset 13 from the FBR.

myIntegers[2] = 44; PUSHOFF 13 Get array's address from myIntegers and push it onto Stack
PUSHIMM 2 Subscript
PUSHIMM 44
STOREIND Stores 44 into myIntegers[2]

x = myIntegers[2]; PUSHOFF 13 Get array's address from myIntegers and push it onto Stack
PUSHIMM 2 Subscript
PUSHIND Value (i.e., 44) stored in myIntegers[2] is placed on top of Stack
STOREOFF 9 Store value at top of stack into x; we arbitrarily assume that x is at offset 9 from the FBR.

### Implementing Classes

For Bali Part 4, we are implementing classes and objects, but we are not allowing inheritance.

A description of how classes are implemented appeared in lecture (Lecture 10 - Implementing Objects). Please feel free to ask questions on the newsgroup (cornell.class.cs212), via email, during lecture, or during Office Hours.

Recall that to implement a class, you need a dispatch vector to indicate the code-location for each method. Every instance of a class uses the same dispatch vector. The simplest way build a dispatch vector is to create it as part of the program code:

```“DV\$Queue”:
JUMP “M\$Queue\$put\$i”
JUMP “M\$Queue\$get\$”
“DV\$Stack”:
JUMP “M\$Stack\$put\$i”
JUMP “M\$Stack\$get\$”```

The idea is that the object-instance itself stores the location of its dispatch vector. For example, a Queue object-instance stores the address “DV\$Queue”. You will need to manipulate program addresses (i.e., labels within the sam-code). To do this, use the instruction
PUSHIMMPA label
which pushes a Program Address (indicated by label) onto the stack.

To jump to the start of a method, you (1) push the method’s offset, (2) push the address of the object’s dispatch vector, and (3) add (offset to addressOfDV).  At this point, the address of the correct JUMP instruction is on top of the stack.

The one remaining difficulty is that you don't want just to jump to the method's code; instead you need something like the JSR instruction so that the return address is preserved (i.e., the return address is the stored PC-value that indicates where you want to resume execution once the method finishes).  You need the JSRIND instruction (i.e., jump to a subroutine whose address is on top of the stack).  This instruction jumps to the program-address on top of the stack while placing the return address (PC+1) on top of the stack. In this case, since you are using a dispatch vector, you end up jumping to a JUMP instruction that is part of the dispatch vector; this JUMP instruction then takes you to the method's actual code.

### Garbage Collection

We are pretending that Bali has automatic garbage collection just like Java.  Unfortunately, the SaM Simulator does not actually have automatic garbage collection, so a typical program will leave stuff on the heap. The SaM Simulator helpfully warns you that "Your program leaks memory".  You can ignore this warning.

### Symbol Tables

Recall that, for a function, once your compiler has finished generating the code for a function, you no longer need to keep track of that function's local variables.  This is because the function's local variables are inaccessible outside the function.

Classes work differently.  Each class needs its own (typically small) symbol table indicating the fields and methods of the class. In you compiler, a class's symbol table must be available wherever an instance of that class might be used. This symbol table is used to determine the correct code to generate for constructs such as myInstance.value and myInstance.resetValue().

### Error Handling

The only change from Part 3 is that your compiler-generated code must respond reasonably to certain runtime errors (see below). Of course, there are also additional kinds of syntax and semantic errors that can occur because we are using a larger subset of Bali.

For Part 4, you are responsible for handling 2 kinds of runtime errors: (1) array index out of bounds and (2) null pointer "exception".  There is no way to recover from these errors, but you need to make sure that when such an error occurs, your compiler-generated code responds correctly. For both errors, your compiler-generated code should clear the Stack, place an error code as the only item on the Stack, and then STOP.  We'll use error code -1 to indicate array index out of bounds and error code -2 to indicate an attempt to use a null pointer.

Syntax errors and semantic errors should be handled as in Part 3. We supply the same exception classes for your use; these are unchanged from Part 3.

### Packages

As in Part 3, we encourage you to organize your compile project into appropriate packages.  You are welcome to use the bali package and you may want to consider creating subpackages.

### Provided Files

These files are unchanged from Part 3.

• bali/BaliException.java: The parent class for errors that occur during compilation.  It resides in the bali  package; thus, it must be placed in a directory/folder named bali.
• bali/BaliSemanticException.java: Class for semantic errors that occur during compilation.  It resides in the bali package.
• bali/BaliSyntaxException.java: Class for syntax errors that occur during compilation.  It resides in the bali package.
• bali/Compiler.java: The interface required for the Bali compiler.  Your BaliCompiler.java must implement this interface.  It resides in the bali package.
• bali/MultipleBaliException.java: Class useful for multiple error reporting. It resides in the bali package.
• bali/Scanner212.java: A scanner (tokenizer) that we provide for your use. It makes use of the regular expression package provided as part of Java.  It resides in the bali package.
• BaliCompiler.java: A stub version of the Bali compiler.  You need to replace this file with one of your own.  It resides in the unnamed package and your version must also reside in the unnamed package.  You can use import statements to make use of classes and interfaces in the bali package.  The stub version makes use of such import statements.
• BC.java: Provides command-line control for testing/running BaliCompiler.java.  It resides in the unnamed package.

These provided files should not be changed.  The exception is BaliCompiler.java; it is expected that you will replace this file with your own version.

Write a Bali compiler that translates Bali code (satisfying the Bali Specs for Part 4) into valid sam-code. You may use any of the sam-code instructions; these are summarized on the SaM Simulator page

• As part of your submission, you must include 5 to 10 Bali programs as described above in Test Cases. Be sure to explain for each program what it's testing and the result expected when the program is run.
• You have the option of building your Part 4 compiler by extending or own Part 3 compiler or by extending the sample solution compiler. The sample solution should appear on the web in the near future.
• Your compiler should handle arrays and classes as described in the Bali Specs for Part 4
• Your compiler must use an AST (Abstract Syntax Tree) as an intermediate stage between parsing and code generation.  The entire AST must be completed before code generation begins.
• Your compiler must use symbol tables to keep track of information about variables, functions, and classes. For this assignment, Bali includes the types int, boolean, and user-created types (i.e., classes).
• You must provide a class called BaliCompiler (in a file called BaliCompiler.java) that serves as the starting point for your compiler.
• BaliCompiler must implement the Compiler interface (Compiler.java, an interface that we provide).
• We provide the class BC.java, a main class that handles the file I/O, initializes the scanner (Scanner212.java, another class that we provide), and calls BaliCompiler. You can use this class to test your BaliCompiler.
• BaliCompiler.java is stub version of the class you need to create.  It also shows how you can set up a PrintWriter to make it easy to produce your code as a String (as required by the Compiler interface).
• Your compiler should not produce any output either to a file or to the standard output or err file. The only I/O that should take place is the I/O moderated by BC.java.
• You must generate sam-code that represents a valid translation of the supplied Bali program.  Remember to test your sam-code in the SaM Simulator.
• Your compiler must respond reasonably to Bali programs that contain errors, throwing either BaliSyntaxException or BaliSemanticException as appropriate. Once an error is detected, no further processing of the Bali program is necessary. The sam-code that your compiler creates should respond reasonably to array-index out of bounds and null pointer errors (see above).
• Place a comment block at the top of each file that you have created. The block must give the assignment number, due date, and creator name(s) and Net-ID(s). For example,
```/**********************************
* Assignment #0: Example format
* Date: 1/1/1111
*
* Don Key: dk0
* Elizabeth (Liz) Ard: ea0
***********************************```

### Bonus Work

Warning: Do not spend time on the bonus work unless your compiler already performs as required for Part 4.

The complete Bali Specifications are available on the Bali page. This includes features that we have not asked you to implement such as multidimensional arrays and single inheritance.  For the bonus work, it is up to you to convince the CS212 staff that your compiler does what you claim it does. This means that, for bonus credit, you must provide sample Bali programs that exhibit your bonus work, and a complete description of what kinds of programs your compiler can handle.

#### Multidimensional Arrays (up to 10 bonus points)

These work more-or-less like Java's multidimensional arrays.

#### Inheritance (up to 20 bonus points)

Syntax for single inheritance and for upcasting and downcasting is included in the Bali Specifications.

#### Heap Clean-Up (up to 10 bonus points)

Free all heap allocations as your compiled code reaches the end of its execution, thus avoiding the warning message, "Your program leaks memory".  You'll need to somehow keep track of each allocation on the heap as it occurs and you'll need to use the sam-code instruction FREE.