# Compiling Functions and Arrays

## CS 212 - Spring 2004

### Due: Monday, April 12, 11:00PM Revised Due Date: Monday, April 19, 11:00PM

Update: The example program appearing below had contained an error that has now been corrected. The base case for factorial now returns 1 instead the previous (incorrect) value of 0.  Both versions of the program are legal Bali programs, but the error caused factorial to be computed incorrectly.

Project Part 3 has been somewhat simplified from the version discussed in class during the March 17 lecture. Some of the extra material now appears as Bonus Work (see the end of this document).

### Objectives

• Extend your compiler to handle arrays and recursive functions.
• Extend your compiler to report reasonable error messages for syntax errors and for semantic errors.

### Bali Part 3

We are going to be using just a subset of Bali for this assignment --- a different subset than that used for Part 2. The Part 3 subset includes arrays and functions. Arrays are stored in SaM's Heap.  In the SaM Simulator, you can use the Display menu to show the Heap. A single program can include multiple functions, but there is still a main function that is called automatically when a Bali program is run.  Bali functions can be overloaded. In other words, different functions can have the same name; your compiler is supposed to know which one to call based on the number and types of the arguments. Here's a sample program:

```// Sample Part 3 program that does a factorial calculation
int main ( )
{int n;} {
while n >= 0 do {
print factorial(n);
}
return 0;
}
int factorial (int i)
{} {
if i < 2 then return 1;
return i * factorial(i - 1);
}```

The specifications for the part of Bali being used for this assignment appear in Bali Specs for Part 3.

### Arrays

In Bali, an array type is represented by a type followed by brackets. Thus the following Bali declaration declares an array of integers.

`    int[ ] myIntegers;`

After this declaration, myIntegers has the value null. To use an array it has to be initialized. This can be done by assigning an expression of the form type[size] to an array variable; this creates an array of the given size where each element has the default value for type.  (Each type has a default value; see the Bali Specs for Part 3.) The following example creates an array of size 4 all of whose elements are zero.

`    myIntegers = int[4];`

Once the array is initialized, you can use it as you would expect.  Array indices start at 0. The 4 elements of the myIntegers array are myIntegers[0], myIntegers[1], myIntegers[2], and myIntegers[3].

### Implementing Arrays

To implement an array, the instruction MALLOC is used to reserve space in the Heap. To reserve space for an array of size 4, use the following sam-code:

```    PUSHIMM 4
MALLOC```

These instructions will reserve a block of size 5 in SaM's Heap, 4 words for the array and 1 word to indicate the size of the block (in this case, the block size is 5). MALLOC leaves the block's address on top of the Stack. This address is the address of the word that holds the block size. The array itself is located at address+1, address+2, address+3, and address+4.

Each variable has a location (i.e., a single word at some known offset from the FBR). For a variable of type int or of type boolean, the value stored in this location is the actual value of the variable. For a variable of type int[] or of type boolean[], the value stored in this location is the address of the array. A subscript implies that some simple address-arithmetic is done to calculate the address of a particular array element. Once the address of a particular array element is determined STOREIND (store indirect) and PUSHIND (push indirect) can be used to access the value of the array element. Some example sam-code appears below.

An array that has not been initialized does not (yet) have an address for the elements of the array. One way to indicate this is to store zero as the array's address; it is easy to see that zero cannot be the address of a valid heap-block since the heap addresses start at 1000. The zero that you use in your sam-code corresponds to the null that appears in the Bali-code. In other words, a Bali condition that checks for null (such as myArray == null) compiles into sam-code that checks for an address that is zero.

Here's an example showing some simple Bali code and some equivalent sam-code. Note the use of PUSHIND (push indirect) to retrieve the value of an array element and the use of STOREIND (store indirect) to store a value into an array element.

Bali Code Sam Code Comment
myIntegers = int[4]; PUSHIMM 4 Push array's size onto Stack
MALLOC Create heap-block of size 5 (array size plus one more word); push block's address onto Stack
PUSHIMM 1
ADD Address arithmetic; address of myIntegers[0] is now on top of Stack
STOREOFF 13 Store address of array into myIntegers; we arbitrarily assume that myIntegers is at offset 13 from the FBR.

myIntegers[2] = 44; PUSHOFF 13 Get array's address from myIntegers and push it onto Stack
PUSHIMM 2 Subscript
ADD Address arithmetic; top of stack now holds address of myIntegers[2]
PUSHIMM 44
STOREIND Stores 44 into myIntegers[2]

x = myIntegers[2]; PUSHOFF 13 Get array's address from myIntegers and push it onto Stack
PUSHIMM 2 Subscript
ADD Address arithmetic; top of stack now holds address of myIntegers[2]
PUSHIND Value (i.e., 44) stored in myIntegers[2] is placed on top of Stack
STOREOFF 9 Store value at top of stack into x; we arbitrarily assume that x is at offset 9 from the FBR.

The SaM Simulator does automatic garbage collection, so there is no need to free heap-space for arrays that are no longer in use.

### Functions

Functions in Bali can be overloaded. In other words, two different functions can share the same name as long as they differ in number or type of parameters. Your Bali compiler should determine which function to call by finding the one with matching signature. The signature encodes the function's name as well as the number and types of the function's parameters (in order). Overloaded functions that share the same function name are required to all have the same return type; this is a potential semantic error that your compiler should detect.

Note that Bali does not do any automatic conversion of types, so function arguments and function parameters must match exactly. (This implies that, since the value null does not have a type of its own, it is a semantic error to use null as a function argument.)

### Implementing Functions

Because functions are potentially overloaded, you cannot determine which function to call by simply checking the function's name. You can, however, determine which function to call by checking the function's signature.  For a function call, you first determine the appropriate signature (based on the function's name and the types of its arguments) and then create code that calls the function whose signature matches. You can encode a function's signature in any way you want, but a Java String works fine.

The Stack is used to keep track of the function calls. Each time a function is called a new stack frame is created. The FBR (Frame Base Register) is used to indicate the current frame. Recall that the FBR is used in the SaM instructions PUSHOFF and STOREOFF. PUSHOFF offset pushes the value stored at FBR+offset onto the Stack.  STOREOFF offset stores the value on top of the stack into location FBR+offset.

What is kept in a function's frame? Here's a picture of a stack frame. The stack grows toward the top. See the March 10 lecture for more details.

 local variables saved PC saved FBR parameters return value

The local variables are the function's local variables. The saved PC indicates where in the code we are supposed to jump when the function is done. The saved FBR indicates the value the FBR should have when this frame is destroyed (i.e., the previous stack frame).  The parameters are the function's parameters and the return value is the function's return value.

The caller (i.e., the code that calls the function) and the callee (i.e., the function itself) have joint responsibility for building the stack frame when the function is called and for clearing the stack from when the function is done.  See the SamSummary page for details on the instructions used below.

Code Pattern for Caller
Example Bali code: func(exp1, exp2, exp3);
ADDSP 1 Push space for return value Frame Creation

code for exp1
code for exp2
code for exp3

Push arguments
LINK Create new frame
(update FBR, pushing old FBR onto stack)
JSR func Jump to callee, pushing return address onto stack
POPFBR Restore the FBR (from stack) Frame Clean-Up
ADDSP –3 Clear the arguments from stack

Code Pattern for Callee

Example Bali code:
retType func (type1 exp1, type2 exp2, type3 exp3)
{variables}
{statements; return exp}

ADDSP v Reserve space for local variables on stack Frame Creation
code for statements   Function Execution
code for exp
JUMP endfunc
endfunc: STOREOFF –4  Store the return value in stack frame
ADDSP –v Clear local variables from stack Frame Clean-Up
JUMPIND Return to caller (using return address stored in frame)

Note that the sam-code for a program must initially follow the Code Pattern for Caller in order to set up the stack frame for the main function. Of course, the main function itself must follow the Code Pattern for Callee.

### Symbol Tables

You will need a symbol table for each function. The symbol table for a function holds, for each variable, the variable's location (i.e., the offset from the FBR) and the variable's type. Note that the word variable here is being used to denote both parameter variables and local variables. A function's symbol table can be discarded after the code generation for the function is complete.

You may want to use a separate pass through your AST to collect all the function names (and signatures) before you start generating code. This information can be stored in a separate function symbol-table. Alternately, you can do without the separate pass through the AST by encoding the function's signature in the sam-code label for the function. Note that a sam-code label can contain any characters at all as long as the label is within double quotes. In either case, you should generate an appropriate error message if a Bali program contains a call to a nonexistent function.

### Error Handling

For this assignment, we will test your compiler's response to errors in supplied Bali programs. There are two kinds of errors that can occur in Bali programs:

• syntax errors, code that violates the rules of the Bali grammar, and
• semantic errors, code that violates the rules of Bali semantics.

For example, a missing semicolon or an unmatched parenthesis is a syntax error; a type mismatch or an undeclared variable is a semantic error.

Rather than printing an error message to System.out (or System.err), errors in supplied Bali programs can be handled most clearly and cleanly by using exceptions. We have supplied exception classes corresponding to the two kinds of errors that can occur:

Depending on how you write your code, you may also generate TokenizerExceptions; these are exceptions thrown by the tokenizer when various mismatches occur. The SamTokenizer includes several ways to examine/consume tokens; you can avoid TokenizerExceptions if you try. With correct error handling, TokenizerExceptions should never be thrown.

BC.java, the main program that runs your compiler, is designed so that it catches IllegalBaliExceptions and then prints the exception's message.

### Parsing an Expression Statement vs. Parsing an Assignment Statement

Note that according to the grammar, an expression statement and an assignment statement can both start with a set of tokens that look like an expression. There is no way to tell that you are parsing an assignment statement until you get to the equal sign (=).

The easiest way to handle this is to start parsing as if you are parsing an expression. Once the expression is complete you can check for the equal sign (=). At this point, if you find an equal sign, you need to re-examine the AST you just built for the expression to see if it is appropriate for use as the target of an assignment statement. Your compiler should throw a BaliSyntaxException if the expression is inappropriate as a target.

Write a Bali compiler that translates Bali code (satisfying the Bali Specs for Part 3) into valid sam-code. You may use any of the sam-code instructions; these are summarized in SaM Summary

• You have the option of building your Part 3 compiler by extending or own Part 2 compiler or by extending the sample solution compiler. The sample solution should appear on the web in the near future.
• Your compiler should handle arrays and functions as described in the Bali Specs for Part 3
• Your compiler must use an AST (Abstract Syntax Tree) as an intermediate stage between parsing and code generation.
• The entire AST must be completed before code generation begins.
• The different node types of the AST should correspond to separate classes.  Note though that all of the AST classes should be subclasses of a single AST interface or single AST abstract class.  (For an example, see SimpleLanguage.java.txt, where an abstract class  is used.)
• Your compiler must use a symbol table to keep track of variable locations and variable types. For this assignment, Bali includes the types: int, boolean, and corresponding array types (one-dimensional arrays only; multidimensional arrays can be done for bonus credit; see below).
• Your functions must use stack frames as described in this document and in lecture.  (Previous incarnations of 212 used a different order for items in the stack frame.)
• You must provide a class called BaliCompiler (in a file called BaliCompiler.java) that serves as the starting point for your compiler.
• BaliCompiler must implement the Compiler interface (Compiler.java).
• BC.java is a main class that handles the file I/O, initializes the SamTokenizer, and calls BaliCompiler. You can use this class to test your BaliCompiler.
• BaliCompiler.java is stub version of the class you need to create.  It also shows how you can set up a PrintWriter to make it easy to produce your code as a String (as required by the Compiler interface).
• BC.java makes use of SamTokenizer, Tokenizer, and TokenizerException. The most up-to-date versions of these are stored on the SaM simulator page where there are also instructions on the use of the .jar file in which these are located.
• Your compiler should not produce any output either to a file or to the standard output or err file. It is OK for your program to produce such output while you are debugging, but any such debugging output must be turned off in your submitted version. The only I/O that should take place is the I/O moderated by BC.java.
• Your compiler must generate sam-code that represents a valid translation of the supplied Bali program.
• Your generated sam-code should contain no comments.
• Remember to test your sam-code in the SaM Simulator.
• Your compiler must also respond correctly to Bali programs that contain errors, throwing either BaliSyntaxException or BaliSemanticException as appropriate. Once an error is detected, no further processing of the Bali program is necessary; detection of multiple errors can be done for bonus points.
• You can assume the Bali programs that we use for grading do not produce runtime errors. In other words, you can assume that, for all our Bali programs that we use for grading, all subscripts are within bounds and arrays are always initialized before use. Runtime error reporting can be done for bonus points (see below).

Submission Instructions for Part 3:

• Submit all of the Java files associated with your compiler. These should be placed in a single .zip file
• Do not submit any of the files that we have provided. We will compile your code and run it with the reference copies posted on the website.
• Do not submit .class files or any other files that are produced when your classes compile. We only want the .java files.
• Make sure that your Java code compiles, even if you have trouble getting it to run properly. We will be recompiling all of your code and running it
• Do not include extraneous text in your programs, such as debugging statements or superfluous comments.
• Special files:
• You may include (within the .zip file) a brief file called `readme.txt` if there are special features or issues about your code that you need to communicate to the grading staff.
• You may also include (within the .zip file) a brief file called `instructions.txt` if there is anything the grading staff should know about how to run your compiler (i.e., if you have used packages, for instance).
• Please do not include these files unless it is truly necessary for you to communicate extra information to the grading staff.
• Place a comment block at the top of each file that you have created. The block must give the assignment number, due date, and creator name(s) and Net-ID(s). For example,
```/**********************************
* Assignment #0: Example format
* Date: 1/1/1111
*
* Tad Morose: tm0
* Sue Themall: st0
**********************************/```

### Bonus Work

Warning: Do not spend time on the bonus work unless your compiler already performs as required for Part 3.

There are 3 possibilities for bonus credit on this assignment: (1) multiple error reporting, (2) multidimensional arrays, and (3) runtime error reporting. For bonus credit, you must include a note in the special file readme.txt indicating which bonus work is included in your Bali compiler.

#### Multiple Error Reporting (up to 10 bonus points)

Detect and report multiple errors in Bali programs. We have provided the MultipleBaliException class for those groups who wish to try detecting multiple errors. Suggested usage instructions have been provided in this file. The basic idea is to catch and accumulate errors, continuing to parse after each error. How does one resume parsing after a syntax error? Consider ignoring tokens until you reach a token that you "recognize".

#### Multidimensional Arrays (up to 15 bonus points)

Multidimensional arrays can be created by adding more brackets. For instance, a 2-dimensional array of integers can be declared and initialized as follows.

```    {int[][] values;}
{values = int[2][3];               // 2-by-3 array of zeros
values = [[1, 2, 3], [4, 5, 6]];  // 2-by-3 array of integers
values = [[1], [4, 5, 6]];        // Rows of varying length
}```

The last initialization example shows that, as in Java, the rows of a multidimensional array do not all have to be the same size.

Multidimensional array require some changes to the grammar. Here are the new and/or changed productions.

type ->   ( int | boolean) ( [ ] )* Integer, boolean, integer
array, or boolean array
Multiple brackets
now allowed
target -> name [ subscript* ] Can assign to a variable
or an array element
Multiple subscripts
now allowed
expression   -> [ [ expressionList ] ] An entire array A new production
expPart -> name [ functionArgs | subscript* ]      Variable, function, or
subscripted variable
Multiple subscripts
now allowed

#### Runtime Error Reporting (up to 5 bonus points)

Some programming errors cannot be detected at compile time. Such errors are called runtime errors. Here are some potential runtime errors.

• An attempt can be made to divide by zero.
• An array subscript can be out of bounds (i.e., the subscript is negative or the subscript is greater than or equal to the array size).
• An array can be used before it is initialized.

For bonus credit, you must detect such errors, producing a reasonable error message.