Project, Part 3

Compiling Functions and Arrays

CS 212 - Spring 2004

Due: Monday, April 12, 11:00PM
Revised Due Date: Monday, April 19, 11:00PM

Update: The example program appearing below had contained an error that has now been corrected. The base case for factorial now returns 1 instead the previous (incorrect) value of 0.  Both versions of the program are legal Bali programs, but the error caused factorial to be computed incorrectly.

Project Part 3 has been somewhat simplified from the version discussed in class during the March 17 lecture. Some of the extra material now appears as Bonus Work (see the end of this document).

Objectives

Bali Part 3

We are going to be using just a subset of Bali for this assignment --- a different subset than that used for Part 2. The Part 3 subset includes arrays and functions. Arrays are stored in SaM's Heap.  In the SaM Simulator, you can use the Display menu to show the Heap. A single program can include multiple functions, but there is still a main function that is called automatically when a Bali program is run.  Bali functions can be overloaded. In other words, different functions can have the same name; your compiler is supposed to know which one to call based on the number and types of the arguments. Here's a sample program:

// Sample Part 3 program that does a factorial calculation
int main ( )
  {int n;} {
  n = readInt();
  while n >= 0 do {
    print factorial(n);
    n = readInt();
  }
  return 0;
}
int factorial (int i)
  {} {
  if i < 2 then return 1;
  return i * factorial(i - 1);
}

The specifications for the part of Bali being used for this assignment appear in Bali Specs for Part 3.

Arrays

In Bali, an array type is represented by a type followed by brackets. Thus the following Bali declaration declares an array of integers.

    int[ ] myIntegers;

After this declaration, myIntegers has the value null. To use an array it has to be initialized. This can be done by assigning an expression of the form type[size] to an array variable; this creates an array of the given size where each element has the default value for type.  (Each type has a default value; see the Bali Specs for Part 3.) The following example creates an array of size 4 all of whose elements are zero. 

    myIntegers = int[4];

Once the array is initialized, you can use it as you would expect.  Array indices start at 0. The 4 elements of the myIntegers array are myIntegers[0], myIntegers[1], myIntegers[2], and myIntegers[3].

Implementing Arrays

To implement an array, the instruction MALLOC is used to reserve space in the Heap. To reserve space for an array of size 4, use the following sam-code:

    PUSHIMM 4
    MALLOC

These instructions will reserve a block of size 5 in SaM's Heap, 4 words for the array and 1 word to indicate the size of the block (in this case, the block size is 5). MALLOC leaves the block's address on top of the Stack. This address is the address of the word that holds the block size. The array itself is located at address+1, address+2, address+3, and address+4.

Each variable has a location (i.e., a single word at some known offset from the FBR). For a variable of type int or of type boolean, the value stored in this location is the actual value of the variable. For a variable of type int[] or of type boolean[], the value stored in this location is the address of the array. A subscript implies that some simple address-arithmetic is done to calculate the address of a particular array element. Once the address of a particular array element is determined STOREIND (store indirect) and PUSHIND (push indirect) can be used to access the value of the array element. Some example sam-code appears below.

An array that has not been initialized does not (yet) have an address for the elements of the array. One way to indicate this is to store zero as the array's address; it is easy to see that zero cannot be the address of a valid heap-block since the heap addresses start at 1000. The zero that you use in your sam-code corresponds to the null that appears in the Bali-code. In other words, a Bali condition that checks for null (such as myArray == null) compiles into sam-code that checks for an address that is zero.

Here's an example showing some simple Bali code and some equivalent sam-code. Note the use of PUSHIND (push indirect) to retrieve the value of an array element and the use of STOREIND (store indirect) to store a value into an array element.

Bali Code Sam Code Comment
myIntegers = int[4]; PUSHIMM 4 Push array's size onto Stack
MALLOC Create heap-block of size 5 (array size plus one more word); push block's address onto Stack
PUSHIMM 1  
ADD Address arithmetic; address of myIntegers[0] is now on top of Stack
STOREOFF 13 Store address of array into myIntegers; we arbitrarily assume that myIntegers is at offset 13 from the FBR.
     
myIntegers[2] = 44; PUSHOFF 13 Get array's address from myIntegers and push it onto Stack
PUSHIMM 2 Subscript
ADD Address arithmetic; top of stack now holds address of myIntegers[2]
PUSHIMM 44  
STOREIND Stores 44 into myIntegers[2]
     
x = myIntegers[2]; PUSHOFF 13 Get array's address from myIntegers and push it onto Stack
PUSHIMM 2 Subscript
ADD Address arithmetic; top of stack now holds address of myIntegers[2]
PUSHIND Value (i.e., 44) stored in myIntegers[2] is placed on top of Stack
STOREOFF 9 Store value at top of stack into x; we arbitrarily assume that x is at offset 9 from the FBR.

The SaM Simulator does automatic garbage collection, so there is no need to free heap-space for arrays that are no longer in use.

Functions

Functions in Bali can be overloaded. In other words, two different functions can share the same name as long as they differ in number or type of parameters. Your Bali compiler should determine which function to call by finding the one with matching signature. The signature encodes the function's name as well as the number and types of the function's parameters (in order). Overloaded functions that share the same function name are required to all have the same return type; this is a potential semantic error that your compiler should detect.

Note that Bali does not do any automatic conversion of types, so function arguments and function parameters must match exactly. (This implies that, since the value null does not have a type of its own, it is a semantic error to use null as a function argument.)

Implementing Functions

Because functions are potentially overloaded, you cannot determine which function to call by simply checking the function's name. You can, however, determine which function to call by checking the function's signature.  For a function call, you first determine the appropriate signature (based on the function's name and the types of its arguments) and then create code that calls the function whose signature matches. You can encode a function's signature in any way you want, but a Java String works fine.

The Stack is used to keep track of the function calls. Each time a function is called a new stack frame is created. The FBR (Frame Base Register) is used to indicate the current frame. Recall that the FBR is used in the SaM instructions PUSHOFF and STOREOFF. PUSHOFF offset pushes the value stored at FBR+offset onto the Stack.  STOREOFF offset stores the value on top of the stack into location FBR+offset.

What is kept in a function's frame? Here's a picture of a stack frame. The stack grows toward the top. See the March 10 lecture for more details.

local variables 
saved PC
saved FBR
parameters
return value

The local variables are the function's local variables. The saved PC indicates where in the code we are supposed to jump when the function is done. The saved FBR indicates the value the FBR should have when this frame is destroyed (i.e., the previous stack frame).  The parameters are the function's parameters and the return value is the function's return value.

The caller (i.e., the code that calls the function) and the callee (i.e., the function itself) have joint responsibility for building the stack frame when the function is called and for clearing the stack from when the function is done.  See the SamSummary page for details on the instructions used below.

Code Pattern for Caller
Example Bali code: func(exp1, exp2, exp3);
ADDSP 1 Push space for return value Frame Creation

code for exp1
code for exp2 
code for exp3
 

Push arguments
LINK Create new frame 
(update FBR, pushing old FBR onto stack)
JSR func Jump to callee, pushing return address onto stack
POPFBR Restore the FBR (from stack) Frame Clean-Up
ADDSP 3 Clear the arguments from stack
 
Code Pattern for Callee 

Example Bali code:
retType func (type1 exp1, type2 exp2, type3 exp3)
     {variables} 
     {statements; return exp}

ADDSP v Reserve space for local variables on stack Frame Creation
code for statements   Function Execution
code for exp  
JUMP endfunc  
endfunc: STOREOFF 4  Store the return value in stack frame
ADDSP v Clear local variables from stack Frame Clean-Up
JUMPIND Return to caller (using return address stored in frame)

Note that the sam-code for a program must initially follow the Code Pattern for Caller in order to set up the stack frame for the main function. Of course, the main function itself must follow the Code Pattern for Callee.

Symbol Tables

You will need a symbol table for each function. The symbol table for a function holds, for each variable, the variable's location (i.e., the offset from the FBR) and the variable's type. Note that the word variable here is being used to denote both parameter variables and local variables. A function's symbol table can be discarded after the code generation for the function is complete.

You may want to use a separate pass through your AST to collect all the function names (and signatures) before you start generating code. This information can be stored in a separate function symbol-table. Alternately, you can do without the separate pass through the AST by encoding the function's signature in the sam-code label for the function. Note that a sam-code label can contain any characters at all as long as the label is within double quotes. In either case, you should generate an appropriate error message if a Bali program contains a call to a nonexistent function.

Error Handling

For this assignment, we will test your compiler's response to errors in supplied Bali programs. There are two kinds of errors that can occur in Bali programs:

For example, a missing semicolon or an unmatched parenthesis is a syntax error; a type mismatch or an undeclared variable is a semantic error.

Rather than printing an error message to System.out (or System.err), errors in supplied Bali programs can be handled most clearly and cleanly by using exceptions. We have supplied exception classes corresponding to the two kinds of errors that can occur:

Depending on how you write your code, you may also generate TokenizerExceptions; these are exceptions thrown by the tokenizer when various mismatches occur. The SamTokenizer includes several ways to examine/consume tokens; you can avoid TokenizerExceptions if you try. With correct error handling, TokenizerExceptions should never be thrown.

BC.java, the main program that runs your compiler, is designed so that it catches IllegalBaliExceptions and then prints the exception's message.

Parsing an Expression Statement vs. Parsing an Assignment Statement

Note that according to the grammar, an expression statement and an assignment statement can both start with a set of tokens that look like an expression. There is no way to tell that you are parsing an assignment statement until you get to the equal sign (=).

The easiest way to handle this is to start parsing as if you are parsing an expression. Once the expression is complete you can check for the equal sign (=). At this point, if you find an equal sign, you need to re-examine the AST you just built for the expression to see if it is appropriate for use as the target of an assignment statement. Your compiler should throw a BaliSyntaxException if the expression is inappropriate as a target.

Tasks

Write a Bali compiler that translates Bali code (satisfying the Bali Specs for Part 3) into valid sam-code. You may use any of the sam-code instructions; these are summarized in SaM Summary

Submission Instructions for Part 3:

Bonus Work

Warning: Do not spend time on the bonus work unless your compiler already performs as required for Part 3.

There are 3 possibilities for bonus credit on this assignment: (1) multiple error reporting, (2) multidimensional arrays, and (3) runtime error reporting. For bonus credit, you must include a note in the special file readme.txt indicating which bonus work is included in your Bali compiler. 

Multiple Error Reporting (up to 10 bonus points)

Detect and report multiple errors in Bali programs. We have provided the MultipleBaliException class for those groups who wish to try detecting multiple errors. Suggested usage instructions have been provided in this file. The basic idea is to catch and accumulate errors, continuing to parse after each error. How does one resume parsing after a syntax error? Consider ignoring tokens until you reach a token that you "recognize".

Multidimensional Arrays (up to 15 bonus points)

Multidimensional arrays can be created by adding more brackets. For instance, a 2-dimensional array of integers can be declared and initialized as follows.

    {int[][] values;}
    {values = int[2][3];               // 2-by-3 array of zeros
     values = [[1, 2, 3], [4, 5, 6]];  // 2-by-3 array of integers
     values = [[1], [4, 5, 6]];        // Rows of varying length
    }

The last initialization example shows that, as in Java, the rows of a multidimensional array do not all have to be the same size.

Multidimensional array require some changes to the grammar. Here are the new and/or changed productions.

Change Made
type ->   ( int | boolean) ( [ ] )* Integer, boolean, integer
array, or boolean array
Multiple brackets
now allowed
target -> name [ subscript* ] Can assign to a variable
or an array element
Multiple subscripts
now allowed
expression   -> [ [ expressionList ] ] An entire array A new production
expPart -> name [ functionArgs | subscript* ]      Variable, function, or
subscripted variable
Multiple subscripts
now allowed

Runtime Error Reporting (up to 5 bonus points)

Some programming errors cannot be detected at compile time. Such errors are called runtime errors. Here are some potential runtime errors.

For bonus credit, you must detect such errors, producing a reasonable error message.