Compiler Project, Part 2

Compiling Simple Bali# Programs

CS 212 - Spring 2008

Due: Sunday, March 9, 11:59PM



It is important that you start early on this assignment. There is a substantial amount of code to produce; if you wait until the last few days, it is unlikely that you will be able to finish.


We encourage you to participate in a group for this and future CS 212 assignments.  This alleviates some of the programming work and provides a chance to improve your group negotiation/participation skills, something that will be useful for future courses.  You can work in a group of size one if you wish, but groups of size two or three are allowed and encouraged. You are welcome to use the newsgroup (cornell.class.cs212.compiler) to look for partners. Keep in mind the guidelines for Working with Partners specified in the Course Info (e.g., you may not change or drop partners once you have started working on the assignment, but you can change partners between assignments).


We are going to be using a subset of Bali for this assignment. This subset allows only a single class with a single method (the main method) and has just two data types (int and boolean). The subset includes variables, expressions, statements, and some simple I/O. 

The specifications for the part of Bali being used for this assignment appear in Bali# Grammar for Part 2.

Allocating, Assigning, and Retrieving Variables

You must use relative addressing for this assignment. This will prepare you for implementing functions (in Part 3); the implementation of recursive functions requires relative addressing.

The Symbol Table

You need a strategy for remembering (1) where each variable is located and (2) the type of each variable (for this part of the project, either boolean or int). To do this, you should use a symbol table. You must explicitly create a symbol table within your compiler. Consider using java.util.HashMap to implement your symbol table.


Read the online chapter on Looping and Branching in SaM.  You'll need to understand the way that Labels and Jumps work to complete this part of the project.

Generating Code for an IF-statement

This section uses an example based on last semester's Bali.  Consider the following code fragment.

if x>2 then 
    y = 8; 
    y = 0;

As we generate the code, we use the symbol table to determine (1) that x and y are both integers and (2) the locations of x and y (say, offset 2 for x and offset 3 for y). 

To execute the if-statement, SaM first needs to evaluate a condition, which is placed on top of the stack, before jumping to the correct portion of the sam-code. To evaluate the condition x > 2, you must retrieve the value of x and then push the value 2. The top of the stack will hold the result of evaluating the condition. By calling GREATER, SaM will push 0 (false) or 1 (true), depending on the results of the comparison.

// Check if x > 2
PUSHOFF 2    // push the value of x (Vbot)
PUSHIMM 2    // push the value 2 to compare with x (Vtop)
GREATER      // Push result of (Vbot > Vtop) to top of stack

Now that you have accounted for the condition part of the if-statement, you must jump to an appropriate instruction. Calling JUMPC t will move to the label t if the condition is true. Otherwise, SaM will continue executing sam-code because the condition was false. The sam-code below finishes the example. A block of sam-code for the case when x > 2 is true is labeled as trueLabel, though we could have used another name. In addition, when the if and else clauses finish, SaM needs to move to the statement after the entire if-statement, so we have provided the label continue to finish the remaining program.

// Process if statement
JUMPC trueLabel // check if result is true (1) or false (0)
PUSHIMM 0       // false: push the value 0
STOREOFF 3      //        store the value 0 for y
JUMP continue   //        continue with remaining program
trueLabel:      // true:
PUSHIMM 8       //        push the value 8
STOREOFF 3      //        store the value 8 for y
JUMP continue   //        continue with remaining program
continue:       // continue with program

The code above is a bit redundant. You may remove the second JUMP continue statement since SaM will “fall through” the label after analyzing the block labeled trueLabel.

Generating Code for a WHILE-statement

Again, we use an example based on last semester's Bali.  The following steps are necessary for a while-statement.

Consider the following code fragment.

loop while x < 5;

You need one label to jump to when the loop is done (say, done) and another label to jump to when re-evaluating the condition (say, loop). The fragment leads to the following sam-code.

PUSHIMM 1      // push 1 on the stack
STOREOFF 2     // store the value 1 for x 
loop:          // label for the start of the loop
PUSHOFF 2      // retrieve the value of x
PUSHIMM 5      // push the value to compare x with
LESS           // is x < 5 ? push 1 if so; otherwise, 0
NOT            // negate the previous test
JUMPC done     // if not (x < 5), do statements after done
PUSHOFF 2      // retrieve the value of x
PUSHIMM 1      // push 1 onto the stack
ADD            // add 1 to the current value of x
STOREOFF 2     // store the new value of x
JUMP loop      // repeat the loop
done:          // continue with remaining program

You may choose arbitrary label names, though you should try to choose names that indicate their purpose. Be aware that a single program can contain multiple if- and while-statements, so make sure that you don't re-use labels.  One way to ensure that all labels are unique is to use a counter written into each label and then incremented.

Error Handling

We will test your compiler's response to errors in supplied Bali programs. There are three kinds of errors that can occur in Bali programs:

For example, a missing semicolon or an unmatched parenthesis is a syntax error; a type mismatch or an undeclared variable is a semantic error, and an attempt to divide by zero is a runtime error.

Note: You are not responsible for handling runtime errors. It's possible to show (CS 381) that there is no way to detect all division-by-0 errors (or other typical runtime errors) without actually running the program. If a runtime error occurs then we depend on SaM to detect it and provide an error message.

For the errors that you do handle, rather than printing an error message to System.out (or System.err), errors in supplied Bali programs can be handled most clearly and cleanly by using exceptions. We have supplied exception classes corresponding to the two kinds of errors that can occur during compilation:, the main program that runs your compiler, is designed so that it catches BaliExceptions and then prints the exception's message.


Good Java programming practice dictates that we make use of Java packages for a programming project of significant size.  To this end, most of the files we provide reside in the "bali" package. Java requires that the package statement at the beginning of a file must correspond to where the file is stored in the file system; in other words, you'll need to create a bali folder to hold any files that are part of the bali package.

We encourage you to organize your compiler project into appropriate packages.  You are welcome to use the bali package and you may want to consider creating subpackages.  If you haven't used packages before, you might want to take a look at the Java Package Tutorial.

Provided Files

These provided files should not be changed.  The exception is; it is expected that you will replace this file with your own version.

The Tasks

Write a Bali compiler that translates Bali code (satisfying the Bali# Grammar for Part 2) into valid sam-code. You may use any of the sam-code instructions; these are listed in the Sam Design Document

Submission Instructions for Part 2:

Bonus Work

Warning: Do not spend time on the bonus work unless your compiler already performs as required for Part 2.

Multiple Error Reporting (up to 10 bonus points)

Detect and report multiple errors in Bali programs. We have provided the MultipleBaliException class for those groups who wish to try detecting multiple errors. Suggested usage instructions have been provided within the file. The basic idea is to catch and accumulate errors, continuing to parse after each error. 

It's relatively easy to handle multiple semantic errors. How does one resume parsing after a syntax error? Consider ignoring tokens until you reach a token that you "recognize". Partial bonus credit is available for those who handle multiple semantic errors, but not multiple syntax errors.