Project, Part 2

Compiling Simple Programs

CS 212 - Spring 2004

Due: Monday, March 1, 11:00PM for Part 2A
Due: Monday, March 15, 11:00PM for Part 2B

Objectives

Bali

We are going to be using just a subset of Bali for this assignment. This subset allows only a single function (the main function) and has just two data types (int and boolean). The subset includes variables, expressions, statements, and some simple I/O. Here's a sample program:

// Sample program that does nothing very interesting
int main ( ) {
  int x, y;
  } {
  x = 2;
  if x > 1 then y = x;
  else y = 1;
  return y;
  }

The specifications for the part of Bali being used for this assignment appear in Bali Specs for Part 2.

Allocating, Assigning, and Retrieving Variables

You must use relative addressing for this assignment. This will prepare you for implementing functions (in Part 3); the implementation of recursive functions requires relative addressing.

The Symbol Table

You need a strategy for remembering (1) where each variable is located and (2) the type of each variable (for this assignment, either boolean or int). To do this, you should use a symbol table. In  Part 1, you managed your symbol table manually, but for Part 2, you must explicitly create a symbol table within your compiler. Consider using java.util.Hashtable to implement your symbol table.

Go To

In many programming circles, the goto keyword has a bad reputation. You might be surprised to learn that not all gotos are created equal! SaM requires a form of goto called a jump, so we want you to know a little history of the goto before introducing the jump. You should read this classic paper: “Go To Statement Considered Harmful” by E. W. Dijkstra, Communications of the ACM, Vol. 11, No. 3, March 1968, pp. 147–148, http://www.acm.org/classics/oct95/.

The basic problem is that goto, used indiscriminately, leads to code that is difficult for a human being to understand. Using goto, especially if code is modified by various authors/maintainers, it is easy to produce spaghetti code in which the order of statement execution is hopelessly scrambled. In general, if a language supplies a set of robust control structures, such as while and if, the goto statement is unnecessary.

In your introductory programming course, you might have been told that languages such as Java are high-level and that assembly language is low-level. One difference between high-level and low-level languages is that a high-level language provides more elegant control structures. For instance, suppose that you need to print the values from 1 to 4. A simple while loop easily handles this problem, as shown below in pseudocode.

x = 1
while x < 5:
  print x
  x = x + 1

Assembly language doesn't provide any real control structures, so accomplish the same task in assembly language, something similar to the following pseudocode must be used.

10 x = 1
20 if not (x < 5) goto 60
30   print x
40   x = x + 1
50   goto 20
60 continue

In assembly language, the goto is useful and necessary because the goto provides the only available mechanism for looping, an essential task for most programs.

Labels 

SaM counts each instruction inside a sam-code file, starting with zero. As SaM runs each instruction, SaM's PC register stores the address of the instruction currently being executed. Occasionally, you may need to go to an instruction besides the numerically subsequent instruction, as you would, for instance, in a while-statement. In this case, our assembly code will have to use a jump. Since a jump requires an address to jump to, you must either keep track of instruction numbers or use a label. 

A label is a text string followed by a colon. A label may be on a line before its instruction or on the same line as its instruction.

label:
instruction
label: instruction

Either form of labeling (above or on the same line) is acceptable in terms of style. Internally, SaM replaces each label with the actual instruction number of the instruction.

JUMP and JUMPC

SaM includes two instructions that assist in writing the code for control structures.

The JUMP instruction jumps to the instruction with address t. The JUMPC instruction pops the top of the stack. If the popped value was true, go to the instruction with address t; otherwise, go to the next instruction. The "C" in JUMPC stands for condition (i.e., conditional jump). 

So that an instruction “knows” where to jump, you must label your sam-code instructions, as discussed above. To jump to the instruction below or to the right of label, use JUMP label. The overall structure of a jump and label might look something like the following sam-code.

stuff1
stuff2
JUMP hello
stuff3
hello:
stuff4
STOP

For the code above, assuming the instructions in stuff1 and stuff2 contain no jumps, SaM will execute code in stuff1 and stuff2, then jump to the first instruction below hello: (stuff4), and finally, stop. SaM will not execute stuff3. Note that the following code is identical to the above example because you may write labels to the left of an instruction.

stuff1
stuff2
JUMP hello
stuff3
hello:stuff4
STOP

For a more specific example, try running the following sam-code.

PUSHIMM 2
JUMP blurt
PUSHIMM 100
blurt: PUSHIMM 3
ADD
STOP

After pushing 2 to the top of the stack, SaM jumps to the portion of the code that is labeled as blurt. Then, SaM executes the code in that block, which pushes the value of 3 and adds it to 2. Thus, you will see that SaM returns a value of 5. Note that PUSHIMM 100 never executes. Try this example in the SaM Simulator and keep track of the SP and PC registers.

Addresses

Upon loading a sam-code program, SaM stores each sam-code instruction along with its operand (if any) in internal memory. You can see this for yourself by examining Program.java and SamProgram.java in the source code available on the SaM simulator page. Consequently, each instruction has an address, which starts at 0 for the first instruction. After you load a SaM program, SaM shows the address of each instruction in the form address:instruction in the Program Code panel. SaM will show label information as well, as discussed below. 

Do not think of labels as instructions! SaM numbers each sam-code statement with an address, but the labels become placeholders that are not numbered directly. For instance, run the following program, which has twelve lines of sam-code.

// Program for adding 2 and 3 in a convoluted fashion:
JUMP a          // line 1
d:              // line 2
STOP            // line 3
a:              // line 4
PUSHIMM 3       // line 5
JUMP b          // line 6
b:              // line 7
PUSHIMM 2       // line 8
JUMP c          // line 9
c:              // line 10
ADD             // line 11
JUMP d          // line 12

Do you see how this sam-code will produce a result of 5? SaM knows that labels a, b, c, and d indicate a portion of code to which SaM will jump. Each instruction below a label gets the next address without counting the label itself. So SaM condenses the written sam-code into a set of instructions and their numerical addresses. The figure below shows the Program Code panel for the above example.

The symbols (<= a), (<= b), (<= c), and (<= d) indicate, for each instructions, which labels refer to that instruction. If you find the labeling difficult to trace, inspect the Capture Viewer after execution. The figure below (from the Capture Viewer) shows the instructions in the order in which they are actually executed.

Generating Code for an If-Statement

Consider the following code fragment.

if x>2 then 
    y = 8; 
else 
    y = 0;

As we are generating the code, we use the symbol table to determine (1) that x and y are both integers and (2) the locations of x and y (say, offset 2 for x and offset 3 for y). 

To execute the if-statement, SaM first needs to evaluate a condition, which is placed on top of the stack, before jumping to the correct portion of the sam-code. To evaluate the condition x > 2, you must retrieve the value of x and then push the value 2. The top of the stack will hold the result of evaluating the condition. By calling GREATER, SaM will push 0 (false) or 1 (true), depending on the results of the comparison  (in this case, the result is 1).

// Check if x > 2
PUSHOFF 2    // push the value of x (Vbot)
PUSHIMM 2    // push the value 2 to compare with x (Vtop)
GREATER      // Push result of (Vbot > Vtop) to top of stack

Now that you have accounted for the condition part of the if-statement, you must jump to an appropriate instruction. Calling JUMPC t will move to the label t if the condition is true. Otherwise, SaM will continue executing sam-code because the condition was false. The sam-code below finishes the example. A block of sam-code for the case when x > 2 is true is labeled as correct, though we could have used another name. In addition, when the if and else clauses finish, SaM needs to move to the statement after the entire if-statement, so we have provided the label continue to finish the remaining program.

// Process if statement
JUMPC correct // check if result is true (1) or false (0)
PUSHIMM 0     // false: push the value 0
STOREOFF 3    //        store the value 0 for y
JUMP continue //        continue with remaining program
correct:      // true:
PUSHIMM 1     //        push the value 8
STOREOFF 3    //        store the value 8 for y
JUMP continue //        continue with remaining program
continue:     // continue with program

The code above is a bit redundant. You may remove the second JUMP continue statement since SaM will “fall through” the label after analyzing the block labeled correct.

Generating Code for a While-Statement

The following steps are necessary for a while-statement.

Consider the following code fragment.

x=1; 
while x < 5 do 
   x=x+1;

You need one label to jump to when the loop is done (say, done) and another label to jump to when re-evaluating the condition (say, loop). The fragment leads to the following sam-code.

PUSHIMM 1      // push 1 on the stack
STOREOFF 2     // store the value 1 for x 
loop:          // label the loop starting at the condition
PUSHOFF 2      // retrieve the value of x
PUSHIMM 5      // push the value to compare x with
LESS           // is x < 5 ? push 1 if so; otherwise, 0
NOT            // negate the previous test
JUMPC done     // if not (x < 5), do statements after done
PUSHOFF 2      // retrieve the value of x
PUSHIMM 1      // push 1 onto the stack
ADD            // add 1 to the current value of x
STOREOFF 2     // store the new value of x
JUMP loop      // repeat the loop (goto loop condition)
done:          // continue with remaining program

You may choose arbitrary label names, though you should try to choose names that indicate their purpose. Be aware that a single program can contain multiple if- and while-statements, so make sure that you don't re-use labels. See the example program (SimpleLanguage.java.txt) for one way to produce unique labels when needed.

Error Handling

For this assignment, we will not be testing your compiler's response to errors in supplied Bali programs; all of our test cases for grading will be legal Bali programs. You may, however, want to think about error handling now so that you won't have to rewrite your code when working on Part 3. There are two kinds of errors that can occur in Bali programs:

For example, a missing semicolon or an unmatched parenthesis is a syntax error; a type mismatch or an undeclared variable is a semantic error.

Rather than printing an error message to System.out (or System.err), errors in supplied Bali programs can be handled most clearly and cleanly by using exceptions. We have supplied exception classes corresponding to the two kinds of errors that can occur:

Depending on how you write your code, you may also generate TokenizerExceptions; these are exceptions thrown by the tokenizer when various mismatches occur. The SamTokenizer includes several ways to examine/consume tokens; you can avoid TokenizerExceptions if you try. With correct error handling, TokenizerExceptions should never be thrown.

BC.java, the main program that runs your compiler, is designed so that it catches IlleglaBaliExceptions and then prints the exception's message. For more information on BC.java, see the instructions for Part 2B, below.

Part 2A (10 points)

Note that Part 2A is due before Part 2B.

Submission Instructions for Part 2A:

Part 2B (90 points)

Write a Bali compiler that translates Bali code (satisfying the Bali Specs for Part 2) into valid sam-code. You may use any of the sam-code instructions; these are summarized in SaM Summary

Submission Instructions for Part 2B: