Bali Specifications for Part 4

CS 212 - Fall 2007

Bali is designed to be reasonably simple to compile.  The Bali language changes every semester.

We use the following notation throughout this document:

Bali Syntax

program [declarations ] : (class | function)* A program consists of zero or more declarations (global variables) followed by zero or more classes and/or functions.
function type name ( [declarations] ) : [ declarations ] : statement* end A function has a return-type, a name, 0 or more parameters, and a body consisting of optional declarations (local variables) followed by zero or more statements.
class class name : [ declarations ] : function* endclass A class has a name.  To simplify Part 4, there is no inheritance.  The class's body consists of declarations (i.e., fields) followed by zero or more methods (each method looks like a function).
declarations type name ( , type name )* Each declaration is a type followed by a name.
type ( int | boolean | void | name ) [ [ ] ] There are predefined types (int, boolean, and void) and user-defined types (i.e., classes).  There are also corresponding array types (one-dimensional arrays, only). Note that it is a semantic error if void is used other than as a return type.
statement reference = expression ; An assignment statement.  
  reference ; Function call (useful for side-effects).
  if expression then statement* [ else statement*] endif An if-statement has an optional else part.
  loop statement* ( while | until ) expression ; statement* endloop Looping.  Note that there are two blocks of statements.  The loop works by executing the first block, then checking the condition and possibly exiting the loop, then executing the second block.  If the condition causes a loop-exit then the second block is not executed.  After executing the second block the loop starts over again with the first block.  Either statement-block in a loop can be empty or both blocks be nonempty.  
  return [ expression ] ; Return statement. The type of the expression must match the return-type of the function. The no-expression version is semantically valid within a constructor or if the function's return-type is void.
  print expression ( , expression )* ; Print statement.  Multiple expressions can be printed.  Ideally, all are printed on a single line with a space between adjacent items, but this isn't possible with SaM-code.
reference ( name | this ) modifier* A reference typically evaluates to a specific "location" appropriate for storage or retrieval.  Modifiers are evaluated left-to-right. Keyword this is only valid within a class method; this represents the current instance.  If this is used as a function, it calls the constructor for this class.
modifier subscript | functionArgs | fieldRef The modifiers for a reference.
subscript [ expression ] A subscript.
functionArgs ( [ expression ( , expression )* ] ) Function arguments.
fieldRef . name Field reference.
expression [ + | - | not ] term ( binaryOp term )* An expression is a a sequence of terms separated by binary operators.  There can be an optional sign or boolean-negation in front of the first term of an expression; it is applied to the first term.  Operators are evaluated left-to-right (i.e., no precedence rules).
binaryOp arithmeticOp | comparisonOp | booleanOp The three kinds of binary operators.
arithmeticOp + | - | * | / | %  Operators for arithmetic.
comparisonOp < | <= | == | != | > | >= Comparison operators.
booleanOp and | or Boolean  operators (both are short-circuiting).
term literal | ( expression ) | arrayValue | inputValue | reference A term is a literal, a parenthesized expression, an array value, an input value, or a reference.
literal integer | true | false | null Unsigned integers (e.g., 123, 6, 44), boolean constants, and the no-object indicator.
arrayValue type arrayElementList An array value consists of a type indicator (all elements of the array must be of this type) followed by a list of elements.
arrayElementList { [ expression ( , expression )* ] } The elements of the array.
inputValue readInt These "values" are read from standard input; readInt expects an integer (initial blanks are ignored, an initial + or - is OK).
name Any token whose kind is 'w' (see the Scanner212 code). Note that a keyword cannot be used as a name (because keywords are all kind 'K').


Bali Semantics

comments Anything following a # on a line is considered to be a comment. This doesn't show up in the grammar above because it's handled by the scanner (
  • Keywords (those shown in bold-blue above) cannot be used as names. We use 'name' in the set of productions above to refer to an arbitrary word (i.e., a token whose kind is 'w').  Thus, something like endloop as a variable name is a syntactic error.
  • The complete list of keywords:
    • end, void
    • class, extends, endclass, this, super, null
    • if, then, else, endif
    • loop, while, until, endloop
    • return, print
    • and, or, not, true, false
  • The following words have special meaning, but they are not keywords (i.e., they are defined in the global namespace---see below):
    • int, boolean
    • readInt


  • All functions and classes of a program must be in one file.
  • Like Java, Bali relies on an automatic garbage collector (that currently does not exist).
  • Running  a program means the program's main function is executed.
main function
  • There must be a function called main that has an empty parameter list. It is a semantic error if this function does not exist.
  • The return type for the main function must be int; this integer value appears as the exit code when the SaM Simulator halts.
  • Variables have default values : 0 for int, false for boolean, and null for an object reference (a class-type or an array).
  • Variables must be declared before use.
  • Variables declared at the program level are global variables.  Variables declared at the class-level are fields of that class.  Variables declared at the function level are local to that function.
  • At any point in a program there are at most 3 active namespaces: the global level, the class/function level, and the method level.
  • Any name can be used at most once per level.  This means, for instance, that a class name must be different from all global-variable names and different from all function names.  Names can be re-used if they are at different levels.  For instance, a function can have a local variable that has the same name as a global variable; in this case, the global variable is inaccessible from within the function.
  • To determine the meaning of a name that is not a type, Bali first checks the method-level namespace, then the class/function-level namespace, and finally the global-level namespace.
  • A global variable can be inaccessible due to its name being re-used at a lower level, but a class-level name is always accessible via the keyword this.
  • A name does not have to be defined before it is used.  This rule allows, for instance, class A and class B to each refer to the other.  A name always refers to the meaning within the current namespace.  [Comment: This rule is easy to implement because, in effect, we are designing a 2-pass compiler.  In the first pass, we build the AST and determine meaning for each name.  In the second pass, we walk over the AST to detect semantic errors and generate code.]
namespace rules for types
  • All type names in Bali reside in the global namespace.  The types in Bali consist of the predefined types and any classes created by the program.
  • For a type-name (a name that appears as syntactic type as specified in the grammar above), the name is looked up in the global namespace.  All other names are looked up in the up-to-three active namespaces as specified above.
  • The type of the expression in the return statement must match the type specified in the function header. A return statement with no expression is valid iff it's used either within a constructor or with a function whose return type is void.
  • If you "fall off the end" of the function body (i.e., you execute the last statement and that statement is not a return statement) then a default return is automatically executed.  This default-return returns a value using the same default rules as for a variable declaration (i.e., 0 for int, false for boolean, null for an object reference).
  • Functions cannot be overloaded.  There is at most one function with a given name.
  • The declared type (i.e., the type that can be determined from declarations within the program) for an argument of a function call must be compatible with the corresponding parameter type.
  • Functions cannot be assigned to variables since there is no type that can be used to declare such a variable. 
  • Parameters pass by value.
  • Parameter names are local to the current function (see namespaces).
  • Expressions in if and loop statements must be of type boolean.
  • For assignment statements, the types of the left-hand and right-hand sides must be compatible.
  • Rudimentary output is available via the print statement.
assignment statements
  • For an assignment statement to be semantically valid the target of the assignment statement (the reference on the left) must be something that can be assigned to (i.e., it must evaluate to something with an address).
  • This implies that you cannot assign to readInt (unless you redefine it as a local variable).
  • This also implies that you cannot assign to a function-call since the result of such a function-call is a value on top of the stack.  Note though that it's possible to assign to a function-call that is followed by a subscript (i.e., this occurs if the function-call returns an array and then we assign to one of the array-elements).
  • For all binary operators, the operands must be of the same type.
  • For logical operators (and, or, not), operands must be of type boolean.
  • For arithmetic operators (+, -, *, /), operands must be of type int.  Division (/) is integer division.
  • For the mod operator (%), operands must be of type int.
  • For the relational operators (<, <=, >, >=), operands must be of type int.
  • The equality (==) and inequality (!=) operators both work with all types of values, but both operands must be of the same type.  Two arrays are equal if and only if they are both the same object.
  • Bali does not support division by zero, NaN, or infinity.
  • When an array is declared, its initial value is null.
  • The expression type[size] creates an array of the given size. Each element of the array is set to the appropriate default value (see declarations, above).
  • It's also possible to create an array by listing its elements (see the syntax for arrayValue).
  • Each array has a "field" called size. The value of array.size is the declared size of the array.
  • As in Java, all array subscripts are checked at runtime to ensure that they are valid subscripts.
  • Rudimentary input is available using the predefined "variable" readInt.
  • It is a semantic error to use this variable on the left side of an assignment statement.
predefined names
  • The names int and boolean are predefined as global variables.  You can have local variables that use these names if you want; it's legal even though it's bad programming style.
  • The name readInt is also predefined global variables.
  • The namespace rules (see above) imply that it is illegal to have functions or classes that use any of these names.
  • Fields and methods are all local to the class's namespace; they are accessible using their unqualified names from within constructors and methods in the same class.  If there is a local variable with the same name as a field then the field can be accessed using the notation this.fieldName.
  • All fields and methods of a class are public.  There is no equivalent to Java's private.
  • A field and a method cannot share the same name (this is different from Java).
  • There is no new keyword as in Java; instead, a constructor is called much like a function with the class-name used in place of a function-name.
  • Within a constructor, the only semantically valid return-statement is the version with no expression. The effect of the expressionless return statement is as if it is "return this;"
  • If you "fall off the end" of a constructor then a default return is automatically executed.
  • Within a class, the constructor is simply a method with the class-name used as both the return-type and the function name. It can be called as a standard method, although it is only accessible using the notation this.ClassName(args) [because ClassName(args) acts as a constructor].
  • If no constructor is provided, an empty, default constructor (with no parameters) is used.
  • As in Java, this refers to the current class instance.  It is only valid within a method or constructor. 
  • The form this(arguments) calls the constructor (see constructors) of the class on this, the current instance.  Note that the constructor is called, but no new instance is created; in effect it re-initializes the current instance.  It is valid within any method.
  • The form refers to a field of the current instance.
  • The form refers to a method of the current instance.
  • The form this (by itself) refers to the current instance.