Bali Specifications

CS 212 - Fall 2007

Bali is designed to be reasonably simple to compile.  The Bali language changes every semester.

We use the following notation throughout this document:

Bali Syntax

program [declarations ] : (class | function)* A program consists of zero or more declarations (global variables) followed by zero or more classes and/or functions.
function type name ( [declarations] ) : [ declarations ] : statement* end A function has a return-type, a name, 0 or more parameters, and a body consisting of optional declarations (local variables) followed by zero or more statements.
class class name [ extends name ] : [ declarations ] : function* endclass A class has a name and can inherit from at most one class.  Its body consists of declarations (i.e., fields) followed by zero or more methods (each method looks like a function).
declarations type name ( , type name )* Each declaration is a type followed by a name.
type ( int | boolean | char | float | void | name ) ( [ ] )* There are predefined types (int, boolean, char, float, void) and user-defined types (i.e., classes).  There are also corresponding array types. Note that void is is only valid as a return type and cannot be used within an array type.
statement reference = expression ; An assignment statement.  
  reference ; Function call (useful for side-effects).
  if expression then statement* [ else statement*] endif An if-statement has an optional else part.
  loop statement* ( while | until ) expression ; statement* endloop Looping.  Note that there are two blocks of statements.  The loop works by executing the first block, then checking the condition and possibly exiting the loop, then executing the second block.  If the condition causes a loop-exit then the second block is not executed.  After executing the second block the loop starts over again with the first block.  Either statement-block in a loop can be empty or both blocks be nonempty.  
  return [ expression ] ; Return statement. The type of the expression must match the return-type of the function. The no-expression version is semantically valid within a constructor or if the function's return-type is void.
  print expression ( , expression )* ; Print statement.  Multiple expressions can be printed.  Ideally, all are printed on a single line with a space between adjacent items, but this isn't possible with SaM-code.
reference ( name | this | super ) modifier* A reference typically evaluates to a specific "location" appropriate for storage or retrieval.  Modifiers are evaluated left-to-right.  Keywords this and super are only valid within a class method; this represents the current instance; super represents the current instance treated as an instance of the (syntactic) super-class.  If this is used as a function, it calls the constructor for this class.  If super is used as a function it calls the constructor for the super class.
modifier subscript | functionArgs | fieldRef The modifiers for a reference.
subscript [ expression ] A subscript.
functionArgs ( [ expression ( , expression )* ] ) Function arguments.
fieldRef . name Field reference.
expression [ + | - | not ] term ( binaryOp term )* An expression is a a sequence of terms separated by binary operators.  There can be an optional sign or boolean-negation in front of the first term of an expression; it is applied to the first term.  Operators are evaluated left-to-right (i.e., no precedence rules).
binaryOp arithmeticOp | comparisonOp | booleanOp The three kinds of binary operators.
arithmeticOp + | - | * | / | %  Operators for arithmetic.
comparisonOp < | <= | == | != | > | >= Comparison operators.
booleanOp and | or Boolean  operators (both are short-circuiting).
term literal | ( expression ) | arrayValue | inputValue | reference A term is a literal, a parenthesized expression, an array value, an input value, or a reference.
literal integer | char | string | float | true | false | null Unsigned integers (e.g., 123, 6, 44), single characters (e.g., 'a', 'A', 'x'), strings (e.g., "hello", "there"), unsigned floating-point numbers (e.g., 3.1415926, 6.02e+23, .25, 18.), boolean constants, and the no-object indicator.
arrayValue type arrayElementList An array value consists of a type indicator (all elements of the array must be of this type) followed by a list of elements.
arrayElementList { [ arrayElement ( , arrayElement )* ] } The elements of the array.
arrayElement expression | arrayElementList An array value can contain an "untyped" array element list if it is a multidimensional array.
inputValue readInt | readChar | readLine | readFloat These "values" are read from standard input; readInt expects an integer (initial blanks are ignored, an initial + or - is OK), readChar corresponds to the next input character, readLine brings in the next line as a string (a character array), and readFloat expects a floating point number (initial blanks are ignored, an initial + or - is OK, exponents are OK.)
name any token whose kind is 'w' (see the Scanner212 code). Note that a keyword cannot be used as a name (because keywords are all kind 'K').

 

Bali Semantics

comments Anything following a # on a line is considered to be a comment. This doesn't show up in the grammar above because it's handled by the scanner (Scanner212.java).
keywords
  • Keywords (those shown in bold-blue above) cannot be used as names. We use 'name' in the set of productions above to refer to an arbitrary word (i.e., a token whose kind is 'w').  Thus, something like the use of endloop as a variable name is a syntactic error.
  • The complete list of keywords:
    • end, void
    • class, extends, endclass, this, super, null
    • if, then, else, endif
    • loop, while, until, endloop
    • return, print
    • and, or, not, true, false
  • The following words have special meaning, but they are not keywords (i.e., they are defined in the global namespace---see below):
    • int, boolean, char, float
    • readInt, readChar, readLine, readFloat

program

  • All functions and classes of a program must be in one file.
  • Like Java, Bali relies on an automatic garbage collector (that currently does not exist).
  • Running  a program means the program's main function is executed.
main function
  • There must be a function called main that has an empty parameter list. It is a semantic error if this function does not exist.
  • The return type for the main function must be int; this integer value appears as the exit code when the SaM Simulator halts.
declarations
  • Variables have default values : 0 for int, false for boolean, and null for an object reference (a class-type or an array).
  • Variables must be declared before use.
  • Variables declared at the program level are global variables.  Variables declared at the class-level are fields of that class.  Variables declared at the function level are local to that function.
namespaces
  • At any point in a program there are at most 3 active namespaces: the global level, the class/function level, and the method level.
  • Any name can be used at most once per level.  This means, for instance, that a class name must be different from all global-variable names and different from all function names.  Names can be re-used if they are at different levels.  For instance, a function can have a local variable that has the same name as a global variable; in this case, the global variable is inaccessible from within the function.
  • To determine the meaning of a name that is not a type, Bali first checks the method-level namespace, then the class/function-level namespace, and finally the global-level namespace.
  • A global variable can be inaccessible due to its name being re-used at a lower level, but a class-level name is always accessible via the keyword this.
  • A name does not have to be defined before it is used.  This rule allows, for instance, class A and class B to each refer to the other.  A name always refers to the meaning within the current namespace.  [Comment: This rule is easy to implement because, in effect, we are designing a 2-pass compiler.  In the first pass, we build the AST and determine meaning for each name.  In the second pass, we walk over the AST to detect semantic errors and generate code.]
namespace rules for types
  • All type names in Bali reside in the global namespace.  The types in Bali consist of the predefined types and any classes created by the program.
  • For a type-name (a name that appears as syntactic type as specified in the grammar above), the name is looked up in the global namespace.  All other names are looked up in the up-to-three active namespaces as specified above.
functions
  • The type of the expression in the return statement must match the type specified in the function header. A return statement with no expression is valid iff it's used either within a constructor or with a function whose return type is void.
  • If you "fall off the end" of the function body (i.e., you execute the last statement and that statement is not a return statement) then a default return is automatically executed.  This default-return returns a value using the same default rules as for a variable declaration (i.e., 0 for int, false for boolean, null for an object reference).
  • Functions cannot be overloaded.  There is at most one function with a given name.
  • The declared type (i.e., the type that can be determined from declarations within the program) for an argument of a function call must be compatible with the corresponding parameter type.
  • Functions cannot be assigned to variables since there is no type that can be used to declare such a variable. 
parameters
  • Parameters pass by value.
  • Parameter names are local to the current function (see namespaces).
statements
  • Expressions in if and loop statements must be of type boolean.
  • For assignment statements, the types of the left-hand and right-hand sides must be compatible.
  • Rudimentary output is available via the print statement.
assignment statements
  • For an assignment statement to be semantically valid the target of the assignment statement (the reference on the left) must be something that can be assigned to (i.e., it must evaluate to something with an address).
  • This implies that you cannot assign to readInt (unless you redefine it as a local variable).
  • This also implies that you cannot assign to a function-call since the result of such a function-call is a value on top of the stack.  Note though that it's possible to assign to a function-call that is followed by a subscript (i.e., this occurs if the function-call returns an array and then we assign to one of the array-elements).
expressions
  • For all binary operators, the operands must be of the same type.
  • For logical operators (and, or, not), operands must be of type boolean.
  • For arithmetic operators (+, -, *, /), operands must be of type int or of type float.  Division (/) is integer division when operands are of type int.
  • For the mod operator (%), operands must be of type int.
  • For the relational operators (<, <=, >, >=), operands must be of type int, char, or float.
  • The equality (==) and inequality (!=) operators both work with all types of values, but both operands must be of the same type.  Two arrays are equal if and only if they are both the same object.
  • Bali does not support division by zero, NaN, or infinity.
arrays
  • When an array is declared, its initial value is null.
  • The expression type[size] creates an array of the given size. Each element of the array is set to the appropriate default value (see declarations, above).
  • It's also possible to create an array by listing its elements (see the syntax for arrayValue).
  • Each array has a "field" called size. The value of array.size is the declared size of the array.
  • As in Java, all array subscripts are checked at runtime to ensure that they are valid subscripts.
multidimensional arrays
  • There are two ways to build a multidimensional arrayValue (see above):
    • int[]{int{1,2,3}, int{4,5,6}}
    • int{{1,2,3},{4,5,6}}
  • The two methods cannot both be used within a single arrayValue.
  • As in Java, ragged arrays are allowed (i.e., the rows do not all have to have the same length).
input
  • Rudimentary input is available using the predefined "variables" readInt, readChar, readLine, and readFloat.
  • It is a semantic error to use one of these variables on the left side of an assignment statement.
predefined names
  • The names int, boolean, char, and float are predefined as global variables.  You can have local variables that use these names if you want; it's legal even though it's bad programming style.
  • The names readInt, readChar, readLine, and readFloat are also predefined global variables.
  • The namespace rules (see above) imply that it is illegal to have functions or classes that use any of these names.
classes
  • Fields and methods are all local to the class's namespace; they are accessible using their unqualified names from within constructors and methods in the same class.  If there is a local variable with the same name as a field then the field can be accessed using the notation this.fieldName.
  • A class C inherits from at most one class (say B). B is called the super class of C. C is called a subclass of B.
  • All fields and methods of a class are public.  There is no equivalent to Java's private.
  • A class C inherits all the fields and methods of its super class (and of the super class's super class, etc.).
  • A method specified in some class C can override an inherited method that has the same name. These methods must have the same return type and the types of the parameters must agree exactly.
  • A field specified in some class C can override an inherited field that has the same name.  These fields must have the same type exactly.  (In other words, the same rules are used for both methods and fields; this is different from Java where fields can be shadowed.)
  • A field and a method cannot share the same name (this is different from Java).
constructors
  • There is no new keyword as in Java; instead, a constructor is called much like a function with the class-name used in place of a function-name.
  • Within a class, the constructor is simply a method with the class-name used as both the return-type and the function name. It can be called as a standard method, although it is only accessible using the notation this.ClassName(args) [because ClassName(args) acts as a constructor].
  • Within a constructor, the only semantically valid return-statement is the version with no expression. The effect of the expressionless return statement is as if it is "return this;"
  • If you "fall off the end" of a constructor then a default return is automatically executed.
  • If no constructor is provided, an empty, default constructor (with no parameters) is used.
  • Any call to the constructor of the super class must be done explicitly using a call to super(arguments). (This is different from Java where an implicit call to the no-argument constructor of the super class is made whenever no explicit call is present.)
this 
  • As in Java, this refers to the current class instance.  It is only valid within a method or constructor. 
  • The form this(arguments) calls the constructor (see constructors) of the class on this, the current instance.  Note that the constructor is called, but no new instance is created.  It is valid within any method.
  • The form this.name refers to a field of the current instance.
  • The form this.name(arguments) refers to a method of the current instance.
  • The form this (by itself) refers to the current instance.
super 
  • As in Java, super refers to the current class instance treated as if it were an instance of the super class.  It is only valid within a method.
  • The form super(arguments) calls the constructor (see constructors) of the super class on this, the current instance.  Note that the constructor is called, but no new instance is created.  It is valid within any method.
  • The form super.name refers to a field of the superclass.
  • The form super.name(arguments) refers to a method of the superclass.
  • The form super (by itself) is a semantic error.
upcasting
  • An instance of a class is also considered to be an instance of any of its super classes.
  • An instance of a class can be assigned (in an assignment statement) to a variable whose type is a super class.
  • An instance of a class can be used as an argument in a function that calls for a member of the super class.
  • Int to float is considered to be an instance of upcasting.
downcasting
  • Downcasting must be done explicitly; it is never automatic.
  • Downcasting uses the notation name.Subclass( ) where name is declared to be of type Class.  The result is the same instance, now considered to be of type Subclass.
  • Downcasting is checked at runtime since it cannot be checked at compile-time (aside from a sanity check that Subclass actually represents a subclass of Class).