CS412/413 Spring 1999 Introduction to Compilers and Translators
In the fourth programming assignment, we will be extending our Iota compilers to support a more sophisticated language called Iota+. This language extends Iota by adding support for object-oriented programming. It is backwards compatible with Iota. As with the Iota language specification given earlier, if you see ambiguity in this specification, you have the flexibility to resolve it in the manner you wish as long as you can justify the reasonableness of your decision. However, you should contact the course staff to resolve ambiguities.
At the high level, Iota and Iota+ are similar: a program is composed of some number of modules that are described by interfaces. However, both modules and interfaces support additional kinds of declarations. New expression and statement forms also are supported in Iota+. Rather than describe the entire Iota+ language from scratch, this document describes only the additions and changes to the Iota language.
PrettyPrinter (180 lines)
New io
and file
interfaces
We will take the same approach as in the earlier language definition: grammatical productions are intermixed with descriptions of the new language features they correspond to.
As in Iota, a module implementation file starts with a declaration of what external
components it accesses (uses), then continues with a series of definitions. uses
declarations are needed for all components referenced from external interface
files, including interface types and constants. The syntax for a uses
declaration is the same as in Iota.
A module contains one or more definitions of variables or functions. The definitions may not be separated by semicolons. All top-level definitions are visible throughout the module; there is no need for forward declarations. Two definitions may not use the same name; nor may they conflict with any identifier defined by a use declaration.
In addition to definitions of functions and variables, as in Iota, new kinds of definitions are supported: interface type definitions, class definitions, and constant definitions. In addition, variable definitions are extended to support initialization expressions that are evaluated when the program starts. The initialization expressions within a module are evaluated in the order in which they occur within the module. When a program contains more than one module, the initialization expressions of the modules are evaluated in some order that is specified at link time. However, all of the initialization expressions of one module in the order are evaluated before the initialization expressions of the next module in the order are begun.
defn ::= formal [ = expr ]
| fn_header = expr
| id = constant
| interface-type
| class
The syntax of an interface also is extended to support new kinds of declarations. In addition to variable and function declaration, an interface supports interface type declarations and constant declarations.
decl ::= formal
| fn_header
| interface-type
| id = constantfn_header ::= id ( [ formals ] ) [ : type ]
An interface also may begin with a series of uses
declarations, which were
not permitted in Iota.
interface ::= [ uses ] decl*
These declarations are useful because they allow an interface to talk about interface
types and symbolic constants declared in the interfaces of other modules. Each compilation
unit in the language -- module or interface -- may have a set of uses declarations. The uses
declarations of other compilation units are not in force in a compilation unit unless it
explicitly contains these declarations. The only exception to this rule is that the
identifiers introduced by uses
declarations of the interface to a module are
in scope in the body of the module as well, just as the interface types and constants
declared by the module's interface are in scope. This rule ensures that a module is
consistent with its own interface.
The uses
declarations in all the interfaces in a program define a
dependency graph among these interfaces, in the sense that one interface depends on
another if it uses an identifier from the second. It is a static error for the graph
of interface dependencies in the program to contain a cycle. For example, two interfaces
may not each contain a uses
declaration that mentions an identifier in the
other interface.
Both interfaces and modules may contain declarations of constants. When an identifiers
is declared to have a constant value, the compiler treats that identifier everywhere as
representing that value, exactly as if the constant expression were used in place of the
identifier. Constants only may be integers, booleans, strings, or the new constant null
.
The type of the identifier is not declared explicitly; it is determined from the constant.
Note that constant identifiers may not be assigned to!
In addition to the built-in types supported by Iota, Iota+ supports two new
kinds of types: class types and interface types. Both class types and interface types are
denoted by using their declared names, or in the case of interface types referenced
through a uses
declaration, the appropriate name introduced by the
declaration. An interface type defined in an interface file may be referenced by both
interface files and module files; however, a uses
declaration is necessary if
the referring module is different from the defining module. A class or interface type may
be used anywhere that a built-in type may be used; in particular, it may be used as a
variable type, a function or method argument type, a return type, and as a parameter to an
array
type.
type ::= id
|object
|int
|string
|bool
|array
[ type ]
A class or interface type supports various operations that are described in the declaration of the type. The compiler permits exactly the operations described in the type declaration and the declarations of any supertypes of the type. An interface type declaration permits only the declaration of object methods:
interface-type ::=
interface
id [extends
id ] { fn_header* }
The method declarations look exactly like function declarations at the top level of an interface; however, they are methods rather than functions. The object of the interface type is implicitly an argument to the method.
An interface type I optionally may declare that it extends
another
interface type I'. In this case, the signature of each method of I must conform to the
signature of the method of the same name in I' (if any), and similarly in every other
interface that it transitively extends. Unlike in Java, different methods of interface and
class types never have the same name; method overloading is not permitted. If a supertype
of the interface type I has a method m with the same name as some method m' declared in I,
then the signature of m must conform to the signature of m'. Conformance requires that the
number of arguments in m and m' must be the same. In addition, the type of each argument
in m must be the same as, or a supertype of, the argument in the corresponding position in
m'. The names of arguments need not be identical in the two interface types. The return
type of m must the same as or a subtype of the return type of m'. If m' has no return
type, then m' may declare any return type or simply not declare a return type.
Just as an interface type declaration is similar to an entire Iota interface, a class
definition is similar to an entire Iota module. A class definition may contain definitions
of fields (instance variables) and methods. Unlike in Java, fields and methods do not
have visibility modifiers such as public:
and private:
; such
modifiers are unnecessary because interfaces determine visibility. Components of a module
that are declared in the module's interface are public, and other components are
automatically private. Note that unlike in Java, classes do not have final fields or
methods; nor are there static fields or methods. In Iota+, the role of static
fields and methods is fulfilled by module variables and functions, which are not attached
to a specific class.
class ::=
class
id [extends
id ] { classdefn* }
classdefn ::= formal
| fn_header = expr
When a class has its optional extends
clause, it must implement every
method that its supertypes declare. The signatures of class methods must conform to the
signatures of all of the interfaces that it extends, transitively. This checking is
performed in exactly the manner described above for interface subtype declarations.
Neither a class nor an interface may introduce a new member (field or method) with the
same name as a member of a supertype. However, the signature of a supertype method may be
refined as described above.
The type system of Iota+ is more complex than that of Iota. In addition to
the new class types and interface types that may be declared by programmers and the
pseudo-type void, we introduce the type Null to represent the type of the
expression null
, representing a null object reference. The new type object
is the supertype of all class and interface types, and of string and array types.
Finally, the pseudo-type bottom is used to represent the type of statements or
expressions that cannot terminate normally.
All types in the Iota+ type system are considered formally to be subtypes of void. This subtype relation makes sense because a containing context expecting a void will discard whatever value it receives; therefore, every value type may be used as a void.
As in Java, null references are considered to be members of every object type. Therefore, the type Null is considered to be a subtype of every class or interface type, and also a subtype of the built-in types string and array[T]. However, it is not a subtype of the built-in types bool and int.
The pseudo-type bottom represents non-termination, and is considered to be the
subtype of every other type in the type system, including Null. Its name makes
sense because it is the bottom of the pre-order (actually, lattice) on types defined by
the subtype relation. The top of the pre-order is the type void, which equally well
could be named top. The type bottom is the universal subtype because an
expression that does not terminate normally will never produce a value that its containing
context does not expect. For example, the return
statement can be given the
type bottom, since it does not terminate with an ordinary value. Consider the
following statement:
x: int = { return "foo"; }
This statement cannot result in a run-time type error, because no value ever will be assigned to x.
Note that Iota+ gives no way for the programmer to talk about the types void,
Null, and bottom. These types are part of the type system but may not be
used in declarations. Their names are not keywords or reserved words and may be used for
other purposes. However, the type object may be denoted in a program using the
keyword object
, just as with the types int and bool.
The subtype hierarchy of Iota+ just described is diagrammed as follows:
A method is defined as an expression that is evaluated at the time that the function is
called, initializing the variables of the formal arguments with the actual values passed.
As in Java, when a value of class type or interface type is passed, the called method
receives references to the same objects that the calling code passed. A method body
differs from a function body in that the additional implicit argument this
,
representing the receiver object, is in scope within the method body. Unlike the other
arguments, this
is not an assignable variable.
The syntax for formal arguments of both functions and methods is extended to allow a
more compact denotation of multiple formals arguments in a row sharing the same type. For
example, the argument list x,y: int, z: bool
is equivalent to the argument
list x:int, y:int, z:bool
.
formals ::= formals , formal
| formal
formal ::= ids : typeids ::= id | ids , id
Iota+ adds several new kinds of expressions to Iota.
Expressions may take one of several new forms.
simple_expr ::= ...
|new
typeprimary ::= ...
|this
| primary . function_call
|cast
( expr , type )variable ::= ...
| primary . idconstructor ::=
new
type [ expr ] ( expr )
The expression this
may be used within method bodies, as described
earlier, to denote the receiver object. In an ordinary function body, it is a static error
to use this expression. Note that the identifier this
is a keyword and cannot
be used as the name of an ordinary variable.
The expression new
C creates a new object of the class C,
where C is the name of the class. When a new object of a class is created, all of
its fields are initialized to their default values. For all types that are supertypes of Null,
the default value of the type is null
. For integers, the default value is 0,
and for booleans, it is false
. Unlike in Java, classes have no constructors.
By convention, classes declare a method init
that performs proper
initialization of an object, establishing necessary representation invariants. The
pretty-printer sample code above contains some examples of this convention.
The dot operator (.) is used to indicate an access to one of the members of an object, either a field or a method. These accesses operate as in Java. The static type of the left-hand-side of the dot operator (the receiver) is used to determine whether the field access or method invocation is legal. A field access can be legal only if the type of the receiver is a class type, because interface types have no fields. In the case of a method invocation, the receiver type may be either a class type or an interface type. The arguments passed to the method must agree with the types of formal arguments of the method, as declared in the static type of the receiver, or, if the static type of the method does not declare the method, in its closest (lowest) supertype that declares the method.
Within a method, the fields and methods of the receiver object are automatically placed
in scope and can be accessed as if they were module variables or functions. For example,
if the class of the receiver object contains a field named a
, then the
expressions this.a
and a
are equivalent within that method body.
No scoping conflict between fields and methods of the object and other identifiers is
permitted. The names of fields of a class must differ from the names of all visible
variables, functions, classes and interface types.
The binary operators are extended to complete the usual set of six relational
operators, with the usual semantics. Note that booleans can be compared using the
relational operators, and that false
is consider to be less than true
.
Strings are compared lexicographically, just as in Java (This is true in Iota as well).
Arrays only support the ==
and !=
relational operators, which
implement pointer equality: two arrays are considered equal only if they are exactly the
same object. Two values may be tested for equality or inequality even if they do not have
exactly the same static type. Two values may be compared as long as the static type of one
value is a supertype of the static type of the other, and neither has the static type void.
The pseudo-function cast
provides a way to change the type of an object
reference. It takes two arguments: an expression and a new type. The first argument is the
expression whose value is to be cast; the second argument is the desired type of the cast
expression. This expression requires that both the static type of the expression being
cast and the desired target type are subtypes of object. The validity of the cast
expression is checked at runtime, and invalid casts result in immediate termination of the
program.
binary_op ::= ... |
<=
|>=
|!=
The new constant expression null
returns the null object, whose type is Null,
as described earlier. Since Null is a subtype of every object type, the null object
is a member of every object type. The effect of this membership is that every access to an
object member (field or method) must check first whether the object is the null object. If
so, the program halts immediately with an appropriate error message. This check also is
required for all string and array operations, such as array and string index expressions
and string concatenation.
constant ::= ...
|null
Array index expressions are required in Iota+ to be checked at run time to ensure that the array index is not out of bounds.
In certain contexts where a value of type bool is expected in Iota, a value of
any subtype of object may be used instead. The value is automatically converted
to a boolean in the following way: null
is converted to false
,
and all other values are converted to true
. The contexts where an automatic
conversion is performed are the following: the test expressions of if
and while
statements, and the arguments to the logical operators !
, &
and |
. For example, the following two if
statements are
equivalent:
if (o != null) ... if (o) ...
Note that there is no automatic conversion from integers to booleans, as in the C language.
Iota+ adds a few more statement forms to Iota. The new increment and decrement statements increase or decrease the value of an integer variable by one:
stmt ::= ...
| variable++
;
| variable--
;
|break
;
The new statement break
terminates the closest enclosing while
statement, just as in Java. A break
statement may occur only within a while
statement.
The syntax of assignment is extended to permit multiple assignment within a single
statement. An assignment may take the form a = b = c =
E, where a-c
are variables, and E is the expression to be assigned to each of the variables. The
production stmt ::= variable = expr is replaced by a new
production stmt ::= assignment. The non-terminal assignment is
defined as follows:
assignment ::= variable = expr
;
| formal = expr;
| variable = assignment
| formal = assignment
Note that assignment is a right-associative operator: first, the expression is evaluated, then each of the expressions denoting variables are evaluated in turn from right to left, with each assignment completed before the evaluation of the previous variable.
In Iota, a local variable declaration may include an initializing expression. In
Iota+, multiple local variables may be declared in the same statement, because
of the redefinition of the formal non-terminal. For example, the statement i,j:
int = 1
introduces two variables i
and j
, and initializes
them both to 1. The right-hand side expression is evaluated only once, not once for each
variable being initialized. The right-hand side expression may be an assignment itself,
and the declaration and initialization of new variables may serve as assignments as well.
Note that a statement may still consist of a variable declaration without
initialization, as in Iota: the production stmt ::= id :
type [ = expr ] ;
is replaced by the production stmt ::= formal
;
Certain statements in Iota+ have the type bottom: break
statements and return
statements. In addition, an if
statement
produces the type bottom if both of its arms have that type, as in the following
example:
if (b) return 0; else return 1;
However, the type of this if
statement results simply from the application
of the usual rule for checking if
statements statically.
A statement whose type is bottom may not be followed by any other statement, since that following statement could not be executed. In other words, if such a statement occurs within a block of statements, it must occur at the end of the block.
This typing has the advantage that it permits functions to end with return
statements, which avoids confusing programmers used to C or Java syntax. For example, the
function
inc(x: int): int = { return x+1; }
is legal because the type of the function body is bottom, which is a subtype of int, and the return statement returns the proper type.
Iota+ introduces a number of new keywords that have been mentioned in this
document. The keywords of the language include all the keywords of Iota ( array
,
bool
, false
, if
, int
, length
,
return
, string
, true
, uses
, while
),
and the new keywords mentioned here ( break
, cast
, new
,
null
, object
, this
).
Note also that Iota+ adds new multi-character non-alphabetic tokens, such as
++
, --
, <=
, >=
, !=
.
The following list summarizes all the new features of Iota+, for convenience in checking whether the compiler properly implements the language.
uses
in interface with acyclic constraint extends
clause for both interfaces types and classes new C
expression init()
method this
expression cast
expression a = b = c
) return
and break
as bottom Iota+ provides the same two standard modules for programming: io
and conv
. However,
the io
module is extended with some additional functionality. It also adds
new interfaces for input and output stream objects, and provides objects stdin
and stdout
that may be used to read and write data from and to the standard
input and output device. The new declarations in the interface are as follows:
interface OutputStream { print(s: string) // output the string s putc(c: int) // output the character whose ASCII value is c flush() // make sure any pending output is flushed } interface InputStream { readln(): string // read a line from the input getc(): int // read a single character from the input eof(): bool // return whether the end of file has been reached on the input } stdout: OutputStream stdin: InputStream
In addition, a new file
module is provided for file
I/O:
interface FileOutput extends OutputStream { close() } interface FileInput extends InputStream { close() } read(filename: string): FileInput // Open the named file for reading. Return null on failure. create(filename: string): FileOutput // If the file does not exist, create it and return a FileOutput that // will write to the file. If the file cannot be created, return null. append(filename: string): FileOutput // If the file can be opened, return a FileOutput that appends to it. // Otherwise, return null. remove(filename: string): bool // Return true if the file can be removed