Iota Language Definition

CS 412/413
Introduction to Compilers
Cornell University Computer Science Department, Spring 2001

Iota Language Definition

io.ta (n) 1: the 9th letter of the Greek alphabet (i) 2: an infinitesimal particle or amount

Iota is a simple language for which we will be writing compilers in CS412/413. The letter iota is the Greek equivalent of the letter "I"; you can think of this language as "J--". This language definition will be updated periodically as the need for clarifications or corrections arises. If you see ambiguity in this specification, you have the flexibility to resolve it in the manner you wish as long as you can justify the reasonableness of your decision. You also may contact the course staff to resolve ambiguities. Later in the course, some extensions will be added to Iota that will make it into an object-oriented language a little like the Java programming language; the extended version of Iota is known as Iota⁺.

A program in Iota is composed of some number of modules. A module collects together some functions and state in the form of variables. Components of a module may be used from other modules if they are exported in the module's interface. A module is defined in two files: an implementation file and an interface file. The implementation file provides all the actual code associated with the module, and the interface file declares which parts of the module are accessible from other modules. The name of a module is determined by the name of the file containing its definition. A module named M has an implementation file named M.im and an interface file named M.ii.

A module must define everything that its interface declares, but it may also define additional variables and functions that are not accessible from outside the module.

Examples

The following Iota module implementation defines the Fibonacci function and a function for squaring numbers:

fibonacci(x: int): int = (
    if (x < 2) 1 else fibonacci(x - 1) + fibonacci(x - 2)
)

square(x: int):int = x*x

An interface for the same module might look as follows:

fibonacci(x: int): int
  /* compute the x'th Fibonacci number */
square(x: int):int
  /* compute the square of x */

Here is some Iota code to find the greatest common divisor of two numbers:

gcd(x: int, y: int): int = (
    while (!(x == 0)) (
	if (!(x < y)) x = x % y
	else ( // swap x and y
            temp:int = x;
            x = y;
            y = temp;
        )        
    );
    y
)

Code for a quicksort routine and a Hello World program is also given at the end of this document. Examples of interfaces are presented in the description of the standard Iota libraries.

Language grammar and overview

In the following section, the grammar and semantics of the language are described in an interleaved fashion. The grammar is presented in a version of extended Backus–Naur form. A production written in class as A ! B C will be written here as A ::= B C. Non-terminals are written in italics, terminals in normal or typewriter font. Terminals that represent tokens are written in bold type (e.g., string). Quotation marks are placed around terminals where there might be an ambiguity. Italic brackets ( [ ] ) indicate an optional item in the production (sometimes a question mark is used instead in EBNF to indicate optional syntax), and a superscript asterisk (*) indicates a list of zero or more items. Italic parentheses ( ( ) ) are used for grouping.

Module and declaration Structure

A module implementation file starts with a declaration of what external components it accesses (uses), then continues with a series of definitions.

module ::= [ uses ] defn*
uses ::= uses use ( , use )*
use ::= id . id | id = id . id

The non-terminal uses represents a declaration that the code of this module will use an external component. There are two kinds of use declarations. A use of the form M.N, where M and N are identifiers, means that the identifier N in module M may be used in this module under the name N. A use of the form V = M.N means that the identifier N in module M may be used in this module under the name V. All identifiers are simple identifiers; qualified identifiers of the form M.N may appear only in uses declarations. The latter form is helpful in avoiding name conflicts between different modules. Note that to understand what kind of component N is, the interface file for the other module M is checked.

A module contains one or more definitions of variables or functions. The definitions may not be separated by semicolons. All top-level definitions are visible throughout the module; there is no need for forward declarations. Two definitions may not use the same name; nor may they conflict with any identifier defined by a use declaration.

A definition may define two kinds of functions: functions that return a value, and functions that do not. Functions of the former sort are indicated by writing the type of their return value after the closing parenthesis of the formal arguments.

defn ::= id : type
| id ( [ formals ] ) [ : type ] = expr

Types and Formals

Iota only supports a limited set of types: the built-in types: the types int, string, and bool. Also, for any type T, the type array[T] represents an array of that type. So we can write the types array[bool] and array[array[string]]. No record (struct) or object types are supported; they will appear in Iota+.

As in Java, the built-in types int, string, andbool are all immutable. A variable of this type can be reassigned to a different value of the type, but the value itself cannot be modified. Arrays, as in Java, are mutable values. Array elements can be assigned new values, changing the contents of the array. The length of an array is not mutable, just as with Java.

type ::= int | bool | string | array [ type ]

A formal declares an argument to a function and its type. A function may be called only if the actual arguments to the function have a type that is compatible with (in Iota, the same as) the type of the formal arguments.

formals ::= formals , formal | formal
formal ::= id : type

Interface and declarations

The syntax of an interface is similar to that of an implementation; it is a series of declarations of variables or functions. The types of any identifiers defined in the interface must agree with the corresponding types in the module implementation of the same name.

interface ::= decl*
decl ::= id : type
| id ( [ formals ] ) [ : type ]

Expressions

A function is defined as an expression that is evaluated at the time that the function is called, initializing the variables of the formal arguments with the actual values passed. As in Java, the calling semantics are call-by-sharing: the formal argument variable and the actual argument share the same value, at least until the argument variable is assigned to. Assignments to the argument variable do not affect the value passed; however, if the value passed was an array, assignments to elements of that array will be visible from the calling context as well, since it shares the same array object.

Expressions may take one of several forms:

expr ::= constant
               |    primary
               |    expr binary_op expr
               |    unary_op expr
               |    constructor
constant ::= string
               |   integer
               |   true
               |   false
primary ::= variable
               | function_call
               | stmt_expr
variable ::= id
               |   primary [ expr ]

There are three kinds of constants in Iota: integer constants, string constants, and boolean constants. The forms of integers and strings are described below in Lexical considerations. The non-terminal primary represents an expression with high precedence -- an expression that can be used as the left-hand-side of an array index. These expressions include variable names, array index expressions, and function call expressions (which are described below).

Operators in Iota may be binary operators such as the operator +, which are written in the usual form, e.g. "1+2". There are also unary operators such as negation (e.g., -2) and logical negation (e.g., !done). Operators have the same precedence and associativity that they do in Java, and unless specified, the same meaning.

binary_op ::= + | - | * | / | % | & | | | == | < | >
unary_op ::= - | ! | length

The operator + may be be used to add two integers or to concatenate two strings. The operators -, *, /, and % operate only on integers. A divisor or modulus of zero is checked, and results in the program terminating. The operators & and | operate only on booleans; they are short-circuit operators as in Java. The operator == operates on any two values of the same type. For strings, it tests whether the two strings contain exactly the same characters, which is different from Java. For arrays, the operator == tests whether they are the same array object (pointer equality). The < and > operators may be used to compare integers in the ordinary way; it may also be used to compare strings to determine their lexicographic ordering. The unary operator length reports the length of arrays and strings, as an integer.

Brackets may be used to index both arrays and strings, extracting either the component of the array with that index, or the ASCII code of the character with that index, respectively. Both arrays and strings are zero-indexed, so the largest index that may be used is one less than the length of the array or string. Out-of-bounds errors are checked for and terminate the program with an error if encountered.

constructor ::= new type [ expr ] ( expr )

A new array object is created by an array constructor expression. The keyword new is followed by element type, the number of elements in the array, and the expression to initialize the elements with. For example, the expression (new int[5](1)) creates an array[int] containing 5 elements, which all are initialized to contain the integer 1.

An important feature of array constructors is that the expression that denotes the value to be stored into the array elements behaves as if it is evaluated separately for every distinct array index. This semantics would make no difference for the example above; however, it matters in the case of arrays of arrays. The following expression produces an array of arrays of integers—effectively a matrix—and initializes every element to 0. It is important that the expression new int[2](0) be evaluated twice so that there are actually four separate integer cells in the final data structure.

a: array[array[int]] = new array[int][2](new int[2](0));

A function call looks the same as in Java. The optional list of expressions is separated by commas. The types of the expressions must match the declared types of the corresponding formal arguments in the interface of the module that the identifier comes from.

function_call ::= id ( [ exprs ] )
exprs ::= exprs , expr
| expr

Statements

An expression may also be a sequence of statements contained in parentheses, as shown below. When a sequence of statements is placed in parentheses and used as an expression, the value of the expression is the value of the last statement in the sequence. Not all statements have a value; a sequence of statements in parentheses may be used as an expression only if the last statement has a value. Statements must be separated by a semicolon. A list of statements may optionally be followed by a semicolon, which preserves some syntactic compatibility with Java.

stmt_expr ::= ( [ stmt ( ; stmt )* ] [ ; ] )

The possible statement forms are shown below. A list of statements in parentheses can be used as a statement, as can a simple (non-parenthesized) expression. The value of this statement is the value of the expression.

A new variable may be introduced by a statement, as in the third production. As in Java, this variable remains in scope in subsequent statements until the closing brace of the tightest enclosing pair of braces that surround this statement. The new variable may optionally be initialized with an assignment to an expression. If not, the value of the variable is undefined until its first assignment. An assignment to an existing variable (which may be an array index) may be used as a statement in Iota, but not as an expression (unless it is placed in braces). The value of an assignment or variable initialization is the assigned value.

stmt ::= expr
         |   id: type [ = expr ]
         |   variable = expr
         |   if( expr ) stmt [ else stmt ]
         |   while( expr ) stmt
         |   return
         |   returnexpr

An if statement evaluates the initial expression and executes the following statement if the expression evaluates to true. The expression must have type bool. If the expression evaluates to false, the else clause is evaluated if present. The statement has a value only if it has an else clause, both possible statements have values, and their type is the same -- this type is then the type of the whole if statement. Thus, a Java statement such as

    x = (x < y) ? x : y;

can be expressed in Iota as

    x = (if (x < y) x else y)

A while statement operates as in Java. It has no value. The expression tested must have boolean type.

A return statement may either return a value or not. A return statement in a function with no return type may not return a value; a return statement in a function with a return type must return a value. A return statement has no value itself.

Identifiers

In Iota, two identifiers with the same name never overlap in scope. The globally visible names in a module (whether variables, functions, or imported identifiers) all must have distinct names. Similarly, local variables and formal argument variables within a particular function definition may not have the same names as any globally visible identifiers, or the same name as any other variable that is in scope at the point of their declaration. Local variables in different functions may have the same names, since their scopes do not overlap, and local variables within the same function also may have the same names if their scopes do not overlap, as in the following example:

( temp: int = x1; x1 = y1; y1 = temp )
( temp: int = x2; x2 = y2; y2 = temp )

Since the scope of each variable temp is limited to its respective block of statements, the variables may have the same name.

In the case of an identifier that is imported through the "usesid = id . id" syntax, only the left-hand side of the assignment is considered to be the globally visible name; the identifiers on the right-hand-side are not globally visible and therefore may be the same as other identifiers in the module. The reason for the id = id . id syntax is to allow external identifiers to be renamed in cases where they might otherwise create a conflict.

Lexical considerations

Both forms of Java comments are supported in Iota; a comment beginning with the characters // indicates that the remainder of the line of code is a comment. In addition, a comment may begin with the characters/*, in which case all of the characters, including newline characters, up to the next occurrence of the characters */, is part of a comment.

Integer literals (integer, above) are either 0 or a digit in the range 1..9, followed by a sequence of digits in the range 0..9. Negative integers are expressed as a unary negation operator followed by a positive integer constant. Integers in Iota have the same range as in Java. The largest integer literal is 2147483648 (2³¹). All integer literals from 0 to 2147483647 may appear anywhere an integer literal may appear, but the literal 2147483648 may appear only as the operand of the unary negation operator -.

String literals (string, above) are largely as defined for Tiger (Appel, p. 526), with a couple of exceptions. A new escape sequence \N represents the line separator character sequence for the system on which the compiler is run. This sequence can be obtained by evaluating System.getProperty("line.separator"); on Windows machines it stands for the two ASCII characters CR (13) and LF (10) in that order. As defined in Appel, the escape sequence \^c, where c is an arbitrary character, stands for a control character. To clarify this definition, c may be any character whose ASCII code is in the range 64..126; the escape sequence stands for the control character whose ASCII code is in the range 0..31 and is equal to the ASCII code of c, modulo 32.

Boolean literals must be either the keyword true or the keyword false.

Identifiers must begin with an alphabetic character. Following the initial alphabetic character may be any sequence of alphabetic characters, numeric characters, or the underscore character ( _ ). Uppercase and lowercase alphabetic characters are both considered alphabetic and are distinguished, so x and X are different identifiers. Keywords in the language (int, string, bool, array, uses, if, else, while, return, new, length, true and false) may not be used as identifiers.

The main program

Any module can serve as a main program, by defining a function main with the following signature:

main(args: array[string]): int

Iota requires that a function named main in any module must have exactly this signature. When the Iota program is run, this function will be evaluated, passing any command-line arguments to the argument variables args. The expression (length args) can be used to determine the number of arguments. The return status of the program is defined by the return value of main. When a set of modules are combined to form a program, there may be only one function named main among all the modules.

Standard Iota Libraries

Several standard modules are provided to make Iota more convenient. Some of these interfaces you will implement yourselves.

io

The io module (io.ii) provides several functions for performing input and output, which are defined as follows:

print(s: string)
    // output the string s to the standard output device
printi(i: int)
    // output the integer i as a formatted integer to standard
    // output. Equivalent to print(itos(i))
putc(c: int)
    // output the character whose ASCII value is c to standard output
readln( ): string
    // read a line from the standard input device
getc( ): int
    // read a single character from standard input
eof(): bool
    // return whether the end of file has been reached on standard input

conv

The conv module (conv.ii) provides conversions between the types of Iota.

itos(i: int): string
    // Produce a formatted version of i.
stoi(s: string, err: int): int
    // Parse s as a formatted integer.
    // Return err if the string could not be parsed.
itoc(i: int): string
    // Produce a string containing the single character whose ASCII code is i.
atos(a: array[int]): string
    // Produce a string containing the characters whose ASCII codes are in a,
    // in proper sequence
stoa(s: string): array[int]
    // produce an array containing the ASCII codes of all the characters in s,
    // in proper sequence

Quicksort

This code is an implementation of the classic quicksort routine; for randomly chosen arrays of length n it sorts in expected O(n lg n) time, but its worst-case time is O(n²).

quicksort.ii:

quicksort(a: array[int], low: int, high: int)
    /* Put the array elements a[low]..a[high] into ascending order.
       Requires that low and high be valid array indices
       and that low <= high. */

quicksort.im:

quicksort(a: array[int], low: int, high: int) = (
    if (!(low < high)) return; 
    mid: int = partition(a, low, high); 
    quicksort(a, low, mid); 
    quicksort(a, mid + 1, high) 
)

partition(a: array[int], low: int, high: int): int = ( 

    /* Reorder the elements in a into two contiguous groups. If
       ret is the return value, then the first group is the
       elements in low..ret, and the second group is the elements 
       in ret+1..high. Each element in the second group will be at 
       least as large as every element in the first group. */

    x: int = a[low]; 
    i: int = low - 1; 
    j: int = high + 1; 
    while (true) ( 
        while (a[(j = j - 1)] > x) (); 
        while (a[(i = i + 1)] < x) (); 
        if (i < j) ( 
            temp: int = a[i];
            a[i] = a[j]; a[j] = temp;
        ) else 
            return j
    );
    0 // dummy value
)

Hello World

Every language must support this classic program!

uses io.print
main (args: array[string]) : int = 
    ( print ("Hello World!\N"); 0 )

CS 412/413 Introduction to Compilers Cornell University Computer Science Department, Spring 2001