io.ta (n) 1: the 9th letter of the Greek alphabet (i) 2: an infinitesimal particle or amount
Iota is a simple language for which we will be writing compilers in CS412/CS413. The letter iota is the Greek equivalent of the letter "I"; you can think of this language as "J--". Later in the course some extensions will be added to Iota that will make it into a stripped-down version of the Java programming language; the extended version of Iota will be known as Iota+. This language definition will be updated periodically as the need for clarifications or corrections arises. If you see ambiguity in this specification, you have the flexibility to resolve it in the manner you wish as long as you can justify the reasonableness of your decision. You also may contact the course staff to resolve ambiguities.
A program in Iota is composed of some number of modules. A module collects together some functions and state in the form of variables. Components of a module may be used from other modules if they are exported in the module's interface. A module is defined in two files: an implementation file and an interface file. The implementation file provides all the actual code associated with the module, and the interface file declares which parts of the module are accessible from other modules. The name of a module is determined by the name of the file containing its definition. A module named M has an implementation file named M.mod and an interface file named M.int.
A module must define everything that its interface declares, but it may also define additional variables and functions that are not accessible from outside the module.
fibonacci(x: int): int = ( if (x < 2) 1 else fibonacci(x - 1) + fibonacci(x - 2) ) square(x: int):int = x*xAn interface for the same module might look as follows:
fibonacci(x: int): int /* compute the x'th Fibonacci number */ square(x: int):int /* compute the square of x */Here is some Iota code to find the greatest common divisor of two numbers:
gcd(x: int, y: int): int = ( while (!(x == 0)) ( if (!(x < y)) x = x % y; else ( // swap x and y temp:int = x; x = y; y = temp; ) ); y )Code for a quicksort routine and a Hello World program is also given at the end of this document. Examples of interfaces are presented in the description of the standard Iota libraries.
The non-terminal uses represents a declaration that the code of this module will use an external component. There are two kinds of use declarations. A use of the form M.N, where M and N are identifiers, means that the identifier N in module M may be used in this module under the name N. A use of the form V = M.N means that the identifier N in module M may be used in this module under the name V. All identifiers are simple identifiers; qualified identifiers of the form M.N may appear only in uses declarations. The latter form is helpful in avoiding name conflicts between different modules. Note that to understand what kind of component N is, the interface file for the other module M is checked.
module ::= [ uses ] defn*
uses ::= uses use ( , use )*
use ::= id . id | id = id . id
A module contains one or more definitions of variables or functions. The definitions may not be separated by semicolons. All top-level definitions are visible throughout the module; there is no need for forward declarations. Two definitions may not use the same name; nor may they conflict with any identifier defined by a use declaration.
A definition may define two kinds of functions: functions that return a value, and functions that do not. Functions of the former sort are indicated by writing the type of their return value after the closing parenthesis of the formal arguments.
defn ::= id : type
| id ( [ formals ] ) [ : type ] = expr
As in Java, the built-in types int, string, and bool are all immutable. A variable of this type can be reassigned to a different value of the type, but the value itself cannot be modified. Arrays, as in Java, are mutable values. Array elements can be assigned new values, changing the contents of the array. The length of an array is not mutable, just as with Java.
A formal declares an argument to a function and its type. A function may be called only if the actual arguments to the function have a type that is compatible with (in Iota, the same as) the type of the formal arguments.
type ::= id | array [ type ]
formals ::= formals , formal | formal
formal ::= id : type
interface ::= decl*
decl ::= id : type
| id ( [ formals ] ) [ : type ]
Expressions may take one of several forms:
There are three kinds of constants in Iota: integer constants, string constants, and boolean constants. The forms of integers and strings are described below in Lexical considerations. The non-terminal primary represents an expression with high precedence -- an expression that can be used as the left-hand-side of an array index. These expressions include variable names, array index expressions, and function call expressions (which are described below).
expr ::= constant
| expr binary_op expr
| unary_op expr
constant ::= string
primary ::= variable
variable ::= id
| primary [ expr ]
Operators in Iota may be binary operators such as the operator +, which are written in the usual form, e.g. "1+2". There are also unary operators such as negation (e.g., -2) and logical negation (e.g., !done). Operators have the same precedence and associativity that they do in Java, and unless specified, the same meaning.
The operator + may be be used to add two integers or to concatenate two strings. The operators -, *, /, and % operate only on integers. A divisor or modulus of zero is checked, and results in the program terminating. The operators & and | operate only on booleans; they are short-circuit operators as in Java. The operator == operates on any two values of the same type. For strings, it tests whether the two strings contain exactly the same characters, which is different from Java. For arrays, the operator
binary_op ::= + | - | * | / | % | & | | |
unary_op ::= - | ! | length
==tests whether they are the same array object (pointer equality). The
>operators may be used to compare integers in the ordinary way; it may also be used to compare strings to determine their lexicographic ordering. The unary operator length reports the length of arrays and strings, as an integer.
Brackets may be used to index both arrays and strings, extracting either the component of the array with that index, or the ASCII code of the character with that index, respectively. Both arrays and strings are zero-indexed, so the largest index that may be used is one less than the length of the array or string. Out-of-bounds errors are checked for and terminate the program with an error if encountered.
constructor ::= new type [ expr ] ( expr )
A new array object is created by an array constructor expression. The keyword new is followed by element type, the number of elements in the array, and the expression to initialize the elements with. For example, the expression (new int(1)) creates an array[int] containing 5 elements, which all are initialized to contain the integer 1.
An important feature of array constructors is that the expression that
denotes the value to be stored into the array elements behaves as if it is
evaluated separately for every distinct array index. This semantics would make
no difference for the example above; however, it matters in the case of arrays
of arrays. The following expression produces an array of arrays of integers --
effectively a matrix -- and initializes every element to 0. It is important that
new int(0) be evaluated twice so that there are
actually four separate integer cells in the final data structure.
a: array[array[int]] = new array[int](new
A function call looks the same as in Java. The optional list of expressions is separated by commas. The types of the expressions must match the declared types of the corresponding formal arguments in the interface of the module that the identifier comes from.
function_call ::= id ( [ exprs ] )
exprs ::= exprs , expr
The possible statement forms are shown below. A list of statements in parentheses can be used as a statement, as can a simple (non-parenthesized) expression. The value of this statement is the value of the expression.
stmt_expr ::= ( [ stmt ( ; stmt )* ] [ ; ] )
A new variable may be introduced by a statement, as in the third production. As in Java, this variable remains in scope in subsequent statements until the closing brace of the tightest enclosing pair of braces that surround this statement. The new variable may optionally be initialized with an assignment to an expression. If not, the value of the variable is undefined until its first assignment. An assignment to an existing variable (which may be an array index) may be used as a statement in Iota, but not as an expression (unless it is placed in braces). The value of an assignment or variable initialization is the assigned value.
An if statement evaluates the initial expression and executes the following statement if the expression evaluates to true. To preserve some syntactic compatibility with Java, an extra semicolon may optionally be written before the keyword else. The expression must have type bool. If the expression evaluates to false, the else clause is evaluated if present. The statement has a value only if it has an else clause, both possible statements have values, and their type is the same -- this type is then the type of the whole if statement's value. Thus, a Java statement such as
stmt ::= expr
| id: type [ = expr ]
| variable = expr
| if ( expr ) stmt [ [ ; ] else stmt ]
| while ( expr ) stmt
| return expr
x = (x < y) ? x : y;can be expressed in Iota as
x = (if (x < y) x else y)A while statement operates as in Java. It has no value. The expression tested must have boolean type.
A return statement may either return a value or not. A return
statement in a function with no return type may not return a value; a return
statement in a function with a return type must return a value. A return
statement has no value itself.
Since the scope of each variable temp is limited to its respective block of statements, the variables may have the same name.( temp: int = x1; x1 = y1; y1 = temp ) ( temp: int = x2; x2 = y2; y2 = temp )
In the case of an identifier that is imported through the "uses id = id . id" syntax, only the left-hand side of the assignment is considered to be the globally visible name; the identifiers on the right-hand-side are not globally visible and therefore may be the same as other identifiers in the module. The reason for the id = id . id syntax is to allow external identifiers to be renamed in cases where they might otherwise create a conflict.
Integer literals (integer, above) are either 0 or a digit in the range 1-9, followed by a sequence of digits in the range 0-9. Negative integers are expressed as a unary negation operator followed by a positive integer constant. Integers in Iota have the same range as in Java. The largest integer literal is 2147483648 (231). All integer literals from 0 to 2147483647 may appear anywhere an integer literal may appear, but the literal 2147483648 may appear only as the operand of the unary negation operator -.
String literals (string, above) are defined as in Tiger (Appel,
p. 526), except that the additional escape sequence
the two-character sequence (carriage return, line feed) (ASCII 13, ASCII 10).
Boolean literals must be either the keyword true or the keyword false.
Identifiers must begin with an alphabetic character. Following the initial
alphabetic character may be any sequence of alphabetic characters, numeric
characters, or the underscore character ( _ ). Uppercase and lowercase
alphabetic characters are both considered alphabetic and are distinguished,
X are different identifiers. Keywords in the language (int,
array, string, if,
and false) may not be used as identifiers.
main(args: array[string]): intIota requires that a function named main in any module must have exactly this signature. When the Iota program is run, this function will be evaluated, passing any command-line arguments to the argument variables args. The expression (length args) can be used to determine the number of arguments. The return status of the program is defined by the return value of main. When a set of modules are combined to form a program, there may be only one function named main among all the modules.
print(s: string) // output the string s to the standard output device printi(i: int) // output the integer i as a formatted integer to standard // output. Equivalent to print(itos(i)) putc(c: int) // output the character whose ASCII value is c to standard output readln( ): string // read a line from the standard input device getc( ): int // read a single character from standard input eof(): bool // return whether the end of file has been reached on standard input
itos(i: int): string // Produce a formatted version of i. stoi(s: string, err: int): int // Parse s as a formatted integer. // Return err if the string could not be parsed. itoc(i: int): string // Produce a string containing the single character whose ASCII code is i. atos(a: array[int]): string // Produce a string containing the characters whose ASCII codes are in a, // in proper sequence stoa(s: string): array[int] // produce an array containing the ASCII codes of all the characters in s, // in proper sequence
quicksort(a: array[int], low: int, high: int) = ( // A good sorting routine for large arrays. Not a stable sort. if (!(low < high)) return; mid: int = partition(a, low, high); quicksort(a, low, mid); quicksort(a, mid + 1, high) ) partition(a: array[int], low: int, high: int): int = ( /* Reorder the elements in a into two contiguous groups. If ret is the return value, then the first group is the elements in low..ret, and the second group is the elements in ret+1..high. Each element in the second group will be at least as large as every element in the first group. */ x: int = a[low]; i: int = low - 1; j: int = high + 1; while (true) ( while (a[(j = j - 1)] > x) (); while (a[(i = i + 1)] < x) (); if (i < j) ( temp: int = a[i]; a[i] = a[j]; a[j] = temp; ) else return j ) )
uses io.print main (args: array[string]) : int = ( print ("Hello World!\N"); 0 )