CS 412/413 Spring 2001

Static Semantics for Iota


NOTE: This page uses some dynamic fonts for some of the math symbols. If the turnstile symbol (`) looks more like a back tick (`) than this (|-), try downloading and installing the TrueType TeX math symbols font. Make sure dynamic fonts are turned on on your browser. Netscape users on Linux may experience problems, apparently because dynamic fonts are not implemented correctly.

This handout presents static semantics for the Iota programming language. The static semantics mirror the syntactic specifications given in that definition; the rules given here can be applied straightforwardly to the result of parsing using those specifications. The static semantics capture the essence of type-checking Iota.

Type contexts

The goal of the type checker is to determine whether a module M is well-typed. We will use "well-typed" and "well-formed" as synonyms, because any well-typed module in Iota is also well-formed: it is a legal module. To check whether a module is well-typed, we must check that all of the definitions in the module are well-typed, which requires showing that all expressions are well-typed. If we see the expression 2 + 2 in a program, we expect that it is well-typed and that it produces an integer (type int). We can expect that the expression 2 + true is not well-typed, at least in most programming languages. What about the expression 2 + x, where x is a variable name? We don't know whether the expression is well-typed unless we know what the type of x is. Because Iota is a statically-typed language, x must have a declared type that we can use to decide whether expressions using it are well-typed.

Therefore, the type-checking of various syntactic constructs (modules, definitions, statements, and expressions) is performed with respect to a type context that keeps track of the legal identifiers that can be used in the construct.  A type context is an environment mapping identifiers to their types: a set of identifier-type bindings of the form x: T, where no identifier x occurs twice. Usually, a compiler implements a type context using a symbol table, which binds an identifier to other information as well (for example, its location in memory). However, keep in mind that a type context is not a mutable data structure; it is a mathematical abstraction that can be thought of as a formal description of the contents of a symbol table at a particular moment in time. We will write a context formally as a sequence of such bindings; for example, x1:T1, ..., xn:Tn where each of the xi are variable names and each of the Ti are types. An empty context (containing no bindings) is written as Ø. We will often denote a context as an extension to an existing context, with new bindings appended to it.  For example, if the context A contains some set of bindings x1:T1, ..., xn:Tn, then the context A, y:Ty contains the bindings x1:T1, ..., xn:Tn, y:Ty assuming that y is not equal to any of the xi. If y=xi for some i21..n, then A, y:Ty has no meaning and is not a legal type context. Thus, the new bindings may not shadow any existing bindings.  Thus, Iota differs from the Tiger language (Appel), where new bindings take precedence.

Type-checking modules

Now, suppose that we want to show that a module is well-typed. The statement, "module M is provably well-typed with respect to its interface I," is an example of a judgment, and we will write this judgment formally as ` M : module, where M is the text of the module and I is the text of the interface. Note that if the module has no interface file, its interface I is the empty interface. The word module is used in the judgment to distinguish judgments about modules from ordinary type judgments and from other judgments that the static checker makes. To check a module, we first construct an type context in which all of the module uses definitions and the components of the module themselves appear. The construction of this initial module context is performed by the function module-ctxt, which is defined below. Similarly, we process the interface file for the module to create a corresponding context containing bindings for all the declarations in the module's interface. Suppose that U represents a single "use" declaration (of the form id.id or id = id.id) and D represents a single definition within a module. For a module to be type-safe, each of its definitions Di must be type-safe in the context of the module. The module must also correctly implement its interface. These checks are captured in the following rule:

M = uses U1...Un D1...Dm
Am
= module-ctxt({U1, ..., Un}, {D1, ..., Dm})
Am
` Di  defn   (i21..m)
interface-ctxt( I ) ² module-ctxt({}, {D1, ..., Dm})


(module)

` M : module

In this rule, the word defn is placed after the variable D to indicate that this is a judgment that a definition is well-formed: the judgment A ` D defn means that the definition D is well-formed in the context A.

The function module-ctxt performs the gathering of information about variables and functions that are visible at the top level, and can be defined recursively as shown below. Here, the function interface-type(N, x) finds the type of the identifier x in the interface named N. It obeys the invariant that T = interface-type(N, x) means x:2 interface-ctxt(parse(N.ii)), where parse(F) represents the parsing of the Iota source file F as a module or interface. The context interface-ctxt(I) is computed similarly to module-ctxt.

Processing definitions:
module-ctxt
({U1, ..., Un}, {D1, ..., Dm-1, id0(id1 : T1, id2 : T2, ..., idn : Tn) = E}) =
    module-ctxt
({U1, ..., Un}, {D1, ..., Dm-1}),  id0: T1 × T2 × ... × Tn!unit
module-ctxt
({U1, ..., Un}, {D1, ..., Dm-1, id0(id1 : T1, id2 : T2, ..., idn : Tn): T = E}) =
    module-ctxt
({U1, ..., Un}, {D1, ..., Dm-1}),  id0: T1 × T2 × ... × Tn!T
module-ctxt
({U1, ..., Un}, {D1, ..., Dm-1, idm : Tm}) =
    module-ctxt
({U1, ..., Un}, {D1, ..., Dm-1}),  idm: Tm
Processing uses declarations:
module-ctxt
({U1, ..., Un-1, id1.id2},{}) =
    module-ctxt({U1, ..., Un-1},{}), id2: interface-type(id1, id2)
module-ctxt({U1, ..., Un-1, id1 = id2.id3}, {})  =
    module-ctxt({U1, ..., Un-1},{}), id1: interface-type(id2, id3)
module-ctxt({},{}) = Ø

Type-checking definitions

There are two kinds of definitions in a module: variable definitions and function definitions. Variable definitions in the module are automatically considered to be safe, so we have this as an axiom: A ` id : T  defn. Function definitions are less trivial. The rule for checking functions that return or do not return a value is as follows:

A, a1 : T1, ..., an : Tn, return : Tr ` E : Tr


(fcn defn)

A ` id(a1 : T1, ..., an : Tn) : Tr = E  defn 


A, a1 : T1, ..., an : Tn, return : unit ` E : Tr


(proc defn)

A ` id(a1 : T1, ..., an : Tn) = E  defn 

These rules say that to check the type safety of a function definition, we check that the body of the function, E, has the expected type Tr that is the declared return type of the function. The second rule describes the case of a function that does not return a value, where the expression E may have any type Tr, since its value, if any, is discarded. The type of the expression E is determined in an context that includes all of the argument variables ai of the function, bound to their corresponding types Ti, which is necessary because E may mention these variables. 

To support return statements, the type context may include a binding for the pseudo-identifier return; this binding keeps track of the return type of the function and is used to make sure that return statements within the function return the right type of value.  The Iota type system has one special type, unit, which does not appear in the syntax of the language.  The type unit is the type of expressions or statements that produce no value. You can think of unit as a type that has only one value -- sort of like a defective boolean that can only represent false. This one value represents the normal, terminating evaluation of an expression or statement. Although the type unit cannot be used in declarations, it is useful in type-checking Iota as a fiction for describing those expressions that lack a useful result value for their surrounding context to use.

There is one additional condition on function definitions not captured by these rules: a function named main must have the type array[string]!int, as described in the Iota language definition.

Type-checking expressions

As in lecture, the judgment for typing both expressions and statements has the same form: A ` E : T and A ` S : T, respectively. Here, E and S represent an arbitrary expression or statement, and T represents the type of the expression or statement's value. These judgments are known as typing judgments or typing assertions.

Judgments for literal expressions are trivial:

A ` string_constant : string
A ` integer_constant : int
A ` true : bool
A ` false : bool

Inference rules for the various binary and unary operations also are straightforward. The rules for the operators -, /, *, and % are like the first rule for +.

A ` E1 : int
A ` E2 : int
(The rules for -, *, /, % are similar)    A ` E1 : string
A ` E2 : string


A ` E1 + E2 : int

A ` E1 + E2 : string

 

A ` E1 : bool (The rule for the logical or operator (|) is similar)
A ` E2 : bool

A ` E1 & E2 : bool

 

T is not unit
A ` E1 : T
A
` E2 : T
A ` E1 : T
A
` E2 : T


(T must be int or string.
The rule for > is similar)

A ` E1 == E2 : bool A ` E1 < E2 : bool

 

A ` E : bool           A ` E : int


A ` ! E : bool A ` - E : int
 
A ` E : string           A ` E : array[T]


A ` length E : int A ` length E : int

Identifiers representing variables are type-checked simply by looking them up in the type context. Array index expressions representing variables also are straightforward.

id:T  2 A A ` E1 : array[T]
A
` E2 : int
A ` E1 : string
A ` E2 : int



A ` id : T A ` E1 [E2 ] : T A ` E1 [E2 ] : int

The special array constructor expression in Iota requires that the initialization expression E2 has the appropriate type:

A ` E1 : int
A ` E2 : T


(new array)
A ` new T [ E1 ] ( E2 ) : array[T]

A function call is checked by ensuring that the types of each of the actual expressions supplied as arguments are the same as the declared types of the formal arguments. Note that this rule looks in the environment to find the signature of the function named id.

id : T1 × T2 × ... × Tn!T  2 A
A
` Ei : Ti    (i 2 1..n)

(fcn call)

A ` idE1,..., En )  : T

Type-checking statements

In the following rules, the letter S will be used to denote a single statement. First, let us consider rules associated with the lists of statements. The type of an empty list of statements is unit:


(empty block)

A ` ( ) : unit

The type of a statement list is the same as the type of the final statement. Variable declarations in the sequence apply to all subsequent statements in the block.

A ` S : T


(singleton)

A ` ( S ) : T

 

A ` S1 : T1
S1 is not a declaration
A
` ( S2 ; ... ; Sn ) : Tn

A ` S1 : T1
S1 = id : T [ = E ]
A
, id : T   `  ( S2; ... ; Sn ) : Tn

(block   
(decl block)

A ` ( S1 ; ... ; Sn ) : Tn

A ` ( S1 ; ... ; Sn ) : Tn

Now we will consider type-checking some more interesting statements.  A statement that is a variable declaration has type unit if the variable is not initialized by the declaration, or the type (and value) of the variable if it is initialized:

A ` E : T

(decl)    
(decl init)
A `id : T : unit A ` id : T = E    : T

There are two kinds of assignment statements: assignments to ordinary variables, and assignments to array elements:

T not a function type
id
: T  2 A
A
` E : T
A ` E1 : array[T]
A
` E2 : int
A ` E3 : T

(assign)    
(array assign)
A ` id = E : T A ` E1 [ E2 ] = E3 : T

Now let us consider Iota's statements for control flow. The if statement requires that its two arms (if present) have the same type in order for the statement to have a value:

A ` E : bool
A ` S : T
A ` E : bool
A ` S1 : T
A
` S2 : T

(if)    
(if/else1)
A ` if ( E ) S : unit A ` if ( E ) S1 else S2T

If the two arms have different types, the if statement has no value and is given type unit:

A ` E : bool
A ` S1 : T1
A ` S2 : T2
T1 not equal to T2

(if/else2)
A ` if ( ES1 else S2unit

[Aside: An alternative strategy for formalizing the type-checking of if statements is to consider all well-typed statements to have type unit in addition to what we would consider to be their principal types. With this formalization, we do not need the previous rule for clauses that have different types, because  the following general inference rule ensures that both statements S1 and S2 have at least the type unit in common:

A ` S : T

A ` S : unit

Using this rule, an if statement always can be given type unit regardless of the types of the two arms. Note that with this rule, it is possible for a statement to be given two types: its principal type, and the type unit. The principal type for an expression or statement is its most specific type; in a language in which an expression can have more than one type, it is important that it have a principal type. In this case, the ambiguity about its type, because a statement or expression is well-typed as long as there is any way to prove it well-typed. Practically it is not a problem, either, because the type checker can always determine the principal type of a statement and then relax its type to some other type (i.e., unit) if that is what the containing context demands.]

A while statement always has the type unit, and requires that its test expression has type bool.

A ` E : bool
A ` S : T

(while)
A ` while ( E ) : unit

A return statement is checked by ensuring that the return type matches the declared return type of the containing function; this declared return type is stored in the symbol table under the identifier return. Note that this means that statements such as int x = (return "hello") will not type-check even though they cannot generate dynamic type errors; thus, the rules are overly conservative. (We will see later in Iota+ how to relax these rules safely, which will allow certain common, useful programming idioms.)

return: unit 2 A A ` E : T
T
is not unit
return: T  2 A

(return1)
(return2)
A ` return : unit A ` return E : unit