Iota Static Semantics

In this handout, we present static semantics for the Iota programming language defined earlier. The static semantics mirror the syntactic specifications given in that definition; the rules given here can be applied straightforwardly to the result of parsing using those specifications. The static semantics capture the essence of type-checking Iota. Certain details are ignored, such as the interpretation of type expressions, which is straightforward in Iota in any case.

The goal of the type checker is to determine whether a module M is well-typed. This judgment will be written as |– M, where M is the text of the module. To check a module, we first construct an environment in which all of the module "uses" definitions and the components of the module themselves appear. This environment construction is performed by the function module-env, which is defined below. Suppose that U represents a single "use" declaration (of the form id.id or id = id.id) and D represents a single definition within a module. A module is type-safe if each of its definitions D_i is type-safe in the module environment:

M = uses U₁...U_n D₁...D_m

module-env({U₁...U_n, D₁...D_m}) |– [ D_i ] (for all D_i)

|– M

In this rule, brackets are placed around the definition D to distinguish the judgment that a declaration is type-safe from other judgments that the static checker makes.

The function module-env performs the gathering of information about variables and functions that are visible at the top level, and can be defined recursively. Here, the function interface-type(i,n) finds the type of the identifier n in the interface named i.

module-env({id₁.id₂, U₂,...,U_n, D₁,...,D_m}) =
        { id₂: interface-type(id₁, id₂) } + module-env({U₂...U_n, D₁...D_m})
module-env({id₁ = id₂.id₃, U₂,...,U_n, D₁,...,D_m}) =
        { id₁: interface-type(id₂, id₃) } + module-env({U₂...U_n, D₁...D_m})
module-env({id₁ : T₁, D₂,...,D_m}) =
        { id₁: T₁} + module-env({D₂,...,D_m})
module-env({id₀(id₁: T₁, id₂: T₂,...,id_n: T_n) = E, D₂,...,D_m}) =
        { id₀ : T₁ ´ T₂ ´ ... ´ T_nŽ void } + module-env({D₂,...,D_m})
module-env({id₀(id₁: T₁, id₂: T₂,...,id_n: T_n) : T = E, D₂,...,D_m}) =
        { id₀ : T₁ ´ T₂ ´ ... ´ T_nŽ T} + module-env({D₂,...,D_m})
module-env({}) = {}

There are two kinds of definitions in a module: variable definitions and function definitions.Variable definitions in the module are automatically considered to be safe: A |– [ id : T ]. Function definitions are less trivial. The rule for checking functions that return or do not return a value is as follows:

A + {..., a_i : T_i, ...} + {return : T_r} |– E : T_r

A |– [ id ( ..., a_i : T_i, ...) : T_r = E ]

A + {..., a_i : T_i, ...} + {return : void} |– E : T_r

A |– [ id ( ..., a_i : T_i, ...) = E ]

These rules say that to check the type safety of a function definition, we check that the expression E that is the body of the function has the expected type T_r that is the declared return type of the function. In the case of a function that does not return a value, the expression E may have any type T_r. The type of the expression E is determined in an environment that includes all of the argument variables a_i of the function, bound to their corresponding types T_i, which is necessary because E may mention these variables.

Note that in the preceding rules, environments are extended by using an operator +. The addition of two environments is the union of those environments as long as the two environments do not contain any definitions with the same name. If two environments contain definitions with the same name, the addition of the environments is undefined. This approach differs from that in Appel, where the second environment takes precedence.

To support return statements, the environment contains a binding for the pseudo-identifier return; this binding keeps track of what the return type of the function is and is used to check return statements that occur within the function.

Type-Checking Expressions

As in lecture, the judgment for type-checking both expressions and statements will have the same form: A |– E : T and A |– S : T, respectively. Here, E and S represent an arbitrary expression or statement, and T represents the type of the expression or statement's value. In the case where a statement returns no value, the corresponding type judgment will be A |– S : void.

A |– string_constant : string
A |– integer_constant : int
A |– true : bool
A |– false : bool

Inference rules for the various binary and unary operations also are straightforward. The rules for the operators -, /, *, and % are like the first rule for +.

A \|– E₁ : int	(There are similar rules for `-`,`*`, `/`, `%)`
A \|– E₂ : int

A \|– E₁ + E₂ : int

A |– E₁ : string

A |– E₂ : string

A |– E₁ + E₂ : string

A \|– E₁ : bool	The rule for the \| operator is similar
A \|– E₂ : bool

A \|– E₁ `&` E₂ : bool

A |– E₁ : T

A |– E₂ : T

A |– E₁ == E₂ : bool

A \|– E₁ : int	There are similar rules for the operator `>` and for types string and bool as the types of E₁ and E₂
A \|– E₂ : int

A \|– E₁ `<` E₂ : bool

A \|– E : bool		A \|– E : int

A \|– `!` E : bool		A \|– `-` E : int

A \|– E : string		A \|– E : array(T)

A \|– `length` E : int		A \|– `length` E : int

Identifiers representing variables are type-checked simply by looking them up in the environment. Array indexes representing variables also are straightforward.

	A \|– E₁ : array(T)	A \|– E₁ : string
{id : T} Î A	A \|– E₂: int	A \|– E₂: int

A \|– id : T	A \|– E₁ [ E₂ ] : T	A \|– E₁ [ E₂ ] : int

A function call is checked by ensuring that the types of each of the actual expressions supplied as arguments are the same as the declared types of the formal arguments. Note that this rule looks in the environment to find the signature of the function named id.

{id : T₁ ´ T₂ ´ ... ´ T_nŽ T} Î A

"(i Î [1, n]) A |– E_i : T_i

A |– id ( E₁,...,E_n ) : T

Type-Checking Statements

In the following rules, the letter S will be used to denote a single statement. First, let us consider rules associated with the grouping of statements. The type of an empty group of statements or an empty statement is void:

The type of a block containing a single statement is the same as the type of that statement; the type of a block containing a sequence of statements is the same as the type of the final statement. Variable declarations in the sequence apply to all subsequent statements in the block.

A |– S : T

A |– { S } : T

A \|– S₁ : T₁		A \|– S₁ : T₁
S₁ is not a declaration		S₁= id : T [ = E ]
A \|– { S₂ ... S_n }: T		A + { id : T } \|– { S₂ ... S_n }

A \|– { S₁ ... S_n } : T		A \|– { S₁ ... S_n } : T

Now we will consider type-checking some more interesting statements. A statement may be just an expression, in which case it has the same type as the expression:

A |– E : T

A |– E ; : T

A statement that is a variable declaration has type void if the variable is not initialized by the declaration, or the type (and value) of the variable if it is initialized:

There are two kinds of assignment statements: assignments to ordinary variables, and assignments to array elements:

		A \|– E₁: array(T)
		A \|– E₂: int
A \|– E : T		A \|– E₃ : T

A \|– id = E `;` : T		A \|– E₁ [ E₂ ] = E₃ `;` : T

Now let us consider Iota's statement for structured control flow. The if statement requires that its two arms (if present) have the same type.

		A \|– E : bool
A \|– E : bool		A \|– S₁ : T
A \|– S : T		A \|– S₂ : T

A \|– `if` ( E ) S : void		A \|– `if` ( E ) S₁ else S₂ : T

If the two arms have different types, the whole if statement has type void:

A |– E : bool

A |– S₁ : T₁

A |– S₂ : T₂

T₁š T₂

A |– if ( E ) S₁ else S₂ : void

An alternative strategy for formalizing the type-checking of if statements is to consider all well-typed statements to have type void in addition to what we would consider to be their principal types. With this formalization, we do not need the previous rule for arms have different types, because the following general inference rule ensures that both statements S₁ and S₂ have at least the type void in common:

Using this rule, an if statement always can be given type void regardless of the types of the two arms. Note that with this rule, it is possible for a statement to be given two types: its principal type, and the type void. This ambiguity is not a problem formally, because a statement or expression is well-typed as long as there is any way to prove it well-typed. Practically it is not a problem, either, because we can always try to derive the principal type of a statement and then relax its type to void if that is what the containing context demands.

A while statement always has the type void, and requires that its test expression has type bool:

A |– E : bool

A |– S : T

A |– while ( E ) S : void

A return statement is checked by ensuring that the return type matches the declared return type of the containing function; this declared return type is stored in the symbol table under the identifier return. A return statement has the type void, because it returns no value to the context in which it occurs. For example, the statement int x = {return 2;} is not well-typed.

		A \|– E : T
{ return : void }Î A		{ return : T }Î A

A \|– `return;` : void		`return` E `;`: void

A \|– E₁ : int	(There are similar rules for `-`,`*`, `/`, `%)`
A \|– E₂ : int

A \|– E₁ + E₂ : int

A \|– E₁ : bool	The rule for the \| operator is similar
A \|– E₂ : bool

A \|– E₁ `&` E₂ : bool

A \|– E₁ : int	There are similar rules for the operator `>` and for types string and bool as the types of E₁ and E₂
A \|– E₂ : int

A \|– E₁ `<` E₂ : bool

A \|– E : bool		A \|– E : int

A \|– `!` E : bool		A \|– `-` E : int

A \|– E : string		A \|– E : array(T)

A \|– `length` E : int		A \|– `length` E : int

Static Semantics for Iota

Type-Checking Expressions

Type-Checking Statements