Is the expression below well-typed in Mini-SML? Yes, and its type is determined to be (int -> int) list list.
    let
      fun f(x: int): int = 1
      fun g(x: undefined): undefined = 2
    in
      [[f],[],[g]]
    end
What about SML? No, even if we replace undefined with type variables.
    - let fun f(x: int): int = 1 fun g(x: 'a): 'b = 2 in [[f],[],[g]] end;
    stdIn:40.28-40.48 Error: right-hand-side of clause doesn't agree with function result type [literal]
      expression:  int
      result type:  'b
      in declaration:
        g = (fn x : 'a => 2: 'b)
This example shows that type-correct Mini-SML programs are not necessarily type-correct in SML. The differences are due partly to our desire for a simple implementation of Mini-SML but also, and mostly, to our desire to illustrate the liberty that the designer and implementer of a programming language have in defining its features.
How does Mini-SML reach its conclusion? Let's take a look at the types of the various subexpressions:
    f   : int -> int
    [f] : (int -> int) list
    g   : undefined -> undefined
    [g] : (undefined -> undefined) list
    []  : undefined list
At the top level, the expression [[f],[],[g]] is a list. Mini-SML knows that all elements of a list must have the same type, so it tries to unify the types of the elements. This unification is 'optimistic': unless the system finds an irreconcilable type clash, it will match (or unify) the candidate types. This is a two-step, top-to-bottom process, in which the type of [f] is unified with the type of [], and the resulting type is then unified with the type of [g].
Here is the code that achieves type unification in Mini-SML:
    fun unifyTypes (t: typ, t': typ): typ =
      case (t, t') of
        (Undef_t,      _           ) => t'
      | (_,            Undef_t     ) => t
      | (Int_t,        Int_t       ) => Int_t
      | (Real_t,       Real_t      ) => Real_t
      | (Bool_t,       Bool_t      ) => Bool_t
      | (Char_t,       Char_t      ) => Char_t
      | (String_t,     String_t    ) => String_t
      | (Tuple_t(tl),  Tuple_t(tl')) =>
          if List.length(tl) <> List.length(tl')
          then raise TypeUnification
          else Tuple_t(ListPair.map (fn (ta,tb) => unifyTypes(ta,tb)) (tl, tl'))
      | (List_t(tl),   List_t(tl') ) => List_t(unifyTypes(tl, tl'))
      | (Fn_t(fa, rt), Fn_t(fa', rt')) => Fn_t(unifyTypes(fa, fa'), unifyTypes(rt, rt'))
      | _ => raise TypeUnification
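To see the two unification steps for [[f],[],[g]] concretely, here is a minimal, self-contained sketch. The typ datatype and the TypeUnification exception are not shown in these notes, so the definitions below are assumptions reconstructed from the constructor names used in unifyTypes; the actual Mini-SML definitions may differ.

```sml
(* Assumed definitions: reconstructed from the constructors used above. *)
datatype typ =
    Undef_t
  | Int_t | Real_t | Bool_t | Char_t | String_t
  | Tuple_t of typ list
  | List_t of typ
  | Fn_t of typ * typ

exception TypeUnification

fun unifyTypes (t: typ, t': typ): typ =
  case (t, t') of
    (Undef_t, _) => t'
  | (_, Undef_t) => t
  | (Int_t, Int_t) => Int_t
  | (Real_t, Real_t) => Real_t
  | (Bool_t, Bool_t) => Bool_t
  | (Char_t, Char_t) => Char_t
  | (String_t, String_t) => String_t
  | (Tuple_t tl, Tuple_t tl') =>
      if List.length tl <> List.length tl'
      then raise TypeUnification
      else Tuple_t (ListPair.map unifyTypes (tl, tl'))
  | (List_t e, List_t e') => List_t (unifyTypes (e, e'))
  | (Fn_t (a, r), Fn_t (a', r')) => Fn_t (unifyTypes (a, a'), unifyTypes (r, r'))
  | _ => raise TypeUnification

val tF   = List_t (Fn_t (Int_t, Int_t))       (* type of [f] *)
val tNil = List_t Undef_t                     (* type of []  *)
val tG   = List_t (Fn_t (Undef_t, Undef_t))   (* type of [g] *)

(* Step 1: unify [f] with []; step 2: unify the result with [g]. *)
val step1 = unifyTypes (tF, tNil)     (* List_t (Fn_t (Int_t, Int_t)) *)
val step2 = unifyTypes (step1, tG)    (* List_t (Fn_t (Int_t, Int_t)) *)
val whole = List_t step2              (* i.e. (int -> int) list list *)

(* An irreconcilable clash is reported by raising TypeUnification. *)
val clash = (unifyTypes (Int_t, Bool_t); false) handle TypeUnification => true
```

In both steps the undefined components are simply replaced by whatever the other side provides, which is why g's undefined -> undefined is accepted as int -> int.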
We can now understand how dynamic type checking is done in Mini-SML. Keep in mind that the result of each computation is a (value, value type) pair.
A simple case:
    and evaluateBinop (bop: binop, (v1,t1): value*typ, (v2,t2): value*typ): value*typ =
      ...
      (* binary operator *)
      case (bop, v1, v2) of
        (Plus, Int_v a,  Int_v b ) => (Int_v (a+b),  Int_t )
      | (Plus, Real_v a, Real_v b) => (Real_v (a+b), Real_t)
      | (Plus, _,        _       ) => err "type error (+)"
      ...
... and a more complicated one:
    fun evaluate (ex: exp, en: env): value * typ =
      ...
      case ex of
        ...
      | List_e elist =>
          let
            val (v, t) = foldr (fn ((v, t), (vl, tl)) => (v::vl, t::tl))
                               ([],[])
                               (map (fn(e) => evaluate (e, en)) elist)
          in
            (List_v v, List_t (foldl (fn (ta, tb) => unifyTypes (ta, tb)) Undef_t t))
            handle TypeUnification => err "typewise inhomogenous list"
          end
      ...
Static type checking can be performed by "executing" the program without performing any computations, retrieving and checking only type information. In contrast to dynamic type checking, "execution" here requires that all alternatives (e.g. both branches of an if) be examined. As discussed above, in general we can't predict the execution path, so we have to examine types along all possible execution paths. The execution context is kept in a type environment, i.e. an environment that retains only names and their types; there is no need to store values as well.
Static type checking can be easily formalized in a manner analogous to that employed when we defined the substitution model:
    (* Constants: *)
    tcheck(env, c)  = bool            (when c is a boolean)
    tcheck(env, c)  = int             (when c is an integer)
    tcheck(env, c)  = real            (when c is a real)
    tcheck(env, c)  = char            (when c is a char)
    tcheck(env, c)  = string          (when c is a string)
    tcheck(env, []) = undefined list

    (* Variables: *)
    tcheck(env, id) = lookupBinding(env, id)

    (* Anonymous Functions: *)
    tcheck(env, fn (id: t1): t2 => e) = t1 -> t4
      when tcheck(insertBinding(env, id, t1), e) = t3
      and unifyTypes(t2, t3) = t4

    (* Function Applications: *)
    tcheck(env, e1(e2)) = t2
      when tcheck(env, e1) = t1 -> t2
      and tcheck(env, e2) = t3
      and unifyTypes(t1, t3) = _

    (* Unary Operations: *)
    tcheck(env, u e) = unary_op_result_type(u, t1)
      when tcheck(env, e) = t1

    (* Binary Operations: *)
    tcheck(env, e1 b e2) = binary_op_result_type(b, t1, t2)
      when tcheck(env, e1) = t1
      and tcheck(env, e2) = t2

    (* Lists: *)
    tcheck(env, [e1, e2, ..., en]) = t list
      when tcheck(env, ei) = ti (for 1 <= i <= n)
      and unifyTypes(t1, t2, ..., tn) = t

    (* Tuples: *)
    tcheck(env, (e1, e2, ..., en)) = (t1 * t2 * ... * tn)
      when tcheck(env, ei) = ti (for 1 <= i <= n)

    (* Tuple Projections: *)
    tcheck(env, #i e) = ti
      when tcheck(env, e) = (t1 * t2 * ... * tn)
      and (1 <= i <= n)

    (* If: *)
    tcheck(env, if c then e1 else e2) = t3
      when tcheck(env, c) = bool
      and tcheck(env, e1) = t1
      and tcheck(env, e2) = t2
      and unifyTypes(t1, t2) = t3

    (* Let: *)
    tcheck(env, let d in e end) = t
      when declcheck(env, d) = env'
      and tcheck(env', e) = t

    (* Val Declarations: *)
    declcheck(env, val v: t1 = e) = env'
      when tcheck(env, e) = t2
      and unifyTypes(t1, t2) = t3
      and insertBinding(env, v, t3) = env'

    (* Fun Declarations: *)
    declcheck(env, fun id1(id2: t1): t2 = e) = env''
      when env' = insertBinding(insertBinding(env, id1, t1 -> t2), id2, t1)
      and tcheck(env', e) = t3
      and unifyTypes(t2, t3) = t4
      and insertBinding(env, id1, t1 -> t4) = env''
If any of the conditions being checked is false, or if a type unification fails, the entire type checking procedure fails. The initial type environment should contain bindings for all predefined functions and special forms.
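A few of the rules above can be turned into code almost literally. The following is only a sketch under stated assumptions: the exp datatype, its constructor names, the TypeError exception, and the representation of type environments as association lists are all hypothetical and chosen for illustration; only a handful of rules (constants, variables, if, anonymous functions, applications, lists) are covered.

```sml
datatype typ = Undef_t | Int_t | Bool_t | List_t of typ | Fn_t of typ * typ

(* Hypothetical abstract syntax covering only a few of the rules above. *)
datatype exp =
    Int_e of int
  | Bool_e of bool
  | Var_e of string
  | If_e of exp * exp * exp
  | Fn_e of string * typ * typ * exp     (* fn (id: t1): t2 => e *)
  | App_e of exp * exp
  | List_e of exp list

exception TypeUnification
exception TypeError of string            (* hypothetical error signal *)

type tenv = (string * typ) list          (* assumed environment layout *)

fun unifyTypes (t, t') =
  case (t, t') of
    (Undef_t, _) => t'
  | (_, Undef_t) => t
  | (Int_t, Int_t) => Int_t
  | (Bool_t, Bool_t) => Bool_t
  | (List_t a, List_t b) => List_t (unifyTypes (a, b))
  | (Fn_t (a, r), Fn_t (a', r')) => Fn_t (unifyTypes (a, a'), unifyTypes (r, r'))
  | _ => raise TypeUnification

fun insertBinding (env: tenv, id, t) = (id, t) :: env

fun lookupBinding ([]: tenv, id) = raise TypeError ("unbound identifier " ^ id)
  | lookupBinding ((id', t) :: rest, id) =
      if id = id' then t else lookupBinding (rest, id)

fun tcheck (env: tenv, e: exp): typ =
  case e of
    Int_e _ => Int_t
  | Bool_e _ => Bool_t
  | Var_e id => lookupBinding (env, id)
  | If_e (c, e1, e2) =>
      (* Both branches are always checked, whatever the condition is. *)
      (case tcheck (env, c) of
         Bool_t => unifyTypes (tcheck (env, e1), tcheck (env, e2))
       | _ => raise TypeError "condition is not bool")
  | Fn_e (id, t1, t2, body) =>
      Fn_t (t1, unifyTypes (t2, tcheck (insertBinding (env, id, t1), body)))
  | App_e (e1, e2) =>
      (case tcheck (env, e1) of
         Fn_t (t1, t2) => (unifyTypes (t1, tcheck (env, e2)); t2)
       | _ => raise TypeError "applying a non-function")
  | List_e es =>
      List_t (foldl (fn (e, t) => unifyTypes (t, tcheck (env, e))) Undef_t es)

(* if true then [1] else [] *)
val t1 = tcheck ([], If_e (Bool_e true, List_e [Int_e 1], List_e []))
(* (fn (x: int): undefined => x)(3) *)
val t2 = tcheck ([], App_e (Fn_e ("x", Int_t, Undef_t, Var_e "x"), Int_e 3))
(* if true then 1 else false: rejected although only one branch would run *)
val rejected =
  (tcheck ([], If_e (Bool_e true, Int_e 1, Bool_e false)); false)
  handle TypeUnification => true
```

Here t1 is List_t Int_t and t2 is Int_t. The last example is the point of static checking: the else branch would never be executed, yet its type is still examined and the clash is reported.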
Functions unary_op_result_type and binary_op_result_type express the fact that the result type of an overloaded operator depends, in general, on both the operation and the types of its arguments. For example, binary_op_result_type(op +, int, int) = int, binary_op_result_type(op +, real, real) = real, but binary_op_result_type(op +, int, real) produces an error.
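One possible shape for such a function is a table written as a pattern match. This is a sketch only: the operator set in binop and the TypeError exception are assumptions made for illustration, not the Mini-SML definitions.

```sml
datatype typ = Int_t | Real_t | Bool_t | String_t
datatype binop = Plus | Times | Equal | Concat   (* hypothetical operator set *)

exception TypeError of string                    (* hypothetical error signal *)

(* The result type depends on the operator and on both operand types. *)
fun binary_op_result_type (bop: binop, t1: typ, t2: typ): typ =
  case (bop, t1, t2) of
    (Plus,   Int_t,    Int_t   ) => Int_t
  | (Plus,   Real_t,   Real_t  ) => Real_t
  | (Times,  Int_t,    Int_t   ) => Int_t
  | (Times,  Real_t,   Real_t  ) => Real_t
  | (Equal,  Int_t,    Int_t   ) => Bool_t
  | (Concat, String_t, String_t) => String_t
  | _ => raise TypeError "operand types do not match the operator"

val intSum  = binary_op_result_type (Plus, Int_t, Int_t)     (* Int_t  *)
val realSum = binary_op_result_type (Plus, Real_t, Real_t)   (* Real_t *)
val mixed   =
  (binary_op_result_type (Plus, Int_t, Real_t); false)
  handle TypeError _ => true                                 (* true: rejected *)
```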
The rules above do not cover the entire language, but they can easily be extended to handle all constructs in Mini-SML (and SML).
Notice that checking the type of a function relies on the assumption that the function has the very type that we are trying to check! This assumption is only used when type checking recursive functions, and it is similar to assuming that P(n) is true in order to prove P(n+1) in an induction.
Let us examine whether the definition of fact below type-checks:
    [00] declcheck([], fun fact(n: int): int = if n = 0 then 1 else n * fact(n - 1)) = [(fact, int->int)]
         when env' = insertBinding(insertBinding([], fact, int->int), n, int)
                   = insertBinding([(fact, int->int)], n, int)
                   = [(n, int), (fact, int->int)]
         and [01] tcheck([(n, int), (fact, int->int)], if n = 0 then 1 else n * fact(n - 1)) = int
         and unifyTypes(int, int) = int
         and insertBinding([], fact, int->int) = [(fact, int->int)]

    [01] tcheck([(n, int), (fact, int->int)], if n = 0 then 1 else n * fact(n - 1)) = int
         when [02] tcheck([(n, int), (fact, int->int)], n = 0) = bool
         and [04] tcheck([(n, int), (fact, int->int)], 1) = int
         and [05] tcheck([(n, int), (fact, int->int)], n * fact(n - 1)) = int
         and unifyTypes(int, int) = int

    [02] tcheck([(n, int), (fact, int->int)], n = 0) = binary_op_result_type(op =, int, int) = bool
         when [03] tcheck([(n, int), (fact, int->int)], n) = int
         and tcheck([(n, int), (fact, int->int)], 0) = int

    [03] tcheck([(n, int), (fact, int->int)], n) = lookupBinding([(n, int), (fact, int->int)], n) = int

    [04] tcheck([(n, int), (fact, int->int)], 1) = int

    [05] tcheck([(n, int), (fact, int->int)], n * fact(n - 1)) = binary_op_result_type(op *, int, int) = int
         when [06] tcheck([(n, int), (fact, int->int)], n) = int
         and [07] tcheck([(n, int), (fact, int->int)], fact(n - 1)) = int

    [06] tcheck([(n, int), (fact, int->int)], n) = lookupBinding([(n, int), (fact, int->int)], n) = int

    [07] tcheck([(n, int), (fact, int->int)], fact(n - 1)) = int
         when [08] tcheck([(n, int), (fact, int->int)], fact) = int->int
         and [09] tcheck([(n, int), (fact, int->int)], n - 1) = int
         and unifyTypes(int, int) = _

    [08] tcheck([(n, int), (fact, int->int)], fact) = lookupBinding([(n, int), (fact, int->int)], fact) = int->int

    [09] tcheck([(n, int), (fact, int->int)], n - 1) = binary_op_result_type(op -, int, int) = int
         when [10] tcheck([(n, int), (fact, int->int)], n) = int
         and [11] tcheck([(n, int), (fact, int->int)], 1) = int

    [10] tcheck([(n, int), (fact, int->int)], n) = lookupBinding([(n, int), (fact, int->int)], n) = int

    [11] tcheck([(n, int), (fact, int->int)], 1) = int
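The same derivation can be replayed mechanically. The sketch below implements just enough of tcheck and declcheck to check fact; the abstract syntax (exp, decl, binop), the TypeError exception, and the association-list environments are hypothetical stand-ins, since the notes do not show the corresponding Mini-SML definitions.

```sml
datatype typ = Undef_t | Int_t | Bool_t | Fn_t of typ * typ
datatype binop = Minus | Times | Equal
datatype exp =
    Int_e of int
  | Var_e of string
  | Binop_e of binop * exp * exp
  | If_e of exp * exp * exp
  | App_e of exp * exp
(* fun id1(id2: t1): t2 = e *)
datatype decl = Fun_d of string * string * typ * typ * exp

exception TypeUnification
exception TypeError of string

fun unifyTypes (t, t') =
  case (t, t') of
    (Undef_t, _) => t'
  | (_, Undef_t) => t
  | (Int_t, Int_t) => Int_t
  | (Bool_t, Bool_t) => Bool_t
  | (Fn_t (a, r), Fn_t (a', r')) => Fn_t (unifyTypes (a, a'), unifyTypes (r, r'))
  | _ => raise TypeUnification

fun insertBinding (env, id: string, t: typ) = (id, t) :: env

fun lookupBinding ([], id) = raise TypeError ("unbound identifier " ^ id)
  | lookupBinding ((id', t: typ) :: rest, id) =
      if id = id' then t else lookupBinding (rest, id)

fun binary_op_result_type (Equal, Int_t, Int_t) = Bool_t
  | binary_op_result_type (_, Int_t, Int_t) = Int_t
  | binary_op_result_type _ = raise TypeError "bad operand types"

fun tcheck (env, e) =
  case e of
    Int_e _ => Int_t
  | Var_e id => lookupBinding (env, id)
  | Binop_e (b, e1, e2) =>
      binary_op_result_type (b, tcheck (env, e1), tcheck (env, e2))
  | If_e (c, e1, e2) =>
      (case tcheck (env, c) of
         Bool_t => unifyTypes (tcheck (env, e1), tcheck (env, e2))
       | _ => raise TypeError "condition is not bool")
  | App_e (e1, e2) =>
      (case tcheck (env, e1) of
         Fn_t (t1, t2) => (unifyTypes (t1, tcheck (env, e2)); t2)
       | _ => raise TypeError "applying a non-function")

fun declcheck (env, Fun_d (id1, id2, t1, t2, body)) =
  let
    (* Assume id1 : t1 -> t2 while checking the body (the "inductive hypothesis"). *)
    val env' = insertBinding (insertBinding (env, id1, Fn_t (t1, t2)), id2, t1)
    val t4 = unifyTypes (t2, tcheck (env', body))
  in
    insertBinding (env, id1, Fn_t (t1, t4))
  end

(* fun fact(n: int): int = if n = 0 then 1 else n * fact(n - 1) *)
val n = Var_e "n"
fun factDecl resultType =
  Fun_d ("fact", "n", Int_t, resultType,
         If_e (Binop_e (Equal, n, Int_e 0),
               Int_e 1,
               Binop_e (Times, n,
                        App_e (Var_e "fact", Binop_e (Minus, n, Int_e 1)))))

val env'' = declcheck ([], factDecl Int_t)   (* [("fact", Fn_t (Int_t, Int_t))] *)

(* Declaring the result as bool makes the recursive use n * fact(n - 1) ill-typed. *)
val rejected =
  (declcheck ([], factDecl Bool_t); false) handle TypeError _ => true
```

Note that the binding for fact is visible while its own body is checked, which is exactly what step [08] of the derivation relies on.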
Type environments allow for shadowing. The type of variable x in environment [(s, string), (x, int), (b, bool), (x, string)] is int, because the first binding for x we find in the environment when examining it from left to right is (x, int).
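Shadowing falls out naturally if the environment is an association list searched from the front. The sketch below assumes that representation, and it returns an option instead of failing on unbound names; both choices are illustrative assumptions rather than the Mini-SML implementation.

```sml
datatype typ = Int_t | Bool_t | String_t

(* Return the type of the first (leftmost) binding for id, if any. *)
fun lookupBinding ([], id: string): typ option = NONE
  | lookupBinding ((id', t) :: rest, id) =
      if id = id' then SOME t else lookupBinding (rest, id)

(* New bindings go in front, so they shadow older ones with the same name. *)
fun insertBinding (env, id: string, t: typ) = (id, t) :: env

val env = [("s", String_t), ("x", Int_t), ("b", Bool_t), ("x", String_t)]

val tx  = lookupBinding (env, "x")                               (* SOME Int_t  *)
val ty  = lookupBinding (env, "y")                               (* NONE        *)
val tx' = lookupBinding (insertBinding (env, "x", Bool_t), "x")  (* SOME Bool_t *)
```

The older binding (x, string) is never removed; it is simply unreachable as long as a newer binding for x precedes it.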
Take a look at this SML example:
    - fun nice(s) = s ^ ", please.";
    val nice = fn : string -> string
How did SML determine the type of function nice? Well, it knows that the type of op ^ is string * string -> string, which means that s must be of type string. Thus the function takes a string and returns the result of op ^ (a string), which means that its type must be string -> string.
Mini-SML does not do type inference, except when it determines the type of a function from the declared types of its argument and its return value. It is a good exercise for you to think about how to implement type inference in Mini-SML.