Type inference refers to the process of determining the appropriate types for
expressions based on how they are used. For example, in the expression
f 3, OCaml knows that f must be a function, because it is applied to
something (not because its name is f!), and that it takes an int as input.
It knows nothing about the output type. Therefore the type inference
mechanism of OCaml would assign f the type int -> 'a.
# fun f -> f 3;;
- : (int -> 'a) -> 'a = <fun>
There may be many different occurrences of a symbol in an expression, all leading to different typing constraints. These constraints must have a common solution; otherwise the expression cannot be typed.
# fun f -> f (f 3);;
- : (int -> int) -> int = <fun>
# fun f -> f (f "hello");;
- : (string -> string) -> string = <fun>
# fun f -> f (f 3, f 4);;
Error: This expression has type 'a * 'a but an expression was expected of type int
In the first example, how does it know that the output type of f is int?
Because the input type of f is int, and the output of f is fed into f again,
so the output type of f has to be the same as the input type of f.
Let's illustrate the inference process with a simple example. Suppose we have matrices of different sizes and shapes. Let's write the type of an m x n matrix as m -> n. I can multiply two matrices if and only if the column dimension of the first is equal to the row dimension of the second. This can be represented by a typing rule
A : s -> t    B : t -> u
------------------------
      AB : s -> u
which says that if matrix A has row and column dimensions s and t,
respectively, and B has row and column dimensions t and u, respectively,
then I can multiply them, and the resulting product has row and column
dimensions s and u, respectively.
Similarly, there might be other typing rules that say when I can add two matrices
or square a matrix and what the result types will be:
A : s -> t    B : s -> t        A : s -> s
------------------------      ------------
     A + B : s -> t           A^2 : s -> s
Now given some matrix expression, say (AB + CD)^2, for what dimensions of
A, B, C, and D will this expression make sense? We can start out with
abstract types for each subexpression, then add constraints as necessary so
that the typing rules will apply. So we start out with types
A : s -> t        AB : a -> b
B : u -> v        CD : c -> d
C : w -> x        AB + CD : e -> f
D : y -> z        (AB + CD)^2 : g -> h
where the small letters are distinct type variables.
Now in order for
AB to make sense as determined by the typing
rule for multiplication, we had better have
t = u, so that the multiplication
can be performed, and
a = s and
b = v, so that the type of the
result is the same as the type of the product. Similarly, in order for
CD to make sense, we must have
x = y,
c = w, and
d = z, etc. Collecting
all these constraints, we get
t = u, a = s, b = v         for AB
x = y, c = w, d = z         for CD
a = c = e, b = d = f        for AB + CD
e = f = g = h               for (AB + CD)^2
Solving these constraints, we get three equivalence classes
a = b = c = d = e = f = g = h = s = v = w = z
t = u
x = y
which we can represent by one of the members from each, say a, t, and x.
Thus we have
A : a -> t
B : t -> a
C : a -> x
D : x -> a
and this is the most general typing for which this expression makes sense.
Any values of a, t, and x can be used here as long as A, B, C, and D have
the given types.
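The constraint solving above amounts to computing equivalence classes of type variables. The following is a minimal sketch of that step using a small union-find structure. The string representation of type variables and the helper names find, union, and same are assumptions made for this illustration, not part of the notes.

```ocaml
(* Solve the dimension constraints for (AB + CD)^2 with a tiny union-find.
   Type variables are one-letter strings; all names here are illustrative. *)

let parent : (string, string) Hashtbl.t = Hashtbl.create 16

(* Follow parent links to the representative of x's equivalence class. *)
let rec find x =
  match Hashtbl.find_opt parent x with
  | None -> x
  | Some p -> find p

(* Merge the classes of x and y (only roots ever receive a parent link). *)
let union x y =
  let rx = find x and ry = find y in
  if rx <> ry then Hashtbl.replace parent rx ry

let same x y = find x = find y

(* Each pair is one equation from the constraints collected above;
   chains such as a = c = e are split into pairs. *)
let constraints =
  [ ("t", "u"); ("a", "s"); ("b", "v");              (* AB *)
    ("x", "y"); ("c", "w"); ("d", "z");              (* CD *)
    ("a", "c"); ("c", "e"); ("b", "d"); ("d", "f");  (* AB + CD *)
    ("e", "f"); ("f", "g"); ("g", "h") ]             (* (AB + CD)^2 *)

let () = List.iter (fun (x, y) -> union x y) constraints

let vars =
  [ "a"; "b"; "c"; "d"; "e"; "f"; "g"; "h";
    "s"; "t"; "u"; "v"; "w"; "x"; "y"; "z" ]

(* One representative per equivalence class. *)
let classes = List.sort_uniq compare (List.map find vars)

let () =
  Printf.printf "%d classes\n" (List.length classes);            (* 3 classes *)
  Printf.printf "s and z unified: %b\n" (same "s" "z");          (* true *)
  Printf.printf "t and x independent: %b\n" (not (same "t" "x")) (* true *)
```

Which member ends up as the representative of each class depends on the order of the unions; the typing itself does not.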
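Both polymorphic type inference and pattern matching in OCaml are instances of a
very general mechanism called unification. Pattern matching is done by applying
unification to OCaml expressions (e.g. a pattern containing a variable
x), whereas type inference is done by applying unification to type
expressions (e.g. 'a -> 'b -> 'a). It is interesting that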
both these procedures turn out to be applications of the same general
mechanism. There are many other applications of unification in computer
science; e.g., the programming language Prolog is based on it.
The essential task of unification is to find a substitution S that makes two
given terms equal. Let us write s S for the result of applying the
substitution S to the term s. Thus, given terms s and t, we want to find S
such that s S = t S. Such a substitution S is called a unifier for s and t.
For example, given the two terms
f x (g y)        f (g z) w
where x, y, z, and w are variables, the substitution
S = [x <- g z, w <- g y]
would be a unifier, since
f x (g y) [x <- g z, w <- g y] = f (g z) w [x <- g z, w <- g y] = f (g z) (g y).
Note that this is a purely syntactic definition; the meaning of expressions is not taken into consideration when computing unifiers.
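To make the definition concrete, here is a small sketch of terms and substitutions. The term type, the association-list representation of substitutions, and the function name apply are assumptions chosen for this illustration. The check at the end confirms that S makes the two example terms syntactically equal.

```ocaml
(* First-order terms: a variable, or a function symbol applied to arguments.
   Constants such as a and b would be symbols with no arguments. *)
type term =
  | Var of string
  | App of string * term list

(* A substitution maps variable names to terms. *)
type subst = (string * term) list

(* Apply a substitution to a term, replacing every variable it binds. *)
let rec apply (s : subst) (t : term) : term =
  match t with
  | Var x -> (match List.assoc_opt x s with Some u -> u | None -> t)
  | App (f, args) -> App (f, List.map (apply s) args)

let lhs = App ("f", [ Var "x"; App ("g", [ Var "y" ]) ])   (* f x (g y) *)
let rhs = App ("f", [ App ("g", [ Var "z" ]); Var "w" ])   (* f (g z) w *)

(* S = [x <- g z, w <- g y] *)
let s = [ ("x", App ("g", [ Var "z" ])); ("w", App ("g", [ Var "y" ])) ]

(* Both sides become f (g z) (g y). *)
let () =
  assert (apply s lhs = apply s rhs);
  assert (apply s lhs
          = App ("f", [ App ("g", [ Var "z" ]); App ("g", [ Var "y" ]) ]))
```

Structural equality (=) on the term type is exactly the syntactic equality the definition asks for.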
Unifiers do not necessarily exist. For example, the terms x and
f x cannot be unified, since no substitution for x can make the
two terms equal. Even when unifiers exist, they are not necessarily
unique. For example, the substitution
T = [x <- g (f a b), y <- f b a, z <- f a b, w <- g (f b a)]
is also a unifier for the two terms above:
f x (g y) T = f (g z) w T = f (g (f a b)) (g (f b a)).
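However, when a unifier exists, there is a most general unifier (mgu) that is
unique up to renaming of variables. A unifier S for s and t is an mgu for s
and t if S is a unifier for s and t, and any other unifier T for s and t is a
refinement of S; that is, T can be obtained from S by doing further
substitutions.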
For example, the substitution
S in the example above is an mgu for
f x (g y) and
f (g z) w. The unifier
T is a refinement of S, since T = SU, where
U = [z <- f a b, y <- f b a]. Indeed,
f x (g y) S U = f x (g y) [x <- g z, w <- g y] [z <- f a b, y <- f b a]
              = f (g z) (g y) [z <- f a b, y <- f b a]
              = f (g (f a b)) (g (f b a))
              = f x (g y) T.
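The refinement relation can be checked mechanically. The sketch below assumes the same illustrative term representation as before and adds a hypothetical compose function, where compose s u behaves like applying s first and then u. It then verifies that SU and T agree on every variable of the example.

```ocaml
type term = Var of string | App of string * term list
type subst = (string * term) list

let rec apply (s : subst) (t : term) : term =
  match t with
  | Var x -> (match List.assoc_opt x s with Some u -> u | None -> t)
  | App (f, args) -> App (f, List.map (apply s) args)

(* compose s u: apply s first, then u. Bindings of s have u applied to their
   right-hand sides; bindings of u for variables not bound by s are kept. *)
let compose (s : subst) (u : subst) : subst =
  List.map (fun (x, t) -> (x, apply u t)) s
  @ List.filter (fun (x, _) -> not (List.mem_assoc x s)) u

(* Small helpers for building the example terms. *)
let a = App ("a", [])
let b = App ("b", [])
let f x y = App ("f", [ x; y ])
let g x = App ("g", [ x ])

let s = [ ("x", g (Var "z")); ("w", g (Var "y")) ]
let u = [ ("z", f a b); ("y", f b a) ]
let t = [ ("x", g (f a b)); ("y", f b a); ("z", f a b); ("w", g (f b a)) ]

let su = compose s u

(* SU and T send each of x, y, z, w to the same term. *)
let agree =
  List.for_all
    (fun v -> apply su (Var v) = apply t (Var v))
    [ "x"; "y"; "z"; "w" ]

let lhs = f (Var "x") (g (Var "y"))

let () =
  assert agree;
  assert (apply u (apply s lhs) = apply t lhs)
```

The two association lists list their bindings in a different order, which is why the comparison is done variable by variable rather than on the lists themselves.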
We need unification not just for pairs of terms, but more generally, for sets
of pairs of terms. We say that a substitution S is a unifier for a set of
pairs {(s1, t1), ..., (sn, tn)} if si S = ti S for all 1 <= i <= n.
The unification algorithm consists of two mutually recursive procedures: one
that tries to unify a list of pairs, and unify_one, which tries to unify a
single pair. The result of the computation is the most general unifier
for the list of pairs or the pair, respectively.
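One possible shape for these two procedures is sketched below. This is an illustration under stated assumptions, not the course's actual code: the term type, the exception Not_unifiable, the helpers apply, occurs, and compose, and the name unify for the list version are all choices made here. Only unify_one is a name taken from the text.

```ocaml
type term = Var of string | App of string * term list
type subst = (string * term) list

exception Not_unifiable of string

let rec apply (s : subst) (t : term) : term =
  match t with
  | Var x -> (match List.assoc_opt x s with Some u -> u | None -> t)
  | App (f, args) -> App (f, List.map (apply s) args)

(* Does variable x occur anywhere in t? Used to reject x = f x. *)
let rec occurs (x : string) (t : term) : bool =
  match t with
  | Var y -> x = y
  | App (_, args) -> List.exists (occurs x) args

(* compose s u: apply s first, then u. *)
let compose (s : subst) (u : subst) : subst =
  List.map (fun (x, t) -> (x, apply u t)) s
  @ List.filter (fun (x, _) -> not (List.mem_assoc x s)) u

(* Unify a list of pairs: solve the tail, push that solution into the head
   pair, unify the head pair, and compose the two results. *)
let rec unify (pairs : (term * term) list) : subst =
  match pairs with
  | [] -> []
  | (s, t) :: rest ->
      let s1 = unify rest in
      let s2 = unify_one (apply s1 s) (apply s1 t) in
      compose s1 s2

(* Unify a single pair of terms. *)
and unify_one (s : term) (t : term) : subst =
  match s, t with
  | Var x, Var y -> if x = y then [] else [ (x, t) ]
  | Var x, (App _ as u) | (App _ as u), Var x ->
      if occurs x u then raise (Not_unifiable "circularity") else [ (x, u) ]
  | App (f, fargs), App (g, gargs) ->
      if f = g && List.length fargs = List.length gargs
      then unify (List.combine fargs gargs)
      else raise (Not_unifiable "head symbol conflict")

let lhs = App ("f", [ Var "x"; App ("g", [ Var "y" ]) ])   (* f x (g y) *)
let rhs = App ("f", [ App ("g", [ Var "z" ]); Var "w" ])   (* f (g z) w *)
let mgu = unify_one lhs rhs

(* The computed unifier sends both terms to f (g z) (g y). *)
let () =
  assert (apply mgu lhs = apply mgu rhs);
  assert (apply mgu lhs
          = App ("f", [ App ("g", [ Var "z" ]); App ("g", [ Var "y" ]) ]))
```

On the running example this produces the bindings x <- g z and w <- g y, the same substitution as S above. Trying to unify x with f x raises Not_unifiable because of the occurs check.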
Now we show how type inference in OCaml can be done with unification on type expressions. Keep the matrix example above in mind; we will be doing roughly the same thing, but with different typing rules.
For simplicity, let's take a very small subset of OCaml consisting of
variables          x, y, ...
expressions        e ::= x | fun x -> e | (e1 e2)
This subset has a name: the λ-calculus. For types, we take
type variables     'a, 'b, ...
type expressions   s ::= 'a | s -> t
Take care that these are two separate classes of expressions; they cannot be mixed.
We will assume for simplicity that all bound variables are distinct; that is,
no variable is bound twice in two different subexpressions of the form
fun x -> .... We can always rename bound variables if
necessary to make this true. This is called α-conversion. For example,
fun x -> x + 3 and
fun y -> y + 3 are semantically equivalent.
The typing rules are
e1 : s -> t    e2 : s          x : s    e : t
---------------------      ---------------------
     (e1 e2) : t            fun x -> e : s -> t
The first rule says that the function application (e1 e2) only makes sense
if e1 is a function, i.e. has a type of the form s -> t, and the input type
of e1 is the same as the type of its argument e2. When these premises are
satisfied, then the result, represented by the expression (e1 e2), has the
same type as the result type of e1. The second rule says that the
expression fun x -> e represents a function taking elements of the same
type as x to elements of the type of e.
The rules are slightly more complicated without α-conversion, but not much. Essentially, it is necessary to maintain a type environment, and type inferences are done with respect to that environment.
These rules impose constraints as follows. Suppose we want to do type
inference on a given expression e. We first assign unique type variables:
one to each variable x occurring in e, and one to each occurrence of a
subexpression of e.
Note that in the former clause, the type variable is associated with the
variable, and in the latter, it is associated with the occurrence of the
subexpression. Call the type variable assigned to x in the former clause
u(x), and call the type variable assigned to an occurrence of a
subexpression e' in the latter clause v(e').
Now we take the following constraints:
u(x) = v(x) for each occurrence of a variable x,
v(e1) = v(e2) -> v((e1 e2)) for each occurrence of a subexpression (e1 e2), and
v(fun x -> e) = u(x) -> v(e) for each occurrence of a subexpression fun x -> e.
This gives us a list of pairs of type expressions representing type constraints imposed by the typing rules above.
Now given an expression e whose polymorphic type we would like to infer,
we can walk the abstract syntax tree of e and collect these constraints, then
perform unification on the constraints to obtain their most general unifier.
The resulting substitution applied to the type variable v(e) gives
the most general polymorphic type of e.
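The whole procedure fits in a short program. What follows is a minimal sketch, not the downloadable code mentioned in these notes. The types expr and ty, the integer representation of type variables, and the names fresh, u, collect, infer, and to_string are all assumptions made for this illustration. It relies on the assumption above that all bound variables are distinct.

```ocaml
type expr = Var of string | Fun of string * expr | App of expr * expr
type ty = TVar of int | Arrow of ty * ty

exception Not_unifiable of string

let counter = ref 0
let fresh () = incr counter; TVar !counter

(* u(x): one type variable per program variable. *)
let u_table : (string, ty) Hashtbl.t = Hashtbl.create 16
let u x =
  match Hashtbl.find_opt u_table x with
  | Some t -> t
  | None -> let t = fresh () in Hashtbl.add u_table x t; t

(* Walk the syntax tree. Returns v(e) for this occurrence of e together
   with the constraints collected from e and its subexpressions. *)
let rec collect (e : expr) : ty * (ty * ty) list =
  let v = fresh () in
  match e with
  | Var x -> (v, [ (u x, v) ])
  | App (e1, e2) ->
      let v1, c1 = collect e1 in
      let v2, c2 = collect e2 in
      (v, ((v1, Arrow (v2, v)) :: c1) @ c2)
  | Fun (x, body) ->
      let vb, cb = collect body in
      (v, (v, Arrow (u x, vb)) :: cb)

let rec apply s t =
  match t with
  | TVar n -> (match List.assoc_opt n s with Some t' -> t' | None -> t)
  | Arrow (a, b) -> Arrow (apply s a, apply s b)

let rec occurs n = function
  | TVar m -> n = m
  | Arrow (a, b) -> occurs n a || occurs n b

(* compose s1 s2: apply s1 first, then s2. *)
let compose s1 s2 =
  List.map (fun (n, t) -> (n, apply s2 t)) s1
  @ List.filter (fun (n, _) -> not (List.mem_assoc n s1)) s2

let rec unify = function
  | [] -> []
  | (a, b) :: rest ->
      let s1 = unify rest in
      compose s1 (unify_one (apply s1 a) (apply s1 b))

and unify_one a b =
  match a, b with
  | TVar n, TVar m when n = m -> []
  | TVar n, t | t, TVar n ->
      if occurs n t then raise (Not_unifiable "circularity") else [ (n, t) ]
  | Arrow (a1, a2), Arrow (b1, b2) -> unify [ (a1, b1); (a2, b2) ]

(* Collect constraints, unify them, and apply the mgu to v(e). *)
let infer (e : expr) : ty =
  counter := 0;
  Hashtbl.reset u_table;
  let v, cs = collect e in
  apply (unify cs) v

(* Print a type, naming variables 'a, 'b, ... in order of appearance. *)
let to_string (t : ty) : string =
  let names = ref [] in
  let name n =
    match List.assoc_opt n !names with
    | Some s -> s
    | None ->
        let s = Printf.sprintf "'%c" (Char.chr (97 + List.length !names)) in
        names := (n, s) :: !names; s
  in
  let rec go t =
    match t with
    | TVar n -> name n
    | Arrow ((Arrow _ as a), b) -> let l = go a in "(" ^ l ^ ") -> " ^ go b
    | Arrow (a, b) -> let l = go a in l ^ " -> " ^ go b
  in
  go t

let () =
  print_endline (to_string (infer (Fun ("x", Var "x"))));
  (* prints 'a -> 'a *)
  print_endline (to_string (infer (Fun ("x", Fun ("y", Var "x")))))
  (* prints 'a -> 'b -> 'a *)
```

Type variable names in this sketch are chosen by order of appearance, so they may differ from the names printed by another implementation while denoting the same type up to renaming. An untypable term such as fun f -> (fun x -> f x x) (fun y -> f y y) makes infer raise Not_unifiable, because the occurs check fails.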
The complete code for unification and simple polymorphic type inference can be downloaded from here. The following is some sample output. The last term is not typable because unification results in a circularity.
? fun x -> x
fun x -> x : 'a -> 'a
? fun x -> fun y -> x
fun x -> fun y -> x : 'a -> 'b -> 'a
? fun x -> fun y -> y
fun x -> fun y -> y : 'a -> 'b -> 'b
? fun f -> fun g -> fun x -> f (g x)
fun f -> fun g -> fun x -> f (g x) : ('e -> 'd) -> ('c -> 'e) -> 'c -> 'd
? fun x -> fun y -> fun z -> x z (y z)
fun x -> fun y -> fun z -> x z (y z) : ('c -> 'e -> 'd) -> ('c -> 'e) -> 'c -> 'd
? fun f -> (fun x -> f x x) (fun y -> f y y)
not unifiable: circularity
?