Type inference refers to the process of determining the appropriate types for
expressions based on how they are used. For example, in the
expression `f 3`, OCaml knows that `f` must be a function, because it is
applied to something (not because its name is `f`!) and that it takes
an `int` as input. It knows nothing about the output type.
Therefore the type inference mechanism of OCaml would assign `f`
the type `int -> 'a`.

    # fun f -> f 3;;
    - : (int -> 'a) -> 'a = <fun>

A symbol may occur many times in an expression, and each occurrence may impose its own typing constraints. These constraints must have a common solution; otherwise the expression cannot be typed.

    # fun f -> f (f 3);;
    - : (int -> int) -> int = <fun>
    # fun f -> f (f "hello");;
    - : (string -> string) -> string = <fun>
    # fun f -> f (f 3, f 4);;
    Error: This expression has type 'a * 'a but an expression was expected of type int

In the first example, how does it know that the output type of `f` is `int`? Because the
input type of `f` is `int`, and the output of `f` is fed into `f` again, so the output type
of `f` has to be the same as the input type of `f`.

Let's illustrate the inference process with a simple example. Suppose we have matrices of different sizes and shapes. Let's write the type of an m x n matrix as m -> n. I can multiply two matrices if and only if the column dimension of the first is equal to the row dimension of the second. This can be represented by a typing rule

$$\frac{A : s \to t \qquad B : t \to u}{AB : s \to u}$$

which says that if matrix `A` has row and column dimensions `s` and `t`, respectively, and
matrix `B` has row and column dimensions `t` and `u`, respectively, then I can multiply them,
and the resulting product has row and column dimensions `s` and `u`, respectively.
Similarly, there might be other typing rules that say when I can add two matrices
or square a matrix and what the result types will be:

$$\frac{A : s \to t \qquad B : s \to t}{A + B : s \to t} \qquad\qquad \frac{A : s \to s}{A^2 : s \to s}$$

Now given some matrix expression, say `(AB + CD)^2`, for what dimensions of
`A`, `B`, `C`, and `D` will this
expression make sense? We can start out with abstract types for each
subexpression, then add constraints
as necessary so that the typing rules will apply. So we start out with types

$$\begin{array}{ll}
A : s \to t & AB : a \to b \\
B : u \to v & CD : c \to d \\
C : w \to x & AB + CD : e \to f \\
D : y \to z & (AB + CD)^2 : g \to h
\end{array}$$

where the small letters are distinct type variables.
Now in order for `AB` to make sense as determined by the typing
rule for multiplication, we had better have `t = u`, so that the multiplication
can be performed, and `a = s` and `b = v`, so that the type of the
result is the same as the type of the product. Similarly, in order for `CD`
to make sense, we must have `x = y`, `c = w`, and `d = z`, etc. Collecting
all these constraints, we get

$$\begin{array}{ll}
t = u,\ a = s,\ b = v & \text{for } AB \\
x = y,\ c = w,\ d = z & \text{for } CD \\
a = c = e,\ b = d = f & \text{for } AB + CD \\
e = f = g = h & \text{for } (AB + CD)^2
\end{array}$$
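
Because every equation above relates two dimension variables directly, the system can be solved with a union-find (disjoint-set) structure that merges variables into equivalence classes. A minimal OCaml sketch, assuming each variable is the single character used in the text; `parent`, `find`, `union`, `constraints`, and `classes` are illustrative names, not part of the original notes:

```ocaml
(* Union-find over the single-letter dimension variables used in the text.
   [parent] maps a variable to its parent; a missing entry means a root. *)
let parent : (char, char) Hashtbl.t = Hashtbl.create 16

(* Find the representative of [v], compressing the path as we go. *)
let rec find v =
  match Hashtbl.find_opt parent v with
  | Some p when p <> v ->
      let r = find p in
      Hashtbl.replace parent v r;
      r
  | _ -> v

(* Record the equation p = q by merging the two classes. *)
let union p q =
  let rp = find p and rq = find q in
  if rp <> rq then Hashtbl.replace parent rp rq

(* The constraints collected above, one pair per equation. *)
let constraints =
  [ ('t', 'u'); ('a', 's'); ('b', 'v');             (* AB *)
    ('x', 'y'); ('c', 'w'); ('d', 'z');             (* CD *)
    ('a', 'c'); ('c', 'e'); ('b', 'd'); ('d', 'f'); (* AB + CD *)
    ('e', 'f'); ('f', 'g'); ('g', 'h') ]            (* (AB + CD)^2 *)

let () = List.iter (fun (p, q) -> union p q) constraints

(* Group all sixteen variables by their representative. *)
let classes =
  let vars = List.of_seq (String.to_seq "abcdefghstuvwxyz") in
  let roots = List.sort_uniq compare (List.map find vars) in
  List.map (fun r -> List.filter (fun v -> find v = r) vars) roots
```

Evaluating `classes` should give one class of twelve variables (`a` through `h`, plus `s`, `v`, `w`, `z`) and two classes of two (`t`, `u` and `x`, `y`), which matches the three equivalence classes in the text.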

Solving these constraints, we get three equivalence classes

$$a = b = c = d = e = f = g = h = s = v = w = z \qquad t = u \qquad x = y$$

which we can represent by one of the members from each, say `a`, `t`,
and `x`. Thus we have

$$A : a \to t \qquad B : t \to a \qquad C : a \to x \qquad D : x \to a$$

and this is the most general typing for which this expression makes sense. Any values
of `a`, `t`, and `x` can be used here as long as `A`,
`B`, `C`, and `D` have the given types.
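
OCaml's own type checker can be made to carry out this matrix inference. The sketch below is an illustrative encoding, not something from the original notes: it assumes a hypothetical `matrix` type whose two phantom type parameters stand for the row and column dimensions, and `mul`, `add`, `square`, and `expr` are made-up names. Nothing checks that the phantom parameters agree with the actual array sizes.

```ocaml
(* Hypothetical encoding: the two phantom type parameters stand for the
   row and column dimensions. A variant (not a type abbreviation) is used
   so that the type checker keeps the parameters distinct. *)
type ('r, 'c) matrix = M of float array array

(* A : s -> t and B : t -> u give AB : s -> u *)
let mul (M a : ('s, 't) matrix) (M b : ('t, 'u) matrix) : ('s, 'u) matrix =
  let inner = Array.length b in
  let cols = if inner = 0 then 0 else Array.length b.(0) in
  M (Array.init (Array.length a) (fun i ->
       Array.init cols (fun j ->
         let sum = ref 0.0 in
         for k = 0 to inner - 1 do
           sum := !sum +. a.(i).(k) *. b.(k).(j)
         done;
         !sum)))

(* A : s -> t and B : s -> t give A + B : s -> t *)
let add (M a : ('s, 't) matrix) (M b : ('s, 't) matrix) : ('s, 't) matrix =
  M (Array.mapi (fun i row -> Array.mapi (fun j x -> x +. b.(i).(j)) row) a)

(* A : s -> s gives A^2 : s -> s *)
let square (a : ('s, 's) matrix) : ('s, 's) matrix = mul a a

(* No annotations here: OCaml infers the dimension constraints. *)
let expr a b c d = square (add (mul a b) (mul c d))
```

In the toplevel, the inferred type of `expr` should be equivalent, up to renaming of type variables, to `('a, 'b) matrix -> ('b, 'a) matrix -> ('a, 'c) matrix -> ('c, 'a) matrix -> ('a, 'a) matrix`, which is the typing `A : a -> t`, `B : t -> a`, `C : a -> x`, `D : x -> a` derived above.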

Both polymorphic type inference and pattern matching in OCaml are instances of a
very general mechanism called *unification*. Pattern matching is done by applying
unification to OCaml expressions (e.g. `Some x`), whereas type inference is done by
applying unification to type expressions (e.g. `'a -> 'b -> 'a`). It is interesting that
both these procedures turn out to be applications of the same general
mechanism. There are many other applications of unification in computer
science; e.g., the programming language Prolog is based on it.

The essential task of unification is to find a substitution `S` that *unifies* two
given terms, that is, makes them equal. Let's write `s S` for the result of
applying the substitution `S` to the term `s`. Thus, given `s` and `t`, we want to
find `S` such that `s S = t S`. Such a substitution `S` is called a *unifier* for `s`
and `t`. For example, given the two terms

    f x (g y)
    f (g z) w

where `x`, `y`, `z`, and `w` are variables,
the substitution

    S = [x <- g z, w <- g y]

would be a unifier, since

    f x (g y) [x <- g z, w <- g y]
      = f (g z) w [x <- g z, w <- g y]
      = f (g z) (g y)

Note that this is a purely syntactic definition; the meaning of expressions is not taken into consideration when computing unifiers.

Unifiers do not necessarily exist. For example, the terms `x` and `f x`
cannot be unified, since no substitution for `x` can make the
two terms equal. Even when unifiers exist, they are not necessarily
unique. For example, the substitution

    T = [x <- g (f a b), y <- f b a, z <- f a b, w <- g (f b a)]

is also a unifier for the two terms above:

    f x (g y) T = f (g z) w T = f (g (f a b)) (g (f b a))
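
These definitions can be checked mechanically. A minimal OCaml sketch, assuming a hypothetical representation of terms as variables or function symbols applied to argument lists, with `a` and `b` treated as constants (0-ary symbols); `apply` computes `t S` by replacing every variable in the domain of `S` simultaneously, and `is_unifier` and the helpers `v`, `c`, `f`, `g` are illustrative names:

```ocaml
(* Hypothetical representation: a variable, or a function symbol applied
   to a list of arguments; a and b are treated as 0-ary constants. *)
type term = Var of string | Fn of string * term list

(* A substitution [x1 <- t1, ..., xn <- tn] as an association list. *)
type subst = (string * term) list

(* t S: replace every variable in the domain of S, all at once. *)
let rec apply (s : subst) (t : term) : term =
  match t with
  | Var x -> (match List.assoc_opt x s with Some u -> u | None -> t)
  | Fn (f, args) -> Fn (f, List.map (apply s) args)

(* S is a unifier for p and q when p S and q S are identical terms. *)
let is_unifier (s : subst) p q = apply s p = apply s q

(* Helpers for writing the example terms. *)
let v x = Var x
let c k = Fn (k, [])
let g t = Fn ("g", [ t ])
let f t u = Fn ("f", [ t; u ])

let lhs = f (v "x") (g (v "y"))   (* f x (g y) *)
let rhs = f (g (v "z")) (v "w")   (* f (g z) w *)

let s : subst = [ ("x", g (v "z")); ("w", g (v "y")) ]
let t : subst =
  [ ("x", g (f (c "a") (c "b"))); ("y", f (c "b") (c "a"));
    ("z", f (c "a") (c "b")); ("w", g (f (c "b") (c "a"))) ]
```

Both `is_unifier s lhs rhs` and `is_unifier t lhs rhs` evaluate to `true`, so the two terms have (at least) two different unifiers.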

However, when a unifier exists, there is a *most general unifier* (mgu), which is
unique up to renaming of variables. A substitution `S` is a most general unifier
for `s` and `t` if

- `S` is a unifier for `s` and `t`; and
- any other unifier `T` for `s` and `t` is a *refinement* of `S`; that is, `T`
  can be obtained from `S` by doing further substitutions.

For example, the substitution `S` in the example above is an mgu for
`f x (g y)` and `f (g z) w`. The unifier `T` is a refinement of
`S`, since `T = SU`, where

    U = [z <- f a b, y <- f b a]

Note that

    f x (g y) S U = f x (g y) [x <- g z, w <- g y] [z <- f a b, y <- f b a]
                  = f (g z) (g y) [z <- f a b, y <- f b a]
                  = f (g (f a b)) (g (f b a))
                  = f x (g y) T

We need unification not just for pairs of terms, but more generally for sets
of pairs of terms. We say that a substitution `S` is a unifier for
$[(s_1,t_1),\ldots,(s_n,t_n)]$ if $s_i S = t_i S$ for all $1 \le i \le n$.
The unification algorithm consists of two mutually recursive procedures `unify`
and `unify_one`, which try to unify a list of pairs and a single pair,
respectively. The result of the computation is the most general unifier
for the list of pairs or the pair, respectively.
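
A minimal OCaml sketch of the two procedures follows. It is not the original course code: it assumes a hypothetical term representation (`Var`, `Fn`) with substitutions as association lists, and `compose`, `occurs`, and `Not_unifiable` are illustrative names. `unify` solves the tail of the list first, applies that solution to the head pair, solves the head pair, and composes the results; `unify_one` performs the occurs check that rejects pairs such as `x` and `f x`.

```ocaml
(* Hypothetical representation: variables and function symbols applied
   to argument lists; constants are symbols with no arguments. *)
type term = Var of string | Fn of string * term list
type subst = (string * term) list

exception Not_unifiable of string

(* t S: replace every variable in the domain of S simultaneously. *)
let rec apply (s : subst) (t : term) : term =
  match t with
  | Var x -> (match List.assoc_opt x s with Some u -> u | None -> t)
  | Fn (f, args) -> Fn (f, List.map (apply s) args)

(* Composition S U, defined so that t (S U) = (t S) U for every term t. *)
let compose (s : subst) (u : subst) : subst =
  List.map (fun (x, t) -> (x, apply u t)) s
  @ List.filter (fun (y, _) -> not (List.mem_assoc y s)) u

(* Does variable x occur in the term? *)
let rec occurs x = function
  | Var y -> x = y
  | Fn (_, args) -> List.exists (occurs x) args

(* mgu of a list of pairs: solve the tail, apply that solution to the
   head pair, solve the head pair, and compose the two substitutions. *)
let rec unify (pairs : (term * term) list) : subst =
  match pairs with
  | [] -> []
  | (s, t) :: rest ->
      let s2 = unify rest in
      let s1 = unify_one (apply s2 s) (apply s2 t) in
      compose s2 s1

(* mgu of a single pair of terms. *)
and unify_one (s : term) (t : term) : subst =
  match s, t with
  | Var x, Var y when x = y -> []
  | Var x, u | u, Var x ->
      (* Occurs check: binding x to a term containing x is circular. *)
      if occurs x u then raise (Not_unifiable "circularity") else [ (x, u) ]
  | Fn (f, a), Fn (g, b) ->
      if f = g && List.length a = List.length b then unify (List.combine a b)
      else raise (Not_unifiable "symbol clash")

(* The running example: f x (g y) and f (g z) w. *)
let v x = Var x
let g t = Fn ("g", [ t ])
let f t u = Fn ("f", [ t; u ])
let lhs = f (v "x") (g (v "y"))
let rhs = f (g (v "z")) (v "w")
let mgu = unify [ (lhs, rhs) ]
```

Here `mgu` contains exactly the bindings `x <- g z` and `w <- g y` of `S` (the list order may differ), and composing it with `U = [z <- f a b, y <- f b a]` reproduces the refinement `T`.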


Now we show how type inference in OCaml can be done with unification on type expressions. Keep the matrix example above in mind; we will be doing roughly the same thing, but with different typing rules.

For simplicity, let's take a very small subset of OCaml consisting of

    variables      x, y, ...
    expressions    e ::= x | fun x -> e | (e1 e2)

This subset has a name: the *λ-calculus*. The types we assign to its expressions are the *type expressions*:

    type variables      'a, 'b, ...
    type expressions    s ::= 'a | s -> t

Take care that these are two separate classes of expressions; they cannot be mixed.

We will assume for simplicity that all bound variables are distinct; that is,
no variable is bound twice in two different subexpressions of the form
`fun x -> ...`. We can always rename bound variables if
necessary to make this true. This is called *α-conversion*; it is allowed because
renaming a bound variable does not change the meaning of an expression. For example,
`fun x -> x + 3` and `fun y -> y + 3` are semantically equivalent.

The typing rules are

$$\frac{e_1 : s \to t \qquad e_2 : s}{(e_1\ e_2) : t} \qquad\qquad \frac{x : s \qquad e : t}{\mathtt{fun}\ x \to e : s \to t}$$

The first rule says that the function application `(e1 e2)` only
makes sense if `e1` is a function, i.e. has a type of the form `s -> t`,
and the input type of `e1` is the same as the type of its argument `e2`.
When these premises are satisfied, the result, represented by the expression `(e1 e2)`,
has the same type as the result type of `e1`. The
second rule says that the expression `fun x -> e` represents a function
taking elements of the same type as `x` to elements of the type of `e`.

The rules are slightly more complicated without α-conversion, but not
much. Essentially, it is necessary to maintain a *type environment*,
and type inferences are done with respect to that environment.

These rules impose constraints as follows. Suppose we want to do type
inference on a given expression `e`. We first assign unique type
variables `'a`, `'b`, ...

- one to each variable `x` occurring in `e`, and
- one to each *occurrence* of each subexpression of `e`.

Note that in the former clause, the type variable is associated with the
variable, and in the latter, it is associated with the *occurrence* of the
subexpression in `e`.
Call the type variable assigned to `x` in the former clause `u(x)`,
and call the type variable assigned to an occurrence of a subexpression `e'`
in the latter clause `v(e')`.

Now we take the following constraints:

- `u(x) = v(x)` for each occurrence of a variable `x`;
- `v(e1) = v(e2) -> v((e1 e2))` for each occurrence of a subexpression `(e1 e2)`;
- `v(fun x -> e) = u(x) -> v(e)` for each occurrence of a subexpression `fun x -> e`.

This gives us a list of pairs of type expressions representing type constraints imposed by the typing rules above.

Now given an expression `e` whose polymorphic type we would like to infer,
we can walk the abstract syntax tree of `e` and collect these constraints, then
perform unification on the constraints to obtain their most general unifier.
The resulting substitution applied to the type variable `v(e)` gives
the most general polymorphic type of `e`.
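
The whole pipeline fits in a short program. The OCaml sketch below is illustrative, not the downloadable course code: it assumes hypothetical constructors `Var`, `Fun`, `App` for the expression subset and `TVar`, `Arrow` for type expressions, and `gen`, `infer`, and `to_string` are made-up names. It generates the three kinds of constraints while walking the syntax tree, unifies them with an occurs check, and prints the result with type variables renamed alphabetically in order of appearance. Instead of relying on α-conversion it keeps a small environment mapping each bound variable to `u(x)`, which yields the same constraints when all bound variables are distinct.

```ocaml
(* Expressions of the subset: x | fun x -> e | (e1 e2). *)
type expr = Var of string | Fun of string * expr | App of expr * expr

(* Type expressions: numbered type variables and arrow types. *)
type typ = TVar of int | Arrow of typ * typ

exception Circularity

(* Substitutions map type-variable numbers to types. *)
let rec apply s t =
  match t with
  | TVar n -> (match List.assoc_opt n s with Some u -> u | None -> t)
  | Arrow (a, b) -> Arrow (apply s a, apply s b)

(* Applying [compose s u] is the same as applying s, then u. *)
let compose s u =
  List.map (fun (n, t) -> (n, apply u t)) s
  @ List.filter (fun (m, _) -> not (List.mem_assoc m s)) u

let rec occurs n = function
  | TVar m -> n = m
  | Arrow (a, b) -> occurs n a || occurs n b

let rec unify = function
  | [] -> []
  | (s, t) :: rest ->
      let s2 = unify rest in
      compose s2 (unify_one (apply s2 s) (apply s2 t))

and unify_one s t =
  match s, t with
  | TVar n, TVar m when n = m -> []
  | TVar n, u | u, TVar n -> if occurs n u then raise Circularity else [ (n, u) ]
  | Arrow (a1, b1), Arrow (a2, b2) -> unify [ (a1, a2); (b1, b2) ]

let counter = ref 0

let fresh () =
  incr counter;
  TVar !counter

(* Return v(e) for this occurrence of e, pushing constraints onto [cs].
   [env] maps each bound variable x to its type variable u(x). *)
let rec gen env cs e =
  let v = fresh () in
  (match e with
   | Var x -> cs := (List.assoc x env, v) :: !cs (* u(x) = v(x) *)
   | Fun (x, body) ->
       let ux = fresh () in
       let vb = gen ((x, ux) :: env) cs body in
       cs := (v, Arrow (ux, vb)) :: !cs (* v(fun x -> e) = u(x) -> v(e) *)
   | App (e1, e2) ->
       let v1 = gen env cs e1 in
       let v2 = gen env cs e2 in
       cs := (v1, Arrow (v2, v)) :: !cs); (* v(e1) = v(e2) -> v((e1 e2)) *)
  v

(* Print a type, renaming type variables alphabetically by first use. *)
let to_string t =
  let names = ref [] in
  let name n =
    match List.assoc_opt n !names with
    | Some s -> s
    | None ->
        let s = Printf.sprintf "'%c" (Char.chr (97 + List.length !names)) in
        names := (n, s) :: !names;
        s
  in
  let rec go t =
    match t with
    | TVar n -> name n
    | Arrow (a, b) ->
        (* Name the left side first so names follow reading order. *)
        let sa = go a in
        let sb = go b in
        (match a with Arrow _ -> "(" ^ sa ^ ")" | TVar _ -> sa) ^ " -> " ^ sb
  in
  go t

(* Collect constraints, unify them, and apply the mgu to v(e). *)
let infer e =
  let cs = ref [] in
  let v = gen [] cs e in
  to_string (apply (unify !cs) v)
```

For instance, `infer (Fun ("x", Fun ("y", Var "x")))` should return `"'a -> 'b -> 'a"`. Because of the canonical renaming, results can differ from the sample output by a consistent renaming of variables; the composition function `fun f -> fun g -> fun x -> f (g x)` comes out as `('a -> 'b) -> ('c -> 'a) -> 'c -> 'b`, and `fun f -> (fun x -> f x x) (fun y -> f y y)` raises `Circularity`.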

The complete code for unification and simple polymorphic type inference can be downloaded from here. The following is some sample output. The last term is not typable because unification results in a circularity.

    ? fun x -> x
    fun x -> x : 'a -> 'a
    ? fun x -> fun y -> x
    fun x -> fun y -> x : 'a -> 'b -> 'a
    ? fun x -> fun y -> y
    fun x -> fun y -> y : 'a -> 'b -> 'b
    ? fun f -> fun g -> fun x -> f (g x)
    fun f -> fun g -> fun x -> f (g x) : ('e -> 'd) -> ('c -> 'e) -> 'c -> 'd
    ? fun x -> fun y -> fun z -> x z (y z)
    fun x -> fun y -> fun z -> x z (y z) : ('c -> 'e -> 'd) -> ('c -> 'e) -> 'c -> 'd
    ? fun f -> (fun x -> f x x) (fun y -> f y y)
    not unifiable: circularity
    ?