Intro to Polymorphism

There are 3 basic kinds of polymorphism:

* Subtyping (or inclusion) polymorphism.
* Parametric polymorphism.
* Ad hoc polymorphism.

Subtyping arises when we can pass an object of type T1 to an operation (including a function or method) that expects a T2 and we don't get a type-error. Most modern languages, including Java and C#, have some notion of subtyping as the basic form of polymorphism.

Parametric polymorphism is a generalization of abstract data types, and languages such as ML and Haskell are notable for their inclusion of parametric polymorphism. Parametric polymorphism arises naturally for functions, such as the identity or list reversal, that manipulate objects in a way that's oblivious to the type. Unlike subtyping, it's possible to express input-output relationships in type signatures, which gives the caller more reasoning power. But subtype polymorphism tends to work better (or at least more easily) for dealing with heterogeneous data structures. Note, however, that you can simulate subtyping in a pretty straightforward way with parametric polymorphism, though it requires passing around coercion functions. You can also simulate parametric polymorphism in a language like Java, but it requires run-time type tests. So neither situation is ideal, and next-generation languages, including Java 1.5 and C#, include a notion of "generics" which generalizes both subtyping and parametric polymorphism.

Ad hoc polymorphism is everything else. Overloading is a good example. Something like "+" appears to work on both integers and reals, but note that we get different code for the different operations. Furthermore, "+" is only defined on some types, and there's no way to abstract over those types. Languages such as C++ provide a lot of support for ad hoc polymorphism through templates, but anyone who has used template libraries can tell you that this was not well thought through. Language researchers, particularly in the Haskell camp, have been working on better ways to support the power of ad hoc polymorphism (sometimes called polytypism), and this, coupled with notions of reflection, is a hot research topic. This is mostly because we don't have a clean handle on these concepts the same way we do subtyping and parametric polymorphism. The funny thing is that it's only in the last 15 years that we've gotten a good handle on those two!

---------------------------------------------------------------------
Subtyping:

This arises naturally if we think about the idea of "types as sets", for then "subtypes are subsets". More abstractly, the principle behind subtyping is that, if you can use a value of type T1 anywhere that you expect to have a value of type T2, then you ought to be able to treat T1 as a subtype of T2. For example, given record types:

    type point2d = {x:int, y:int}
    type point3d = {x:int, y:int, z:int}

then we ought to be able to pass a point3d to any function expecting a point2d, because all that function can do is select out the x or y components, and those operations are defined on point3d values as well as point2d values. If we think of point2d and point3d as interfaces (a la Java or C#), then it's natural to think that anything that implements the point3d interface also implements the point2d interface.

We can extend our little functional language with subtyping by adding one new rule called subsumption:

    G |- e : t1    t1 <= t2
    -----------------------
    G |- e : t2

The rule says that if e has type t1 and t1 is a subtype of t2, then e also has type t2.
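To make this concrete, here is a small SML sketch of the point2d/point3d example. SML itself has no record subtyping, so the step that the subsumption rule would perform silently has to be written out as an explicit function; the names dist2d and forget_z are made up for this illustration.

    type point2d = {x:int, y:int}
    type point3d = {x:int, y:int, z:int}

    (* An operation that only needs the x and y components. *)
    fun dist2d (p:point2d) : int = #x p * #x p + #y p * #y p

    (* An explicit witness for point3d <= point2d: forget the z component. *)
    fun forget_z (p:point3d) : point2d = {x = #x p, y = #y p}

    val p : point3d = {x=3, y=4, z=5}
    val d = dist2d (forget_z p)   (* = 25; with subsumption, dist2d p
                                     would type-check directly *)

The explicit forget_z also foreshadows the coercion-based view of subtyping discussed at the end of these notes.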
Of course, now we need to define what it means for one type to be a subtype of another. Remember, the principle to follow is that t1 <= t2 means that for any operation "op" you can perform on a t2 value, you should be able to perform "op" on a t1 value.

All subtyping frameworks yield a partial order, so they include rules for reflexivity and transitivity:

    t <= t

    t1 <= t2    t2 <= t3
    --------------------
    t1 <= t3

For something like (immutable) records, the subtyping rule might look like this:

    t1 <= t1'  ...  tn <= tn'
    --------------------------------------------------
    {l1:t1,...,ln:tn,ln+1:tn+1} <= {l1:t1',...,ln:tn'}

This rule says that a record type R1 is a subtype of a record type R2 if, for each l:t' in R2, there is a label l with some type t in R1, and furthermore, t <= t'. This rule is sound because the only operation we can apply to an R2 value is to extract a component using the #l operation. If #l is defined in R1, then the operation will succeed, and if it yields a value of type t, then since t <= t', subsumption lets us treat the result as a t'.

What about sums? Consider the datatypes:

    datatype t1 = C1 of int
    datatype t2 = C1 of int | C2 of bool
    datatype t3 = C1 of int | C2 of bool | C3 of real

The only operation on sums is to do a case on them. If I have a case that can handle all of the constructors of t3, then it's safe to pass in a t2 or a t1 value, because the set of constructors in those types is a subset of the set used in the definition of t3. So, for sums, the subtyping rule is the opposite -- the subtype must have fewer constructors:

    t1 <= t1'  ...  tn <= tn'
    ----------------------------------------------------------------------------
    [l1 of t1 | ... | ln of tn] <= [l1 of t1' | ... | ln of tn' | ln+1 of tn+1']

So, once again, sums are pleasingly dual to products. To see why this works out, consider the two definitions:

    datatype t1 = C1 of {x:int,y:int,z:int}
    datatype t2 = C1 of {x:int,y:int} | C2 of {b:bool}

According to the definition, t1 <= t2. Let us write a function that manipulates t2 objects:

    fun f (v:t2) =
      case v of
        C1 (r1:{x:int,y:int}) => #x r1 + #y r1
      | C2 (r2:{b:bool}) => 0

Is it safe to pass a t1 object to f? The answer is yes, because (a) f must have enough cases to cover all of the constructors in t1, since t2 has at least the constructors of t1, and (b) in any branch of the pattern match, f can only assume that the value carried by the constructor has the type given in t2, so it's safe for that value to actually have the type given in t1, as long as t1's carried type is a subtype of t2's for the given constructor. (I should remark that these rules only work for non-recursive datatypes.)

What about functions? The only operation on a function is to apply it to some argument. So, suppose I have a function:

    f : ({x:int,y:int} -> {x:int,y:int}) -> int

Suppose that I have another function:

    g : {x:int} -> {x:int,y:int,z:int}

Does it make sense to pass g to f? That is, what can go wrong if we do f(g)? Well, f might try to apply g to some argument. It thinks that g takes in {x:int,y:int} arguments, so it might pass an {x:int,y:int} value to g. Now g thinks that it's being given an argument of type {x:int}, so the only thing it can do is apply #x to the argument. That's okay, because conceptually we can use subtyping to treat the {x:int,y:int} value as if it's an {x:int} value. What about the result? f thinks that g returns {x:int,y:int}, so it might do #x or #y on the result. But g returns an {x:int,y:int,z:int} value. Again, this is okay, because the operations that f might do on g's result are all defined on it -- or, conceptually, we can apply subtyping to coerce g's result to {x:int,y:int} if you like. So it appears as if g's type is a subtype of the argument type of f. That is, we have:

    ({x:int} -> {x:int,y:int,z:int}) <= ({x:int,y:int} -> {x:int,y:int})

In general, the subtyping rule for functions is:

    t1' <= t1    t2 <= t2'
    --------------------------
    (t1 -> t2) <= (t1' -> t2')

So note that the argument types are *contra-variant* while the result types are *co-variant*. There are a lot of ways to see why contra-variance in the argument type is necessary. The example above demonstrates this, but so does drawing a Venn diagram of sets, subsets, and mappings from one set to another. Or, from a logical standpoint, we can think of (t1 -> t2) as similar to (not(t1) or t2) --- the negation on the argument suggests that we'll have to flip the subtyping around.
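Jumping ahead to the coercion view developed in the next section, contravariance has a pleasantly direct rendering in SML: a coercion between function types is built by pre-composing with an argument coercion (running in the flipped direction) and post-composing with a result coercion. The names cArg, cRes, and coerceFn are made up for this sketch, which reuses the f/g example above.

    (* Hand-written witnesses for {x:int,y:int} <= {x:int} and
       {x:int,y:int,z:int} <= {x:int,y:int}: *)
    fun cArg (r : {x:int, y:int}) : {x:int} = {x = #x r}
    fun cRes (r : {x:int, y:int, z:int}) : {x:int, y:int} =
        {x = #x r, y = #y r}

    (* Coercion witnessing the function rule: contravariant on the
       argument (cArg), covariant on the result (cRes). *)
    fun coerceFn g = cRes o g o cArg

Given g : {x:int} -> {x:int,y:int,z:int}, the value coerceFn g has type {x:int,y:int} -> {x:int,y:int}, which is exactly what f expects.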
-----------------------------------------------------------------------
Subtypes as Subsets vs. Coercions:

Earlier, I said that we can think of subtypes as "subsets", and this is the general principle that most languages follow. However, it requires that we have a uniform representation for values that lie in related types. For instance, in Java, all Objects are one word so that we can pass them around, stick them in data structures, etc. But that means that double values are not a subtype of Object. A *boxed* double value (e.g., Double) is a subtype of Object, and we can coerce a double to a Double. So, in some sense, double is a "coerceable subtype" of Object.

In fact, we can think of all of the subtyping rules as being witnessed by some coercion function. In particular, if we think of t1 <= t2 as a function C : t1 -> t2, then the subsumption rule becomes:

    G |- e : t1    C : t1 -> t2
    ---------------------------
    G |- C e : t2

The other coercions can be calculated as follows:

    \x:t.x : t -> t                  (reflexivity is the identity)

    C1 : t1 -> t2    C2 : t2 -> t3
    ------------------------------   (transitivity is composition)
    (C2 o C1) : t1 -> t3

Then record subtyping is witnessed by this coercion:

    C1 : t1 -> t1'  ...  Cn : tn -> tn'
    -------------------------------------------------------------------
    \r:{l1:t1,...,ln:tn,ln+1:tn+1}. {l1 = C1(#l1 r),...,ln = Cn(#ln r)}
      : {l1:t1,...,ln:tn,ln+1:tn+1} -> {l1:t1',...,ln:tn'}

Coercion-based subtyping is simpler from the perspective of the language designer and implementor, and it doesn't require us to have uniform representations for objects. But coercions aren't so attractive for the programmer. First, they cost time. Second, they don't work for mutable objects, since they fundamentally operate by making a copy of the object. (You can get around this by providing a level of indirection; that's what Java interfaces do.)

Another issue with coercions is that you must have some notion of coherence. This means that, no matter how we prove that the code type-checks, its meaning doesn't change. In particular, if I have a proof P1 that |- e : t and another proof P2 that |- e : t, then whether I use P1 or P2 to insert the coercions shouldn't matter: I should get the same value out at the end of the day.

Note that the coercions from double to Double in Java are not coherent. The problem is that if d is a double, then

    d == d

does not return the same thing as

    (Double)d == (Double)d

Similar problems arise in C. In general, (implicit) coercions turn out to be a good idea only in pure, functional languages. In imperative languages, it's almost always a disaster (witness the problems of arithmetic in C/C++).
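To see the coherence problem in miniature, suppose (hypothetically) that int <= real were witnessed by the SML coercion function real. Then two different typing derivations for the same source expression would insert the coercion in different places and compute different answers. SML makes no such implicit coercion; the sketch below just performs the two elaborations by hand.

    (* Derivation 1: type the division at int, then coerce the result. *)
    val a = real (13 div 2)      (* = 6.0 *)

    (* Derivation 2: coerce the operands, then divide at real. *)
    val b = real 13 / real 2     (* = 6.5 *)

This is exactly the flavor of trouble that implicit arithmetic conversions cause in C and C++.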