# Variants
* * *
Topics:
* type synonyms
* variants
* catch-all cases
* recursive variants
* parameterized variants
* built-in types that are variants
* exceptions
* trees
* natural numbers
* * *
## Type synonyms
A *type synonym* is a new name for an already existing type. For example,
here are some type synonyms that might be useful in representing some
types from linear algebra:
```
type point = float * float
type vector = float list
type matrix = float list list
```
Anywhere that a `float*float` is expected, you could use `point`, and vice-versa.
The two are completely exchangeable for one another. In the following code,
`getx` doesn't care whether you pass it a value that is annotated as
one vs. the other:
```
let getx : point -> float =
fun (x,_) -> x
let pt : point = (1.,2.)
let floatpair : float*float = (1.,3.)
let one = getx pt
let one' = getx floatpair
```
Type synonyms are useful because they let us give descriptive names
to complex types. They are a way of making code more self-documenting.
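As a small sketch of how a synonym documents intent, here is a dot product written against the `vector` synonym. The function `dot` and the values `plain` and `d` are hypothetical names introduced only for this illustration.

```ocaml
type vector = float list

(* [dot v1 v2] is the dot product of [v1] and [v2].
   Raises [Invalid_argument] if the lists differ in length,
   which is the behavior of [List.fold_left2]. *)
let dot (v1 : vector) (v2 : vector) : float =
  List.fold_left2 (fun acc x y -> acc +. x *. y) 0.0 v1 v2

(* A value annotated as [float list] is accepted where a [vector] is expected. *)
let plain : float list = [1.0; 2.0; 3.0]
let d = dot plain [4.0; 5.0; 6.0]
```

Here `d` is `32.0`. The annotation on `dot` tells the reader that the lists are meant to be vectors, even though the type checker treats `vector` and `float list` identically.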
## Variants
Thus far, we have seen variants simply as enumerating a set of constant values,
such as:
```
type day = Sun | Mon | Tue | Wed
| Thu | Fri | Sat
type ptype = TNormal | TFire | TWater
type peff = ENormal | ENotVery | ESuper
```
But variants are far more powerful than this. Our main goal today is to
see all the various things that variants can do.
As a running example, here is a variant type that does more than just
enumerate values:
```
type shape =
  | Point of point
  | Circle of point * float (* center and radius *)
  | Rect of point * point (* lower-left and
                             upper-right corners *)
```
This type, `shape`, represents a shape that is either a point, a circle,
or a rectangle. A point is represented by a constructor `Point` that
*carries* some additional data, which is a value of type `point`.
A circle is represented by a constructor `Circle` that carries
a pair of type `point * float`, which according to the comment
represents the center of the circle and its radius. A rectangle
is represented by a constructor `Rect` that carries a pair of type
`point*point`.
Here are a couple of functions that use the `shape` type:
```
let pi = 4.0 *. atan 1.0

let area = function
  | Point _ -> 0.0
  | Circle (_,r) -> pi *. (r ** 2.0)
  | Rect ((x1,y1),(x2,y2)) ->
    let w = x2 -. x1 in
    let h = y2 -. y1 in
    w *. h

(* The center of a rectangle is the midpoint of its two corners. *)
let center = function
  | Point p -> p
  | Circle (p,_) -> p
  | Rect ((x1,y1),(x2,y2)) ->
    ((x1 +. x2) /. 2.0,
     (y1 +. y2) /. 2.0)
```
The `shape` variant type is the same as those we've seen before in that
it is defined in terms of a collection of constructors. What's different
from before is that those constructors carry additional data along with them.
Every value of type `shape` is formed from exactly one of those constructors.
Sometimes we call the constructor a *tag*, because it tags the data it carries
as being from that particular constructor.
Variant types are sometimes called *tagged unions*. Every value of the type
is from the set of values that is the union of all values from the underlying
types that the constructor carries. For the `shape` type, every value
is tagged with either `Point` or `Circle` or `Rect` and carries a value
from the set of all `point` values unioned with the set of all `point*float`
values unioned with the set of all `point*point` values.
Another name for these variant types is an *algebraic data type*. "Algebra"
here refers to the fact that variant types contain both sum and product types,
as defined in the previous lecture. The sum types come from the fact that
a value of a variant is formed by *one of* the constructors. The product
types come from the fact that a constructor can carry tuples or records,
whose values have a sub-value from *each of* their component types.
Using variants, we can express a type that represents the union of several
other types, but in a type-safe way. Here, for example, is a type that
represents either a `string` or an `int`:
```
type string_or_int =
| String of string
| Int of int
```
If we wanted to, we could use this type to code up lists (e.g.) that
contain either strings or ints:
```
type string_or_int_list = string_or_int list
let rec sum : string_or_int list -> int = function
| [] -> 0
| (String s)::t -> int_of_string s + sum t
| (Int i)::t -> i + sum t
let three = sum [String "1"; Int 2]
```
Variants thus provide a type-safe way of doing something that might
before have seemed impossible.
Variants also make it possible to discriminate which tag a value was
constructed with, even if multiple constructors carry the same type.
For example:
```
type t = Left of int | Right of int
let x = Left 1
let double_right = function
| Left i -> i
| Right i -> 2*i
```
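To see the tags at work, here is a short sketch that applies `double_right` to two values carrying the same `int` under different constructors. The names `a` and `b` are introduced only for this illustration.

```ocaml
type t = Left of int | Right of int

let double_right = function
  | Left i -> i
  | Right i -> 2 * i

(* Both values carry the int 5; only the tag differs. *)
let a = double_right (Left 5)
let b = double_right (Right 5)
```

Here `a` is `5` and `b` is `10`: the function can tell the two values apart only because of the constructor each was built with.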
**Syntax.**
To define a variant type:
```
type t = C1 [of t1] | ... | Cn [of tn]
```
The square brackets above denote that the `of ti` is optional. Every
constructor may individually either carry no data or carry data.
We call constructors that carry no data *constant*; and those that
carry data, *non-constant*.
To write an expression that is a variant:
```
C e
---or---
C
```
depending on whether the constructor name `C` is non-constant or constant.
**Dynamic semantics.**
* if `e==>v` then `C e ==> C v`, assuming `C` is non-constant.
* `C` is already a value, assuming `C` is constant.
**Static semantics.**
* if `t = ... | C | ...` then `C : t`.
* if `t = ... | C of t' | ...` and if `e : t'` then `C e : t`.
**Pattern matching.**
We add the following new pattern form to the list of legal patterns:
* `C p`
And we extend the definition of when a pattern matches a value and produces
a binding as follows:
* If `p` matches `v` and produces bindings \\(b\\), then
`C p` matches `C v` and produces bindings \\(b\\).
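The rules above can be traced on a concrete value. The following is a minimal sketch using a cut-down version of the `shape` type with only two constructors; the names `c` and `radius` are introduced only for this illustration.

```ocaml
type point = float * float

type shape =
  | Point of point
  | Circle of point * float

(* Dynamic semantics: the carried expression is evaluated first, so
   [1.0 +. 1.0] becomes [2.0] and [c] is the value [Circle ((0., 0.), 2.)]. *)
let c = Circle ((0.0, 0.0), 1.0 +. 1.0)

(* Pattern matching: [(_, r)] matches the carried pair and binds [r] to [2.0],
   hence [Circle (_, r)] matches [c] and produces that same binding. *)
let radius =
  match c with
  | Circle (_, r) -> r
  | Point _ -> 0.0
```

The static semantics shows up too: because `((0.0, 0.0), 1.0 +. 1.0)` has type `point * float`, the expression `Circle (...)` has type `shape`.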
## Catch-all cases
One thing to beware of when pattern matching against variants is what
*Real World OCaml* calls "catch-all cases". Here's a simple example of
what can go wrong. Let's suppose you write this variant and function:
```
type color = Blue | Red
(* a thousand lines of code in between *)
let string_of_color = function
| Blue -> "blue"
| _ -> "red"
```
Seems fine, right? But then one day you realize there are more colors
in the world. You need to represent green. So you go back and add green
to your variant:
```
type color = Blue | Red | Green
```
But because of the thousand lines of code in between, you forget that
`string_of_color` needs updating. And now, all of a sudden, you are
red-green color blind:
```
# string_of_color Green
- : string = "red"
```
The problem is the *catch-all* case in the pattern match inside `string_of_color`:
the final case that uses the wildcard pattern to match anything. Such code
is not robust against future changes to the variant type.
If, instead, you had originally coded the function as follows, life would be better:
```
let string_of_color = function
| Blue -> "blue"
| Red -> "red"
```
Now, when you change `color` to add the `Green` constructor, the OCaml type checker
will discover and alert you that you haven't yet updated `string_of_color` to
account for the new constructor:
```
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
Green
```
The moral of the story is: catch-all cases lead to buggy code. Avoid using them.
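When several constructors really should share a result, one way to avoid the wildcard is to list them in a single branch separated by `|`. The sketch below assumes a hypothetical function `is_warm` that is not part of the example above.

```ocaml
type color = Blue | Red | Green

(* Every constructor is named explicitly, so adding a new constructor to
   [color] later would still trigger the non-exhaustiveness warning here. *)
let is_warm = function
  | Red -> true
  | Blue | Green -> false
```

This keeps the code nearly as short as the catch-all version while preserving the type checker's ability to flag missing cases.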
## Recursive variants
Variant types may mention their own name inside their own body.
For example, here is a variant type that could be used to represent
something similar to `int list`:
```
type intlist = Nil | Cons of int * intlist
let lst3 = Cons (3, Nil) (* similar to 3::[] or [3]*)
let lst123 = Cons(1, Cons(2, lst3)) (* similar to [1;2;3] *)
let rec sum (l:intlist) : int=
match l with
| Nil -> 0
| Cons(h,t) -> h + sum t
let rec length : intlist -> int = function
| Nil -> 0
| Cons (_,t) -> 1 + length t
let empty : intlist -> bool = function
| Nil -> true
| Cons _ -> false
```
Notice that in the definition of `intlist`, we define the `Cons`
constructor to carry a value that contains an `intlist`. This makes
the type `intlist` be *recursive*: it is defined in terms of itself.
Record types may also be recursive, but plain old type synonyms may not be:
```
type node = {value:int; next:node} (* OK *)
type t = t*t (* Error: The type abbreviation t is cyclic *)
```
Types may be mutually recursive if you use the `and` keyword:
```
type node = {value:int; next:mylist}
and mylist = Nil | Node of node
```
But any such mutual recursion must involve at least one variant or record type
that the recursion "goes through". For example:
```
type t = u and u = t (* Error: The type abbreviation t is cyclic *)
type t = U of u and u = T of t (* OK *)
```
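As a sketch of how such mutually recursive types are used, here are a value and a function over the `node` and `mylist` types from above. The names `two_nodes` and `length` are introduced only for this illustration.

```ocaml
type node = {value : int; next : mylist}
and mylist = Nil | Node of node

(* A two-element list: each record's [next] field holds the rest of the list. *)
let two_nodes = Node {value = 1; next = Node {value = 2; next = Nil}}

(* The recursion in the function follows the recursion in the types. *)
let rec length = function
  | Nil -> 0
  | Node {next; _} -> 1 + length next
```

Here `length two_nodes` is `2`. The variant `mylist` supplies the base case `Nil` that lets the chain of records end.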
## Parameterized variants
Variant types may be *parameterized* on other types. For example,
the `intlist` type above could be generalized to provide lists (coded
up ourselves) over any type:
```
type 'a mylist = Nil | Cons of 'a * 'a mylist
let lst3 = Cons (3, Nil) (* similar to [3] *)
let lst_hi = Cons ("hi", Nil) (* similar to ["hi"] *)
```
Here, `mylist` is a *type constructor* but not a type: there is no
way to write a value of type `mylist`. But we can write values of
type `int mylist` (e.g., `lst3`) and `string mylist` (e.g., `lst_hi`).
Think of a type constructor as being like a function, but one that
maps types to types, rather than values to values.
Here are some functions over `'a mylist`:
```
let rec length : 'a mylist -> int = function
| Nil -> 0
| Cons (_,t) -> 1 + length t
let empty : 'a mylist -> bool = function
| Nil -> true
| Cons _ -> false
```
Notice that the body of each function is unchanged from its previous
definition for `intlist`. All that we changed was the type annotation.
And that could even be omitted safely:
```
let rec length = function
| Nil -> 0
| Cons (_,t) -> 1 + length t
let empty = function
| Nil -> true
| Cons _ -> false
```
The functions we just wrote are an example of a language feature
called **parametric polymorphism**. The functions don't care what the `'a`
is in `'a mylist`, hence they are perfectly happy to work
on `int mylist` or `string mylist` or any other `(whatever) mylist`.
The word "polymorphism" is based on the Greek roots "poly" (many) and
"morph" (form). A value of type `'a mylist` could have many forms,
depending on the actual type `'a`.
As soon, though, as you place a constraint on what the type `'a` might be,
you give up some polymorphism. For example,
```
# let rec sum = function
| Nil -> 0
| Cons(h,t) -> h + sum t;;
val sum : int mylist -> int
```
The fact that we use the `(+)` operator with the head of the list
constrains that head element to be an `int`, hence all elements
must be `int`. That means `sum` must take in an `int mylist`, not any other
kind of `'a mylist`.
It is also possible to have multiple type parameters for a parameterized
type, in which case parentheses are needed:
```
# type ('a,'b) pair = {first: 'a; second: 'b};;
# let x = {first=2; second="hello"};;
val x : (int, string) pair = {first = 2; second = "hello"}
```
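Functions over a type with two parameters can be polymorphic in both. Here is a minimal sketch; `swap` and `y` are hypothetical names introduced for this illustration.

```ocaml
type ('a, 'b) pair = {first : 'a; second : 'b}

(* [swap p] exchanges the two components, so the type parameters
   trade places in the result type. *)
let swap (p : ('a, 'b) pair) : ('b, 'a) pair =
  {first = p.second; second = p.first}

let y = swap {first = 2; second = "hello"}
```

The value `y` has type `(string, int) pair`, with `first` equal to `"hello"` and `second` equal to `2`.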
## OCaml's built-in variants
**OCaml's built-in list data type is really a recursive, parameterized
variant.** It's defined as follows:
```
type 'a list = [] | :: of 'a * 'a list
```
So `list` is really just a type constructor, with (value) constructors
`[]` (which we pronounce "nil") and `::` (which we pronounce "cons").
The only reason you can't write that definition yourself in your own
code is that the compiler restricts you to constructor names that begin
with initial capital letters and that don't contain any punctuation
(other than `_` and `'`).
**OCaml's built-in option data type is really a parameterized
variant.** It's defined as follows:
```
type 'a option = None | Some of 'a
```
So `option` is really just a type constructor, with (value) constructors
`None` and `Some`.
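Because both types are ordinary variants, they are consumed with ordinary pattern matching. The sketch below uses two hypothetical helpers, `safe_hd` and `with_default`, which are written here for illustration rather than taken from the standard library.

```ocaml
(* [safe_hd lst] is [Some h] if [lst] is [h :: _], and [None] if [lst] is empty. *)
let safe_hd : 'a list -> 'a option = function
  | [] -> None
  | h :: _ -> Some h

(* [with_default d o] is [v] if [o] is [Some v], and [d] if [o] is [None]. *)
let with_default (d : 'a) : 'a option -> 'a = function
  | None -> d
  | Some v -> v
```

For example, `with_default 0 (safe_hd [3; 4])` is `3`, and `with_default 0 (safe_hd [])` is `0`.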
You can see both `list` and `option` defined in the [Pervasives module][pervasives]
of the standard library.
[pervasives]: http://caml.inria.fr/pub/docs/manual-ocaml/core.html
**OCaml's exception values are really extensible variants.**
All exception values have type `exn`, which is a variant
defined in the [Pervasives module][pervasives]. It's an unusual
kind of variant, though, called an *extensible* variant, which allows
new constructors of the variant to be defined after the variant type
itself is defined. See the OCaml manual for more information about
[extensible variants][extvar] if you're interested.
[extvar]: http://caml.inria.fr/pub/docs/manual-ocaml/extn.html#sec251
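The following sketch shows what that means in practice: an `exception` definition adds a constructor to `exn`, and the resulting values can be matched like any other variant. The exception `Not_positive` and the function `describe` are hypothetical names for this illustration.

```ocaml
(* This definition adds a new constructor to the existing type [exn]. *)
exception Not_positive of int

(* An exception value is an ordinary value; nothing is raised here. *)
let e : exn = Not_positive (-1)

(* A final wildcard is unavoidable for [exn]: since new constructors can
   always be added, no finite list of cases is ever exhaustive. *)
let describe (e : exn) : string =
  match e with
  | Not_positive n -> "not positive: " ^ string_of_int n
  | Failure msg -> "failure: " ^ msg
  | _ -> "some other exception"
```

Here `describe e` is `"not positive: -1"`.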
## Exception semantics
Since they are just variants, the syntax and semantics of exceptions
are already covered by the syntax and semantics of variants, with
one exception (pun intended): the dynamic semantics of
how exceptions are raised and handled.
**Dynamic semantics.**
As we originally said, every OCaml expression either
* evaluates to a value
* raises an exception
* or fails to terminate (i.e., an "infinite loop").
So far we've only presented the part of the dynamic semantics that handles
the first of those three cases. What happens when we add exceptions?
Now, evaluation of an expression either produces a value or produces an
*exception packet*. Packets are not normal OCaml values; the only pieces
of the language that recognize them are `raise` and `try`. The exception value
produced by (e.g.) `Failure "oops"` is part of the exception packet produced
by `raise (Failure "oops")`, but the packet contains more than just the exception value;
there can also be a stack trace, for example.
For any expression `e` other than `try`, if evaluation of a subexpression of `e`
produces an exception packet `P`, then evaluation of `e` produces packet `P`.
But now we run into a problem for the first time: what order are subexpressions
evaluated in? Sometimes the answer to that question is provided by the semantics
we have already developed. For example, with let expressions, we know that the
binding expression must be evaluated before the body expression. So the following
code raises `A`:
```
exception A
exception B
let x = raise A in raise B
```
And with functions, the argument must be evaluated before the function. So
the following code also raises `A`:
```
(raise B) (raise A)
```
It makes sense that both those pieces of code would raise the same exception,
given that we know `let x = e1 in e2` is syntactic sugar for `(fun x -> e2) e1`.
But what does the following code raise as an exception?
```
(raise A, raise B)
```
The answer is nuanced. The language specification does not stipulate what order the
components of pairs should be evaluated in. Nor did our semantics exactly determine
the order. (Though you would be forgiven if you thought it was left to right.)
So programmers actually cannot rely on that order. The current implementation of OCaml,
as it turns out, evaluates right to left. So the code above actually raises `B`.
If you really want to force the evaluation order, you need to use let expressions:
```
let a = raise A in
let b = raise B in
(a,b)
```
That code will raise `A`.
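One way to observe which exception is raised is to wrap the expression in a handler that reports what it caught. This is a minimal sketch; the name `which_raised` is introduced only for this illustration.

```ocaml
exception A
exception B

(* The let expressions force [raise A] to be evaluated first, so the
   packet that reaches the handler contains [A]. *)
let which_raised : string =
  try
    let a = raise A in
    let b = raise B in
    ignore (a, b);
    "no exception"
  with
  | A -> "A"
  | B -> "B"
```

Here `which_raised` is `"A"`, regardless of the order in which the implementation would evaluate the components of a pair.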
One interesting corner case is what happens when a raise expression itself has
a subexpression that raises:
```
exception C of string
exception D of string
raise (C (raise (D "oops")))
```
That code ends up raising `D`, because the first thing that has to happen is
to evaluate `C (raise (D "oops"))` to a value. Doing that requires evaluating
`raise (D "oops")` to a value. Doing that causes a packet containing `D "oops"`
to be produced, and that packet then propagates and becomes the result of
evaluating `C (raise (D "oops"))`, hence the result of evaluating
`raise (C (raise (D "oops")))`.
Once evaluation of an expression produces an exception packet `P`, that packet
propagates until it reaches a `try` expression:
```
try e with
| p1 -> e1
| ...
| pn -> en
```
The exception value inside `P` is matched against the provided patterns using the
usual evaluation rules for pattern matching—with one exception
(again, pun intended). If none of the patterns matches, then instead of producing
`Match_failure` inside a new exception packet, the original exception packet `P`
continues propagating until the next `try` expression is reached.
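A short sketch of that propagation, using two nested `try` expressions; the name `result` is introduced only for this illustration.

```ocaml
exception A
exception B

(* The inner handler has no pattern that matches [A], so the packet keeps
   propagating and is handled by the outer [try] instead. *)
let result : string =
  try
    (try raise A with B -> "inner handled B")
  with
  | A -> "outer handled A"
```

Here `result` is `"outer handled A"`. No `Match_failure` is involved at any point.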
**Pattern matching.** There is a pattern form for exceptions. Here's an example
of its usage:
```
match List.hd [] with
| [] -> "empty"
| h::t -> "nonempty"
| exception (Failure s) -> s
```
Note that the code above is just a standard `match` expression, not a `try` expression.
It matches the value of `List.hd []` against the three provided patterns. As we know,
`List.hd []` will raise an exception containing the value `Failure "hd"`.
The *exception pattern* `exception (Failure s)` matches that value. So the above
code will evaluate to `"hd"`.
In general, exception patterns are a kind of syntactic sugar. Consider this code:
```
match e with
| p1 -> e1
| ...
| pn -> en
```
Some of the patterns `p1..pn` could be exception patterns of the form `exception q`.
Let `q1..qn` be that subsequence of patterns (without the `exception` keyword),
and let `r1..rm` be the subsequence of non-exception patterns. Then we can rewrite the
code as:
```
match
try e with
| q1 -> e1
| ...
| qn -> en
with
| r1 -> e1
| ...
| rm -> em
```
Which is to say: try evaluating `e`. If it produces an exception packet, use the
exception patterns from the original match expression to handle that packet.
If it doesn't produce an exception packet but instead produces a normal value,
use the non-exception patterns from the original match expression to match that value.
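Exception patterns are handy for turning an exception-raising function into one that returns an option. The sketch below assumes a hypothetical wrapper `parse_int` around the standard `int_of_string`, which raises `Failure` on malformed input.

```ocaml
(* [parse_int s] is [Some i] if [s] is the decimal representation of [i],
   and [None] if [int_of_string s] raises [Failure]. *)
let parse_int (s : string) : int option =
  match int_of_string s with
  | i -> Some i
  | exception (Failure _) -> None
```

For example, `parse_int "42"` is `Some 42` and `parse_int "abc"` is `None`. The normal result and the exceptional result are both handled in a single `match`.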
## Case study: Trees
Trees are another very useful data structure. Unlike lists, they are
not built into OCaml. A *binary tree*, as you'll recall from CS 2110, is
a node containing a value and two children that are trees. A binary tree
can also be an empty tree, which we also use to represent the absence of
a child node. In recitation you used a triple to represent a tree node:
```
type 'a tree =
| Leaf
| Node of 'a * 'a tree * 'a tree
```
Here, to illustrate something different, let's use a record type to represent a
tree node. In OCaml we have to define two mutually recursive types, one
to represent a tree node, and one to represent a (possibly empty) tree:
```
type 'a tree =
  | Leaf
  | Node of 'a node
and 'a node = {
  value: 'a;
  left: 'a tree;
  right: 'a tree
}
```
The rules on when mutually recursive type declarations are legal are a
little tricky. Essentially, any cycle of recursive types must include at
least one record or variant type. Since the cycle between `'a tree` and
`'a node` includes both kinds of types, it's legal.
Here's an example tree:
```
(* represents
2
/ \
1 3 *)
let t =
  Node {
    value = 2;
    left = Node {value=1; left=Leaf; right=Leaf};
    right = Node {value=3; left=Leaf; right=Leaf}
  }
```
We can use pattern matching to write the usual algorithms for
recursively traversing trees. For example, here is a recursive search
over the tree:
```
(* [mem x t] returns [true] if and only if [x] is a value at some
* node in tree [t].
*)
let rec mem x = function
| Leaf -> false
| Node {value; left; right} -> value = x || mem x left || mem x right
```
The function name `mem` is short for "member"; the standard library
often uses a function of this name to implement a search through a
collection data structure to determine whether some element is a member of that
collection.
Here's a function that computes the *preorder* traversal of a tree, in
which each node is visited before any of its children, by constructing
a list in which the values occur in the order in which they would
be visited:
```
let rec preorder = function
| Leaf -> []
| Node {value; left; right} -> [value] @ preorder left @ preorder right
```
Although the algorithm is beautifully clear from the code above, it takes
quadratic time on unbalanced trees because of the `@` operator. That
problem can be solved by introducing an extra argument `acc` to accumulate
the values at each node, though at the expense of making the code less clear:
```
let preorder_lin t =
  let rec pre_acc acc = function
    | Leaf -> acc
    | Node {value; left; right} -> value :: (pre_acc (pre_acc acc right) left)
  in pre_acc [] t
```
The version above uses exactly one `::` operation per `Node` in the tree,
making it linear time.
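The same accumulator idea carries over to other traversals. As a sketch, here are a node count and a linear-time *inorder* traversal (left subtree, then the node, then the right subtree); the names `size`, `inorder_lin`, and `in_acc` are hypothetical and introduced only for this illustration.

```ocaml
type 'a tree =
  | Leaf
  | Node of 'a node
and 'a node = {
  value : 'a;
  left : 'a tree;
  right : 'a tree
}

(* [size t] is the number of nodes in [t]. *)
let rec size = function
  | Leaf -> 0
  | Node {left; right; _} -> 1 + size left + size right

(* [in_acc acc t] is the inorder traversal of [t] followed by [acc].
   The right subtree is added to [acc] first, then this node's value,
   then the left subtree, so the values come out in left-to-right order. *)
let inorder_lin t =
  let rec in_acc acc = function
    | Leaf -> acc
    | Node {value; left; right} -> in_acc (value :: in_acc acc right) left
  in
  in_acc [] t
```

On the three-node example tree with `2` at the root and children `1` and `3`, `size` gives `3` and `inorder_lin` gives `[1; 2; 3]`.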
## Case study: Natural numbers
We can define a recursive variant that acts like numbers, demonstrating
that we don't really have to have numbers built into OCaml! (For the sake
of efficiency, though, it's a good thing they are.)
A *natural number* is either *zero* or the *successor* of some other
natural number. This is how you might define the natural numbers in a
mathematical logic course, and it leads naturally to the
following OCaml type `nat`:
```
type nat = Zero | Succ of nat
```
We have defined a new type `nat`, and `Zero` and `Succ` are
constructors for values of this type. This allows us to
build expressions that have an arbitrary number of nested `Succ`
constructors. Such values act like natural numbers:
```
let zero = Zero
let one = Succ zero
let two = Succ one
let three = Succ two
let four = Succ three
```
When we ask the compiler what `four` is, we get
```
# four;;
- : nat = Succ (Succ (Succ (Succ Zero)))
```
Now we can write functions to manipulate values of this type.
We'll write a lot of type annotations in the code below to help the reader
keep track of which values are `nat` versus `int`; the compiler, of course,
doesn't need our help.
```
let iszero (n : nat) : bool =
  match n with
  | Zero -> true
  | Succ m -> false

let pred (n : nat) : nat =
  match n with
  | Zero -> failwith "pred Zero is undefined"
  | Succ m -> m
```
Similarly we can define a function to add two numbers:
```
let rec add (n1:nat) (n2:nat) : nat =
  match n1 with
  | Zero -> n2
  | Succ n_minus_1 -> add n_minus_1 (Succ n2)
```
We can convert `nat` values to type `int` and vice-versa:
```
let rec int_of_nat (n:nat) : int =
  match n with
  | Zero -> 0
  | Succ m -> 1 + int_of_nat m

let rec nat_of_int (i:int) : nat =
  if i < 0 then failwith "nat_of_int is undefined on negative ints"
  else if i = 0 then Zero
  else Succ (nat_of_int (i-1))
```
To determine whether a natural number is even or odd, we can write a
pair of *mutually recursive* functions:
```
let rec
  even (n:nat) : bool =
    match n with
    | Zero -> true
    | Succ m -> odd m
and
  odd (n:nat) : bool =
    match n with
    | Zero -> false
    | Succ m -> even m
```
You have to use the keyword `and` to combine mutually recursive
functions like this. Otherwise the compiler would flag an error when you
refer to `odd` before it has been defined.
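Other arithmetic follows the same pattern of recursion on the structure of a `nat`. As a sketch, here is multiplication by repeated addition; `mul` and `six` are hypothetical names, and `add` and `int_of_nat` are repeated from above so that the block stands alone.

```ocaml
type nat = Zero | Succ of nat

let rec add (n1 : nat) (n2 : nat) : nat =
  match n1 with
  | Zero -> n2
  | Succ m -> add m (Succ n2)

(* [mul n1 n2] uses the identities 0 * n2 = 0 and (1 + m) * n2 = n2 + m * n2. *)
let rec mul (n1 : nat) (n2 : nat) : nat =
  match n1 with
  | Zero -> Zero
  | Succ m -> add n2 (mul m n2)

let rec int_of_nat (n : nat) : int =
  match n with
  | Zero -> 0
  | Succ m -> 1 + int_of_nat m

let two = Succ (Succ Zero)
let three = Succ two
let six = mul two three
```

Here `int_of_nat six` is `6`.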
## Summary
Variants are a powerful language feature. They are the workhorse
of representing data in a functional language. OCaml variants actually combine
several theoretically independent language features into one: sum types,
product types, recursive types, and parameterized (polymorphic) types. The result
is an ability to express many kinds of data, including lists, options, trees,
and even exceptions.
## Terms and concepts
* algebraic data type
* binary trees as variants
* carried data
* catch-all cases
* constant constructor
* constructor
* exception
* exceptions as variants
* exception packet
* exception pattern
* exception value
* leaf
* lists as variants
* mutually recursive functions
* natural numbers as variants
* node
* non-constant constructor
* options as variants
* order of evaluation
* parameterized variant
* parametric polymorphism
* recursive variant
* tag
* type constructor
* type synonym
## Further reading
* *Introduction to Objective Caml*, chapters 6 and 7
* *OCaml from the Very Beginning*, chapters 7, 10, and 11
* *Real World OCaml*, chapters 6 and 7