# Variants

* * *

<i>
Topics:

* type synonyms
* variants
* catch-all cases
* recursive variants
* parameterized variants
* built-in types that are variants
* exceptions
* trees
* natural numbers
</i>

* * *

## Type synonyms

A *type synonym* is a new name for an already existing type. For example, here are some type synonyms that might be useful in representing some types from linear algebra:

```
type point  = float * float
type vector = float list
type matrix = float list list
```

Anywhere that a `float*float` is expected, you could use `point`, and vice-versa. The two are completely exchangeable for one another. In the following code, `getx` doesn't care whether you pass it a value that is annotated as one vs. the other:

```
let getx : point -> float =
  fun (x,_) -> x

let pt : point = (1.,2.)
let floatpair : float*float = (1.,3.)

let one  = getx pt
let one' = getx floatpair
```

Type synonyms are useful because they let us give descriptive names to complex types. They are a way of making code more self-documenting.

## Variants

Thus far, we have seen variants simply as enumerating a set of constant values, such as:

```
type day = Sun | Mon | Tue | Wed | Thu | Fri | Sat
type ptype = TNormal | TFire | TWater
type peff = ENormal | ENotVery | ESuper
```

But variants are far more powerful than this. Our main goal today is to see all the various things that variants can do.

As a running example, here is a variant type that does more than just enumerate values:

```
type shape =
  | Point  of point
  | Circle of point * float (* center and radius *)
  | Rect   of point * point (* lower-left and upper-right corners *)
```

This type, `shape`, represents a shape that is either a point, a circle, or a rectangle. A point is represented by a constructor `Point` that *carries* some additional data, which is a value of type `point`.
A circle is represented by a constructor `Circle` that carries a pair of type `point * float`, which according to the comment represents the center of the circle and its radius. A rectangle is represented by a constructor `Rect` that carries a pair of type `point*point`.

Here are a couple of functions that use the `shape` type:

```
(* [pi] is not predefined in OCaml, so we define it here. *)
let pi = 4.0 *. atan 1.0

let area = function
  | Point _ -> 0.0
  | Circle (_,r) -> pi *. (r ** 2.0)
  | Rect ((x1,y1),(x2,y2)) ->
      let w = x2 -. x1 in
      let h = y2 -. y1 in
        w *. h

(* The center of a rectangle is the midpoint of its two corners. *)
let center = function
  | Point p -> p
  | Circle (p,_) -> p
  | Rect ((x1,y1),(x2,y2)) ->
      ((x1 +. x2) /. 2.0,
       (y1 +. y2) /. 2.0)
```

The `shape` variant type is the same as those we've seen before in that it is defined in terms of a collection of constructors. What's different from before is that those constructors carry additional data along with them. Every value of type `shape` is formed from exactly one of those constructors. Sometimes we call the constructor a *tag*, because it tags the data it carries as being from that particular constructor.

Variant types are sometimes called *tagged unions*. Every value of the type is from the set of values that is the union of all values from the underlying types that the constructors carry. For the `shape` type, every value is tagged with either `Point` or `Circle` or `Rect` and carries a value from the set of all `point` values unioned with the set of all `point*float` values unioned with the set of all `point*point` values.

Another name for these variant types is an *algebraic data type*. "Algebra" here refers to the fact that variant types contain both sum and product types, as defined in the previous lecture. The sum types come from the fact that a value of a variant is formed by *one of* the constructors. The product types come from the fact that a constructor can carry tuples or records, whose values have a sub-value from *each of* their component types.
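To make the "tag plus carried data" reading concrete, here is a minimal, self-contained sketch. It repeats the `point` and `shape` definitions from the text; the helpers `describe` and `perimeter` and the sample values are assumptions introduced only for illustration.

```ocaml
(* Type definitions repeated from the text so the sketch stands alone. *)
type point = float * float

type shape =
  | Point  of point
  | Circle of point * float   (* center and radius *)
  | Rect   of point * point   (* lower-left and upper-right corners *)

(* [describe s] inspects only the tag of [s] (hypothetical helper). *)
let describe (s : shape) : string =
  match s with
  | Point _  -> "point"
  | Circle _ -> "circle"
  | Rect _   -> "rectangle"

(* [perimeter s] uses the data carried by each tag (hypothetical helper). *)
let perimeter (s : shape) : float =
  match s with
  | Point _ -> 0.0
  | Circle (_, r) -> 2.0 *. (4.0 *. atan 1.0) *. r
  | Rect ((x1, y1), (x2, y2)) -> 2.0 *. ((x2 -. x1) +. (y2 -. y1))

(* One value per constructor: the "sum" part of the type. *)
let shapes = [Point (0., 0.); Circle ((0., 0.), 1.); Rect ((0., 0.), (2., 3.))]

let names = List.map describe shapes
let rect_perimeter = perimeter (Rect ((0., 0.), (2., 3.)))
```

Each branch of `perimeter` sees only the product type carried by its own constructor, which is what makes the union type-safe.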
Using variants, we can express a type that represents the union of several other types, but in a type-safe way. Here, for example, is a type that represents either a `string` or an `int`:

```
type string_or_int =
  | String of string
  | Int of int
```

If we wanted to, we could use this type to code up lists (e.g.) that contain either strings or ints:

```
type string_or_int_list = string_or_int list

let rec sum : string_or_int list -> int = function
  | [] -> 0
  | (String s)::t -> int_of_string s + sum t
  | (Int i)::t -> i + sum t

let three = sum [String "1"; Int 2]
```

Variants thus provide a type-safe way of doing something that might before have seemed impossible.

Variants also make it possible to discriminate which tag a value was constructed with, even if multiple constructors carry the same type. For example:

```
type t = Left of int | Right of int
let x = Left 1

let double_right = function
  | Left i -> i
  | Right i -> 2*i
```

**Syntax.**

To define a variant type:

```
type t = C1 [of t1] | ... | Cn [of tn]
```

The square brackets above denote that the `of ti` is optional. Every constructor may individually either carry no data or carry data. We call constructors that carry no data *constant*; and those that carry data, *non-constant*.

To write an expression that is a variant:

```
C e
---or---
C
```

depending on whether the constructor name `C` is non-constant or constant.

**Dynamic semantics.**

* if `e==>v` then `C e ==> C v`, assuming `C` is non-constant.
* `C` is already a value, assuming `C` is constant.

**Static semantics.**

* if `t = ... | C | ...` then `C : t`.
* if `t = ... | C of t' | ...` and if `e : t'` then `C e : t`.

**Pattern matching.**

We add the following new pattern form to the list of legal patterns:

* `C p`

And we extend the definition of when a pattern matches a value and produces a binding as follows:

* If `p` matches `v` and produces bindings \\(b\\), then `C p` matches `C v` and produces bindings \\(b\\).
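The rules above can be exercised with a small sketch. The type `item`, its constructors `A`, `B` and `P`, and the function `to_string` are hypothetical names chosen for illustration; they do not come from the notes.

```ocaml
(* A hypothetical variant with one constant and two non-constant constructors. *)
type item =
  | A                   (* constant: carries no data *)
  | B of int            (* non-constant: carries an int *)
  | P of int * string   (* non-constant: carries a pair *)

(* Static semantics: [A : item], and [B e : item] whenever [e : int]. *)
let v1 : item = A

(* Dynamic semantics: [1 + 2 ==> 3], so [B (1 + 2) ==> B 3]. *)
let v2 : item = B (1 + 2)

(* Pattern matching: the pattern [B n] matches the value [B 3] and binds
   [n] to 3; [P (n, s)] matches a [P] value and binds both components. *)
let to_string (x : item) : string =
  match x with
  | A -> "A"
  | B n -> "B " ^ string_of_int n
  | P (n, s) -> "P (" ^ string_of_int n ^ ", " ^ s ^ ")"
```

For instance, `to_string v2` takes the second branch because `v2` was built with the `B` constructor.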
## Catch-all cases

One thing to beware of when pattern matching against variants is what *Real World OCaml* calls "catch-all cases". Here's a simple example of what can go wrong. Let's suppose you write this variant and function:

```
type color = Blue | Red

(* a thousand lines of code in between *)

let string_of_color = function
  | Blue -> "blue"
  | _    -> "red"
```

Seems fine, right? But then one day you realize there are more colors in the world. You need to represent green. So you go back and add green to your variant:

```
type color = Blue | Red | Green
```

But because of the thousand lines of code in between, you forget that `string_of_color` needs updating. And now, all of a sudden, you are red-green color blind:

```
# string_of_color Green
- : string = "red"
```

The problem is the *catch-all* case in the pattern match inside `string_of_color`: the final case that uses the wildcard pattern to match anything. Such code is not robust against future changes to the variant type.

If, instead, you had originally coded the function as follows, life would be better:

```
let string_of_color = function
  | Blue -> "blue"
  | Red  -> "red"
```

Now, when you change `color` to add the `Green` constructor, the OCaml type checker will discover and alert you that you haven't yet updated `string_of_color` to account for the new constructor:

```
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
Green
```

The moral of the story is: catch-all cases lead to buggy code. Avoid using them.

## Recursive variants

Variant types may mention their own name inside their own body.
For example, here is a variant type that could be used to represent something similar to `int list`:

```
type intlist = Nil | Cons of int * intlist

let lst3 = Cons (3, Nil)  (* similar to 3::[] or [3] *)
let lst123 = Cons(1, Cons(2, lst3)) (* similar to [1;2;3] *)

let rec sum (l:intlist) : int =
  match l with
  | Nil -> 0
  | Cons(h,t) -> h + sum t

let rec length : intlist -> int = function
  | Nil -> 0
  | Cons (_,t) -> 1 + length t

let empty : intlist -> bool = function
  | Nil -> true
  | Cons _ -> false
```

Notice that in the definition of `intlist`, we define the `Cons` constructor to carry a value that contains an `intlist`. This makes the type `intlist` *recursive*: it is defined in terms of itself.

Record types may also be recursive, but plain old type synonyms may not be:

```
type node = {value:int; next:node}   (* OK *)
type t = t*t  (* Error: The type abbreviation t is cyclic *)
```

Types may be mutually recursive if you use the `and` keyword:

```
type node = {value:int; next:mylist}
and mylist = Nil | Node of node
```

But any such mutual recursion must involve at least one variant or record type that the recursion "goes through". For example:

```
type t = u and u = t  (* Error: The type abbreviation t is cyclic *)
type t = U of u and u = T of t  (* OK *)
```

## Parameterized variants

Variant types may be *parameterized* on other types. For example, the `intlist` type above could be generalized to provide lists (coded up ourselves) over any type:

```
type 'a mylist = Nil | Cons of 'a * 'a mylist

let lst3 = Cons (3, Nil)  (* similar to [3] *)
let lst_hi = Cons ("hi", Nil)  (* similar to ["hi"] *)
```

Here, `mylist` is a *type constructor* but not a type: there is no way to write a value of type `mylist`. But we can write values of type `int mylist` (e.g., `lst3`) and `string mylist` (e.g., `lst_hi`). Think of a type constructor as being like a function, but one that maps types to types, rather than values to values.
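As a rough illustration of "applying" the type constructor to different types, the sketch below instantiates `mylist` at `int` and at `string`. The converter `to_list` and the values `ints` and `strs` are assumptions added for this example only.

```ocaml
(* Definition repeated from the text so the sketch stands alone. *)
type 'a mylist = Nil | Cons of 'a * 'a mylist

(* Applying the type constructor [mylist] to [int] yields the type [int mylist]. *)
let ints : int mylist = Cons (1, Cons (2, Nil))

(* Applying it to [string] yields a different type, [string mylist]. *)
let strs : string mylist = Cons ("hi", Nil)

(* [to_list l] converts our list into a built-in list (hypothetical helper).
   It never inspects the elements, so it works for any element type. *)
let rec to_list : 'a mylist -> 'a list = function
  | Nil -> []
  | Cons (h, t) -> h :: to_list t

let ints_as_list = to_list ints
let strs_as_list = to_list strs
```

Mixing the two instantiations, as in `Cons (1, strs)`, is rejected by the type checker because `int mylist` and `string mylist` are distinct types.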
Here are some functions over `'a mylist`:

```
let rec length : 'a mylist -> int = function
  | Nil -> 0
  | Cons (_,t) -> 1 + length t

let empty : 'a mylist -> bool = function
  | Nil -> true
  | Cons _ -> false
```

Notice that the body of each function is unchanged from its previous definition for `intlist`. All that we changed was the type annotation. And that could even be omitted safely:

```
let rec length = function
  | Nil -> 0
  | Cons (_,t) -> 1 + length t

let empty = function
  | Nil -> true
  | Cons _ -> false
```

The functions we just wrote are an example of a language feature called **parametric polymorphism**. The functions don't care what the `'a` is in `'a mylist`, hence they are perfectly happy to work on `int mylist` or `string mylist` or any other `(whatever) mylist`. The word "polymorphism" is based on the Greek roots "poly" (many) and "morph" (form). A value of type `'a mylist` could have many forms, depending on the actual type `'a`.

As soon, though, as you place a constraint on what the type `'a` might be, you give up some polymorphism. For example,

```
# let rec sum = function
  | Nil -> 0
  | Cons(h,t) -> h + sum t;;
val sum : int mylist -> int
```

The fact that we use the `(+)` operator with the head of the list constrains that head element to be an `int`, hence all elements must be `int`. That means `sum` must take in an `int mylist`, not any other kind of `'a mylist`.

It is also possible to have multiple type parameters for a parameterized type, in which case parentheses are needed:

```
# type ('a,'b) pair = {first: 'a; second: 'b};;
# let x = {first=2; second="hello"};;
val x : (int, string) pair = {first = 2; second = "hello"}
```

## OCaml's built-in variants

**OCaml's built-in list data type is really a recursive, parameterized variant.** It's defined as follows:

```
type 'a list = [] | :: of 'a * 'a list
```

So `list` is really just a type constructor, with (value) constructors `[]` (which we pronounce "nil") and `::` (which we pronounce "cons").
The only reason you can't write that definition yourself in your own code is that the compiler restricts you to constructor names that begin with initial capital letters and that don't contain any punctuation (other than `_` and `'`).

**OCaml's built-in option data type is really a parameterized variant.** It's defined as follows:

```
type 'a option = None | Some of 'a
```

So `option` is really just a type constructor, with (value) constructors `None` and `Some`.

You can see both `list` and `option` defined in the [Pervasives module][pervasives] of the standard library.

[pervasives]: http://caml.inria.fr/pub/docs/manual-ocaml/core.html

**OCaml's exception values are really extensible variants.** All exception values have type `exn`, which is a variant defined in the [Pervasives module][pervasives]. It's an unusual kind of variant, though, called an *extensible* variant, which allows new constructors of the variant to be defined after the variant type itself is defined. See the OCaml manual for more information about [extensible variants][extvar] if you're interested.

[extvar]: http://caml.inria.fr/pub/docs/manual-ocaml/extn.html#sec251

## Exception semantics

Since they are just variants, the syntax and semantics of exceptions are already covered by the syntax and semantics of variants, with one exception (pun intended), which is the dynamic semantics of how exceptions are raised and handled. This is a sufficiently tricky topic that here we'll only sketch what happens. Take CS 4110 or 6110 if you want the details.

**Dynamic semantics.** As we originally said, every OCaml expression either

* evaluates to a value
* raises an exception
* or fails to terminate (i.e., an "infinite loop").

So far we've only presented the part of the dynamic semantics that handles the first of those three cases. What happens when we add exceptions? Now, evaluation of an expression either produces a value or produces an *exception packet*.
Packets are not normal OCaml values; the only pieces of the language that recognize them are `raise` and `try`. The exception value produced by (e.g.) `Failure "oops"` is part of the exception packet produced by `raise (Failure "oops")`, but the packet contains more than just the exception value; there can also be a stack trace, for example.

For any expression `e` other than `try`, if evaluation of a subexpression of `e` produces an exception packet `P`, then evaluation of `e` produces packet `P`.

But now we run into a problem for the first time: what order are subexpressions evaluated in? Sometimes the answer to that question is provided by the semantics we have already developed. For example, with let expressions, we know that the binding expression must be evaluated before the body expression. So the following code raises `A`:

```
exception A
exception B
let x = raise A in raise B
```

And with functions, the argument must be evaluated before the function. So the following code also raises `A`:

```
(raise B) (raise A)
```

It makes sense that both those pieces of code would raise the same exception, given that we know `let x = e1 in e2` is syntactic sugar for `(fun x -> e2) e1`.

But what does the following code raise as an exception?

```
(raise A, raise B)
```

The answer is nuanced. The language specification does not stipulate what order the components of pairs should be evaluated in. Nor did our semantics exactly determine the order. (Though you would be forgiven if you thought it was left to right.) So programmers actually cannot rely on that order. The current implementation of OCaml, as it turns out, evaluates right to left. So the code above actually raises `B`. If you really want to force the evaluation order, you need to use let expressions:

```
let a = raise A in
let b = raise B in
  (a,b)
```

That code will raise `A`.
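The ordering claims for `let` can be checked with a short sketch. The helper `which` and the names `from_let`, `from_forced` and `from_pair` are hypothetical and exist only for this illustration. Wrapping each expression in a function delays its evaluation until `which` runs it inside a `try`.

```ocaml
exception A
exception B

(* [which f] runs [f ()] and reports which exception, if any, it raised
   (hypothetical helper, not part of the notes). *)
let which (f : unit -> unit) : string =
  try f (); "none" with
  | A -> "A"
  | B -> "B"

(* The binding expression is evaluated before the body, so [A] is raised. *)
let from_let = which (fun () -> let _x = raise A in raise B)

(* Nested lets force left-to-right evaluation of the pair's components. *)
let from_forced =
  which (fun () ->
    let a = raise A in
    let b = raise B in
    ignore (a, b))

(* The order for a bare pair is unspecified. The text reports that the
   current implementation raises [B], but code should not rely on it. *)
let from_pair = which (fun () -> ignore (raise A, raise B))
```

Both `from_let` and `from_forced` evaluate to `"A"`. The value of `from_pair` depends on the implementation, which is why no particular result is stated for it here.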
One interesting corner case is what happens when a raise expression itself has a subexpression that raises:

```
exception C of string
exception D of string
raise (C (raise (D "oops")))
```

That code ends up raising `D`, because the first thing that has to happen is to evaluate `C (raise (D "oops"))` to a value. Doing that requires evaluating `raise (D "oops")` to a value. Doing that causes a packet containing `D "oops"` to be produced, and that packet then propagates and becomes the result of evaluating `C (raise (D "oops"))`, hence the result of evaluating `raise (C (raise (D "oops")))`.

Once evaluation of an expression produces an exception packet `P`, that packet propagates until it reaches a `try` expression:

```
try e with
| p1 -> e1
| ...
| pn -> en
```

The exception value inside `P` is matched against the provided patterns using the usual evaluation rules for pattern matching, with one exception (again, pun intended). If none of the patterns matches, then instead of producing `Match_failure` inside a new exception packet, the original exception packet `P` continues propagating until the next `try` expression is reached.

**Pattern matching.** There is a pattern form for exceptions. Here's an example of its usage:

```
match List.hd [] with
  | [] -> "empty"
  | h::t -> "nonempty"
  | exception (Failure s) -> s
```

Note that the code above is just a standard `match` expression, not a `try` expression. It matches the value of `List.hd []` against the three provided patterns. As we know, `List.hd []` will raise an exception containing the value `Failure "hd"`. The *exception pattern* `exception (Failure s)` matches that value. So the above code will evaluate to `"hd"`.

In general, exception patterns are a kind of syntactic sugar. Consider this code:

```
match e with
  | p1 -> e1
  | ...
  | pn -> en
```

Some of the patterns `p1..pn` could be exception patterns of the form `exception q`.
Let `q1..qn` be that subsequence of patterns (without the `exception` keyword), and let `r1..rm` be the subsequence of non-exception patterns. Then we can rewrite the code as:

```
match
  try e with
    | q1 -> e1
    | ...
    | qn -> en
with
  | r1 -> e1
  | ...
  | rm -> em
```

Which is to say: try evaluating `e`. If it produces an exception packet, use the exception patterns from the original match expression to handle that packet. If it doesn't produce an exception packet but instead produces a normal value, use the non-exception patterns from the original match expression to match that value.

## Case study: Trees

Trees are another very useful data structure. Unlike lists, they are not built into OCaml. A *binary tree*, as you'll recall from CS 2110, is a node containing a value and two children that are trees. A binary tree can also be an empty tree, which we also use to represent the absence of a child node.

In recitation you used a triple to represent a tree node:

```
type 'a tree =
  | Leaf
  | Node of 'a * 'a tree * 'a tree
```

Here, to illustrate something different, let's use a record type to represent a tree node. In OCaml we have to define two mutually recursive types, one to represent a tree node, and one to represent a (possibly empty) tree:

```
type 'a tree =
  | Leaf
  | Node of 'a node

and 'a node = {
  value: 'a;
  left:  'a tree;
  right: 'a tree
}
```

The rules on when mutually recursive type declarations are legal are a little tricky. Essentially, any cycle of recursive types must include at least one record or variant type. Since the cycle between `'a tree` and `'a node` includes both kinds of types, it's legal.

Here's an example tree:

```
(* represents
      2
     / \
    1   3  *)
let t =
  Node {
    value = 2;
    left  = Node {value=1; left=Leaf; right=Leaf};
    right = Node {value=3; left=Leaf; right=Leaf}
  }
```

We can use pattern matching to write the usual algorithms for recursively traversing trees.
For example, here is a recursive search over the tree:

```
(* [mem x t] returns `true` if and only if [x] is a value at some
 * node in tree [t]. *)
let rec mem x = function
  | Leaf -> false
  | Node {value; left; right} -> value = x || mem x left || mem x right
```

The function name `mem` is short for "member"; the standard library often uses a function of this name to implement a search through a collection data structure to determine whether some element is a member of that collection.

Here's a function that computes the *preorder* traversal of a tree, in which each node is visited before any of its children, by constructing a list in which the values occur in the order in which they would be visited:

```
let rec preorder = function
  | Leaf -> []
  | Node {value; left; right} -> [value] @ preorder left @ preorder right
```

Although the algorithm is beautifully clear from the code above, it takes quadratic time on unbalanced trees because of the `@` operator. That problem can be solved by introducing an extra argument `acc` to accumulate the values at each node, though at the expense of making the code less clear:

```
let preorder_lin t =
  let rec pre_acc acc = function
    | Leaf -> acc
    | Node {value; left; right} -> value :: (pre_acc (pre_acc acc right) left)
  in pre_acc [] t
```

The version above uses exactly one `::` operation per `Node` in the tree, making it linear time.

## Case study: Natural numbers

We can define a recursive variant that acts like numbers, demonstrating that we don't really have to have numbers built into OCaml! (For the sake of efficiency, though, it's a good thing they are.)

A *natural number* is either *zero* or the *successor* of some other natural number. This is how you might define the natural numbers in a mathematical logic course, and it leads naturally to the following OCaml type `nat`:

```
type nat = Zero | Succ of nat
```

We have defined a new type `nat`, and `Zero` and `Succ` are constructors for values of this type.
This allows us to build expressions that have an arbitrary number of nested `Succ` constructors. Such values act like natural numbers:

```
let zero  = Zero
let one   = Succ zero
let two   = Succ one
let three = Succ two
let four  = Succ three
```

When we ask the compiler what `four` is, we get

```
# four;;
- : nat = Succ (Succ (Succ (Succ Zero)))
```

Now we can write functions to manipulate values of this type. We'll write a lot of type annotations in the code below to help the reader keep track of which values are `nat` versus `int`; the compiler, of course, doesn't need our help.

```
let iszero (n : nat) : bool =
  match n with
    | Zero -> true
    | Succ(m) -> false

let pred (n : nat) : nat =
  match n with
    | Zero -> failwith "pred Zero is undefined"
    | Succ(m) -> m
```

Similarly we can define a function to add two numbers:

```
let rec add (n1:nat) (n2:nat) : nat =
  match n1 with
    | Zero -> n2
    | Succ(n_minus_1) -> add n_minus_1 (Succ n2)
```

We can convert `nat` values to type `int` and vice-versa:

```
let rec int_of_nat (n:nat) : int =
  match n with
    | Zero -> 0
    | Succ(m) -> 1 + int_of_nat m

let rec nat_of_int(i:int) : nat =
  if i < 0 then failwith "nat_of_int is undefined on negative ints"
  else if i = 0 then Zero
  else Succ(nat_of_int(i-1))
```

To determine whether a natural number is even or odd, we can write a pair of *mutually recursive* functions:

```
let rec
  even(n:nat) : bool =
    match n with
      | Zero -> true
      | Succ m -> odd m
and
  odd (n:nat) : bool =
    match n with
      | Zero -> false
      | Succ m -> even m
```

You have to use the keyword `and` to combine mutually recursive functions like this. Otherwise the compiler would flag an error when you refer to `odd` before it has been defined.

## Summary

Variants are a powerful language feature. They are the workhorse of representing data in a functional language. OCaml variants actually combine several theoretically independent language features into one: sum types, product types, recursive types, and parameterized (polymorphic) types.
The result is an ability to express many kinds of data, including lists, options, trees, and even exceptions.

## Terms and concepts

* algebraic data type
* binary trees as variants
* carried data
* catch-all cases
* constant constructor
* constructor
* exception
* exception as variants
* exception packet
* exception pattern
* exception value
* leaf
* lists as variants
* mutually recursive functions
* natural numbers as variants
* node
* non-constant constructor
* options as variants
* order of evaluation
* parameterized variant
* parametric polymorphism
* recursive variant
* tag
* type constructor
* type synonym

## Further reading

* *Introduction to Objective Caml*, chapters 6 and 7
* *OCaml from the Very Beginning*, chapters 7, 10, and 11
* *Real World OCaml*, chapters 6 and 7