Recitation 4: Datatype pitfalls, polymorphism, lists

Datatype constructors, binding and use occurrences

By convention, data constructors (Penny, Nickel, etc. in the example below) start with capital letters. There are a few exceptions, such as true and false.

type coin = Penny | Nickel | Dime | Quarter

By convention, variables (value and c in the example) start with lower case letters.

let value (c : coin) : float =
  match c with
    Penny -> 0.01
  | Nickel -> 0.05
  | Dime -> 0.10
  | Quarter -> 0.25

The example above is a typical match expression. But consider the following variant. It looks like it should be the same, but when we compile this function, OCaml complains about redundant patterns.

let bad_value (c : coin) : float =
  let penny = Penny in
    match c with
      penny -> 0.01
    | Nickel -> 0.05
    | Dime -> 0.10
    | Quarter -> 0.25

Why? After all, isn't this equivalent to the following?

let bad_value2 (c : coin) : float =
  let penny = Penny in
    if c = penny then 0.01
    else if c = Nickel then 0.05
    else if c = Dime then 0.10
    else if c = Quarter then 0.25
    else raise (Failure "impossible!")

No! Actually, it's more like

let bad_value2 (c : coin) : float =
  let penny = Penny in
    match c with
      random_variable_name -> 0.01
    | Nickel -> 0.05
    | Dime -> 0.10
    | Quarter -> 0.25

or even

let bad_value3 (c : coin) : float =
  let penny = Penny in
    match c with
      _ -> 0.01
    | Nickel -> 0.05
    | Dime -> 0.10
    | Quarter -> 0.25

In a match expression, if there is a data constructor C on the left-hand side of the -> in one of the patterns, then we are comparing to see if the value of the expression e we are trying to match is C. But if it is a variable name, then we're declaring a new, fresh instance of the variable and binding it to the value of e. So any patterns below this are redundant, because this match will never fail.

An occurrence of an identifier in an expression can be either a binding occurrence or a use occurrence. For example, the following are all examples of binding occurrences of the identifier id:

let id = e (* a value declaration *)
let id = e in ... (* a value declaration *)
let id (arg : s) : t = e (* a function declaration *)
let id (arg1 : s1) (arg2 : s2) : t = e (* a function declaration *)
match e with
  id -> ... (* a pattern match *)

In contrast, just about anything else, for example

if id = e then ... else ...
id + 3

is a use occurrence of id. In these occurrences, id is evaluated, and its value is the value that was bound to it in the nearest binding occurrence of id, be it as a function argument, function declaration, value declaration, or pattern match.

In the example

let bad_value (c : coin) : float =
  let penny = Penny in
    match c with
      penny -> 0.01
    | Nickel -> 0.05
    | Dime -> 0.10
    | Quarter -> 0.25

in the first pattern of the match, OCaml does not look to see that penny was previously bound to Penny. The occurrence of penny in the match expression is also a binding occurrence. The match will succeed, and it will bind penny anew to the value of c.

Why do things this way? The most important reason is that binding identifiers in patterns is an extremely useful device that allows concise, elegant code. It is possible to simultaneously bind several identifiers in one pattern. For example,

match e with
  Add (x, y) :: t -> x + y
| ...

simultaneously binds the three identifiers x, y, and t, which can then be used in the expression on the right-hand side. If you do not need a value, use the wildcard _ in the pattern, which matches anything. For example,

match e with
  Add (x, y) :: _ -> x + y
| ...

If we wanted to allow use occurrences of identifiers in patterns, we would need a way to distinguish them from binding occurrences. Currently there is no way to do this in OCaml.

Here is a puzzle to illustrate the difference between binding and use occurrences. What is the value of the following expression?

(*1*)   let f ((x : int), (y : int)) : int =
(*2*)     let x = y in
(*3*)     let y = x in
(*4*)     let (y, x) = (x, y * y) in
(*5*)     match (y, x) with
(*6*)       (x, 1) -> 0
(*7*)     | (x, y) -> x
(*8*)   in f (2, 3)

Figuring this out isn't easy, but here's how to think about it. Let's refer to the values bound to the different binding occurrences of the variables by their line numbers. So x1 means the value bound to x in line 1.


Using Polymorphism

The list type

Because lists are so useful, OCaml provides a builtin parameterized list type called list. It acts just like the List type that we defined in lecture except that the names of the constructors are changed. The constructor [] makes an empty list (compare to Nil) and the constructor :: builds a new list by prepending a first element to another list (compare to Cons). It is as if list were declared as:

type 'a list = [] | :: of 'a * 'a list

(although we can't really do this because of OCaml naming conventions). The constructor :: is an infix operator, which is notationally convenient. The OCaml interpreter knows how to print out lists nicely as well. The empty list is printed as [], and nonempty lists are printed using brackets with semicolon-separated items. These forms may be used to write lists as well. Note that [] is a polymorphic value of type 'a list; it serves as the empty list for all types T list. Here are some examples that show how lists work:

[];;
- : 'a list = []
let it = 2 :: [];;
val it : int list = [2]
let both = 1 :: it;;
val both : int list = [1; 2]
let both2 =
match both with
  x :: lst -> lst 
| [] -> [];;
val both2 : int list = [2]
let both3 =
match both2 with
  x :: lst -> lst 
| [] -> [];;
(* we don't "recover polymorphism" here; it would be unsafe in general *)
val both3 : int list = []
both = 1 :: 2 :: [];; 
(* we can test lists for equality if we can test their elements *)
- : bool = true
match both with
  [x; y] -> x + y (* we can use bracket notation in patterns *)
| _ -> 0;;
- : int = 3
[[]];;
- : 'a list list = [[]]

Just like with types, we have to make sure that we write exhaustive patterns in match expressions:

match ["hello"; "goodbye"] with
  s :: _ -> s ^ " hello";;
Warning P: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
[]

Built-in lists come with lots of useful predefined OCaml library functions, such as the following and many more:

val List.length : 'a list -> int
val @ : ('a list * 'a list) -> 'a list		(* append two lists *)
val List.hd : 'a list -> 'a
val List.tl : 'a list -> 'a list
val List.nth : ('a list * int) -> 'a

Of course, all of these functions could also be easily implemented for the List type that we defined ourselves.

Multiple type parameters

We saw two related features of OCaml in class:

Polymorphic values are typically function values, but other polymorphic values exist, such as [] (and also Nil, as we defined it). Datatypes can actually be parameterized with respect to multiple type parameters. For example, the following type, ortype, is a type-level function that accepts a pair of types and yields a new type:

# type ('a, 'b) ortype = Left of 'a | Right of 'b | Both of 'a * 'b;;
type ('a, 'b) ortype = Left of 'a | Right of 'b | Both of 'a * 'b
# Left 2;;
- : (int, 'a) ortype = Left 2
# Right "hello";;
- : ('a, string) ortype = Right "hello"
# Both (true, 'a');;
- : (bool, char) ortype = Both (true, 'a')

Note that the values Left 2 and Right "hello" are still polymorphic with respect to one type. OCaml always starts counting type variables from 'a, hence the - : (int, 'a) ortype rather than - : (int, 'b) ortype in the match for Left 2 above.

The option parameterized type

Another important standard parameterized type is option, which represents the possible presence of a value. It is defined as follows:

type 'a option = Some of 'a | None

Options are commonly used when no useful value of a particular type makes sense. This corresponds to some uses of the value null in Java (None acts like null). The difference is that the use of option is type-safe. There is no danger of a runtime null pointer exception as long as you use pattern-matching with option, because the type system forces you to account for the possibility of None. Pattern matching should be used to check for and extract the value. A more detailed description of option is available in the OCaml Library documentation.