CS312 Lecture 3
Tuples. Records. Lists. Recursive Datatypes. Pattern Matching.

Administrivia

You should be able to log in to CMS now. If not, email the TAs and they will add you.

Watch the web page and the newsgroup for announcements. We will usually post announcements in both places, but there will be exceptions.

As previously announced, the third section will convene on MW from 4:40 to 5:30 pm in 202 Thurston. 

We had a software demo/install help session yesterday - we'll have one more late tonight.

This will be a whirlwind tour through some of the most important features of SML. You have seen some of these in section yesterday.

Tuples

Tuples group together a fixed number of values. If a tuple contains n elements, then we call it an n-tuple. SML recognizes 0-tuples, but it does not have 1-tuples. A tuple's elements do not have to be of the same type; they are separated by commas and surrounded by parentheses: (), (3, "aha!"), (one, (two, three)). If the elements of a tuple have, successively, types t1, t2, ..., tn, then the type of the corresponding tuple is t1 * t2 * ... * tn. If t1 is not equal to t2, then tuples of type t1 * t2 are not of the same type as tuples of type t2 * t1; order counts.

You can select the nth data element from a tuple using the #n operator: #2 (1, "two", 3.0) = "two".
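
As a small illustration of the points above, the following sketch builds a few tuples and selects their components. The names pair, nested, first, second, inner, swapped and nothing are chosen here purely for illustration.

```sml
(* A 2-tuple whose components have different types: int * string *)
val pair : int * string = (3, "aha!")

(* Tuples can be nested; this one has type int * (string * real) *)
val nested : int * (string * real) = (1, ("two", 3.0))

(* #n selects the nth component, counting from 1 *)
val first  : int    = #1 pair
val second : string = #2 pair
val inner  : real   = #2 (#2 nested)

(* Order counts: string * int is a different type from int * string *)
val swapped : string * int = (#2 pair, #1 pair)

(* The 0-tuple () is the only value of type unit *)
val nothing : unit = ()
```

Swapping the annotation on swapped to int * string would make the declaration fail to type-check, which is the "order counts" rule in action.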

Records

Records are similar to tuples, in that they group together various data elements. A record has fields, and these are named. While a tuple is an ordered collection of data values, a record is an unordered collection of labeled data values.

Here is one example:

{commonName="wolf", latinName="Canis Lupus L.", range="Yellowstone", population=174}

The type of a record with labels l1, l2, ..., ln, and associated types t1, t2, ..., tn is written as {l1:t1, l2:t2, ..., ln:tn}; the order of fields does not count, e.g. the type  {l1:t1, l2:t2} is the same as the type {l2:t2, l1:t1}.
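
The sketch below shows a record being built, a field being selected with the #label selector, and two records that differ only in the order in which their fields are written. The names wolf, name, count, a, b and same are illustrative assumptions, not part of the lecture.

```sml
(* A record value; the field labels are part of its type *)
val wolf = {commonName="wolf", latinName="Canis Lupus L.",
            range="Yellowstone", population=174}

(* #label selects the field with that label *)
val name  : string = #commonName wolf
val count : int    = #population wolf

(* Field order does not count: a and b have the same type,
 * {x:int, y:string}, and they are equal as values *)
val a = {x = 1, y = "one"}
val b = {y = "one", x = 1}
val same : bool = (a = b)
```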

Lists

The list datatype encapsulates the concept of a sequence of 0 or more data values of the same type. Lists are a simple example of a recursive datatype. Indeed, a list can best be defined in terms of itself. To make things concrete, think of a list of integers. A list of integers is either (a) an empty list, or (b) a non-empty list consisting of at least one element. In the latter case, the list consists of a first element (the head of the list) that is prepended to a shorter, possibly empty, integer list (the tail). So a list is ... a list. The definition seems circular, and in some sense it is. We are saved by the fact that, for any given finite list, the process of describing a list in terms of a list having one fewer element terminates after finitely many steps.

Here are a few examples of lists and list operations:

- val empty = [];
val empty = [] : 'a list
- hd empty;
uncaught exception Empty
  raised at: boot/list.sml:36.38-36.43
- tl empty;
uncaught exception Empty
  raised at: boot/list.sml:37.38-37.43
- rev empty;
val it = [] : ?.X1 list
- length empty;
val it = 0 : int
- null [];
val it = true : bool


- val l = [1, 2, 3, 4];
val l = [1,2,3,4] : int list
- hd l;
val it = 1 : int
- tl l;
val it = [2,3,4] : int list
- rev l;
val it = [4,3,2,1] : int list
- length l;
val it = 4 : int
- null l;
val it = false : bool

In the output above we have deleted some warnings. Note the following:

There are more list operations defined in SML. Study them: the more you know, the more powerful the programs you will be able to write.
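
A few of these further operations are sketched below. All of them come from the top-level environment or the List structure of the Standard ML Basis Library; the variable names are illustrative only.

```sml
val l = [1, 2, 3, 4]

(* :: (cons) prepends one element to a list *)
val consed = 0 :: l                               (* [0,1,2,3,4] *)

(* @ appends two lists *)
val appended = l @ [5, 6]                         (* [1,2,3,4,5,6] *)

(* List.nth uses 0-based indexing *)
val third = List.nth (l, 2)                       (* 3 *)

(* map applies a function to every element *)
val doubled = map (fn x => 2 * x) l               (* [2,4,6,8] *)

(* List.filter keeps the elements that satisfy a predicate *)
val evens = List.filter (fn x => x mod 2 = 0) l   (* [2,4] *)
```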

Recursive Datatypes

Lists are not the only recursive datatypes; indeed, recursive datatypes are pervasive. We can declare such datatypes ourselves with the help of the datatype declaration (brackets indicate optional components):

datatype Y = X1 [of t1] | ... | Xn [of tn]

Here the Xi are datatype constructors.

Not all types declared in a datatype declaration need to be recursive:

datatype colors = WHITE | RED | GREEN | BLUE | BLACK
datatype merged = INT of int | REAL of real

The second declaration creates a type that is, in effect,  the union of the int and real types. Examples of values of this type are INT(3)  and  REAL(7.14).

Let us now define the int list type. We will later learn how to define a list type of the same generality as SML's list; for now, we must fix the list element's type.

datatype intlist = Empty | LIST of int * intlist

The list [1, 2, 3, 4] can now be represented as LIST(1, LIST(2, LIST(3, LIST(4, Empty)))).
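
To see the constructors in use, here is a minimal sketch that builds values of the three datatypes declared above. The value names (c, m1, m2, oneToFour, none, single) are made up for this example.

```sml
datatype colors = WHITE | RED | GREEN | BLUE | BLACK
datatype merged = INT of int | REAL of real
datatype intlist = Empty | LIST of int * intlist

(* Constructors declared without "of" are values by themselves *)
val c : colors = GREEN

(* Constructors declared with "of" are applied to an argument *)
val m1 : merged = INT 3
val m2 : merged = REAL 7.14

(* The list [1, 2, 3, 4] built from the intlist constructors;
 * note that LIST takes a single argument, an int * intlist pair *)
val oneToFour : intlist = LIST (1, LIST (2, LIST (3, LIST (4, Empty))))

(* The empty list and a one-element list *)
val none : intlist = Empty
val single : intlist = LIST (42, Empty)
```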

We will see many other recursive datatype declarations in the course.

Pattern Matching

We know how to put together tuples, records, lists, and custom non-recursive and recursive datatypes. We even know how to access the elements of a tuple, record, or a list. But how do we access the components of a LIST? Pattern matching offers a powerful solution to this problem. As a bonus, pattern matching will also simplify our access to tuple, record, and list elements.

In most cases, we will use pattern matching in combination with case expressions. The BNF definition of case expressions is

case e of p1 => e1 | ... | pn => en,

where e and ei are expressions, and pi are patterns. The BNF definition of  patterns is

p ::= _ | c  | x | (p1,..., pn) | {x1= p1,...,xn= pn} | [] | p1::p2 | X | X(p),

where p and pi are patterns, c is a constant, x is an identifier, xi are record field names, and X is a datatype constructor. 

The pattern _ (the underscore character) is a wildcard: it matches any value, and the matched value is ignored. The pattern containing the :: operator is a list pattern; here p1 is the pattern for the head of the list, and p2 is the pattern for the tail of the list.

An important feature of SML patterns is that they can be nested: a pattern can be present inside another pattern.
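
A minimal sketch of nesting, assuming a hypothetical function secondOfFirst that is not part of the lecture: the tuple pattern (_, s) sits inside the list pattern p1::p2.

```sml
(* Return the string in the first pair of the list, or "" if the
 * list is empty. The pattern (_, s)::_ nests a tuple pattern
 * inside a list pattern. *)
fun secondOfFirst(l: (int * string) list): string =
  case l of
    [] => ""
  | (_, s)::_ => s
```

For example, secondOfFirst [(1, "a"), (2, "b")] evaluates to "a".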

Patterns are used as templates against which values are matched. Values are defined as

v ::= c  |  (v1,...,vn) |  {x1=v1, ..., xn=vn}  |  [v1, ..., vn] | X | X(v),

where v and vi are values, c is a constant, xi are record field names, and X is a datatype constructor. 

Here are the rules of pattern matching; they must be applied recursively:

Patterns are tested in the order in which they are written; the result of the case expression is the value of the expression corresponding to the first pattern that matches.

When performing pattern matching in a case expression, SML has the ability to check whether the cases that have been listed are exhaustive or not. Many subtle errors are avoided because the programmer is made aware of a non-exhaustive match.
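
The following sketch contrasts a non-exhaustive match with an exhaustive one. Both function names are hypothetical. The exact wording of the warning depends on the compiler; SML/NJ, for instance, reports that the match is nonexhaustive.

```sml
(* Non-exhaustive: there is no case for the empty list. The compiler
 * accepts this definition but warns about the missing case; calling
 * firstOrZeroBad [] raises the Match exception at run time. *)
fun firstOrZeroBad(l: int list): int =
  case l of
    x::_ => x

(* Exhaustive: every int list is either [] or of the form x::rest *)
fun firstOrZero(l: int list): int =
  case l of
    [] => 0
  | x::_ => x
```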

To clarify the ideas above, let us examine a few examples:

(* return true iff the length of the argument list is at least two *)
fun atLeastTwo(l: int list): bool = 
  case l of
    [] => false
  | _::[] => false
  | _ => true

Note that we don't really care about the values here; all we are interested in is the structure of the argument. Note also that the pattern _::[] could have been written as [_].

(* is at least one of the tuple's components equal to 0? *)
fun atLeastOneZero(tuple: int * int): bool = 
  case tuple of
    (0, _) => true
  | (_, 0) => true
  | _ => false

In this example we use the fact that patterns are tested in the order in which they are written: if the first component of the argument tuple is 0, then the first pattern will match; otherwise, if the second component is 0, then the second pattern will match. If neither of the first two patterns matches, then the tuple contains no zeros at all, and we can return false without examining the argument any further.

Here is a simple pattern-matching example on records:

(* assume that an animal is endangered if its population falls under 100 *)
type record = {name:string, range:string, population:int};

fun isEndangered(r: record): bool =
  case r of
    {name=_, range=_, population=p } => p < 100

The type declaration gives a name to a type; it creates an alias for it. You will probably feel that the case expression is not quite appropriate here. After all, all we need is to extract a value from a record. Pattern matching is useful, but there are no real cases here! Indeed, we can rewrite the function above by using pattern matching in a val declaration. We have not covered this feature above, but you might find it useful in situations similar to the one at hand:

fun isEndangered(r: record): bool =
  let
    val {name=_, range=_, population=p } = r
  in
    p < 100
  end
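
Two related forms are sketched here as a supplement: a val declaration that takes a tuple apart, and a pattern written directly in the function's parameter position. The names q, r and animal are assumptions made for this example; the ... in a record pattern stands for the fields we do not mention.

```sml
(* Destructure a tuple in a val declaration *)
val (q, r) = (17 div 5, 17 mod 5)      (* q = 3, r = 2 *)

(* A type alias with the same fields as the record type used above *)
type animal = {name:string, range:string, population:int}

(* The record pattern appears in place of the parameter; the type
 * annotation tells the compiler which fields "..." stands for *)
fun isEndangered({population=p, ...}: animal): bool = p < 100
```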

More Pattern Matching. Recursion.

We provide below a number of functions that work on the intlist type we have defined above. Examine them carefully - note how similar some functions are, and how recursion is used to implement "loops:"

(* test to see if the list is empty *)

fun is_empty(xs:intlist):bool = 
  case xs of
    Empty => true
  | LIST(_,_) => false

(* Return the number of elements in the list *)
fun length(xs:intlist):int = 
  case xs of
    Empty => 0
  | LIST(i:int,rest:intlist) => 1 + length(rest)

(* Notice that the case expressions for lists all have the same
 * form -- a case for the empty list (Empty) and a case for a LIST.
 * Also notice that for most functions, the LIST case involves a
 * recursive function call. *)
(* Return the sum of the elements in the list *)
fun sum(xs:intlist):int = 
  case xs of
    Empty => 0
  | LIST(i:int,rest:intlist) => i + sum(rest)

(* Create a string representation of a list *)
fun toString(xs: intlist):string = 
  case xs of
    Empty => ""
  | LIST(i:int, Empty) => Int.toString(i)
  | LIST(i:int, LIST(j:int, rest:intlist)) => 
        Int.toString(i) ^ "," ^ toString(LIST(j,rest))
    
(* Return the first element (if any) of the list *)
fun head(is: intlist):int = 
  case is of
    Empty => raise Fail("empty list!")
  | LIST(i,tl) => i

(* Return the rest of the list after the first element *)
fun tail(is: intlist):intlist = 
  case is of
    Empty => raise Fail("empty list!")
  | LIST(i,tl) => tl

(* Return the last element of the list (if any) *)
fun last(is: intlist):int = 
  case is of
    Empty => raise Fail("empty list!")
  | LIST(i,Empty) => i
  | LIST(i,tl) => last(tl)

(* Return the ith element of the list, counting from 1 *)
fun ith(is: intlist, i:int):int = 
  case (i,is) of
    (_,Empty) => raise Fail("empty list!")
  | (1,LIST(x,tl)) => x
  | (n,LIST(x,tl)) =>
      if (n <= 0) then raise Fail("bad index")
      else ith(tl, n - 1)

(* Append two lists:  append([1,2,3],[4,5,6]) = [1,2,3,4,5,6] *)
fun append(list1:intlist, list2:intlist):intlist = 
  case list1 of
    Empty => list2
  | LIST(i,tl) => LIST(i,append(tl,list2))

(* Reverse a list:  reverse([1,2,3]) = [3,2,1].
 * Notice that we compute this by reversing the tail of the
 * list first (e.g., compute reverse([2,3]) = [3,2]) and then
 * append the singleton list [1] to the end to yield [3,2,1]. *)
fun reverse(list:intlist):intlist = 
  case list of
    Empty => Empty
  | LIST(hd,tl) => append(reverse(tl), LIST(hd,Empty)) 

fun inc(x:int):int = x + 1;
fun square(x:int):int = x * x;

(* given [i1,i2,...,in] return [i1+1,i2+1,...,in+1] *)
fun addone_to_all(list:intlist):intlist = 
  case list of
    Empty => Empty
  | LIST(hd,tl) => LIST(inc(hd), addone_to_all(tl))

(* given [i1,i2,...,in] return [i1*i1,i2*i2,...,in*in] *)

fun square_all(list:intlist):intlist = 
  case list of
    Empty => Empty
  | LIST(hd,tl) => LIST(square(hd), square_all(tl))

(* given a function f and [i1,...,in], return [f(i1),...,f(in)].
 * Notice how we factored out the common parts of addone_to_all
 * and square_all. *)
fun do_function_to_all(f:int->int, list:intlist):intlist = 
  case list of
    Empty => Empty
  | LIST(hd,tl) => LIST(f(hd), do_function_to_all(f,tl))

(* now we can define addone_to_all in terms of do_function_to_all *)
fun addone_to_all(list:intlist):intlist = do_function_to_all(inc, list);

(* same with square_all *)
fun square_all(list:intlist):intlist = do_function_to_all(square, list);

(* given [i1,i2,...,in] return i1+i2+...+in (also defined above) *)

fun sum(list:intlist):int = 
  case list of
    Empty => 0
  | LIST(hd,tl) => hd + sum(tl)

(* given [i1,i2,...,in] return i1*i2*...*in *)
fun product(list:intlist):int = 
  case list of
    Empty => 1
  | LIST(hd,tl) => hd * product(tl)

(* given f, b, and [i1,i2,...,in], return f(i1,f(i2,...,f(in,b))).
 * Again, we factored out the common parts of sum and product. *)
fun collapse(f:(int * int) -> int, b:int, list:intlist):int = 
  case list of
    Empty => b
  | LIST(hd,tl) => f(hd,collapse(f,b,tl))

(* Now we can define sum and product in terms of collapse *)
fun sum(list:intlist):int = 
  let
    fun add(i1:int,i2:int):int = i1 + i2
  in 
    collapse(add,0,list)
  end

fun product(list:intlist):int = 
  let
    fun mul(i1:int,i2:int):int = i1 * i2
  in
    collapse(mul,1,list)
  end

(* Here, we use an anonymous function instead of declaring add and mul.
 * After all, what's the point of giving those functions names if all
 * we're going to do is pass them to collapse? *)
fun sum(list:intlist):int = collapse((fn (i1:int,i2:int) => i1+i2),0,list);

fun product(list:intlist):int = collapse((fn (i1:int,i2:int) => i1*i2),1,list);

(* And here, we just pass the operators directly... *)
fun sum(list:intlist):int = collapse(op +, 0, list);

fun product(list:intlist):int = collapse(op *, 1, list);
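
To make the order in which collapse combines elements concrete, here is a small usage sketch. It repeats the intlist and collapse definitions so that it can be run on its own; the names oneToFour, total, prod, len and alt are illustrative.

```sml
datatype intlist = Empty | LIST of int * intlist

fun collapse(f:(int * int) -> int, b:int, list:intlist):int = 
  case list of
    Empty => b
  | LIST(hd,tl) => f(hd,collapse(f,b,tl))

val oneToFour = LIST(1, LIST(2, LIST(3, LIST(4, Empty))))

(* 1 + (2 + (3 + (4 + 0))) = 10 *)
val total = collapse(op +, 0, oneToFour)

(* the product of the elements, starting from 1, is 24 *)
val prod = collapse(op *, 1, oneToFour)

(* length: ignore each element and add one per LIST cell; gives 4 *)
val len = collapse(fn (_, n) => n + 1, 0, oneToFour)

(* the nesting matters for subtraction:
 * 1 - (2 - (3 - (4 - 0))) = ~2 *)
val alt = collapse(op -, 0, oneToFour)
```

The last binding shows that collapse associates to the right, so the base value b is combined with the last element first.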

Pattern Matching in (Regular) SML Lists

(* Return the first element of the list, if any *)
fun hd(l: int list): int = 
    case l of
	[] => raise Fail("empty list")
      | i::_ => i


(* Return the rest of the list after the first element *)
fun tl(l: int list): int list = 
    case l of 
	[] => raise Fail("empty list")
      | _::tl => tl


(* Append l1 to l0 *)
fun append(l0: int list, l1: int list) : int list = 
    case l0 of
	[] => l1
      | i0::rest => i0 :: append(rest,l1)

(* Return the reversal of the list *)

fun rev(l: int list): int list = 
    case l of 
	[] => []
      | hd::rest => append(rev(rest),[hd]);

(* Add one to each element in the list *)
fun addOneToEach(l: int list): int list =
    case l of
        [] => []
      | hd::rest => (hd+1) :: addOneToEach(rest)

(* Combine all elements in list using f *)
fun collapse(f:int*int->int, b:int, l: int list): int =
    case l of
        [] => b
      | hd::rest => f(hd,collapse(f,b,rest))

(* Apply the function to the corresponding elements from each list,
and return the resulting list *)
fun apply_pairwise(f:int*int->int, list0: int list, list1: int list): int list =
    case (list0, list1) of 
	([], []) => []
      | (h0::rest0, h1::rest1) => f(h0,h1) :: apply_pairwise(f,rest0, rest1)
      | (_,_) => raise Fail("Lists do not have equal length")
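
A short usage sketch for apply_pairwise follows, together with the Basis Library functions that play roughly the same roles as the hand-written ones above. The function is repeated so the block is self-contained; the value names are illustrative. One difference worth noting: ListPair.map silently stops at the end of the shorter list instead of raising an exception.

```sml
fun apply_pairwise(f:int*int->int, list0: int list, list1: int list): int list =
    case (list0, list1) of 
        ([], []) => []
      | (h0::rest0, h1::rest1) => f(h0,h1) :: apply_pairwise(f,rest0, rest1)
      | (_,_) => raise Fail("Lists do not have equal length")

(* Element-wise sum of two lists of equal length: [11,22,33] *)
val sums = apply_pairwise(op +, [1,2,3], [10,20,30])

(* Unequal lengths raise Fail; here we handle it and fall back to [] *)
val bad = apply_pairwise(op +, [1,2], [1]) handle Fail _ => []

(* Basis Library counterparts *)
val sums' = ListPair.map (op +) ([1,2,3], [10,20,30])  (* like apply_pairwise *)
val total = foldr (op +) 0 [1,2,3,4]                   (* like collapse *)
val incd  = map (fn x => x + 1) [1,2,3]                (* like addOneToEach *)
```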