Tuples. Records. Lists. Recursive Datatypes. Pattern Matching.

You should be able to log in to CMS now. If not, email the TAs and they will add you.

Watch the web page and the newsgroup for announcements. We will usually post announcements in both places, but there will be exceptions.

As previously announced, the third section will convene on MW from 4:40 to 5:30 pm in 202 Thurston.

We had a software demo/install help session yesterday - we'll have one more late tonight.

This will be a whirlwind tour through some of the most important features of SML. You saw some of these in section yesterday.

Tuples group together a fixed number of values. If a tuple contains **n**
elements, then we call it an **n-tuple**. SML recognizes **0-tuple**s, but
it does not have 1-tuples. A tuple's elements do not have to be of the same
type; they are separated by commas and surrounded by parentheses:
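**()**, **(3, "aha!")**, **(one, (two, three))**. If the
elements of a tuple have, successively, types **t_{1}**, ..., **t_{n}**, then the type of the tuple is `t_{1} * ... * t_{n}`.

You can select the **n^{th}** data element from a tuple using the **#n** operator; for example, **#2 (3, "aha!")** evaluates to **"aha!"**.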
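To make tuple construction and selection concrete, here is a minimal sketch. The names `pair`, `nested`, and so on are ours, chosen only for illustration.

```sml
(* A pair of an int and a string; its type is int * string. *)
val pair = (3, "aha!")

(* #1 and #2 select the first and second components. *)
val first = #1 pair          (* 3 *)
val second = #2 pair         (* "aha!" *)

(* Tuples nest: the second component here is itself a pair. *)
val nested = (1, (2, 3))
val inner = #2 (#2 nested)   (* 3 *)

(* The 0-tuple () is the only value of type unit. *)
val nothing = ()
```

Note that the selector is applied to a tuple whose type is already known; this is what lets SML decide what `#2` means.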

Records are similar to tuples, in that they group together various data
elements. A record has **fields**, and these are named. While
a tuple is an ordered collection of data values, a record is an unordered
collection of labeled data values.

Here is one example:

    {commonName="wolf", latinName="Canis Lupus L.", range="Yellowstone", population=174}

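The type of a record with labels **l_{1}**, ..., **l_{n}**, whose fields have types **t_{1}**, ..., **t_{n}**, is written `{l_{1}: t_{1}, ..., l_{n}: t_{n}}`.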
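As a small sketch of how records behave, the code below builds the wolf record twice with the fields in different orders and selects one field by its label. The names `wolf`, `sameWolf`, and `pop` are ours, for illustration only.

```sml
val wolf = {commonName = "wolf", latinName = "Canis Lupus L.",
            range = "Yellowstone", population = 174}

(* #label selects the field with that label. *)
val pop = #population wolf            (* 174 *)

(* Field order does not matter: this is the same record value. *)
val sameWolf = {population = 174, range = "Yellowstone",
                latinName = "Canis Lupus L.", commonName = "wolf"}

val equal = (wolf = sameWolf)         (* true *)
```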

The **list** datatype encapsulates the concept of a sequence of 0 or more
data values of the same type. Lists are a simple example of a **recursive
datatype**. Indeed, a list can best be defined in terms of itself. To make
things concrete, think of a list of integers. A list of integers could be (a) an
empty list, or (b) a non-empty list consisting of at least one element. In the
latter case, the list consists of a first element (the **head** of the list)
that is prepended to a shorter, possibly empty, integer list. So a list is ...
a list. The definition seems circular, and in some sense it is. We are saved by
the fact that for any given finite list the process of describing a list in
terms of a list having one fewer element is finite.
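The recursive description above can be sketched directly in SML, using the built-in `::` operator, which prepends a head to an existing list. The value names are ours.

```sml
(* Case (a): the empty list is a list. *)
val none : int list = []

(* Case (b): a head prepended to a shorter list, repeated until we reach []. *)
val viaCons = 1 :: (2 :: (3 :: []))

(* The bracket notation is shorthand for the same value. *)
val viaBrackets = [1, 2, 3]

val same = (viaCons = viaBrackets)   (* true *)
```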

Here are a few examples of lists and list operations:

    - val empty = [];
    val empty = [] : 'a list
    - hd empty;
    uncaught exception Empty
      raised at: boot/list.sml:36.38-36.43
    - tl empty;
    uncaught exception Empty
      raised at: boot/list.sml:37.38-37.43
    - rev empty;
    val it = [] : ?.X1 list
    - length empty;
    val it = 0 : int
    - null [];
    val it = true : bool
    - val l = [1, 2, 3, 4];
    val l = [1,2,3,4] : int list
    - hd l;
    val it = 1 : int
    - tl l;
    val it = [2,3,4] : int list
    - rev l;
    val it = [4,3,2,1] : int list
    - length l;
    val it = 4 : int
    - null l;
    val it = false : bool

In the output above we have deleted some warnings. Note the following:

- The empty list **[]** by itself does not have a well-defined type (at least, not in terms of types we have introduced until now). SML needs more context to determine the type of **[]** unambiguously.
- We cannot extract the head (i.e. the first element, see **hd**) of an empty list; the same holds for the tail (i.e. the list that results after the first element has been removed, see **tl**).

There are more list operations defined in SML. Study them: the more you know, the more powerful the programs you will be able to write.
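A few of these additional operations are sketched below. This is only a sample: `@`, `List.nth`, `List.last`, and `List.take` come from the standard Basis library, while the value names are ours.

```sml
val l = [1, 2, 3, 4]

val joined = l @ [5, 6]            (* [1,2,3,4,5,6]: @ concatenates two lists *)
val third = List.nth (l, 2)        (* 3: List.nth counts from 0 *)
val final = List.last l            (* 4 *)
val firstTwo = List.take (l, 2)    (* [1,2] *)

(* A type annotation supplies the context that [] alone lacks. *)
val noInts : int list = []
```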

Lists are not the only recursive datatypes; indeed, recursive datatypes are
pervasive. We can declare such datatypes ourselves with the help of the **datatype**
declaration (brackets indicate optional components):

    datatype Y = X_{1} [of t_{1}] | ... | X_{n} [of t_{n}]

Here the **X_{i}** are called **datatype constructors**.

Not all types declared in a **datatype** declaration need to be recursive:

    datatype colors = WHITE | RED | GREEN | BLUE | BLACK

    datatype merged = INT of int | REAL of real

The second declaration creates a type that is, in effect, the union of
the **int** and **real** types. Examples of values of this type are **INT(3)**
and **REAL(7.14)**.
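One way to see why a union type is useful: a list must hold elements of a single type, but a `merged list` can mix integers and reals. A minimal sketch, with value names of our own choosing:

```sml
datatype colors = WHITE | RED | GREEN | BLUE | BLACK
datatype merged = INT of int | REAL of real

(* A constructor without "of" is already a value of the type. *)
val c = RED

(* Constructors with "of" wrap a value of the given type. *)
val mixed = [INT 3, REAL 7.14, INT 42]   (* has type merged list *)
```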

Let us now define the **int list** type. We will later learn how to define
a list type of the same generality as SML's list; for now, we must fix the list
element's type.

    datatype intlist = Empty | LIST of int * intlist

The list **[1, 2, 3, 4]** can now be represented as **LIST(1,
LIST(2, LIST(3, LIST(4, Empty))))**.

We will see many other recursive datatype declarations in the course.

We know how to put together tuples, records, lists, and custom non-recursive and
recursive datatypes. We even know how to access the elements of a tuple, record,
or a list. But how do we access the components of a **LIST**? Pattern
matching offers a powerful solution to this problem. As a bonus, pattern
matching will also simplify our access to tuple, record, and list elements.

In most cases, we will use pattern matching in combination with **case**
expressions. The **BNF** definition of **case** expressions is

    case e of p_{1} => e_{1} | ... | p_{n} => e_{n}

where **e** and **e_{i}** are expressions, and the **p_{i}** are patterns. Patterns are defined as

    p ::= _ | c | x | (p_{1}, ..., p_{n}) | {x_{1} = p_{1}, ..., x_{n} = p_{n}} | [] | p_{1} :: p_{2} | X | X(p)

where **p** and **p_{i}** are patterns, **c** is a constant, **x** and **x_{i}** are identifiers, and **X** is a datatype constructor.

The pattern **_** (the underscore character) is a wildcard: it matches any
value, and the value it matches is ignored. The pattern containing the **::**
operator is a list pattern; in it, **p_{1}** is the pattern for the
head of the list, and **p_{2}** is the pattern for its tail.

An important feature of SML patterns is that they can be nested: a pattern can be present inside another pattern.
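As an illustration of nesting, the sketch below places tuple patterns inside a list pattern. The function name `sumOfFirsts` is hypothetical, introduced only for this example.

```sml
(* Add the first components of the first two pairs in the list;
 * return 0 if the list has fewer than two pairs. *)
fun sumOfFirsts (l: (int * int) list): int =
  case l of
    (a, _) :: (b, _) :: _ => a + b   (* tuple patterns nested in a list pattern *)
  | _ => 0

val r1 = sumOfFirsts [(1, 2), (3, 4), (5, 6)]   (* 4 *)
val r2 = sumOfFirsts [(1, 2)]                   (* 0 *)
```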

Patterns are used as templates against which **values** are matched. Values
are defined as

    v ::= c | (v_{1}, ..., v_{n}) | {x_{1} = v_{1}, ..., x_{n} = v_{n}} | [v_{1}, ..., v_{n}] | X | X(v)

where **v** and **v_{i}** are values, **c** is a constant, the **x_{i}** are identifiers, and **X** is a datatype constructor.

Here are the rules of pattern matching; they must be applied recursively:

- The **_** pattern matches everything;
- The **c** pattern matches only itself;
- Pattern **x** matches every value, with the additional result that identifier **x** is bound to the value it matched;
- Pattern **(p_{1}, ..., p_{n})** matches value **(v_{1}, ..., v_{n})** iff all patterns **p_{i}** match their corresponding values **v_{i}**;
- Pattern **{x_{1} = p_{1}, ..., x_{n} = p_{n}}** matches value **{x_{1} = v_{1}, ..., x_{n} = v_{n}}** iff patterns **p_{i}** match their corresponding values **v_{i}**;
- Pattern **[]** matches **[]**;
- Pattern **p_{1}::p_{2}** matches list **l** iff pattern **p_{1}** matches **hd l** and pattern **p_{2}** matches **tl l**;
- Pattern **X** matches **X**;
- Pattern **X(p)** will match only value **X(v)**, assuming that pattern **p** matches value **v**.

Patterns are tested in the order in which they are written; the result of the
**case** expression is the value of the expression corresponding to the first
pattern that matches.
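The rules and the ordering can be seen at work in the following sketch, which uses the **merged** type from earlier. The function name `describe` is ours.

```sml
datatype merged = INT of int | REAL of real

fun describe (m: merged): string =
  case m of
    INT 0 => "zero"                    (* X(p) where p is the constant 0 *)
  | INT n => "int " ^ Int.toString n   (* X(p) where p is an identifier; n is bound *)
  | REAL _ => "some real"              (* X(p) where p is the wildcard *)

val d1 = describe (INT 0)      (* "zero" *)
val d2 = describe (INT 5)      (* "int 5" *)
val d3 = describe (REAL 1.5)   (* "some real" *)
```

The value `INT 0` matches both of the first two patterns; it yields "zero" only because that pattern is written first. Had we listed `INT n` first, the `INT 0` rule could never be reached.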

When performing pattern matching in a **case** expression, SML has the
ability to check whether the cases that have been listed are **exhaustive** or
not. Many subtle errors are avoided because the programmer is made aware of a
non-exhaustive match.
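A minimal sketch of what can go wrong: the first function below omits the empty-list case, so we would expect the compiler to warn that the match is not exhaustive (the exact wording of the warning depends on the implementation). The function names and the `default` parameter are ours.

```sml
(* Non-exhaustive: there is no rule for []. If this function is applied
 * to the empty list, the Match exception is raised at run time. *)
fun unsafeHead (l: int list): int =
  case l of
    x :: _ => x

(* Exhaustive: every int list matches one of the two patterns. *)
fun safeHead (l: int list, default: int): int =
  case l of
    [] => default
  | x :: _ => x

val h1 = safeHead ([], 7)        (* 7 *)
val h2 = safeHead ([4, 5], 7)    (* 4 *)
```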

To clarify the ideas above, let us examine a few examples:

    (* return true iff the length of the argument list is at least two *)
    fun atLeastTwo(l: int list): bool =
      case l of
        [] => false
      | _::[] => false
      | _ => true

Note that we don't really care about the values here; all we are interested in
is the structure of the argument. The pattern **_::[]** could also have
been written as **[_]**.

    (* is at least one of the tuple's components equal to 0? *)
    fun atLeastOneZero(tuple: int * int): bool =
      case tuple of
        (0, _) => true
      | (_, 0) => true
      | _ => false

In this example we use the fact that patterns are tested in the order in
which they are written: if the first component of the argument tuple is 0, then
the first pattern will match; otherwise, if the second component is 0, then the
second pattern will match. If neither of the first two patterns matches, then the
tuple contains no zeros at all, and we can return **false** without examining the
argument any further.

Here is a simple pattern-matching example on records:

    (* assume that an animal is endangered if its population falls under 100 *)
    type record = {name:string, range:string, population:int};

    fun isEndangered(r: record): bool =
      case r of
        {name=_, range=_, population=p } => p < 100

The **type** declaration gives a name to a type; it creates an alias for
it. You will probably feel that the **case** expression is not quite
appropriate here. After all, all we need is to extract a value from a record:
pattern matching is useful, but there are no real cases here! Indeed, we can
rewrite the function above by using pattern matching in a **val**
declaration. We have not covered this feature above, but you might find it
useful in situations similar to the one at hand:

    fun isEndangered(r: record): bool =
      let
        val {name=_, range=_, population=p } = r
      in
        p < 100
      end
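Pattern matching in a **val** declaration is not limited to records. The sketch below uses it to take a tuple apart, and also shows the record selector as another way to write `isEndangered`. The names `quotient`, `remainder`, and `wolf` are ours.

```sml
type record = {name: string, range: string, population: int}

(* A tuple pattern on the left of a val binds both names at once. *)
val (quotient, remainder) = (17 div 5, 17 mod 5)   (* 3 and 2 *)

(* The #population selector extracts the same field as the val pattern. *)
fun isEndangered (r: record): bool = #population r < 100

val wolf : record = {name = "wolf", range = "Yellowstone", population = 174}
val wolfEndangered = isEndangered wolf             (* false *)
```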

We provide below a number of functions that work on the **intlist** type we
have defined above. Examine them carefully: note how similar some functions
are, and how recursion is used to implement "loops":

    (* test to see if the list is empty *)
    fun is_empty(xs:intlist):bool =
      case xs of
        Empty => true
      | LIST(_,_) => false

    (* Return the number of elements in the list *)
    fun length(xs:intlist):int =
      case xs of
        Empty => 0
      | LIST(i:int,rest:intlist) => 1 + length(rest)

    (* Notice that the case expressions for lists all have the same
     * form -- a case for the empty list (Empty) and a case for a LIST.
     * Also notice that for most functions, the LIST case involves a
     * recursive function call. *)

    (* Return the sum of the elements in the list *)
    fun sum(xs:intlist):int =
      case xs of
        Empty => 0
      | LIST(i:int,rest:intlist) => i + sum(rest)

    (* Create a string representation of a list *)
    fun toString(xs: intlist):string =
      case xs of
        Empty => ""
      | LIST(i:int, Empty) => Int.toString(i)
      | LIST(i:int, LIST(j:int, rest:intlist)) =>
          Int.toString(i) ^ "," ^ toString(LIST(j,rest))

    (* Return the first element (if any) of the list *)
    fun head(is: intlist):int =
      case is of
        Empty => raise Fail("empty list!")
      | LIST(i,tl) => i

    (* Return the rest of the list after the first element *)
    fun tail(is: intlist):intlist =
      case is of
        Empty => raise Fail("empty list!")
      | LIST(i,tl) => tl

    (* Return the last element of the list (if any) *)
    fun last(is: intlist):int =
      case is of
        Empty => raise Fail("empty list!")
      | LIST(i,Empty) => i
      | LIST(i,tl) => last(tl)

    (* Return the ith element of the list, counting from 1 *)
    fun ith(is: intlist, i:int):int =
      case (i,is) of
        (_,Empty) => raise Fail("empty list!")
      | (1,LIST(hd,tl)) => hd
      | (n,LIST(hd,tl)) =>
          if (n <= 0) then raise Fail("bad index")
          else ith(tl, n - 1)  (* skip the head; look for element n - 1 of the tail *)

    (* Append two lists: append([1,2,3],[4,5,6]) = [1,2,3,4,5,6] *)
    fun append(list1:intlist, list2:intlist):intlist =
      case list1 of
        Empty => list2
      | LIST(i,tl) => LIST(i,append(tl,list2))

    (* Reverse a list: reverse([1,2,3]) = [3,2,1].
     * Notice that we compute this by reversing the tail of the
     * list first (e.g., compute reverse([2,3]) = [3,2]) and then
     * append the singleton list [1] to the end to yield [3,2,1]. *)
    fun reverse(list:intlist):intlist =
      case list of
        Empty => Empty
      | LIST(hd,tl) => append(reverse(tl), LIST(hd,Empty))

    fun inc(x:int):int = x + 1;

    fun square(x:int):int = x * x;

    (* given [i1,i2,...,in] return [i1+1,i2+1,...,in+1] *)
    fun addone_to_all(list:intlist):intlist =
      case list of
        Empty => Empty
      | LIST(hd,tl) => LIST(inc(hd), addone_to_all(tl))

    (* given [i1,i2,...,in] return [i1*i1,i2*i2,...,in*in] *)
    fun square_all(list:intlist):intlist =
      case list of
        Empty => Empty
      | LIST(hd,tl) => LIST(square(hd), square_all(tl))

    (* given a function f and [i1,...,in], return [f(i1),...,f(in)].
     * Notice how we factored out the common parts of addone_to_all
     * and square_all. *)
    fun do_function_to_all(f:int->int, list:intlist):intlist =
      case list of
        Empty => Empty
      | LIST(hd,tl) => LIST(f(hd), do_function_to_all(f,tl))

    (* now we can define addone_to_all in terms of do_function_to_all *)
    fun addone_to_all(list:intlist):intlist =
      do_function_to_all(inc, list);

    (* same with square_all *)
    fun square_all(list:intlist):intlist =
      do_function_to_all(square, list);

    (* given [i1,i2,...,in] return i1+i2+...+in (also defined above) *)
    fun sum(list:intlist):int =
      case list of
        Empty => 0
      | LIST(hd,tl) => hd + sum(tl)

    (* given [i1,i2,...,in] return i1*i2*...*in *)
    fun product(list:intlist):int =
      case list of
        Empty => 1
      | LIST(hd,tl) => hd * product(tl)

    (* given f, b, and [i1,i2,...,in], return f(i1,f(i2,...,f(in,b))).
     * Again, we factored out the common parts of sum and product. *)
    fun collapse(f:(int * int) -> int, b:int, list:intlist):int =
      case list of
        Empty => b
      | LIST(hd,tl) => f(hd,collapse(f,b,tl))

    (* Now we can define sum and product in terms of collapse *)
    fun sum(list:intlist):int =
      let
        fun add(i1:int,i2:int):int = i1 + i2
      in
        collapse(add,0,list)
      end

    fun product(list:intlist):int =
      let
        fun mul(i1:int,i2:int):int = i1 * i2
      in
        collapse(mul,1,list)
      end

    (* Here, we use an anonymous function instead of declaring add and mul.
     * After all, what's the point of giving those functions names if all
     * we're going to do is pass them to collapse? *)
    fun sum(list:intlist):int =
      collapse((fn (i1:int,i2:int) => i1+i2),0,list);

    fun product(list:intlist):int =
      collapse((fn (i1:int,i2:int) => i1*i2),1,list);

    (* And here, we just pass the operators directly... *)
    fun sum(list:intlist):int = collapse(op +, 0, list);

    fun product(list:intlist):int = collapse(op *, 1, list);
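To see how far `collapse` goes, here is a self-contained sketch that repeats the definitions of `intlist` and `collapse` from above and then uses them on a sample list. The name `sample` and the collapse-based `length` are ours, added for illustration.

```sml
datatype intlist = Empty | LIST of int * intlist

fun collapse (f: int * int -> int, b: int, list: intlist): int =
  case list of
    Empty => b
  | LIST (hd, tl) => f (hd, collapse (f, b, tl))

(* The list [1,2,3,4] written with our constructors. *)
val sample = LIST (1, LIST (2, LIST (3, LIST (4, Empty))))

val total = collapse (op +, 0, sample)     (* 1 + (2 + (3 + (4 + 0))) = 10 *)
val prod = collapse (op *, 1, sample)      (* 1 * (2 * (3 * (4 * 1))) = 24 *)

(* length also fits the mold: ignore each element and add one. *)
fun length (list: intlist): int =
  collapse (fn (_, n) => 1 + n, 0, list)

val count = length sample                  (* 4 *)
```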

    (* Return the first element of the list, if any *)
    fun hd(l: int list): int =
      case l of
        [] => raise Fail("empty list")
      | i::_ => i

    (* Return the rest of the list after the first element *)
    fun tl(l: int list): int list =
      case l of
        [] => raise Fail("empty list")
      | _::tl => tl

    (* Append l1 to l0 *)
    fun append(l0: int list, l1: int list) : int list =
      case l0 of
        [] => l1
      | i0::rest => i0 :: append(rest,l1)

    (* Return the reversal of the list *)
    fun rev(l: int list): int list =
      case l of
        [] => []
      | hd::rest => append(rev(rest),[hd]);

    (* Add one to each element in the list *)
    fun addOneToEach(l: int list): int list =
      case l of
        [] => []
      | hd::rest => (hd+1) :: addOneToEach(rest)

    (* Combine all elements in list using f *)
    fun collapse(f:int*int->int, b:int, l: int list): int =
      case l of
        [] => b
      | hd::rest => f(hd,collapse(f,b,rest))

    (* Apply the function to the corresponding elements from each list,
       and return the resulting list *)
    fun apply_pairwise(f:int*int->int, list0: int list, list1: int list): int list =
      case (list0, list1) of
        ([], []) => []
      | (h0::rest0, h1::rest1) => f(h0,h1) :: apply_pairwise(f,rest0, rest1)
      | (_,_) => raise Fail("Lists do not have equal length")
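As a usage sketch, the code below repeats `apply_pairwise` so that it can be run on its own, applies it to two lists of equal length, and then checks that lists of unequal length trigger the `Fail` exception. The value names are ours.

```sml
fun apply_pairwise (f: int * int -> int, list0: int list, list1: int list): int list =
  case (list0, list1) of
    ([], []) => []
  | (h0 :: rest0, h1 :: rest1) => f (h0, h1) :: apply_pairwise (f, rest0, rest1)
  | (_, _) => raise Fail "Lists do not have equal length"

val sums = apply_pairwise (op +, [1, 2, 3], [10, 20, 30])    (* [11,22,33] *)
val diffs = apply_pairwise (op -, [10, 20, 30], [1, 2, 3])   (* [9,18,27] *)

(* Unequal lengths reach the last pattern, which raises Fail;
 * the handler turns that into true. *)
val failed = (apply_pairwise (op +, [1], [1, 2]); false) handle Fail _ => true
```

The pattern `(_,_)` catches exactly the two remaining shapes, an empty list paired with a non-empty one in either order, because the first two patterns are tried before it.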