We graded homework 1 - grades are posted on CMS. TAs will talk about the most typical errors in section.
You have seen many examples of simple pattern matching in the last set of lecture notes. We mentioned there that patterns can be nested inside (contained in) other patterns, giving rise to deep pattern matching. We will see an example of this when we discuss binary trees.
There are limits to what patterns can do, however. In particular, we cannot use identifiers in a pattern to enforce equality constraints. Consider a list of integers, and a function that must decide whether the first two elements of the list, if they exist, are equal. One 'natural' implementation of this function is the following:
fun first2Equal(l: int list): bool =
  case l of
    x::x::_ => true
  | _ => false
SML does not allow for such constraints in patterns; it complains of "duplicate variables in pattern(s)." Why do you think this limitation exists?
Of course, one way to solve the problem above is to use a pattern with two identifiers, and then compare their associated values:
fun first2Equal(l: int list): bool =
  case l of
    x::y::_ => x = y
  | _ => false
Another limitation is that patterns cannot contain real constants. For example, SML rejects the following function:
fun isZero(r: real): bool =
  case r of
    0.0 => true
  | _ => false
To understand why this happens, remember that real is not an equality type in SML: the = operator is not defined on real values. Hence, SML cannot test whether a real constant given in a pattern is equal to another real value.
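One way to write isZero without a real pattern is to call a comparison function explicitly. The sketch below uses Real.== and Real.abs from the Standard ML Basis Library. The second function, isNearZero, and its tolerance parameter eps are hypothetical names introduced here only for illustration.

```sml
(* Exact comparison through the Basis Library rather than a pattern. *)
fun isZero (r: real): bool = Real.== (r, 0.0)

(* Hypothetical variant: treat r as zero when it lies within eps of 0.0. *)
fun isNearZero (r: real, eps: real): bool = Real.abs r < eps

val a = isZero 0.0                      (* true *)
val b = isZero 0.1                      (* false *)
val c = isNearZero (1.0E~9, 1.0E~6)     (* true *)
```

Whether an exact or a tolerance-based test is appropriate depends on how the real value was computed.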
While working on homework 1, you have probably realized that SML's type checking features help you to discover and fix programming errors. Helping achieve program correctness is one of the major reasons types (in fact: type systems) have been introduced in programming languages. So are there disadvantages to strict type checking? Well, yes.
Consider, for example, the intlist datatype introduced last time:
datatype intlist = Empty | LIST of int * intlist
This declaration was meant to emulate SML's int list type. But what about lists of reals? Well, to implement lists of reals we would have to take the datatype declaration and the code that implements intlist and replace int with real everywhere (we would also, probably, rename the type to reallist). Except for the type of its elements, an intlist is very much like a reallist. Given what we know about SML's type system, we must define a separate list type for each of ints and reals. This feels (and is!) wasteful.
In fact, any list looks, from the perspective of the functionality we have implemented, like an intlist. In effect, it appears that there exists a higher-level concept of genericlist that is independent of its underlying element type. We can think of genericlist as a type, but this would not be quite right. Genericlist is not a type in the usual sense; it is in fact a function that, given an underlying element type, produces a corresponding list type. If SML did not force us to plug a specific type into the definition of a list, we could write a very general datatype declaration, similar to the one below:
datatype genericlist = Empty | LIST of <unspecified underlying element type> * genericlist
Actually, SML allows for just such definitions using so-called type variables, which are just placeholders for unspecified types. Type variables are written as 'a, 'b, 'c, ... and pronounced alpha, beta, gamma, etc.
Types whose definition depends on other - unspecified - types, are called parameterized types. Our 'universal' custom list type can be written as follows:
datatype 'a genericlist = Empty | LIST of 'a * 'a genericlist
Except for a slightly more complicated notation, our list type is now as general as that of the predefined lists in SML. The type corresponding to our 'a genericlist is SML's 'a list.
Note that the name of the type is 'a genericlist and not just genericlist. This emphasizes the type parameter that contributes to the definition of genericlist. We can now implement functions that act on our parameterized list type. Here are two functions, one that computes the length of a list, and one that implements the head operator:
fun length(l: 'a genericlist): int =
  case l of
    Empty => 0
  | LIST(_, tail) => 1 + (length tail)
fun head(l: 'a genericlist): 'a =
  case l of
    Empty => raise Fail "illegal operation"
  | LIST(h, _) => h
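To see the two functions at work on more than one element type, here is a minimal sketch that repeats the declarations above and applies them to an int genericlist and a string genericlist. The value names ints, strs, n1, n2, h1 and h2 are hypothetical and chosen only for this example.

```sml
datatype 'a genericlist = Empty | LIST of 'a * 'a genericlist

fun length (l: 'a genericlist): int =
  case l of
    Empty => 0
  | LIST (_, tail) => 1 + length tail

fun head (l: 'a genericlist): 'a =
  case l of
    Empty => raise Fail "illegal operation"
  | LIST (h, _) => h

(* Hypothetical sample values: the same functions work at both types. *)
val ints = LIST (1, LIST (2, LIST (3, Empty)))    (* int genericlist *)
val strs = LIST ("a", LIST ("b", Empty))          (* string genericlist *)

val n1 = length ints    (* 3 *)
val n2 = length strs    (* 2 *)
val h1 = head ints      (* 1 *)
val h2 = head strs      (* "a" *)
```

Note that this length shadows the predefined length on SML lists for the rest of the session.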
Functions - like those above - that work on more than one type are called polymorphic. Polymorphism is a desirable property because it saves work by allowing the specification of algorithms simultaneously for many types.
Note that it is possible for a polymorphic function to return a well-specified, non-parameterized type (see length), as well as parameterized types (see head).
Polymorphism in SML is particularly powerful and simple to implement. You should be careful not to confuse our notion of polymorphism with the notion of polymorphism used in other languages.
In Java, for example, one can define a hierarchy of related classes, with the entire hierarchy sharing a number of virtual methods. In the root class of this hierarchy, one can define methods that are written in terms of these virtual methods (i.e. call them). Such methods are generic within the class hierarchy because all their class-specific actions are encapsulated inside virtual methods (and are thus hidden at the root level). When we call the generic method using an object reference, the generic algorithm is executed, but the virtual methods provide the customization needed by each derived class.
Binary trees (as well as their specializations and generalizations) have many interesting properties to which we will return shortly. For now, we will only look at binary trees as examples of parameterized data types.
A binary tree is either empty, or it consists of a collection of nodes. All non-empty trees have a single distinguished node called the root. Each node in a binary tree can have zero, one, or two children. No two nodes share any of their children. If node a is a child of node b, then node b is the parent of node a. Given nodes a0, a1, a2, ..., ak (k>=1) in a tree, such that ai is a child of node ai+1 for all i=0, ..., k-1, we say that node ak is an ancestor of node a0, and that a0 is a descendant of ak. In a tree, all nodes other than the root are descendants of the root.
Binary trees can represent information in their very structure. In most cases, however, we will consider that each node carries some additional information.
Here is an example of a binary tree with integers in its nodes:
    2
   / \
  1   4
     / \
    3   5
Because of the way in which we represent trees, we often find it convenient to distinguish between the children of a node by referring to the left child and the right child of the respective node. A node can have a left child even if it does not have a right child (and vice versa).
We can use parameterized types to create a general binary tree datatype:
datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree
The example above can be represented by an int tree as follows:
Node(Node(Empty, 1, Empty), 2, Node(Node(Empty, 3, Empty), 4, Node(Empty, 5, Empty)))
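As a quick check of this representation, the sketch below declares the tree type with three components per node (left subtree, value, right subtree), builds the example tree, and lists its values from left to right. The function inorder and the value names example and values are hypothetical helpers, not part of these notes.

```sml
datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree

(* Hypothetical helper: left subtree first, then the node, then the right subtree. *)
fun inorder (t: 'a tree): 'a list =
  case t of
    Empty => []
  | Node (l, v, r) => inorder l @ (v :: inorder r)

val example =
  Node (Node (Empty, 1, Empty), 2,
        Node (Node (Empty, 3, Empty), 4, Node (Empty, 5, Empty)))

val values = inorder example    (* [1,2,3,4,5] *)
```

Because inorder never inspects the node values, it is polymorphic and works for any 'a tree.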
Many algorithms on trees rely on detecting certain node configurations, and often these configurations involve nodes, their children, and sometimes their children's children. Detecting such configurations is easily achieved using deep patterns (patterns in patterns). Let us write a pattern that identifies nodes that have a left child, but no right child:
case n: int tree of   (* yes, you can add type specifications to expressions in 'case' *)
    ...
  | Node(Node(l1, v1, r1), v0, Empty) => (* we now have access to the values *) ...
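The fragment above can be turned into a complete function. The following is a minimal sketch, assuming the three-component Node constructor used in the int tree example. The function name onlyLeftChild and its result type are hypothetical: it returns the node's value paired with its left child's value when the pattern matches.

```sml
datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree

(* Hypothetical helper: SOME (v0, v1) when the node holds v0, has a left
   child holding v1, and has no right child; NONE for every other shape. *)
fun onlyLeftChild (n: 'a tree): ('a * 'a) option =
  case n of
    Node (Node (_, v1, _), v0, Empty) => SOME (v0, v1)
  | _ => NONE

val t1 = onlyLeftChild (Node (Node (Empty, 1, Empty), 2, Empty))    (* SOME (2,1) *)
val t2 = onlyLeftChild (Node (Empty, 2, Node (Empty, 3, Empty)))    (* NONE *)
val t3 = onlyLeftChild (Node (Empty, 2, Empty))                     (* NONE *)
```

The wildcard _ replaces l1 and r1 here because the sketch does not use the grandchildren.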
Generic datatypes and polymorphic functions allow us to define powerful functions.
Let us consider a generic SML list 'a list and let us assume that we need to apply a function to each of its elements, i.e. we need to map the list elements from the domain of the function onto its range. The results of the function applications must be collected into another list, so that we preserve order (i.e. the nth list element of the result must correspond to the function applied to the nth element of the input list).
fun map(f: 'a -> 'b, l: 'a list): 'b list =
  case l of
    [] => []
  | h::t => (f h)::(map (f, t))
Here is how we square each element of a list:
- map(fn x => x * x, [1, 2, 3, 4]);
val it = [1,4,9,16] : int list
Note that in our definition of map we chose the most general specification for function f: it takes an input of type 'a and produces an output of type 'b. There is no connection between these two types; they can be the same (as in the example above), or different, as below:
- map(fn x => size x, ["short", "longer", "longest"]);
val it = [5,6,7] : int list
In many cases generality has little or no cost, and it can offer many advantages. Use it!
SML has a predefined implementation of map. There is a subtle difference, however, and you might first notice it if you compare the types of the two versions:
(type of predefined map):      ('a -> 'b) -> 'a list -> 'b list
(type of our version of map):  ('a -> 'b) * 'a list -> 'b list
The types are similar, but not identical. It turns out that SML's map is curried. You will talk about currying in section tomorrow.
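As a small preview, the difference between the two types shows up in how the functions are called. The sketch below repeats our pair-taking map and compares it with the predefined version, reached here as List.map so that both are visible at once. The value names squares1, squares2, squareAll and squares3 are hypothetical.

```sml
(* The version from these notes: a single argument that is a pair. *)
fun map (f: 'a -> 'b, l: 'a list): 'b list =
  case l of
    [] => []
  | h :: t => f h :: map (f, t)

val squares1 = map (fn x => x * x, [1, 2, 3, 4])        (* [1,4,9,16] *)

(* The predefined version takes its arguments one at a time. *)
val squares2 = List.map (fn x => x * x) [1, 2, 3, 4]    (* [1,4,9,16] *)

(* Supplying only the function yields a new function on lists. *)
val squareAll = List.map (fn x => x * x)                (* int list -> int list *)
val squares3 = squareAll [5, 6]                         (* [25,36] *)
```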
Function map generates a list of values, and each of these values is calculated in isolation (i.e. the function does not, and cannot, consider the value of the kth list element when computing the value for the nth list element). In certain applications, however, the result depends on all the list elements. In such cases we use foldl and foldr. You will continue to discuss these in section.
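To give a first impression of results that depend on every element, here is a minimal sketch using the predefined foldl and foldr, which take a combining function, a starting value, and a list. The value names sum, longest, reversed and copied are hypothetical.

```sml
(* Add up all the elements, starting from 0. *)
val sum = foldl (op +) 0 [1, 2, 3, 4]                   (* 10 *)

(* Length of the longest string seen so far, starting from 0. *)
val longest =
  foldl (fn (s, best) => Int.max (size s, best)) 0
        ["short", "longer", "longest"]                  (* 7 *)

(* foldl visits elements left to right, foldr right to left. *)
val reversed = foldl (op ::) [] [1, 2, 3]               (* [3,2,1] *)
val copied = foldr (op ::) [] [1, 2, 3]                 (* [1,2,3] *)
```

The last two lines differ only in the direction of traversal, which is why one reverses the list and the other rebuilds it unchanged.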