Recitation 2: Tuples, records and datatypes

Tuples

Every function in SML takes exactly one value and returns exactly one result. For instance, our square_root function takes one real value and returns one real value. The advantage of always taking one argument and returning one result is that the language is extremely uniform. Later, we'll see that this buys us a lot when it comes to composing functions.

But it looks like we can write functions that take more than one argument! For instance, we may write:

fun max(r1:real, r2:real):real =
  if r1 < r2 then r2 else r1

max(3.1415, 2.718)

and it appears as if max takes two arguments. In truth, max takes one argument that is a 2-tuple (also known as an ordered pair).

In general, an n-tuple is an ordered sequence of n values written in parentheses and separated by commas as (exp, exp, ..., exp). For instance, (1, "hello", true) is a 3-tuple that contains the integer 1 as its first component, the string "hello" as its second component, and the boolean value true as its third component. As another example, () is the empty tuple. This is called "unit" in SML.

When you call a function in SML, if it takes more than one argument, then you have to pass it a tuple of the arguments. For instance, when we write:

max(3.1415, 2.718)

we're passing the 2-tuple (3.1415, 2.718) to the function max. We could just as well write:

val args = (3.1415, 2.718)

max args  (* evaluates to 3.1415 *)

The type of an n-tuple is written (type * type * ... * type). For instance, the type of args above is (real * real). Similarly, the 3-tuple (1, "hello", true) has type (int * string * bool). Notice that max has type (real * real) -> real indicating that it takes one argument (a 2-tuple of reals) and returns one result (a real).

You can extract the components of a tuple by using the form "#n exp" where n is a number between 1 and the size of the tuple. For instance, #2 (1, "hello", true) evaluates to "hello", whereas #1 (3.1415, 2.718) evaluates to 3.1415.
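As a small illustration of tuple construction, tuple types, and the #n selectors, here is a sketch that can be pasted into the SML top level. The names t, first, second, third, and u are chosen only for this example.

```sml
(* A 3-tuple mixing types; its type is int * string * bool. *)
val t : int * string * bool = (1, "hello", true)

(* #n selects the nth component, counting from 1. *)
val first  = #1 t   (* 1 *)
val second = #2 t   (* "hello" *)
val third  = #3 t   (* true *)

(* The empty tuple () is the only value of type unit. *)
val u : unit = ()
```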

So, for instance, we can rewrite the max function as follows:

fun max(pair: real*real):real =
  if (#1 pair) < (#2 pair) then #2 pair
  else #1 pair

and this is completely equivalent to the first definition. This emphasizes that max really does take just one argument -- a pair of real numbers. But of course, it's a lot less readable than the first definition. We can get closer to the first definition by declaring local values r1 and r2 and binding them to the appropriate components of the pair:

fun max(pair: real*real):real =
  let val r1 = #1 pair
      val r2 = #2 pair
  in
    if r1 < r2 then r2
    else r1
  end

This is a little better because we avoid re-computing the same expressions over and over again. However, it's still not as succinct as our first definition of max. This is because the first definition uses pattern matching to implicitly deconstruct the 2-tuple and bind the components to variables r1 and r2. You can use pattern matching in a val declaration or in a function definition to deconstruct a tuple. A tuple pattern is always of the form (id:type, id:type,...,id:type). For instance, here is yet another version of max that uses a pattern in a val declaration to deconstruct the pair:

fun max(pair: real*real):real =
  let val (r1:real, r2:real) = pair 
  in
    if r1 < r2 then r2 else r1
  end

In the example above, the val declaration matches the pair against the tuple-pattern (r1:real, r2:real). This binds r1 to the first component of the pair (#1 pair) and r2 to the second component (#2 pair). A similar thing happens when you write a function using a tuple-pattern as in the original definition of max:

fun max(r1:real, r2:real):real =
  if r1 < r2 then r2 else r1;

Here, when we call max with the pair (3.1415, 2.718), the tuple is matched against the pattern (r1:real, r2:real), and r1 is bound to 3.1415 and r2 to 2.718. As we'll see later on, SML uses pattern matching in a number of places to simplify expressions.
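The same idea works in the other direction: since a function returns exactly one result, a function that needs to hand back two values can return a pair. The sketch below uses a hypothetical function min_max (not part of these notes) and a tuple pattern in a val declaration to take the result apart.

```sml
(* Takes one argument (a pair of reals) and returns one result (also a pair). *)
fun min_max(r1:real, r2:real):real * real =
  if r1 < r2 then (r1, r2) else (r2, r1)

(* A tuple pattern in a val declaration names both components at once. *)
val (lo:real, hi:real) = min_max(3.1415, 2.718)
(* lo is bound to 2.718 and hi to 3.1415 *)
```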

In summary:

every function in SML takes 1 argument and returns 1 result.
(exp, exp, ... , exp) creates an n-tuple.
tuple types look like (type * type * ... * type)
#n exp extracts the nth component of a tuple.
val (id:type,id:type,...,id:type) = exp matches the tuple expression exp against the tuple-pattern (id:type,id:type,...,id:type) and binds the identifiers in the pattern to the appropriate components of the tuple.
fun id(id:type,id:type,...,id:type):type = exp is a function declaration that takes an n-tuple as an argument and matches the tuple against the tuple-pattern (id:type,id:type,...,id:type).

Records

Records are similar to tuples, except that they carry an unordered collection of labelled data values rather than an ordered sequence. In general, record expressions are of the form {id = exp, id = exp, ..., id = exp} where the ids are called labels. For example, the expression {first = "Greg", last = "Morrisett", age = 150, balance = 0.12} is a record with four fields named first, last, age, and balance. You can extract a field from a record by using #id exp where exp is the record and id is the field that you want to extract. For instance, applying #age to the record above yields 150, whereas applying #balance yields 0.12.

When creating a record, it does not matter in what order you give the fields. So the record {balance = 0.12, age = 150, first = "Greg", last = "Morrisett"} is equivalent to the example above. Note that when you type one of these records into the SML top level, it sorts the fields into a canonical order:

- val jgm = {first = "Greg", last = "Morrisett",
             age = 150, balance = 0.12};
val jgm = {age=150, balance=0.12, first="Greg", last="Morrisett"}
        : {age:int, balance:real, first:string, last:string}

The type of a record is written as {id:type, id:type, ...,id:type} .
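To make the point about field order concrete, here is a short sketch that builds the same record twice with the fields in different orders and then selects fields with #id. The names r1, r2, same_type, a, and l are chosen only for this example.

```sml
val r1 = {first = "Greg", last = "Morrisett", age = 150, balance = 0.12}
val r2 = {balance = 0.12, age = 150, first = "Greg", last = "Morrisett"}

(* Both records have the same type, so one annotation fits either of them. *)
val same_type : {age:int, balance:real, first:string, last:string} = r2

(* #id selects the field with that label. *)
val a = #age r1     (* 150 *)
val l = #last r2    (* "Morrisett" *)
```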

Just as you can use pattern-matching to extract the components of a tuple, you can use pattern matching to extract the fields of a record. For instance, you can write:

val {first:string, last:string, age:int, balance:real} = jgm

and SML responds with:

val age = 150 : int
val balance = 0.12 : real
val first = "Greg" : string
val last = "Morrisett" : string

thereby binding the identifiers first, last, age, and balance to the respective components of the record. You can also write functions where the argument is a record using a record pattern. For example:

fun full_name{first:string, last:string, age:int, balance:real}:string =
   first ^ " " ^ last (* ^ is the string concatenation operator *)

Calling full_name and passing it the record jgm yields "Greg Morrisett" as an answer.
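As a further sketch of record patterns in function definitions, the hypothetical function describe below uses three of the four fields bound by the pattern. Int.toString is the standard Basis Library function for converting an int to a string.

```sml
val jgm = {first = "Greg", last = "Morrisett", age = 150, balance = 0.12}

(* The record pattern binds each field name as a variable in the body. *)
fun describe{first:string, last:string, age:int, balance:real}:string =
  first ^ " " ^ last ^ ", age " ^ Int.toString age

val s = describe jgm   (* "Greg Morrisett, age 150" *)

(* The order of the fields in the argument does not matter. *)
val s2 = describe {age = 30, balance = 1.0, last = "Doe", first = "Jane"}
```

Note that the pattern still has to list every field of the record, even though balance is not used in the body.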

In summary:

record expressions are of the form {id = exp, id = exp, ..., id = exp}.
record types are of the form {id:type, id:type, ..., id:type}.
you can extract a field from a record by writing #id exp.
you can pattern match records using a pattern of the form {id:type,id:type,...,id:type}.

Simple Datatypes and Case Expressions

Datatypes are used for two basic purposes which we'll describe by example. The first example of a datatype declaration is as follows:

datatype mybool = Mytrue | Myfalse

This definition declares a new type (mybool) and two constructors (Mytrue and Myfalse) for creating values of type mybool. In other words, after entering this definition into SML, we can use Mytrue or Myfalse as values of type mybool and indeed, these are the only values of type mybool. So one purpose of datatypes is to introduce new types into the language and to introduce ways of creating values of this new type. In fact, the built-in bool type is simply defined as:

datatype bool = true | false

Notice that a datatype definition is a lot like a BNF grammar. For instance, we can think of bool as consisting of true or false. We'll use this built-in grammar facility in SML to good effect when we start building implementations of languages.

Side note: the logical operators for conjunction and disjunction are as follows:

exp ::= ... | e1 andalso e2 | e1 orelse e2

Note that and is not used for logical conjunction, although it is a keyword. The andalso and orelse forms look like binary operators, but they differ from infix functions: every other binary operator evaluates both of its operand expressions, whereas these two logical constructs short-circuit. If the result of the logical formula can be determined by evaluating the left-hand expression alone, the right-hand expression remains unevaluated.
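Short-circuiting can be observed with a right-hand expression that would fail if it were evaluated. In the sketch below, 1 div 0 would raise the built-in Div exception, but it is never reached. The names a, b, and strict_and are chosen only for this example.

```sml
(* 1 div 0 would raise the Div exception if it were evaluated. *)
val a = false andalso (1 div 0 = 0)   (* false; right side is skipped *)
val b = true orelse (1 div 0 = 0)     (* true; right side is skipped *)

(* An ordinary function receives its argument tuple already evaluated,
   so both components are computed before the body runs. Calling
   strict_and(false, 1 div 0 = 0) would therefore raise Div. *)
fun strict_and(x:bool, y:bool):bool = x andalso y
```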

Another example of a datatype declaration is as follows:

datatype day = Sun | Mon | Tue | Wed | Thu | Fri | Sat

This declaration defines a new type (day) and 7 new constructors for that type (Sun-Sat). So, for example, we can write a function which maps a number to a day of the week:

fun int_to_day(i: int):day =
  if i mod 7 = 0 then Sun else
  if i mod 7 = 1 then Mon else
  if i mod 7 = 2 then Tue else
  if i mod 7 = 3 then Wed else
  if i mod 7 = 4 then Thu else
  if i mod 7 = 5 then Fri else Sat

This sequence of if expressions where we test the value i is rather tedious. A more concise way to write this is to use a case expression:

fun int_to_day(i: int):day =
  (case i mod 7 of
     0 => Sun
   | 1 => Mon
   | 2 => Tue
   | 3 => Wed
   | 4 => Thu
   | 5 => Fri
   | _ => Sat)

The case expression is similar to the switch statement in languages such as Java or C. In the example above, we perform a case on the value of (i mod 7) and match it against a set of number patterns (i.e., 0, 1, 2, etc.) The last pattern is a wildcard and matches any value. In Java, we would write the above as something like:

switch (i % 7) {
  case 0: return Sun;
  case 1: return Mon;
  case 2: return Tue;
  case 3: return Wed;
  case 4: return Thu;
  case 5: return Fri;
  default: return Sat;
}

So much for mapping integers to days. How about mapping days to integers?

fun day_to_int(d: day):int =
  (case d of
     Sun => 0
   | Mon => 1
   | Tue => 2
   | Wed => 3
   | Thu => 4
   | Fri => 5
   | Sat => 6)
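The two conversions can be combined. The sketch below repeats the definitions above in a compact layout so that it is self-contained, and adds two hypothetical helpers, next_day and is_weekend, that are not part of these notes.

```sml
datatype day = Sun | Mon | Tue | Wed | Thu | Fri | Sat

fun day_to_int(d: day):int =
  (case d of
     Sun => 0 | Mon => 1 | Tue => 2 | Wed => 3
   | Thu => 4 | Fri => 5 | Sat => 6)

fun int_to_day(i: int):day =
  (case i mod 7 of
     0 => Sun | 1 => Mon | 2 => Tue | 3 => Wed
   | 4 => Thu | 5 => Fri | _ => Sat)

(* Hypothetical helper: the day after d, wrapping from Sat back to Sun. *)
fun next_day(d: day):day = int_to_day(day_to_int d + 1)

(* Hypothetical helper: a wildcard covers all the remaining constructors. *)
fun is_weekend(d: day):bool =
  (case d of
     Sat => true
   | Sun => true
   | _ => false)
```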

With case expressions lying around, we technically don't need an if expression form. In particular, an expression of the form if exp1 then exp2 else exp3 is equivalent to:

case exp1 of
  true => exp2
| false => exp3
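As a concrete instance of this equivalence, here is the same absolute-value function written both ways. The names abs_if and abs_case are chosen only for this sketch.

```sml
(* Written with an if expression. *)
fun abs_if(n:int):int =
  if n < 0 then ~n else n

(* The same function written as a case on the boolean test. *)
fun abs_case(n:int):int =
  (case n < 0 of
     true => ~n
   | false => n)
```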

In fact, it turns out that with the general form of datatypes and case expressions, we can encode a lot of things that appear to be built in to the language. This is a good thing because it reduces the number of special forms that we have to reason about.

In summary:

datatype id = id1 | id2 | id3 | ... | idn declares a new type (id) with n data constructors (id1 id2 id3 ... idn).
case exp of pat1 => exp1 | pat2 => exp2 | ... | patn => expn evaluates exp and then successively matches it against the patterns. That is, the first pattern (pat1) is tried first and if matching succeeds, then we evaluate the corresponding expression (exp1). If matching fails, then we proceed to the next pattern pat2 and so on.
So far, patterns can be made up of integers (e.g., 12, ~4), identifiers that are variables (e.g., x), tuple patterns, record patterns, or identifiers that are data constructors (e.g., Sun, Mon, true, etc.)
The if-expression is syntactic sugar for a case-expression.
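Because patterns are tried in order, the position of a wildcard or variable pattern matters. The sketch below, using the hypothetical functions count_word and first_nonzero, shows integer, variable, wildcard, and tuple patterns together.

```sml
(* Integer patterns are tried top to bottom; the wildcard catches the rest. *)
fun count_word(n:int):string =
  (case n of
     0 => "zero"
   | 1 => "one"
   | _ => "many")

(* Tuple patterns can mix constants, variables, and wildcards.
   The second clause is reached only when the first component is not 0. *)
fun first_nonzero(p: int * int):int =
  (case p of
     (0, y) => y
   | (x, _) => x)
```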

Pattern Matching on Records (using integers)

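We can define integers in terms of the natural numbers by using a representation consisting of a sign and a magnitude. Here nat is taken to be a type of natural numbers whose values are built from the constructors Zero and Next, as the examples that follow suggest: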

datatype sign = Pos | Neg
type integer = { sign : sign, mag : nat }

The type keyword simply defines a name for a type. Here we've defined integer to refer to a record type with two fields: sign and mag. Remember that records are unordered, so there is no concept of a "first" field.

Note that a type declaration is different from a datatype declaration: it only creates a new name for an existing type, whereas a datatype declaration creates a new type and also happens to give it a name, which is needed to support recursion. For example, we could write the declaration type natural = nat. The types natural and nat would then be exactly the same type and could be used interchangeably.
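The difference can be seen in a short sketch. The definition of nat is not given in this section, so the datatype below is an assumption based on the constructors Zero and Next used here. The names is_zero, two, b, and nat2 are hypothetical.

```sml
(* Assumed definition of nat: this section uses Zero and Next
   but does not define them. *)
datatype nat = Zero | Next of nat

(* A type declaration only introduces another name for nat. *)
type natural = nat

fun is_zero(n:nat):bool =
  (case n of
     Zero => true
   | Next(_) => false)

val two : natural = Next(Next(Zero))
val b = is_zero two   (* accepted: natural and nat are the same type *)

(* By contrast, a second datatype with the same shape is a distinct type;
   is_zero Zero2 would be rejected by the type checker. *)
datatype nat2 = Zero2 | Next2 of nat2
```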

We can use the definition of integer to write some integers:

val zero = {sign=Pos, mag=Zero}
val zero' = {sign=Neg, mag=Zero}
val one = {sign=Pos, mag=Next(Zero)}
val neg_one = {sign=Neg, mag=Next(Zero)}

Now we can write a function to determine the successor of any integer:

fun inc(i:integer) : integer =
    case i of
      {sign = _, mag = Zero} => {sign = Pos, mag = Next(Zero)}
    | {sign = Pos, mag = n} => {sign = Pos, mag = Next(n)}
    | {sign = Neg, mag = Next(n)} => {sign = Neg, mag = n}

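Here we're pattern-matching on a record type. Each record pattern gives a sub-pattern for every field: the wildcard _ matches either sign, a variable such as n is bound to whatever magnitude it matches, and Next(n) matches only a non-zero magnitude, binding n to its predecessor. The cases are tried in order, so both representations of zero are handled by the first case.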

The predecessor function is very similar, and it should be obvious that we could write functions to add, subtract, and multiply integers in this representation.
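For comparison, here is one way the predecessor function might look, mirroring inc with the roles of Pos and Neg exchanged. This is only a sketch: the definition of nat is an assumption based on the constructors Zero and Next used above, and the name dec is chosen for this example.

```sml
(* Assumed definition of nat: this section uses Zero and Next
   but does not define them. *)
datatype nat = Zero | Next of nat
datatype sign = Pos | Neg
type integer = { sign : sign, mag : nat }

fun dec(i:integer) : integer =
    case i of
      {sign = _, mag = Zero} => {sign = Neg, mag = Next(Zero)}
    | {sign = Neg, mag = n} => {sign = Neg, mag = Next(n)}
    | {sign = Pos, mag = Next(n)} => {sign = Pos, mag = n}
```

As with inc, the first case handles both representations of zero, so dec maps either one to negative one.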