Every function in SML takes exactly one value and returns exactly one result.
For instance, our `square_root`

function takes one real value and returns
one real value. The advantage of always taking one argument and returning
one result is that the language is extremely uniform. Later, we'll
see that this buys us a lot when it comes to *composing* new functions out
of old ones.

But it looks like we can write functions that take more than one argument! For instance, we may write:

fun max(r1:real, r2:real):real = if r1 < r2 then r2 else r1max(3.1415, 2.718)

and it appears as if max takes two arguments. In truth max takes one
argument that is a *2-tuple* (also known as an ordered pair.)

In general, an *n-tuple* is an ordered sequence of *n *values
written in parenthesis and separated by commas as (*expr, expr, *..., *
expr*). For instance, `(42, "hello", true)`

is a 3-tuple that contains
the integer `42`

as its first component, the string `"hello"`

as its second component, and the boolean value `true`

as its third
component. As another example, `()`

is the empty tuple. This
is called "unit" in SML.

When you call a function in SML, if it takes more than one argument, then you have to pass it a tuple of the arguments. For instance, when we write:

max(3.1415, 2.718)

we're passing the 2-tuple `(3.1415, 2.718)`

to the function max.
We could just as well write:

`val args = (3.1415, 2.178);`

`max args (* evaluates to 3.1415 *)`

The type of an n-tuple is written ` (`

*t*_{1}`*`

...`*`

*t _{n
}*

`)`

. For instance, the type of args above is ` (real*real)`

.
This notation is
based on the Cartesian product in mathematics (i.e., the plane is R^2 = R * R). Similarly, the
3-tuple `(42, "hello", true)`

has type (int * string * bool).
Notice that `max`

has type ` (real*real)->real`

, indicating that it
takes one argument (a 2-tuple of reals) and returns one result (a real).

The above grammar for types is a simplification, in fact. A more complete grammar is as follows:

t::=`int`

|`real`

|`bool`

|`string`

|`char`

|t_{1}`*`

...`*`

t|_{n}t_{1}`->`

t_{2}|`(`

t`)`

So the types ` (real*real)->real`

and `real*real->real`

are exactly the same type.

You can extract the components of a tuple by using the form "`#`

*n* *e*"
where `n`

is a number between 1 and the size of the tuple. For
instance, `#2 (1, "hello", true)`

evaluates to `"hello"`

,
whereas `#1 (3.1415, 2.178)`

evaluates to `3.1415`

.

So, for instance, we can rewrite the max function as follows:

fun max(pair: real*real):real = if (#1 pair) < (#2 pair) then (#2 pair) else (#1 pair);

and this is completely equivalent to the first definition. This emphasizes that `max`

really does take just one argument -- a pair
of real numbers. But of course, it's a lot less readable than the first
definition. We can get closer to the first definition by declaring local
values r1 and r2 and bind them to the appropriate components of the pair:

fun max(pair: real*real):real = let val r1 = #1 pair val r2 = #2 pair in if r1 < r2 then r2 else r1 end

This is a little better because we avoid re-computing the same expressions
over and over again. However, it's still not as succinct as our first
definition of max. This is because the first definition uses *pattern
matching* to implicitly de-construct the 2-tuple and bind the components to
variables `r1`

and `r2`

. You can use pattern matching in a `val`

declaration or in a function definition to deconstruct a tuple. A tuple
pattern is always of the form `(`

*x*_{1}:*t*_{1}, *x*_{2}:*t*_{2},..., *x _{n}*:

`)`

.
For instance, here is yet another version of max that uses a pattern in a `val`

declaration to deconstruct the pair:fun max(pair: real*real):real = let val (r1:real, r2:real) = pair in if r1 < r2 then r2 else r1 end

In the example above, the `val`

declaration
matches the pair against the tuple-pattern `(r1:real, r2:real)`

.
This binds `r1`

to the first component of the pair `(#1 pair)`

and
`r2`

to the second component `(#2 pair)`

. A similar thing
happens when you write a function using a tuple-pattern as in the original
definition of max:

fun max(r1:real, r2:real):real = if r1 < r2 then r2 else r1

Here, when we call max with the pair `(3.1415, 2.718)`

, the tuple is
matched against the pattern ` (r1:real, r2:real)`

and `r1`

is bound to the `3.1415`

and `r2`

to `2.718`

. As we'll see later on, SML uses pattern
matching in a number of places to simplify expressions.

Suppose we wanted to extract both the minimum and the maximum of two numbers in a single function. With tuples, this is easy: we just return a tuple containing both results. Using a let, we can extract both results into separate variables conveniently, too:

fun minmax(a: real, b: real): real*real = if a < b then (a, b) else (b, a)

val (mn: real, mx: real) = minmax(2.0,1.0)

This binds `mn`

to `1.0`

and `mx`

to `2.0`

. The
type of `minmax`

is `(real*real)->(real*real)`

, which we can
write without the parentheses because `*`

has a higher precedence than `->`

when writing type expressions.

In summary:

- every function in SML takes 1 argument and returns 1 result.
`(`

*e*_{1}`,`

...`,`

*e*_{n}`)`

creates an*n*-tuple.- tuple types look like
*t*_{1}`*`

...`*`

*t*_{n} `#`

*n**e*extracts the*n*th component of a tuple*e*.-
`val`

`(`

*x*_{1}:*t*_{1},*x*_{2}:*t*_{2},...,*x*:_{n}*t*_{n}`)`

*=**e**e*against the tuple-pattern`(`

*x*_{1}:*t*_{1},*x*_{2}:*t*_{2},...,*x*:_{n}*t*_{n}`)`

and binds the identifiers in the pattern to the appropriate components of the tuple. -
`fun`

*y*`(`

*x*_{1}:*t*_{1},*x*_{2}:*t*_{2},...,*x*:_{n}*t*_{n}`)`

*=**e*is a function declaration that takes an*n*-tuple as an argument and matches the tuple against the tuple-pattern`(`

*x*_{1}:*t*_{1},*x*_{2}:*t*_{2},...,*x*:_{n}*t*_{n}`)`

.

Records are similar to tuples in that they are data structures for holding
multiple values. However, they are different from tuples in that they
carry an *unordered* collection of *labeled* values. In
general, record expressions are of the form **{***x*_{1}=*e*_{1}`,`

...`,`

*x _{n}*

`=`

**}**

where the identifiers x are labels.
For example,
the expression {first = "John", last = "Doe", age = 150, balance = 0.12}

is a record with four fields named first, last, age, and balance. You
can extract a field from a record by using `#`

*id expr *where *exp *is
the record and *id* is the field that you want to extract. For
instance, applying `#age`

to the record above yields 150, whereas applying
`#balance`

yields 0.12.

When creating a record, it does not matter in what order you give the fields.
So** **the record

{balance = 0.12, age = 150, first = "John", last = "Doe"}

is equivalent to the example above. Note that when you type in one of these records to the SML top-level, it sorts the fields into a canonical order:

- val pers ={first = "John", last = "Doe", age = 150, balance = 0.12};val pers ={age=150,balance=0.12,first="John",last="Doe"} : {age:int,balance:real,first:string,last:string}

The type of a record is written as **{***x*_{1}:*t*_{1}, *x*_{2}:*t*_{2},..., *x _{n}*:

**}**

Just as you can use pattern-matching to extract the components of a tuple, you can use pattern matching to extract the fields of a record. For instance, you can write:

val{first:string,last:string,age:int,balance:real} =jgm

and SML responds with:

val age = 150 : int val balance = 0.12 : real val first = "Greg" : string val last = "Morrisett" : string

thereby binding the identifiers `age`

, `balance`

, `first`

,
and `last`

to the respective components of the record. You can also
write functions where the argument is a record using a record pattern. For
example:

fun full_name{first:string,last:string,age:int,balance:real}:string=first ^ " " ^ last (* ^ is the string concatenation operator *)

Calling `full_name`

and passing it the record `jgm`

yields ```
"John Doe"
```

as an answer.

It turns out that we can think of tuples as short-hand for records. In
particular, the tuple expression `(3.14, "Greg", true)`

is like
the record expression

.
So in some sense, tuples are just syntactic sugar for records.**{**1=3.14, 2="Greg", 3=true**}**

In summary:

- record expressions are of the form
`{`

*x*_{1}`=`

*e*_{1}`,`

*x*_{2}`=`

*e*_{2}`,`

...`,`

*x*_{n}`=`

*e*_{n}.`}`

- record types are of the form
**{***x*_{1}:*t*_{1},*x*_{2}:*t*_{2},...,*x*:_{n}*t*_{n}

.**}** - you can extract a field from a record by writing
`#`

*x**e**x*is the name of the field. - you can pattern match records using a pattern of the form
**{***x*_{1}:*t*_{1},*x*_{2}:*t*_{2},...,*x*:_{n}*t*_{n}

.**}**

We'll cover more kinds of types (datatypes) and more pattern matching constructs next week.

Datatypes are used for two basic purposes which we'll describe by example. The first example of a datatype declaration is as follows:

```
````datatype mybool = Mytrue | Myfalse`

This definition declares a new type (mybool) and two constructors (Mytrue and
Myfalse) for creating values of type mybool. In otherwords, after entering
this definition into SML, we can use Mytrue or Myfalse as values of type mybool
and indeed, these are the *only* values of type mybool. So one
purpose of datatypes is to introduce new types into the language and to
introduce ways of creating values of this new type. In fact, the builtin **bool**
type is simply defined as:

```
````datatype bool = true | false`

Notice that a datatype definition is a lot like a BNF grammar. For instance, we can think of bool as consisting of true or false. We'll use this built-in grammar fracility in SML to good effect when we start building implementations of languages.

exp::= ... | e1 andalso e2 | e1 orelse e2

Note that and is not for logical conjunction, although it is a keyword. These appear to be like binary operators; however, they are different from infix functions as all the other binary operators evaluate both expressions. These two logical constructs have a special capability called short-circuiting. If the result of the logical formula can be determined by evaluating the left-hand expression, the right-hand expression will remain unevaluated.

Another example of a datatype declaration is as follows:

```
````datatype day = Sun | Mon | Tue | Wed | Thu | Fri | Sat`

This declaration defines a new type (day) and 7 new constructors for that type (Sun-Sat). So, for example, we can write a function which maps a number to a day of the week:

fun int_to_day(i: int):day = if i mod 7 = 0 then Sun else if i mod 7 = 1 then Mon else if i mod 7 = 2 then Tue else if i mod 7 = 3 then Wed else if i mod 7 = 4 then Thu else if i mod 7 = 5 then Fri else Sat

This sequence of if expressions where we test the value i is rather tedious. A more concise way to write this is to use a case expression:

fun int_to_day(i: int):day = (case i mod 7 of 0 => Sun | 1 => Mon | 2 => Tue | 3 => Wed | 4 => Thu | 5 => Fri | _ => Sat)

The case expression is similar to the switch statement in languages such as Java or C. In the example above, we perform a case on the value of (i mod 7) and match it against a set of number patterns (i.e., 0, 1, 2, etc.) The last pattern is a wildcard and matches any value. In Java, we would write the above as something like:

switch (i % 7) { case 0: return Sun; case 1: return Mon; case 2: return Tue; case 3: return Wed; case 4: return Thu; case 5: return Fri; default: return Sat; }

So much for mapping integers to days. How about mapping days to integer?

fun day_to_int(d: day):int = (case d of Sun => 0 | Mon => 1 | Tue => 2 | Wed => 3 | Thu => 4 | Fri => 5 | Sat => 6)

With case expressions lying around, we
technically don't need an if expression form.
In particular, an expression of the form if *exp1*
then *exp2* else *exp3*
is equivalent to:

caseexp1of true =>exp2| false =>exp3

In fact it turns out that with the general form of datatypes and case expressions, we can encode a lot of things that appear to be built in to the language. This is a good thing because it simplifies the number of special forms that we have to reason about.

In summary:

- datatype
*id*=*id1 | id2 | id3 | ... | idn**id1*) with*n data*constructors (*id1 id2 id3 ... idn).* - case
*exp*of*pat1 => exp1 | pat2 => exp2 | ... | patn => expn*evaluates*exp*and then successively matches it against the patterns. That is, the first pattern (*pat1*) is tried first and if matching succeeds, then we evaluate the corresponding expression (*exp1*). If matching fails, then we proceed to the next pattern*pat2*and so on. - So far, patterns can be made up of integers (e.g., 12, ~4), identifiers that are variables (e.g., x), tuple patterns, record patterns, or identifiers that are data constructors (e.g., Sun, Mon, true, etc.)
- The if-expression is a syntactic sugar for a case-expression.

A record (or tuple) is logically like an "and". For instance,
a tuple of type (int,real,string) is an object that contains an int *and* a
real *and *a string. Datatypes, in the most general form, are
used for defining "or" types -- when something needs to be one type or
another. In particular, suppose we want to define a new type
"number" that includes both ints and reals. This can be
accomplished in SML by the following datatype definition:

datatype num = Int_num of int | Real_num of real

This declaration gives us a new type (num) and two constructors `Int_num`
and `Real_num`. The `Int_num` constructor takes an int as an
argument and returns a num, while the `Real_num` constructor takes a real
as an argument and returns a num. In this fashion, we can create a type
that is the (disjoint) union of two other types.

But how do we use a value of type num? We can't apply an operation such as + to it, because + is only defined for either ints or reals. In order to use a value of type num, we have to use pattern matching to deconstruct it. For example, the following function computes the maximum of two nums:

fun num_to_real(n:num):real = (case n of Int_num(i) => Real.fromInt(i) | Real_num(r) => r)fun max(n1:num, n2:num):num = let val r1:real = num_to_real(n1) val r2:real = num_to_real(n2) in if r1 >= r2 then Real_num(r1) else Real_num(r2) end

The strategy is simple: convert the numbers to reals and then compare
the two real numbers, returning the larger of the two. In order to make
the return value a num (as opposed to a real), we have to put the result in a `Real_num`
constructor. We could just have well written this as:

fun max2(n1:num, n2:num):num = let val r1:real = num_to_real(n1) val r2:real = num_to_real(n2) in Real_num(if r1 >= r2 then r1 else r2) end

Here, we've wrapped the whole if-expression with
the `Real_num` constructor. This is one advantage of treating if
as an *expression *as opposed to a statement.

Notice that in the function num_to_real, we use a case-expression
to determine whether the number *n* is an integer or real. The
pattern `Int_num(i)` matches *n* if and only if *n* was created
using the `Int_num` data constructor, whereas `Real_num(r)`
matches *n *if and only if it was created with the `Real_num` data
constructor. Also, notice that in the` Int_num(i)` case, we have
bound the underlying integer carried by the data constructor to the variable `i`
and that this is used in the expression `Real.fromInt(i)`.
Similarly, the `Real_num(r)` pattern extracts the underlying real value
carried by the data constructor and binds it to `r`.

So, for instance, calling `num_to_real(Int_num(3))` matches the first
pattern, binds `i` to 3, and then returns `Real.fromInt(i)` = `Real.fromInt(3)`
= `3.0`. Calling `num_to_real(Real_num(4.5))` fails to match
the first pattern, succeeds in matching the second pattern, binds the r to 4.5,
and then returns r = 4.5.

Here is an alternative definition of max on numbers.

fun max2(n1:num, n2:num):num = (case (n1, n2) of (Real_num(r1), Real_num(r2)) => Real_num(Real.max(r1,r2)) | (Int_num(i1), Int_num(i2)) => Int_num(Int.max(i1,i2)) | (_, Int_num(i2)) => max2(n1, Real_num(num_to_real(i2)) | (Int_num(i1), _) => max2(n2, Real_num(num_to_real(i1)))

Notice that that case expression in `max2`
matches a tuple of the numbers n1 and n2. Thus, all of the patterns in the
case expressions are of the tuple form. For example, the pattern `(Real_num(r1),
Real_num(r2)) `matches if and only if both the numbers are reals.

In the third and fourth patterns, we've used a "wildcard" (or
default) pattern. For instance, the third pattern `(_, Int_num(i2)) `matches
iff the first number is anything, but the second is an integer. In this
case, we simply convert the integer to a real and then call ourselves
recursively. Similarly, the fourth pattern `(Int_num(i1), _) `the
first number is an integer and the second number is anything. In this
case, we convert the first number to a real and call ourselves recursively.

Now suppose we call max2 with two integers `max2(Int_num(3), Int_num(4))`.
It appears as if this matches any of the last three cases, so which one do we
select? The answer is that we try the matches in order. So the second
pattern will succeed and the other patterns won't even be tried.

Another question is, how do we know if there is a case for every situation? For instance, suppose we accidentally wrote:

fun max3(n1:num, n2:num):num = (case (n1, n2) of (Int_num(i1), Int_num(i2)) => Int_num(Int.max(i1,i2)) | (_, Int_num(i2)) => max3(n1, Real_num(num_to_real(i2)) | (Int_num(i1), _) => max3(n2, Real_num(num_to_real(i1)))

Now there is no case for when n1 and n2 are both reals. If you type
this in to SML, then it will complain that the pattern match is inexhaustive.
This is wonderful because it tells you your code might fail since you forgot a
case! In general, we will *not* accept code that has a match
inexhaustive warning. That is, you must make sure you never turn in code
that doesn't cover all of the cases.

What happens if we put in too many cases? For instance, suppose we wrote:

fun max2(n1:num, n2:num):num = (case(n1, n2) of (Real_num(r1), Real_num(r2)) => Real_num(Real.max(r1,r2)) | (Int_num(i1), Int_num(i2)) => Int_num(Int.max(i1,i2)) | (_, Int_num(i2)) => max2(n1, Real_num(num_to_real(i2)) | (Int_num(i1), _) => max2(n2, Real_num(num_to_real(i1))) | (_, _) => n1

Then SML complains again that the last case will never be reached.
Again, this is wonderful because it tells us there's some useless code that we
should either trim away, or reexamine (in its context) to see why it will never
be executed. Again, we will *not* accept code that has redundant
patterns.

So how can the SML type-checker determine that a pattern match is exhaustive
and that there are no dead cases? The reason is that patterns can only
test a finite number of things (there are no loops in patterns), the tests are
fairly simple (e.g., is this a Real_num or an Int_num?) and the set of datatype
constructors for a given type is *closed*. That is, after defining a
datatype, we can't simply add new data constructors to it. Note that if we
could, then *every* pattern match would be potentially inexhaustive.

At first, this seems to be a shortcoming of the language. Adding new constructors is something that happens all the time, just as adding new subclasses happens all the time in Java programs. The difference in SML is that, if you add a new data constructor to a datatype declaration, then the compiler will tell you where you need to examine or change your code through "match inexhaustive" errors. This makes pattern matching an invaluable tool for maintaining and evolving programs.

So sometimes, by *limiting *a programming language we gain some power.
In the case of the pattern-matching sub-language of ML, the designers have
restricted the set of tests that can be performed so that the compiler can
automatically tell you where you need to look at your code to get it in
synch with your definitions.