CS312 Lecture 2: More Introduction to SML

In sections on Wednesday, you should have seen a few simple expression and declaration forms for SML. The syntax of this fragment of the language can be summarized as follows. (Note that ~ is unary negation, while - is binary subtraction.)

identifiers (id)	x, y, foo1000
constants (const)	..., ~2, ~1, 0, 1, 2, ... (integers); 1.0, ~0.001, 3.141 (reals); true, false (booleans); "hello", "", "!" (strings); #"A", #" " (characters)
types (type)	int, real, bool, string, char, type * type * ... * type -> type
unary operations (unary-op expr)	~, not, size, ...
binary operations (binary-op expr)	+, *, -, >, <, >=, <=, ^, ...
expressions (expr)	constant, id, unary-op expr, expr binary-op expr, if expr then expr else expr, let decl ... decl in expr end, expr(expr,...,expr)
declarations (decl)	val id = expr; fun id(id:type,...,id:type):type = expr

The ML interpreter allows either expressions or declarations to be typed at the prompt. We can think of a program as being just an ML expression, although later we'll see it is more complex.

We can think of this table as giving a set of rules for building up an ML expression that has legal syntax. For example, a type must be either the word int, the word real, the word bool, the word string, the word char, or something of the form type * type * ... * type -> type, where the word type stands for any type built up according to these same rules. For example, int->bool is a legal type: it is the type of a function that takes an int and returns a boolean. Since int->bool is a legal type, so is string*(int->bool)->string. It is the type of a function that expects two arguments, a string and another function, and returns a string. The function passed as the second argument must accept an integer and return a boolean. Thus, we can see that ML supports higher-order functions, which manipulate other functions as if they were ordinary values. (In this definition, type is known as a syntactic metavariable: it isn't a variable at the level of the SML language; it's a variable that we use for defining syntax.)
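To make the type string*(int->bool)->string concrete, here is a minimal sketch of a function with exactly that type. The names label and is_short are hypothetical, chosen only for illustration.

```sml
(* label : string * (int -> bool) -> string
 * Applies the function argument to the length of the string. *)
fun label(s: string, test: int -> bool): string =
  if test(size s) then s ^ " passes" else s ^ " fails"

(* A function of type int -> bool to pass as the second argument. *)
fun is_short(n: int): bool = n < 4

val r1 = label("cat", is_short)      (* "cat passes" *)
val r2 = label("giraffe", is_short)  (* "giraffe fails" *)
```

Here is_short is passed to label just like any other value, which is what "higher-order" means.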

Just because an expression has legal syntax doesn't mean that it is a legal expression; the expression must also be well-typed. That is, it must use expressions only in accordance with their types. We will look at what it means for an expression to be well-typed in more detail later in the course. In general, it is useful to think of a type as a set of possible values (usually an infinite set).

More generally, there are many ways that an expression in ML can be "wrong", sort of like in English:

Syntax errors: val 0 x =; "Spot run see"
Type errors: "x" + 3; "See Spot ran"
Semantic errors: 1 div 0; "Colorless green ideas sleep furiously" (good grammar, incoherent semantics)
More general errors: ML program that correctly computes what your boss doesn't want, "Officer, you wouldn't dare give me a ticket!"

Now, how do we write expressions and declarations? Here is a simple function declaration that computes the absolute value of a real number:

fun abs(r: real):real =
  if r < 0.0 then ~r else r

Every expression and declaration has both a type and a value. When you type an expression or declaration into the SML top-level, it will report both the type and the value of the expression. If we type the definition of abs at the ML prompt, it replies with the following:

val abs = fn : real->real

which means that we have just bound the name abs to a function whose type is real->real.
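As a quick illustration, the sketch below repeats the definition of abs and applies it to a few reals. The names a, b, and c are hypothetical bindings used only to hold the results.

```sml
fun abs(r: real): real =
  if r < 0.0 then ~r else r

val a = abs(~2.5)        (* 2.5 *)
val b = abs(4.0)         (* 4.0 *)
val c = abs(3.0 - 7.5)   (* 4.5 *)
```

Each val declaration binds a name to the value of an expression of type real, so each has both a type and a value.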

Namespace management

As we define more and more functions, we need to avoid name collisions. Often a name is needed only within a particular piece of code (literally within it). The region of the program in which an identifier's definition is visible is called its scope. Scope can be confusing when you type definitions into the ML prompt one at a time, where earlier bindings linger, as opposed to loading a file into a fresh ML session.
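A small sketch of scope, using hypothetical names (x, scaled, k, add_k). The first part shows a local binding hiding an outer one. The second shows why re-declaring a name at the prompt can be confusing: a function keeps the binding that was in scope when it was defined.

```sml
val x = 10

fun scaled(y: int): int =
  let
    val x = 2            (* local x; hides the outer x inside this let *)
  in
    x * y
  end

val inner = scaled(5)    (* 10: uses the local x = 2 *)
val outer = x + 1        (* 11: the top-level x is still 10 *)

val k = 1
fun add_k(n: int): int = n + k
val k = 100              (* a new binding; add_k still refers to the old k *)
val result = add_k(0)    (* 1, not 100 *)
```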

Here is a more complex function declaration which finds (an approximation to) the square root of a real number.

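Underlying math fact: for x > 0 and any guess g > 0, g and x/g lie on opposite sides of sqrt(x) (or both equal it).
Proof: g > sqrt(x) => 1 > sqrt(x)/g => sqrt(x) > x/g (multiplying both sides by sqrt(x)). The case g < sqrt(x) is symmetric.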

(* Computes the square root of x using Heron of Alexandria's
 * algorithm (circa 100 AD). We "guess" that the square root
 * is 1.0 and then continue improving the guess until its
 * square is within delta of x.  The improvement is achieved
 * by averaging the current guess with x/guess.
 *)
fun square_root(x: real): real =
  let
    (* used to tell when the approximation is good enough *)
    val delta = 0.0001
    (* returns true iff the guess is good enough *)
    fun good_enough(guess: real): bool =
      abs(guess*guess - x) < delta
    (* improve the guess by averaging it with x/guess *)
    fun improve(guess: real): real =
      (guess + x/guess) / 2.0
    (* try a particular guess -- looping and improving the
     * guess if it's not good enough. *)
    fun try_guess(guess: real): real =
      if good_enough(guess) then guess
      else try_guess(improve(guess))
  in
    (* start with a guess of 1.0 *)
    try_guess(1.0)
  end

This example shows a number of things. First, you can declare local values (such as delta) and local functions (such as good_enough, improve, and try_guess). Notice that "inner" functions, such as improve, can refer to outer variables (such as x), and that good_enough calls the abs function we defined earlier at the top level. Also notice that later definitions can refer to earlier definitions. For instance, try_guess refers to both good_enough and improve. Finally, notice that try_guess is a recursive function -- it's really a loop. It's similar to writing something like:

while (!good_enough(guess)) {
   guess = improve(guess);
}

in an imperative language such as Java or C.

If you type the square_root declaration above into the SML top-level, it responds with:

val square_root = fn : real -> real

indicating that you've declared a variable (square_root), that its value is a function (fn), and that its type is a function from reals to reals. All of the internal structure of the function definition is hidden; all we know from the outside is that its value is a function. In particular, the inner function try_guess is not defined at the top level!

After typing in the function, you might try it out on a real number such as 9.0:

- square_root(9.0);
  val it = 3.00000000014 : real

SML has evaluated the expression "square_root(9.0)" and printed the value of the expression (3.00000000014) and the type of the value (real).
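To see how the guesses converge, the following sketch applies the same improvement step by hand, with x fixed at 9.0 for illustration. The names g0 through g4 and bracketed are hypothetical; the loop inside square_root performs these steps automatically.

```sml
val x = 9.0

(* the same averaging step used inside square_root *)
fun improve(guess: real): real = (guess + x/guess) / 2.0

val g0 = 1.0
val g1 = improve(g0)   (* 5.0 *)
val g2 = improve(g1)   (* 3.4 *)
val g3 = improve(g2)   (* about 3.0235 *)
val g4 = improve(g3)   (* about 3.00009 *)

(* g2 and x/g2 lie on opposite sides of the true root, 3.0 *)
val bracketed = x/g2 < 3.0 andalso 3.0 < g2   (* true *)
```

Each step roughly squares the error, which is why so few iterations are needed.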

At the moment we have only a sloppy, imprecise notion of exactly what happens when you type this expression into ML. In a few weeks we'll have a precise understanding (hopefully!)

If you try to apply square_root to an expression that does not have type real (say an integer or a boolean), then you'll get a type-error:

- square_root(9);
stdIn:27.1-27.14 Error: operator and operand do not agree [literal]
operator domain: real
operand:         int
in expression:
  square_root 9

A Few More Expression and Declaration Forms

There are a few more expression and declaration forms that you need to know in order to do useful programming in SML. Today, we'll see the following new forms: (1) qualified identifiers, (2) tuples, and (3) records.

1. Qualified Identifiers and the Library

Qualified identifiers are of the form structid.id where structid is a structure identifier. Examples include Int.toString, Real.fromInt, String.sub, etc. This is another way to manage the namespace (more precisely, a way to group related names).

Structures are a bit like Java packages or C++ namespaces in that they are used to organize collections of definitions. There are a number of pre-defined library structures provided by SML that are extremely useful. For instance, the Int structure provides a number of useful operations on int values, the Real structure provides operations on real values, and the String structure provides operations on string values. To find out more about the library structures and the operations they provide, see the Standard ML Library documents.
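As a brief sketch, here are a few qualified identifiers from the standard library in use. The names on the left of each val are hypothetical; the library functions themselves come from the Int, Real, and String structures.

```sml
val s1 = Int.toString(42)         (* "42" *)
val r  = Real.fromInt(3)          (* 3.0 *)
val c  = String.sub("hello", 0)   (* #"h"; positions are counted from 0 *)
val n  = String.size("hello")     (* 5 *)
val m  = Int.max(3, 7)            (* 7 *)
val s2 = Real.toString(2.5)       (* "2.5" *)
```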

For example, there is a built-in operation for calculating the absolute value of a real number called Real.abs. We could use that directly in our implementation of the square_root function as follows:

fun square_root(x: real): real = 
    let ...
        (* returns true iff the guess is good enough *)
        fun good_enough(guess: real): bool = 
            Real.abs(guess*guess - x) < delta
        ...
    in 
       (* start with a guess of 1.0 *)
       try_guess(1.0)
    end

Take some time to look at the libraries and find out what they provide. You shouldn't re-code something that's available in the library (unless we ask you to do so explicitly). In fact, we could avoid writing the square_root function altogether because it's already provided by the Math structure! However, it's called "Math.sqrt" instead of square_root. So we can simply write:

fun square_root(x: real): real = Math.sqrt(x)

or even:

val square_root = Math.sqrt

to create an alias to Math.sqrt. We'll have a lot more to say about the libraries, structures, and qualified identifiers later on when we talk about the SML module language.

2. Tuples

Every function in SML takes exactly one value and returns exactly one result. For instance, our square_root function takes one real value and returns one real value. The advantage of always taking one argument and returning one result is that the language is extremely uniform. Later, we'll see that this buys us a lot when it comes to composing new functions out of old ones.

But it looks like we can write functions that take more than one argument! For instance, we may write:

fun max(r1:real, r2:real):real =
  if r1 < r2 then r2
  else r1

max(3.1415, 2.718)

and it appears as if max takes two arguments. In truth max takes one argument that is a 2-tuple (also known as an ordered pair.)

In general, an n-tuple is an ordered sequence of n values written in parentheses and separated by commas as (expr, expr, ..., expr). For instance, (42, "hello", true) is a 3-tuple that contains the integer 42 as its first component, the string "hello" as its second component, and the boolean value true as its third component. As another example, () is the empty tuple. This is called "unit" in SML.

When you call a function in SML, if it takes more than one argument, then you have to pass it a tuple of the arguments. For instance, when we write:

max(3.1415, 2.718)

we're passing the 2-tuple (3.1415, 2.718) to the function max. We could just as well write:

val args = (3.1415, 2.718);

max args  (* evaluates to 3.1415 *)

The type of an n-tuple is written (type * type * ... * type). For instance, the type of args above is (real * real). This notation is based on the Cartesian product in mathematics (i.e., the plane is R^2 = R * R).

Similarly, the 3-tuple (42, "hello", true) has type (int * string * bool). Notice that max has type (real * real) -> real indicating that it takes one argument (a 2-tuple of reals) and returns one result (a real).

You can extract the components of a tuple by using the form "#n expr" where n is a number between 1 and the size of the tuple. For instance, #2 (1, "hello", true) evaluates to "hello", whereas #1 (3.1415, 2.718) evaluates to 3.1415.
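The following sketch gathers these tuple forms in one place. All of the names (t, first, second, third, u, nested, inner_second) are hypothetical.

```sml
val t = (42, "hello", true)          (* type: int * string * bool *)
val first  = #1 t                    (* 42 *)
val second = #2 t                    (* "hello" *)
val third  = #3 t                    (* true *)

val u = ()                           (* the empty tuple, of type unit *)

val nested = ((1, 2), "pair")        (* type: (int * int) * string *)
val inner_second = #2 (#1 nested)    (* 2 *)
```

Note that tuples can contain other tuples, and that the type of a nested tuple mirrors its structure.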

So, for instance, we can rewrite the max function as follows:

fun max(pair: real*real):real =
  if (#1 pair) < (#2 pair) then (#2 pair)
  else (#1 pair);

and this is completely equivalent to the first definition. This emphasizes that max really does take just one argument -- a pair of real numbers. But of course, it's a lot less readable than the first definition. We can get closer to the first definition by declaring local values r1 and r2 and binding them to the appropriate components of the pair:

fun max(pair: real*real):real =
  let val r1 = #1 pair
      val r2 = #2 pair
  in
    if r1 < r2 then r2 else r1
  end

This is a little better because we avoid re-computing the same expressions over and over again. However, it's still not as succinct as our first definition of max. This is because the first definition uses pattern matching to implicitly de-construct the 2-tuple and bind the components to variables r1 and r2. You can use pattern matching in a val declaration or in a function definition to deconstruct a tuple. A tuple pattern is always of the form (id:type, id:type,...,id:type). For instance, here is yet another version of max that uses a pattern in a val declaration to deconstruct the pair:

fun max(pair: real*real):real =
  let val (r1:real, r2:real) = pair
  in
    if r1 < r2 then r2 else r1
  end

In the example above, the val declaration matches the pair against the tuple-pattern (r1:real, r2:real). This binds r1 to the first component of the pair (#1 pair) and r2 to the second component (#2 pair). A similar thing happens when you write a function using a tuple-pattern as in the original definition of max:

fun max(r1:real, r2:real):real = if r1 < r2 then r2 else r1

Here, when we call max with the pair (3.1415, 2.718), the tuple is matched against the pattern (r1:real, r2:real), so r1 is bound to 3.1415 and r2 to 2.718. As we'll see later on, SML uses pattern matching in a number of places to simplify expressions.

In summary:

every function in SML takes 1 argument and returns 1 result.
(expr, expr, ... , expr) creates an n-tuple.
tuple types look like (type * type * ... * type)
#n expr extracts the nth component of a tuple.
val(id:type,id:type,...,id:type) = exp matches the tuple expression exp against the tuple-pattern (id:type,id:type,...,id:type) and binds the identifiers in the pattern to the appropriate components of the tuple.
fun id(id:type,id:type,...,id:type):type = exp is a function declaration that takes an n-tuple as an argument and matches the tuple against the tuple-pattern (id:type,id:type,...,id:type).
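The forms summarized above can be exercised together in a short sketch. The functions swap and sum3 and the other names are hypothetical, introduced only for illustration.

```sml
(* takes one argument, a 2-tuple, and returns one result, another 2-tuple *)
fun swap(a: int, b: string): string * int = (b, a)

(* a tuple pattern in a val declaration deconstructs the result *)
val (s: string, n: int) = swap(7, "seven")   (* s = "seven", n = 7 *)

(* a tuple pattern in a function definition *)
fun sum3(a: int, b: int, c: int): int = a + b + c

val triple = (1, 2, 3)
val total = sum3 triple                      (* 6 *)
```

Because sum3 takes a single 3-tuple, it can be applied to the variable triple directly, with no need to unpack it first.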

3. Records

Records are similar to tuples in that they are data structures for holding multiple values. However, they are different from tuples in that they carry an unordered collection of labeled values. In general, record expressions are of the form {id = expr, id = expr, ..., id = expr} where the identifiers id are labels. For example, the expression

{first = "John", last = "Doe", age = 150, balance = 0.12}

is a record with four fields named first, last, age, and balance. You can extract a field from a record by using #id expr where expr is the record and id is the field that you want to extract. For instance, applying #age to the record above yields 150, whereas applying #balance yields 0.12.

When creating a record, it does not matter in what order you give the fields. So the record

{balance = 0.12, age = 150, first = "John", last = "Doe"}

is equivalent to the example above. Note that when you type in one of these records to the SML top-level, it sorts the fields into a canonical order:

- val pers = { first = "John", last = "Doe",
              age = 150, balance = 0.12 };
val pers = {age=150,balance=0.12,first="John",last="Doe"}
: {age:int,balance:real,first:string,last:string}

The type of a record is written as {id:type, id:type, ...,id:type} .
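Here is a minimal sketch of record construction, field extraction, and a record type annotation, using the pers record from above. The names same, a, l, and p2 are hypothetical.

```sml
val pers = {first = "John", last = "Doe", age = 150, balance = 0.12}

(* the same fields in a different order give a record of the same type *)
val same = {balance = 0.12, age = 150, first = "John", last = "Doe"}

val a = #age pers      (* 150 *)
val l = #last same     (* "Doe" *)

(* an explicit record type annotation; it accepts same because the
 * order of the fields does not matter *)
val p2: {age: int, balance: real, first: string, last: string} = same
```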

Just as you can use pattern-matching to extract the components of a tuple, you can use pattern matching to extract the fields of a record. For instance, you can write:

val {first:string,last:string,age:int,balance:real} = pers

and SML responds with:

val age = 150 : int
val balance = 0.12 : real
val first = "John" : string
val last = "Doe" : string

thereby binding the identifiers age, balance, first, and last to the respective components of the record. You can also write functions where the argument is a record using a record pattern. For example:

fun full_name{first:string,last:string,age:int,balance:real}:string =
  first ^ " " ^ last (* ^ is the string concatenation operator *)

Calling full_name and passing it the record pers yields "John Doe" as an answer.
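The sketch below puts the call together with the pers record defined earlier, and adds a second function, is_adult, that uses the same record pattern. The function is_adult and its threshold of 18 are hypothetical, chosen only to show a different field being used.

```sml
val pers = {first = "John", last = "Doe", age = 150, balance = 0.12}

fun full_name{first: string, last: string, age: int, balance: real}: string =
  first ^ " " ^ last

val name = full_name pers     (* "John Doe" *)

(* same record pattern, but the body uses the age field instead *)
fun is_adult{first: string, last: string, age: int, balance: real}: bool =
  age >= 18

val adult = is_adult pers     (* true *)
```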

It turns out that we can think of tuples as short-hand for records. In particular, the tuple expression (3.14, "Greg", true) is like the record expression {1=3.14, 2="Greg", 3=true}. So in some sense, tuples are just syntactic sugar for records.
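A short sketch of this correspondence, with hypothetical names t, r, from_t, from_r, x, name, and flag. A record whose labels are 1, 2, 3 has the same type as the corresponding 3-tuple, so the same selectors and patterns work on both.

```sml
val t = (3.14, "Greg", true)
val r = {1 = 3.14, 2 = "Greg", 3 = true}   (* type: real * string * bool *)

val from_t = #2 t    (* "Greg" *)
val from_r = #2 r    (* "Greg" *)

(* because the types coincide, a tuple pattern matches the record *)
val (x: real, name: string, flag: bool) = r
```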

In summary:

record expressions are of the form {id = exp, id = exp, ..., id = exp}.
record types are of the form {id:type, id:type, ..., id:type}.
you can extract a field from a record by writing #id exp.
you can pattern match records using a pattern of the form {id:type,id:type,...,id:type}.

We'll cover datatypes and pattern matching in detail next week.