CS312 Lecture 2
ML Syntax and Program Evaluation

Summary of topics:

ML syntax
Errors
Evaluation and rewrite rules
Namespaces and scope
Qualified identifiers and libraries

ML syntax

In sections on Wednesday, you should've seen a few simple expression and declaration forms for SML. The syntax of this fragment of the language can be summarized as follows: (note that ~ is unary, - is binary).

| syntactic class | syntactic variable(s) and grammar rule(s) | examples |
|---|---|---|
| identifiers | x, y | `a`, `x`, `y`, `x_y`, `foo1000`, ... |
| constants | c | ... `~2`, `~1`, `0`, `1`, `2` (integers); `1.0`, `~0.001`, `3.141` (reals); `true`, `false` (booleans); `"hello"`, `""`, `"!"` (strings); `#"A"`, `#" "` (characters) |
| unary operators | u | `~`, `not`, `size`, ... |
| binary operators | b | `+`, `*`, `-`, `>`, `<`, `>=`, `<=`, `^`, ... |
| expressions (terms) | e ::= x \| c \| u e \| e₁ b e₂ \| `if` e `then` e `else` e \| `let` d₁ ... dₙ `in` e `end` \| e(e₁, ..., eₙ) | `foo`, `~0.001`, `not b`, `2 + 2`, ... |
| declarations | d ::= `val` x = e \| `fun` y(x₁: t₁, ..., xₙ: tₙ): t = e | `val one = 1`, `fun square(x: int): int` ... |
| types | t ::= `int` \| `real` \| `bool` \| `string` \| `char` \| t₁ `*` ... `*` tₙ `->` t | `int`, `string`, `int->int`, `bool*int->bool` |

A program in ML, like a program in any other language, is made up of various kinds of expressions. The table above describes how to construct some of those expressions. That is, it specifies some of the syntax of ML. Some of these expressions, such as identifiers, constants, and operators, we have described only by example. These expressions are all single tokens. Other expressions, such as terms, declarations, and types, are described by grammar rules. These rules are written in a form known as BNF, for Backus-Naur Form (named after its inventors, John Backus and Peter Naur). Each rule lists, separated by vertical bars, the various ways to build a particular kind of expression. For example, a term may be an identifier, a constant, any unary operator u followed by any term e (u e), any two terms e₁ and e₂ separated by any binary operator b, and so on. Notice that we use the letter u to represent any unary operator and the letter e to represent any term. These are examples of syntactic variables, or metavariables. A syntactic variable is not an SML program variable; it is just a generic name for a certain syntactic construct. For instance, x can be any identifier, and e can be any expression. We sometimes attach subscripts to syntactic variables to help us keep them distinct (as is done above), but this is not necessary.

The ML interpreter allows either terms or declarations to be typed at the prompt. We can think of a program as being just an ML expression, although later we'll see it is more complex.

Program errors

Just because an expression has legal syntax doesn't mean that it is a legal program; the expression must also be well-typed. That is, it must use each subexpression only in accordance with its type. We will look at what it means for an expression to be well-typed in more detail later in the course. In general, it is useful to think of a type as a set of possible values (usually an infinite set). We will see that SML has a powerful, expressive type system.

More generally, there are many ways that an expression in ML can be "wrong", sort of like in English:

Syntax errors: val 0 x =; "Spot run see"
Type errors: "x" + 3; "See Spot ran"
Semantic errors: 1 div 0; "Colorless green ideas sleep furiously" (good grammar, incoherent semantics). (Note that 1 / 0 would be a type error in SML, since / divides reals; div is integer division.)
More general errors: an ML program that correctly computes the wrong answer; "Officer, you wouldn't dare give me a ticket!"
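To make the difference between a type error and a semantic (run-time) error concrete, here is a small sketch; it assumes a Standard ML implementation with the Standard Basis Library, such as SML/NJ, and the name divByZeroCaught is only illustrative. The ill-typed "x" + 3 appears only in a comment, because the type checker rejects it before anything runs. By contrast, 1 div 0 type-checks and fails only when it is evaluated, by raising the built-in exception Div, which a handle expression can catch.

```sml
(* A type error: rejected by the type checker, so it can never run.
     val bad = "x" + 3
   (+ expects two numbers, but "x" is a string.) *)

(* A semantic (run-time) error: 1 div 0 is well-typed (int div int : int),
   but evaluating it raises the built-in exception Div. *)
val divByZeroCaught : bool =
  (ignore (1 div 0); false)   (* false is never reached *)
  handle Div => true          (* the Div exception is caught here *)
```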

Now, how do we write expressions and declarations? Here is a simple function declaration that computes the absolute value of a real number:

fun abs(r: real):real =
  if r < 0.0 then ~r else r

Every expression and declaration has both a type and a value. When you type an expression or declaration into the SML top level, it reports both the type and the value. If we type the definition of abs at the ML prompt, it replies with the following:

val abs = fn : real->real

which means that we have just bound the name abs to a function whose type is real->real.

Example

Here is a function that computes whether its argument is a prime number. The type of the function is int->bool. Note the use of a recursive helper function noDivisorsAbove that is declared inside the function isPrime.


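(* Returns whether n is prime.
   Requires: n is a positive integer. *)
fun isPrime(n: int): bool =
  let
    (* true iff n has no divisor d with m <= d and d*d <= n *)
    fun noDivisorsAbove(m: int): bool =
      if m*m > n then true
      else if n mod m = 0 then false
      else noDivisorsAbove(m+1)
  in
    (* 1 is not prime; for n >= 2, look for divisors starting at 2 *)
    n >= 2 andalso noDivisorsAbove(2)
  end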

Evaluation

The SML prompt lets you type either a term or a declaration that binds a variable to a term. It evaluates the term to produce a value: a term that does not need any further evaluation. We can define values v as a syntactic class too. For now, we can think of values as just being the same as constants, though we'll see there is much more to them.

Running an ML program is just evaluating a term. What happens when we evaluate a term? In an imperative (non-functional) language like Java, we sometimes imagine that there is an idea of a "current statement" that is executing. This isn't a very good model for ML; it is better to think of ML programs as being evaluated in the same way that you would evaluate a mathematical expression. For example, if you see an expression like (1+2)*3, you know that you first evaluate the subexpression 1+2, getting a new expression 3*3. Then you evaluate 3*3. ML evaluation works the same way. At each point in time, the ML evaluator takes the left-most subexpression that can be reduced (one whose own subexpressions are already values) and rewrites (or reduces) it to some simpler expression. Eventually the whole expression is a value, and then evaluation stops: the program is done. Or maybe the expression never reduces to a value, in which case you have an infinite loop. (Evaluation can also stop early by raising an exception, as 1 div 0 does.)

ML has a bunch of built-in rules for rewriting terms that go well beyond simple arithmetic. For example, consider the if expression. It has two important rewrite rules:

if true then e₁ else e₂   -->  e₁
if false then e₁ else e₂  -->  e₂

If the evaluator runs into an if expression, the first thing it does is try to reduce the conditional expression to either true or false. Then it can apply one of the two rules here.
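The rewrite rules can themselves be written as a small ML program. The sketch below is a hypothetical toy model, not how any real SML implementation works: the datatype term and the functions isValue, step, stepOperands, and eval are our own names, and it uses datatypes and pattern matching, which we have not covered yet. It handles only integer and boolean constants, +, *, <, and if. Each call to step performs one rewrite on the left-most subterm that is ready to be reduced, and an if is rewritten by the two rules above once its condition has become true or false.

```sml
(* A hypothetical abstract syntax for a tiny fragment of ML terms. *)
datatype term =
    Num of int
  | Bool of bool
  | Plus of term * term
  | Times of term * term
  | Less of term * term
  | If of term * term * term

(* Values are terms that need no further evaluation. *)
fun isValue (Num _) = true
  | isValue (Bool _) = true
  | isValue _ = false

(* step t performs one rewrite on the left-most reducible subterm of t.
   It raises Fail if t is already a value or is stuck (ill-typed,
   e.g. Plus (Bool true, Num 1)). *)
fun step (Plus (Num a, Num b)) = Num (a + b)
  | step (Times (Num a, Num b)) = Num (a * b)
  | step (Less (Num a, Num b)) = Bool (a < b)
  | step (If (Bool true, e1, _)) = e1      (* if true then e1 else e2 --> e1 *)
  | step (If (Bool false, _, e2)) = e2     (* if false then e1 else e2 --> e2 *)
  | step (Plus (e1, e2)) = stepOperands (Plus, e1, e2)
  | step (Times (e1, e2)) = stepOperands (Times, e1, e2)
  | step (Less (e1, e2)) = stepOperands (Less, e1, e2)
  | step (If (c, e1, e2)) = If (step c, e1, e2)  (* reduce the condition first *)
  | step _ = raise Fail "no rewrite rule applies"

(* Reduce the left operand until it is a value, then the right one. *)
and stepOperands (mk, e1, e2) =
  if isValue e1 then mk (e1, step e2) else mk (step e1, e2)

(* eval t applies step until t becomes a value. *)
fun eval t = if isValue t then t else eval (step t)
```

For instance, step (Times (Plus (Num 1, Num 2), Num 3)) yields Times (Num 3, Num 3), mirroring the rewrite (1+2)*3 --> 3*3, and eval carries on to Num 9.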

Substitution

Two more kinds of terms in the grammar above have rewrite rules: let expressions and function calls. A let expression works by first evaluating all of its bindings. Then those bindings are substituted into the body of the let expression (the expression between in and end). For example, here is an evaluation using let:

let val x = 1+4 in x*3 end  -->  let val x = 5 in x*3 end  -->  5*3  -->  15
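With more than one binding, the bindings are evaluated in order, and each can use the ones before it. A minimal sketch in plain SML (the names fifteen and twentyFive are just labels for the results):

```sml
(* let val x = 1+4 in x*3 end  -->  ...  -->  15 *)
val fifteen : int = let val x = 1 + 4 in x * 3 end

(* x is bound to 5 first; y's right-hand side can then use x, so y is
   bound to 10, and the body evaluates as 5*3 + 10  -->  15 + 10  -->  25. *)
val twentyFive : int =
  let
    val x = 1 + 4
    val y = x * 2
  in
    x * 3 + y
  end
```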

Function calls are the most interesting case. When a function is called, ML does a similar substitution: it substitutes the values passed as arguments into the body of the function. For example, consider evaluating abs(2.0+1.0):

abs(2.0+1.0)  -->  abs(3.0)  --> if 3.0 < 0.0 then ~3.0 else 3.0
   -->  if false then ~3.0 else 3.0 --> 3.0
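One consequence of this model is worth seeing in real code: arguments are reduced to values before they are substituted, just as abs(2.0+1.0) first becomes abs(3.0), whereas the if rules throw away the branch that is not taken. A minimal sketch; the function first is hypothetical, and the example relies on the Basis Library exception Div raised by integer division by zero:

```sml
(* A hypothetical function that ignores its second argument. *)
fun first (a: int, b: int): int = a

(* first(1, 1 div 0): ML first reduces the argument (1, 1 div 0) to a value,
   and 1 div 0 raises Div, so the body of first is never reached. *)
val argumentEvaluated : bool =
  (ignore (first (1, 1 div 0)); false) handle Div => true

(* if true then e1 else e2 --> e1 discards e2, so this 1 div 0 never runs. *)
val branchSkipped : int = if true then 1 else 1 div 0
```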

This is a simple start on how to think about evaluation; we'll have much more to say about evaluation in a couple of lectures.

Scope

We can define many functions, but we need to avoid name collisions. Often we only "need" a certain name within a certain piece of code (literally within). The region of program text in which a declared identifier can be used is called its scope. Scope can be very confusing when you type declarations into the ML top level one at a time, because earlier bindings stay around and may be shadowed by later ones, as opposed to loading a file into a fresh ML session.
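A small sketch of scope and shadowing, assuming the declarations below are entered at a fresh SML top level in this order (the names x, y, z, addX, and stillOld are only illustrative). A let introduces a local x whose scope ends at end, and re-declaring x at the top level creates a new binding rather than changing the old one:

```sml
val x = 1

(* The local x (= 10) shadows the outer x only between let and end. *)
val y = let val x = 10 in x + 1 end   (* y = 11 *)

(* Out here the local x is out of scope, so x still means 1. *)
val z = x + y                          (* z = 12 *)

(* addX refers to the x that is in scope where addX is declared. *)
fun addX (n: int): int = n + x

(* A new top-level binding that shadows the old x; it does not modify it. *)
val x = 100

val stillOld = addX 0                  (* 1, not 100 *)
```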

Here is a more complex function declaration which finds (an approximation to) the square root of a real number.

Underlying math fact: for any positive x and g, the numbers g and x/g lie on opposite sides of sqrt(x) (or both equal it). That is because their product is x.
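The one-line reason, followed by the first few guesses for x = 9 starting from 1.0 (as squareRoot below does), worked out by hand; their average always lies between g and x/g:

```latex
\text{For } x > 0,\ g > 0:\qquad
g < \sqrt{x} \;\Longrightarrow\; \frac{x}{g} > \frac{x}{\sqrt{x}} = \sqrt{x},
\qquad
g > \sqrt{x} \;\Longrightarrow\; \frac{x}{g} < \sqrt{x}.

g_{k+1} = \frac{1}{2}\Bigl(g_k + \frac{x}{g_k}\Bigr),\quad x = 9:\qquad
g_0 = 1,\quad g_1 = \frac{1 + 9}{2} = 5,\quad g_2 = \frac{5 + 1.8}{2} = 3.4,\quad
g_3 = \frac{3.4 + 9/3.4}{2} \approx 3.0235
```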

(* Computes the square root of x using Heron of Alexandria's
 * algorithm (circa 100 AD). We "guess" that the square root
 * is 1.0 and then keep improving the guess until its square
 * is within delta of x. The improvement is achieved by
 * averaging the current guess with x/guess.
 * Requires: x >= 0.0 (for negative x the loop may never end).
 *)
fun squareRoot(x: real): real =
  let
    (* used to tell when the approximation is good enough *)
    val delta = 0.0001

    (* returns true iff the guess is good enough *)
    fun goodEnough(guess: real): bool =
      abs(guess*guess - x) < delta

    (* improve the guess by averaging it with x/guess *)
    fun improve(guess: real): real =
      (guess + x/guess) / 2.0

    (* try a particular guess -- looping and improving the
     * guess if it's not good enough. *)
    fun tryGuess(guess: real): real =
      if goodEnough(guess) then guess
      else tryGuess(improve(guess))
  in
    (* start with a guess of 1.0 *)
    tryGuess(1.0)
  end

This example shows a number of things. First, you can declare local values (such as delta) and local functions (such as goodEnough, improve, and tryGuess). The abs used in goodEnough is not local: it refers to the abs declared earlier at the top level (or, in a fresh session, to SML's built-in abs). Notice that "inner" functions, such as improve, can refer to outer variables (such as x). Also notice that later definitions can refer to earlier definitions. For instance, tryGuess refers to both goodEnough and improve. Finally, notice that tryGuess is a recursive function; it's really a loop. It's similar to writing something like:

while (!goodEnough(guess)) 
   guess = improve(guess);
return guess;

in an imperative language such as Java or C.

If you type the squareRoot declaration above into the SML top-level, it responds with:

val squareRoot = fn : real -> real

indicating that you've declared a variable (squareRoot), that its value is a function (fn), and that its type is a function from reals to reals. All of the internal structure of the function definition is hidden; all we know from the outside is that its value is a function. In particular, tryGuess (like delta, goodEnough, and improve) is not in scope outside squareRoot!

After typing in the function, you might try it out on a real number such as 9.0:

- squareRoot(9.0);
  val it = 3.0000000014 : real

SML has evaluated the expression "squareRoot(9.0)" and printed the value of the expression (3.0000000014) and the type of the value (real).

At the moment we have only a sloppy, imprecise notion of exactly what happens when you type this expression into ML. In a few weeks we'll have a precise understanding (hopefully!).

If you try to apply squareRoot to an expression that does not have type real (say an integer or a boolean), then you'll get a type error:

- squareRoot(9);
stdIn:27.1-27.14 Error: operator and operand do not agree [literal]
operator domain: real
operand:         int
in expression:
  squareRoot 9
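ML never converts an int to a real automatically, so the fix is to convert explicitly. A minimal sketch using Real.fromInt and Math.sqrt from the Standard ML Basis Library (both come up again below); Math.sqrt stands in for squareRoot so that the block runs on its own:

```sml
(* Math.sqrt : real -> real, so Math.sqrt 9 would be rejected just like
   squareRoot(9); converting the int with Real.fromInt fixes it. *)
val three : real = Math.sqrt (Real.fromInt 9)

(* The same rule applies to arithmetic: 2 + 0.5 is a type error,
   but converting 2 first is fine. *)
val twoAndAHalf : real = Real.fromInt 2 + 0.5
```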

Qualified Identifiers and the Library

Qualified identifiers are of the form S.x, where S is a structure identifier and x is an identifier declared inside that structure. Examples include Int.toString, Real.fromInt, String.sub, etc. This is another way to manage the namespace (more precisely, to group related names).

Structures are a bit like Java packages or C++ namespaces in that they are used to organize collections of definitions. There are a number of pre-defined library structures provided by SML that are extremely useful. For instance, the Int structure provides a number of useful operations on int values, the Real structure provides operations on real values, and the String structure provides operations on string values. To find out more about the library structures and the operations they provide, see the Standard ML Library documents.
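A few calls into the Int, Real, and String structures, as a quick sketch (results shown in comments; the value names are arbitrary). Note that SML writes negative numbers with ~, and string positions start at 0:

```sml
val s : string = Int.toString 42           (* "42" *)
val neg : string = Int.toString ~5         (* "~5" *)
val r : real = Real.fromInt 3 + 0.5        (* 3.5 *)
val c : char = String.sub ("hello", 1)     (* #"e" *)
val n : int = String.size "hello"          (* 5 *)
```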

For example, there is a built-in operation for calculating the absolute value of a real number called Real.abs. We could use it directly in our implementation of the squareRoot function as follows:

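fun squareRoot(x: real): real =
    let
        val delta = 0.0001
        (* returns true iff the guess is good enough;
           the only change: Real.abs from the library *)
        fun goodEnough(guess: real): bool =
            Real.abs(guess*guess - x) < delta
        fun improve(guess: real): real =
            (guess + x/guess) / 2.0
        fun tryGuess(guess: real): real =
            if goodEnough(guess) then guess
            else tryGuess(improve(guess))
    in
       (* start with a guess of 1.0 *)
       tryGuess(1.0)
    end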

Take some time to look at the libraries and find out what they provide. You shouldn't recode something that's available in the library (unless we explicitly ask you to). In fact, we could avoid writing the squareRoot function altogether, because it's already provided by the Math structure! However, it's called Math.sqrt instead of squareRoot. So we can simply write:

fun squareRoot(x: real): real = Math.sqrt(x)

or even:

val squareRoot = Math.sqrt

to create an alias to Math.sqrt. We'll have a lot more to say about the libraries, structures, and qualified identifiers later on when we talk about the SML module language.
