CS 3110 Lecture 2
Syntax and evaluation of OCaml programs

Summary of topics:

OCaml syntax
Expressions, terms, types, and values
Errors
Evaluation and rewrite rules
Namespaces and scope
Qualified identifiers and libraries

OCaml syntax

In the previous recitation, you should have seen a few simple expression and declaration forms for OCaml. The syntax of this fragment of the language can be summarized as follows (note that - is used both as a unary and as a binary operator).

syntactic class	syntactic variable(s) and grammar rule(s)	examples
identifiers	x, f	`a`, `x`, `y`, `x_y`, `foo1000`, ...
constants	c	...`-2`, `-1`, `0`, `1`, `2` (integers) `1.0`, `-0.001`, `3.141` (floats) `true`, `false` (booleans) `"hello"`, `""`, `"!"` (strings) `'A'`, `' '`, `'\n'` (characters)
unary operator	u	`-`, `not`
binary operators	b	`+`, `*`, `-`, `>`, `<`, `>=`, `<=`, `^`, ...
terms	e ::= x \| c \| u e \| e₁ b e₂ \| `if` e `then` e `else` e \| `let` d₁ `and`...`and` d_n `in` e \| e `(`e₁`,` ...`,` e_n`)`	`foo`, `-0.001`, `not b`, `2 + 2`, ...
declarations	d ::= x = e \| f `(`x₁, ..., x_n`):` t = e	`one = 1`, `square(x):int = x*x`
types	t ::= `int` \| `float` \| `bool` \| `string` \| `char` \| t₁`*`...`*`t_n`->`t	`int`, `string`, `int->int`, `bool*int->bool`

A program in ML, like any other language, is made up of various kinds of expressions. The table above describes how to construct some of those expressions. That is, it specifies some of the syntax of ML. Some of these expressions, such as identifiers, constants, and operators, we have described only by example. These expressions are all single tokens. Other expressions, such as terms, declarations, and types, are described by grammar rules. These rules are written in a form known as BNF, for Backus-Naur Form (named after its inventors). Each rule describes various ways to build a particular kind of expression, separated by vertical bars. For example, a term may be an identifier, a constant, any unary operator u followed by any term e (u e), any two terms e₁ and e₂ separated by any binary operator b, and so on. Notice that we use the letter u to represent any unary operator and the letter e to represent any term. These are examples of syntactic variables or metavariables. A syntactic variable is not an OCaml program variable; it is just a generic name for a certain syntactic construct. For instance, x can be any identifier, and e can be any term. We sometimes put subscripts on syntactic variables to help us keep them distinct (as is done above), but this is not necessary.
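To make the grammar concrete, here is a small sketch with one OCaml term for each alternative of the rule for e, plus the two declaration forms. The names (b, t_identifier, t_call, and so on) are made up for illustration and are not part of the lecture.

```ocaml
(* Declarations, following d ::= x = e | f (x1, ..., xn) : t = e *)
let one = 1
let square (x : int) : int = x * x

(* One term per alternative of the grammar rule for e. *)
let b = true
let t_identifier = one                                 (* x *)
let t_constant = 42                                    (* c *)
let t_unary = not b                                    (* u e *)
let t_binary = 2 + 2                                   (* e1 b e2 *)
let t_if = if b then one else 0                        (* if e then e else e *)
let t_let = let two = 2 and three = 3 in two * three   (* let d1 and d2 in e *)
let t_call = square (3)                                (* e (e1) *)
```

Each right-hand side is built only from the forms in the table, so larger terms can be assembled by nesting these pieces.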

The ML interpreter allows either terms or declarations to be typed at the prompt. We can think of a program as being just an ML expression, although later we'll see it is more complex.

Program errors

Just because an expression has legal syntax doesn't mean that it is legal; the expression must also be well-typed. That is, it must use expressions only in accordance with their types. We will look at what it means for an expression to be well-typed in more detail later in the course. In general, it is useful to think of a type as a set of possible values (usually an infinite set). We will see that OCaml has a powerful, expressive type system.

More generally, there are many ways that an expression in ML can be "wrong", sort of like in English:

Syntax errors: let 0 x =; "Spot run see"
Type errors: "x" + 3; "See Spot ran"
Semantic errors: 1 / 0; "Colorless green ideas sleep furiously" (good grammar, incoherent semantics)
More general errors: ML program that correctly computes the wrong answer, "Officer, you wouldn't dare give me a ticket!"
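The first two kinds of error are caught before the program runs, so they cannot appear in a program that compiles. The third kind can. The sketch below shows that 1 / 0 is accepted by the parser and the type checker and only fails when evaluated. The helper divide_or_none is a hypothetical name, and it uses exception handling and option values, which these notes have not introduced yet.

```ocaml
(* 1 / 0 is well-formed and well-typed, so it is accepted; the failure
   only appears when the division is actually evaluated. *)
let divide_or_none (a : int) (b : int) : int option =
  try Some (a / b) with Division_by_zero -> None

let ok = divide_or_none 6 3       (* Some 2 *)
let failed = divide_or_none 1 0   (* None: Division_by_zero was raised *)
```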

Now, how do we write expressions and declarations? Here is a simple function declaration that computes the absolute value of a number:

let abs(x: int):int =
  if x < 0 then -x else x

Every expression and declaration has both a type and a value. When you type an expression or declaration into the OCaml top-level, it will report both the type and the value of the expression. If we type the definition of abs at the OCaml prompt, it replies with the following:

val abs : int -> int = <fun>

which means that we have just bound the name abs to a function whose type is int -> int.

Examples

Here is a function that computes whether its argument is a prime number. The type of the function is int->bool. Note the use of a recursive helper function noDivisorsAbove that is declared inside the function isPrime.
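A minimal sketch of such a function, reconstructed from the description above, might look like the following. The names isPrime and noDivisorsAbove come from the text; the exact stopping test (m * m > n) and the treatment of numbers below 2 are assumptions.

```ocaml
let isPrime (n : int) : bool =
  (* noDivisorsAbove m is true when no k with k >= m and k * k <= n
     divides n. It can refer to n because it is declared inside isPrime. *)
  let rec noDivisorsAbove (m : int) : bool =
    if m * m > n then true
    else if n mod m = 0 then false
    else noDivisorsAbove (m + 1)
  in
  n >= 2 && noDivisorsAbove 2
```

The helper only needs to test candidate divisors up to the square root of n, since any larger divisor would be paired with a smaller one.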

Here is a function declaration which finds (an approximation to) the square root of a floating point number.

Underlying math fact: for any positive x and g, the values g and x/g lie on opposite sides of sqrt(x) (or both equal it), because their product is x.
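A sketch of the declaration, reconstructed from the names used in the discussion (delta, abs, goodEnough, improve, tryGuess), could be written as follows. The tolerance 0.0001, the starting guess 1.0, and the choice of averaging g and x/g in improve are assumptions, so the digits it prints may differ from any output quoted in these notes.

```ocaml
(* Assumes x is positive; for negative x the loop would never finish. *)
let squareRoot (x : float) : float =
  (* How close guess * guess must be to x before we stop. *)
  let delta = 0.0001 in
  let abs (r : float) : float = if r < 0.0 then -. r else r in
  let goodEnough (guess : float) : bool =
    abs (guess *. guess -. x) < delta in
  (* guess and x /. guess bracket sqrt x, so take their average. *)
  let improve (guess : float) : float =
    (guess +. x /. guess) /. 2.0 in
  let rec tryGuess (guess : float) : float =
    if goodEnough guess then guess else tryGuess (improve guess) in
  tryGuess 1.0
```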

This example shows a number of things. First, you can declare local values (such as delta) and local functions (such as abs, goodEnough, improve, and tryGuess). Notice that "inner" functions, such as improve, can refer to outer variables (such as x). Also notice that later definitions can refer to earlier definitions. For instance, tryGuess refers to both goodEnough and improve. Finally, notice that tryGuess is a recursive function; it's really a loop. It's similar to writing something like:

while (!goodEnough(guess)) 
   guess = improve(guess);
return guess;

in an imperative language such as Java or C.

If you type the squareRoot declaration above into the OCaml top-level, it responds with:

val squareRoot : float -> float = <fun>

indicating that you've declared a variable (squareRoot), that its value is a function (<fun>), and that its type is a function from float to float. All of the internal structure of the function definition is hidden; all we know from the outside is that its value is a function. In particular, the local function tryGuess is not defined outside of squareRoot!

After typing in the function, you might try it out on a floating point number such as 9.0:

# squareRoot(9.0);;
  - : float = 3.00000000014

OCaml has evaluated the expression "squareRoot(9.0)" and printed the value of the expression (3.00000000014) and the type of the value (float).

At the moment we have only an imprecise notion of exactly what happens when you type this expression into ML. Hopefully we'll have a more precise understanding soon.

If you try to apply squareRoot to an expression that does not have type float (say an integer or a boolean), then you'll get a type error:

# squareRoot(9);;
This expression has type int but is used here with type float
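OCaml never converts an int to a float implicitly, so the usual remedy is an explicit conversion. The sketch below uses the standard library functions float_of_int and sqrt; sqrt stands in for squareRoot only so that the fragment is self-contained, and the names nine and root are made up.

```ocaml
(* sqrt is the standard library's float -> float square root, used here
   as a stand-in for squareRoot. *)
let nine : int = 9

(* float_of_int turns the int into a float before the call. *)
let root : float = sqrt (float_of_int nine)
```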

Qualified Identifiers and the Library

Qualified identifiers are of the form x.y where x is a module identifier. Examples include String.length, List.map, and String.sub. As in Java with packages and classes, in OCaml qualified identifiers allow a set of names to be grouped together in a separate code module.
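For instance, the three library functions named above can be called as follows. The bound names (len, middle, squares) are made up for illustration; the list and anonymous function passed to List.map use syntax that later lectures cover.

```ocaml
let len = String.length "hello"                     (* 5 *)
let middle = String.sub "hello" 1 3                 (* 3 characters starting at index 1 *)
let squares = List.map (fun x -> x * x) [1; 2; 3]   (* [1; 4; 9] *)
```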

Evaluation

The OCaml prompt lets you type either a term or a declaration that binds a variable to a term. It evaluates the term to produce a value: a term that does not need any further evaluation. We can define values v as a syntactic class too. For now, we can think of values as just being the same as constants, though we'll see there is much more to them.

Running an ML program is just evaluating a term. What happens when we evaluate a term? In an imperative (non-functional) language like Java, we sometimes imagine that there is an idea of a "current statement" that is executing. This isn't a very good model for ML; it is better to think of ML programs as being evaluated in the same way that you would evaluate a mathematical expression. For example, if you see an expression like (1+2)*3, you know that you first evaluate the subexpression 1+2, getting a new expression 3*3. Then you evaluate 3*3. ML evaluation works the same way. At each point in time, the ML evaluator takes the left-most expression that is not a value and rewrites (or reduces) it to some simpler expression. Eventually the whole expression is a value and then evaluation stops: the program is done. Or maybe the expression never reduces to a value, in which case you have an infinite loop.

ML has a bunch of built-in rules for rewriting terms that go well beyond simple arithmetic. Consider the if expression. It has two important rewrite rules:

if true then e₁ else e₂   -->  e₁
if false then e₁ else e₂  -->  e₂

If the evaluator runs into an if expression, the first thing it does is try to reduce the conditional expression to either true or false. Then it can apply one of the two rules here.
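The rewriting model can itself be sketched as a small OCaml program. The toy term type and the functions is_value, step, and eval below are hypothetical and only illustrate the idea for integers, booleans, and if; they are not how the real OCaml evaluator is implemented, and they use datatypes and pattern matching, which come later in the course.

```ocaml
(* A toy term language: just enough to replay the rewrite rules. *)
type term =
  | Int of int
  | Bool of bool
  | Add of term * term
  | Mul of term * term
  | Less of term * term
  | If of term * term * term

(* Values are terms that need no further evaluation. *)
let is_value (e : term) : bool =
  match e with
  | Int _ | Bool _ -> true
  | _ -> false

(* Perform one rewrite on the leftmost subterm that is not a value.
   Ill-typed terms such as Add (Bool true, Int 1) end in a failure. *)
let rec step (e : term) : term =
  match e with
  | Int _ | Bool _ -> failwith "no rewrite rule applies"
  | Add (Int a, Int b) -> Int (a + b)
  | Mul (Int a, Int b) -> Int (a * b)
  | Less (Int a, Int b) -> Bool (a < b)
  | Add (a, b) -> if is_value a then Add (a, step b) else Add (step a, b)
  | Mul (a, b) -> if is_value a then Mul (a, step b) else Mul (step a, b)
  | Less (a, b) -> if is_value a then Less (a, step b) else Less (step a, b)
  | If (Bool true, e1, _) -> e1            (* if true then e1 else e2 --> e1 *)
  | If (Bool false, _, e2) -> e2           (* if false then e1 else e2 --> e2 *)
  | If (c, e1, e2) -> If (step c, e1, e2)  (* reduce the condition first *)

(* Keep rewriting until the whole term is a value. *)
let rec eval (e : term) : term =
  if is_value e then e else eval (step e)
```

With these definitions, one call to step on the term for (1+2)*3 yields the term for 3*3, and eval carries the rewriting through to the final value.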

Substitution

There are two more expressions (terms) above with rewrite rules. The let expression works by first evaluating all of its bindings. Then those bindings are substituted into the body of the let expression (the expression after the in). For example, here is an evaluation using let:

let x = 1+4 in x*3  -->  let x = 5 in x*3  -->  5*3  -->  15

Function calls are the most interesting case. When a function is called, ML does a similar substitution: it substitutes the values passed as arguments into the body of the function. Consider evaluating abs(2+1):

abs(2+1)  -->  abs(3)  --> if 3 < 0 then -3 else 3
   -->  if false then -3 else 3 --> 3
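One way to check traces like these is to write each intermediate term as its own OCaml expression and confirm that they all evaluate to the same value. This is only a sketch: the names call0 through call3 and let0 through let2 are made up, and abs is redefined here for integers so that the fragment is self-contained.

```ocaml
let abs (x : int) : int = if x < 0 then -x else x

(* Each line is one term from the trace of abs(2+1). *)
let call0 = abs (2 + 1)
let call1 = abs (3)
let call2 = if 3 < 0 then -3 else 3
let call3 = if false then -3 else 3

(* Each line is one term from the trace of the let expression. *)
let let0 = let x = 1 + 4 in x * 3
let let1 = let x = 5 in x * 3
let let2 = 5 * 3
```

Because every rewrite step preserves the value of the term, all the call terms evaluate to 3 and all the let terms evaluate to 15.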

This is a simple start on how to think about evaluation; we'll have much more to say about evaluation in a couple of lectures.