Lecture 2: Syntax and Evaluation of OCaml Programs

Summary of topics:

OCaml syntax

In the previous recitation, you should have seen a few simple expression and declaration forms for OCaml.  The syntax of this fragment of the language can be summarized as follows:

syntactic class    syntactic variable(s) and grammar rule(s)     examples
identifiers        x, f                                          a, x, y, x_y, foo1000, ...
constants          c                                             ..., -2, -1, 0, 1, 2 (integers)
                                                                 1.0, -0.001, 3.141 (floats)
                                                                 true, false (booleans)
                                                                 "hello", "", "!" (strings)
                                                                 'A', ' ', '\n' (characters)
unary operators    u                                             -, not
binary operators   b                                             +, *, -, >, <, >=, <=, ^, !=, ...
terms              e ::= x  |  c  |  u e  |  e1 b e2             foo, -0.001, not b, 2+2
                      |  if e then e else e
                      |  let d1 and ... and dn in e
                      |  e (e1, ..., en)
declarations       d ::= x = e                                   one = 1
                      |  f (x1, ..., xn) : t = e                 square(x:int):int = x*x
types              t ::= int  |  float  |  bool  |  string       int, string, int->int, bool*int->bool
                      |  char  |  t1*...*tn->t

A program in OCaml, like any other language, is made up of various kinds of expressions. The table above describes how to construct some of those expressions. That is, it specifies some of the syntax of OCaml. Some of these expressions, such as identifiers, constants, and operators, we have described only by example. These expressions are all single tokens. Other expressions, such as terms, declarations, and types, are described by grammar rules. These rules are written in a form known as BNF, for Backus-Naur Form (named after its inventors). Each rule describes various ways to build a particular kind of expression, separated by vertical bars. For example, a term may be an identifier, a constant, any unary operator u followed by any expression e (u e), any two terms e1 and e2 separated by any binary operator b, and so on. Notice that we use the letter u to represent any unary operator and the letter e to represent any term. These are examples of syntactic variables or metavariables. A syntactic variable is not an OCaml program variable; it is just a generic name for a certain syntactic construct. For instance, x can be any identifier, and e can be any expression. We sometimes stick subscripts on syntactic variables to help us keep them distinct (as is done above), but this is not necessary.
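As a small illustration of how these grammar rules compose, the following sketch builds two declarations and one term out of the forms in the table. The names one and square come from the table's examples; result, a, and b are hypothetical names introduced only for this sketch.

```ocaml
(* Declarations of the forms  x = e  and  f (x1, ..., xn) : t = e,
   each introduced at top level with let. *)
let one = 1
let square (x : int) : int = x * x

(* A term that combines let ... and ... in, if/then/else, a binary
   operator, a unary operator, and a function application.
   "result" is a hypothetical name used only for this sketch. *)
let result =
  let a = 2 and b = 3 in
  if a < b then square (a + b) else -one
```

Here the condition a < b reduces to true, so the term reduces to square (2 + 3).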

The OCaml interpreter allows either terms or declarations to be typed at the prompt. We can think of a program as being just an OCaml expression, although later we'll see it is more complex.

Program errors

Just because an expression has legal syntax doesn't mean that it is legal; the expression must also be well-typed. That is, it must use expressions only in accordance with their types. We will look at what it means for an expression to be well-typed in more detail later in the course. In general, it is useful to think of a type as a set of possible values (usually an infinite set). We will see that OCaml has a powerful, expressive type system.
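To make the distinction concrete, here is a minimal sketch with two well-typed terms and one term that is syntactically legal but not well-typed. The ill-typed term is left inside a comment so that the block still compiles; the names sum, greeting, and bad are hypothetical.

```ocaml
(* Well-typed: + takes two ints, and ^ takes two strings. *)
let sum = 1 + 2
let greeting = "hello" ^ "!"

(* Syntactically legal but not well-typed: + expects ints and
   "hello" is a string, so OCaml rejects this term without running it.

   let bad = 1 + "hello"
*)
```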

More generally, just as a sentence in English can be wrong in many ways, there are many ways that an expression in OCaml can be wrong.

Now, how do we write expressions and declarations? Here is a declaration of a simple function that computes the absolute value of a given integer:

let abs (x : int) : int =
  if x < 0 then -x else x
Equivalently, one could write
let abs : int -> int =
  function x -> if x < 0 then -x else x
or more briefly,
let abs = fun x -> if x < 0 then -x else x

Every expression and declaration has both a type and a value.  When you type an expression or declaration into the OCaml top-level, it will report both the type and the value of the expression.  If we type the definition of abs at the OCaml prompt, followed by ;; to let the OCaml interpreter know that the expression should now be evaluated, it responds with

val abs : int -> int = <fun>
which means that we have just bound the name abs to a function whose type is int -> int.
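As a quick check that the three forms of the definition describe the same function, the sketch below gives them distinct, hypothetical names (abs1, abs2, abs3) so that they can coexist, and compares them on a few inputs. The name agree is also hypothetical.

```ocaml
(* Form 1: parameters and result type written in the declaration. *)
let abs1 (x : int) : int =
  if x < 0 then -x else x

(* Form 2: a name bound to an anonymous function, with the type given. *)
let abs2 : int -> int =
  function x -> if x < 0 then -x else x

(* Form 3: the type int -> int is inferred from the use of < and -. *)
let abs3 = fun x -> if x < 0 then -x else x

(* All three agree on a negative, a zero, and a positive argument. *)
let agree =
  List.for_all (fun n -> abs1 n = abs2 n && abs2 n = abs3 n) [-5; 0; 7]
```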

Examples

Here is a function that determines whether its argument is a prime number. The type of the function is int -> bool.
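The listing itself is not reproduced in this text. The following is a minimal sketch consistent with the description in this section (a function isPrime with a recursive helper noDivisors declared inside it); the exact divisor bound and the treatment of numbers below 2 are assumptions.

```ocaml
let isPrime (n : int) : bool =
  (* noDivisors m is true if no integer d with m <= d and d*d <= n
     divides n. Stopping once d*d > n is an assumption of this sketch. *)
  let rec noDivisors (m : int) : bool =
    m * m > n || (n mod m <> 0 && noDivisors (m + 1))
  in
  (* Treating numbers below 2 as non-prime is also an assumption. *)
  n >= 2 && noDivisors 2
```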


There are a couple of things to notice about this program. First, note the use of the recursive helper function noDivisors that is declared inside the function isPrime. The function is defined with let rec because it is recursive. This function would be written with a loop in an imperative language, but an appropriately named helper function can be clearer to read than a generic loop. Second, the scope of the declaration is the body of the declaration itself and the expression following the in; it is not available anywhere else.

Here is a function that finds an approximation to the square root of a given floating point number.  It is based on the fact that for any positive numbers x and g, the numbers g and x/g lie on opposite sides of sqrt(x).  This is because their product is x.
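The listing itself is not reproduced in this text. The following is a minimal sketch consistent with the names used in this section (delta, goodEnough, improve, tryGuess); the tolerance value, the stopping test, and the starting guess are assumptions.

```ocaml
let squareRoot (x : float) : float =
  (* Desired accuracy; the value 0.0001 is an assumption. *)
  let delta = 0.0001 in
  (* True if the square of guess is within delta of x. *)
  let goodEnough (guess : float) : bool =
    abs_float (guess *. guess -. x) < delta in
  (* Average guess and x/guess, which lie on opposite sides of sqrt(x). *)
  let improve (guess : float) : float =
    (guess +. x /. guess) /. 2.0 in
  (* Keep improving the guess until it is good enough. *)
  let rec tryGuess (guess : float) : float =
    if goodEnough guess then guess
    else tryGuess (improve guess)
  in
  (* Starting from the guess 1.0 is an assumption. *)
  tryGuess 1.0
```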


This example shows a number of things.  First, you can declare local values such as delta and local functions such as goodEnough, improve, and tryGuess.  Notice that "inner" functions, such as improve, can refer to "outer" variables (such as x).  Also notice that later declarations can refer to earlier declarations.  For instance, tryGuess refers to both goodEnough and improve.  Actually, the later declarations are inside the in expressions of the earlier ones.

If you type the squareRoot declaration above into the OCaml top-level, it responds with:

val squareRoot : float -> float = <fun>
indicating that you've declared a variable (squareRoot), that its value is a function (<fun>), and that its type is a function from float to float.  All of the internal structure of the function definition is hidden; all we know from the outside is that its value is a simple function float -> float.  In particular, the function tryGuess is not defined outside of squareRoot:

# tryGuess;;
Characters 0-8:
  tryGuess;;
  ^^^^^^^^
Error: Unbound value tryGuess

After typing in the function, you might try it out on a floating point number such as 9.0:

# squareRoot 9.0;;
- : float = 3.0000000013969839

OCaml has evaluated the expression squareRoot 9.0 and printed its value (3.0000000013969839) and its type (float).

At the moment we have only an imprecise notion of exactly what happens when you type this expression into OCaml.  We will have a more precise understanding soon.

If you try to apply squareRoot to an expression that does not have type float (say an integer or a boolean), then you'll get a type error:

# squareRoot 9;;
Characters 11-12:
  squareRoot 9;;
             ^
Error: This expression has type int but is here used with type float
where carets (^^^) are used to indicate the erroneous expression.
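One way to resolve such an error is to supply a float, either as a literal or by converting an int explicitly with float_of_int. The sketch below uses the standard library function sqrt (also of type float -> float) as a stand-in so that the block is self-contained; the names a, b, and c are hypothetical.

```ocaml
(* sqrt : float -> float, so its argument must be a float. *)
let a = sqrt 9.0                  (* float literal *)
let b = sqrt (float_of_int 9)     (* explicit int-to-float conversion *)

(* Not well-typed, because 9 has type int:

   let c = sqrt 9
*)
```

OCaml never converts an int to a float implicitly, which is why the conversion has to be written out.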

Qualified Identifiers and the Library

Qualified identifiers are of the form x.y where x is a module identifier.  Examples include String.length, List.map, and String.sub. As in Java with packages and classes, in OCaml qualified identifiers allow a set of names to be grouped together in a separate code module.
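For instance, the three library functions named above can be used as follows. This is only a usage sketch; the names n, middle, and lengths are hypothetical.

```ocaml
(* String.length : string -> int *)
let n = String.length "hello"

(* String.sub s start len extracts len characters of s,
   starting at index start (indices begin at 0). *)
let middle = String.sub "hello" 1 3

(* List.map applies a function to every element of a list. *)
let lengths = List.map String.length ["a"; "bc"; "def"]
```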

Evaluation

The OCaml prompt lets you type either a term or a declaration that binds a variable to a term. It evaluates the term to produce a value: a term that does not need any further evaluation. We can define values v as a syntactic class too. For now, we can think of values as just being the same as constants, though we'll see there is much more to them.

Running an OCaml program is just evaluating a term. What happens when we evaluate a term? In an imperative (non-functional) language like Java, we sometimes imagine that there is an idea of a "current statement" that is executing. This isn't a very good model for OCaml; it is better to think of OCaml programs as being evaluated in the same way that you would evaluate a mathematical expression. For example, if you see an expression like (1+2)*3, you know that you first evaluate the subexpression 1+2, getting a new expression 3*3. Then you evaluate 3*3. OCaml evaluation works the same way. At each point in time, the OCaml evaluator takes the left-most expression that is not a value and rewrites (or reduces) it to some simpler expression. Eventually the whole expression is a value and then evaluation stops: the program is done. Or maybe the expression never reduces to a value, in which case you have an infinite loop.
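This rewriting model can be sketched as a tiny evaluator for arithmetic terms. It is an illustration only, not a description of how the OCaml implementation works; the type expr and the functions step, stepInside, and eval are hypothetical names, and the sketch relies on variant types and pattern matching, which have not been introduced yet.

```ocaml
(* Arithmetic terms: integer constants (the values), sums, and products. *)
type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr

(* Perform one rewrite on the left-most subterm that is not a value.
   Returns None if the whole term is already a value. *)
let rec step (e : expr) : expr option =
  match e with
  | Num _ -> None
  | Add (Num a, Num b) -> Some (Num (a + b))
  | Mul (Num a, Num b) -> Some (Num (a * b))
  | Add (l, r) -> Some (stepInside (fun l' r' -> Add (l', r')) l r)
  | Mul (l, r) -> Some (stepInside (fun l' r' -> Mul (l', r')) l r)

(* Rewrite the left operand if it can step; otherwise the right one.
   The last case is never reached from step, because step calls this
   only when at least one operand is not a value. *)
and stepInside (rebuild : expr -> expr -> expr) (l : expr) (r : expr) : expr =
  match step l with
  | Some l' -> rebuild l' r
  | None ->
    (match step r with
     | Some r' -> rebuild l r'
     | None -> rebuild l r)

(* Keep rewriting until the term is a value. *)
let rec eval (e : expr) : expr =
  match step e with
  | Some e' -> eval e'
  | None -> e
```

With these definitions, one step on the term for (1+2)*3, written Mul (Add (Num 1, Num 2), Num 3), gives the term for 3*3, and eval carries it all the way to Num 9.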

OCaml has a bunch of built-in rules for rewriting terms that go well beyond simple arithmetic. Consider the if expression. It has two important rewrite rules:

if true then e1 else e2   -->  e1
if false then e1 else e2  -->  e2

If the evaluator runs into an if expression, the first thing it does is try to reduce the conditional expression to either true or false. Then it can apply one of the two rules here.
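One consequence of these rules is that only the selected branch is ever evaluated. A minimal sketch, in which chosen is a hypothetical name and failwith stands in for a branch that would fail if it were evaluated:

```ocaml
(* The condition 2 < 3 reduces to true, so the whole term rewrites to
   the then-branch. The else-branch is discarded without being
   evaluated, so failwith never raises. *)
let chosen = if 2 < 3 then "then-branch" else failwith "not evaluated"
```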

Substitution

The let expression is also evaluated using rewrite rules. It works by first evaluating all of its bindings. Then those bindings are substituted into the body of the let expression (the expression after the in). For example, here is a sequence of evaluation steps using let:

let x = 1+4 in x*3
   --> let x = 5 in x*3
   --> 5*3
   --> 15
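The same reduction can be checked directly. The sketch also includes a let with two bindings joined by and, where both bindings are evaluated before being substituted into the body; the names r1 and r2 are hypothetical.

```ocaml
(* let x = 1+4 in x*3  -->  let x = 5 in x*3  -->  5*3  -->  15 *)
let r1 = let x = 1 + 4 in x * 3

(* let x = 1+4 and y = 2*3 in x+y
     -->  let x = 5 and y = 6 in x+y  -->  5+6  -->  11 *)
let r2 = let x = 1 + 4 and y = 2 * 3 in x + y
```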

Function calls are the most interesting case. When a function is called, OCaml does a similar substitution: it substitutes the values passed as arguments into the body of the function. Consider evaluating abs (2+1):

abs (2+1)
   --> abs 3
   --> if 3 < 0 then -3 else 3
   --> if false then -3 else 3
   --> 3
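Each line of this trace is itself a term, and all of them reduce to the same value. A minimal sketch that checks this, with abs defined as earlier in these notes (it shadows the standard library function of the same name); steps is a hypothetical name.

```ocaml
let abs (x : int) : int =
  if x < 0 then -x else x

(* The successive terms of the trace; each one evaluates to 3. *)
let steps = [
  abs (2 + 1);
  abs 3;
  (if 3 < 0 then -3 else 3);
  (if false then -3 else 3);
  3;
]
```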

This is a simple start on how to think about evaluation; we'll have more to say about evaluation in a couple of lectures.