Lecture 3: Scope, Currying, and Lists

Scope

Variable declarations in OCaml bind variables within a scope, the part of the program where the variable stands for the value it is bound to. For example, when we write let x = e1 in e2, the scope of the identifier x is the expression e2. Within that scope, the identifier x stands for whatever value v the expression e1 evaluated to. Since x = v, OCaml evaluates the let expression by rewriting it to e2, but with the value v substituted for the occurrences of x. For example, the expression let x = 2 in x + 3 is evaluated to 2 + 3, and then arithmetic is used to obtain the result value 5.

Functions also bind variables. When we write a function definition in OCaml, we introduce new variables for the function name and for function arguments. For example, in this expression two variables are bound:

let f x = e1 in e2

The scope of the formal parameter x is exactly the expression e1. The scope of the variable f (which is bound to a function value) is the body of the let, e2.
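The following is a minimal sketch of these scoping rules, not taken from the lecture itself. The name result is illustrative, and the example assumes one fact not stated above: an inner binding of x hides an outer binding of the same name within the inner scope.

```ocaml
(* Each use of x refers to the nearest enclosing binding of x. *)
let result =
  let x = 2 in
  let f x = x * 10 in  (* inside the body of f, x is the formal parameter *)
  f 3 + x              (* outside the body of f, x is 2 again *)

(* f 3 evaluates to 30, so result is 30 + 2 = 32. *)
let () = Printf.printf "%d\n" result
```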

A let expression can introduce multiple variables at once, as in the following example:

let x = 2
and y = 3
in x + y

Here both x and y have the body of the let as their scope. Even though y is declared after x, the definition of y cannot refer to this x, because x is not in scope there.
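One way to observe this rule is to place another binding of x outside the let. In the sketch below (the outer x and the name result are assumptions made for illustration), the definition of y sees the outer x rather than the one declared alongside it.

```ocaml
let x = 10

let result =
  let x = 2
  and y = x + 1  (* this x is the outer x (10), not the x bound just above *)
  in
  x + y          (* in the body, x is 2 and y is 11 *)

let () = Printf.printf "%d\n" result
```

If the outer binding of x were removed, the definition of y would be rejected at compile time because x would be unbound.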

To declare a recursive function, the function must be in scope within its own body. In OCaml, this requires using a let rec instead of a let. With let rec, every variable it declares is in scope within its own definition and within the definitions of all the other variables. To make this work, all the definitions that use these variables must be functions. For example, here is how we could define mutually recursive functions even and odd:

let rec even x = x = 0 || odd (x-1)
    and odd x = not (x = 0 || not (even (x-1)))
in
  odd 3110

There are two variables named x in this example, both of which are in scope only within the respective functions that bind them. However, the variables even and odd are in scope in each other's definitions and within the body of the let.
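As a hedged sketch, the same pair of functions can be written as top-level declarations, where the scope of even and odd extends over the rest of the program. Here odd is rewritten into an equivalent form using De Morgan's law. Both functions assume a non-negative argument and would not terminate on a negative one.

```ocaml
let rec even x = x = 0 || odd (x - 1)
and odd x = x <> 0 && even (x - 1)

let () =
  assert (even 10);
  assert (odd 7);
  assert (not (odd 3110))
```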

Qualified identifiers

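A qualified identifier names something defined in a module by prefixing it with the module name, as in String.length. It is also possible to name things defined in a module without qualifying them, by first opening the module with an open declaration: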

# String.length "hi";;
- : int = 2
# open String;;
# length "bye";;
- : int = 3

There are a number of pre-defined library modules provided by OCaml that are extremely useful. For instance, the String module provides a number of useful operations on strings, and the List module provides operations on lists. Many useful operations are in the Pervasives module (named Stdlib in recent OCaml releases), which is already open by default. To find out more about the OCaml libraries and the operations they provide, see the Objective Caml Reference Manual, part IV.

For example, there is a built-in operation for calculating the absolute value of an integer called Pervasives.abs, which can be called simply as abs.
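A few library calls, sketched below, show qualified and unqualified names side by side. The particular functions chosen (String.concat, List.length) and the name n are illustrative. The last line uses a local open, a form not covered above, which makes the names of a module available within a single parenthesized expression only.

```ocaml
let () =
  assert (abs (-5) = 5);                (* unqualified: from the default-open module *)
  assert (String.length "hi" = 2);      (* qualified identifier *)
  assert (List.length [1; 2; 3] = 3);
  assert (String.concat ", " ["a"; "b"] = "a, b")

(* Local open: length means String.length inside the parentheses only. *)
let n = String.(length "bye")
```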

Take some time to browse through the libraries and find out what they provide. You shouldn't recode something that's available in the library (unless we ask you to do so explicitly).

Curried functions

We saw that a function with multiple parameters is really just syntactic sugar for a function that is passed a tuple as an argument. For example,

let plus (x, y) = x + y

is sugar for

let plus (z : int * int) = match z with (x, y) -> x + y

which in turn is sugar for

let plus = fun (z : int * int) -> match z with (x, y) -> x + y

When we apply this function, say to the tuple (2, 3), evaluation proceeds as follows:

plus (2, 3)
= (fun (z : int * int) -> match z with (x, y) -> x + y) (2, 3)
= match (2, 3) with (x, y) -> x + y 
= 2 + 3
= 5

It turns out that OCaml has another way to declare functions with multiple formal arguments, and in fact it is the usual way. The above declaration can be given in curried form as follows:

let plus x y = x + y

or with all the types written explicitly:

let plus (x : int) (y : int) : int = x + y

Notice that there is no comma between the parameters. Similarly, when applying a curried function, we write no comma:

plus 2 3 = 2 + 3 = 5

There is more going on here than it might seem. Recall we said that functions really only have one argument. When we write plus 2 3, the function plus is only being passed one argument, the number 2. We can parenthesize the term as (plus 2) (3), because application is left-associative. In other words, plus 2 must return a function that can be applied to 3 to obtain the result 5. In fact, plus 2 returns a function that adds 2 to its argument.

How does this work? The curried declaration above is syntactic sugar for the creation of a higher-order function. It stands for:

let plus = function (x : int) -> function (y : int) -> x + y

Evaluation of plus 2 3 proceeds as follows:

plus 2 3
= ((function (x : int) -> function (y : int) -> x + y) 2) 3
= (function (y : int) -> 2 + y) 3
= 2 + 3
= 5

So plus is really a function that takes in an int as an argument, and returns a new function of type int -> int. Therefore, the type of plus is int -> (int -> int). We can write this simply as int -> int -> int because the type operator -> is right-associative.
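A practical consequence is partial application: supplying only the first argument produces a usable function. The sketch below uses the illustrative names add2 and plus_tupled, which do not appear in the lecture, to contrast the curried and tupled forms.

```ocaml
let plus (x : int) (y : int) : int = x + y

(* Supplying only the first argument yields a function of type int -> int. *)
let add2 : int -> int = plus 2

(* The tupled version, by contrast, needs both components at once. *)
let plus_tupled ((x, y) : int * int) : int = x + y

let () =
  assert (add2 3 = 5);
  assert (plus 2 3 = (plus 2) 3);
  assert (plus_tupled (2, 3) = plus 2 3)
```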

It turns out that we can view binary operators like + as functions, and they are curried just like plus:

# (+);;
- : int -> int -> int = <fun>
# (+) 2 3;;
- : int = 5
# let next = (+) 1;;
val next : int -> int = <fun>
# next 7;;
- : int = 8
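The same idea applies to other infix operators. In the sketch below the names double, greet, and is_positive are illustrative. Note that the multiplication operator must be written with spaces inside the parentheses, since otherwise the parser would read the start of a comment.

```ocaml
let double = ( * ) 2         (* int -> int: multiplies its argument by 2 *)
let greet = ( ^ ) "hello, "  (* string -> string: prefixes its argument *)
let is_positive = ( < ) 0    (* int -> bool: tests whether 0 < y *)

let () =
  assert (double 21 = 42);
  assert (greet "world" = "hello, world");
  assert (is_positive 3)
```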

Lists

So far the only real data structures we can build are made of tuples. But tuples don't let us make data structures whose size is not known at compile time. For that we need a new language feature.

One simple data structure that we're used to is singly linked lists. It turns out that OCaml has lists built in. For example, in OCaml the expression [] is an empty list. The expression [1;2;3] is a list containing three integers.

In OCaml, all the elements of a list have to have the same type. For example, a list of integers has the type int list. Similarly, the list ["hi"; "there"; "3110"] would have the type string list. But [1; "hi"] is not a legal list. Lists in OCaml are homogeneous lists, as opposed to heterogeneous lists in which each element can have a different type.

Lists are immutable: you cannot change the elements of a list, unlike an array in Java. Once a list is constructed, it never changes.

Constructing lists

Often we want to make a list out of smaller lists. We can concatenate two lists with the @ operator. For example, [1;2;3] @ [4;5] = [1;2;3;4;5]. However, this operator isn't very fast because it needs to build a copy of the entire first list. (It doesn't make a copy of the second list because the storage of the second list is shared with the storage of the concatenated list.)

More often when building up lists we use the :: operator, which prepends an element to the front of an existing list (“prepend” means “append onto the front”). The expression 1::[2;3] is 1 prepended onto the list [2;3]. This is just the list [1;2;3]. If we use :: on the empty list, it makes a one-element list: 1::[] = [1].
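The two operators can be compared in a short sketch. The names a, b, c, and d are illustrative. The second line relies on :: being right-associative, so 1 :: 2 :: 3 :: [] is read as 1 :: (2 :: (3 :: [])).

```ocaml
let a = 1 :: [2; 3]         (* [1; 2; 3] *)
let b = 1 :: 2 :: 3 :: []   (* the same list, built one cons at a time *)
let c = [1; 2; 3] @ [4; 5]  (* [1; 2; 3; 4; 5] *)
let d = 1 :: []             (* [1] *)

let () =
  assert (a = [1; 2; 3]);
  assert (a = b)
```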

For historical reasons going back to the language Lisp, we usually call the :: operator “cons”.

The fact that lists are immutable is in keeping with OCaml being a functional language. It is also actually useful for making OCaml more efficient, because it means that different list data structures can share parts of their representation in the computer's memory. For example, evaluating h::t only requires allocating space for a single extra list node in the computer's memory. It shares the rest of the list with the existing list t.
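Sharing can be observed with the physical equality operator ==, which tests whether two values are the same object in memory. The sketch below is illustrative only: the names are hypothetical, and the manual describes the behavior of == on immutable values as implementation-dependent, so the results shown are what the standard implementation is expected to produce rather than a guarantee of the language.

```ocaml
let t = [2; 3]
let l = 1 :: t

(* The tail of l is the very same list t, not a copy of it. *)
let shares_cons = List.tl l == t

let a = [0; 1]
let c = a @ t

(* After skipping the copied elements of a, the rest of c is t itself. *)
let shares_append = List.tl (List.tl c) == t
```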

Pattern matching on lists

The best way to extract elements from a list is to use pattern matching. The operator :: and the bracket constructor can be used as patterns in a match expression. For example, if we had a list lst and wanted to get the value 0 if lst was empty, 1 if lst had one element, and 2 if lst had 2 or more elements, we could write:

match lst with
    [] -> 0
  | [x] -> 1
  | _ -> 2

Here, x would be bound to the single element of the list if the second match arm were evaluated.
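Wrapped in a function, the match might look like the sketch below. The name size_class is hypothetical, and the wildcard _ replaces x because the element's value is not used on the right-hand side.

```ocaml
let size_class (lst : string list) : int =
  match lst with
  | [] -> 0
  | [_] -> 1
  | _ -> 2

let () =
  assert (size_class [] = 0);
  assert (size_class ["a"] = 1);
  assert (size_class ["a"; "b"; "c"] = 2)
```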

Often, functions that manipulate lists are recursive, because they need to do something to every element. For example, suppose that we wanted to compute the length of a list of strings. We could write a recursive function that accomplishes this (in fact, the library function List.length does just this):

(* Returns the length of lst *)
let rec length (lst : string list) : int =
  match lst with
    [] -> 0
  | h :: t -> 1 + length t

The logic here is that if a list is empty ([]), its length is clearly zero. Otherwise, if it is the result of prepending an element h onto another list t, its length must be one greater than the length of t.
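The same recursive shape carries over to other functions on lists. As a sketch, here is a hypothetical sum function over an int list, with its evaluation on a small input traced in a comment.

```ocaml
(* Returns the sum of the elements of lst. *)
let rec sum (lst : int list) : int =
  match lst with
  | [] -> 0
  | h :: t -> h + sum t

(* sum [1; 2; 3]
   = 1 + sum [2; 3]
   = 1 + (2 + sum [3])
   = 1 + (2 + (3 + sum []))
   = 1 + (2 + (3 + 0))
   = 6 *)
let () = assert (sum [1; 2; 3] = 6)
```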

It's possible to write patterns using the bracket syntax. This is exactly the same as writing a similar pattern using the :: operator. For example, the following patterns are all equivalent: [x;2], x::2::[], x::[2]. These expressions are also all equivalent when used as terms.
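To illustrate, the three hypothetical functions below differ only in how the pattern is spelled and behave identically. Each returns the first element of a two-element list whose second element is 2, and -1 for any other list (an arbitrary default chosen for this sketch).

```ocaml
let first_a (lst : int list) : int = match lst with [x; 2] -> x | _ -> -1
let first_b (lst : int list) : int = match lst with x :: 2 :: [] -> x | _ -> -1
let first_c (lst : int list) : int = match lst with x :: [2] -> x | _ -> -1

let () =
  (* As terms, the three spellings build equal lists. *)
  assert ([7; 2] = 7 :: 2 :: []);
  assert (7 :: 2 :: [] = 7 :: [2]);
  (* As patterns, they match the same lists. *)
  assert (first_a [7; 2] = 7 && first_b [7; 2] = 7 && first_c [7; 2] = 7)
```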

Library functions

The OCaml module List contains many useful functions for manipulating lists. Before using lists, it's worth taking a look at what it provides. Some of these functions we'll talk about later in more detail. Two functions that should be used with caution are hd and tl. These functions get the head and tail of a list, respectively. However, they raise an exception if applied to an empty list. They make it easy to forget about the possibility that the list might be empty, creating unexpected exceptions that crash your program. So it's usually best to avoid them.
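A pattern match forces the empty case to be handled explicitly. In the sketch below, head_or is a hypothetical helper (not a library function) that returns a caller-supplied default for an empty list, and hd_fails_on_empty records that List.hd raises a Failure exception on an empty list.

```ocaml
(* Returns the first element of lst, or default if lst is empty. *)
let head_or (default : string) (lst : string list) : string =
  match lst with
  | [] -> default
  | h :: _ -> h

(* List.hd raises Failure when applied to the empty list. *)
let hd_fails_on_empty =
  try ignore (List.hd ([] : string list)); false
  with Failure _ -> true
```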

List examples

We can use pattern matching to implement other useful functions on lists. Suppose we wanted a function that would extract a list element by its index within the list, with the first element at index zero. We can implement this neatly by pattern matching on the list and testing the integer n as we go:

(* nth lst n returns the nth element of lst. *)
let rec nth (lst : string list) (n : int) : string =
  match lst with
    h :: t -> if n = 0 then h else nth t (n - 1)
  | [] -> raise Not_found

A Not_found exception is raised if n is less than 0 or greater than or equal to the length of lst.
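A variant that matches on the list and the index together, as a pair, can be sketched as follows. It is intended to behave the same way as the definition above. The helper nth_or is hypothetical and shows one way a caller might handle the exception.

```ocaml
(* nth lst n returns the element of lst at index n, counting from zero. *)
let rec nth (lst : string list) (n : int) : string =
  match lst, n with
  | [], _ -> raise Not_found
  | h :: _, 0 -> h
  | _ :: t, _ -> nth t (n - 1)

(* Returns default instead of raising when n is out of range. *)
let nth_or (default : string) (lst : string list) (n : int) : string =
  try nth lst n with Not_found -> default

let () =
  assert (nth ["a"; "b"; "c"] 0 = "a");
  assert (nth_or "?" ["a"; "b"; "c"] 3 = "?")
```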