- Scope and binding
- Curried functions
- OCaml lists

Variable declarations in OCaml **bind** variables within a **scope**, the
part of the program where the variable stands for the value it is bound to.
For example, when we write
`let` *x* = *e*_{1} `in` *e*_{2},
the scope of the identifier *x* is the expression *e*_{2}. Within
that scope, the identifier *x* stands for whatever value
*v* the expression *e*_{1} evaluated to.
Since *x* = *v*, OCaml evaluates the `let` expression by rewriting it to
*e*_{2}, but with the value *v* substituted for the occurrences of *x*.
For example, the expression `let x = 2 in x + 3` is
evaluated to `2 + 3`, and then arithmetic is used to obtain
the result value `5`.
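As a small illustration of these scoping rules, the sketch below nests one `let` inside another. The names `a` and `b` are chosen only for this example.

```ocaml
(* x is bound to 2 only inside the body of this let. *)
let a = let x = 2 in x + 3

(* The inner let rebinds x within its own body. Once that body ends,
   the outer binding of x is the one in scope again. *)
let b =
  let x = 2 in
  (let x = 10 in x + 1) + x

let () = Printf.printf "a = %d, b = %d\n" a b
```

By substitution, `a` rewrites to `2 + 3`, and `b` rewrites to `(10 + 1) + 2`.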

Functions also bind variables. When we write a function definition in OCaml, we introduce new variables for the function name and for function arguments. For example, in this expression two variables are bound:

    let f x = e_{1} in e_{2}

The scope of the formal parameter x is exactly the expression
e_{1}. The scope of the variable f (which is bound to a function value) is
the body of the let, e_{2}.

A `let` expression can introduce multiple variables at once,
as in the following example:

    let x = 2 and y = 3 in x + y

Here both `x` and `y` have the body of the `let` as their scope.
Even though `y` is declared after `x`, the definition of `y`
cannot refer to the variable `x`, because `x` isn't in scope there.
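The difference between nested `let` expressions and a single `let ... and ...` can be seen in a short sketch. The names `nested` and `parallel`, and the outer binding of `x` to 100, are assumptions made only for this illustration.

```ocaml
(* With nested lets, the second definition is in the scope of the first,
   so y is computed from the x bound just before it. *)
let nested =
  let x = 2 in
  let y = x + 1 in
  x + y

(* With "and", the definition of y is outside the scope of the new x,
   so the x on its right-hand side is the outer one, bound to 100. *)
let parallel =
  let x = 100 in
  let x = 2 and y = x + 1 in
  x + y

let () = Printf.printf "nested = %d, parallel = %d\n" nested parallel
```

Without the outer binding of `x`, the second definition would be rejected by the compiler because `x` would be unbound in the definition of `y`.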

To declare a recursive function, the function must be in scope within its
own body. In OCaml, this requires using a `let rec` instead of
a `let`. With `let rec`, every variable it declares
is in scope within its own definition and within the definitions of all the
other variables. To make this work, all the definitions that use these variables
must be functions. For example, here is how we could define
**mutually** recursive functions `even` and `odd`:

    let rec even x = x = 0 || odd (x-1)
    and odd x = not (x = 0 || not (even (x-1)))
    in odd 3110

There are two variables named `x` in this example, both of which
are in scope only within the respective functions that bind them. However,
the variables `even` and `odd` are in scope in each
other's definitions and within the body of the `let`.
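The same idea works for top-level declarations, where there is no `in` and the scope extends over the rest of the file. The following is a minimal sketch with the hypothetical names `is_even` and `is_odd`, written on the assumption that the argument is never negative.

```ocaml
(* Assumes n >= 0: a negative argument would never reach the base case. *)
let rec is_even n = n = 0 || is_odd (n - 1)
and is_odd n = n <> 0 && is_even (n - 1)

let () =
  Printf.printf "is_even 10 = %b, is_odd 3110 = %b\n"
    (is_even 10) (is_odd 3110)
```

Replacing `let rec` with `let` here would make the compiler reject the definition of `is_even`, because `is_odd` would not yet be in scope.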

It is possible to name things defined in a module without using a
qualified identifier, by using an `open` declaration:

    # String.length "hi";;
    - : int = 2
    # open String;;
    # length "bye";;
    - : int = 3

There are a number of
pre-defined library modules provided by OCaml that are extremely useful.
For instance, the `String` module provides a number of useful operations
on strings, and the `List` module provides operations on lists. Many useful operations are
in the `Pervasives` module (called `Stdlib` in current OCaml releases), which is already open by default.
To find out more about the OCaml libraries and the operations they
provide, see
the Objective Caml Reference Manual, part IV.

For example, there is a built-in operation for calculating the absolute value
of an integer called `Pervasives.abs` (`Stdlib.abs` in current OCaml releases),
which can be called simply as `abs`.
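The sketch below puts qualified and unqualified names side by side. The names `n1` through `n4` are chosen only for this example, and the last line uses `let open ... in`, a variant of `open` that limits the opening to a single expression.

```ocaml
(* Qualified names work without opening the module. *)
let n1 = String.length "hi"
let n2 = List.length [1; 2; 3]

(* abs needs no qualifier because its module is open by default. *)
let n3 = abs (-7)

(* A local open: length refers to String.length only inside this expression. *)
let n4 = let open String in length "bye"

let () = Printf.printf "%d %d %d %d\n" n1 n2 n3 n4
```

Opening a module for a whole file can hide other definitions with the same names, so qualified names or a local open are often the safer choice.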

Take some time to browse through the libraries and find out what they provide. You shouldn't recode something that's available in the library (unless we ask you to do so explicitly).

We saw that a function with multiple parameters is really just syntactic sugar for a function that is passed a tuple as an argument. For example,

    let plus (x, y) = x + y

is sugar for

    let plus (z : int * int) = match z with (x, y) -> x + y

which in turn is sugar for

    let plus = fun (z : int * int) -> match z with (x, y) -> x + y

When we apply this function, say to the tuple `(2, 3)`,
evaluation proceeds as follows:

    plus (2, 3)
      = (fun (z : int * int) -> match z with (x, y) -> x + y) (2, 3)
      = match (2, 3) with (x, y) -> x + y
      = 2 + 3
      = 5

It turns out that OCaml has another way to declare functions with multiple
formal arguments, and in fact it is the usual way. The above declaration can be
given in **curried** form as follows:

    let plus x y = x + y

or with all the types written explicitly:

    let plus (x : int) (y : int) : int = x + y

Notice that there is no comma between the parameters. Similarly, when applying a curried function, we write no comma:

    plus 2 3 = 2 + 3 = 5

There is more going on here than it might seem. Recall we said that functions
really only have one argument. When we write `plus 2 3`, the
function `plus` is only being passed one argument, the number 2.
We can parenthesize the term as `(plus 2) (3)`, because application
is left-associative. In other words,
`plus 2` must return a function that can be applied to 3 to obtain
the result 5. In fact, `plus 2` returns a function that adds 2 to
its argument.

How does this work? The curried declaration above is syntactic sugar for
the creation of a **higher-order function**. It stands for:

    let plus = function (x : int) -> function (y : int) -> x + y

Evaluation of `plus 2 3` proceeds as follows:

    plus 2 3
      = ((function (x : int) -> function (y : int) -> x + y) 2) 3
      = (function (y : int) -> 2 + y) 3
      = 2 + 3
      = 5

So `plus` is really a function that takes in an `int`
as an argument, and returns a new function of type `int -> int`.
Therefore, the type of `plus` is `int -> (int -> int)`.
We can write this simply as `int -> int -> int` because the
type operator `->` is right-associative.
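To make partial application concrete, here is a minimal sketch. The names `add2`, `plus_pair`, `curry`, and `uncurry` are hypothetical and introduced only for this example; they are not taken from a library.

```ocaml
(* The curried form: int -> int -> int *)
let plus x y = x + y

(* Applying plus to one argument yields a function: int -> int *)
let add2 = plus 2

(* The tupled form: int * int -> int *)
let plus_pair (x, y) = x + y

(* Hypothetical helpers converting between the two forms. *)
let curry f x y = f (x, y)
let uncurry f (x, y) = f x y

let () =
  Printf.printf "%d %d %d\n" (add2 5) (curry plus_pair 2 3) (uncurry plus (2, 3))
```

The curried form is usually preferred because it allows definitions like `add2` without writing out a new function.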

It turns out that we can view binary operators like `+` as
functions, and they are curried just like `plus`:

    # (+);;
    - : int -> int -> int = <fun>
    # (+) 2 3;;
    - : int = 5
    # let next = (+) 1;;
    val next : int -> int = <fun>
    # next 7;;
    - : int = 8

So far the only real data structures we can build are made of tuples. But tuples don't let us make data structures whose size is not known at compile time. For that we need a new language feature.

One simple data structure that we're used to is singly linked lists.
It turns out that OCaml
has lists built in. For example, in OCaml the expression `[]` is an
empty list. The expression `[1;2;3]` is a list containing three
integers.

In OCaml, all the elements of a list have to have the same type. For
example, a list of integers has the type `int list`.
Similarly, the list `["hi"; "there"; "3110"]` would have the type
`string list`. But `[1; "hi"]` is not a legal
list. Lists in OCaml are **homogeneous lists**, as opposed to
**heterogeneous lists** in which each element can have a different type.

Lists are **immutable**: you cannot change the elements of
a list, unlike an array in Java. Once a list is constructed, it never changes.

Often we want to make a list out of smaller lists. We can concatenate
two lists with the `@` operator. For example,
`[1;2;3] @ [4;5]` = `[1;2;3;4;5]`. However, this operator
isn't very fast because it needs to build a copy of the entire first
list. (It doesn't make a copy of the second list because the storage of the
second list is shared with the storage of the concatenated list.)

More often when building up lists
we use the `::` operator, which prepends an
element to the front of an existing list (“prepend” means “append onto the front”).
The expression `1::[2;3]` is `1` prepended onto
the list `[2;3]`. This is just the list `[1;2;3]`.
If we use `::` on the empty list, it makes a one-element list:
`1::[]` = `[1]`.

For historical reasons going back to the language Lisp, we usually
call the `::` operator “cons”.

The fact that lists are immutable is in keeping with OCaml being a
functional language. It is also actually useful for making OCaml more efficient,
because it means that different list data structures can share parts of their
representation in the computer's memory. For example, evaluating `h::t`
only requires allocating space for a single extra list node in the computer's
memory. It shares the rest of the list with the existing list `t`.
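A short sketch of building lists with `::` follows. The names `t`, `a`, `b`, and `c` are chosen only for this illustration.

```ocaml
let t = [2; 3]

(* Each cons allocates one new node whose tail is the existing list t. *)
let a = 1 :: t              (* [1; 2; 3] *)
let b = 0 :: t              (* [0; 2; 3] *)

(* :: associates to the right, so this is 1 :: (2 :: (3 :: [])). *)
let c = 1 :: 2 :: 3 :: []

(* t itself is unchanged by either cons. *)
let () = assert (t = [2; 3])
let () = assert (a = c)
```

Because lists are immutable, it is safe for `a` and `b` to share `t` as their tail: no operation on one of them can affect the other.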

The best way to extract elements from a list is to use pattern matching.
The operator `::` and the bracket constructor can be used as
patterns in a `match` expression. For example, if we had a list
`lst` and wanted to
get the value 0 if `lst` was empty,
1 if `lst` had one element, and 2 if `lst` had
2 or more elements, we could write:

    match lst with
      [] -> 0
    | [x] -> 1
    | _ -> 2

Here, `x` would be bound to the single element of the list if
the second match arm were evaluated.
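Wrapped in a function, the match above can be tried on a few inputs. This is a sketch only: the name `size_class` is hypothetical, and the element type is fixed to `int` to keep the example simple.

```ocaml
(* Returns 0 for an empty list, 1 for a one-element list, 2 otherwise. *)
let size_class (lst : int list) : int =
  match lst with
  | [] -> 0
  | [_] -> 1      (* _ matches the single element without naming it *)
  | _ -> 2

let () =
  Printf.printf "%d %d %d\n"
    (size_class []) (size_class [7]) (size_class [7; 8; 9])
```

The arms are tried from top to bottom, so the final wildcard is only reached by lists with two or more elements.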

Often, functions that manipulate lists are recursive, because they need
to do something to every element. For example, suppose that we wanted to
compute the length of a list of strings. We could write a recursive function
that accomplishes this (in fact, the library function `List.length`
does just this):

    (* Returns the length of lst *)
    let rec length (lst : string list) : int =
      match lst with
        [] -> 0
      | h :: t -> 1 + length t

The logic here is that if a list is empty (`[]`),
its length is clearly zero. Otherwise, if it is the result of prepending an
element `h` onto another list `t`, its length must be one greater than the
length of `t`.
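The same two-case recursion covers many other list functions. The sketch below uses the hypothetical names `sum` and `append`. The second function illustrates how concatenation can be written, and is not the library's actual implementation of `@`.

```ocaml
(* Adds up the elements of lst; the empty list sums to 0. *)
let rec sum (lst : int list) : int =
  match lst with
  | [] -> 0
  | h :: t -> h + sum t

(* Concatenates l1 and l2. One new node is built per element of l1,
   while l2 is returned as-is and never copied. *)
let rec append (l1 : int list) (l2 : int list) : int list =
  match l1 with
  | [] -> l2
  | h :: t -> h :: append t l2

let () = Printf.printf "sum = %d\n" (sum (append [1; 2; 3] [4; 5]))
```

In `append`, the recursion walks only the first list, which matches the earlier remark that concatenation copies its first argument and shares its second.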

It's possible to write patterns using the bracket syntax. This is
exactly the same as writing a similar pattern using the `::`
operator. For example, the following patterns are all equivalent:
`[x;2]`, `x::2::[]`,
`x::[2]`. These expressions are also all equivalent when
used as terms.
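The equivalence can be checked directly. In this sketch the names `same` and `ends_in_two_a`, `ends_in_two_b`, and `ends_in_two_c` are hypothetical, and the variable `x` from the patterns above is replaced by `_` because its value is not used.

```ocaml
(* As expressions, the three spellings build the same list. *)
let same = ([1; 2] = 1 :: 2 :: []) && (1 :: 2 :: [] = 1 :: [2])

(* As patterns, each matches exactly the two-element lists ending in 2. *)
let ends_in_two_a lst = match lst with [_; 2] -> true | _ -> false
let ends_in_two_b lst = match lst with _ :: 2 :: [] -> true | _ -> false
let ends_in_two_c lst = match lst with _ :: [2] -> true | _ -> false

let () =
  Printf.printf "%b %b %b %b\n" same
    (ends_in_two_a [5; 2]) (ends_in_two_b [5; 2]) (ends_in_two_c [5; 2])
```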

The OCaml module `List` contains many useful functions for
manipulating lists. Before using lists, it's worth taking a look.
Some of them we'll talk about later in more detail. Two functions
that should be used with caution are `hd` and `tl`.
These functions get the head and tail of a list, respectively. However,
they raise an exception if applied to an empty list. They make it
easy to forget about the possibility that the list might be empty, creating
unexpected exceptions that crash your program. So it's usually best to avoid them.
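One way to avoid `hd` is to pattern match and decide explicitly what the empty case should produce. The helper `first_or` below is hypothetical, and returning a caller-supplied default is just one possible design.

```ocaml
(* Returns the first element of lst, or default if lst is empty. *)
let first_or (default : int) (lst : int list) : int =
  match lst with
  | [] -> default
  | h :: _ -> h

(* For comparison: List.hd raises Failure when given an empty list. *)
let hd_fails =
  try ignore (List.hd ([] : int list)); false
  with Failure _ -> true

let () =
  Printf.printf "%d %d %b\n" (first_or 0 [4; 5]) (first_or 0 []) hd_fails
```

With the `match`, the compiler warns if the empty case is left out, which is the safety that `hd` and `tl` give up.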

We can use pattern matching to implement other useful functions on lists. Suppose we wanted a function that would extract a list element by its index within the list, with the first element at index zero. We can implement this neatly by pattern matching on the list and testing the integer `n` in the non-empty case:

    (* nth lst n returns the nth element of lst, counting from zero. *)
    let rec nth (lst : string list) (n : int) : string =
      match lst with
        h :: t -> if n = 0 then h else nth t (n - 1)
      | [] -> raise Not_found

A `Not_found` exception is raised if `n` is
less than 0 or greater than or equal to the length of `lst`.
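The sketch below repeats the definition of `nth` so that it runs on its own, then tries it on in-range and out-of-range indices. The names `words`, `second`, and `nth_or` are hypothetical, and catching the exception to return a default is only one way a caller might handle it.

```ocaml
(* nth lst n returns the nth element of lst, counting from zero. *)
let rec nth (lst : string list) (n : int) : string =
  match lst with
  | h :: t -> if n = 0 then h else nth t (n - 1)
  | [] -> raise Not_found

let words = ["hi"; "there"; "3110"]

(* Index 1 is the second element. *)
let second = nth words 1

(* Hypothetical wrapper: turns Not_found into a default value. *)
let nth_or (default : string) (lst : string list) (n : int) : string =
  try nth lst n with Not_found -> default

let () =
  Printf.printf "%s %s %s\n" second (nth_or "?" words 3) (nth_or "?" words (-1))
```

For a negative `n`, the test `n = 0` never succeeds, so the recursion runs off the end of the list and reaches the `[]` case.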