Functions, and Testing in the Toplevel

In the previous lab, you learned to use the OCaml toplevel and a text editor. Today, we'll combine those to write and test some functions.

Reminder about lab guidelines

Recall from the lab guidlines that one and two star exercises should be worked when you come to them, and if you get stuck for more than a little while, you should ask about the problem in recitation. Three star exercises can be skipped at first then revisited when the rest of the lab is finished. Four and five star exercises are intended to provide additional challenge for those students who want it.

Testing code in the toplevel

In addition to allowing you to define functions, the toplevel will also accept directives that are not OCaml code but rather tell the toplevel itself to do something. All directives begin with the # character. Perhaps the most common directive is #use, which loads all the code from a file into the toplevel, just as if you had typed the code from that file into the toplevel.

Exercise: use [✭✭]

Create a file named lab02.ml. In that file put the following code:

let inc x = x + 1

Start the toplevel. Try entering the following expression, and observe the error:

# inc 3;;
Error: Unbound value inc
Hint: Did you mean incr?

The error is because the toplevel does not yet know anything about a function named inc. Now issue the following directive to the toplevel:

# #use "lab02.ml";;

Note that the first # character above indicates the toplevel prompt to you. The second # character is one that you type to tell the toplevel that you are issuing a directive. Without that character, the toplevel would think that you are trying to apply a function named use.

Now try again:

# inc 3;;
- : int = 4

Test the inc function on a couple more integers of your choice.

Finally, exit the toplevel.

□

The final step in the exercise above ("exit the toplevel") is exceedingly important. The best workflow when using the toplevel to test functions is:

Edit some code in a file.
Load the code in the toplevel with #use.
Interactively test the code.
Exit the toplevel.

Suppose you wanted to fix a bug in your code: it's tempting to not exit the toplevel, edit the file, and re-issue the #use directive into the same toplevel session. Resist that temptation. The "stale code" that was loaded from an earlier #use directive in the same session can cause surprising things to happen—surprising when you're first learning the language, anyway. So always exit the toplevel before re-using a file.

Recursive functions

In lecture, we saw that there are two ways of defining functions. Non-recursive functions are defined like this:

let f x = ...

Recursive functions are defined like this:

let rec f x = ...

The difference is just the rec keyword.

Exercise: fact [✭✭]

In your file lab02.ml, try adding the following code to define the factorial function:

let fact n = if n=0 then 1 else n * fact (n-1)

Save the file, start a new toplevel session, and use the file. You'll get an error:

# #use "lab02.ml";;
Error: Unbound value fact

The problem is that we omitted the rec keyword. Without it, fact is not in scope in its own body, so it can't be called recursively.

Fix this error by adding the rec keyword. Don't forget to restart the toplevel before re-using the file. Interactively test your factorial function on a few integers.

Note that, as in many languages, OCaml integers are not the "mathematical" integers but are limited to a fixed number of bits. The manual specifies that integers are at least 30 bits but might be wider. So if you test on large enough inputs, you might begin to see strange results. The problem is machine arithmetic, not OCaml. Also if you test on integers less than 0, e.g. fact (-1), you'll get an error. The problem there is that our function assumes that its input is at least zero. We should really document a precondition for our function!

□

Exercise: fib [✭✭]

Define a recursive function fib : int -> int, such that fib n is the nth number in the Fibonacci sequence, which is 1, 1, 2, 3, 5, 8, 13, ... That is:

fib 1 = 1,
fib 2 = 1, and
fib n = fib (n-1) + fib (n-2) for any n > 2.

Test your function in the toplevel.

□

Exercise: fib fast [✭✭✭]

How quickly does your implementation of fib compute the 50th Fibonacci number? If it computes nearly instantaneously, congratulations! But the recursive solution most people come up with at first will seem to hang indefinitely. The problem is that the obvious solution computes subproblems repeatedly. For example, computing fib 5 requires computing both fib 3 and fib 4, and if those are computed separately, a lot of work (an exponential amount, in fact) is being redone.

Create a function fib_fast that requires only a linear amount of work. Hint: write a recursive helper function h : int -> int -> int -> int, where h n pp p is defined as follows:

h 1 pp p = p, and
h n pp p = h (n-1) p (pp+p) for any n > 1.

The idea of h is that it assumes the previous two Fibonacci numbers were pp and p, then computes forward n more numbers. Hence, fib n = h n 0 1 for any n > 0.

What is the first value of n for which fib_fast n is negative, indicating that integer overflow occurred?

□

Polymorphic functions

The simplest function is the identity function fun x -> x. What is its type? We can enter it into the toplevel to find out:

# fun x -> x;;
- : 'a -> 'a = <fun>

(You will note that the syntax highlighting above gets wonky. The Google Code Prettifier, which we use on this website, unfortunately does the wrong thing with single quotes here: it treats them as surrounding a character literal, even though they are not. If there are any wizard web programmers in the course, feel free to talk to Prof. Clarkson about how to fix this problem.)

The 'a is a type variable: it stands for an unknown type, just like a regular variable stands for an unknown value. Type variables always begin with a single quote. Commonly used type variables include 'a, 'b, and 'c, which OCaml programmers typically pronounce in Greek: alpha, beta, and gamma.

Exercise: id [✭]

Add the identity function to your lab02.ml file:

let id x = x

Try testing it on several types of values—an integer, a string, a Boolean, etc.

□

Because you can apply that function to many types of values, it is a polymorphic function. (poly = many, morph = form)

Exercise: poly types [✭✭✭]

What is the type of each of the functions below? You can ask the toplevel to check your answers.

let f x = if x then x else x
let g x y = if y then x else x
let h x y z = if x then y else z
let i x y z = if x then y else y

□

Labeled and optional arguments

Usually the type and name of a function give you a pretty good idea of what the arguments should be. However, for functions with many arguments (especially arguments of the same type), it can be useful to label them. For example, you might guess that the function String.sub returns a substring of the given string (and you would be correct). You could type in String.sub to find its type:

# String.sub;;
- : string -> int -> int -> string

But it's not clear from the type how to use it—you're forced to consult the documentation.

OCaml supports labeled arguments to functions. You can declare this kind of function using the following syntax:

# let f ~name1:arg1 ~name2:arg2 = arg1 + arg2;;
val f : name1:int -> name2:int -> int = <fun>

This function can be called by passing the labeled arguments in either order:

f ~name2:3 ~name1:4;;

Labels for arguments are often the same as the variable names for them. OCaml provides a shorthand for this case. The following are equivalent:

let f ~name1:name1 ~name2:name2 = name1+name2
let f ~name1 ~name2 = name1 + name2

Use of labeled arguments is a largely matter of taste: they convey extra information, but they can also add clutter to types.

If you need to write both a labeled argument and an explicit type annotation for it, here's the syntax for doing so:

let f ~name1:(arg1:int) ~name2:(arg2:int) = arg1 + arg2

Exercise: divide [✭✭]

Write a function divide : numerator:float -> denominator:float -> float. Apply your function.

□

It is also possible to make some arguments optional: when called without the argument, a default value will be provided. To declare such a function, use the following syntax:

# let f ?name:(arg1=8) arg2 = arg1 + arg2;;
val f : ?name:int -> int -> int = <fun>

You can then call a function with or without the argument:

# f ~name:2 7;;
- : int = 9

# f 7;;
- : int = 15

Partial application

We could define an addition function as follows:

let add x y = x + y

Try entering this rather similar function into the toplevel:

let addx x = fun y -> x + y

Function addx takes an integer x as input, and returns a function of type int -> int that will add x to whatever is passed to it.

Exercise: addx [✭✭]

What is the type of addx? Hint: what does the toplevel say?
If you define let add5 = addx 5, what is the type of add5?
What is the result of add5 2?
If you define let add3 = addx 3, what is the type of add3?
What is the result of add3 8?

□

What you just did in defining add3 is called partial application: you partially applied the function add to one argument, even though you normally would think of it as a multi-argument function. Why does this work? It's because the following three functions are syntactically different but semantically equivalent. That is, they are different ways of expressing the same computation:

let add x y = x+y
let add x = fun y -> x+y
let add = fun x -> (fun y -> x+y)

So add is really a function that takes an argument x and returns a function (fun y -> x+y). Which leads us to a deep truth...

Function associativity

Are you ready for the truth? Here goes...

Every OCaml function takes exactly one argument.

Why? Consider add: although we can write it as let add x y = x + y, we know that's semantically equivalent to let add = fun x -> (fun y -> x+y). And in general,

let f x1 x2 ... xn = e

is semantically equivalent to

let f =
  fun x1 ->
    (fun x2 ->
       (...
          (fun xn -> e)...))

So even though you think of f as a function that takes n arguments, in reality it is a function that takes 1 argument, and returns a function.

And the type of such a function

t1 -> t2 -> t3 -> t4

really means the same as

t1 -> (t2 -> (t3 -> t4)))

That is, function types are right associative: there are implicit parentheses around function types, from right to left. The intuition here is that a function takes a single argument and returns a new function that expects the remaining arguments.

Function application, on the other hand, is left associative: there are implicit parenthesis around function applications, from left to right. So

e1 e2 e3 e4

really means the same as

((e1 e2) e3) e4

The intuition here is that the left-most expression grabs the next expression to its right as its single argument.

Exercise: associativity [✭✭]

Which of the following produces an integer, which produces a function, and which produces an error? Decide on an answer, then check your answer in the toplevel.

add 5 1
add 5
(add 5) 1
add (5 1)

□

Operators

The addition operator + has type int->int->int. It is normally written infix, e.g., 3+4. By putting parentheses around it, we can make it a prefix operator: (+) 3 4. The same technique works for any built-in operator. But be careful of multiplication, which must be written ( * ) as infix, because (*) would be parsed as beginning a comment.

We can even define our own new infix operators, as follows:

let (^^) x y = max x y;;
2 ^^ 3 (* is 3 *)

Exercise: average [✭✭]

Define an infix operator $$ to compute the average of two floating-point numbers. For example,

1.0 $$ 2.0 = 1.5
0. $$ 0. = 0.

□