Recitation 1: Introduction to OCaml Syntax

We will use the Objective Caml (OCaml) programming language this semester.  OCaml is a functional language rather than an imperative language; the key difference between these two classes of languages is the execution model---the way in which programs are executed. Imperative (or procedural) languages, such as C and Java, are based on commands that change the state of the machine on which they execute.  For example, the assignment statement (in both C and Java) explicitly changes the value stored in some memory location.  Functional languages, in contrast, are based on evaluating expressions to produce values.  OCaml provides good ways to build correct, understandable, and large expressions.  The focus of this lecture is understanding expressions.

For real applications, programming with expressions may seem like a stretch. After all, applications often do not "compute" anything; instead, they allow you to "do" something. Later in the course, we will introduce the notion of side-effects: evaluate this, and do that while you're at it—e.g., evaluate an expression, and (as a side-effect) display a window on the screen. But for the moment, we will live in a pure world, one without side-effects.

To start learning OCaml, you should begin playing with the OCaml language by writing toy expressions.  Like most programming languages, Ocaml has a compiler that can be run on source files to produce object files.  But to really understand expressions, there is a better way to interact with the OCaml compiler, called the OCaml toplevel system.  This system is something like a very fancy calculator---you interact with it by typing in an expression, and it responds by evaluating that expression and telling you the resulting value. 


IMPORTANT NOTE: In previous semesters, this course has been taught in SML, a language very similar to OCaml. They share both a common origin and almost all of their syntax. For aid in understanding notes from previous years, you might wish to consult the SML vs. OCaml page, which provides a detailed list of the differences between the two.

If you notice any SML/OCaml discrepancies in the notes, please let us know.


Starting the OCaml toplevel system.  The course resource page has instructions for installing OCaml. Once you have OCaml properly installed, you will be able to start an interactive session by typing ocaml at the terminal (i.e., a Unix shell or Windows console). (The 3110 VM has an enhanced version of ocaml called utop installed.) You will get a top level prompt "#", which indicates that the compiler is ready to accept expressions and evaluate them.  To exit the toplevel system, enter the end-of-file character for your platform (Ctrl-D on Unix, Ctrl-Z + Return on Windows).

Using the OCaml toplevel system.  Type in an expression (possibly on multiple lines, which after the first line will have a secondary prompt "="), then type ;; and press Return. Note that the ;; is not part of the expression to be evaluated. It is simply an indication to the compiler that you have finished typing in the expression and that you are ready to evaluate. Before evaluating the expression, the compiler will type check it.

TIP: The toplevel system is very picky about some things when reading input, and will accept a request to exit only if it is sitting exactly at the top level prompt—i.e. if nothing else has been already typed in. When in doubt (especially if input seems to be behaving strangely), press Ctrl-C to interrupt and reset the prompt to a sane state.

TIP: It can be useful to put expressions into a file so that you're not stuck entering them over and over again at the prompt. Just use an editor to edit a file, and you can load it into OCaml with the operation #use "file";; which behaves as though the expressions have been entered at the top level prompt (don't forget the semicolons!). The big question is where should the file be stored. Operation #use by default looks for files in the current working directory.  To change the current working directory, use the operation #cd "path";; where path is the path where you want to go to. To add a directory to the list of directories that OCaml looks for files in, use the operation #directory "path";;. Use the Unix convention (even on Windows): the path separator is "/". 

TIP: Even more details on the toplevel system can be found at the OCaml manual page on it.


Basic Expressions and Types

Expressions evaluate to values. Values can be classified according to their type:

Type Example values
int 0,1,2,-1,-2
bool true, false
float 3.141592, 1.618034
string "Hello world!","xyzzy"
char 'A','Z'

Let us start looking at simple expressions, with the following syntax: 

e ::= c  |  unop e  |  e1 binop e2  |  if e1 then e2 else e3  |  (e)

where c are constants (the values described above), unop are unary operations, binop are binary operations.  We expressed this syntax using Backus-Naur Form (BNF), a common notation for the grammar of computer languages.  NOTE TO INSTRUCTOR: The BNF for expressions, and later for declarations, should be put on one side of the board, and kept there as we will be adding expression types and declaration types as the section proceeds.

Unary operators include:

Operator Meaning
- take an int (or a float) and return its negation
not take a bool and return its negation

Binary operators include:

Operator Meaning
+,-,*, / take two ints and return their sum, difference, product, or quotient
+.,-.,*., /. take two floats and return their sum, difference, product, or quotient
mod take two ints and return their integer remainder
>,>=,<,<= take two ints (or two floats) and return their comparison
= take two values and return whether they are equal
^ take two strings and return their concatenation into a new string

Expressions are transformed into values by the application of evaluation rules.  The evaluation rules for simple expressions are:

Evaluation of operators only makes sense if the types of the operands agree.  For example, the + operator is defined if both operands are integers, but adding an integer to a boolean is meaningless.  So type checking is performed to ensure that expressions are meaningful, and this requires giving a type to every expression.  How does OCaml determine the type of an expression?

Every constant has a type (42 has type int, true has type bool, etc).  Operators also have types (which we gave informally above); in OCaml syntax, these types are as follows:

 -     :  int -> int (* or *) float -> float                  
 not   :  bool -> bool
 +     :  int -> int -> int
 +.    :  float -> float -> float
 >     :  int -> int -> bool (* or *) float -> float -> bool
 ^     :  string -> string -> string

Every operator specifies the types of values it takes and the type of the value it returns. The addition operator + takes an int, another int, then returns an int.

Now we can give rules for determining the types of expressions.  The principle embodied in these rules is:

The type of an expression is always the type of its result.

If the conditions in these rules are not satisfied by an expression, then it is not possible to give a type to that expression.  It does not type check, and the compiler rejects it as a program. This prevents the evaluation of meaningless expressions.  It is important to do so, because such programming errors could otherwise cause run-time errors.

Declarations

Values can be given names. This is not a form of expression, but rather an indication to the compiler that you are declaring a new name.

d ::= let id = e

For example, let pi = 3.141592 is a declaration.  A declaration indicates to the compiler to evaluate expression e, produce a value, and bind that value to the name id. In subsequent expressions, id can be used to refer to that value. We therefore extend our syntax for expressions with:

e ::= ...  |  id

Evaluating id means to lookup what value it is bound to. The type of id is the type of the value which it is bound to.

Declarations last until another declaration of the same name is encountered (which shadows—that is, replaces—the previous declaration). We can also introduce local declarations that last only for the evaluation of an expression.

e ::= ...  |  let d in e

To evaluate a let expression, declaration d is evaluated, then e is evaluated, taking the binding from the declaration into account.

QUESTION: what is the type of let d in e ?  Answer:  The type of e, since the type of an expression is the type of the result.

At this point, we can write large expressions, but we can't easily reuse expressions. Thus, we introduce functions. A function declaration is a new form of declaration:

d ::= ...  |  let id ((x1:t1), ..., (xn:tn)):t = e

In this new form of declaration, e is called the body of function id.  For example, let square (x:int) : int = x * x is a function definition, and its body is x * x.  Functions have types that are similar to the types of operators.  For example, the type of square is int -> int.

Here's an example of a function with two arguments:

let max2 ((a:int),(b:int)):int = 
  if a>b then a else b

Its type is (int * int) -> int. The argument types are separated by *. Note that, in this syntax, * does not mean integer multiplication.

TIP: It is not strictly necessary to annotate functions with types. Most of the time, OCaml can infer the type of a function from its definition. For example, we could rewrite max2 as follows:

let max2 (a,b) = 
  if a>b then a else b

Note that we don't have to write the inner parentheses around the arguments, since we are leaving out their type annotations. The type that OCaml infers for max2 now becomes rather mysterious: 'a * 'a -> 'a. For now, think of the 'a as meaning "any type". Since int is an example of any type, we can still use max2 on ints.

PITFALL: As you will quickly notice, it can be extremely hard to debug a program that does not typecheck.  It is very difficult to figure out why the compiler inferred this type for that expression.  Solution:  use type annotations to catch type errors!

ÜBER-TIP: It is very good practice always to annotate functions with types.  We will generally require you to do this in programming assignments.

We could even have a function with three arguments:

let max3 ((a:int),(b:int),(c:int)):int = 
  max2(max2(a,b),c)

Its type is (int * int * int) -> int.

You might wonder why the types of built-in operators that take two arguments, such as +, don't look quite the same as the type of max2. We'll discuss the reason for this later (about two lectures from now).

For recursive functions, it is necessary to use let rec:

d ::= ...  |  let rec id ((x1:t1), ..., (xn:tn)):t = e

This is the same as normal function declarations, but any uses of id inside the function body will be bound to the value of the function itself.

How do we use functions? We introduce a new expression called function application:

e ::= ...  |  id (e1,...,em)

How do we type check a function application f (e1,...,em)?  As follows:

  1. If f has type (t1 * ... * tn) -> t, and
  2. If the number of arguments supplied to the function is the same as the number of arguments the function expects, meaning m equals n, and
  3. If e1 has type t1, and e2 has type t2, and ... en has type tn,
  4. Then f (e1,...,em) has type t.

So square (10) type checks, but square (true) does not.

How is function application f (e1,...,em) evaluated? Evaluate e1,...,em to values v1,...,vm, then evaluate the body of f with x1 bound to v1, ..., xm bound to vm.  So square (10+10) --> square (20) --> 20*20 --> 400. We'll see many more details on evaluation in a week. For now, we just want to give you an intuition about the simple cases.

Function bindings can be used within a let expression to obtain a local function (like Pascal, unlike C).  For example:

let fourth (y:int) : int =
  let
    square (x:int):int = x * x
  in
    square (square (y))