# Functions
* * *
Topics:
* five essential components to learning a language
* if expressions
* function definitions
* anonymous functions
* function application
* the pipeline operator
* * *
## Learning a language
A secondary goal of this course is for you not just to learn a new programming
language, but to improve your skills at learning languages in general: that
is, to learn *how to learn* new languages.
There are five essential components to learning a language:
* **Syntax**:
By *syntax*, we mean the rules that define what constitutes a textually
well-formed program in the language, including the keywords,
restrictions on whitespace and formatting, punctuation, operators, etc.
One of the more annoying aspects of learning a new language can be that
the syntax feels odd compared to languages you already know. But the
more languages you learn, the more you'll become used to accepting the
syntax of the language for what it is, rather than wishing it were
different. (If you want to see some languages with really unusual
syntax, take a look at [APL][tryapl], which needs its own extended
keyboard, and [Whitespace][whitespace], in which programs consist
entirely of spaces, tabs, and newlines.) You need to understand
syntax just to be able to speak to the computer at all.
* **Semantics**:
By *semantics*, we mean the rules that define the behavior of programs.
In other words, semantics is about the meaning of a program—what
computation a particular piece of syntax represents. There are two
pieces to semantics, the *dynamic* semantics of a language and the
*static* semantics of a language. The dynamic semantics define the
run-time behavior of a program as it is executed or evaluated. The
static semantics define the compile-time checking that is done to ensure
that a program is legal, beyond any syntactic requirements. The most
important kind of static semantics is probably *type checking*: the
rules that define whether a program is well typed or not. Learning
the semantics of a new language is usually the real challenge, even though
the syntax might be the first hurdle you have to overcome. You need
to understand semantics to say what you mean to the computer, and you
need to say what you mean so that your program performs the right computation.
* **Idioms**:
By *idioms*, we mean the common approaches to using language features to
express computations. Given that you might express one computation in
many ways inside a language, which one do you choose? Some will be more
natural than others. Programmers who are fluent in the language will
prefer certain modes of expression over others. We could think of this
in terms of using the dominant paradigms, whether they are imperative,
functional, object oriented, etc., in the language effectively. You need
to understand idioms to say what you mean not just to the computer, but
to other programmers. When you write code idiomatically, other
programmers will understand your code better.
* **Libraries**:
*Libraries* are bundles of code that have already been written for you
and can make you a more productive programmer, since you won't have to
write the code yourself. (It's been said that [laziness is a virtue for
a programmer][lazy].) Part of learning a new language is discovering
what libraries are available and how to make use of them. A language
usually provides a *standard library* that gives you access to a core
set of functionality, much of which you would be unable to code up in
the language yourself, such as file I/O.
* **Tools**:
At the very least any language implementation provides either a compiler
or interpreter as a tool for interacting with the computer using the
language. But there are other kinds of tools: debuggers; integrated
development environments (IDEs); and analysis tools for things like
performance, memory usage, and correctness. Learning to use tools
that are associated with a language can also make you a more productive
programmer. Sometimes it's easy to confuse the tool itself for
the language; if you've only ever used Eclipse and Java together
for example, it might not be apparent that Eclipse is an IDE that
works with many languages, and that Java can be used without Eclipse.
[tryapl]: http://tryapl.org/
[whitespace]: http://compsoc.dur.ac.uk/whitespace/tutorial.html
[lazy]: http://threevirtues.com/
When it comes to learning OCaml in this class, our focus is primarily
on semantics and idioms. We'll have to learn syntax along the way,
of course, but it's not the interesting part of our studies. We'll
get some exposure to the OCaml standard library and a couple of other
libraries, notably OUnit (a unit testing framework similar to
JUnit, HUnit, etc.). Besides the OCaml compiler and build system,
the main tool we'll use is the toplevel, which provides the ability
to experiment with code interactively. There are some tools that
attempt to provide a full-featured [IDE for OCaml][ide], even
inside [Eclipse][ocaide], but we won't be digging into those in this course.
[ide]: https://opam.ocaml.org/blog/turn-your-editor-into-an-ocaml-ide/
[ocaide]: http://www.algo-prog.info/ocaide/
## Expressions
The primary piece of OCaml syntax is the *expression*. Just like
programs in imperative languages are primarily built out of *commands*,
programs in functional languages are primarily built out of expressions.
Examples of the kinds of expressions that you saw in the first recitation
include `2+2`, `if 3+5 > 2 then "yay!" else "boo!"`, and `increment 21`.
The OCaml manual has a complete definition of [all the expressions in
the language][exprs]. Though that page starts with a rather cryptic
overview, if you scroll down, you'll come to some English explanations.
Don't worry about studying that page now; just know that it's
available for reference.
[exprs]: http://caml.inria.fr/pub/docs/manual-ocaml/expr.html
The primary task of computation in a functional language is to
*evaluate* an expression to a *value*. A value is an expression for
which there is no computation remaining to be performed. So, all values
are expressions, but not all expressions are values. Examples of values
include `2`, `true`, and `"yay!"`.
The OCaml manual also has a definition of [all the values][values], though again,
that page is mostly useful for reference rather than study.
[values]: http://caml.inria.fr/pub/docs/manual-ocaml/values.html
Sometimes an expression might fail to evaluate to a value. There are two
reasons that might happen:
1. Evaluation of the expression raises an exception.
2. Evaluation of the expression never terminates (e.g., it enters an "infinite loop").
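As a minimal sketch of those two cases, the block below uses hypothetical names (`divide`, `outcome`, `loop`) that are not part of the notes. It also relies on `try ... with`, which this section has not introduced, purely so that the exception can be observed without stopping the program.

```ocaml
(* Reason 1: evaluation raises an exception. Integer division by zero
   raises the standard exception Division_by_zero. *)
let divide x y = x / y

(* [try ... with] is used here only so the example can observe the
   exception without stopping the program. *)
let outcome =
  try divide 1 0 with Division_by_zero -> -1

(* Reason 2: evaluation never terminates. Defining [loop] is harmless,
   but applying it, as in [loop ()], would run forever. *)
let rec loop () = loop ()
```

Note that `loop` is only defined here, never applied, so the block itself terminates.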
## If expressions
You learned about if expressions in the previous lab. Now let's study their
syntax and semantics.
**Syntax.** The syntax of an if expression:
```
if e1 then e2 else e3
```
The letter `e` is used here to represent any other OCaml expression; it's an
example of a *syntactic variable* aka *metavariable*, which is not actually
a variable in the OCaml language itself, but instead a name for a certain
syntactic construct. The numbers after the letter `e` are being used
to distinguish the three different occurrences of it.
**Dynamic semantics.**
The dynamic semantics of an if expression:
* if `e1` evaluates to `true`, and if `e2` evaluates to a value `v`,
then `if e1 then e2 else e3` evaluates to `v`.
* if `e1` evaluates to `false`, and if `e3` evaluates to a value `v`,
then `if e1 then e2 else e3` evaluates to `v`.
We call these *evaluation rules*: they define how to evaluate expressions.
Note how it takes two rules to describe the evaluation of an if expression,
one for when the guard is true, and one for when the guard is false.
The letter `v` is used here to represent any OCaml value; it's another
example of a metavariable. Later in the semester we will develop
a more mathematical way of expressing dynamic semantics, but for now
we'll stick with this more informal style of explanation.
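A small sketch may make the two evaluation rules concrete. The names `a` and `b` are chosen here for illustration. In each case only the branch selected by the guard is evaluated, which is why the division by zero in the first example never runs.

```ocaml
(* The guard is true, so only the then-branch is evaluated;
   the division by zero in the else-branch never runs. *)
let a = if 3 + 5 > 2 then 1 else 1 / 0

(* The guard is false, so the result is the value of the else-branch. *)
let b = if 3 + 5 < 2 then "yay!" else "boo!"
```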
**Static semantics.** The static semantics of an if expression:
* if `e1` has type `bool` and `e2` has type `t` and `e3` has type `t`
then `if e1 then e2 else e3` has type `t`.
We call this a *typing rule*: it describes how to type check an expression.
Note how it only takes one rule to describe the type checking of an if expression.
At compile time, when type checking is done, it makes no difference whether the
guard is true or false; in fact, there's no way for the compiler to know
what value the guard will have at run time. The letter `t` here is used
to represent any OCaml type; the OCaml manual also has a definition of
[all types][types] (which curiously does not name
the base types of the language like `int` and `bool`).
[types]: http://caml.inria.fr/pub/docs/manual-ocaml/types.html
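The typing rule can be sketched with a couple of examples. The names `s` and `n` are illustrative, and the explicit type annotations are there only so that the compiler checks the claimed types.

```ocaml
(* Both branches have type string, so the whole expression has type string. *)
let s : string = if 3 + 5 > 2 then "yay!" else "boo!"

(* Both branches have type int, so the whole expression has type int. *)
let n : int = if s = "yay!" then 1 else 0

(* Rejected at compile time, because the branches have different types:
   let bad = if true then 1 else "boo!" *)
```

The last example is rejected even though its guard is plainly `true`: the typing rule never looks at the value of the guard.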
We're going to write "has type" a lot, so let's introduce a more compact
notation for it. Whenever we would write "`e` has type `t`", let's instead
write `e : t`. The colon is pronounced "has type". This usage of colon
is consistent with how the toplevel responds after it evaluates an expression
that you enter:
```
# let x = 42;;
val x : int = 42
```
In the above example, variable `x` has type `int`, which is what the colon indicates.
## Function definitions
The last example above, `let x = 42`, has an expression in it (`42`)
but is not itself an expression. Rather, it is a *definition*.
Definitions bind values to names, in this case the value `42` being
bound to the name `x`. The OCaml manual has a definition of
[all definitions][definitions]
(see the third major grouping titled "*definition*" on that page), but again
that manual page is primarily for reference not for study.
Definitions are not expressions, nor are expressions definitions—
they are distinct syntactic classes. But definitions can have expressions
nested inside them, and vice-versa.
[definitions]: http://caml.inria.fr/pub/docs/manual-ocaml/modules.html
We will return to the topic of definitions in general in the next lecture.
For now, let's focus on one particular kind of definition, a *function definition*.
You got some practice with these in recitation last time. Now let's study
their syntax and semantics.
First, here's an example of a function definition:
```
(* requires: y>=0 *)
(* returns: x to the power of y *)
let rec pow x y =
  if y=0 then 1
  else x * pow x (y-1)
```
We provided a specification comment above the function to document the
precondition (`requires`) and postcondition (`returns`) of the function.
Note how we didn't have to write any types: the OCaml compiler infers
them for us automatically. We'll study *how* the compiler does that
later in the semester. For now, observe that it's like a mystery that
can be solved by our mental power of deduction:
* Since the if expression can return `1` in the `then`
branch, we know by the typing rule for `if` that the entire if expression
has type `int`.
* Since the if expression has type `int`, the function's return type must
be `int`.
* Since `y` is compared to `0` with the equality operator, `y` must be an `int`.
* Since `x` is multiplied with another expression using the `*` operator,
`x` must be an `int`.
If we did want to write down the types for some reason, we could do that:
```
(* requires: y>=0 *)
(* returns: x to the power of y *)
let rec pow (x:int) (y:int) : int =
  if y=0 then 1
  else x * pow x (y-1)
```
When we write the *type annotations* for `x` and `y` the parentheses are
mandatory. We will generally leave out these annotations, because
it's simpler to let the compiler infer them. There are other times when you'll
want to explicitly write down types though. One particularly useful time
is when you get a type error from the compiler that doesn't make sense.
Explicitly annotating the types can help with debugging such an error message.
**Syntax.**
The syntax for function definitions:
```
let rec f x1 x2 ... xn = e
```
The `f` is a metavariable indicating an identifier being used as a function
name. These identifiers must begin with a lowercase letter. The remaining
[rules for lowercase identifiers][lowercase] can be found in the manual.
The names `x1` through `xn` are metavariables indicating argument identifiers.
These follow the same rules as function identifiers. The keyword `rec`
is required if `f` is to be a recursive function; otherwise it may be omitted.
[lowercase]: http://caml.inria.fr/pub/docs/manual-ocaml/lex.html#lowercase-ident
Note that this syntax for function definitions is actually simplified compared
to what OCaml really allows. We will learn more about some augmented
syntax for function definitions in the next couple of weeks. But for now,
this simplified version will help us focus.
Mutually recursive functions can be defined with the `and` keyword:
```
let rec f x1 ... xn = e1
and g y1 ... ym = e2
```
For example:
```
(* [even n] is whether [n] is even.
 * requires: [n >= 0] *)
let rec even n =
  n=0 || odd (n-1)

(* [odd n] is whether [n] is odd.
 * requires: [n >= 0] *)
and odd n =
  n<>0 && even (n-1);;
```
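To see how the two functions call each other, here is a sketch that repeats the definitions so the block is self-contained and traces one application in a comment. The trace is informal, and the name `two_is_even` is illustrative.

```ocaml
let rec even n =
  n = 0 || odd (n - 1)
and odd n =
  n <> 0 && even (n - 1)

(* Informal trace of [even 2]:
     even 2
   = 2 = 0 || odd 1         the left operand is false, so continue with odd 1
   = 1 <> 0 && even 0       the left operand is true, so continue with even 0
   = 0 = 0 || odd (0 - 1)   the left operand is true, so odd is never called
   = true *)
let two_is_even = even 2
```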
The syntax for function types:
```
t -> u
t1 -> t2 -> u
t1 -> ... -> tn -> u
```
The `t` and `u` are metavariables indicating types. Type `t -> u` is the
type of a function that takes an input of type `t` and returns an output
of type `u`. We can think of `t1 -> t2 -> u` as the type of a function
that takes two inputs, the first of type `t1` and the second of type
`t2`, and returns an output of type `u`. Likewise for a function that
takes `n` arguments.
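As an illustration, here are three hypothetical functions together with their types. Each annotation of the form `(name : type)` asks the compiler to check that the function really has the stated type.

```ocaml
(* One argument: int -> int *)
let inc x = x + 1

(* Two arguments: int -> int -> int *)
let add x y = x + y

(* Two arguments of different types: string -> int -> bool *)
let longer_than s n = String.length s > n

(* These annotations are checked by the compiler. *)
let _ = (inc : int -> int)
let _ = (add : int -> int -> int)
let _ = (longer_than : string -> int -> bool)
```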
**Dynamic semantics.**
There is no dynamic semantics of function definitions. There is nothing
to be evaluated. OCaml just records that the name `f` is bound to a function
with the given arguments `x1..xn` and the given body `e`. Only later, when
the function is applied, will there be some evaluation to do.
**Static semantics.**
The static semantics of function definitions:
* For non-recursive functions: if by assuming that
`x1:t1` and `x2:t2` and ... and `xn:tn`, we can conclude that `e:u`,
then `f : t1 -> t2 -> ... -> tn -> u`.
* For recursive functions: if by assuming that
`x1:t1` and `x2:t2` and ... and `xn:tn` and
`f : t1 -> t2 -> ... -> tn -> u`, we can conclude that `e:u`,
then `f : t1 -> t2 -> ... -> tn -> u`.
Note how the type checking rule for recursive functions assumes that the
function identifier `f` has a particular type, then checks to see whether
the body of the function is well-typed under that assumption. This is
because `f` is in scope inside the function body itself (just like the arguments
are in scope).
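A sketch of the recursive rule at work, using a hypothetical factorial function `fact` that does not appear elsewhere in these notes:

```ocaml
(* Assuming n : int and fact : int -> int, the body has type int,
   so the definition is accepted with fact : int -> int. *)
let rec fact n =
  if n = 0 then 1 else n * fact (n - 1)

(* Without rec, the function name is not in scope in its own body, so
     let fact2 n = if n = 0 then 1 else n * fact2 (n - 1)
   is rejected with an unbound-value error for fact2. *)
```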
## Anonymous functions
We already know that we can have values that are not bound to names.
The integer `42`, for example, can be entered at the toplevel without
giving it a name:
```
# 42;;
- : int = 42
```
Or we can bind it to a name:
```
# let x = 42;;
val x : int = 42
```
Similarly, OCaml functions do not have to have names; they may be
*anonymous*. For example, here is an anonymous function that increments
its input: `fun x -> x+1`. Here, `fun` is a keyword indicating an
anonymous function, `x` is the argument, and `->` separates the argument
from the body.
We now have two ways we could write an increment function:
```
let inc x = x + 1
let inc = fun x -> x+1
```
They are syntactically different but semantically equivalent. That is,
even though they involve different keywords and put some identifiers
in different places, they mean the same thing.
Anonymous functions are also called *lambda expressions*, a term that
comes out of the *lambda calculus*, which is a mathematical model
of computation in the same sense that Turing machines are a model
of computation. In the lambda calculus, `fun x -> e` would
be written \\(\lambda x . e\\). The \\(\lambda\\) denotes
an anonymous function.
It might seem a little mysterious right now why we would want functions
that have no names. Don't worry; we'll see good uses for them later
in the course. In particular, we will often create anonymous functions
and pass them as input to other functions.
**Syntax.**
```
fun x1 ... xn -> e
```
**Static semantics.**
* If by assuming that
`x1:t1` and `x2:t2` and ... and `xn:tn`, we can conclude that `e:u`,
then `fun x1 ... xn -> e : t1 -> t2 -> ... -> tn -> u`.
**Dynamic semantics.**
An anonymous function is already a value. There is no computation
to be performed.
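The following sketch shows a few ways an anonymous function can be used. The names are illustrative, and the last example relies on lists and the standard library function `List.map`, neither of which has been introduced yet in these notes.

```ocaml
(* An anonymous function applied directly to an argument. *)
let six = (fun x -> x + 1) 5

(* A two-argument anonymous function bound to a name. *)
let add = fun x y -> x + y

(* An anonymous function passed as an input to another function. *)
let incremented = List.map (fun x -> x + 1) [1; 2; 3]
```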
## Function application
Today we cover a somewhat simplified syntax of function application
compared to what OCaml actually allows.
**Syntax.**
```
e0 e1 e2 ... en
```
The first expression `e0` is the function, and it is applied to
arguments `e1` through `en`. Note that parentheses are not required
around the arguments to indicate function application, as they are in
languages in the C family, including Java.
**Static semantics.**
* If `e0 : t1 -> ... -> tn -> u` and `e1:t1` and ... and `en:tn`
then `e0 e1 ... en : u`.
**Dynamic semantics.**
To evaluate `e0 e1 ... en`:
1. Evaluate the argument expressions `e1` through `en` to values `v1` through `vn`.
2. Evaluate `e0` to a function. That might be an anonymous function
`fun x1 ... xn -> e`. Or it might be that `e0` is a name `f`, and we have
to find the definition of `f`, in which case let's assume that
definition is `let rec f x1 ... xn = e`. Either way, we now know
the argument names `x1` through `xn` and the body `e`.
3. Substitute each value `vi` for the corresponding argument name `xi` in the
body `e` of the function. That results in a new expression `e'`.
4. Evaluate `e'` to a value `v`, which is the result of evaluating `e0 e1 ... en`.
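The steps above can be traced by hand on the `pow` function from earlier. The comment below is an informal sketch of the substitutions, not output produced by OCaml, and the name `four` is illustrative.

```ocaml
let rec pow x y =
  if y = 0 then 1
  else x * pow x (y - 1)

(* Evaluating [pow 2 2] by substituting argument values into the body:
     pow 2 2
   = if 2 = 0 then 1 else 2 * pow 2 (2 - 1)
   = 2 * pow 2 1
   = 2 * (if 1 = 0 then 1 else 2 * pow 2 (1 - 1))
   = 2 * (2 * pow 2 0)
   = 2 * (2 * (if 0 = 0 then 1 else 2 * pow 2 (0 - 1)))
   = 2 * (2 * 1)
   = 4 *)
let four = pow 2 2
```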
## Pipeline
There is a built-in infix operator in OCaml for function application that
is written `|>`. Imagine that as depicting a triangle pointing to the
right. It's called the *pipeline* operator, and the metaphor is that
values are sent through the pipeline from left to right. For example,
suppose we have the increment function `inc` from above as well as
a function `square` that squares its input. Here are two equivalent
ways of writing the same computation:
```
square (inc 5)
5 |> inc |> square
(* both yield 36 *)
```
The latter way of writing the computation uses the pipeline operator to
send `5` through the `inc` function, then send the result of that
through the `square` function. This is a nice, idiomatic way of
expressing the computation in OCaml. The former way is ok but arguably
not as elegant, because it involves writing extra parentheses and
requires the reader's eyes to jump around, rather than move linearly
from left to right. The latter way scales up nicely when the number
of functions being applied grows, whereas the former way requires
more and more parentheses:
```
5 |> inc |> square |> inc |> inc |> square
square (inc (inc (square (inc 5))))
(* both yield 1444 *)
```
It might feel weird at first, but try using the pipeline operator
in your own code the next time you find yourself writing a big
chain of function applications.
Since `e1 |> e2` is just another way of writing `e2 e1`, we don't need
to state the semantics for `|>`: it's just the same as function application.
These two programs are another example of expressions
that are syntactically different but semantically equivalent.
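One way to convince yourself that the pipeline operator is plain function application is to write a hand-rolled equivalent. In this sketch, `pipe` is a hypothetical helper and not part of the standard library; `inc` and `square` are the functions described above.

```ocaml
let inc x = x + 1
let square x = x * x

(* A hypothetical hand-written equivalent of the built-in operator:
   the value on the left is passed to the function on the right. *)
let pipe x f = f x

let a = 5 |> inc |> square
let b = pipe (pipe 5 inc) square
let c = square (inc 5)
```

All three definitions describe the same computation: `inc` is applied to `5`, and `square` is applied to the result.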
## Summary
Syntax and semantics are a powerful paradigm for learning a programming
language. As we learn the features of OCaml, we're being careful to write
down their syntax and semantics. We've seen that there can be multiple
syntaxes for expressing the same semantic idea, that is, the same computation.
The semantics of function application is the very heart of OCaml and of
functional programming, and it's something we will come back to several
times throughout the course to deepen our understanding.
## Terms and concepts
* anonymous functions
* definitions
* dynamic semantics
* evaluation
* expressions
* function application
* function definitions
* identifiers
* idioms
* if expressions
* lambda expressions
* libraries
* metavariables
* mutual recursion
* pipeline operator
* recursion
* semantics
* static semantics
* syntax
* tools
* type checking
* type inference
* values
## Further reading
* *Introduction to Objective Caml*, chapter 3
* *OCaml from the Very Beginning*, chapter 2
* *Real World OCaml*, chapter 2