Today we will take a look at the theoretical roots of functional programming in the form
of a mathematical precursor to OCaml,
the *functional
abstraction* and *function application* from an abstract, purely mathematical point
of view.

It was one of many related systems that were proposed in the late 1920s and 1930s. This was an
exciting time for theoretical computer science, even though computers had not yet
been invented. There were many great mathematicians such as Alonzo Church, Kurt Gödel, Alan
Turing, and Emil Post who were trying to understand the notion of *computation*.
They certainly understood intuitively what an algorithm was, but they were
lacking a formal mathematical definition. They came up with a number of
different systems for capturing the general idea of computation:

- Turing machines—Alan Turing
- μ-recursive functions—Kurt Gödel
- rewrite systems—Emil Post
- the λ-calculus—Alonzo Church
- combinatory logic—Haskell Curry

These systems all
looked very different, but what was amazing was that they all turned out to
be *computationally equivalent* in the sense that each could encode and
simulate the others. Church thought that this was too striking to be mere
coincidence, and declared that they had found the correct definition of the
notion of *computability*: it was the common spirit embodied in a different way
by each one of these formalisms. This declaration became known as

The λ-calculus is like OCaml, except much simpler:

*Everything*is a function. There are no other types—no integers, strings, lists, booleans, etc. There is no unit`()`. If you want these things, you must encode them using functions. Luckily, functions are powerful enough to do this.- There are no notions of state or side effects. It is purely functional. Thus we can think exclusively in terms of the substitution model.
- The order of
evaluation is irrelevant. OCaml imposes a certain evaluation order,
namely
*eager evaluation*—evaluate the arguments before applying the function. The λ-calculus does not specify an evaluation order. - All functions are
*unary*(take one argument). Functions of multiple arguments are encoded by currying.

`E,F,G,...` are defined inductively:

*variables:*any variable`x,y,z,...`is a λ-term.*application:*if`E`and`F`are λ-terms, then`EF`is a λ-term.*abstraction:*if`E`is a λ-term and`x`is a variable, then`λx.E`is a λ-term.

Notes:

- This is the basis of the inductive definition of λ-terms.
- The expression
`EF`denotes application of`E`(considered as a function) to`F`(considered as an input). Think of this as an expression with an invisible infix operator*apply*between the`E`and`F`, like + or *. This might also be written`(EF)`or`E(F)`with parentheses as necessary to avoid ambiguity. The apply operator is not associative, so`(EF)G`is not the same as`E(FG)`in general. The expression`EFG`written without parentheses should be read as`(EF)G`. This is the same convention as in OCaml. - The expression
`λx.E`would be written`fun x -> E`in OCaml. The application operator binds tighter than abstraction operator, so`λx.λx.xy`should be read as`λx.(λx.(xy))`.

Computation in the λ-calculus is effected by

*α-conversion*(renaming bound variables):`λx.E => λy.(E{y/x})`, where`E{y/x}`denotes the result of replacing all free occurrences of`x`in`E`by`y`, provided`y`would not be captured by a`λy`already in`E`. Example:

OK:`λx.λy.xy => λz.λy.zy`(rename`x`to`z`, nothing bad happened)

not OK:`λx.λy.xy => λy.λy.yy`(rename`x`to`y`,`y`was captured)

*β-reduction*(substitution rule):`(λx.E)F => E{F/x}`, where`E{F/x}`denotes the result of replacing all free occurrences of`x`in`E`by`F`. Before doing this, bound variables in`E`are renamed by α-conversion if necessary to avoid capturing free variables in`F`. Example:

OK:`(λx.λy.xy) (λz.z) => λy.(λz.z)y`

not OK:`(λx.λy.xy) (λz.yz) => λy.(λz.yz)y`(`y`was captured)

but if we rename the bound`y`first, it's OK:

`(λx.λy.xy) (λz.yz) => (λx.λa.xa) (λz.yz) => λa.(λz.yz)a`

Examples in OCaml:

`fun x -> x + 3`is equivalent to`fun y -> y + 3`.- This is just the substitution rule.
`(fun x -> x + 3) 7 => 7 + 3`.

An α- or β-reduction
step can be performed at any time to any subterm of a λ-term. Write `E
=> F` if `E` goes to `F` in some finite number of α- or
β-reduction steps.

In the λ-calculus, a reduction rule can be performed at any time to any subterm of a λ-term
at which the rule is applicable. OCaml specifies an order of evaluation, but the λ-calculus does not. It
turns out that it does not matter because of the ` E
=> F _{1} ` and

E / \ / \ / \ / \ F_{1}F_{2}\ / \ / \ / \ / G

So if you diverge by applying rules to different parts of the term, you can always get back together again by applying more rules. Example:

(fun x -> x + x) (2*3) / \ lazy/ \eager / \ / \ (2*3) + (2*3) (fun x -> x + x) 6 \ / \ / \ / \ / 6 + 6

Terms are `E _{1}
== E_{2}` if there exists an

E_{1}E_{2}\ / \ / \ / \ / F

The relation `==` is an equivalence relation on terms (reflexive, symmetric,
transitive). To show transitivity, we use the
Church-Rosser property: if ` E _{1} == E_{2} ` and

E_{1}E_{2}E_{3}\ / \ / \ / \ / \ / \ / \ / \ / F_{1}F_{2}

By the
Church-Rosser property, there exists a ` G` such that ` F _{1} =>
G` and

E_{1}E_{2}E_{3}\ / \ / \ / \ / \ / \ / \ / \ / F_{1}F_{2}\ / \ / \ / \ / G

Thus ` E _{1} =>
G` and

A λ-term is said to be in `(λx.M)N`.
For example, the terms `x` and `λx.y` are in normal form. Terms in
normal form correspond to *values* in OCaml. They are objects that
cannot be evaluated further.

A term is said to be

Some terms are not normalizable. For example, `(λx.xx)(λx.xx)`
is not. Even if a term is normalizable, there may be infinite reduction
sequences that never lead to a normal form. For example, `(λy.z)((λx.xx)(λx.xx))`
reduces to the normal form `z` if it is evaluated lazily, but does not
terminate if evaluated eagerly.

The λ-calculus is powerful enough to encode other data types and operations.

Define

true = λx.λy.x false = λx.λy.y

In other words,
`true` and `false` are curried two-argument functions that just return the first and second
argument, respectively. We can then encode the conditional `if-then-else`
as the identity function `λx.x`:

if-then-else true B C => (λx.x) true B C => true B C => (λx.λy.x) B C => (λy.B) C => B if-then-else false B C => (λx.x) false B C => false B C => (λx.λy.y) B C => (λy.y) C => C

We can define other Boolean operations:

and = λx.λy.xyx or = λx.λy.xxy not = λx.(x false true)

For example,

and true false => (λx.λy.xyx) true false => true false true => (λx.λy.x) false true => false

In OCaml, all the pairing operator `( , )` does is package up two things in such a way that they can be extracted
again by `fst` and `snd`. To code this in the λ-calculus, we can take two
terms `A` and `B` and produce a function that will apply another function to them.
Define

pair = λx.λy.λc.cxy

If we apply this to `A` and `B` we get

pair A B = λc.(c A B)

and this is how we encode the pair `(A, B)`. Now we can
extract `A` and `B` again by applying this function to
`true` and `false`, respectively:

(λc.(c A B)) true => true A B => (λx.λy.x) A B => A (λc.(c A B)) false => false A B => (λx.λy.y) A B => B

so we should define

fst = λd.(d true) snd = λd.(d false)

We can encode lists as iterated tuples. Thus `::` is the same
as `pair`, `hd` is `fst`, and `tl` is `snd`.
If we define the empty list `[]` appropriately, we can get a `null`
test. Define

[] = λx.true null = λd.(d λx.λy.false)

Then

null [] => λd.(d λx.λy.false) (λx.true) => (λx.true) (λx.λy.false) => true null (pair A B) => λd.(d λx.λy.false) (λc.(c A B)) => λc.(c A B) (λx.λy.false) => (λx.λy.false) A B => false

In OCaml,
numbers are a built-in primitive type. In the λ-calculus, we have to
encode them as λ-terms. Church came up with a neat way to do this.
His encoding is
called the `n` as a λ-term
`n` representing a function
that takes another function `f` as input and produces the `n`-fold composition of
`f` with itself.

0 = λf.λx.x 1 = λf.λx.fx 2 = λf.λx.f(fx) 3 = λf.λx.f(f(fx)) . . . n = λf.λx.f(...(f(f(fx)))...)

Note that `0`
is just `false` after α-conversion. Let's write
` f ^{n}x`
as an abbreviation for the term

n = λf.λx.f^{n}x

This is nice,
but in order to do anything with it, we have to encode arithmetic. Let's
start with the simplest operation, namely successor. We would like to define a
λ-term ` add1` that when applied to any Church numeral `n`
produces `n+1`. Note that

nfx => f^{n}x

so

f(nfx) => f(f^{n}x) = f^{n+1}x

λ-abstracting, we get

λf.λx.f(nfx) => λf.λx.f^{n+1}x = n+1

Therefore we want
our successor function to produce `λf.λx.f(nfx)`
from `n`. Thus we
just define

add1 = λn.λf.λx.f(nfx)

How do we test for `0`?

isZero = λn.(n λz.false true)

Then

isZero 0 => λn.(n λz.false true) λf.λx.x => λf.λx.x λz.false true => true isZero n+1 => λn.(n λz.false true) λf.λx.f^{n+1}x => λf.λx.f^{n+1}x λz.false true => λx.(λz.false)^{n+1}x true => (λz.false)^{n+1}true => false

Addition is just iterated successor. Note that

m add1 => λf.λx.f^{m}x add1 => λx.(add1^{m}x)

thus

m add1 n => λx.(add1^{m}x) n => add1^{m}n => m+n

This suggests that we should define

add = λn.λm.(m add1 n)

Multiplication can be defined by iterated addition and exponentiation by iterated multiplication in a similar fashion.

mul = λn.λm.(m (add n) 0) exp = λn.λm.(m (mul n) 1)

Subtraction is more difficult, but it can be done. As above, we start
by defining the predecessor function `sub1`, where

sub1 n+1 = n sub1 0 = 0

Recall from our encoding of lists above that

[] = λx.true [n; n-1; n-2; ...; 2; 1; 0] = pair n [n-1; n-2; ...; 2; 1; 0]

so `[...]`
represents a list in the usual OCaml sense under our encoding. Our
approach is to come up with a term `F` such that

n F [0] => [n; n-1; n-2; ...; 2; 1; 0]

If we can concoct such a term `F`, then we can take

sub1 = λn.(if-then-else (isZero n) 0 (fst (snd (n F [0]))))

Note that

n F [0] => F^{n}[0]

so this will work provided for all `n` we have

F [n; n-1; n-2; ...; 2; 1; 0] => [n+1; n; n-1; n-2; ...; 2; 1; 0]

This can be achieved by defining

F = λx.(pair (add1 (fst x)) x)

Proper subtraction (`sub n
m = 0
if n < m,
n - m
if n >= m`)
can be effected by iterating `sub1` exactly as `add` was
defined by iterating `add1`.