Lecture 28: Introduction to the λ-Calculus

Today we will take a look at the theoretical roots of functional programming in the form of a mathematical precursor to OCaml, the λ- (lambda-) calculus. The λ-calculus was invented by Alonzo Church in the 1930s to study the interaction of functional abstraction and function application from an abstract, purely mathematical point of view.

It was one of many related systems that were proposed in the late 1920s and 1930s. This was an exciting time for theoretical computer science, even though computers had not yet been invented. There were many great mathematicians such as Alonzo Church, Kurt Gödel, Alan Turing, and Emil Post who were trying to understand the notion of computation. They certainly understood intuitively what an algorithm was, but they were lacking a formal mathematical definition. They came up with a number of different systems for capturing the general idea of computation:

Turing machines—Alan Turing
μ-recursive functions—Kurt Gödel
rewrite systems—Emil Post
the λ-calculus—Alonzo Church
combinatory logic—Haskell Curry

These systems all looked very different, but what was amazing was that they all turned out to be computationally equivalent in the sense that each could encode and simulate the others. Church thought that this was too striking to be mere coincidence, and declared that they had found the correct definition of the notion of computability: it was the common spirit embodied in a different way by each one of these formalisms. This declaration became known as Church's thesis. Nowadays we have a better understanding of computability, and one could add any modern general-purpose programming language such as Java, C, Scheme, or OCaml (suitably abstracted) to this list, but Church and his colleagues did not have that luxury.

The λ-Calculus

The λ-calculus is like OCaml, except much simpler:

Everything is a function. There are no other types—no integers, strings, lists, booleans, etc. There is no unit (). If you want these things, you must encode them using functions. Luckily, functions are powerful enough to do this.
There are no notions of state or side effects. It is purely functional. Thus we can think exclusively in terms of the substitution model.
The order of evaluation is irrelevant. OCaml imposes a certain evaluation order, namely eager evaluation—evaluate the arguments before applying the function. The λ-calculus does not specify an evaluation order.
All functions are unary (take one argument). Functions of multiple arguments are encoded by currying.

Syntax

λ-terms E,F,G,... are defined inductively:

variables: any variable x,y,z,... is a λ-term.
application: if E and F are λ-terms, then EF is a λ-term.
abstraction: if E is a λ-term and x is a variable, then λx.E is a λ-term.

Notes:

This is the basis of the inductive definition of λ-terms.
The expression EF denotes application of E (considered as a function) to F (considered as an input). Think of this as an expression with an invisible infix operator apply between the E and F, like + or *. This might also be written (EF) or E(F) with parentheses as necessary to avoid ambiguity. The apply operator is not associative, so (EF)G is not the same as E(FG) in general. The expression EFG written without parentheses should be read as (EF)G. This is the same convention as in OCaml.
The expression λx.E would be written fun x -> E in OCaml. The application operator binds tighter than abstraction operator, so λx.λx.xy should be read as λx.(λx.(xy)).

Substitution

Computation in the λ-calculus is effected by substitution. There are two rules:

α-conversion (renaming bound variables): λx.E => λy.(E{y/x}), where E{y/x} denotes the result of replacing all free occurrences of x in E by y, provided y would not be captured by a λy already in E. Example:
OK: λx.λy.xy => λz.λy.zy (rename x to z, nothing bad happened)
not OK: λx.λy.xy => λy.λy.yy (rename x to y, y was captured)
β-reduction (substitution rule): (λx.E)F => E{F/x}, where E{F/x} denotes the result of replacing all free occurrences of x in E by F. Before doing this, bound variables in E are renamed by α-conversion if necessary to avoid capturing free variables in F. Example:
OK: (λx.λy.xy) (λz.z) => λy.(λz.z)y
not OK: (λx.λy.xy) (λz.yz) => λy.(λz.yz)y (y was captured)
but if we rename the bound y first, it's OK:
(λx.λy.xy) (λz.yz) => (λx.λa.xa) (λz.yz) => λa.(λz.yz)a

Examples in OCaml:

fun x -> x + 3 is equivalent to fun y -> y + 3.
This is just the substitution rule. (fun x -> x + 3) 7 => 7 + 3.

An α- or β-reduction step can be performed at any time to any subterm of a λ-term. Write E => F if E goes to F in some finite number of α- or β-reduction steps.

Order of Evaluation

In the λ-calculus, a reduction rule can be performed at any time to any subterm of a λ-term at which the rule is applicable. OCaml specifies an order of evaluation, but the λ-calculus does not. It turns out that it does not matter because of the Church-Rosser theorem: If E => F₁ and E => F₂, then there always exists a G such that F₁ => G and F₂ => G.

             E
            / \
           /   \
          /     \
         /       \
        F₁        F₂
         \       /
          \     /
           \   /
            \ /
             G

So if you diverge by applying rules to different parts of the term, you can always get back together again by applying more rules. Example:

      (fun x -> x + x) (2*3)
            /       \
       lazy/         \eager
          /           \
         /             \
 (2*3) + (2*3)    (fun x -> x + x) 6
         \             /
          \           /
           \         /
            \       /
              6 + 6

Terms are equivalent if they have a common reduct; that is, E₁ == E₂ if there exists an F such that E₁ => F and E₂ => F.

        E₁        E₂
         \       /
          \     /
           \   /
            \ /
             F

The relation == is an equivalence relation on terms (reflexive, symmetric, transitive). To show transitivity, we use the Church-Rosser property: if E₁ == E₂ and E₂ == E₃, then there exist F₁ and F₂ such that

        E₁        E₂        E₃
         \       / \       /
          \     /   \     /
           \   /     \   /
            \ /       \ /
             F₁        F₂

By the Church-Rosser property, there exists a G such that F₁ => G and F₂ => G:

        E₁        E₂        E₃
         \       / \       /
          \     /   \     /
           \   /     \   /
            \ /       \ /
             F₁        F₂
              \       /
               \     /
                \   /
                 \ /
                  G

Thus E₁ => G and E₃ => G, which proves that E₁ == E₃.

Values

A λ-term is said to be in normal form if there are no β-reductions that apply; that is, if there are no subterms of the form (λx.M)N. For example, the terms x and λx.y are in normal form. Terms in normal form correspond to values in OCaml. They are objects that cannot be evaluated further.

A term is said to be normalizable if there is a sequence of α- and β-reductions that lead to a normal form. Because of the Church-Rosser property, if a term is normalizable, then its normal form is unique (up to α-conversion). A sequence of reductions leading to a normal form corresponds roughly to a halting computation in OCaml.

Some terms are not normalizable. For example, (λx.xx)(λx.xx) is not. Even if a term is normalizable, there may be infinite reduction sequences that never lead to a normal form. For example, (λy.z)((λx.xx)(λx.xx)) reduces to the normal form z if it is evaluated lazily, but does not terminate if evaluated eagerly.

Encoding Other Data Types

The λ-calculus is powerful enough to encode other data types and operations.

Booleans

Define

true  = λx.λy.x
false = λx.λy.y

In other words, true and false are curried two-argument functions that just return the first and second argument, respectively. We can then encode the conditional if-then-else as the identity function λx.x:

if-then-else true B C
  =>  (λx.x) true B C
  =>  true B C
  =>  (λx.λy.x) B C
  =>  (λy.B) C 
  =>  B

if-then-else false B C
  =>  (λx.x) false B C
  =>  false B C
  =>  (λx.λy.y) B C
  =>  (λy.y) C 
  =>  C

We can define other Boolean operations:

and = λx.λy.xyx
or  = λx.λy.xxy
not = λx.(x false true)

For example,

and true false
  =>  (λx.λy.xyx) true false
  =>  true false true
  =>  (λx.λy.x) false true
  =>  false

Pairs and Tuples

In OCaml, all the pairing operator ( , ) does is package up two things in such a way that they can be extracted again by fst and snd. To code this in the λ-calculus, we can take two terms A and B and produce a function that will apply another function to them. Define

pair  =  λx.λy.λc.cxy

If we apply this to A and B we get

pair A B  =  λc.(c A B)

and this is how we encode the pair (A, B). Now we can extract A and B again by applying this function to true and false, respectively:

(λc.(c A B)) true
  =>  true A B
  =>  (λx.λy.x) A B
  =>  A
(λc.(c A B)) false
  =>  false A B 
  =>  (λx.λy.y) A B
  =>  B

so we should define

fst  =  λd.(d true)
snd  =  λd.(d false)

Lists

We can encode lists as iterated tuples. Thus :: is the same as pair, hd is fst, and tl is snd. If we define the empty list [] appropriately, we can get a null test. Define

[]   =  λx.true
null =  λd.(d λx.λy.false)

Then

null []
  =>  λd.(d λx.λy.false) (λx.true)
  =>  (λx.true) (λx.λy.false)
  =>  true
null (pair A B)
  =>  λd.(d λx.λy.false) (λc.(c A B))
  =>  λc.(c A B) (λx.λy.false)
  =>  (λx.λy.false) A B
  =>  false

Numbers and Arithmetic

In OCaml, numbers are a built-in primitive type. In the λ-calculus, we have to encode them as λ-terms. Church came up with a neat way to do this. His encoding is called the Church numerals. The idea is to encode the number n as a λ-term n representing a function that takes another function f as input and produces the n-fold composition of f with itself.

0  = λf.λx.x
1  = λf.λx.fx
2  = λf.λx.f(fx)
3  = λf.λx.f(f(fx))
     .
     .
     .
n  = λf.λx.f(...(f(f(fx)))...)

Note that 0 is just false after α-conversion. Let's write fⁿx as an abbreviation for the term f(...(f(f(fx)))...) with n applications of f. Then

n  =  λf.λx.fⁿx

This is nice, but in order to do anything with it, we have to encode arithmetic. Let's start with the simplest operation, namely successor. We would like to define a λ-term add1 that when applied to any Church numeral n produces n+1. Note that

nfx  =>  fⁿx

f(nfx)  =>  f(fⁿx) = fⁿ⁺¹x

λ-abstracting, we get

λf.λx.f(nfx)
  =>  λf.λx.fⁿ⁺¹x 
  =  n+1

Therefore we want our successor function to produce λf.λx.f(nfx) from n. Thus we just define

add1  =  λn.λf.λx.f(nfx)

How do we test for 0?

isZero  =  λn.(n λz.false true)

Then

isZero 0
  =>  λn.(n λz.false true) λf.λx.x
  =>  λf.λx.x λz.false true
  =>  true
isZero n+1
  =>  λn.(n λz.false true) λf.λx.fⁿ⁺¹x
  =>  λf.λx.fⁿ⁺¹x λz.false true
  =>  λx.(λz.false)ⁿ⁺¹x true
  =>  (λz.false)ⁿ⁺¹ true
  =>  false

Addition is just iterated successor. Note that

m add1  =>  λf.λx.f^mx add1  =>  λx.(add1^m x)

thus

m add1 n  =>  λx.(add1^m x) n  =>  add1^m n  =>  m+n

This suggests that we should define

add  =  λn.λm.(m add1 n)

Multiplication can be defined by iterated addition and exponentiation by iterated multiplication in a similar fashion.

mul  =  λn.λm.(m (add n) 0)
exp  =  λn.λm.(m (mul n) 1)

Subtraction is more difficult, but it can be done. As above, we start by defining the predecessor function sub1, where

sub1 n+1 = n
sub1 0   = 0

Recall from our encoding of lists above that

[]  =  λx.true
[n; n-1; n-2; ...; 2; 1; 0]  =  pair n [n-1; n-2; ...; 2; 1; 0]

so [...] represents a list in the usual OCaml sense under our encoding. Our approach is to come up with a term F such that

n F [0]  =>  [n; n-1; n-2; ...; 2; 1; 0]

If we can concoct such a term F, then we can take

sub1  =  λn.(if-then-else (isZero n) 0 (fst (snd (n F [0]))))

Note that

n F [0]  =>  Fⁿ [0]

so this will work provided for all n we have

F [n; n-1; n-2; ...; 2; 1; 0]  =>  [n+1; n; n-1; n-2; ...; 2; 1; 0]

This can be achieved by defining

F  =  λx.(pair (add1 (fst x)) x)

Proper subtraction (sub n m = 0 if n < m, n - m if n >= m) can be effected by iterating sub1 exactly as add was defined by iterating add1.