CS 312 Recitation 26
The Lambda Calculus

Lambda calculus is a notation for describing mathematical functions and programs. It is a mathematical system for studying the interaction of functional abstraction and functional application. It captures some of the essential, common features of a wide variety of programming languages. Because it directly supports abstraction, it is a more natural model of universal computation than a Turing machine is.

Syntax

A λ-calculus term is:

a variable x∈Var, where Var is a countably infinite set of variables;
an application, a function e₀ applied to an argument e₁, usually written e₀ e₁ or e₀(e₁); or
a lambda abstraction, an expression λx.e representing a function with input parameter x and body e. Where a mathematician would write x ↦ x², or an SML programmer would write fn x => x*x, in the λ-calculus we write λx.x².

In BNF notation,

e ::= x | λx.e | e₀ e₁

We use the metavariable e to represent a λ-calculus term.

Parentheses are used just for grouping; they have no meaning on their own. Like other familiar binding constructs from mathematics (e.g., sums, integrals), lambda terms are greedy, extending as far to the right as they can. Therefore, the term λx. λy. y is the same as λx.(x (λy.y)), not (λx.x) (λy.y). As in SML, application terms are left-associative, so x y z is the same thing as (x y) z.

For simplicity, multiple variables may be placed after the lambda, and this is considered shorthand for having a lambda in front of each variable. For example, we write λxy.e as shorthand for λx.λy.e. This shorthand is an example of syntactic sugar. The process of removing this shorthand is currying, just as in SML, and adding it is called uncurrying.

We can apply a curried function like λx.λy.x one argument at a time. Applying it to one argument results in a function that takes in a value for x and returns a constant function, one that returns the value of x no matter what argument it is applied to. As this suggests, functions are just ordinary values, and can be the results of functions or passed as arguments to functions (even to themselves!). Thus, in the lambda calculus, functions are first-class values: lambda terms serve both as functions and data.

Variables

Just as in SML, occurrences of variables in a term can be bound or free. The term λx.e binds all the free occurrences of x in e. The scope of x is e. This is called lexical scoping, because the variable's scope is defined by the text of the program. It is “lexical” because it is possible to determine its scope before the program runs, just by inspecting the program text. A term is closed if all variables are bound. A term is open if it is not closed.

Substitution and β-reduction

How do we run a λ-calculus program? The main computational rule is β-reduction, which we are already familiar with from SML. This rule applies whenever there is a subterm of the form (λx.e) e' representing the application of a function λx.e to an argument e'. To perform β-reduction, we substitute the argument e' for all free occurrences of the formal parameter x in the body e, obtaining e{e'/x}. This agrees with SML and corresponds to our intuition for what the function λx.e means.

We have to be a little careful; we cannot always substitute e' blindly for x in e, because bad things could happen which could alter the meaning of expressions in undesirable ways. We only want to replace the free occurrences of x within e, because any other occurrences are bound to a different binding; they are really different variables. There are some additional subtleties to substitution when e' is not closed, in particular the problem of variable capture.

The β reduction (λx.e) e' → e{e'/x} is the basic computational step of the λ-calculus. In the pure λ-calculus, we can start with a term and perform β-reductions on subterms in any order. However, for modeling programming languages, it is useful to restrict which β reductions are allowed and in what order they can be performed.

Higher-order functions

The lambda calculus is all about functions. It's easy to define higher-order functions that operate on other functions. For example, we can define a function COMPOSE that operates on two functions f and g to produce the composition of f and g, written f∘g. The definition is that (f∘g)(x) = f(g(x)), which we can then directly abstract:

COMPOSE = λfg. λx.f (g x)

We can use COMPOSE to define a higher-order function TWICE that operates on a function f to produce a new function that applies f twice:

TWICE = λf. COMPOSE f f
= λf. (λfg. λx.f (g x)) f f
= λf. λx.f (f x) (by β reduction inside the lambda abstraction)

If we want to define a function that applies another function four times, we can just apply TWICE to itself: FOURTIMES = TWICE TWICE.

Call-by-Name and Call-by-Value

In general there may be many possible β-reductions that can be performed on a given term. How do we choose which beta reductions to perform next? Does it matter?

A specification of which β-reduction to perform next is called a reduction strategy. Under a given reduction strategy, a value is a term on which no β-reductions are permitted. chosen reduction strategy. For example, λx.x would always be value, whereas ((λx.x) 1) would not be under usual reduction strategies.

Most real programming languages based on the λ-calculus, such as SML, use a reduction strategy known as call by value (CBV), in which functions may only be applied to (called on) values. Thus (λx.e) e' only β-reduces if e' is a value v. Here is an example of a CBV evaluation sequence, assuming 3, 4, and S (the successor function) are appropriately defined:

((λx.λy.y x) 3) S
→ (λy.y 3) S
→ S 3
→ 4

Another important strategy is call by name (CBN). We defer evaluation of arguments until as late as possible, applying reductions from left to right within the expression. In other words, we can pass an incomplete computation to a function as an argument. Terms are evaluated only once their value is really needed.

Ω and divergence

Let us define an expression we call Ω:

Ω = (λx.x x) (λx.x x)

What happens when we try to evaluate it?

Ω = (λx.x x) (λx.x x) → (x x){(λx.x x)/x} = Ω

We have just coded an infinite loop! When an expression e can go through infinite sequence of evaluation steps, we say e diverges. This corresponds to nontermination.

What happens if we try using Ω as a parameter? It depends on the reduction strategy. Consider this example:

e = (λx.(λy.y)) Ω

Using the CBV evaluation strategy, we must first reduce Ω. This puts the evaluator into an infinite loop, so e diverges. On the other hand, CBN reduces the term above to λy.y, the identify function. So CBN avoids divergence because it is lazy about evaluating arguments. CBN has an important property: CBN will not loop infinitely unless every other semantics would also loop infinitely, yet it agrees with CBV whenever CBV terminates successfully.

Local variables

One feature we seem to be missing is the ability to declare local variables. For example, in SML we introduce a new local variable with the "let" expression: let val x = e₁ in e₂ end.

We expect this expression to evaluate e₁ to some value v₁ and then to substitute for occurrences of x, obtaining e₂{v₁/x}. But we know another term that evaluates to the same expression:

(λx.e₂) e₁ → ... →
(λx.e₂) v₁ →
e₂{v₁/x}

Therefore, we can view a let expression as simply syntactic sugar for an application of a lambda abstraction. We don't really need to have let in the language.

Booleans

Perhaps the simplest interesting kind of value is a Boolean. We would like to define terms that act like the Boolean constants TRUE and FALSE and the Boolean operators IF, AND, OR, NOT, so that all these terms behave in the expected way, obeying the boolean abstraction. There are many reasonable encodings into lambda calculus. The standard approach is to define TRUE and FALSE functions that return the first and second of their two arguments, respectively:

TRUE = λx.λy.x
FALSE = λx.λy.y

It's important to realize that the tokens TRUE and FALSE are not part of the lambda calculus. We are just using them as abbreviations for the terms λx.λy.x and λx.λy.y.

Now, what about the conditional test IF? We would like IF to take three arguments b, t, f, where b is a Boolean value and t, f are arbitrary terms. The function should return t if b=TRUE and f if b=FALSE. Now the reason for defining TRUE and FALSE the way we did becomes clear. Since TRUE t f → t and FALSE t f → f, all IF has to do is apply its Boolean argument to the other two arguments:

IF = λb.λt.λf.b t f

The other Boolean operators can be defined using IF, TRUE, and FALSE:

AND = λbb'. IF b b' FALSE
OR = λbb'. IF b TRUE b'
NOT = λb.IF b FALSE TRUE

These operators work correctly when given Boolean values as we have defined them, but all bets are off if they are applied to any other term. There is no guarantee of any kind of reasonable behavior. Evaluation will proceed and reductions will be applied, but there is no way to usefully interpret the result. With the untyped λ-calculus, it is garbage in, garbage out.

Numbers

We can encode natural numbers using Church numerals, introduced by Alonzo Church. The Church numeral for the number n is denoted n. It is the function that takes in a function f and produces a new function that is the n-fold composition of f with itself:

n = λf.fⁿ

Of course, we can't write the exponential directly in lambda calculus. But we can build it up inductively:

0 = λf.λx.x
1 = λf.λx.f x
2 = λf.λx.f(f x) (= TWICE)
3 = λf.λx.f(f(f x))
...
n = λf.λx.f(f(...(f x))

The most basic operation on a number n is to find its successor S(n), which is n+1. We can define the successor function S as

S = λn.λf.λx.f((n f) x)

In words, S on input n returns a function that takes a function f as input, applies n to it to get the n-fold composition of f with itself, then composes that with one more f to get the (n+1)-fold composition of f with itself. Then,

S(n) = (λnfx.f (n f x)) n
→ λfx.f (n f x)
= n+1.

We can perform more interesting arithmetic with Church numerals. We might define addition as follows:

ADD = λmn.λfx.(m f) ((n f) x))

If applied to arguments m and n, ADD returns a number that if applied to f, returns the function that applies f the desired number of times: m+n. This works because we are composing f^m with fⁿ to get f^m+n. Therefore, we could have defined ADD = λmn.λf. COMPOSE (m f) (n f).

As another alternative, recall that Church numerals act on a function to apply that function repeatedly, and addition is equivalent to repeated application of the successor function S, so we could define ADD as:

ADD = λmn.(m S) n

If we apply ADD in curried fashion to one argument n, we will get a function that adds n to whatever it is applied to.

Similarly, multiplication is just iterated addition, and exponentiation is iterated multiplication:

MUL = λmn.m (ADD n) 0
EXP = λmn.m (MUL n) 1

These last two functions can be defined even more compactly. COMPOSE will act like MUL on numbers. And EXP can be defined as:

EXP = λmn.n m

Defining subtraction, division, and other arithmetic operations is possible but more tricky. A good exercise is to figure out how to construct a predecessor function on the natural numbers.

Church numerals only encode natural numbers, but they can be used to build more complex kinds of numbers such as integers and floating point numbers.

Pairing and Projection

Logic and arithmetic are good places to start, but we still have no useful data structures. For example, consider ordered pairs. We expect to have a pairing constructor PAIR with projection operations FIRST and SECOND that obey the following equations for any e₁, e₂, p:

FIRST (PAIR e₁ e₂) = e₁
SECOND (PAIR e₁ e₂) = e₂
PAIR (FIRST p) (SECOND p) = p

provided p is a pair. We can take a hint from IF. Recall that IF selects one of its two branch options depending on its Boolean argument. PAIR can do something similar, wrapping its two arguments for later extraction by some function f:

PAIR = λab. λf. f a b

Thus PAIR e₁ e₂ = λf. f e₁ e₂. To get e₁ back out, we can just apply this to TRUE, which will pick out its first argument: (λf.f e₁ e₂) TRUE → TRUE e₁ e₂→ e₁, and similarly, applying it to FALSE extracts e₂. Thus we can define two projection functions FIRST and SECOND corresponding to SML #1 and #2:

FIRST = λp.p TRUE
SECOND = λp.p FALSE

Again, if p isn't a term constructed using PAIR, expect the unexpected.

Recursion through self-application

With an encoding for IF, we have some control over the flow of a program. We can also write simple for loops using the Church numerals n, by applying the Church numeral to the loop body expressed as a function. However, we do not yet have the ability to write an unbounded while loop or a recursive function.

In ML, we can write the factorial function recursively as

fact(n) = if n <= 1 then 1 else n*fact(n-1)

But how can we write this in the lambda calculus, where all the functions are anonymous? We must somehow construct a term FACT that satisfies the following equation, assuming appropriate definitions for SUB and a less-than-or-equal comparison LEQ:

FACT = λn. IF (LEQ n 1) 1 (MUL n (FACT (SUB n 1)))

We can't just define FACT with this equation, because FACT appears on the right-hand side. Therefore it doesn't make sense as a shorthand for the term on the right. But it turns out we can define a term that acts in the way we want.

Recursion through self-application

Suppose we break up the recursion into two steps. Since FACT can't refer to itself directly, we need some way to pass FACT into itself so it can be used for the recursion. We start by defining a function F that adds an extra argument f for passing in something similar to FACT:

F = λf. λn. IF (LEQ n 1) 1 (MUL n (f (SUB n 1)))

Now, if we just had FACT available, we could pass it to this function and get the FACT we want. Of course, that's begging the question. But we do have something rather similar to FACT: this function F itself. That won't quite work either, because this function expects to receive the extra argument f. So we just change the function definition so it calls f with an extra argument: f. Call this new function FACT':

FACT' = λf. λn. IF (LEQ n 1) 1 (MUL n ((f f)(SUB n 1)))

FACT' has the property that if a function just like it is passed to it, it will do what FACT should. Since (FACT' FACT') should behave as we want FACT to, we define FACT as exactly this self-application:

FACT = FACT' FACT'

If we try this definition of FACT, we can see the recursion working:

FACT(4) = (FACT' FACT') 4
= (λn.IF (LEQ n 1) 1 (MUL n ((FACT' FACT') (SUB n 1)))}) 4
= IF (LEQ 4 1) 1 (MUL 4 (FACT (SUB 4 1)))
= (MUL 4 (FACT (SUB 4 1)))
= (MUL 4 (FACT 3))
...
= (MUL 4 (MUL 3 (MUL 2 1))) = 24

This self-application trick can be used to implement all recursive functions. It even works in SML, though you have to use a recursive datatype to convince the type system that it is type-safe:

Conclusions

We've seen that many interesting features of SML can be encoded using just lambda calculus functions. The lambda calculus is in fact a universal model of computation that is just as powerful as other realizable models such as Turing machines (which we can easily prove by implementing a Turing machine using lambda calculus).

CS 312 Recitation 26 The Lambda Calculus

Syntax

Variables

Substitution and β-reduction

Higher-order functions

Call-by-Name and Call-by-Value

Ω and divergence

Local variables

Booleans

Numbers

Pairing and Projection

Recursion through self-application

Recursion through self-application

Conclusions

CS 312 Recitation 26
The Lambda Calculus