Introduction to the Lambda Calculus:  See Ch. 3, 4, and 5 for relevant
material.

The abstract syntax for the pure lambda calculus is dirt simple:

  x in Var
  e in Exp ::= x | \x.e | e1 e2

(I'm using \ for lambda here.)

The (small-step) operational semantics is also dirt simple:

  (\x.e1) e2 => e1[e2/x]

       e1 => e1'
  ---------------
  e1 e2 => e1' e2

where e1[e2/x] denotes capture-avoiding substitution of the term e2
for the free occurrences of x within the term e1.  This is formalized
as follows:

  x[e2/x]      = e2
  y[e2/x]      = y                      (y != x)
  (\x.e)[e2/x] = \x.e
  (\y.e)[e2/x] = \y.(e[e2/x])           (y not in FV(e2))
  (e e')[e2/x] = (e[e2/x]) (e'[e2/x])

Note that FV(e) stands for the set of free variables in an expression
and is defined as:

  FV(x)     = {x}
  FV(e1 e2) = FV(e1) U FV(e2)
  FV(\x.e)  = FV(e) \ {x}

These rules are meant to capture a high-level model of functions and
function calls.  Intuitively, we substitute the actual parameter (e2)
for the formal parameter (x) within the body of the function and then
run the function.

A basic principle of static scoping is that variable names shouldn't
matter as far as the semantics of a program are concerned.  Both \x.x
and \y.y behave the same (they are terms that represent the identity
function) so we would like to treat them as equivalent.  This gives
rise to the notion of alpha-equivalence and alpha-equivalence classes
of terms.  We can define:

  x == x

  e1 == e1'   e2 == e2'
  ---------------------
    e1 e2 == e1' e2'

  e1[z/x] == e2[z/y]
  ------------------    (z not in FV(e1) U FV(e2))
    \x.e1 == \y.e2

and say that two terms e1 and e2 are alpha-equivalent if e1 == e2.  We
write [e] for the set of terms that are alpha-equivalent to e (i.e.,
{e' | e' == e}).

When we define the semantics of programs, we're going to technically
be using alpha-equivalence classes over terms instead of the terms
themselves.  This means that, by default, alpha-equivalent terms will
end up having the same meaning.  It also means that we can use the
"bound variable convention":  when you are writing down a
lambda-calculus expression, you can always pick a term in the
alpha-equivalence class that avoids naming conflicts with bound
variables.  For instance, in the substitution rules above, we have a
clause:

  (\y.e)[e2/x] = \y.(e[e2/x])           (y not in FV(e2))

Suppose e2 = y.  Then we can't do the substitution because of the side
condition.  However, we can first pick out the alpha-equivalent term
\z.(e[z/y]), where z is a fresh variable, and then do the
substitution.

One more technicality regarding scope and variables:  note that when
we have

  (\x.e)[e2/x]

we don't push the substitution in.  The reason is that the inner
binding for x shadows the outer binding that we were substituting.
For instance, if we start with

  \x.(\x.x)

and apply that to e2, we'll end up trying to reduce (\x.x)[e2/x],
which should evaluate to just \x.x.
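To make the bookkeeping concrete, here's a sketch of FV and
capture-avoiding substitution in ML.  The datatype, the helper names
(fv, subst, fresh), and the crude fresh-name supply are my own choices
for illustration, not part of the calculus itself:

  datatype exp = Var of string
               | Lam of string * exp
               | App of exp * exp

  (* FV(e), represented as a list of names (possibly with duplicates). *)
  fun fv (Var x)        = [x]
    | fv (App (e1, e2)) = fv e1 @ fv e2
    | fv (Lam (x, e))   = List.filter (fn y => y <> x) (fv e)

  (* A crude supply of fresh names for alpha-renaming. *)
  val counter = ref 0
  fun fresh () = (counter := !counter + 1; "_v" ^ Int.toString (!counter))

  (* subst e2 x e1 computes e1[e2/x], picking an alpha-variant of a
     binder when the side condition (y not in FV(e2)) would fail. *)
  fun subst e2 x (Var y)       = if y = x then e2 else Var y
    | subst e2 x (App (e, e')) = App (subst e2 x e, subst e2 x e')
    | subst e2 x (Lam (y, e))  =
        if y = x then Lam (y, e)        (* inner x shadows the outer one *)
        else if List.exists (fn z => z = y) (fv e2) then
          let val z = fresh ()          (* rename \y.e to \z.(e[z/y]) *)
          in Lam (z, subst e2 x (subst (Var z) y e))
          end
        else Lam (y, subst e2 x e)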
---------------------------------------------------------------------

Encodings in the pure lambda calculus:

An easy encoding is:

  let x = e1 in e2   =def=   (\x.e2) e1

Note that (\x.e2) e1 => e2[e1/x], which is the intended behavior of a
let-expression.

Another easy encoding is a multi-argument function:

  \[x1,x2,...,xn].e  =  \x1.\x2....\xn.e

To encode a data value, we think about how the data values are used
instead of how they are built.  For instance, booleans are used in
conditionals.  We want to define some functions for true, false, and
if such that

  if e e1 e2 => e1   when e evaluates to true
  if e e1 e2 => e2   when e evaluates to false

We can simplify the problem by defining if as:

  if e e1 e2 = e e1 e2

That is:

  if =def= \test.\then.\else.test then else

Then we can take true and false to be:

  true  =def= \then.\else.then
  false =def= \then.\else.else

Then you can verify that:

  if true  e1 e2 =>* e1
  if false e1 e2 =>* e2

We can define numbers as follows:

  0 =def= \f.\x.x
  1 =def= \f.\x.(f x)
  2 =def= \f.\x.(f (f x))
  3 =def= \f.\x.(f (f (f x)))
  ...
  n =def= \f.\x.f^n x

Note that a number n takes a function f and some argument x and
applies the function to the argument n times.

  inc   =def= \n.\f.\x.f (n f x)
  plus  =def= \n.\m.n inc m
  times =def= \n.\m.n (plus m) 0

We can define pairs (2-tuples) as:

  pair =def= \x.\y.\f.f x y

So, pair 0 1 =>* \f.f 0 1.

  first  =def= \p.(p (\x.\y.x))
  second =def= \p.(p (\x.\y.y))

So,

  first  (pair 0 1) =>* 0
  second (pair 0 1) =>* 1

You can define lists, trees, predecessor, subtraction, tests for zero,
and just about anything else you can imagine in the pure lambda
calculus.

You can also encode loops.  Here's the analog of "while true do skip":

  forever =def= (\x.x x)(\x.x x)

Note that forever => forever => forever => ...

A more interesting combinator is Y:

  Y =def= \f.(\x.f (x x))(\x.f (x x))

Note that for any function f:

  Y f =>* f (Y f)     (more precisely, both sides reduce to a common term)

So, for instance, we can define:

  factbody =def= \f.\x.if (eq x 0) 1 (times x (f (dec x)))
  fact     =def= Y factbody

Then note that

  Y factbody =>* factbody (Y factbody)
             =   factbody fact
             =>  \x.if (eq x 0) 1 (times x (fact (dec x)))

So, if you have an ML function:

  let fun f(x) = e in e' end

we can encode this as:

  (\f.e') (Y (\f.\x.e))
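As a sanity check, here's a sketch of the Church numerals above in ML.
Since ML will force one type on each numeral, I've fixed them at int;
the names church and to_int are made up for this sketch.  Note that
with this monomorphic type a numeral can't consume church-to-church
functions the way "n inc m" does above, so plus below iterates f
directly instead:

  type church = (int -> int) -> int -> int

  val zero : church = fn f => fn x => x
  fun inc (n : church) : church = fn f => fn x => f (n f x)

  (* plus n m applies f to x a total of n+m times. *)
  fun plus (n : church) (m : church) : church = fn f => fn x => n f (m f x)

  (* Read a numeral back as an ML int by counting applications. *)
  fun to_int (n : church) = n (fn k => k + 1) 0

  val three = to_int (plus (inc zero) (inc (inc zero)))   (* = 3 *)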
---------------------------------------------------------------------

Consider:

  inc 1 = (\n.\g.\x.g (n g x)) (\f.\x.(f x))
       => \g.\x.g ((\f.\x.(f x)) g x)

(I've alpha-renamed inc's bound f to g here so it doesn't clash with
the f in 1.)  And then we're stuck.  So the result of inc 1 is

  \g.\x.g ((\f.\x.(f x)) g x)

which is not 2:

  \g.\x.g (g x)

But I claim that these two functions behave the same in all contexts.
That is, (inc 1) and 2, when given the same arguments, produce
equivalent results.  So we're morally justified in saying that (inc 1)
is equal to 2, even though the evaluation rules don't justify this
directly.

The problem is that, for any given function, there are lots and lots
of ways to write down that function as a lambda term.  It would be
nice, when we're reasoning about programs, if we could somehow
collapse all of the functions down so that, if they behave the same,
then we could get them to line up syntactically.  This would allow us
to mechanize the comparison of two programs to see if they are equal.

Let me define the following inference rules:

  e =a= e'
  --------
  e =b= e'

  e1 =b= e2
  ---------
  e2 =b= e1

  e1 =b= e2   e2 =b= e3
  ---------------------
       e1 =b= e3

The first rule says that if two expressions are alpha-equivalent, then
we also consider them to be semantically (beta) equivalent.  The
second and third rules say that =b= is an equivalence relation in that
it is symmetric and transitive.  (Alpha-equivalence gives us
reflexivity.)

  (\x.e1) e2 =b= e1[e2/x]

This rule says that a function applied to an expression can be
considered equivalent to the substitution of that expression for the
formal parameter in the body of the function -- just like the
evaluation rule.

  e1 =b= e1'   e2 =b= e2'
  -----------------------
     e1 e2 =b= e1' e2'

     e =b= e'
  --------------
  \x.e =b= \x.e'

These two rules say that we can replace beta-equivalent expressions
within sub-expressions and still get out equivalent expressions.

Finally, there's one more rule that's possible:

  \x.(e x) =b= e      (x not in FV(e))

This is called the "eta-rule".  It reflects the fact that if x is not
free in e, then \x.(e x) e' =b= e e'.  That is, when we apply \x.(e x)
to an argument, and when we apply e to that same argument, we get out
the same result.

Neat fact: if |- e1 =b= e2, then for all e, e1 e =b= e2 e.  This just
says that =b= captures our notion of equivalence.  Unfortunately, we
can't turn this into a decision procedure for reasoning about the
equivalence of programs since, as we've seen, the lambda calculus is
Turing complete.  Still, beta-equivalence lets us easily reason about
expressions in an algebraic style.  For instance, we can now prove
that (inc 1) = 2 (exercise) using these rules.

It's much, much easier to use these rules than to construct some
denotational semantics and try to prove that two expressions are
equivalent.  In fact, there's a lot of literature in the functional
world on writing naive, but easy-to-understand, programs as your
specification.  Then you derive an efficient implementation by using
these algebraic laws to get something that is provably equivalent to
your specification.

-----------------------------------------------------------------------

Call-by-Value vs. Call-by-Name

Note that the pure lambda calculus does not evaluate arguments before
calling a function:

  (\x.e1) e2 => e1[e2/x]

This was extremely useful for implementing "if" as a function:

  if =def= \b.\t.\e.b t e

because when we write "if e then e1 else e2" we do *not* want to
evaluate e1 or e2 until we've evaluated e.  Otherwise, in a recursive
function, we'd loop forever.  We say that the pure lambda calculus is
call-by-name.  The terminology comes from Algol and related languages.

In contrast, ML, C, C++, Java, etc. are call-by-value languages.  In a
call-by-value language, the evaluation rules are roughly:

  (\x.e1) v => e1[v/x]

       e1 => e1'
  ---------------
  e1 e2 => e1' e2

     e2 => e2'
  -------------
  v e2 => v e2'

where v ranges over values.  In the case of the lambda calculus, the
only values we have are functions (\x.e).  Note that the rules
effectively force you to evaluate in a left-to-right,
innermost-to-outermost fashion, and force you to evaluate the argument
before you call the function.
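The CBV rules above transcribe almost directly into a one-step
reducer.  Here's a sketch in ML, reusing the exp datatype and subst
function from the earlier substitution sketch; step returns NONE when
no rule applies:

  fun is_value (Lam _) = true
    | is_value _       = false

  fun step (App (e1, e2)) =
        if not (is_value e1) then
          (* e1 => e1'  gives  e1 e2 => e1' e2 *)
          Option.map (fn e1' => App (e1', e2)) (step e1)
        else if not (is_value e2) then
          (* e2 => e2'  gives  v e2 => v e2' *)
          Option.map (fn e2' => App (e1, e2')) (step e2)
        else
          (* (\x.e) v => e[v/x] *)
          (case e1 of Lam (x, e) => SOME (subst e2 x e) | _ => NONE)
    | step _ = NONE

  (* Iterate step until no rule applies; diverges on terms like forever. *)
  fun eval e = case step e of SOME e' => eval e' | NONE => e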
Call-by-value (CBV) doesn't work well when we decide to define
something like if, because it eagerly evaluates the "then" and "else"
clauses.  For instance, if you define in ML:

  val True  = fn x => fn y => x
  val False = fn x => fn y => y
  val If    = fn x => fn y => fn z => x y z

and then write:

  If True (print "Hello") (print "Goodbye")

then you'll get both "Hello" and "Goodbye" printed.  In a call-by-name
language, you'd only get "Hello", which is what we normally intend.

Call-by-name (CBN) isn't always better though.  Consider:

  double = fn x => x + x

If we did something like:

  double (big-computation-that-takes-2-weeks)

then in CBN, this would reduce to:

  (big-computation-that-takes-2-weeks)
    + (big-computation-that-takes-2-weeks)

and you'd end up computing for 4 weeks (plus a little :-)).  In
call-by-value, you'd first reduce (big-computation-that-takes-2-weeks)
to its value (say 3) and then call double:

  double (big-computation-that-takes-2-weeks) => double 3 => 3 + 3

which only takes 2 weeks.

So, in principle, if we ignore efficiency, CBN is better because we
don't compute something if we don't need it.  But in practice, for
most things, CBV is better because we only compute things at most
once.

Another option is call-by-need (aka lazy) evaluation.  This requires a
more complicated model, so we won't formalize it here.  But the basic
idea is to insert a level of indirection and a flag, and only evaluate
things once.  Here's how we might encode a lazy computation in ML:

  datatype 'a E = Value of 'a | Computation of unit -> 'a
  type 'a thunk = ('a E) ref

  (* Force a thunk: run the suspended computation the first time,
     cache its value, and return the cached value thereafter. *)
  fun read_thunk r =
    case (!r) of
      Value v => v
    | Computation f =>
        let val v = f ()
        in r := Value v; v
        end

  (* Suspend a computation without running it. *)
  fun new_thunk f = ref (Computation f)

I can simulate something like "if e1 then e2 else e3" by writing:

  fun If (e1 : 'a thunk -> 'a thunk -> 'a thunk)
         (e2 : 'a thunk) (e3 : 'a thunk) = e1 e2 e3
  fun True  (x : 'a thunk) (y : 'a thunk) = x
  fun False (x : 'a thunk) (y : 'a thunk) = y

Now, I could write:

  read_thunk (If True (new_thunk (fn () => print "Hello"))
                      (new_thunk (fn () => print "Goodbye")))

This will evaluate to unit and print "Hello".  If I write:

  fun double (x : int thunk) = (read_thunk x) + (read_thunk x)

then I can write:

  double (new_thunk (fn () => big-computation-that-takes-2-weeks))

and I'll only do the computation once.  But note that I could also
write:

  fun forget (x : int thunk) = 3

and

  forget (new_thunk (fn () => big-computation-that-takes-2-weeks))

will run in constant time (i.e., won't take 2 weeks).
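As a quick check that the thunk machinery really evaluates at most
once, we can count how many times the suspended computation runs; the
count counter below is my own addition for illustration:

  val count = ref 0
  val t = new_thunk (fn () => (count := !count + 1; 42))

  val a = read_thunk t    (* runs the computation: !count = 1, a = 42 *)
  val b = read_thunk t    (* hits the cache:       !count = 1, b = 42 *)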