\documentclass{article}
\usepackage{611-lecture}
\usepackage{amsmath}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% CS611: Please fill in these macros as appropriate:
\lecture{2}                  %% Lecture number
\title{Lambda Calculus}   %% Title of lecture
\lecturer{Michael Clarkson}
%\author{Oliver Kennedy}       %% name of scribe
\date{28 August, 2006}     %% Date of lecture, e.g., 1 January 2001
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% See 611.sty for a variety of macros that will be helpful in
% typesetting the lecture. Here are a few of particular interest:
%
% "x"       x in keyword font (e.g., "if", "#t")
% _x_       x in italics
% \nm{n}    n in slanted font (used for abbreviations)
% <e>       e in angle brackets
% \lt       less-than sign
% \gt       greater-than sign
% \SB{x}    x in semantic brackets
% \Tr x{y}  x[[y]] with x in calligraphic font
%           (if x is more than a single character, use \Tr{x}{y})

\begin{document}

\maketitle

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% CS611: SCRIBE NOTES GO HERE!
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{The Lambda Calculus}

\subsection{Recap---$\lambda$-terms}

Lambda calculus is a notation for describing mathematical functions and programs.
A $\lambda$-term is: 
\begin{enumerate}
\item a variable $x\in\Set{Var}$, where \Set{Var} is a countably infinite set of variables;
\item a function $e_0$ applied to an argument $e_1$, usually written $e_0~e_1$ or $e_0(e_1)$; or
\item an expression $\lam xe$ denoting a function with input parameter $x$ and body $e$.
\end{enumerate}
In BNF notation,
\[ e \ \ ::=\ \ x \bnf  \lam{x}{e} \bnf  e_0\,e_1 \]

Parentheses are used just for grouping; they have no
meaning on their own.  Lambdas are greedy, extending as far to the right as they can.
For simplicity, multiple variables may be placed after the lambda, and this is considered
shorthand for having a lambda in front of each variable.
For example, we write $\lam{xy}{e}$ as shorthand for $\lam{x}{\lam{y}{e}}$.
This shorthand is an example of _syntactic sugar_.  The process of removing it in this
instance is called _currying_.
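
As a small illustration of these conventions, here is a sugared term and its fully parenthesized form.  This sketch assumes the usual convention, not stated above, that application associates to the left:
\[ \lam{xyz}{x~z~(y~z)} \ \ =\ \ \lam{x}{(\lam{y}{(\lam{z}{((x~z)~(y~z))})})} \]
The body of each lambda extends as far to the right as possible, so none of the three lambdas ends before the end of the whole term.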

Where a mathematician might write $x \mapsto x^{2}$, in the $\lambda$-calculus we would write $\lambda x.x^2$.  This suggests that functions are just ordinary values, and can be passed as arguments to functions (even to themselves!).

The $\lambda$-calculus is a mathematical system for studying the interaction of _functional abstraction_ and _functional application_.  In the _pure_ $\lambda$-calculus, $\lambda$-terms serve as both functions and data.

\subsection{Recap---BNF Notation}

In the grammar
\[ e \ \ ::=\ \ x \bnf  \lam{x}{e} \bnf  e_0~e_1 \]
describing the syntax of the pure $\lambda$-calculus,
the symbol $e$ is not a variable of the language itself, but a _metavariable_ ranging over
a syntactic class of the language (in this case, $\lambda$-terms).  We use subscripts to differentiate
syntactic metavariables of the same syntactic class. For example, $e_0$, $e_1$ and
$e$ all represent $\lambda$-terms.

\subsection{Recap---Variable Binding}

Occurrences of variables in a $\lambda$-term can be _bound_ or _free_.
In the $\lambda$-term $\lam xe$, the lambda abstraction operator $\lambda x$
binds all the free occurrences of $x$ in $e$.
The _scope_ of $\lambda x$ in $\lam xe$ is $e$.
This is called _lexical scoping_; the variable's scope is
defined by the text of the program. It is ``lexical'' because it is
possible to determine its scope before the program runs by inspecting the
program text.  A term is _closed_ if all variables are bound.
A term is _open_ if it is not closed.
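
The substitution rules later in these notes refer to the set $\FV{e}$ of variables that occur free in $e$.  The notes do not spell out its definition; the standard inductive one is sketched here:
\[
\begin{array}{rcl}
\FV{x} &=& \{x\}\\
\FV{e_0~e_1} &=& \FV{e_0}\cup\FV{e_1}\\
\FV{\lam{x}{e}} &=& \FV{e}\setminus\{x\}
\end{array}
\]
For example, $\FV{\lam{x}{x~y}} = \{x,y\}\setminus\{x\} = \{y\}$, so $\lam{x}{x~y}$ is open, whereas $\FV{\lam{x}{\lam{y}{x~y}}} = \emptyset$, so that term is closed.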

\subsection{Digression---Terms and Types}

There are different kinds of expressions in a typical programming
language: _terms_ and _types_.  We have not talked about types yet, but we will soon.
A term represents a value that exists only at run time; a type is a compile-time expression
used by the compiler to rule out ill-formed programs.  For now there are no types.

\section{Substitution and $\beta$-reduction}

Now we get to the question: How do we run a $\lambda$-calculus program?
The main computational rule is called _$\beta$-reduction_.  This rule applies
whenever there is a subterm of the form $(\lam{x}{e_1})~e_2$ representing the
application of a function $\lam x{e_1}$ to an argument $e_2$.

Intuitively, to perform the $\beta$-reduction, we substitute the argument
$e_2$ for all free occurrences of the formal parameter $x$ in the body $e_1$,
then evaluate the resulting expression (which may involve further such steps).

We have to be a little careful though.  We cannot just substitute $e_2$
blindly for $x$ in $e_1$, because doing so could change the meaning
of the expression in undesirable ways.  For example, if $e_2$ contained
a free occurrence of a variable $y$, and there were a free occurrence of $x$
in the scope of a $\lambda y$ in $e_1$, then the free occurrence of $y$ in $e_2$ would be
``captured'' by that $\lambda y$ and would end up bound to it after the substitution.
This would not be good.  However, we can avoid the problem by renaming the bound variable $y$.
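
To see the problem concretely, consider $(\lam{x}{\lam{y}{x}})~y$, in which the argument is the free variable $y$.  Blindly replacing $x$ by $y$ in the body $\lam{y}{x}$ would yield
\[ \lam{y}{y} \]
which is the identity function: the free $y$ has been captured.  Renaming the bound $y$ first (the name $z$ here is an arbitrary fresh choice) gives instead
\[ (\lam{x}{\lam{z}{x}})~y \ \ \stepsone\ \ \lam{z}{y} \]
a function that ignores its input and returns the free variable $y$, as intended.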

\subsection{Safe Substitution}

We wish to define a notion of _safe substitution_ in which undesired capture
of variables is avoided by judicious renaming of bound variables.
We write $\subst{e_1}{e_2}x$ to denote the result of substituting $e_2$ for all free occurrences
of $x$ in $e_1$ according to the following rules:
\[
\begin{array}{rcll}
\subst xex &=& e\\
\subst yex &=& y & \mbox{where $y\neq x$}\\
\subst{(e_1\,e_2)}ex &=& (\subst{e_1}ex)\,(\subst{e_2}ex)\\
\subst{(\lam x{e_0})}{e_1}x &=& \lam x{e_0}\\
\subst{(\lam y{e_0})}{e_1}x &=& \lam y{\subst{e_0}{e_1}x} & \mbox{where $y\neq x$ and $y\not\in\FV{e_1}$}\\
\subst{(\lam y{e_0})}{e_1}x &=& \lam z{\subst{\subst{e_0}zy}{e_1}x} & \mbox{where $y\neq x$, $z\neq x$, $z\not\in\FV{e_0}$, and $z\not\in\FV{e_1}$}.
\end{array}
\]
(There are many notations for substitution. Pierce writes $[x \mapsto e_2]e_1$.
Because we will be using similar notation for something else, we
will use the notation $\subst{e_1}{e_2}{x}$.)

Note that the rules are applied inductively.  That is, the result of a substitution in a compound term is defined in terms of substitutions on its subterms.  The very last of the six rules applies when $y\in\FV{e_1}$.  In this case we rename the bound variable $y$ to $z$ to avoid capturing the free occurrence of $y$ in $e_1$.  One might well ask: But what if $y$ occurs free in the scope of a $\lambda z$ in $e_0$?  Wouldn't the new $z$ then be captured?  The answer is that this case is taken care of in the same way, but inductively on a smaller term.
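
As a worked instance of the last rule, take $e_0 = x~y$ and $e_1 = y~w$, so that $y\in\FV{e_1}$.  Choosing $z$ as the fresh variable (any variable other than $x$, $y$, and $w$ would do), the rules give:
\[
\begin{array}{rcll}
\subst{(\lam{y}{x~y})}{(y~w)}{x}
 &=& \lam{z}{\subst{\subst{(x~y)}{z}{y}}{(y~w)}{x}} & \mbox{renaming $y$ to $z$}\\
 &=& \lam{z}{\subst{(x~z)}{(y~w)}{x}} & \mbox{since $\subst{(x~y)}{z}{y} = x~z$}\\
 &=& \lam{z}{(y~w)~z} & \mbox{substituting for $x$}
\end{array}
\]
The $y$ in the result is still free, just as it was in $e_1$.  Without the renaming we would have obtained $\lam{y}{(y~w)~y}$, in which it is captured.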

Rewriting $(\lam{x}{e_1})~e_2$ to $\subst{e_1}{e_2}{x}$ is the basic computational step
of the $\lambda$-calculus and is called _$\beta$-reduction_.  In the pure $\lambda$-calculus,
we can start with a $\lambda$-term and perform $\beta$-reductions on subterms in any order.

\section{Call-by-Name and Call-by-Value}

Now we have another issue.  In general there may be many possible $\beta$-reductions that can be performed on a given
$\lambda$-term.  How do we choose which $\beta$-reduction to perform next?  Does it matter?

A specification of which $\beta$-reduction to perform next is called a _reduction strategy_.
Let us define a _value_ to be a closed $\lambda$-term on which no $\beta$-reductions are possible, given our
chosen reduction strategy.  For example, $\lambda x.x$ would always be a value, whereas $(\lambda x.x)~1$ would
most likely not be.

Most real programming languages based on the $\lambda$-calculus use a reduction strategy known as _call by value_ (CBV).
In other words, they may only call functions on values.  Thus
$(\lambda x.e_{1})~e_{2}$ only reduces if $e_{2}$ is a value.  Here is an example of a CBV evaluation sequence,
assuming $3$, $4$, and $\nm{s}$ (the successor function) are appropriately defined.
\[
((\lambda x.\lambda y.y~x)~3)~\nm{s}
\ \ \stepsone\ \ (\lambda y.y~3)~\nm{s}
\ \ \stepsone\ \ \nm{s}~3
\ \ \stepsone\ \ 4.
\]

Another strategy is _call by name_ (CBN).  We defer
evaluation of arguments until as late as possible, applying reductions
from left to right within the expression.
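
To contrast the two strategies, consider an application whose argument is not yet a value (again assuming $3$ is an appropriately defined closed value).  Under CBV the argument is reduced first:
\[
(\lam{x}{\lam{y}{y~x}})~((\lam{z}{z})~3)
\ \ \stepsone\ \ (\lam{x}{\lam{y}{y~x}})~3
\ \ \stepsone\ \ \lam{y}{y~3}
\]
Under CBN the unevaluated argument is substituted immediately:
\[
(\lam{x}{\lam{y}{y~x}})~((\lam{z}{z})~3)
\ \ \stepsone\ \ \lam{y}{y~((\lam{z}{z})~3)}
\]
Both sequences stop here, because the rules given in the next section never reduce inside the body of a lambda abstraction.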

\section{Structural Operational Semantics (SOS)}

Let's formalize CBV with a few inference rules.
\[
%\infer[\mbox{[$\beta$-reduction]}]{(\lam{x}{e})~v \stepsone \subst{e}{v}{x}}
\frac{}{(\lam{x}{e})~v \stepsone \subst{e}{v}{x}} \quad \mbox{[$\beta$-reduction]}
\]
\[
%\Rule{e_1 \stepsone e'_1}{e_1~e_2 \stepsone e'_1~e_2}
\frac{e_1 \stepsone e'_1}{e_1~e_2 \stepsone e'_1~e_2}
\]
\[
%\Rule{e \stepsone e'}{ v~e \stepsone v~e'}
\frac{e \stepsone e'}{ v~e \stepsone v~e'}
\]

This is a simple operational semantics for a programming language based on the
lambda calculus.  An operational semantics is a language semantics that
describes how to run the program. This can be done through informal
human-language text, as in the Java Language Specification, or through
more formal rules.  Rules of this form are known as a
Structural Operational Semantics (SOS).  They define evaluation as the result of
applying the rules to transform the expression.  The rules are typically
defined in terms of the structure of the expression being evaluated.
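
For instance, the first step of the CBV evaluation sequence shown earlier can be justified by instantiating the second rule, with the [$\beta$-reduction] axiom supplying its premise.  Here $v = 3$ and $\subst{(\lam{y}{y~x})}{3}{x} = \lam{y}{y~3}$:
\[
\frac{\displaystyle\frac{}{(\lam{x}{\lam{y}{y~x}})~3 \stepsone \lam{y}{y~3}}}
     {((\lam{x}{\lam{y}{y~x}})~3)~\nm{s} \stepsone (\lam{y}{y~3})~\nm{s}}
\]
Each step of an evaluation corresponds to one such derivation, whose shape follows the structure of the term being reduced.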

This kind of operational semantics is known as a _small-step_ semantics because
it only describes one step at a time.  An alternative is a _big-step_ (or _large-step_)
semantics that describes the entire evaluation of the program to a final value.

We will see other kinds of semantics later in the course, such as
_axiomatic semantics_, which describes the behavior of a program in terms of
the observable properties of the input and output states, and
_denotational semantics_, which translates a program into
an underlying mathematical representation.

Expressed as SOS, CBN has slightly simpler rules:

\[
%\infer[\mbox{[$\beta$-reduction]}]{(\lam x{e_1})~e_2 \stepsone \subst{e_1}{e_2}{x}}{}
\frac{}{(\lam x{e_1})~e_2 \stepsone \subst{e_1}{e_2}{x}} \quad \mbox{[$\beta$-reduction]}
\]

\[
%\Rule{e_0 \stepsone e'_0}{e_0~e_1 \stepsone e'_0~e_1}
\frac{e_0 \stepsone e'_0}{e_0~e_1 \stepsone e'_0~e_1}
\]

We don't need the rule for evaluating the right-hand side of an
application because $\beta$-reductions are done immediately once the
left-hand side is a value.

\subsection{$\Omega$}

Let us define an expression we will call $\Omega$:
\[
\Omega\ \ =\ \ (\lam{x}{x~x})~(\lam{x}{x~x})
\]
What happens when we try to evaluate it?
\[
\Omega\ \ =\ \ (\lam{x}{x~x})~(\lam{x}{x~x})
\ \ \stepsone\ \ \subst{(x~x)}{(\lam{x}{x~x})}x
\ \ =\ \ \Omega
\]
We have just coded an infinite loop!

Now what happens if we try using $\Omega$ as a parameter?
\[
 (\lambda x.(\lambda y.y))~\Omega
\]
Using the CBV evaluation strategy, we must first
reduce $\Omega$.  This puts the evaluator into an infinite loop.  On
the other hand, CBN reduces the term above to $\lam{y}{y}$.  CBN has an
important property: CBN will not loop infinitely unless every other
reduction strategy would also loop infinitely, yet it agrees with CBV
whenever CBV terminates successfully.
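
Concretely, the two strategies behave as follows on this term.  Under CBV, the only applicable rule is the one that steps the argument of an application whose function part is already a value, and since $\Omega\stepsone\Omega$ the whole term steps to itself forever:
\[
(\lam{x}{\lam{y}{y}})~\Omega
\ \ \stepsone\ \ (\lam{x}{\lam{y}{y}})~\Omega
\ \ \stepsone\ \ \cdots
\]
Under CBN, the [$\beta$-reduction] rule applies at once, and because $x$ does not occur in $\lam{y}{y}$ the argument is simply discarded:
\[
(\lam{x}{\lam{y}{y}})~\Omega
\ \ \stepsone\ \ \subst{(\lam{y}{y})}{\Omega}{x}
\ \ =\ \ \lam{y}{y}
\]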

\end{document}