Recitation 8: Program equivalence and structural induction

Overview

This recitation explores how we can use the substitution model of evaluation and structural induction to prove equivalences between OCaml programs.

Program Equivalence

Deciding whether two programs are equivalent is a fundamental question that arises in a wide range of contexts from debugging to compilation to optimization and complexity analysis. But what does it mean for two programs to be equivalent? Intuitively, they should produce identical outputs when they are provided with identical inputs, where an output can either be a value, side effects, an exception, or an infinite loop. As a simplification, if the programs are purely functional — that is, they do not exhibit side effects such as printing to the console or modifying mutable data structures — then we can simplify this to say that on any input, the programs should either both loop or both produce the same final value. We will adopt this simplified notion of equivalence in the rest of this recitation.

As examples of programs that we might attempt to prove equivalent, consider the following:

Structural Induction

In many situations, in order to prove programs equivalent, we will need to do a proof over infinite sets of structures — all natural numbers, lists, or trees. Establishing properties over inifinite sets is challenging because simple proof techniques such as exhaustive case analysis are often not powerful enough. Fortunately, there is a general proof technique called structural induction that can often be used to establish such properties. Structural induction exploits the fact that recursive data types are ultimately built up using applications of constructors — to construct a list, we "cons" the elements onto the "nil" element.

The structural induction principle for a data type can be obtained as follows. An OCaml data type is defined by a number of variants, each of which can wrap a tuple of values. Given such a definition, each non-recursive variant is a base-case, and each recursive variant is an inductive case, with an induction hypothesis for each occurence of a recursive type.

For the list data type

  type 'a list = 
     [] 
   | (::) of 'a * 'a list
the structural induction principle is as follows: given an arbitrary predicate on lists, if P([]) and for all values vh and vt we have that P(vt) implies P(vh::vt), then for all lists t, the predicate P(t) holds.

For the nat data type type nat = Zero | Succ of nat the structural induction principle is as follows: given an arbitrary predicate on lists, if P(Zero) and for all values vn we have that P(vn) implies P(Succ vn), then for all natural numbers n, the predicate P(n) holds.

What is the induction principles for the following data types?

Substitution Model Proofs

Using the substitution model and structural induction, we can prove many useful equivalences. As an example, we will prove the following equivalence,

let rec map f l = 
  match l with 
    | [] -> []
    | h::t -> (f h)::(map f t) in 
map g l 
and
let rec fold_right f l a = 
  match l with 
    | [] -> a 
    | h::t -> f h (fold_right f t a) in 
fold_right (fun x a -> g x::a) l []
using structural induction on lists.

To start, assume that we have generated fresh names map' and fold_right' for map and fold_right respectively, and added the following evaluation rules to the model:

map' --> 
  (fun f -> fun l -> 
     match l with 
       | [] -> []
       | h::t -> (f h)::(map' f t)  
and
fold_right' --> 
  (fun f -> fun l -> fun a -> 
     match l with
       | [] -> a
       | h::t -> f h (fold_right' f t a) 

Next, we will prove a lemma showing that P(l) holds for all lists l, where P(l) is defined as follows;

P(l) =def=
  (fun f -> fun l -> 
     match l with 
       | [] -> []
       | h::t -> (f h)::(map' f t))
  g l 
=    
  (fun f -> fun l -> fun a -> 
     match l with
       | [] -> a
       | h::t -> (f h)::(fold_right' f t a) 
  g l []

Let g be arbitrary. To use the structural induction principle for lists, we must prove two simpler properties, P([]) and forall vh, vt, P(vt) ==> P(vh::vt). We prove each separately.

P([]) 
  (fun f -> fun l -> 
     match l with 
       | [] -> []
       | h::t -> (f h)::(map' f t))
  g []
-->
  (fun l -> 
     match l with 
       | [] -> []
       | h::t -> (g h)::(map' g t))
  []
--> 
   match [] with 
     | [] -> []
     | h::t -> (g h)::(map' g t))
-->
  [] 

  (fun f -> fun l -> fun a -> 
     match l with
       | [] -> a
       | h::t -> (f h)::(fold_right' f t a) 
  g [] []
--> 
  (fun l -> fun a -> 
     match l with
       | [] -> a
       | h::t -> (g h)::(fold_right' g t a) 
  [] []
--> 
  (fun a -> 
     match [] with
       | [] -> a
       | h::t -> (g h)::(fold_right' g t a) 
  []
--> 
     match [] with
       | [] -> []
       | h::t -> (g h)::(fold_right' g t []) 
--> 
  [] 
Now let vh and vt be arbitrary, and assume P(vt).
P(vh::vt) 
  (fun f -> fun l -> 
     match l with 
       | [] -> []
       | h::t -> (f h)::(map' f t))
  g vh::vt
-->
  (fun l -> 
     match l with 
       | [] -> []
       | h::t -> (g h)::(map' g t))
  vh::vt
--> 
   match vh::vt with 
     | [] -> []
     | h::t -> (g h)::(map' g t))
-->
   (g vh)::(map' g vt)

  (fun f -> fun l -> fun a -> 
     match l with
       | [] -> a
       | h::t -> (f h)::(fold_right' f t a) 
  g (vh::vt) []
--> 
  (fun l -> fun a -> 
     match l with
       | [] -> a
       | h::t -> (g h)::(fold_right' g t a) 
  (vh::vt) []
--> 
  (fun a -> 
     match (vh::vt) with
       | [] -> a
       | h::t -> (g h)::(fold_right' g t a) 
  []
--> 
    match vh::vt with
      | [] -> []
      | h::t -> g h::(fold_right' g t []) 
--> (g vh)::(fold_right' g vt [])

= (g vh)::map' g vt
where the last step follows using our assumption that P(vt). By the structural induction princple for lists, we conclude that P(l) holds for all lists l. The top-level equivalence is a simple calculation using this lemma.