CS 3110 Lecture 12
Modular Verification

In this lecture we'll look at verification of code in the presence of abstractions. Last time we saw how to prove that a function satisfies its specifications. Often, what we want to show is that a concrete implementation correctly matches an abstract type. Many of the structures we use correspond to abstract mathematical objects which have well-understood operations - for example, the union of two sets. We want to know that our implementation matches the math.

As a simple example, suppose we didn't have a boolean type in our language and we needed to implement it. Here is a potential signature:

module type BOOL = sig
  type bool
  val true: bool
  val false: bool
  val not: bool -> bool
  val or: bool * bool -> bool
  val and: bool * bool -> bool
  val if: bool * 'a * 'a -> 'a
end

It's not important for the purposes of this lecture, but there are two problems with this signature that mean it wouldn't quite work in OCaml (they are discussed at the end of these notes). That aside, what are potential implementations for this? Here's one you might see in CS 6110:

module FunBool : BOOL = struct
  type 'a bool = 'a * 'a -> 'a
  let false (a,b) = b
  let true (a,b) = a
  let if (c,a,b) = c(a,b)
  let not x = fun (a,b) -> x(b,a)
  let or (x,y) = fun (a,b) -> x(a,y(a,b))
  let and (x,y) = fun (a,b) -> x(y(a,b),b)
end

This slightly mind-bending implementation does not use OCaml's built-in conditionals at all! It doesn't quite match the BOOL signature, because it requires bool to be polymorphic, but apart from that it works!
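To make the idea concrete, here is a minimal runnable sketch of the functional booleans in OCaml. It is not the module from the notes: the names tru, fls, not_, or_, and_, if_, and the type name fbool are hypothetical renamings chosen to avoid OCaml keywords, and show is a helper added only to observe results.

```ocaml
(* A functional boolean picks one of two values of the same type.
   All names here are hypothetical renamings that avoid OCaml keywords. *)
type 'a fbool = 'a * 'a -> 'a

let tru : 'a fbool = fun (a, _) -> a   (* true selects the first component *)
let fls : 'a fbool = fun (_, b) -> b   (* false selects the second component *)

(* if_ simply hands both branches to the boolean. *)
let if_ ((c : 'a fbool), (a : 'a), (b : 'a)) : 'a = c (a, b)

(* not_ swaps the branches before selecting. *)
let not_ (x : 'a fbool) : 'a fbool = fun (a, b) -> x (b, a)

(* or_: if x is true pick a, otherwise defer to y. *)
let or_ ((x : 'a fbool), (y : 'a fbool)) : 'a fbool =
  fun (a, b) -> x (a, y (a, b))

(* and_: if x is true defer to y, otherwise pick b. *)
let and_ ((x : 'a fbool), (y : 'a fbool)) : 'a fbool =
  fun (a, b) -> x (y (a, b), b)

(* Helper (an assumption of this sketch): observe a boolean as a string. *)
let show (b : string fbool) : string = if_ (b, "true", "false")
```

For example, show (or_ (fls, tru)) selects "true", while show (and_ (tru, fls)) selects "false".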

Here's a less strange implementation using ints:

module IntBool : BOOL = struct
  type bool = int
  (* RI: value must be either 0 or 1
     AF: 0 represents false, 1 represents true *)
  let false = 0
  let true = 1
  let not (x: bool) = 1 - x
  let and (x:bool,y:bool) = x * y
  let or (x:bool,y:bool) = x + y - x * y
  let if (c:bool, e1:'a, e2:'a) =
    if c > 0 then e1 else e2
end

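Does this really work? We can actually prove that it does. As an example, we will show that the or function is correct. What does that mean? We need to show two things: that or preserves the representation invariant (RI), and that its result, viewed through the abstraction function (AF), agrees with logical disjunction on the abstract values.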

Proof of correctness

Suppose we know RI(b0) and RI(b1), that is, b0 and b1 are each either 0 or 1. First we'll show RI(or(b0,b1)): the output of or is also either 0 or 1. The simplest thing to do is to look at a table of outputs.

b0  b1  or(b0,b1)
0   0   0+0-0*0=0
0   1   0+1-0*1=1
1   0   1+0-1*0=1
1   1   1+1-1*1=1

We see that in all cases the output of the or function is either 0 or 1, so RI(or(b0,b1)) is satisfied.

Next, we need to show that or actually does the right thing, i.e., that if RI(b0) and RI(b1), then AF(or(b0,b1)) = AF(b0) ∨ AF(b1). Let's just add a few more columns to the previous table:

b0  b1  or(b0,b1)   AF(b0)  AF(b1)  AF(or(b0,b1))  AF(b0)∨AF(b1)
0   0   0+0-0*0=0   false   false   false          false
0   1   0+1-0*1=1   false   true    true           true
1   0   1+0-1*0=1   true    false   true           true
1   1   1+1-1*1=1   true    true    true           true

And so we see that or both maintains the representation invariant and produces the correct result, provided the representation invariant was true in the first place.

By the way, if you're wondering where x+y-x*y came from, you can get it using De Morgan's laws: or(x,y) = not(and(not x, not y)) = 1-((1-x)*(1-y)) = x+y-x*y.
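Because the representation invariant admits only four input pairs, the table-based argument can also be checked mechanically. The following is a minimal sketch, not the module from the notes: the operations are renamed (fls, tru, not_, and_, or_) to avoid OCaml keywords, and rep_ok, af, and inputs are hypothetical helpers that encode RI, AF, and the four rows of the table.

```ocaml
module IntBool = struct
  type t = int
  (* RI: value must be either 0 or 1
     AF: 0 represents false, 1 represents true *)
  let fls = 0
  let tru = 1
  let not_ x = 1 - x
  let and_ (x, y) = x * y
  let or_ (x, y) = x + y - x * y
end

(* Hypothetical helpers: executable versions of RI and AF. *)
let rep_ok (b : int) : bool = b = 0 || b = 1
let af (b : int) : bool = b = 1

(* Every input pair that satisfies RI, i.e. the four rows of the table. *)
let inputs = [ (0, 0); (0, 1); (1, 0); (1, 1) ]

(* or_ preserves RI and commutes with AF on every row. *)
let or_is_correct : bool =
  List.for_all
    (fun (b0, b1) ->
      let r = IntBool.or_ (b0, b1) in
      rep_ok r && af r = (af b0 || af b1))
    inputs

(* The De Morgan identity or(x,y) = not(and(not x, not y)) on every row. *)
let de_morgan_holds : bool =
  List.for_all
    (fun (x, y) ->
      IntBool.or_ (x, y)
      = IntBool.not_ (IntBool.and_ (IntBool.not_ x, IntBool.not_ y)))
    inputs
```

An exhaustive check like this only works because the set of valid representations is finite and tiny; the set example later in these notes needs induction instead.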

Module verification using induction

Now comes the tricky part; we are going to prove correct a module which contains recursive functions. Consider this interface for a set of natural numbers:

  (* A "set" is a set of natural numbers. *)
  type set
  (* contains x s  is whether x∈s. *)
  val contains: int -> set -> bool
  (* union s1 s2  is s1 ∪ s2. *)
  val union: set -> set -> set

Here is a partial implementation:

  type set = int list
  (* AF: [x1,..., xn] represents {x1, ..., xn}.
         AF([]) = ∅
         AF(h::t) = {h} ∪ AF(t)
     RI: no duplicates or negative elements.
         RI([]) = true
         RI(h::t) = h≥0 ∧ RI(t) ∧ h∉AF(t)
     *)
  let contains (x:int) (s:set) = ... (* implementation not shown *)
  let union (s1: set) (s2: set) =
    let f s x = if contains x s then s else x::s
    in
      fold_left f s1 s2
  ... (* other set functions not shown *)

As before, union is correct if, assuming RI(s1) and RI(s2), we can show RI(union s1 s2) and also AF(union s1 s2) = AF(s1) ∪ AF(s2).

We will assume that we have already proved an equivalent statement for the contains function. Also, from the definition of fold_left, we can observe that fold_left f a [] evaluates to a, and fold_left f a (h::t) evaluates to fold_left f (f a h) t.
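The two evaluation facts about fold_left correspond directly to the two cases of a recursive definition in which f takes the accumulator first, as in OCaml's List.fold_left. A minimal sketch follows; my_fold_left is a hypothetical stand-in written only to show the shape of the definition, and is not necessarily how the library implements it.

```ocaml
(* my_fold_left is a hypothetical stand-in for the library's fold_left.
   Case []:     the accumulator a is returned unchanged.
   Case h :: t: h is folded into the accumulator with f, then we recurse on t. *)
let rec my_fold_left (f : 'acc -> 'elt -> 'acc) (a : 'acc) (l : 'elt list) : 'acc =
  match l with
  | [] -> a
  | h :: t -> my_fold_left f (f a h) t
```

The list argument shrinks by one element on each recursive call, which is why induction on the length of the list is the natural proof strategy for functions built on fold_left.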

Proof of correctness

We use proof by induction on the length of s2. In order to use proof by induction, we need to state what we seek to prove in terms of some proposition P(n), so that our goal is to prove the proposition "for all n ≥ 0, P(n)". Here, our P(n) is: for all s1 and s2, if RI(s1) and RI(s2) and s2 has length n, then fold_left f s1 s2 evaluates to a list l such that RI(l) is true and AF(l) = AF(s1) ∪ AF(s2).

Our base case is n=0, meaning s2 is []. In this case, fold_left f s1 [] evaluates to s1, so l=s1. RI(s1) so RI(l), and AF(s1) ∪ AF([]) = AF(s1) ∪ ∅ = AF(s1) = AF(l).

For our inductive step, we assume P(n) and seek to prove P(n+1). P(n+1) states: if RI(s1) and RI(s2) and s2 has length n+1, then fold_left f s1 s2 evaluates to a list l such that RI(l) is true and AF(l) = AF(s1) ∪ AF(s2).

Proof of P(n+1): Let s1 and s2 = [v1, v2, ..., vn+1] = v1::s2' be such that RI(s1) and RI(s2) hold, where s2 has length n+1 (so s2' has length n). Let us evaluate fold_left f s1 s2.

fold_left f s1 s2 =
fold_left f (f s1 v1) s2' =
fold_left f (if contains v1 s1 then s1 else v1::s1) s2' =
fold_left f s1' s2'

where we define s1' as the result of evaluating if contains v1 s1 then s1 else v1::s1.

Since we know RI(s1), we know contains works (i.e., contains x s is true iff x is a member of AF(s)). Consider the evaluation of if contains v1 s1 then s1 else v1::s1, which we have named s1'. If contains v1 s1 is true, then s1'=s1, and RI(s1), so RI(s1'). Also, v1 is a member of AF(s1), so AF(s1')=AF(s1)=AF(s1) ∪ {v1}. If contains v1 s1 is false, then s1'=v1::s1, and RI(s1') holds because v1 is not negative (it came from s2, which satisfies the rep invariant), RI(s1) holds, and v1 is not in AF(s1), so v1 does not duplicate any member of s1. Also, AF(s1')=AF(v1::s1)={v1} ∪ AF(s1)=AF(s1) ∪ {v1}. In both cases, RI(s1') holds and AF(s1')=AF(s1) ∪ {v1}.

If RI(s2), then RI is also true of the tail of s2, i.e., s2', because RI(v1::s2') requires RI(s2') by definition. Since RI(s1'), RI(s2') and s2' has length n, we can use our induction hypothesis P(n) to say that fold_left f s1' s2' evaluates to a list l such that RI(l) is true and AF(l) = AF(s1') ∪ AF(s2').

Now, putting it all together, fold_left f s1 s2 evaluates to fold_left f s1' s2', which evaluates to l such that RI(l) is true and AF(l) = AF(s1') ∪ AF(s2'). Moreover, AF(l) = AF(s1') ∪ AF(s2') = AF(s1) ∪ {v1} ∪ AF(s2') = AF(s1) ∪ {v1} ∪ AF([v2, ..., vn+1]) = AF(s1) ∪ {v1, v2, ..., vn+1} = AF(s1) ∪ AF(s2). Thus, we have shown P(n+1) to be true, and so, by induction, P(n) holds for all n ≥ 0: union is correct!
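The proof can be complemented by an executable sanity check. The sketch below is a runnable OCaml version of the set module under some assumptions: contains is implemented with List.mem (the notes leave its implementation unshown), and rep_ok and same_elements are hypothetical helpers that encode RI and compare abstract values. Testing a few inputs does not replace the inductive proof; it only illustrates the two properties the proof establishes.

```ocaml
type set = int list

(* Assumption: one possible contains; the notes do not show theirs. *)
let contains (x : int) (s : set) : bool = List.mem x s

(* union as in the notes: fold each element of s2 into s1 unless present. *)
let union (s1 : set) (s2 : set) : set =
  let f s x = if contains x s then s else x :: s in
  List.fold_left f s1 s2

(* Hypothetical helper: RI as code.
   RI([]) = true;  RI(h::t) = h >= 0 && RI(t) && h not in AF(t). *)
let rec rep_ok (s : set) : bool =
  match s with
  | [] -> true
  | h :: t -> h >= 0 && rep_ok t && not (List.mem h t)

(* Hypothetical helper: two lists denote the same abstract set iff they
   agree after sorting and removing duplicates. *)
let same_elements (a : int list) (b : int list) : bool =
  List.sort_uniq compare a = List.sort_uniq compare b

(* The two properties from the proof, stated for a given pair of sets:
   the result satisfies RI, and AF(result) = AF(s1) ∪ AF(s2). *)
let union_ok (s1 : set) (s2 : set) : bool =
  let l = union s1 s2 in
  rep_ok l && same_elements l (s1 @ s2)
```

For instance, union_ok [1; 2; 3] [3; 4; 0] exercises both branches of f: 3 is already present in s1, while 4 and 0 are consed onto the front.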

Problems with the BOOL signature

This material is optional, and only included for the curious.

One problem with the signature as written is that several of the identifiers are OCaml reserved words, such as if. If we were actually implementing this, we could use an alternate name like if_. The second problem is the meaning of 'if'. We want the evaluation rule for if c then e1 else e2 to first evaluate c, then evaluate only one of e1 or e2, depending on the value of c. If we implement val if_: bool * 'a * 'a -> 'a, the evaluation rules for OCaml specify that both e1 and e2 are evaluated before if_ is even called. This would be a problem, for example, for some recursive functions, e.g., let rec factorial n = if n = 0 then 1 else n * factorial (n-1). If implemented using our BOOL sig, this would never terminate, because the recursive call would be evaluated even when n = 0. One solution would be to change the type of if_ so that the branches have type unit -> 'a instead of 'a. Then the evaluation of each branch could be delayed until it is selected. For more on this, take CS 4110 or CS 6110.
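The thunk-based fix can be sketched as follows. This is an illustration under assumptions, not the module from the notes: if_lazy is a hypothetical name, and the condition uses OCaml's built-in bool rather than a BOOL implementation so that the example stays self-contained.

```ocaml
(* if_lazy is a hypothetical name. Each branch is a thunk (unit -> 'a),
   so a branch's body runs only when that branch is selected. *)
let if_lazy ((c : bool), (e1 : unit -> 'a), (e2 : unit -> 'a)) : 'a =
  if c then e1 () else e2 ()

(* factorial written with if_lazy terminates: the recursive call sits
   inside a thunk and is not evaluated when n = 0. *)
let rec factorial (n : int) : int =
  if_lazy (n = 0, (fun () -> 1), (fun () -> n * factorial (n - 1)))
```

With an eager if_ of type bool * 'a * 'a -> 'a, the argument n * factorial (n - 1) would have to be computed before the call, so the recursion would never reach a stopping point.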