CS 312 Lecture 12
Modular Verification

In this lecture we'll look at verification of code in the presence of abstractions. Last time we saw how to prove that a function satisfies its specification. Often, what we want to show is that a concrete implementation correctly matches an abstract type. Many of the structures we use correspond to abstract mathematical objects which have well-understood operations, such as the union of two sets. We want to know that our implementation matches the math.

As a simple example, suppose we didn't have a boolean type in our language and we needed to implement it. Here is a potential signature:

signature BOOL = sig
  type bool
  val true: bool
  val false: bool
  val not: bool -> bool
  val or: bool * bool -> bool
  val and: bool * bool -> bool
  val if: bool * 'a * 'a -> 'a
end

It's not important for the purposes of this lecture, but there are two problems with this signature that mean it wouldn't quite work in SML. That aside, what are potential implementations for this? Here's one you might see in 611:

structure FunBool :> BOOL = struct
  type 'a bool = 'a * 'a -> 'a
  fun false (a,b) = b
  fun true (a,b) = a
  fun if(c,a,b) = c(a,b)
  fun or (x,y) = fn(a,b) => x(a,y(a,b))
  fun and (x,y) = fn(a,b) => x(y(a,b),b)
end

This slightly mind-bending implementation does not use SML's existing conditionals at all! It doesn't actually quite match the BOOL signature, because it requires that bool be polymorphic (and, as shown, it leaves out not), but apart from that it works!
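To experiment with this idea in a real SML session, the following is a minimal sketch under a few assumptions: the identifiers are renamed (tt, ff, if_, or_, and_, not_ are hypothetical names, chosen because if and and are reserved words), the structure is left unsealed because it does not match BOOL, and not_ is an addition that is not part of the structure above.

```sml
(* Sketch only: function-based booleans with renamed identifiers.
   The names tt, ff, if_, or_, and_, not_ are not from the lecture. *)
structure FunBoolSketch = struct
  type 'a bool = 'a * 'a -> 'a
  fun ff (a, b) = b                  (* false selects the second argument *)
  fun tt (a, b) = a                  (* true selects the first argument *)
  fun if_ (c, a, b) = c (a, b)
  fun not_ x = fn (a, b) => x (b, a) (* assumed addition: swap the branches *)
  fun or_ (x, y) = fn (a, b) => x (a, y (a, b))
  fun and_ (x, y) = fn (a, b) => x (y (a, b), b)
end

(* or_ (ff, tt) behaves like true, so the first branch is selected. *)
val r1 = FunBoolSketch.if_
           (FunBoolSketch.or_ (FunBoolSketch.ff, FunBoolSketch.tt), "yes", "no")

(* and_ (tt, ff) behaves like false, so the second branch is selected. *)
val r2 = FunBoolSketch.if_
           (FunBoolSketch.and_ (FunBoolSketch.tt, FunBoolSketch.ff), "yes", "no")
```

Note that both branches passed to if_ are still evaluated before the call, which is the issue discussed at the end of these notes.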

Here's a less strange implementation using ints:

structure IntBool :> BOOL = struct
  type bool = int
  (* RI: value must be either 0 or 1 
     AF: 0 represents false, 1 represents true *)
  val false = 0
  val true = 1
  fun not (x: bool) = 1 - x
  fun and (x:bool,y:bool) = x * y
  fun or (x:bool,y:bool) = x + y - x * y
  fun if (c:bool, e1:'a, e2:'a) =
    if c > 0 then e1 else e2
end

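Does this really work? We can actually prove that it does. As an example, we will show that the or function is correct. What does that mean? We need to show two things: first, that or maintains the representation invariant, so that if its inputs satisfy RI then so does its output; second, that or agrees with the abstraction function, so that the abstract value of its output is the logical OR of the abstract values of its inputs.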

Proof of correctness

Suppose we know RI(b0) and RI(b1), that is, b0 and b1 are both either 0 or 1. First we'll show RI(or(b0,b1)), that is, that the output of or is also either 0 or 1. The simplest thing to do is to look at a table of outputs.

b0  b1  or(b0,b1)
0   0   0+0-0*0=0
0   1   0+1-0*1=1
1   0   1+0-1*0=1
1   1   1+1-1*1=1

We see that in all cases, the output of the or function is either 0 or 1, so RI(or(b0,b1)) is satisfied.

Next, we need to show that or actually does the right thing, i.e. that if RI(b0) and RI(b1), then AF(or(b0,b1)) = AF(b0) ∨ AF(b1). Let's just add a few more columns to the previous table:

b0  b1  or(b0,b1)  AF(b0)  AF(b1)  AF(or(b0,b1))  AF(b0)∨AF(b1)
0   0   0+0-0*0=0  false   false   false          false
0   1   0+1-0*1=1  false   true    true           true
1   0   1+0-1*0=1  true    false   true           true
1   1   1+1-1*1=1  true    true    true           true

The last two columns agree in every row. And so we see that or both maintains the representation invariant and produces the correct result, provided the representation invariant was true of its inputs in the first place.
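Because there are only four inputs that satisfy the representation invariant, the table argument can also be checked mechanically. The following is a small sketch; repOK, af, or_, and checkOr are hypothetical names introduced here, and af maps into SML's built-in bool purely for the purpose of the check.

```sml
(* Hypothetical helpers, not part of the IntBool structure. *)
fun repOK (x : int) = (x = 0 orelse x = 1)   (* RI: value is 0 or 1 *)
fun af (x : int) : bool = (x = 1)            (* AF: 0 is false, 1 is true *)
fun or_ (x : int, y : int) = x + y - x * y   (* same body as IntBool's or *)

(* Every pair of inputs that satisfies the rep invariant. *)
val inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

(* One table row: the output satisfies RI, and AF commutes with or. *)
fun checkOr (b0, b1) =
  repOK (or_ (b0, b1)) andalso af (or_ (b0, b1)) = (af b0 orelse af b1)

val allOk = List.all checkOr inputs
```

Here allOk evaluates to true. An exhaustive check like this is only possible because the set of valid representations is finite; the list-based module later in the lecture needs induction instead.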

By the way, if you're wondering where x+y-x*y came from, you can get it using De Morgan's laws: or(x,y) = not(and(not x,not y)) = 1-((1-x)*(1-y)) = 1-(1-x-y+x*y) = x+y-x*y.

Module verification using induction

Now comes the tricky part; we are going to prove correct a module which contains recursive functions. Consider this partial implementation of a set of natural numbers as a list of integers:

structure Natset = struct
  open List
  type set = int list
  (* AF: [x1,..., xn] represents {x1, ... xn} *)
  (* RI: no duplicates or negative elements *)
  fun contains(s1: set, x: int) = ... (* implementation not shown *)
  fun union (s1: set, s2: set) = 
    let 
      fun helper (x,s) = if contains(s,x) then s else x::s
    in
      foldl helper s1 s2
    end
  ... (* other set functions not shown *)
end

As before, Natset.union is correct if, assuming RI(s1) and RI(s2), we can show RI(union(s1,s2)) and also AF(union(s1,s2)) = AF(s1) ∪ AF(s2).

We will assume that we have already proved an equivalent statement for the contains function. Also, from the definition of foldl, we can observe that foldl f a [] evaluates to a, and foldl f a (h::t) evaluates to foldl f (f(h,a)) t.
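The two evaluation rules for foldl can be written down directly as a function definition. This is only a sketch to make the rules concrete: myFoldl is a hypothetical name, and the real List.foldl in the Basis Library is assumed to behave the same way on these inputs.

```sml
(* myFoldl mirrors the two rules: the empty list returns the accumulator,
   and a cons cell folds the head into the accumulator before recursing. *)
fun myFoldl f a [] = a
  | myFoldl f a (h :: t) = myFoldl f (f (h, a)) t

(* Folding with cons pushes each head onto the accumulator in turn,
   so the result is the input list reversed. *)
val reversed = myFoldl (op ::) [] [1, 2, 3]

(* Compare against the library version on the same input. *)
val same = (reversed = List.foldl (op ::) [] [1, 2, 3])
```

Unfolding by hand: myFoldl (op ::) [] [1,2,3] becomes myFoldl (op ::) [1] [2,3], then myFoldl (op ::) [2,1] [3], then myFoldl (op ::) [3,2,1] [], which evaluates to [3,2,1].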

Proof of correctness

We use proof by induction on the length of s2. In order to use proof by induction, we need to state what we seek to prove in terms of some proposition P(n), so that our goal is to prove the proposition: for all n ≥ 0, P(n). Here, our P(n) is: for all lists s1 and s2, if RI(s1) and RI(s2) and s2 has length n, foldl helper s1 s2 evaluates to a list l such that RI(l) is true and AF(l) = AF(s1) ∪ AF(s2). Note that P(n) ranges over every s1, not one fixed list; the inductive step relies on this, because it applies the hypothesis to a different first argument.

Our base case is n=0, meaning s2 is []. In this case, foldl helper s1 [] evaluates to s1, so l=s1. RI(s1) so RI(l), and AF(s1) ∪ AF([]) = AF(s1) ∪ ∅ = AF(s1) = AF(l).

For our inductive step, we assume P(n) and seek to prove P(n+1). P(n+1) states if RI(s1) and RI(s2) and s2 has length n+1, foldl helper s1 s2 evaluates to a list l such that RI(l) is true and AF(l) = AF(s1) ∪ AF(s2).

Proof of P(n+1): Let s1 and s2=[v1, v2, ..., vn+1]=v1::s2' be lists such that RI(s1), RI(s2) and s2 has length n+1 (so s2' has length n). Let us evaluate foldl helper s1 s2.

foldl helper s1 s2 =
foldl helper (helper(v1,s1)) s2' =
foldl helper (if contains(s1,v1) then s1 else v1::s1) s2' =
foldl helper s1' s2'

where we define s1' as the result of evaluating if contains(s1,v1) then s1 else v1::s1.

Since we know RI(s1), we know contains works (i.e. contains(s,x) is true iff x is a member of AF(s)). Consider the evaluation of if contains(s1,v1) then s1 else v1::s1, which we have named s1'. If contains(s1,v1) is true, then s1'=s1, and RI(s1), so RI(s1'). Also, v1 is a member of AF(s1), so AF(s1')=AF(s1)=AF(s1) ∪ {v1}. If contains(s1,v1) is false, then s1'=v1::s1, and RI(s1') holds: v1 is not negative (it came from s2, which satisfies the rep invariant), s1 has no duplicates or negative elements (by RI(s1)), and v1 does not duplicate any member of s1 (because contains returned false). Also, AF(s1')=AF(v1::s1)=AF(s1) ∪ {v1} directly from the definition of AF. In both cases, RI(s1') and AF(s1')=AF(s1) ∪ {v1}.

If RI(s2), then RI is also true of any sublist of s2, in particular s2', because removing an element cannot introduce a duplicate or a negative element. Since RI(s1'), RI(s2') and s2' has length n, we can use our induction hypothesis P(n) to say that foldl helper s1' s2' evaluates to a list l such that RI(l) is true and AF(l) = AF(s1') ∪ AF(s2').

Now, putting it all together, foldl helper s1 s2 evaluates to foldl helper s1' s2', which evaluates to l such that RI(l) is true and AF(l) = AF(s1') ∪ AF(s2'). Moreover, AF(l) = AF(s1') ∪ AF(s2') = AF(s1) ∪ {v1} ∪ AF(s2') = AF(s1) ∪ {v1} ∪ AF([v2, ..., vn+1]) = AF(s1) ∪ {v1, v2, ..., vn+1} = AF(s1) ∪ AF(s2). Thus, we have shown P(n+1) to be true. By induction, P(n) holds for all n ≥ 0, and since union(s1,s2) is exactly foldl helper s1 s2, Natset.union is correct!
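A proof covers all inputs, but it can still be reassuring to run the code on an example and check both properties. The sketch below makes several assumptions: contains is given one possible implementation (the lecture does not show one), and repOK and sameSet are hypothetical helpers that test the rep invariant and compare the abstract sets of two lists.

```sml
structure NatsetSketch = struct
  type set = int list
  (* Assumed implementation of contains; the lecture elides the real one. *)
  fun contains (s : set, x : int) = List.exists (fn y => y = x) s
  (* repOK checks the RI: no negative elements and no duplicates. *)
  fun repOK ([] : set) = true
    | repOK (x :: rest) =
        x >= 0 andalso not (contains (rest, x)) andalso repOK rest
  (* union is copied from the lecture's Natset. *)
  fun union (s1 : set, s2 : set) =
    let
      fun helper (x, s) = if contains (s, x) then s else x :: s
    in
      foldl helper s1 s2
    end
  (* sameSet compares abstract values: each list's elements occur in the other. *)
  fun sameSet (a : set, b : set) =
    List.all (fn x => contains (b, x)) a andalso
    List.all (fn x => contains (a, x)) b
end

val s1 = [1, 2, 3]
val s2 = [3, 4, 0]
val u = NatsetSketch.union (s1, s2)
val riHolds = NatsetSketch.repOK u                        (* RI(union(s1,s2)) *)
val afHolds = NatsetSketch.sameSet (u, [0, 1, 2, 3, 4])   (* AF(s1) ∪ AF(s2) *)
```

Tracing the fold by hand: 3 is already in [1,2,3], so the accumulator is unchanged; 4 is added to give [4,1,2,3]; 0 is added to give [0,4,1,2,3]. Both riHolds and afHolds evaluate to true for this input.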

Problems with the BOOL signature

This material is optional, and is only included for the curious.

One problem with the signature as written is that some of the identifiers, such as if and and, are SML reserved words. If we were actually implementing this, we could use an alternate name like if_. The second problem is the meaning of 'if'. We want the evaluation rule for if c then e1 else e2 to first evaluate c, then only evaluate one of e1 or e2, depending on the value of c. If we implement val if_: bool * 'a * 'a -> 'a, the evaluation rules for SML specify that both e1 and e2 be evaluated before if_ is called. This would be a problem, for example, for a recursive function such as fun factorial n = if n = 0 then 1 else n*(factorial (n-1)). If this were implemented using our BOOL sig, the recursive call would be evaluated even when n = 0, resulting in an infinite loop. One solution would be to change the type of if_ to take unit -> 'a instead of 'a. Then the evaluation of the expressions could be delayed. For more on this, take 411 or 611.
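The thunk-based fix can be sketched as follows. This is an illustration under stated assumptions rather than a full BOOL implementation: ibool, tt, ff, eqZero, and if_ are hypothetical names, the representation is the int-based one from IntBool, and eqZero uses SML's built-in conditional only to produce an ibool from an integer test.

```sml
(* Hypothetical int-based booleans with a delayed (thunked) conditional. *)
type ibool = int
val tt : ibool = 1
val ff : ibool = 0

(* Assumed helper: converts an integer test into an ibool. *)
fun eqZero (n : int) : ibool = if n = 0 then tt else ff

(* Each branch is a function from unit, so it runs only when selected. *)
fun if_ (c : ibool, e1 : unit -> 'a, e2 : unit -> 'a) : 'a =
  if c > 0 then e1 () else e2 ()

(* The recursive call sits inside a thunk, so it is not evaluated when n = 0. *)
fun factorial n =
  if_ (eqZero n, fn () => 1, fn () => n * factorial (n - 1))

val f5 = factorial 5
```

With the original type bool * 'a * 'a -> 'a, the argument n * factorial (n - 1) would have to be evaluated before if_ could be applied, so the recursion would never reach a base case. Wrapping each branch in fn () => ... delays that evaluation until if_ decides which branch to run.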