CS 312 Lecture 10
Rep invariants and program correctness

We have identified two pieces of implementation-side specification: the abstraction function and the representation invariant. They need to be provided in every module implementation so that implementers can use local reasoning to figure out whether the code they are reading or writing is correct. Ordinary function specifications that appear in the module interface (signature) are a contract between the implementer and the user of the module. By contrast, the abstraction function and rep invariant are a contract between the implementer and other implementers or maintainers of the code.

Reasoning locally using representation invariants

A rep invariant is a condition that is intended to hold for all values of an abstract type. The abstraction barrier ensures that the module is the only place that the rep invariant can be broken; it is the only place that the concrete type of the values is known.

Therefore, in implementing one of the operations of the abstract data type, it can be assumed that any arguments of the abstract type satisfy the rep invariant. This assumption restores local reasoning about correctness, because we can use the rep invariant and abstraction function to judge whether the implementation of a single operation is correct in isolation from the rest of the module. It is correct if, assuming that:

  1. the function's requires and checks clauses hold and
  2. the concrete representations of all values of the abstract type satisfy the rep invariant

we can show that

  1. the returns clause of the function is satisfied (that is, the commutative diagram holds) and
  2. all new values of the abstract type that are created have concrete representations satisfying the rep invariant

The rep invariant makes it easier to write code that is provably correct, because it means that we don't have to write code that works for all possible incoming concrete representations--only those that satisfy the rep invariant. This is why NatSetNoDups.union doesn't have to work on lists that contain duplicate elements, for example. On return from each operation there is a corresponding responsibility to produce only values that satisfy the rep invariant, ensuring that the rep invariant is in fact an invariant.

Explicit rep invariant checks: repOK

When implementing a complex abstract data type, it is often helpful to write a function internal to the module that checks that the rep invariant holds. This function can provide an additional level of assurance about your reasoning about the correctness of the code. By convention we will call this function repOK; given an abstract type (say, set) implemented as a concrete type (say, int list), it always has the same specification:

(* Returns whether x satisfies the representation invariant *)
fun repOK(x: int list): bool = ...

repOK can be used to help us implement a module and be sure that each function is independently correct. The trick is to bulletproof each function in the module against all other functions by having it apply repOK to any values of the abstract type that come from outside. In addition, if it creates any new values of the abstract type, it applies repOK to them to ensure that it isn't breaking the rep invariant itself. With this approach, a bug in one function is less likely to create the appearance of a bug in another.

RepOK as an identity function

A more convenient way to write repOK is to make it an identity function that raises an exception if the rep invariant doesn't hold. Making it an identity function lets us conveniently test the rep invariant in various ways, as shown below.

(* The identity function. 
   Checks: whether x satisfies the rep. invariant. *)
fun repOK(x: int list): int list = ...

Here is an example of how we might use repOK for the NatSetNoDups implementation of sets given in lecture:

structure NatSetNoDups :> NATSET = struct
  type set = int list
  (* AF: the list [a1,...,an] represents the set {a1,...,an}.
   * RI: list contains no negative elements or duplicates. *)

  fun repOK(s: int list): int list =
    case s of
      [] => s
    | h::t => if h < 0 orelse contains'(h, repOK t)
                then raise Fail "RI failed" else s
  and contains'(x: int,s: int list) =
   List.exists (fn y => y=x) s

  val empty: set = repOK []
  fun single(x: int): set = repOK [x]
  fun contains(x: int, s: set): bool = contains'(x, repOK s)
  fun union(s1: set, s2: set): set =
    repOK (foldl (fn (x,s) => if contains'(x,s) then s else x::s)
           (repOK s1) (repOK s2))
  fun size(s: set): int = length(repOK s)
end

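Here, repOK is implemented using contains' rather than the function contains. Since contains applies repOK to its argument, calling it from repOK would re-check the tail of the list a second time at every level of the recursion, making repOK take exponential time. Using an unchecked internal helper like contains' is a common pattern when implementing a repOK check.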

Production vs. development code

Calling repOK on every argument can be too expensive for the production version of a program. The repOK above is quite expensive: it calls the linear-time contains' once per element, so it takes time quadratic in the length of the list (though it could be implemented more cheaply). For production code it may be more appropriate to use a version of repOK that checks only the parts of the rep invariant that are cheap to check. When there must be no run-time cost at all, repOK can be changed to an identity function (or macro) so that the compiler optimizes away the calls to it. It is a good idea to keep the full code of repOK around (perhaps in a comment) so it can easily be reinstated during future debugging.
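One way to organize these options is sketched below for the NatSetNoDups rep invariant. The switch checkFullRI and the helpers fullRepOK and cheapRepOK are hypothetical names, not part of the notes: the full check tests both halves of the RI, while the cheap check keeps only the linear-time "no negative elements" half.

```sml
(* Hypothetical build switch (not from the notes): true while developing,
   false for a production build. *)
val checkFullRI = true

(* Full check of the NatSetNoDups RI: no negative elements, no duplicates.
   Quadratic in the length of the list. *)
fun fullRepOK (s: int list) : int list =
  let
    fun ok [] = true
      | ok (h :: t) =
          h >= 0 andalso not (List.exists (fn y => y = h) t) andalso ok t
  in
    if ok s then s else raise Fail "RI failed"
  end

(* Cheap partial check: only the linear-time "no negative elements" part. *)
fun cheapRepOK (s: int list) : int list =
  if List.all (fn x => x >= 0) s then s
  else raise Fail "RI failed: negative element"

(* The repOK that the rest of the module calls. Replacing its body with
   just s would turn it into the identity function. *)
fun repOK (s: int list) : int list =
  if checkFullRI then fullRepOK s else cheapRepOK s
```

Because every operation calls repOK rather than fullRepOK directly, switching between the levels of checking means editing one declaration, and the full check stays in the source for later debugging.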

Using types to enforce repOK checks

For those who really want to be careful about enforcing and checking the rep invariant, there is a way to use the SML type system to impose more discipline. The trick is to define the abstract type as a singleton datatype, e.g.

datatype set = Rep of int list

This definition means that you cannot accidentally treat a list as a set: you have to apply the constructor Rep to turn a list into a set, and write case ... of Rep(lst) to get the list back out of a set. If the Rep constructor is only used within a special function up that performs the mapping of the abstraction function, then you cannot construct a set without checking that it satisfies the rep invariant:

fun up(l: int list): set = Rep(repOK l)
fun down(s: set): int list = case s of Rep(l) => repOK l

The down function is used to map the other way and obtain the concrete representation for an abstract value while checking that the representation satisfies the invariant.
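To see the check fire, here is a hypothetical cut-down module (the name UpDownDemo is ours) containing only the datatype, a repOK equivalent to the one in NatSetNoDups, and up and down. Round-tripping a valid list succeeds, while trying to wrap a list with a duplicate raises Fail.

```sml
(* Hypothetical stand-alone module (not from the notes) keeping only the
   pieces needed to show up and down at work. *)
structure UpDownDemo = struct
  datatype set = Rep of int list

  (* Same RI as NatSetNoDups: no negative elements, no duplicates. *)
  fun repOK (s: int list) : int list =
    case s of
      [] => s
    | h :: t =>
        if h < 0 orelse List.exists (fn y => y = h) (repOK t)
        then raise Fail "RI failed" else s

  (* Only up applies Rep, so every set built here has passed repOK. *)
  fun up (l: int list) : set = Rep (repOK l)
  (* down unwraps a set and re-checks the representation. *)
  fun down (s: set) : int list = case s of Rep l => repOK l
end
```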

The previous implementation can now be rewritten to use up and down in the various places that repOK was used formerly:

structure NatSetNoDups2 :> NATSET = struct
  datatype set = Rep of int list
  (* AF: Rep [a1,...,an] represents the set {a1,...,an}.
   * RI: list contains no negative elements or duplicates. *)

  fun repOK(s: int list): int list =
    case s of
      [] => s
    | h::t => if h < 0 orelse contains'(h, repOK t)
                then raise Fail "RI failed" else s
  and contains'(x:int,s:int list) =
    List.exists (fn y => y=x) s

  fun up(l: int list): set = Rep(repOK l)
  fun down(s: set): int list = case s of Rep(l) => repOK l

  val empty: set = up []
  fun single(x: int): set = up [x]
  fun contains(x: int, s:set): bool = contains'(x, down s)
  fun union(s1: set, s2: set): set =
    up (foldl (fn (x,s) => if contains'(x,s) then s else x::s)
        (down s1) (down s2))
  fun size(s: set): int = length(down s)
end

The type system will force us to unwrap each input set to obtain the underlying integer list, and then wrap the resulting list back into a set. The up and down functions perform these conversions; they automatically check the rep invariant while performing the conversions.

Proving Program Correctness

What does it mean to prove that we have a correct implementation? Can we write a proof that the program is correct? Testing the program on a set of possible inputs (and checking the corresponding outputs) is an easy way to become more convinced that the program works correctly. But can we be sure that it will work correctly for all inputs, and that we haven't forgotten some corner case? Next time you board a plane, think about the correctness of the critical pieces of software that control parts of your plane.

Let's consider the implementation of sets using lists with no duplicates. What does it mean for such a set implementation to be correct? Informally, the implementation of an operation (e.g., union or size) that works over a concrete structure (a list in this case) is correct if it performs the corresponding operation in the abstract domain (e.g., set union or set size). We know that we can express this notion using the abstraction function (AF) via the commutative diagram. In addition, each operation should expect well-formed concrete structures (i.e., structures that satisfy the RI) and produce well-formed concrete structures (that satisfy the RI).

Let's take the code that implements set union:

fun union(s1: int list, s2: int list): int list =
    foldl (fn (x,s) => if contains(x,s) then s else x::s) s1 s2

Proving that this implementation is correct requires proving that:

  1. if s1 and s2 satisfy the RI,
  2. then union(s1,s2) satisfies the RI, and
  3. AF(union(s1,s2)) = AF(s1) U AF(s2).

Requirements 1 and 2 concern maintaining the RI, whereas requirement 3 states that the implementation models the set union operation in the abstract domain. Note that the code needs to work correctly only for inputs that are well-formed, i.e., that satisfy the RI. The key technique behind proving such properties is induction.
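Before turning to proof, it can help to see the three requirements as executable checks. The sketch below assumes contains is plain list membership (inside the module, contains' plays this role) and introduces hypothetical helpers noDups, sameSet, and unionCorrectOn. Since AF(s1 @ s2) = AF(s1) U AF(s2), the list s1 @ s2 serves as the expected abstract result. These checks decide the conditions only for the particular inputs they are given, which is testing rather than proof.

```sml
(* List membership, standing in for the contains used by union above. *)
fun contains (x: int, s: int list) : bool = List.exists (fn y => y = x) s

(* The union code from above, unchanged. *)
fun union (s1: int list, s2: int list) : int list =
  foldl (fn (x, s) => if contains (x, s) then s else x :: s) s1 s2

(* RI: the list has no duplicates. *)
fun noDups ([] : int list) = true
  | noDups (h :: t) = not (contains (h, t)) andalso noDups t

(* AF(a) = AF(b): every element of each list occurs in the other. *)
fun sameSet (a: int list, b: int list) : bool =
  List.all (fn x => contains (x, b)) a andalso
  List.all (fn x => contains (x, a)) b

(* Requirements 1-3 for one pair of inputs. If requirement 1 fails,
   union owes us nothing, so the check passes vacuously. *)
fun unionCorrectOn (s1: int list, s2: int list) : bool =
  not (noDups s1 andalso noDups s2)
  orelse (noDups (union (s1, s2)) andalso sameSet (union (s1, s2), s1 @ s2))
```

For example, union([1,2,3], [3,4]) evaluates to [4,1,2,3], and unionCorrectOn([1,2,3], [3,4]) is true. However many such cases pass, they cover only the inputs tried; the induction proof is what covers all well-formed inputs.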

Induction Overview

You learned about mathematical induction over natural numbers in CS 211. To prove that a property P(n) holds for all natural numbers n >= 0, you first prove the base case: the property holds for the smallest natural number, n = 0. Then you prove the inductive step: if P(n) holds for some number n, then P(n+1) also holds. Together, these two parts show that P(n) holds for all n >= 0. A simple analogy is an infinite row of dominoes: tipping over the first domino corresponds to the base case, and each falling domino knocks over the next one (the inductive step). As a result, all of the dominoes fall.

Induction can be performed not only on natural numbers but also on more complicated sets, such as pairs of non-negative integers or binary trees. For instance, proving that the function union above is correct requires induction on lists.
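For lists, the domino argument takes the following form, written here as the standard structural induction principle (h ranges over list elements and t over lists). Inducting on the length of a list, as is done for union below, is an equivalent way to organize the same argument:

```latex
\[
\Big( P([\,]) \;\wedge\; \forall h,\, t.\; \big( P(t) \Rightarrow P(h :: t) \big) \Big)
\;\Longrightarrow\; \forall l.\; P(l)
\]
```

The empty list plays the role of n = 0, and going from t to h::t plays the role of going from n to n+1.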

It is useful to keep in mind the key pieces of a proof by induction and to check that all of them have been covered. Use the following list as a recipe for inductive proofs:

  1. Write down the property P(n) that you are trying to prove.
  2. State the set that you are inducting on (i.e., the set that n ranges over).
  3. Base case: prove that the property holds for the least element(s) of the set.
  4. Inductive step: state the induction hypothesis P(n). Then state and prove P(n+1).
  5. Clearly mention each application of the induction hypothesis (IH), and:
       a) show that all conditions for applying the IH are met;
       b) indicate what you are applying the IH to.

Proving Correctness of Union

We can now formally prove by induction that the above code correctly implements the set union operation. For this code, the abstraction function AF maps each list to the set it represents: AF([]) = {} and AF(h::t) = {h} U AF(t). Furthermore, the rep invariant (RI) for a list l states that there are no duplicates in l.
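Because AF is defined by recursion on the list, it can be transcribed almost directly into SML if we assume (our choice, not the notes') that a set of integers is represented by its membership predicate of type int -> bool. The function name af is hypothetical.

```sml
(* Hypothetical encoding: a set is its membership predicate, so af turns a
   list into a function saying which integers belong to AF(list). *)
fun af ([] : int list) : int -> bool = (fn _ => false)   (* AF([]) = {} *)
  | af (h :: t) = (fn x => x = h orelse af t x)          (* {h} U AF(t) *)
```

With this encoding, af [3,1] accepts 1 and 3 and rejects 2. Equality of two such sets can only be checked pointwise, one element at a time, which is one reason the correctness argument is carried out by proof rather than by computation.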

We can prove that union(s1,s2) is correct by induction on the second list s2. We will formulate this as an induction on the length of s2.
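As a sketch of how this induction might be set up (our phrasing, which the proof itself may state differently), the property can be written with s1 universally quantified, and union can be unfolded using the defining equations of foldl, namely foldl f b [] = b and foldl f b (h::t) = foldl f (f(h,b)) t:

```latex
\[
P(n):\quad \forall s_1, s_2.\;
\big( |s_2| = n \,\wedge\, RI(s_1) \,\wedge\, RI(s_2) \big) \Rightarrow
\big( RI(\mathit{union}(s_1,s_2)) \,\wedge\,
      AF(\mathit{union}(s_1,s_2)) = AF(s_1) \cup AF(s_2) \big)
\]
\[
\mathit{union}(s_1, [\,]) = s_1
\qquad
\mathit{union}(s_1, h :: t) = \mathit{union}(s_1', t),
\quad
s_1' = \begin{cases}
  s_1      & \text{if } \mathit{contains}(h, s_1) \\
  h :: s_1 & \text{otherwise}
\end{cases}
\]
```

Quantifying over every s1 matters: in the inductive step the first argument changes from s1 to s1', so the induction hypothesis must apply to lists other than the original s1.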