CS 312 Lecture 9
Programming with rep invariants

We have identified two pieces of implementation-side specification: the abstraction function and the representation invariant. They need to be provided in every module implementation so that implementers can use local reasoning to figure out whether the code they are reading or writing is correct. Ordinary function specifications that appear in the module interface (signature) are a contract between the implementer and the user of the module. By contrast, the abstraction function and rep invariant are a contract between the implementer and other implementers or maintainers of the code.

Reasoning about nondeterminism

Suppose that we added a nondeterministic operation to our NATSET signature:

(* choose(s) is an element of s.
 * Checks: s non-empty *)
val choose: set -> int

Here is a possible implementation for the NatSet structure:

fun choose(s: set) =
  case s of
    [] => raise Fail "empty"
  | h::t => h

Is this a correct implementation? We said that an implementation is correct if it satisfies a commutation diagram. However, this specification is nondeterministic, which complicates our thinking about the diagram. Rather than mapping a set to a natural number, this specification maps a set to a set of natural numbers: all the possible natural numbers that might be returned according to the spec. The implementation of course returns the very first number in the list that represents the set.

Let's write choose_A to mean the abstract operation described by the specification, and choose_C to mean the actual operation implemented above. If choose_A is applied to the set {1,2,3}, the possible results are 1, 2, and 3. If choose_C is applied to a representation of {1,2,3}, the possible results are also 1, 2, and 3, since we don't know which representation of {1,2,3} we got. If the representation were [1,2,3], then choose_C would return 1. If the representation were [2,1,3,2,3], which also represents {1,2,3}, then choose_C would return 2. Regardless of the representation of {1,2,3}, choose_C always returns one of the values (1, 2, or 3) that choose_A permits. That is why we say that choose_C is a correct implementation of its specification, choose_A.

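Let AF be the abstraction function that maps concrete values to abstract values. Let f_A be the specification function, which maps an abstract value to the set of abstract outputs the specification permits, and let f_C be the actual implementation, which maps a concrete value to the outputs it can produce (a single output for a deterministic implementation such as choose above). Let x be a concrete input. Then the implementation is correct as long as the abstract views of its outputs, AF(f_C(x)), form a subset of f_A(AF(x)):

$$\mathit{AF}(f_C(x)) \subseteq f_A(\mathit{AF}(x))$$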

That is, the commutation diagram holds, with the proviso that the specification may permit more behaviors than the implementation actually exhibits, as illustrated in this figure:

[Figure: commutation diagram for a nondeterministic specification]

In the case of our example function choose applied to some concrete list h::t, the abstract view is the set AF(h::t) = {h} ∪ AF(t). The set of values permitted by the specification is this entire set; the set of values produced by the implementation is just {h}, which is clearly a subset.
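The correctness condition can be spot-checked mechanically on particular concrete inputs. The sketch below assumes the list representation of NatSet (duplicates allowed); member, af, and chooseCorrectOn are hypothetical helpers written for this illustration, and af stands in for AF by collapsing a list to its distinct elements. Because choose returns an int, AF is the identity on its result, so the subset test reduces to a membership test.

```sml
(* choose as implemented in the list-based NatSet structure. *)
fun choose (s: int list) : int =
  case s of
    [] => raise Fail "empty"
  | h :: _ => h

(* Hypothetical helper: is x an element of the list l? *)
fun member (x: int, l: int list) : bool = List.exists (fn y => y = x) l

(* Hypothetical stand-in for AF: keep one copy of each element, so any two
   lists representing the same set yield the same elements (in some order). *)
fun af (l: int list) : int list =
  List.foldl (fn (x, acc) => if member (x, acc) then acc else x :: acc) [] l

(* The spec permits any element of AF(x); the implementation is correct on
   input x if its single result lies in that set. *)
fun chooseCorrectOn (x: int list) : bool = member (choose x, af x)

(* Includes the two representations of {1,2,3} discussed in the text. *)
val allOK = List.all chooseCorrectOn [[1, 2, 3], [2, 1, 3, 2, 3], [3, 3], [7]]
```

Testing a handful of inputs is not a proof, of course; the argument in the text covers every non-empty list h::t.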

Reasoning locally

A rep invariant is a condition that is intended to hold for all values of an abstract type. The abstraction barrier ensures that the module is the only place where the rep invariant can be broken, because it is the only place where the concrete type of the values is known.

Therefore, in implementing one of the operations of the abstract data type, it can be assumed that any arguments of the abstract type satisfy the rep invariant. This assumption restores local reasoning about correctness, because we can use the rep invariant and abstraction function to judge whether the implementation of a single operation is correct in isolation from the rest of the module. It is correct if, assuming that:

  1. the function's requires and checks clauses hold and
  2. the concrete representations of all values of the abstract type satisfy the rep invariant

we can show that

  1. the returns clause of the function is satisfied (that is, the commutation diagram holds) and
  2. all new values of the abstract type that are created have concrete representations satisfying the rep invariant
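As an illustration of these four points, here is a sketch of a hypothetical add operation (it is not part of the NATSET operations shown in these notes) for the no-duplicates list representation. The comments mark where each assumption is used and why each obligation is met.

```sml
(* No-duplicates representation, assumed here for illustration:
   RI: the list has no negative elements and no duplicates. *)
type set = int list

fun contains (x: int, s: set) : bool = List.exists (fn y => y = x) s

(* Hypothetical add(x, s) represents AF(s) U {x}.
   Requires: x >= 0.   Assumes: s satisfies the RI.
   Obligations: result represents AF(s) U {x} and satisfies the RI. *)
fun add (x: int, s: set) : set =
  if contains (x, s)
  then s        (* x already in AF(s), so AF(s) U {x} = AF(s); RI unchanged *)
  else x :: s   (* x is new and x >= 0, so x :: s has no dups or negatives *)
```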

The rep invariant makes it easier to write code that is provably correct, because it means that we don't have to write code that works for all possible incoming concrete representations, only for those that satisfy the rep invariant. For example, this is why NatSetNoDups.union doesn't have to work on lists that contain duplicate elements. On return from each operation there is a corresponding responsibility to produce only values that satisfy the rep invariant, ensuring that the rep invariant is in fact an invariant.
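To see concretely what the rep invariant buys, consider size. A minimal sketch, assuming the NatSetNoDups representation: under the invariant, size can simply be the list length. A version that could not assume the invariant (the hypothetical sizeNoRI) would have to skip repeated elements, at quadratic cost.

```sml
(* Correct only under the RI (no duplicates): each element counted once. *)
fun size (s: int list) : int = length s

(* Works without the RI: count h only if it does not occur again later. *)
fun sizeNoRI (s: int list) : int =
  case s of
    [] => 0
  | h :: t => (if List.exists (fn y => y = h) t then 0 else 1) + sizeNoRI t

(* [1, 1, 2] violates the RI; it would represent {1, 2}, whose size is 2. *)
val wrong = size [1, 1, 2]       (* 3 *)
val right = sizeNoRI [1, 1, 2]   (* 2 *)
```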

Rep invariants and code evolution

Let us consider the rep invariant for the vector implementation of NATSET. There is some question about what we should write. One possibility is to write the strongest possible description of the values that the implementation can actually create. It happens that the vector representing the set never has trailing false values:

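structure NatSetVec :> NATSET = struct
  type set = bool vector
  (* Abstraction function: the vector v represents the set of all
     natural numbers i < Vector.length(v) such that Vector.sub(v,i) = true

     Representation invariant: v is empty, or the last element of v is true
     (that is, v has no trailing false values)
   *)
  val empty: set = Vector.fromList []
  (* ... remaining NATSET operations elided ... *)
end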

This representation invariant describes an interesting property of the implementation that may be useful in judging its performance. However, we don't need this rep invariant in order to show that the implementation is correct. If there were no rep invariant, we could still argue that the implementation works properly. All of the operations of NatSetVec will work even if sets are somehow introduced that violate the no-trailing-false property. It is not necessary to have the rep invariant in order to argue that the operations of NatSetVec are correct according to the 4-point plan above.
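For example, operations along the following lines behave correctly on vectors with or without trailing false values. These are hypothetical sketches of contains and size for a bool vector representation, not necessarily the lecture's own NatSetVec code.

```sml
type set = bool vector

(* i is in the set iff it is in bounds and v[i] is true; andalso
   short-circuits, so Vector.sub is never called out of bounds. *)
fun contains (i: int, v: set) : bool =
  i >= 0 andalso i < Vector.length v andalso Vector.sub (v, i)

(* Count the true entries; trailing falses contribute nothing. *)
fun size (v: set) : int =
  Vector.foldl (fn (b, n) => if b then n + 1 else n) 0 v

(* Both represent {0, 2}; only the first satisfies the strong invariant. *)
val strong : set = Vector.fromList [true, false, true]
val weak   : set = Vector.fromList [true, false, true, false, false]
```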

Further, a strong rep invariant is not always the best choice, because it restricts future changes to the module. We described interface specifications as a contract between the implementer of a module and the user. A rep invariant is a contract between the implementer and herself, or among the various implementers of the module, present and future. According to assumption 2 above, operations may be implemented assuming that the rep invariant holds. If the rep invariant is ever weakened (made more permissive), some parts of the implementation may break. It makes sense to avoid unnecessarily strengthening the invariant, because as the code evolves it might later be necessary to weaken it, and in that case the entire module might have to be re-examined for correctness.
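Here is a small, hypothetical illustration of how code comes to depend on a strong invariant. Suppose someone adds a function returning the largest element of a non-empty set. Under the invariant that the last element is true, the answer is just length(v) - 1; if the invariant is later weakened to allow trailing false values, that shortcut silently returns a wrong answer, while a version written against the weaker invariant keeps working. Both function names are made up for this sketch.

```sml
type set = bool vector

(* Relies on the strong RI: the last element of a non-empty v is true. *)
fun maxStrong (v: set) : int =
  if Vector.length v = 0 then raise Fail "empty" else Vector.length v - 1

(* Assumes only the weaker RI: scan down from the end for the last true. *)
fun maxWeak (v: set) : int =
  let
    fun scan (i: int) : int =
      if i < 0 then raise Fail "empty"
      else if Vector.sub (v, i) then i
      else scan (i - 1)
  in
    scan (Vector.length v - 1)
  end

val v1 : set = Vector.fromList [true, false, true]         (* strong RI holds *)
val v2 : set = Vector.fromList [true, false, true, false]  (* weak RI only *)
```

On v1 both functions return 2; on v2, maxStrong returns 3, which is not even in the set.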

One of the most important purposes of the rep invariant is to document exactly what may and what may not be safely changed about a module implementation. A weak rep invariant forces the implementer to work harder to produce a correct, efficient implementation, because less can be assumed about concrete representation values, but conversely it gives maximum flexibility for future changes to the code.

repOK

When implementing a complex abstract data type, it is often helpful to write a function internal to the module that checks that the rep invariant holds. This function can provide an additional level of assurance in your reasoning about the correctness of the code. By convention we will call this function repOK; given an abstract type (say, set) implemented as a concrete type (say, int list), it always has the same specification:

(* Returns whether x satisfies the representation invariant *)
fun repOK(x: int list): bool = ...
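For instance, a boolean repOK for the no-duplicates list representation (no negative elements, no duplicates) might look like the sketch below. It uses List.exists from the Basis Library in place of a hand-written membership helper.

```sml
(* Returns whether s has no negative elements and no duplicates. *)
fun repOK (s: int list) : bool =
  case s of
    [] => true
  | h :: t =>
      h >= 0
      andalso not (List.exists (fn y => y = h) t)  (* h not repeated later *)
      andalso repOK t
```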

repOK can be used to help us implement a module and be sure that each function is independently correct. The trick is to bulletproof each function in the module against all other functions by having it apply repOK to any values of the abstract type that come from outside. In addition, if it creates any new values of the abstract type, it applies repOK to them to ensure that it isn't breaking the rep invariant itself. With this approach, a bug in one function is less likely to create the appearance of a bug in another.

RepOK as an identity function

A more convenient way to write repOK is to make it an identity function that raises an exception if the rep invariant doesn't hold. Making it an identity function lets us conveniently test the rep invariant in various ways, as shown below.

(* The identity function. Checks whether x satisfies the representation invariant. *)
fun repOK(x: int list): int list = ...
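If a boolean check already exists, the identity version can be derived from it mechanically. The sketch below uses a hypothetical combinator named checked; for brevity it checks only the "no negative elements" part of the NatSetNoDups invariant.

```sml
(* Hypothetical combinator: turn a boolean check into an identity
   function that raises Fail "RI failed" when the check fails. *)
fun checked (ok: 'a -> bool) (x: 'a) : 'a =
  if ok x then x else raise Fail "RI failed"

(* Identity check for the "no negative elements" part of the RI. *)
val nonNeg : int list -> int list = checked (List.all (fn x => x >= 0))

(* Because the check returns its argument, it fits inside any expression. *)
fun size (s: int list) : int = length (nonNeg s)
```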

Here is an example of how we might use repOK for the NatSetNoDups implementation of sets given in lecture:

structure NatSetNoDups :> NATSET = struct
  type set = int list
  (* AF: the list [a1,...,an] represents the set {a1,...,an}.
   * RI: list contains no negative elements or duplicates.
   *)
  fun repOK(s: int list): int list =
    case s of
      [] => s
    | h::t => if h >= 0 andalso not(contains_internal(h,repOK(t)))
                then s
                else raise Fail "RI failed"
  and contains_internal(x:int,s:int list) =
    case s of
       [] => false
     | h::t => x = h orelse contains_internal(x,t)
  val empty = []
  fun single(x) = repOK([x])
  fun contains(x,s) = contains_internal(x, repOK(s))
  fun union(s1, s2) =
     repOK (foldl (fn (x,s) => 
                   if contains(x,s) then s else x::s)
            (repOK(s1)) (repOK(s2)))
  fun size(s) = length(repOK(s))
end

Here, repOK is implemented using contains_internal rather than the function contains, because contains itself calls repOK, so using it would result in a lot of extra repOK checks. This is a common pattern when implementing a repOK check.

Production vs. development code

Calling repOK on every argument can be too expensive for the production version of a program. The repOK above is quite expensive: it takes time quadratic in the length of the list (though it could be implemented more cheaply). For production code it may be more appropriate to use a version of repOK that checks only the parts of the rep invariant that are cheap to check. When there must be no run-time cost at all, repOK can be changed to an identity function (or macro) so that the compiler can optimize away the calls to it. It is a good idea to keep the full code of repOK around (perhaps in a comment) so that it can easily be reinstated during future debugging.
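One way the full check could be made cheaper, sketched here under the assumption that O(n log n) time is acceptable: sort a copy of the list, after which any duplicates are adjacent and the smallest element comes first. The merge sort is written out by hand (split, merge, and msort are names chosen for this sketch) because the SML Basis Library does not include a list sort.

```sml
(* Split a list into two halves of (nearly) equal length. *)
fun split (l: int list) : int list * int list =
  case l of
    a :: b :: rest => let val (xs, ys) = split rest in (a :: xs, b :: ys) end
  | _ => (l, [])

(* Merge two ascending lists into one ascending list. *)
fun merge (xs: int list, ys: int list) : int list =
  case (xs, ys) of
    ([], _) => ys
  | (_, []) => xs
  | (x :: xt, y :: yt) =>
      if x <= y then x :: merge (xt, ys) else y :: merge (xs, yt)

fun msort (l: int list) : int list =
  case l of
    [] => []
  | [_] => l
  | _ => let val (xs, ys) = split l in merge (msort xs, msort ys) end

(* Identity-style repOK: returns s unchanged, or raises Fail "RI failed". *)
fun repOK (s: int list) : int list =
  let
    (* After sorting, "no duplicates" means strictly increasing. *)
    fun strictlyIncreasing (l: int list) : bool =
      case l of
        a :: (rest as b :: _) => a < b andalso strictlyIncreasing rest
      | _ => true
    val sorted = msort s
  in
    (* The head of the sorted list is the minimum, so checking it
       suffices for "no negative elements". *)
    if (case sorted of [] => true | h :: _ => h >= 0)
       andalso strictlyIncreasing sorted
    then s
    else raise Fail "RI failed"
  end
```

For a zero-cost build, the body of repOK could be replaced by just s, with this version kept in a comment.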

Explicit up and down

For those who really want to be careful about enforcing and checking the rep invariant, there is a way to use the SML type system to impose more discipline. The trick is to define the abstract type as a singleton datatype, e.g.

datatype set = Rep of int list

This definition means that you cannot accidentally treat a list as a set: you have to apply the constructor Rep to turn a list into a set, and write case s of Rep(lst) => ... to get the list back out. If the Rep constructor is used only within a special function up that performs the mapping of the abstraction function (from concrete list to abstract set), then you cannot construct a set without checking that it satisfies the rep invariant:

  fun up(lst: int list): set = Rep(repOK(lst))
  fun down(s: set): int list = case s of Rep(lst) => repOK(lst)

The down function is used to map the other way and obtain the concrete representation for an abstract value while checking that the representation satisfies the invariant.
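The following self-contained sketch exercises up and down; its repOK is a compact stand-in for the NatSetNoDups check rather than the lecture's exact code. It also suggests why down re-checks the invariant: inside the module nothing prevents code from applying Rep directly and bypassing up, and down catches the resulting bad value.

```sml
(* Compact identity-style RI check: no negatives, no duplicates. *)
fun repOK (s: int list) : int list =
  let
    fun ok [] = true
      | ok (h :: t) =
          h >= 0 andalso not (List.exists (fn y => y = h) t) andalso ok t
  in
    if ok s then s else raise Fail "RI failed"
  end

datatype set = Rep of int list

fun up (lst: int list) : set = Rep (repOK lst)
fun down (s: set) : int list = case s of Rep lst => repOK lst

val s = up [3, 1, 2]
val back = down s                 (* [3, 1, 2] *)

(* Applying Rep directly skips the check in up; down still catches it. *)
val sneaky = Rep [1, 1]
val caught = (ignore (down sneaky); false) handle Fail _ => true
```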

The previous implementation can now be rewritten to use up and down in the various places that repOK was used formerly:

structure NatSetNoDups2 :> NATSET = struct
  datatype set = Rep of int list
  (* AF: Rep [a1,...,an] represents the set {a1,...,an}.
   * RI: list contains no negative elements or duplicates.
   *)

  fun repOK(s: int list): int list =
    case s of
      [] => s
    | h::t => if h >= 0 andalso not(contains_internal(h,repOK(t)))
                then s
                else raise Fail "RI failed"
  and contains_internal(x:int, s:int list) =
    case s of
       [] => false
     | h::t => x = h orelse contains_internal(x,t)
  fun up(l: int list): set = Rep(repOK(l))
  fun down(s: set): int list = case s of Rep(l) => repOK(l)

  val empty = up []
  fun single(x) = up [x]
  fun contains(x,s) = contains_internal(down(s))
  fun union(s1, s2) =
     up (foldl (fn (x:int, s:int list) => 
                   if contains_internal(x,s) then s else x::s)
            (down(s1)) (down(s2)))
  fun size(s) = length(down(s))
end