Recitation 8: Using Representation Invariants

The rep invariant is a condition that is intended to hold for all values of the abstract type. Therefore, in implementing one of the operations of the abstract data type, it can be assumed that any arguments of the abstract type satisfy the rep invariant. This assumption restores local reasoning about correctness, because we can use the rep invariant and abstraction function to judge whether the implementation of a single ADT operation is correct in isolation from the rest of the module. It is correct if, assuming that:

  1. the function's requires and checks clauses hold and
  2. the concrete representations of all values of the abstract type satisfy the rep invariant

we can show that

  1. the returns clause of the function is satisfied (that is, the commutation diagram holds) and
  2. all new values of the abstract type that are created have concrete representations satisfying the rep invariant

The rep invariant makes it easier to write code that is provably correct, because it means that we don't have to write code that works for all possible incoming concrete representations--only those that satisfy the rep invariant. This is why NatSetNoDups.union doesn't have to work on lists that contain duplicate elements. On return there is a corresponding responsibility to produce only values that satisfy the rep invariant. As suggested in the figure above, the rep invariant holds for all reps both before and after the functions, which is why we call it an invariant at all.
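To see what this assumption buys, here is a minimal standalone sketch of a list-based union in the style of NatSetNoDups. The helper name noDups and the test values are illustrative assumptions, not part of the lecture code. The union is only required to work on duplicate-free inputs, and it does return a wrong representation when handed a list that violates the invariant.

```sml
(* Membership test on a plain int list. *)
fun contains (x: int, s: int list): bool =
  List.exists (fn y => y = x) s

(* The rep invariant for the no-duplicates representation (illustrative name). *)
fun noDups (s: int list): bool =
  case s of
    [] => true
  | h::t => not (contains (h, t)) andalso noDups t

(* Union that is only required to be correct when s1 and s2 satisfy noDups. *)
fun union (s1: int list, s2: int list): int list =
  foldl (fn (x, s) => if contains (x, s) then s else x::s) s1 s2

val good = union ([1, 2], [2, 3])   (* both inputs satisfy the invariant *)
val bad  = union ([1, 1], [2])      (* the first input violates it *)

val goodOK = noDups good   (* the result satisfies the invariant *)
val badOK  = noDups bad    (* the duplicate 1 survives, so the invariant fails *)
```

The second call is not a bug in union: the caller broke the assumption that every incoming representation satisfies the rep invariant.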

repOK

When implementing a complex abstract data type, it is often helpful to write a function internal to the module that checks that the rep invariant holds. This function can provide an additional level of assurance about your reasoning about the correctness of the code. By convention we will call this function repOK; given an abstract type (say, set) implemented as a concrete type (say, int list), it always has the same specification:

(* Returns whether x satisfies the representation invariant *)
fun repOK(x: int list): bool = ...

The repOK function can be used to help us implement a module and be sure that each function is independently correct. The trick is to bulletproof each function in the module against all other functions by having it apply repOK to any values of the abstract type that come from outside. In addition, if it creates any new values of the abstract type, it applies repOK to them to ensure that it isn't breaking the rep invariant itself. With this approach, a bug in one function is less likely to create the appearance of a bug in another.
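As a rough sketch of this bulletproofing, assuming the no-duplicates list representation, a boolean repOK and one operation that checks its argument on entry and its result on exit might look like the following. The operation name insert and the error messages are hypothetical and chosen only for illustration.

```sml
(* Returns whether x satisfies the representation invariant
 * "the list contains no duplicates". *)
fun repOK (x: int list): bool =
  case x of
    [] => true
  | h::t => not (List.exists (fn y => y = h) t) andalso repOK t

(* Hypothetical operation: add n to the set represented by s.
 * The incoming rep is checked first, then the newly created rep. *)
fun insert (n: int, s: int list): int list =
  if not (repOK s) then raise Fail "RI failed on entry"
  else
    let
      val result = if List.exists (fn y => y = n) s then s else n::s
    in
      if repOK result then result else raise Fail "RI failed on exit"
    end
```

With both checks in place, a failure on entry points at whichever function produced the bad value, while a failure on exit points at insert itself.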

repOK as an identity function

Another convenient way to write repOK is to make it an identity function, and raise an exception if the rep invariant doesn't hold. Making it an identity function lets us conveniently test the rep invariant in various ways, as shown below.

(* The identity function on reps that satisfy the representation invariant.
 * Raises Fail if x does not satisfy it. *)
fun repOK(x: int list): int list = ...

Here is an example of how we might use repOK for the NatSetNoDups implementation of sets given in lecture:

structure NatSetNoDups :> NATSET = struct
  type set = int list
  (* Abstraction function: the list [a1,...,an] represents the set
   * {a1,...,an}. Thus, the empty list represents the empty set.
   * Representation invariant: The list may not contain duplicates.
   *)

  val empty = []
  (* repOK and contains are mutually recursive. Both are defined
   * before single, which calls repOK. *)
  fun repOK(s: int list): set =
    case s of
      [] => s
    | h::t => (repOK(t);
               if not(contains(h,t)) then s
               else raise Fail "RI failed")
  and contains(x,s) =
    case repOK(s) of
       [] => false
     | h::t => x = h orelse contains(x,t)
  fun single(x) = repOK([x])
  fun union(s1, s2) =
     repOK (foldl (fn (x,s) => if contains(x,s) then s else x::s) (repOK(s1)) (repOK(s2)))
  fun size(s) = length(repOK(s))
end

Explicit up and down

If you really want to be careful about enforcing the rep invariant, there is a further trick that can be used. The abstract type is defined as a singleton datatype, e.g.

datatype set = rep of int list

This definition means that you cannot treat a list accidentally as a set; you have to write rep(lst) to convert a list to its abstract view. If the rep constructor is only used within a special function up that performs the mapping of the abstraction function, then you cannot construct a set without checking that it satisfies the rep invariant:

  fun up(lst: int list): set = rep(repOK(lst))
  fun down(s: set): int list = case s of rep(lst) => repOK(lst)

The down function is used to map the other way and obtain the concrete representation for an abstract value while checking that the representation satisfies the invariant:

  fun single(x: int) = up([x])
  fun size(s: set) = length(down(s))
  (* etc. *)
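Putting the pieces together, a self-contained sketch of the up/down discipline might look like this. The structure name NatSetChecked and the bodies of repOK, contains, and union are assumptions filled in for illustration. No signature is ascribed here, so the rep constructor is still visible from outside; in a real module an opaque signature would hide it.

```sml
structure NatSetChecked = struct
  datatype set = rep of int list

  (* Identity on lists without duplicates; raises Fail otherwise. *)
  fun repOK (lst: int list): int list =
    let
      fun noDups [] = true
        | noDups (h::t) =
            not (List.exists (fn y => y = h) t) andalso noDups t
    in
      if noDups lst then lst else raise Fail "RI failed"
    end

  (* The only place the rep constructor is applied. *)
  fun up (lst: int list): set = rep (repOK lst)
  (* The only place the rep constructor is taken apart. *)
  fun down (s: set): int list = case s of rep lst => repOK lst

  val empty = up []
  fun single (x: int) = up [x]
  fun contains (x: int, s: set) = List.exists (fn y => y = x) (down s)
  fun union (s1: set, s2: set) =
    up (foldl (fn (x, acc) =>
                 if List.exists (fn y => y = x) acc then acc else x::acc)
              (down s1) (down s2))
  fun size (s: set) = length (down s)
end
```

Because every operation goes through up and down, a list with duplicates cannot become a set without tripping the check; for example, NatSetChecked.up [1, 1] raises Fail.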

Rep invariants and code evolution

Let us consider the rep invariant for the vector implementation of NATSET. There is some question about what we should write. One possibility is to write the strongest possible specification of the possible values that can be created by the implementation. It happens that the vector representing the set never has trailing false values:

structure NatSetVec :> NATSET = struct
  type set = bool vector
  (* Abstraction function: the vector v represents the set
     of all natural numbers i such that sub(v,i) = true

     Representation invariant: v is empty, or the last element of v is true
   *)
  val empty:set = Vector.fromList []
  

This representation invariant describes an interesting property of the implementation that may be useful in judging its performance. However, we don't need this rep invariant in order to show that the implementation is correct. If there were no rep invariant, we could still argue that the implementation works properly. All of the operations of NatSetVec will work even if sets are somehow introduced that violate the no-trailing-false property. It is not necessary to have the rep invariant in order to argue that the operations of NatSetVec are correct according to the 4-point plan above.
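As an illustration, consider a hypothetical membership test and size function over bool vectors. These are a sketch under the stated abstraction function, not the lecture's NatSetVec code. Neither function relies on the last element being true, so both give the same answers for a vector with trailing false values.

```sml
type set = bool vector

(* Hypothetical membership test: indices past the end count as absent. *)
fun contains (n: int, v: set): bool =
  n >= 0 andalso n < Vector.length v andalso Vector.sub (v, n)

(* Hypothetical size: count the true entries. *)
fun size (v: set): int =
  Vector.foldl (fn (b, count) => if b then count + 1 else count) 0 v

(* Two representations of the set {1}. *)
val tight  : set = Vector.fromList [false, true]          (* last element is true *)
val padded : set = Vector.fromList [false, true, false]   (* trailing false *)

val sameMembership =
  contains (1, tight) = contains (1, padded) andalso
  contains (2, tight) = contains (2, padded)
val sameSize = size tight = size padded
```

Both representations map to the same abstract set under the abstraction function, which is why correctness does not depend on the stronger invariant.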

Further, a strong rep invariant is not always the best choice, because it restricts future changes to the module. We described interface specifications as a contract between the implementer of a module and the user. A rep invariant is a contract between the implementer and herself, or among the various implementers of the module, present and future. According to assumption 2, above, ADT operations may be implemented assuming that the rep invariant holds. If the rep invariant is ever weakened (made more permissive), some parts of the implementation may break. Thus, one of the most important purposes of the rep invariant is to document exactly what may and what may not be safely changed about a module implementation. A weak rep invariant forces the implementer to work harder to produce a correct implementation, because less can be assumed about concrete representation values, but conversely it gives maximum flexibility for future changes to the code.
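The trade-off can be sketched with the list representation of sets. The function names sizeStrong and sizeWeak are hypothetical. Under the strong no-duplicates invariant, size is just the list length; if the invariant is later weakened to allow duplicates, that implementation silently becomes wrong and must be replaced by one that assumes less.

```sml
(* Correct only under the strong invariant: the list has no duplicates. *)
fun sizeStrong (s: int list): int = length s

(* Correct under the weaker invariant that allows duplicates:
 * an element is counted only at its last occurrence. *)
fun sizeWeak (s: int list): int =
  case s of
    [] => 0
  | h::t => (if List.exists (fn y => y = h) t then 0 else 1) + sizeWeak t

(* A rep of the set {1, 2} that is legal only under the weaker invariant. *)
val withDups = [1, 1, 2]
val strongAnswer = sizeStrong withDups   (* counts the duplicate 1 twice *)
val weakAnswer   = sizeWeak withDups     (* counts distinct elements *)
```

The weaker invariant costs more work in sizeWeak, but operations that build lists no longer have to remove duplicates.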