The rep invariant is a condition that is intended to hold for all values of the abstract type. Therefore, in implementing one of the operations of the abstract data type, it can be assumed that any arguments of the abstract type satisfy the rep invariant. This assumption restores local reasoning about correctness, because we can use the rep invariant and abstraction function to judge whether the implementation of a single ADT operation is correct in isolation from the rest of the module. It is correct if, assuming that:
we can show that
The rep invariant makes it easier to write code that is provably correct,
because it means that we don't have to write code that works for all possible
incoming concrete representations, only those that satisfy the rep invariant.
This is why NatSetNoDups.union doesn't have to work on lists that contain
duplicate elements. On return there is a corresponding responsibility to
produce only values that satisfy the rep invariant. As suggested in the figure
above, the rep invariant holds for all reps both before and after the functions,
which is why we call it an invariant at all.
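As a small illustration of what the invariant buys, compare two ways of computing the size of a set represented as an int list. The names sizeNoDups and sizeWithDups are hypothetical and chosen only for this sketch; they are not part of the module from lecture.

```sml
(* Sketch only: both function names are assumptions made for illustration. *)

(* Assumes the rep invariant (no duplicates), so the size of the set is
 * simply the length of the list. *)
fun sizeNoDups (s: int list): int = length s

(* Makes no assumption about duplicates, so each element must be checked
 * against the rest of the list before it is counted. *)
fun sizeWithDups (s: int list): int =
  case s of
    [] => 0
  | h :: t =>
      if List.exists (fn y => y = h) t
      then sizeWithDups t
      else 1 + sizeWithDups t
```

The first version is shorter and faster, but it is only correct for lists that satisfy the invariant: on a list with a repeated element it counts that element more than once.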
When implementing a complex abstract data type, it is often helpful to write
a function internal to the module that checks that the rep invariant holds. This
function can provide an additional level of assurance about your reasoning about
the correctness of the code. By convention we will call this function repOK;
given an abstract type (say, set) implemented as a concrete type (say, int list)
it always has the same specification:
(* Returns whether x satisfies the representation invariant *)
fun repOK(x: int list): bool = ...
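For the no-duplicates representation, the body of this function might be filled in as follows. This is a minimal sketch, assuming the rep invariant is exactly "the list contains no duplicates"; the helper name member is hypothetical.

```sml
(* Sketch only: `member` is a helper introduced for this example. *)

(* Returns whether x occurs in lst. *)
fun member (x: int, lst: int list): bool =
  List.exists (fn y => y = x) lst

(* Returns whether x satisfies the representation invariant,
 * that is, whether x contains no duplicate elements. *)
fun repOK (x: int list): bool =
  case x of
    [] => true
  | h :: t => not (member (h, t)) andalso repOK t
```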
The repOK function can be used to help us implement a module and be sure
that each function is independently correct. The trick is to bulletproof
each function in the module against all other functions by having it apply repOK
to any values of the abstract type that come from outside. In addition, if it
creates any new values of the abstract type, it applies repOK to them to ensure
that it isn't breaking the rep invariant itself. With this approach, a bug in
one function is less likely to create the appearance of a bug in another.
Another convenient way to write repOK is to make it an identity function that raises an exception if the rep invariant doesn't hold. Because it returns its argument unchanged, a call to repOK can be wrapped around any expression of the representation type, which lets us conveniently test the rep invariant in various ways, as shown below.
(* The identity function. Checks whether x satisfies the representation invariant. *)
fun repOK(x: int list): int list = ...
Here is an example of how we might use repOK for the NatSetNoDups
implementation of sets given in lecture:
structure NatSetNoDups :> NATSET = struct
  type set = int list
  (* Abstraction function: the list [a1,...,an] represents the set
   * {a1,...,an}. Thus, the empty list represents the empty set.
   * Representation invariant: The list may not contain duplicates. *)
  val empty = []

  (* repOK and contains are mutually recursive, and must be defined
   * before the functions that use them. *)
  fun repOK(s: int list): set =
    case s of
      [] => s
    | h::t => (repOK(t);
               if not(contains(h,t)) then s
               else raise Fail "RI failed")
  and contains(x,s) =
    case repOK(s) of
      [] => false
    | h::t => x = h orelse contains(x,t)

  fun single(x) = repOK([x])
  fun union(s1, s2) =
    repOK (foldl (fn (x,s) => if contains(x,s) then s else x::s)
                 (repOK(s1)) (repOK(s2)))
  fun size(s) = length(repOK(s))
end
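To see how this style localizes bugs, consider a deliberately broken union. The sketch below is standalone and simplified: it works directly on int list rather than inside the structure, its contains does not itself call repOK, and badUnion is a hypothetical function written only to trigger the check.

```sml
(* Sketch only: a standalone, simplified copy of the checks above. *)

fun contains (x: int, s: int list): bool =
  List.exists (fn y => y = x) s

(* The identity function; raises Fail if s contains duplicates. *)
fun repOK (s: int list): int list =
  case s of
    [] => s
  | h :: t =>
      (repOK t;
       if contains (h, t) then raise Fail "RI failed" else s)

(* Hypothetical buggy union: appends the lists without removing
 * duplicates, so the result can violate the rep invariant. *)
fun badUnion (s1: int list, s2: int list): int list =
  repOK (repOK s1 @ repOK s2)

(* The violation is reported inside badUnion, where the bug is,
 * rather than later in some unrelated function. *)
val caught = (badUnion ([1, 2], [2, 3]); false) handle Fail _ => true
```

Here both arguments pass their checks, so the failure can only come from the value that badUnion itself constructed.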
If you really want to be careful about enforcing the rep invariant, there is a further trick that can be used. The abstract type is defined as a singleton datatype, e.g.
datatype set = rep of int list
This definition means that you cannot treat a list accidentally as a set; you
have to write rep(lst) to convert a list to its abstract view.
If the rep constructor is only used within a special function up that performs
the mapping of the abstraction function, then you cannot construct
a set without checking that it satisfies the rep invariant:
fun up(lst: int list): set = rep(repOK(lst))
fun down(s: set): int list = case s of rep(lst) => repOK(lst)
The down function is used to map the other way and obtain the
concrete representation for an abstract value while checking that the
representation satisfies the invariant:
fun single(x: int) = up([x])
fun size(s: set) = length(down(s))
(* etc. *)
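Putting these pieces together, a complete module in this style might look like the following. This is a sketch under assumptions: the structure name NatSetRep is hypothetical, the structure is left unsealed so it can be run on its own, and the bodies of contains and union are one possible way to fill in the "etc." above.

```sml
(* Sketch only: NatSetRep is a hypothetical name, not the lecture's module. *)
structure NatSetRep = struct
  datatype set = rep of int list

  (* The identity function; raises Fail if lst contains duplicates. *)
  fun repOK (lst: int list): int list =
    case lst of
      [] => lst
    | h :: t =>
        (repOK t;
         if List.exists (fn y => y = h) t
         then raise Fail "RI failed"
         else lst)

  (* up and down are the only places that touch the rep constructor,
   * so every conversion between list and set is checked. *)
  fun up (lst: int list): set = rep (repOK lst)
  fun down (s: set): int list = case s of rep lst => repOK lst

  val empty = up []
  fun single (x: int): set = up [x]
  fun contains (x: int, s: set): bool =
    List.exists (fn y => y = x) (down s)
  fun union (s1: set, s2: set): set =
    up (foldl (fn (x, acc) =>
                 if List.exists (fn y => y = x) acc then acc else x :: acc)
              (down s1) (down s2))
  fun size (s: set): int = length (down s)
end
```

Because a set and an int list are now different types, forgetting to call up or down is a type error rather than a silent skipped check.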
Let us consider the rep invariant for the vector implementation of NATSET.
There is some question about what we should write. One possibility is to write
the strongest possible specification of the possible values that can be created
by the implementation. It happens that the vector representing the set never has
trailing false values:
structure NatSetVec :> NATSET = struct
  type set = bool vector
  (* Abstraction function: the vector v represents the set of all natural
     numbers i such that sub(v,i) = true
     Representation invariant: the last element of v is true *)
  val empty:set = Vector.fromList []
This representation invariant describes an interesting property of the
implementation that may be useful in judging its performance. However, we don't
need this rep invariant in order to show that the implementation is correct. If
there were no rep invariant, we could still argue that the implementation works
properly. All of the operations of NatSetVec will work even if sets
are somehow introduced that violate the no-trailing-false property. It is not
necessary to have the rep invariant in order to argue that the operations of
NatSetVec are correct according to the 4-point plan above.
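The point can be checked on a small example. The rest of NatSetVec is not shown here, so the contains function below is an assumption about how a membership test on a bool vector could be written, and noTrailingFalse is a hypothetical name for a check of the invariant stated in the comment.

```sml
(* Sketch only: both functions are assumptions made for illustration. *)

(* Membership test that never relies on the last element being true. *)
fun contains (x: int, v: bool vector): bool =
  x >= 0 andalso x < Vector.length v andalso Vector.sub (v, x)

(* Returns whether v has no trailing false values.
 * The empty vector is treated as satisfying the property. *)
fun noTrailingFalse (v: bool vector): bool =
  Vector.length v = 0 orelse Vector.sub (v, Vector.length v - 1)

(* Two representations of the set {1}; only the first satisfies
 * the no-trailing-false property. *)
val tight  = Vector.fromList [false, true]
val padded = Vector.fromList [false, true, false, false]
```

Both vectors give the same answers to every membership query, which is why correctness of this operation does not depend on the invariant.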
Further, a strong rep invariant is not always the best choice, because it restricts future changes to the module. We described interface specifications as a contract between the implementer of a module and the user. A rep invariant is a contract between the implementer and herself, or among the various implementers of the module, present and future. According to assumption 2, above, ADT operations may be implemented assuming that the rep invariant holds. If the rep invariant is ever weakened (made more permissive), some parts of the implementation may break.
Thus, one of the most important purposes of the rep invariant is to document exactly what may and what may not be safely changed about a module implementation. A weak rep invariant forces the implementer to work harder to produce a correct implementation, because less can be assumed about concrete representation values, but conversely it gives maximum flexibility for future changes to the code.