We have identified two pieces of implementation-side specification: the abstraction function and the representation invariant. They need to be provided in every module implementation so that implementers can use local reasoning to figure out whether the code they are reading or writing is correct. Ordinary function specifications that appear in the module interface (signature) are a contract between the implementer and the user of the module. By contrast, the abstraction function and rep invariant are a contract between the implementer and other implementers or maintainers of the code.
A rep invariant is a condition that is intended to hold for all values of an abstract type. The abstraction barrier ensures that the module is the only place that the rep invariant can be broken; it is the only place that the concrete type of the values is known.
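Therefore, in implementing one of the operations of the abstract data type, it can be assumed that any arguments of the abstract type satisfy the rep invariant. This assumption restores local reasoning about correctness, because we can use the rep invariant and abstraction function to judge whether the implementation of a single operation is correct in isolation from the rest of the module. It is correct if, assuming that:
1. the function's preconditions hold of the argument values, and
2. the concrete representations of the arguments satisfy the rep invariant,
we can show that
1. all new values of the abstract type that are created have concrete representations satisfying the rep invariant, and
2. the operation agrees with the corresponding abstract operation under the abstraction function (the commutative diagram holds).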
The rep invariant makes it easier to write code that is provably
correct, because it means that we don't have to write code that works
for all possible incoming concrete representations--only those that
satisfy the rep invariant. This is why NatSetNoDups.union doesn't have to
work on lists that contain duplicate elements, for example. On return
from each operation there is a corresponding responsibility to
produce only values that satisfy the rep invariant, ensuring that the
rep invariant is in fact an invariant.
When implementing a complex abstract data type, it is often helpful to write
a function internal to the module that checks that the rep invariant holds. This
function can provide an additional level of assurance about your reasoning about the
correctness of the code. By convention we will call this function repOK;
given an abstract type (say, set) implemented as a concrete type
(say, int list) it always has the same specification:
(* Returns whether x satisfies the representation invariant *)
fun repOK(x: int list): bool = ...
The repOK function can be used to help us implement a module and be sure
that each function is independently correct. The trick is to bulletproof
each function in the module against all other functions by having it apply repOK
to any values of the abstract type that come from outside. In addition, if it
creates any new values of the abstract type, it applies repOK to
them to ensure that it isn't breaking the rep invariant itself. With this
approach, a bug in one function is less likely to create the appearance of a bug
in another.
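As a rough illustration of this bulletproofing style, here is a minimal sketch with a boolean repOK. The structure name RepCheck is hypothetical, and the rep invariant assumed is the one used for NatSetNoDups (no negative elements, no duplicates). The union function checks both arguments on entry and its own result on exit.

```sml
(* Hypothetical sketch: a boolean repOK used to guard one operation. *)
structure RepCheck = struct
  (* Returns whether l contains no negative elements and no duplicates. *)
  fun repOK (l: int list): bool =
    case l of
      [] => true
    | h::t => h >= 0
              andalso not (List.exists (fn y => y = h) t)
              andalso repOK t

  (* Set union on the list representation. Checks the rep invariant
     of both inputs before working and of the result before returning. *)
  fun union (s1: int list, s2: int list): int list =
    if not (repOK s1 andalso repOK s2)
    then raise Fail "RI failed on input"
    else
      let
        val r = foldl (fn (x, s) =>
                  if List.exists (fn y => y = x) s then s else x :: s)
                  s1 s2
      in
        if repOK r then r else raise Fail "RI failed on result"
      end
end
```

With a boolean repOK every operation needs this explicit if/raise plumbing, which is what makes an alternative formulation attractive.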
A more convenient way to write repOK is to make it an identity
function that
raises an exception if the rep invariant doesn't hold. Making it an identity
function lets us conveniently test the rep invariant in various ways, as shown
below.
(* The identity function.
 * Checks: whether x satisfies the rep. invariant. *)
fun repOK(x: int list): int list = ...
Here is an example of how we might use repOK for the NatSetNoDups
implementation of sets given in lecture:
structure NatSetNoDups :> NATSET = struct
  type set = int list
  (* AF: the list [a1,...,an] represents the set {a1,...,an}.
   * RI: list contains no negative elements or duplicates. *)
  fun repOK(s: int list): int list =
    case s of
      [] => s
    | h::t => if h < 0 orelse contains'(h, repOK t)
              then raise Fail "RI failed"
              else s
  and contains'(x: int, s: int list) =
    List.exists (fn y => y=x) s
  val empty: set = repOK []
  fun single(x: int): set = repOK [x]
  fun contains(x: int, s: set): bool =
    contains'(x, repOK s)
  fun union(s1: set, s2: set): set =
    repOK (foldl (fn (x,s) => if contains'(x,s) then s else x::s)
                 (repOK s1) (repOK s2))
  fun size(s: set): int = length(repOK s)
end
Here, repOK is implemented using contains'
rather than the function contains, because using contains would
result in a lot of extra repOK checks. This is a common
pattern when implementing a repOK check.
Calling repOK on every argument can be too expensive for
the production version of a program. The repOK above is quite
expensive (though it could be implemented more cheaply). For production code it
may be more appropriate to use a version of repOK that only checks
the parts of the rep invariant that are cheap to check. When there is a
requirement that there be no run-time cost, repOK can be changed to an identity function (or
macro) so the compiler optimizes away the calls to it. It is a good idea to keep
around the full code of repOK (perhaps in a comment) so it can be
easily reinstated during future debugging.
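One way these levels of checking might be arranged is sketched below. All names here (RepOKLevels, debug, repOKFull, repOKCheap, repOKOff) are hypothetical, and the split assumes that the "no negative elements" half of the invariant is the cheap part (one linear pass) while the duplicate check is the expensive part (quadratic on lists).

```sml
(* Hypothetical sketch: three strengths of repOK behind one name. *)
structure RepOKLevels = struct
  (* Assumed build switch: true while debugging, false in production. *)
  val debug = true

  (* Cheap part of the RI: a single linear pass. *)
  fun nonNeg (l: int list): bool = List.all (fn x => x >= 0) l

  (* Expensive part of the RI: quadratic in the length of the list. *)
  fun noDups (l: int list): bool =
    case l of
      [] => true
    | h::t => not (List.exists (fn y => y = h) t) andalso noDups t

  (* Full check: identity function that raises Fail if the RI is broken. *)
  fun repOKFull (l: int list): int list =
    if nonNeg l andalso noDups l then l else raise Fail "RI failed"

  (* Partial check: only the cheap half of the RI. *)
  fun repOKCheap (l: int list): int list =
    if nonNeg l then l else raise Fail "RI failed"

  (* No check at all: a plain identity function. *)
  fun repOKOff (l: int list): int list = l

  (* The version the rest of the module would call. *)
  val repOK: int list -> int list =
    if debug then repOKFull else repOKCheap
end
```

Keeping the full check as a named function rather than in a comment is a design choice of this sketch: the compiler keeps type-checking it, so it is less likely to rot before it is needed again.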
For those who really want to be careful about enforcing and checking the rep invariant, there is a way to use the SML type system to impose more discipline. The trick is to define the abstract type as a singleton datatype, e.g.
datatype set = Rep of int list
This definition means that you cannot treat a list accidentally as a set; you
have to apply Rep to a list to obtain its abstract view, and write
case...of Rep(lst) to get the underlying list back.
If the Rep constructor is only used within a special function up
that performs the mapping of the abstraction function, then you cannot construct
a set without checking that it satisfies the rep invariant:
fun up(l: int list): set = Rep(repOK l)
fun down(s: set): int list =
  case s of Rep(l) => repOK l
The down function is used to map the other way and obtain the
concrete representation for an abstract value while checking that the
representation satisfies the invariant.
The previous implementation can now be rewritten to use up and down
in the various places that repOK was used formerly:
structure NatSetNoDups2 :> NATSET = struct
  datatype set = Rep of int list
  (* AF: Rep [a1,...,an] represents the set {a1,...,an}.
   * RI: list contains no negative elements or duplicates. *)
  fun repOK(s: int list): int list =
    case s of
      [] => s
    | h::t => if h < 0 orelse contains'(h, repOK t)
              then raise Fail "RI failed"
              else s
  and contains'(x: int, s: int list) =
    List.exists (fn y => y=x) s
  fun up(l: int list): set = Rep(repOK l)
  fun down(s: set): int list =
    case s of Rep(l) => repOK l
  val empty: set = up []
  fun single(x: int): set = up [x]
  fun contains(x: int, s: set): bool =
    contains'(x, down s)
  fun union(s1: set, s2: set): set =
    up (foldl (fn (x,s) => if contains'(x,s) then s else x::s)
              (down s1) (down s2))
  fun size(s: set): int = length(down s)
end
The type system will force us to unwrap each input set to obtain the underlying integer list, and then wrap the resulting list back into a set. The up and down functions perform these conversions; they automatically check the rep invariant while performing the conversions.
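To see how a new operation would be written in this style, consider the sketch below. The structure name UpDownSketch, the helper member, and the operation insert are all hypothetical and not part of NATSET; the rep invariant is assumed to be the same as in NatSetNoDups2.

```sml
(* Hypothetical sketch: adding an operation using up and down. *)
structure UpDownSketch = struct
  datatype set = Rep of int list

  (* Membership test on the concrete representation. *)
  fun member (x: int, l: int list): bool =
    List.exists (fn y => y = x) l

  (* Identity function; raises Fail if l has a negative element
     or a duplicate. *)
  fun repOK (l: int list): int list =
    case l of
      [] => l
    | h::t => if h < 0 orelse member (h, repOK t)
              then raise Fail "RI failed"
              else l

  fun up (l: int list): set = Rep (repOK l)
  fun down (s: set): int list = case s of Rep l => repOK l

  (* Hypothetical operation: add one element to a set. The input is
     unwrapped with down and the new list is wrapped with up, so the
     rep invariant is checked on the way in and on the way out. *)
  fun insert (x: int, s: set): set =
    let val l = down s
    in if member (x, l) then s else up (x :: l) end
end
```

Here insert never checks for a negative x itself. If a caller passes one, the call to up raises Fail, so the broken value never escapes the module.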
What does it mean to prove that we have a correct implementation? Can we write a proof that the program is correct? Testing the program on a set of possible inputs (and checking the corresponding outputs) is an easy way of becoming more convinced that the program works correctly. But can we be sure that it will work correctly for all inputs, and that we haven't forgotten some corner case? Next time you board a plane, think about the correctness of the critical pieces of software that control parts of your plane.
Let's consider the implementation of sets using lists with no duplicates. What does it mean for such a set implementation to be correct? Informally, the implementation of an operation (e.g., union, size, etc.) that works over a concrete structure (a list in this case) is correct if it performs the appropriate operation in the abstract domain (e.g., set union, set size, etc.). We know that we can express this notion using the abstraction function (AF) via the commutative diagram. In addition, each operation should expect well-formed concrete structures (i.e., structures that satisfy the RI) and produce well-formed data structures (that satisfy the RI).
Let's take the code that implements set union:
fun union(s1: int list, s2: int list): int list =
  foldl (fn (x,s) => if contains(x,s) then s else x::s) s1 s2
Proving that this implementation is correct requires proving that:
Requirements 1 and 2 talk about maintaining the RI, whereas requirement 3 talks about the implementation modeling the set union operation in the abstract domain. Note that the code needs to work correctly only for inputs that are well-formed, i.e., that satisfy the RI. The key technique behind proving such properties is induction.
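Before attempting a proof, it can help to see what these properties look like as executable checks on sample inputs. The sketch below is only a test, not a proof. It assumes contains is list membership and takes the RI to be "no duplicates"; the names noDups and agrees are hypothetical. Agreement with abstract set union is probed only on a finite list of candidate elements.

```sml
(* contains and union on bare int lists, as in the code above. *)
fun contains (x: int, s: int list): bool =
  List.exists (fn y => y = x) s

fun union (s1: int list, s2: int list): int list =
  foldl (fn (x, s) => if contains (x, s) then s else x :: s) s1 s2

(* RI assumed here: the list has no duplicates. *)
fun noDups (l: int list): bool =
  case l of
    [] => true
  | h::t => not (contains (h, t)) andalso noDups t

(* Hypothetical check for one pair of inputs: the result satisfies
   the RI, and each probe value is in the result exactly when it is
   in s1 or in s2. *)
fun agrees (s1: int list, s2: int list, probes: int list): bool =
  let
    val r = union (s1, s2)
  in
    noDups r andalso
    List.all (fn x =>
                contains (x, r) =
                (contains (x, s1) orelse contains (x, s2)))
             probes
  end
```

For example, agrees ([1,2,3], [3,4], [0,1,2,3,4,5]) evaluates to true. A passing check like this says nothing about inputs that were not tried, which is why a proof by induction is needed.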
You've learned about mathematical induction over natural numbers in CS211. To prove that a property P(n) holds for all natural numbers n >= 0, you first prove the base case, that the property holds for the smallest natural number, n = 0. Then, you prove the inductive step: if P(n) holds for some number n, then P(n+1) also holds. Together, these two parts show that P(n) holds for all n >= 0. A simple analogy is that of an infinite sequence of dominoes: tipping over the first piece corresponds to the base case, and each falling piece knocks over the next one (the inductive step). As a result, all pieces will fall.
Induction can be performed not only on natural numbers, but also on more complicated sets, like pairs of non-negative integers, or binary trees. For instance, proving that the above function union is correct requires induction on lists.
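For comparison, the two induction principles can be written side by side. The property name Q is introduced here only for illustration; the second rule is the usual structural induction principle for lists, stated in the notation of the code (nil and h::t).

```latex
% Induction on natural numbers: base case and inductive step
\[
\bigl( P(0) \;\land\; \forall n \ge 0.\; (P(n) \Rightarrow P(n+1)) \bigr)
\;\Rightarrow\; \forall n \ge 0.\; P(n)
\]

% Induction on lists: the empty list plays the role of 0,
% and consing an element plays the role of n + 1
\[
\bigl( Q(\mathtt{nil}) \;\land\; \forall h, t.\; (Q(t) \Rightarrow Q(h \mathbin{::} t)) \bigr)
\;\Rightarrow\; \forall l.\; Q(l)
\]
```

Induction on the length of a list, which is a natural number, is an instance of the first rule and reaches the same conclusions as the second.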
It is useful to keep in mind the key pieces of a proof by induction and remember to check that all of these items have been covered. Use the following list as a recipe for inductive proofs:
We can now formally prove by induction that the above code correctly implements the set union operation. For this code, the abstraction function AF maps lists to the sets that they represent: AF([]) = {} and AF(h::t) = {h} U AF(t). Furthermore, the rep invariant (RI) for a list l states that there are no duplicates in the list.
We can prove that union(s1,s2) is correct by induction on the second list s2. We will formulate this as an induction on the length of s2.
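One plausible way to write down the property being proved, as a sketch built from the AF and RI defined above (the exact wording of the original requirements may differ), is:

```latex
% P(n): union is correct for every second argument of length n.
% |s_2| denotes the length of the list s_2.
\[
P(n) \;:\quad
\forall s_1, s_2.\;
\bigl( RI(s_1) \land RI(s_2) \land |s_2| = n \bigr)
\;\Rightarrow\;
\bigl( RI(\mathit{union}(s_1, s_2))
\;\land\;
AF(\mathit{union}(s_1, s_2)) = AF(s_1) \cup AF(s_2) \bigr)
\]
```

Quantifying over all s1 inside P(n) matters: the inductive step applies the hypothesis to a different first argument than the one it started with.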
Let f be the function applied at each step of the fold:
f(x,s) = if contains(x,s) then s else x::s.
Then, when the second list is empty:
union(s1,nil)
=> (substitution)
   case nil of
     nil => s1
   | x::xs => foldl f (f(x,s1)) xs
=> (pattern matching)
   s1
and when the second list is h::t:
union(s1,h::t)
=> (substitution)
   case h::t of
     nil => s1
   | x::xs => foldl f (f(x,s1)) xs
=> (pattern matching)
   foldl f (f(h,s1)) t
Note that foldl f (f(h,s1)) t = union(f(h,s1), t). We know that t is
a list with n elements, so we'd like to apply the induction hypothesis P(n)
to f(h,s1) and t. But for this we need to show that these two lists are
well-formed: