Last lecture we defined a stack abstraction implemented using ML lists. The implementation was functional: operations such as push and pop create and return new structures instead of destructively modifying the existing one.
How about a data abstraction for queues? A queue is a sequence of elements with two ends. The enqueue() operation inserts an element at the rear of the queue, and the dequeue() operation removes the element at the front, following a FIFO (first-in-first-out) policy. The signature for a queue also includes a function front() that returns the element at the front of the queue, and it may include other standard operations (e.g., map, app, fold); the signature below shows only map.
sig
  type 'a queue
  exception EmptyQueue
  val empty : 'a queue
  val isEmpty : 'a queue -> bool
  val enqueue : ('a * 'a queue) -> 'a queue
  val dequeue : 'a queue -> 'a queue
  val front : 'a queue -> 'a
  val map : ('a -> 'b) -> 'a queue -> 'b queue
end
An implementation must access both ends of the queue. However, ML lists provide constant-time access only to the head element, not to the last element. How can we implement functional queues efficiently?
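For comparison, here is a minimal sketch of the naive approach, assuming a queue is represented by a single list with the front of the queue at the head (the structure name ListQueue is hypothetical). front and dequeue take constant time, but enqueue has to copy the whole list with @ to reach the rear, so it takes time linear in the length of the queue.

```sml
(* Hypothetical naive queue: a single list, front of the queue at the head. *)
structure ListQueue =
struct
  type 'a queue = 'a list
  exception EmptyQueue
  val empty : 'a queue = []
  fun isEmpty (q : 'a queue) : bool = null q
  (* Linear time: q @ [x] copies every element of q to reach the rear. *)
  fun enqueue (x : 'a, q : 'a queue) : 'a queue = q @ [x]
  (* Constant time: the front of the queue is the head of the list. *)
  fun dequeue (q : 'a queue) : 'a queue =
    case q of
      [] => raise EmptyQueue
    | _ :: rest => rest
  fun front (q : 'a queue) : 'a =
    case q of
      [] => raise EmptyQueue
    | x :: _ => x
end

(* Enqueue 1, then 2, then 3; the representation is [1,2,3]. *)
val q = ListQueue.enqueue (3, ListQueue.enqueue (2, ListQueue.enqueue (1, ListQueue.empty)))
val f1 = ListQueue.front q                       (* 1 *)
val f2 = ListQueue.front (ListQueue.dequeue q)   (* 2 *)
```

Enqueuing n elements one at a time this way costs O(n^2) in total, which is what the two-stack design avoids.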
An elegant algorithm for implementing functional queues uses two stacks. The top of the first stack represents the rear of the queue, and the top of the second stack represents the front of the queue. When we attempt to dequeue an element and the second stack is empty, the first stack is reversed and becomes the new second stack, and the first stack becomes empty.
struct
  structure S = Stack
  type 'a queue = 'a S.stack * 'a S.stack
  exception EmptyQueue
  val empty : 'a queue = (S.empty, S.empty)
  fun isEmpty ((s1,s2):'a queue) =
    S.isEmpty s1 andalso S.isEmpty s2
  (* The rear of the queue is the top of s1. *)
  fun enqueue (x:'a, (s1,s2):'a queue) : 'a queue =
    (S.push (x,s1), s2)
  (* rev s is s with its elements in the opposite order. *)
  fun rev (s: 'a S.stack): 'a S.stack = let
    fun loop (old: 'a S.stack, new: 'a S.stack): 'a S.stack =
      if S.isEmpty old then new
      else loop (S.pop old, S.push(S.top old, new))
  in
    loop (s, S.empty)
  end
  (* The front of the queue is the top of s2; when s2 is empty,
     the reversed s1 becomes the new s2. *)
  fun dequeue ((s1,s2): 'a queue) : 'a queue =
    if S.isEmpty s2
    then if S.isEmpty s1 then raise EmptyQueue
         else (S.empty, S.pop (rev s1))
    else (s1, S.pop s2)
  fun front ((s1,s2): 'a queue) : 'a =
    if S.isEmpty s2
    then if S.isEmpty s1 then raise EmptyQueue
         else S.top (rev s1)
    else S.top s2
  fun map (f: 'a -> 'b) ((s1,s2): 'a queue): 'b queue =
    (S.map f s1, S.map f s2)
end
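To try the queue out, the sketch below bundles it with a minimal list-based Stack structure. That Stack is an assumption standing in for last lecture's: it provides only the names the queue uses (empty, isEmpty, push, pop, top, map, and the EmptyStack exception). The structure name Queue is also hypothetical. The comments trace the two internal stacks as (s1, s2).

```sml
(* Assumed stand-in for last lecture's Stack: a stack is a list whose
   head is the top. Only the names the queue needs are provided. *)
structure Stack =
struct
  type 'a stack = 'a list
  exception EmptyStack
  val empty : 'a stack = []
  fun isEmpty (s : 'a stack) : bool = null s
  fun push (x : 'a, s : 'a stack) : 'a stack = x :: s
  fun pop (s : 'a stack) : 'a stack =
    case s of [] => raise EmptyStack | _ :: t => t
  fun top (s : 'a stack) : 'a =
    case s of [] => raise EmptyStack | x :: _ => x
  fun map (f : 'a -> 'b) (s : 'a stack) : 'b stack = List.map f s
end

(* The two-stack queue, under the hypothetical name Queue. *)
structure Queue =
struct
  structure S = Stack
  type 'a queue = 'a S.stack * 'a S.stack
  exception EmptyQueue
  val empty : 'a queue = (S.empty, S.empty)
  fun isEmpty ((s1, s2) : 'a queue) = S.isEmpty s1 andalso S.isEmpty s2
  fun enqueue (x : 'a, (s1, s2) : 'a queue) : 'a queue = (S.push (x, s1), s2)
  fun rev (s : 'a S.stack) : 'a S.stack =
    let
      fun loop (old, acc) =
        if S.isEmpty old then acc
        else loop (S.pop old, S.push (S.top old, acc))
    in
      loop (s, S.empty)
    end
  fun dequeue ((s1, s2) : 'a queue) : 'a queue =
    if S.isEmpty s2
    then if S.isEmpty s1 then raise EmptyQueue
         else (S.empty, S.pop (rev s1))
    else (s1, S.pop s2)
  fun front ((s1, s2) : 'a queue) : 'a =
    if S.isEmpty s2
    then if S.isEmpty s1 then raise EmptyQueue
         else S.top (rev s1)
    else S.top s2
  fun map (f : 'a -> 'b) ((s1, s2) : 'a queue) : 'b queue =
    (S.map f s1, S.map f s2)
end

val q = Queue.enqueue (3, Queue.enqueue (2, Queue.enqueue (1, Queue.empty)))
(* q = ([3,2,1], []) *)
val q1 = Queue.dequeue q
(* q1 = ([], [2,3]): s1 was reversed into s2, then 1 was popped *)
val q2 = Queue.enqueue (4, q1)
(* q2 = ([4], [2,3]): new elements go onto s1 without touching s2 *)
```

Note that front, as written, reverses s1 without keeping the result, so calling it repeatedly on a queue whose second stack is empty repeats that work.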
We will now discuss the relation between the abstract view of a module and its concrete implementation. As a running example, we will use a very simple abstraction -- a set of natural (non-negative) numbers. The interface is defined below; we annotate it with explanatory comments so that users of this module can understand how to use it without looking at its implementation.
signature NATSET = sig
  (* a "set" is a set of natural numbers: e.g., {1,11,0}, {}, and {1001} *)
  type set
  (* empty is the empty set *)
  val empty : set
  (* single(x) is {x}. Requires: x >= 0 *)
  val single : int -> set
  (* union is set union. *)
  val union : set*set -> set
  (* contains(x,s) is whether x is a member of s *)
  val contains: int*set -> bool
  (* size(s) is the number of elements in s *)
  val size: set -> int
end
In a real signature for sets, we'd want map and fold
operations as well, but let's keep this simple. There are many ways to implement
this abstraction. One easy way is as a list of integers:
structure NatSet :> NATSET = struct
  type set = int list
  val empty = []
  fun single(x) = [x]
  fun union(s1,s2) = s1@s2
  fun contains(x,s) = List.exists (fn y => x=y) s
  fun size(s) =
    case s of
      [] => 0
    | h::t => size(t) + (if contains(h,t) then 0 else 1)
end
This implementation has the advantage of simplicity, although its performance will be poor for large sets. Notice that the types of the functions aren't written down in the implementation; they aren't needed because they're already present in the signature, just like the specifications that are also in the signature and don't need to be replicated in the structure.
How do we know
whether this implementation satisfies its interface NATSET? It
might seem that we need to carefully look at every method and all possible
interactions between the methods. Here is another implementation of NATSET,
also using int list; this implementation is also correct (and also
slow). Notice that we are using the same representation type, yet some important
aspects of the implementation are quite different. Again, it's a bit of a
challenge to decide that this implementation really works without more
information.
structure NatSetNoDups :> NATSET = struct
  type set = int list
  val empty = []
  fun single(x) = [x]
  (* contains must be defined before union, which uses it *)
  fun contains(x,s) = List.exists (fn y => x=y) s
  fun union(s1, s2) =
    foldl (fn(x,s) => if contains(x,s) then s else x::s) s1 s2
  fun size(s) = length s
end
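One informal way to gain some confidence is to run the same client code against both list implementations. The sketch below repeats the NATSET signature (without its comments), NatSet, and NatSetNoDups, and uses a functor with the hypothetical name SetClient to build the set {1,3} in a way that creates a duplicate in NatSet's list. Agreement on one test is evidence, not a proof.

```sml
signature NATSET =
sig
  type set
  val empty : set
  val single : int -> set
  val union : set * set -> set
  val contains : int * set -> bool
  val size : set -> int
end

structure NatSet :> NATSET =
struct
  type set = int list
  val empty = []
  fun single x = [x]
  fun union (s1, s2) = s1 @ s2
  fun contains (x, s) = List.exists (fn y => x = y) s
  fun size s =
    case s of
      [] => 0
    | h :: t => size t + (if contains (h, t) then 0 else 1)
end

structure NatSetNoDups :> NATSET =
struct
  type set = int list
  val empty = []
  fun single x = [x]
  fun contains (x, s) = List.exists (fn y => x = y) s
  fun union (s1, s2) =
    foldl (fn (x, s) => if contains (x, s) then s else x :: s) s1 s2
  fun size s = length s
end

(* Hypothetical client: sees a set only through the NATSET operations. *)
functor SetClient (S : NATSET) =
struct
  (* In NatSet this is the list [3,1,1]; in NatSetNoDups it is [1,3]. *)
  val s = S.union (S.union (S.single 3, S.single 1), S.single 1)
  val observed = (S.size s, S.contains (1, s), S.contains (3, s), S.contains (2, s))
end

structure A = SetClient (NatSet)
structure B = SetClient (NatSetNoDups)
(* Both A.observed and B.observed are (2, true, true, false). *)
```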
Here's a third, completely different implementation with a fast contains
method. This implementation works pretty well as long as the integers stored in
the set are small. If they're not, it's a terrible implementation: a set whose
largest element is n is stored as a vector of n+1 booleans.
structure NatSetVec :> NATSET = struct
  type set = bool vector
  val empty:set = Vector.fromList []
  fun single(x) = Vector.tabulate(x+1, fn(y) => x=y)
  fun union(s1,s2) = let
    val len1 = Vector.length(s1)
    val len2 = Vector.length(s2)
    fun merge(i) = (i < len1 andalso Vector.sub(s1, i))
                   orelse (i < len2 andalso Vector.sub(s2, i))
  in
    Vector.tabulate(Int.max(len1, len2), merge)
  end
  fun contains(x,s) = x >= 0 andalso x < Vector.length(s) andalso Vector.sub(s,x)
  fun size(s) = Vector.foldl (fn (b,n) => if b then n+1 else n) 0 s
end
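To see where the cost comes from, here is a sketch that copies single and union out of NatSetVec as top-level functions, so that the underlying bool vector is visible (the opaque ascription :> NATSET otherwise hides it). Entry i of the vector is true exactly when i is in the set.

```sml
(* single and union as in NatSetVec, without the signature, so the
   bool vector representation can be inspected directly. *)
fun single (x : int) : bool vector = Vector.tabulate (x + 1, fn y => x = y)

fun union (s1 : bool vector, s2 : bool vector) : bool vector =
  let
    val len1 = Vector.length s1
    val len2 = Vector.length s2
    (* i is in the union if it is in either input vector *)
    fun merge i =
      (i < len1 andalso Vector.sub (s1, i)) orelse
      (i < len2 andalso Vector.sub (s2, i))
  in
    Vector.tabulate (Int.max (len1, len2), merge)
  end

val small = union (single 1, single 3)   (* entries 1 and 3 are true *)
val big = single 1000                    (* one true entry, 1000 false ones *)
val smallLen = Vector.length small       (* 4 *)
val bigLen = Vector.length big           (* 1001 *)
```

The one-element set {1000} takes more space than the two-element set {1,3} by a factor of about 250, and size must scan every entry.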
You may be able to think of more complicated ways to implement sets that are (usually) better than any of these three. We'll talk about some alternative set implementations in lectures coming up soon.
An important reason why we introduced the writing of function specifications
was to enable local reasoning: once a function has a spec, we can judge
whether the function does what it is supposed to without looking at the rest of
the program. We can also judge whether the rest of the program works without
looking at the code of the function. However, we cannot reason locally about the
individual functions in the three module implementations just given. The problem
is that we don't have enough information about the relationship between the
concrete types (e.g., int list, bool vector) and the
corresponding abstract type (set). This lack of information can be
addressed by adding two new kinds of comments to the implementation: the abstraction
function and the representation invariant for the abstract data type.
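As a preview, here is one way those two comments might look for NatSetNoDups. The wording of the comments is our own, and repOK is a hypothetical helper (not part of NATSET) that checks the representation invariant at run time; in this sketch, single and union pass their results through it, so a violation raises Fail immediately instead of surfacing later as a wrong size.

```sml
signature NATSET =
sig
  type set
  val empty : set
  val single : int -> set
  val union : set * set -> set
  val contains : int * set -> bool
  val size : set -> int
end

structure NatSetNoDups :> NATSET =
struct
  (* Abstraction function: the list [a1, ..., an] represents the set
     {a1, ..., an}.
     Representation invariant: every element is >= 0, and no element
     appears more than once. *)
  type set = int list

  (* Hypothetical helper, not exported by NATSET: returns s unchanged if
     it satisfies the representation invariant, and raises Fail otherwise. *)
  fun repOK (s : set) : set =
    let
      fun ok [] = true
        | ok (x :: xs) =
            x >= 0 andalso not (List.exists (fn y => x = y) xs) andalso ok xs
    in
      if ok s then s else raise Fail "NatSetNoDups: invariant violated"
    end

  val empty : set = []
  fun single x = repOK [x]
  fun contains (x, s) = List.exists (fn y => x = y) s
  fun union (s1, s2) =
    repOK (foldl (fn (x, s) => if contains (x, s) then s else x :: s) s1 s2)
  (* Correct only because the invariant rules out duplicates. *)
  fun size (s : set) = length s
end
```

The comment on size shows the payoff: size can be checked locally, by reading the invariant rather than the code of union.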
The user of one of the three NATSET implementations should be unable to
tell them apart based on their behavior. As far as the user can tell, the values
of, say, NatSet.set act like the mathematical ideal of a set as
viewed through the operations. To the implementer, the lists [3,1],
[1,3], and [1,1,3] are distinguishable; to the user,
they all represent the abstract set {1,3} and cannot be told apart through the
operations of the NATSET signature. From the user's point of view,
the abstract
data type describes a set of abstract values and associated operations; the
implementer knows that these abstract values are represented by concrete values
that may contain additional information invisible from the user's view. This
loss of information is described by the abstraction function, which is a
mapping from the space of concrete values to the abstract space. The abstraction
function for NatSet maps the list [a1, ..., an] to the set {a1, ..., an},
discarding both the order of the elements and any duplicates.
Notice that several concrete values may map to a single abstract value; that
is, the abstraction function may be many-to-one. It is also possible that
some concrete values, such as the list [-1,1], do not map to any
abstract value; the abstraction function may be partial.
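The abstraction function is a mathematical mapping rather than code, but for finite sets it can be modeled executably. In the sketch below (the names insert, canonical, and af are hypothetical), an abstract set is modeled by its canonical form, a strictly increasing int list, and NONE stands for a concrete value that represents no set at all. Note that SML writes the negative number -1 as ~1.

```sml
(* Insert x into a strictly increasing list, dropping duplicates. *)
fun insert (x : int, []) = [x]
  | insert (x, y :: ys) =
      if x < y then x :: y :: ys
      else if x = y then y :: ys
      else y :: insert (x, ys)

(* Canonical form of a finite set: its elements in increasing order. *)
fun canonical (l : int list) : int list = foldl insert [] l

(* Model of NatSet's abstraction function: defined only on lists of
   natural numbers, where it forgets order and duplicates. *)
fun af (l : int list) : int list option =
  if List.all (fn x => x >= 0) l then SOME (canonical l) else NONE

val a = af [3, 1]      (* SOME [1,3] *)
val b = af [1, 3]      (* SOME [1,3] *)
val c = af [1, 1, 3]   (* SOME [1,3]: many-to-one *)
val d = af [~1, 1]     (* NONE: partial *)
```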