CS312 Lecture 8
Functional Structures and Abstractions

Functional Structures

In the last lecture we defined a stack abstraction implemented with ML lists. The implementation was functional: operations such as push and pop create and return new structures instead of destructively modifying the existing structure.

How about a data abstraction for queues? A queue is a sequence of elements with two ends. The enqueue operation inserts an element at the rear of the queue, and the dequeue operation removes the element at the front, following a FIFO (first-in, first-out) policy. The signature for a queue also includes a function front that returns the element at the front of the queue, and a map operation; a complete signature would typically add other standard operations such as app and fold.

    signature QUEUE = sig
      type 'a queue
      exception EmptyQueue

      val empty : 'a queue
      val isEmpty : 'a queue -> bool
      val enqueue : ('a * 'a queue) -> 'a queue
      val dequeue : 'a queue -> 'a queue
      val front : 'a queue -> 'a
      val map : ('a -> 'b) -> 'a queue -> 'b queue
    end

An implementation must access both ends of the queue. However, ML lists provide constant-time access only to their first element (the head), not to their last element. How can we implement functional queues efficiently?

An elegant algorithm for implementing functional queues uses two stacks. The top of the first stack represents the rear of the queue, and the top of the second stack represents the front of the queue. Enqueue pushes onto the first stack, and dequeue pops from the second. When we need the front element but the second stack is empty, we reverse the first stack and make it the new second stack, leaving the first stack empty.

    structure Queue :> QUEUE = struct
      (* Stack is the stack structure from the previous lecture; we assume it
         provides empty, isEmpty, push, pop, top, and map. *)
      structure S = Stack

      (* (s1, s2): the top of s1 is the rear of the queue (most recently
         enqueued element); the top of s2 is the front (next to dequeue). *)
      type 'a queue = 'a S.stack * 'a S.stack
      exception EmptyQueue

      val empty : 'a queue = (S.empty, S.empty)

      fun isEmpty ((s1, s2) : 'a queue) : bool =
        S.isEmpty s1 andalso S.isEmpty s2

      fun enqueue (x : 'a, (s1, s2) : 'a queue) : 'a queue = (S.push (x, s1), s2)

      (* rev s is s with its elements in the opposite order *)
      fun rev (s : 'a S.stack) : 'a S.stack =
        let
          fun loop (old : 'a S.stack, new : 'a S.stack) : 'a S.stack =
            if S.isEmpty old then new
            else loop (S.pop old, S.push (S.top old, new))
        in
          loop (s, S.empty)
        end

      (* If s2 is empty, the reversed s1 becomes the new front stack. *)
      fun dequeue ((s1, s2) : 'a queue) : 'a queue =
        if S.isEmpty s2 then
          if S.isEmpty s1 then raise EmptyQueue
          else (S.empty, S.pop (rev s1))
        else (s1, S.pop s2)

      fun front ((s1, s2) : 'a queue) : 'a =
        if S.isEmpty s2 then
          if S.isEmpty s1 then raise EmptyQueue
          else S.top (rev s1)
        else S.top s2

      fun map (f : 'a -> 'b) ((s1, s2) : 'a queue) : 'b queue =
        (S.map f s1, S.map f s2)
    end
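To experiment without last lecture's Stack structure, here is a minimal self-contained sketch of the same two-stack idea that uses plain ML lists as the two stacks. The structure name ListQueue and the helper normalize are hypothetical, introduced only for this illustration; this is not the lecture's Queue structure.

```sml
(* Hypothetical ListQueue: the two-stack queue with plain lists as stacks.
   In (rear, frnt), the head of rear is the most recently enqueued element
   and the head of frnt is the next element to dequeue. *)
structure ListQueue = struct
  type 'a queue = 'a list * 'a list
  exception EmptyQueue

  val empty : 'a queue = ([], [])

  fun isEmpty ((rear, frnt) : 'a queue) : bool = null rear andalso null frnt

  fun enqueue (x : 'a, (rear, frnt) : 'a queue) : 'a queue = (x :: rear, frnt)

  (* Hypothetical helper: if the front list is empty, reverse the rear
     list into its place; otherwise leave the queue unchanged. *)
  fun normalize ((rear, frnt) : 'a queue) : 'a queue =
    case frnt of
      [] => ([], List.rev rear)
    | _ => (rear, frnt)

  fun dequeue (q : 'a queue) : 'a queue =
    case normalize q of
      (_, []) => raise EmptyQueue
    | (rear, _ :: rest) => (rear, rest)

  fun front (q : 'a queue) : 'a =
    case normalize q of
      (_, []) => raise EmptyQueue
    | (_, x :: _) => x
end

(* Enqueue 1, 2, 3 (in that order), then look at the front before and
   after one dequeue. *)
val q = ListQueue.enqueue (3, ListQueue.enqueue (2, ListQueue.enqueue (1, ListQueue.empty)))
val first = ListQueue.front q                       (* 1 *)
val second = ListQueue.front (ListQueue.dequeue q)  (* 2 *)
```

Each element is pushed onto the rear list once, moved by List.rev at most once, and removed from the front list once, so a sequence of operations that always works on the newest version of the queue costs constant time per operation on average. That bound can fail if an old version whose front list is empty is dequeued again and again, since each such call repeats the reversal.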

 

A Simple Abstraction: Sets of Natural Numbers

We will now discuss the relation between the abstract view of a module and its concrete implementation. As a running example, we will use a very simple abstraction: a set of natural (non-negative) numbers. The interface is defined below; we annotate it with explanatory comments so that users of this module understand how to use it without looking at its implementation.

signature NATSET = sig
  (* a "set" is a set of natural numbers: e.g., {1,11,0}, {}, and {1001}*)
  type set
 
  (* empty is the empty set *)
  val empty : set
 
  (* single(x) is {x}. Requires: x >= 0 *)
  val single : int -> set
 
  (* union is set union. *)
  val union : set*set -> set
 
  (* contains(x,s) is whether x is a member of s *)
  val contains: int*set -> bool
 
  (* size(s) is the number of elements in s *)
  val size: set -> int
end

In a real signature for sets, we'd want map and fold operations as well, but let's keep this simple. There are many ways to implement this abstraction. One easy way is as a list of integers:

structure NatSet :> NATSET = struct
  type set = int list
  val empty = []
  fun single(x) = [x]
  fun union(s1,s2) = s1@s2
  fun contains(x,s) = List.exists (fn y => x=y) s
  fun size(s) =
    case s of
      [] => 0
    | h::t => size(t) + (if contains(h,t) then 0 else 1)  
end

This implementation has the advantage of simplicity, although its performance will be poor for large sets. Notice that the types of the functions aren't written down in the implementation; they aren't needed because ML infers them and the signature already states them, just as the specifications in the signature don't need to be replicated in the structure.
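One way to see why NatSet can be slow: its union appends without removing duplicates, so a list can grow much longer than the set it represents. The sketch below uses a standalone copy of NatSet's union (outside the structure, so it runs without the signature); selfUnion is a hypothetical helper introduced only for this illustration.

```sml
(* Standalone copy of NatSet.union: plain append, duplicates kept. *)
fun union (s1 : int list, s2 : int list) : int list = s1 @ s2

(* Hypothetical helper: replace s by union (s, s), n times. *)
fun selfUnion (s : int list, n : int) : int list =
  if n <= 0 then s else selfUnion (union (s, s), n - 1)

(* Start from [1], i.e. the set {1}; each self-union doubles the list. *)
val s = selfUnion ([1], 10)
val len = length s   (* 1024, yet s still represents {1} *)
```

A lookup of an absent element on this representation, such as contains(2, s), has to scan all 1024 entries even though the set has only one element.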

How do we know whether this implementation satisfies its interface NATSET? It might seem that we need to look carefully at every method and at all possible interactions between the methods. Here is another implementation of NATSET that also represents a set as an int list; this implementation is also correct (and also slow). Notice that we are using the same representation type, yet some important aspects of the implementation are quite different. Again, it's a bit of a challenge to decide that this implementation really works without more information.

structure NatSetNoDups :> NATSET = struct
  type set = int list
  val empty = []
  fun single(x) = [x]
  (* contains must be defined before union, which calls it *)
  fun contains(x,s) = List.exists (fn y => x=y) s
  (* add each element of s2 to s1 unless it is already present *)
  fun union(s1, s2) =
    foldl (fn(x,s) => if contains(x,s) then s else x::s)
          s1 s2
  fun size(s) = length s
end
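To make the difference concrete, the sketch below runs standalone copies of the two union functions on the same inputs; the names appendUnion and noDupsUnion are hypothetical. Both results stand for the set {1,3}, but only the NatSetNoDups result is free of duplicates, which is what lets its size simply return the list's length. Applying length to the NatSet result would overcount.

```sml
fun contains (x : int, s : int list) : bool = List.exists (fn y => x = y) s

(* Copy of NatSet.union: append, keeping duplicates. *)
fun appendUnion (s1 : int list, s2 : int list) : int list = s1 @ s2

(* Copy of NatSetNoDups.union: cons each element of s2 onto the
   accumulated list only if it is not already there. *)
fun noDupsUnion (s1 : int list, s2 : int list) : int list =
  foldl (fn (x, s) => if contains (x, s) then s else x :: s) s1 s2

val a = appendUnion ([1, 3], [3, 1])  (* [1, 3, 3, 1]: length 4 *)
val b = noDupsUnion ([1, 3], [3, 1])  (* [1, 3]: length 2 *)
val c = noDupsUnion ([1], [2, 3])     (* [3, 2, 1] *)
```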

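Here's a third, completely different implementation with a fast contains method: a set is a bool vector whose entry at index i is true exactly when i is in the set. This implementation works pretty well as long as the integers stored in the set are small. If they're not, it's a terrible implementation, because the vector needs an entry for every number from 0 up to the largest element.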

structure NatSetVec :> NATSET = struct
  type set = bool vector
  val empty:set = Vector.fromList []
  fun single(x) = Vector.tabulate(x+1, fn(y) => x=y)
  fun union(s1,s2) =
    let val len1 = Vector.length(s1)
        val len2 = Vector.length(s2)
        fun merge(i) = (i < len1 andalso Vector.sub(s1, i)) orelse
                       (i < len2 andalso Vector.sub(s2, i))
    in
      Vector.tabulate(Int.max(len1, len2), merge)
    end
  fun contains(x,s) =
    x >= 0 andalso x < Vector.length(s) andalso Vector.sub(s,x)
  fun size(s) =
    Vector.foldl (fn (b,n) => if b then n+1 else n) 0 s
end
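To see how the vector encodes a set, and why large elements are costly, here is a small standalone sketch. The function elements is a hypothetical helper that lists the indices holding true; the vectors are built directly with Vector.fromList and with the same Vector.tabulate expression that NatSetVec.single uses.

```sml
(* Hypothetical helper: the indices that hold true, in increasing order.
   foldri visits indices from right to left, so consing builds an
   ascending list. *)
fun elements (s : bool vector) : int list =
  Vector.foldri (fn (i, b, acc) => if b then i :: acc else acc) [] s

(* {1, 3}: index i is true exactly when i is in the set. *)
val s13 : bool vector = Vector.fromList [false, true, false, true]
val e13 = elements s13   (* [1, 3] *)

(* The vector NatSetVec.single 1000 builds: 1001 entries for one element. *)
val big : bool vector = Vector.tabulate (1000 + 1, fn y => 1000 = y)
val bigLen = Vector.length big   (* 1001 *)
val bigElems = elements big      (* [1000] *)
```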

You may be able to think of more complicated ways to implement sets that are (usually) better than any of these three. We'll talk about some alternative set implementations in lectures coming up soon.

An important reason why we introduced the writing of function specifications was to enable local reasoning: once a function has a spec, we can judge whether the function does what it is supposed to without looking at the rest of the program. We can also judge whether the rest of the program works without looking at the code of the function. However, we cannot reason locally about the individual functions in the three module implementations just given. The problem is that we don't have enough information about the relationship between the concrete types (e.g., int list, bool vector) and the corresponding abstract type (set). This lack of information can be addressed by adding two new kinds of comments to the implementation: the abstraction function and the representation invariant for the abstract data type.

A user of any of the three NATSET implementations should be unable to tell them apart by their behavior. As far as the user can tell, the values of, say, NatSet.set act like the mathematical ideal of a set as viewed through the operations. To the implementer, the lists [3,1], [1,3], and [1,1,3] are distinguishable; to the user, they all represent the abstract set {1,3} and cannot be told apart through the operations of the NATSET signature. From the user's abstract view, the abstract data type describes a set of abstract values and associated operations; the implementer knows that these abstract values are represented by concrete values that may contain additional information invisible from the user's view. This loss of information is described by the abstraction function, which is a mapping from the space of concrete values to the abstract space. For NatSet, the abstraction function maps a list [a1, ..., an] of non-negative integers to the set {a1, ..., an}; the order of the elements and any duplicates are ignored.

Notice that several concrete values may map to a single abstract value; that is, the abstraction function may be many-to-one. It is also possible that some concrete values, such as the list [-1,1], do not map to any abstract value; the abstraction function may be partial.
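The abstraction function is a mathematical mapping, but for the list representations we can sketch it as code by picking a canonical form for each abstract set: the sorted list of its elements with duplicates removed. In the sketch below, the hypothetical function abstraction returns NONE for lists that contain a negative number, mirroring the partiality just described; the helper insert is also hypothetical.

```sml
(* Hypothetical helper: insert x into a sorted, duplicate-free list. *)
fun insert (x : int, []) = [x]
  | insert (x, y :: ys) =
      if x < y then x :: y :: ys
      else if x = y then y :: ys
      else y :: insert (x, ys)

(* Sketch of the abstraction function for the int list representations.
   SOME l: l lists the set's elements in increasing order, no duplicates.
   NONE: the list represents no set of natural numbers. *)
fun abstraction (l : int list) : int list option =
  if List.exists (fn x => x < 0) l then NONE
  else SOME (foldl insert [] l)

val a1 = abstraction [3, 1]      (* SOME [1, 3] *)
val a2 = abstraction [1, 3]      (* SOME [1, 3] *)
val a3 = abstraction [1, 1, 3]   (* SOME [1, 3] *)
val a4 = abstraction [~1, 1]     (* NONE; ~1 is SML notation for -1 *)
```

The same idea applies to NatSetVec: reading off the indices that hold true gives its abstraction function, which is total but also many-to-one, since trailing false entries do not change the set (the vectors [false, true] and [false, true, false] both represent {1}).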