CS 3110 Lecture 8
Abstraction Functions

We have observed that the most important use of the "comment" feature of programming languages is to provide specifications of the behavior of declared functions, so that program modules can be used without inspecting their code (modular programming). 

Let us now consider the use of comments in module implementations. The first question we must ask ourselves is who is going to read the comments written in module implementations. Because we are going to work hard to allow module users to program against the module while reading only its interface, clearly users are not the intended audience. Rather, the purpose of implementation comments is to explain the implementation to other implementers or maintainers of the module. This is done by writing comments that convince the reader that the implementation correctly implements its interface.

It is inappropriate to copy the specifications of functions found in the module interface into the module implementation. Copying runs the risk of introducing inconsistency as the program evolves, because programmers don't keep the copies in sync. Copying code and specifications is a major source (if not the major source) of program bugs.  In any case, implementers can always look at the interface for the specification. This rule of thumb can be inconvenient to those using outdated editors that cannot view two files at a time, but the payoff is worth it.

Thus, implementation comments are needed only if there are details of the implementation that are not obvious to the reader. For example, if we see the following signature and structure, it is obvious that the structure implements the signature and thus any additional comment in the structure would be superfluous:

module type CHOOSE = sig
  (* one_to_ten() is a number in 1..10 *)
  val one_to_ten: unit -> int
end

module Choose : CHOOSE = struct
  let one_to_ten() = 7
end

Implementation comments fall into two categories. The first category arises because a module implementation may define new types and functions that are purely internal to the module. If their significance is not obvious, these types and functions should be documented in much the same style that we have suggested for documenting interfaces. Often, as the code is written, it becomes apparent that the new types and functions defined in the module form an internal data abstraction, or at least a collection of functionality that makes sense as a module in its own right. This is a signal that the internal data abstraction might be moved to a separate module and manipulated only through its operations.

The second category of implementation comments is associated with the use of data abstraction; these comments are the focus of this lecture. Suppose we are implementing an abstraction for sets, for example sets of natural numbers. The interface, which is polymorphic in the element type, might look something like this:

module type SETSIG = sig
  type 'a set
  val empty : 'a set
  val add : 'a -> 'a set -> 'a set
  val mem : 'a -> 'a set -> bool
  val rem : 'a -> 'a set -> 'a set
  val size: 'a set -> int
  val union: 'a set -> 'a set -> 'a set
  val inter: 'a set -> 'a set -> 'a set
end

In a real signature for sets, we'd want operations such as map and fold as well, but let's keep this simple. There are many ways to implement this abstraction. One easy way is to represent a set as a list of its elements (for a set of integers, an int list):

module Set : SETSIG = struct
  type 'a set = 'a list
  let empty = []
  let add x l = x :: l
  let mem x l = List.mem x l
  let rem x l = List.filter (fun h -> h<>x) l
  let rec size l =
    match l with
    | [] -> 0
    | h :: t -> size t + (if List.mem h t then 0 else 1)
  let union l1 l2 = l1 @ l2
  let inter l1 l2 = List.filter (fun h -> List.mem h l2) l1
end

This implementation has the advantage of simplicity. For small sets that tend not to have duplicate elements, it will be a fine choice. Its performance will be poor for large sets or applications with many duplicates but for some applications that's not an issue.
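To see from the implementer's side what this representation does, the sketch below copies just add and size from the structure above and leaves them unsealed, so that the underlying list is visible. The names rep and n are introduced here only for illustration.

```ocaml
(* add and size copied from the first implementation, left outside
   the sealed module so the representation (a plain list) is visible. *)
let add x l = x :: l

let rec size l =
  match l with
  | [] -> 0
  | h :: t -> size t + (if List.mem h t then 0 else 1)

(* Adding 1 twice stores it twice in the list... *)
let rep = add 1 (add 3 (add 1 []))

(* ...but size counts each distinct element only once. *)
let n = size rep
```

Here rep is the list [1; 3; 1], of length 3, while n is 2. The add function is constant-time because it never checks for duplicates, and size pays for that with a List.mem test at every element.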

Notice that the types of the functions aren't written down in the implementation; they aren't needed because they're already present in the signature, just like the specifications that are also in the signature and don't need to be replicated in the structure.

How do we know whether this implementation satisfies its interface SETSIG? It might seem that we need to look carefully at every function and at all possible interactions between the functions. Here is another implementation of SETSIG that also uses a list; this implementation is also correct (and also slow for large sets). Notice that we are using the same representation type, yet some important aspects of the implementation are quite different. Again, it's a bit of a challenge to decide that this implementation really works without more information.

module Set : SETSIG = struct
  type 'a set = 'a list
  let empty = []
  let add x l = if List.mem x l then l else x :: l 
  let mem x l = List.mem x l
  let rem x l = List.filter (fun h -> h<>x) l
  let size l = List.length l 
  let union l1 l2 = 
    List.fold_left (fun a x -> if List.mem x l2 then a else x::a) l2 l1
  let inter l1 l2 = List.filter (fun h -> List.mem h l2) l1
end

Another implementation might use some kind of tree structure (which we will cover later in the semester). You may be able to think of more complicated ways to implement sets that are (usually) better than any of these. We'll talk about issues of selecting good implementations in lectures coming up soon.
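One way to build some confidence that the two list implementations agree is to run the same client code against both. The following is a minimal sketch, not a proof of correctness: the module names DupSet and NoDupSet and the functor Client are hypothetical names chosen here so that both structures can coexist in one file; the function bodies are the ones given above.

```ocaml
module type SETSIG = sig
  type 'a set
  val empty : 'a set
  val add : 'a -> 'a set -> 'a set
  val mem : 'a -> 'a set -> bool
  val rem : 'a -> 'a set -> 'a set
  val size : 'a set -> int
  val union : 'a set -> 'a set -> 'a set
  val inter : 'a set -> 'a set -> 'a set
end

(* First implementation: the list may contain duplicates. *)
module DupSet : SETSIG = struct
  type 'a set = 'a list
  let empty = []
  let add x l = x :: l
  let mem x l = List.mem x l
  let rem x l = List.filter (fun h -> h <> x) l
  let rec size l =
    match l with
    | [] -> 0
    | h :: t -> size t + (if List.mem h t then 0 else 1)
  let union l1 l2 = l1 @ l2
  let inter l1 l2 = List.filter (fun h -> List.mem h l2) l1
end

(* Second implementation: the list never contains duplicates. *)
module NoDupSet : SETSIG = struct
  type 'a set = 'a list
  let empty = []
  let add x l = if List.mem x l then l else x :: l
  let mem x l = List.mem x l
  let rem x l = List.filter (fun h -> h <> x) l
  let size l = List.length l
  let union l1 l2 =
    List.fold_left (fun a x -> if List.mem x l2 then a else x :: a) l2 l1
  let inter l1 l2 = List.filter (fun h -> List.mem h l2) l1
end

(* A client that sees only the SETSIG interface. *)
module Client (S : SETSIG) = struct
  let a = S.(empty |> add 1 |> add 3 |> add 1)
  let b = S.(empty |> add 3 |> add 4)
  let observations =
    ( S.size a,
      S.size (S.union a b),
      S.size (S.inter a b),
      S.mem 1 (S.rem 1 a) )
end

module C1 = Client (DupSet)
module C2 = Client (NoDupSet)

(* Both instances should report the same observations. *)
let same = C1.observations = C2.observations
```

With these inputs both instances produce (2, 3, 1, false). A test like this checks only the inputs it tries; it does not explain why the two structures agree.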

An important reason why we introduced the writing of function specifications was to enable local reasoning: once a function has a spec, we can judge whether the function does what it is supposed to without looking at the rest of the program. We can also judge whether the rest of the program works without looking at the code of the function. However, we cannot reason locally about the individual functions in the two module implementations just given. The problem is that we don't have enough information about the relationship between the concrete type (here, 'a list) and the corresponding abstract type ('a set). This lack of information can be addressed by adding two new kinds of comments to the implementation: the abstraction function and the representation invariant for the abstract data type.

Abstraction function

A user should be unable to tell the implementations of SETSIG apart based on their behavior. As far as the user can tell, the operations act on the mathematical ideal of a set. To the implementer, the lists [3; 1], [1; 3], and [1; 1; 3] are distinguishable; to the user of the first implementation, they all represent the abstract set {1,3} and cannot be told apart through the operations of the SETSIG signature (note that the second implementation does not allow the last of these as a representation of a set). From the view of the user, the abstract data type describes a set of abstract values and associated operations; the implementer knows that these abstract values are represented by concrete values that may contain additional information invisible from the user's view. This loss of information is described by the abstraction function, which is a mapping from the space of concrete values to the abstract space. For the first implementation of Set, for example, the abstraction function maps each of the lists [3; 1], [1; 3], and [1; 1; 3] to the abstract set {1,3}.

Notice that several concrete values may map to a single abstract value; that is, the abstraction function may be many-to-one. It is also possible that some concrete values do not map to any abstract value; that is, the abstraction function may be partial. For example, if the elements are meant to be natural numbers, the list [-1; 1] does not represent any set.
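Both properties can be made concrete with a small executable sketch. It assumes sets of natural numbers represented as an int list, uses the standard library's Set.Make (Int) as a stand-in for the mathematical sets, and returns an option to model partiality. The names IntSet, af_nat, and same_set are hypothetical and chosen only for this illustration; the Int and Option modules require OCaml 4.08 or later.

```ocaml
(* Stand-in for the abstract space: mathematical sets of integers. *)
module IntSet = Set.Make (Int)

(* A partial abstraction function for sets of natural numbers:
   None means "this list does not represent any abstract value". *)
let rec af_nat (l : int list) : IntSet.t option =
  match l with
  | [] -> Some IntSet.empty
  | h :: t ->
    if h < 0 then None
    else Option.map (IntSet.add h) (af_nat t)

(* Two lists represent the same abstract value when both are in the
   domain of af_nat and their images are equal as sets. *)
let same_set l1 l2 =
  match af_nat l1, af_nat l2 with
  | Some s1, Some s2 -> IntSet.equal s1 s2
  | _ -> false

(* Many-to-one: three different lists, one abstract set {1,3}. *)
let many_to_one = same_set [3; 1] [1; 3] && same_set [1; 3] [1; 1; 3]

(* Partial: a list with a negative element has no abstract value. *)
let undefined_here = Option.is_none (af_nat [-1; 1])
```

Both many_to_one and undefined_here evaluate to true under these assumptions.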

The abstraction function is important for deciding whether an implementation is correct, and therefore it belongs as a comment in the implementation of any abstract data type. For example, in the first Set implementation, we could document the abstraction function as follows:

module Set : SETSIG = struct
  type 'a set = 'a list
  (* Abstraction function: the list [a1;...;an] represents the
   * smallest set containing all of a1;...;an. The list may
   * contain duplicates. The empty list represents the empty set.
   *)
  ...

This comment explicitly points out that the list may contain duplicates, which is probably helpful as a reinforcement of the first sentence. Similarly, the case of an empty list is mentioned explicitly for clarity. The abstraction function for the second implementation, which does not allow duplicates, hints at an important difference: we can write the abstraction function for this second representation a bit more simply because we know that the elements are distinct:

module Set : SETSIG = struct
  type 'a set = 'a list
  (* Abstraction function: the list [a1;...;an] represents the set
   * {a1;...;an}. [] represents the empty set.
   *)
  ...

Another option for defining the abstraction function is to give pseudo-code defining it; for example, in the case of the first implementation of Set we might write:

(* Abstraction Function:
    AF([]) = {}
    AF(h::t) = {h} U AF(t)    (where "U" is mathematical set union)
*)

Using English is generally recommended because some programmers find formalism difficult and because of the potential for confusion when the notation of the implementation (OCaml code) meets the notation of the abstract domain (mathematics, in this case).
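Although English is recommended for the comment itself, the pseudo-code above is close enough to OCaml that it can be transcribed for experimentation. The sketch below does this for lists of integers, with the standard library's Set.Make (Int) standing in for mathematical sets, and then uses the transcription to check the two versions of add shown earlier on a few sample inputs. The names IntSet, af, add_dup, add_nodup, and add_ok are hypothetical, and restricting the elements to int is an assumption made to keep the example short.

```ocaml
(* Stand-in for the abstract space: mathematical sets of integers. *)
module IntSet = Set.Make (Int)

(* AF([]) = {}        AF(h::t) = {h} U AF(t) *)
let rec af (l : int list) : IntSet.t =
  match l with
  | [] -> IntSet.empty
  | h :: t -> IntSet.union (IntSet.singleton h) (af t)

(* The two versions of add from the implementations above. *)
let add_dup x l = x :: l
let add_nodup x l = if List.mem x l then l else x :: l

(* On a given input, add behaves correctly if the list it returns
   represents the set {x} U AF(l). *)
let add_ok add x l =
  IntSet.equal (af (add x l)) (IntSet.add x (af l))

let check_dup = add_ok add_dup 1 [1; 3]
let check_nodup_present = add_ok add_nodup 1 [1; 3]
let check_nodup_absent = add_ok add_nodup 2 [1; 3]
```

All three checks evaluate to true. They sample only a few inputs, so they illustrate how the abstraction function connects code to specification rather than establish correctness.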

In practice the words "Abstraction function" are usually omitted when practitioners write code. However, we will ask you to include them, because they are a useful reminder of what a comment like the ones above is meant to convey. Whenever you write code to implement what amounts to an abstract data type, you should write down the abstraction function explicitly, and certainly keep it in mind.