Lecture 7
Modules and data abstractions

Modular programming

Modular programming is a valuable technique for building medium-sized and large programs because it allows a program to be broken up into loosely coupled pieces that can be developed largely in isolation. It facilitates local reasoning: the programmer(s) can think about the implementation of a piece of a program without full knowledge of the rest of the program. Rather, the rest of the program only needs to be understood abstractly, at the level of detail presented by the interfaces to the various modules on which the piece of code being worked on depends. This abstraction makes the programmer's job much easier; it is helpful even when there is only one programmer working on a moderately large program, and it is crucial when there is more than one programmer.

Because modules can be used only through their declared interfaces, the job of the module implementer is also made easier. The implementer has the flexibility to change the module as long as the module still satisfies its interface. The interface (signature) ensures that the module is loosely coupled to its clients. Loose coupling gives implementers and clients the freedom to work on their code mostly independently.

Data abstractions

We have already talked about functional abstraction, in which a function hides the details of the computation it performs. Structures and signatures in SML provide a new kind of abstraction: data (or type) abstraction. A signature can declare a type by name without stating what the type is; such a type is known as an abstract type.

A data abstraction (or abstract data type, ADT) consists of an abstract type along with a set of operations and values. ML of course provides a number of built-in types with built-in operations; a data abstraction is in a sense a clean extension of the language to support a new kind of type. For example, a record type has built-in projection operators, and datatypes have built-in constructors. For a data abstraction, its signature creates an abstraction barrier that prevents users from using its values in a way incompatible with the operations declared in the signature.

Signatures and Structures

To successfully develop large programs, we need more than the ability to group related operations together. We need to be able to use the compiler to enforce the separation between different modules, so that code outside a module cannot depend on or interfere with the module's internal details. Signatures are the mechanism in ML that enforces this separation.

A signature declares a set of types and values that any module implementing it must provide. It consists of type, datatype, exception and val specifications. These specifications differ from the declarations we have seen so far: they give only names and types, not values.

A signature specifies an "interface" (what a particular module of code does), as opposed to an "implementation" (how the module does it). A signature for a stack might look something like the following:

  signature STACK = 
    sig
      type 'a stack
      exception EmptyStack

      val empty : 'a stack
      val isEmpty : 'a stack -> bool
      val push : ('a * 'a stack) -> 'a stack
      val pop : 'a stack -> 'a * 'a stack
      val map : ('a -> 'b) -> 'a stack -> 'b stack
      val app :  ('a -> unit) -> 'a stack -> unit
    end

Note that this signature declares a parameterized stack type, an exception called EmptyStack, a constant called empty, and five functions that operate on stacks (isEmpty, push, pop, map, and app). It declares the polymorphic stack type 'a stack but does not define it. By convention, signature names use all capital letters. A programmer can use stacks based on these declarations without seeing the implementation. Different data representations, with corresponding code, could implement this stack, for instance lists or arrays.

A structure must be used to implement a signature. A structure must implement all the specifications of its signature. It may implement more than what is in the signature, but those additional definitions are accessible only inside the structure definition itself, not to users of the structure.
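As a minimal sketch of a structure that implements more than its signature asks for, consider the following. The names GREETER, Greeter, exclaim, and greet are hypothetical and chosen only for illustration; they are not part of the stack example.

```sml
signature GREETER =
  sig
    val greet : string -> string
  end

structure Greeter :> GREETER =
  struct
    (* Helper: not listed in GREETER, so it is usable only inside
       this structure. *)
    fun exclaim (s: string): string = s ^ "!"

    (* The one operation the signature exposes; it may use the helper. *)
    fun greet (name: string): string = exclaim ("Hello, " ^ name)
  end

(* Clients may call greet. *)
val msg = Greeter.greet "world"

(* A client reference to Greeter.exclaim would be rejected by the
   compiler, because exclaim does not appear in GREETER. *)
```

Here msg is bound to "Hello, world!", while exclaim remains an internal detail that the implementer can rename or remove without affecting any client.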

There are two ways of specifying that a structure, say Stack, implements a signature, say STACK: either by writing "Stack : STACK", or by writing "Stack :> STACK". In both cases, the compiler checks that the structure Stack is a valid implementation of the interface. The difference is that in the second case (when using :>) the concrete implementation type defined in the structure is not visible or accessible outside of the structure itself. The users of this structure can only "see" the facts expressed in the signature; the actual implementation type is hidden from them. In this case, we say that the type abstracted by the structure is opaque.
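The difference between the two forms can be seen in a small sketch. The signature COUNTER and the structures CounterImpl, TransparentCounter, and OpaqueCounter are hypothetical names introduced here only to illustrate the point.

```sml
signature COUNTER =
  sig
    type counter
    val zero : counter
    val incr : counter -> counter
    val value : counter -> int
  end

(* One implementation, ascribed to COUNTER in two different ways below. *)
structure CounterImpl =
  struct
    type counter = int
    val zero = 0
    fun incr (c: counter): counter = c + 1
    fun value (c: counter): int = c
  end

structure TransparentCounter : COUNTER = CounterImpl
structure OpaqueCounter :> COUNTER = CounterImpl

(* Accepted: with ":", clients can still see that counter is int,
   so they can do integer arithmetic on a counter directly. *)
val leaked : int = TransparentCounter.incr TransparentCounter.zero + 10

(* Rejected if uncommented: with ":>", counter is abstract, so the
   type checker does not know it is int.
   val bad = OpaqueCounter.incr OpaqueCounter.zero + 10 *)

(* Accepted: going through the operations the signature provides. *)
val viaInterface : int =
  OpaqueCounter.value (OpaqueCounter.incr OpaqueCounter.zero)
```

With ":" the client code that computes leaked would break if counter were later changed to some other representation; with ":>" such code cannot be written in the first place.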

Here is the simplest implementation of stacks that matches the above signature. It is implemented in terms of lists.

  structure Stack :> STACK = 
    struct
      type 'a stack = 'a list
      exception EmptyStack

      val empty : 'a stack = []

      fun isEmpty (l: 'a stack): bool =
         List.null l

      fun push (x: 'a, l: 'a stack): 'a stack =
         x::l

      (* Returns the top element paired with the rest of the stack. *)
      fun pop (l: 'a stack): 'a * 'a stack =
         case l of
           [] => raise EmptyStack
         | x::xs => (x, xs)

      fun map (f:'a -> 'b) (l: 'a stack): 'b stack = List.map f l

      fun app (f:'a -> unit) (l: 'a stack): unit = List.app f l
    end

There are several ways to refer to the elements of a structure. One is with fully qualified names: Stack.empty, Stack.push, Stack.pop, etc. Another is by using the open declaration, open Stack, which makes the names accessible without the need to specify the prefix.
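Both styles can be sketched on a small structure. Shapes, sides, and area are hypothetical names used only for this illustration.

```sml
structure Shapes =
  struct
    val sides = 4
    fun area (w: int, h: int): int = w * h
  end

(* Fully qualified name: the prefix says where area comes from. *)
val a1 = Shapes.area (2, 3)

(* open inside a let: the names of Shapes are in scope without a
   prefix, but only within the body of the let. *)
val a2 = let open Shapes in area (sides, sides) end
```

Opening a structure at top level is convenient but can silently shadow existing names. For instance, opening a structure that matches STACK would rebind map and app, so limiting open to a let or local block is often the safer choice.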

The abstraction barrier

The abstraction barrier prevents the clients of the Stack module from using their knowledge of what stack is. In fact, the SML interpreter will not even print out values of an abstract type like stack. Without the signature, we can see what stacks really are:

- Stack.push(1, Stack.empty);
val it = [1]: int Stack.stack

Once the module is protected by its signature, values of the type stack are printed only as a dash:

- Stack.push(1, Stack.empty);
val it = - : int Stack.stack

Hence, we cannot see how int Stack.stack is implemented. The use of signatures with opaque implementations ensures that client programmers cannot depend on the implementation inadvertently. Implementers are therefore free to change the implementation later, without worrying about changing the code that uses it. Using a new implementation data structure cannot break the user's code, since the user could never access the internal implementation data structures.
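To illustrate swapping implementations, here is a sketch of a second structure matching STACK, built on a datatype instead of a list. The signature is repeated so that the block can be compiled on its own; DStack, S, and sum are hypothetical names, not part of the original notes.

```sml
signature STACK =
  sig
    type 'a stack
    exception EmptyStack

    val empty : 'a stack
    val isEmpty : 'a stack -> bool
    val push : ('a * 'a stack) -> 'a stack
    val pop : 'a stack -> 'a * 'a stack
    val map : ('a -> 'b) -> 'a stack -> 'b stack
    val app : ('a -> unit) -> 'a stack -> unit
  end

(* Alternative representation: a dedicated datatype rather than 'a list. *)
structure DStack :> STACK =
  struct
    datatype 'a stack = Nil | Cons of 'a * 'a stack
    exception EmptyStack

    val empty = Nil

    fun isEmpty Nil = true
      | isEmpty (Cons _) = false

    fun push (x, s) = Cons (x, s)

    fun pop Nil = raise EmptyStack
      | pop (Cons (x, s)) = (x, s)

    fun map f Nil = Nil
      | map f (Cons (x, s)) = Cons (f x, map f s)

    fun app f Nil = ()
      | app f (Cons (x, s)) = (f x; app f s)
  end

(* The client picks an implementation in one place. *)
structure S = DStack

(* Client code uses only operations declared in STACK. *)
fun sum (s: int S.stack): int =
  if S.isEmpty s then 0
  else
    let val (x, rest) = S.pop s
    in x + sum rest
    end

val total = sum (S.push (1, S.push (2, S.push (3, S.empty))))
```

Because sum touches stacks only through the signature, binding S to any other structure that opaquely matches STACK, such as a list-based one, should leave sum and total unchanged. Note also that the constructors Nil and Cons are hidden from clients, since STACK declares stack with a type specification rather than a datatype specification.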

It is often tempting to write code without thinking about the distinction between the abstract signature or interface and the concrete implementation. Opaque data abstractions show their value in large programs (e.g., tens of thousands of lines of code). If the data abstraction clients write code that depends on the internal structure of the ADT, then changing the implementation will turn into a time-consuming and error-prone task of manually changing each use of the ADT in the application. With opaque types, no changes outside the structure are required.

Designing interfaces

We have talked about what makes a specification good; a few comments about what makes an interface good are also in order. Obviously, an interface should contain good specifications of its components. In addition, a well designed interface strikes a balance between simplicity and completeness. Sometimes it is better not to offer every possible operation that the users might want, particularly if those users can efficiently construct the desired computation by using other operations. An interface that specifies many components is said to be wide; a small interface is narrow. Narrow interfaces are good because they provide a simpler, more flexible contract between the user and implementer. The user is less dependent on the details of the implementation, and the implementer has greater flexibility to change how the abstraction is implemented. Interfaces should be made as narrow as possible while still providing users with the operations they need to get the job done.
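As a sketch of how users can build missing operations on top of a narrow interface, the following hypothetical signature MINI_STACK offers only empty, push, and pop, and the client derives top and size from them. All names in this block are assumptions made for illustration.

```sml
(* A deliberately narrow interface: one constant and two operations. *)
signature MINI_STACK =
  sig
    type 'a stack
    val empty : 'a stack
    val push : 'a * 'a stack -> 'a stack
    val pop : 'a stack -> ('a * 'a stack) option
  end

structure MiniStack :> MINI_STACK =
  struct
    type 'a stack = 'a list
    val empty = []
    fun push (x, s) = x :: s
    fun pop [] = NONE
      | pop (x :: s) = SOME (x, s)
  end

(* Client-side helper: look at the top element without removing it. *)
fun top (s: 'a MiniStack.stack): 'a option =
  case MiniStack.pop s of
    NONE => NONE
  | SOME (x, _) => SOME x

(* Client-side helper: count elements by popping until empty. *)
fun size (s: 'a MiniStack.stack): int =
  case MiniStack.pop s of
    NONE => 0
  | SOME (_, rest) => 1 + size rest

val s = MiniStack.push (10, MiniStack.push (20, MiniStack.empty))
val t = top s
val n = size s
```

Here top costs about the same as a built-in version would, so leaving it out of the interface loses little. The derived size is linear in the number of elements; if clients needed it often, that might be a reason to widen the interface so the implementer could provide it more efficiently.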

Modules in other languages

Modules and interfaces are supported in SML by structures and signatures, but they are also found in other modern programming languages in different forms. In Java, interfaces, classes, and packages facilitate modular programming. All three of these constructs can be thought of as providing interfaces in the more general sense that we are using in this course. The interface to a Java class or package consists of its public components. The Java approach is to use the javadoc tool to extract this interface into a readable form that is separate from the code. Because the interface consists of the public methods and classes, these are the program components that must be carefully specified.

The C language, on the other hand, works more like SML. Programmers write programs by writing source files (".c files") and header files (".h files"). Source files correspond to ML structures and header files correspond to signatures. Header files may declare abstract types and function types, just like in SML. Therefore, the place to write function specifications in C (and in C++) is in header files.

Java-style interface extraction makes life a little easier for the implementer because a separate interface does not have to be written as in SML. However, automatic interface extraction is also dangerous, because changes to any public components of the class will implicitly change the interface and possibly break client code that depends on that interface. The discipline provided by explicit interfaces is useful in preventing these problems for larger programming projects.