Modular programming is a valuable technique for building medium-sized and large programs because it allows a program to be broken up into loosely coupled pieces that can be developed largely in isolation. It facilitates local reasoning: the programmer(s) can think about the implementation of a piece of a program without full knowledge of the rest of the program. Rather, the rest of the program only needs to be understood abstractly, at the level of detail presented by the interfaces to the various modules on which the piece of code being worked on depends. This abstraction makes the programmer's job much easier; it is helpful even when there is only one programmer working on a moderately large program, and it is crucial when there is more than one programmer.
Because modules can be used only through their declared interfaces, the job of the module implementer is also made easier. The implementer has the flexibility to change the module as long as the module still satisfies its interface. The interface (signature) ensures that the module is loosely coupled to its clients. Loose coupling gives implementers and clients the freedom to work on their code mostly independently.
We have already talked about functional abstraction, in which a function hides the details of the computation it performs. Structures and signatures in SML provide a new kind of abstraction: data (or type) abstraction. A signature can declare a type without stating what that type is; such a type is known as an abstract type.
A data abstraction (or abstract data type, ADT) consists of an abstract type along with a set of operations and values. ML of course provides a number of built-in types with built-in operations; a data abstraction is in a sense a clean extension of the language to support a new kind of type. For example, record types have built-in projection operators, and datatypes have built-in constructors. For a data abstraction, the signature creates an abstraction barrier that prevents users from using its values in ways incompatible with the operations declared in the signature.
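As an illustration, here is a minimal sketch of a data abstraction in SML. The COUNTER signature and Counter structure are hypothetical names chosen for this example. The signature declares type t without defining it. Outside the structure, a counter can therefore only be created and inspected through zero, incr, and value.

```sml
(* Hypothetical signature: t is an abstract type, declared but not defined. *)
signature COUNTER =
sig
  type t
  val zero : t
  val incr : t -> t
  val value : t -> int
end

(* One possible implementation: a counter represented as an int. *)
structure Counter :> COUNTER =
struct
  type t = int
  val zero = 0
  fun incr (n : t) : t = n + 1
  fun value (n : t) : int = n
end

(* Clients go through the operations only. *)
val two = Counter.value (Counter.incr (Counter.incr Counter.zero))

(* Counter.zero + 1 would be rejected by the type checker:
   outside the structure, Counter.t is not known to be int. *)
```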
To successfully develop large programs, we need more than the ability to group related operations together. We also need the compiler to enforce the separation between modules, so that code in one module cannot come to depend on the internal details of another. Signatures are the mechanism in ML that enforces this separation.
A signature declares a set of types and values that any module implementing it must provide. It consists of type, datatype, exception, and val specifications. These specifications are a bit different from the declarations we have used so far: they specify only names and types, not values.
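The following sketch shows one specification of each kind in a hypothetical SHAPES signature, together with a structure that implements it. The names (point, shape, Degenerate, area) are invented for illustration. The val specification gives only the name and type of area; its code appears only in the structure.

```sml
(* Hypothetical signature with one specification of each kind. *)
signature SHAPES =
sig
  type point = real * real                 (* type specification *)
  datatype shape = Circle of point * real  (* datatype specification *)
                 | Rect of point * point
  exception Degenerate                     (* exception specification *)
  val area : shape -> real                 (* val specification: name and type only *)
end

structure Shapes :> SHAPES =
struct
  type point = real * real
  datatype shape = Circle of point * real
                 | Rect of point * point
  exception Degenerate

  (* Raise Degenerate for shapes with zero area. *)
  fun area (Circle (_, r)) =
        if r <= 0.0 then raise Degenerate else Math.pi * r * r
    | area (Rect ((x1, y1), (x2, y2))) =
        let val a = abs (x2 - x1) * abs (y2 - y1)
        in if a <= 0.0 then raise Degenerate else a end
end

val rectArea = Shapes.area (Shapes.Rect ((0.0, 0.0), (2.0, 3.0)))
```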
A signature specifies an "interface", what a particular module of code does, as opposed to an "implementation" of how a module operates. A signature for a stack might look something like the following:
```sml
signature STACK =
sig
  type 'a stack
  exception EmptyStack
  val empty : 'a stack
  val isEmpty : 'a stack -> bool
  val push : ('a * 'a stack) -> 'a stack
  val pop : 'a stack -> 'a * 'a stack
  val map : ('a -> 'b) -> 'a stack -> 'b stack
  val app : ('a -> unit) -> 'a stack -> unit
end
```
Note that this signature declares a parameterized stack type, an exception called EmptyStack, a constant called empty, and five functions that operate on stacks (isEmpty, push, pop, map, and app). Note also that this example declares the polymorphic stack type 'a stack but doesn't define it. By convention, signature names use all capital letters. A programmer can use stacks based on these declarations without seeing the implementation. Different data representations and corresponding code could implement this stack, for instance using lists or arrays.
A structure must be used to implement a signature.
A structure must implement all the specifications of its signature. It may implement more than what is in the signature, but once the structure is ascribed that signature, those additional definitions are accessible only inside the structure definition itself, not to users of the structure.
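Here is a small sketch of this rule, using a hypothetical GREETER signature. The structure defines a helper, exclaim, that the signature does not mention. The helper can be called inside the structure but not by clients.

```sml
(* Hypothetical signature with a single operation. *)
signature GREETER =
sig
  val greet : string -> string
end

structure Greeter :> GREETER =
struct
  (* Extra helper, not in GREETER: visible only inside this structure. *)
  fun exclaim (s : string) : string = s ^ "!"

  fun greet (name : string) : string = exclaim ("Hello, " ^ name)
end

val msg = Greeter.greet "world"

(* Greeter.exclaim "hi" would be rejected: exclaim is not
   part of the signature, so clients cannot refer to it. *)
```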
There are two ways of specifying that a structure, say Stack, implements a signature, say STACK: either by writing "Stack : STACK" or by writing "Stack :> STACK".
In both cases, the structure Stack is a valid implementation of the interface. The difference is that in the second case (when using :>), the concrete implementation type defined in the structure is not visible or accessible outside the structure itself. Users of this structure can only "see" the facts expressed in the signature; the actual implementation type is hidden from them. In this case, we say that the type abstracted by the structure is opaque.
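A minimal sketch makes the difference between the two forms of ascription visible. It uses a hypothetical ID signature and ascribes the same structure body transparently (T) and opaquely (O). Only with T can client code rely on the fact that the type is int.

```sml
(* Hypothetical signature with an abstract type. *)
signature ID =
sig
  type t
  val make : int -> t
end

(* Transparent ascription: clients can see that T.t = int. *)
structure T : ID =
struct
  type t = int
  fun make (n : int) : t = n
end

(* Opaque ascription: O.t is abstract outside the structure. *)
structure O :> ID =
struct
  type t = int
  fun make (n : int) : t = n
end

val ok = T.make 3 + 1   (* accepted: T.t is known to be int *)

(* val bad = O.make 3 + 1   would be rejected: O.t is not known to be int *)
```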
Here is the simplest implementation of stacks that matches the above signature. It is implemented in terms of lists.
```sml
structure Stack :> STACK =
struct
  type 'a stack = 'a list
  (* Must be named EmptyStack to match the signature. *)
  exception EmptyStack
  val empty : 'a stack = []
  fun isEmpty (l : 'a stack) : bool = List.null l
  fun push (x : 'a, l : 'a stack) : 'a stack = x :: l
  (* pop returns the top element together with the rest of the stack. *)
  fun pop (l : 'a stack) : 'a * 'a stack =
    case l of
      [] => raise EmptyStack
    | x :: xs => (x, xs)
  fun map (f : 'a -> 'b) (l : 'a stack) : 'b stack = List.map f l
  fun app (f : 'a -> unit) (l : 'a stack) : unit = List.app f l
end
```
There are several ways to refer to the elements of a structure. One is with fully qualified names: Stack.empty, Stack.push, Stack.pop, etc. Another is the open declaration, open Stack, which makes the names accessible without the need to specify the prefix.
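The sketch below illustrates both styles. To keep it runnable on its own, it uses a hypothetical trimmed-down signature, MINI_STACK, with a list-based Stack structure like the one above. It also wraps open in a local declaration, so the opened names are visible only between in and end. The values defined there remain available afterward.

```sml
(* Hypothetical trimmed-down stack signature, so this block runs on its own. *)
signature MINI_STACK =
sig
  type 'a stack
  exception EmptyStack
  val empty : 'a stack
  val push : 'a * 'a stack -> 'a stack
  val pop : 'a stack -> 'a * 'a stack
end

structure Stack :> MINI_STACK =
struct
  type 'a stack = 'a list
  exception EmptyStack
  val empty = []
  fun push (x, s) = x :: s
  fun pop [] = raise EmptyStack
    | pop (x :: xs) = (x, xs)
end

(* Style 1: fully qualified names. *)
val s1 = Stack.push (2, Stack.push (1, Stack.empty))
val (top1, _) = Stack.pop s1                 (* top1 = 2 *)

(* Style 2: open, scoped with local so that pop, push, and empty
   are not visible unqualified after the end keyword. *)
local
  open Stack
in
  val (top2, rest) = pop (push (3, empty))  (* top2 = 3, rest is empty *)
end

(* Popping the now-empty stack raises the exception from the signature. *)
val caught = (ignore (Stack.pop rest); false) handle Stack.EmptyStack => true
```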
The abstraction barrier prevents the clients of the Stack module from using their knowledge of how stacks are represented. In fact, the SML interpreter will not even print out values of an abstract type like 'a Stack.stack. Without the opaque signature, we can see what stacks really are:
```
- Stack.push(1, Stack.empty);
val it = [1] : int Stack.stack
```
Once the module is protected by its signature using :>, values of type 'a Stack.stack are printed only as a dash:
```
- Stack.push(1, Stack.empty);
val it = - : int Stack.stack
```
Hence, we cannot see how int Stack.stack is implemented. The use of signatures with opaque implementations ensures that programmers cannot depend on the implementation inadvertently. Implementers are free to change the implementation later without worrying about changing the code that uses it. Using a new implementation data structure cannot break the user's code, since the user could never access the internal implementation data structures.
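Here is a concrete sketch of a different representation behind a stack interface. It assumes a hypothetical trimmed-down MINI_STACK signature and stores the element count next to the element list in a datatype. The client function sumAll (also hypothetical) uses only operations from the signature. It would therefore compile unchanged against a list-based Stack matching the same signature.

```sml
(* Hypothetical trimmed-down stack signature. *)
signature MINI_STACK =
sig
  type 'a stack
  exception EmptyStack
  val empty : 'a stack
  val isEmpty : 'a stack -> bool
  val push : 'a * 'a stack -> 'a stack
  val pop : 'a stack -> 'a * 'a stack
end

(* Alternative representation: a datatype pairing the number of
   elements with the list of elements. Clients cannot observe this. *)
structure Stack :> MINI_STACK =
struct
  datatype 'a stack = S of int * 'a list
  exception EmptyStack
  val empty = S (0, [])
  fun isEmpty (S (n, _)) = n = 0
  fun push (x, S (n, xs)) = S (n + 1, x :: xs)
  fun pop (S (_, [])) = raise EmptyStack
    | pop (S (n, x :: xs)) = (x, S (n - 1, xs))
end

(* Client code: written only against the signature's operations. *)
fun sumAll (s : int Stack.stack) : int =
  if Stack.isEmpty s then 0
  else let val (x, rest) = Stack.pop s in x + sumAll rest end

val total = sumAll (Stack.push (3, Stack.push (2, Stack.push (1, Stack.empty))))
```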
It is incredibly tempting to write code without thinking about the distinction between the abstract signature, or interface, and the concrete implementation. Opaque data abstractions show their value in large programs (e.g., tens of thousands of lines of code). If clients of a data abstraction write code that depends on the internal structure of the ADT, then changing the implementation turns into a time-consuming and error-prone task of manually changing each use of the ADT in the application. With opaque types, as long as the signature stays the same, no changes outside the structure are required.
We have talked about what makes a specification good; a few comments about what makes an interface good are also in order. Obviously, an interface should contain good specifications of its components. In addition, a well-designed interface strikes a balance between simplicity and completeness. Sometimes it is better not to offer every possible operation that users might want, particularly if those users can efficiently construct the desired computation by using other operations. An interface that specifies many components is said to be wide; a small interface is narrow. Narrow interfaces are good because they provide a simpler, more flexible contract between the user and implementer. The user is less dependent on the details of the implementation, and the implementer has greater flexibility to change how the abstraction is implemented. Interfaces should be made as narrow as possible while still providing users with the operations they need to get the job done.
Modules and interfaces are supported in SML by structures and signatures, but they are also found, in different forms, in other modern programming languages. In Java, interfaces, classes, and packages facilitate modular programming. All three of these constructs can be thought of as providing interfaces in the more general sense that we are using in this course. The interface to a Java class or package consists of its public components. The Java approach is to use the javadoc tool to extract this interface into a readable form that is separate from the code. Because the interface consists of the public methods and classes, these are the program components that must be carefully specified.
The C language, on the other hand, works more like SML. Programmers write programs by writing source files (".c files") and header files (".h files"). Source files correspond to ML structures, and header files correspond to signatures. Header files may declare abstract (incomplete) types and function prototypes, much as SML signatures declare abstract types and the types of values. Therefore, the place to write function specifications in C (and in C++) is in header files.
Java-style interface extraction makes life a little easier for the implementer because a separate interface does not have to be written as in SML. However, automatic interface extraction is also dangerous, because changes to any public components of the class will implicitly change the interface and possibly break client code that depends on that interface. The discipline provided by explicit interfaces is useful in preventing these problems for larger programming projects.