Modular programming is a valuable technique for building medium-sized and large programs because it allows a program to be broken up into loosely coupled pieces that can be developed largely in isolation. It facilitates local reasoning: the programmer can think about the implementation of a piece of a program without full knowledge of the rest of the program. Rather, the rest of the program only needs to be understood abstractly, at the level of detail presented by the interfaces to the various modules on which the piece of code being worked on depends. This abstraction makes the programmer's job much easier; it is helpful even when there is only one programmer working on a moderately large program, and it is crucial when there is more than one programmer.
Because modules can be used only through their declared interfaces, the job of the module implementer is also made easier. The implementer has the flexibility to change the module as long as the module still satisfies its interface. The interface (signature) ensures that the module is loosely coupled to its clients. Loose coupling gives implementers and clients the freedom to work on their code mostly independently.
Suppose that an implementer provides a module defined by a structure Struct and a signature SIG:
signature SIG = sig
  type t
  val f1: t1
  val f2: t2
  ...
end
structure Struct :> SIG = struct
  type t = ...
  fun f1(...) = ...
end
The principle of modular programming is that we should be able to use Struct without knowing anything more about it than what is given in SIG. Therefore SIG needs to include the specification of all the functions (values) and types that it declares.
Furthermore, it is a bad idea to also put specifications in the structure Struct, because then there will be two copies of the same specification. A good rule of thumb for programming is to avoid copying code, because inevitably the copies diverge over time. Copying code is the fastest way to add new bugs to a program. Bug fixes and changes that are applied to one copy often don't make it into the others. Copying specifications creates the same kinds of problems. If the specification is included with the implementation, it's easy to forget to update the signature when the implementation is changed in a way that might break user code.
We have already talked about functional abstraction, in which a function hides the details of the computation it performs. Structures and signatures provide a new kind of abstraction: data (or type) abstraction. The signature SIG does not state what the type t is; that type is hidden outside the structure (e.g., Struct) that implements SIG. The type t is known as an abstract type.
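As a small illustration of an abstract type, here is a minimal sketch. The names COUNTER and Counter are hypothetical and chosen only for this example. The structure represents t as an int, but because the signature is ascribed opaquely with :>, clients cannot rely on that fact.

```sml
(* Hypothetical example: COUNTER and Counter are illustrative names only. *)
signature COUNTER = sig
  type t                  (* abstract: the representation is not revealed *)
  val zero: t
  val next: t -> t
  val toInt: t -> int
end

structure Counter :> COUNTER = struct
  type t = int            (* concrete representation, hidden from clients *)
  val zero: t = 0
  fun next(c: t): t = c + 1
  fun toInt(c: t): int = c
end

(* Clients can only combine the operations that the signature declares. *)
val two = Counter.toInt(Counter.next(Counter.next Counter.zero))

(* An expression such as Counter.zero + 1 would be rejected by the
 * type checker, because outside the structure Counter.t is not int. *)
```

The only way for a client to obtain an int from a Counter.t is through toInt, so the implementer remains free to choose a different representation later.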
A data abstraction (or abstract data type, ADT) consists of an abstract type along with a set of operations and values. ML of course provides a number of built-in types with built-in operations; a data abstraction is in a sense a clean extension of the language to support a new kind of type. For example, a record type has built-in projection operators, and datatypes have built-in constructors. For a data abstraction, its signature creates an abstraction barrier that prevents users from using its values in a way incompatible with the operations declared in the signature.
Suppose we want to develop a data abstraction for polynomials; that is, expressions of the form a + bx + cx^2 + dx^3 + ... + zx^n. We'd like to be able at least to create polynomials and to add, subtract, and multiply them. The name of the variable is not important, so all we need to keep track of is the finite sequence of coefficients a, b, c, etc.
The following signature POLYNOMIAL is an interface to a data abstraction for polynomials:
signature POLYNOMIAL = sig
  (* Overview: A poly is a polynomial with integer coefficients.
   * For example, 2 + 3*x - x^3.
   *)
  type poly

  (* zero is the polynomial 0 *)
  val zero: poly

  (* singleton(c,d) is the polynomial c*x^d.
   * Requires: d >= 0 *)
  val singleton: int*int -> poly

  (* degree(p) is the degree of the polynomial:
   * the largest exponent of the polynomial with
   * a nonzero coefficient *)
  val degree: poly -> int

  (* evaluate(p,x) is p evaluated at x *)
  val evaluate: poly*int -> int

  (* coeff(p,n) is the coefficient c of the term
   * of form c*x^n, or zero if there is no such term.
   * Requires: n >= 0 *)
  val coeff: poly*int -> int

  (* plus, minus, times are +, -, * on polynomials,
   * respectively *)
  val plus: poly*poly -> poly
  val minus: poly*poly -> poly
  val times: poly*poly -> poly
end
The type poly is an abstract type that may be implemented in different ways by different structures that implement this signature. By looking at the signature, we can't tell what poly is. The signature prevents clients from depending on the module in inappropriate ways, by hiding all the things they're not supposed to know about. The signature also acts like a defensive perimeter that prevents clients from constructing values of a declared type except through the operations provided. Thus, the signature is a contract between the implementer of the module and the clients of the module. As long as both sides abide by the contract (the implementer by providing all of the operations that the signature defines, and the client by using the module only in accordance with the signature), the two sides can work without stepping on one another's toes. The client doesn't need to see or think about the code that the implementer is writing, and the implementer doesn't have to think about the details of how clients are using the code.
This signature provides not only the types of the operations but also their specifications. As discussed earlier, the signature is the right place to put these specifications. There are two views of a data abstraction: the abstract view, which is the view from the standpoint of the user of the data abstraction, and the concrete view, which is the view of the implementer. The abstract view is presented by the module interface; the concrete view by the module implementation. A well-designed data abstraction can be used entirely from the abstract view, without knowing the concrete type that represents the abstract values, or the actual algorithm being used to implement the operations. Thus, the specifications that appear in the signature should always be written from the abstract view. Specifications written from the concrete view would violate the abstraction barrier.
The signature contains a new kind of specification: a data abstraction overview, introduced by a comment starting with "Overview:". The purpose of the overview is to give the abstract view of the values of the data abstraction. The overview does not provide any information about how values of the abstract type are represented. When writing a data abstraction overview, it is often useful to provide an example or two of abstract values. Examples are an opportunity to define notation for talking about the values of the abstract type; this notation can then be used when specifying the functions that are the operations of the abstract type.
The singleton and coeff operations are both partial functions because they are not defined for negative exponents, and hence have requires clauses. In the specifications for plus, minus, times, we rely on the reader's understanding of polynomials to avoid writing tedious specifications of the form, "plus(p,q) is p+q", etc. It is acceptable and even a good idea to rely on the reader's likely knowledge to avoid long specifications. However, as with all writing tasks, this requires a judgment about your likely reader. If that reader is yourself (perhaps at some time in the future), it is relatively easy to assess what will be comprehensible! But when writing code for a larger organization more care must be taken.
The right way to develop modules is to figure out the signature (interface) first, then write the structure (module implementation) to match the interface. This approach has two big advantages. First, a lot of design problems become evident when the signature is being written. It costs much less development time to get the design right before trying to implement the module. Second, code can be written using the interface even before the implementation is complete; the module client and module implementer can work in parallel, speeding up development. And because the interface is known by both parties, it is more likely that when they finish their work, the complete program will work as intended.
Choosing the right representation for a data abstraction is the first step in any implementation. The following is a simple representation of polynomials:
type poly = int list
The first item in the list will be the coefficient a, the second one b, and so on. The number of items in the list will tell us the degree of the polynomial. In addition, we will try to make sure that the list never ends in a trailing sequence of zeros, because that would be inefficient and also might mislead us about the degree of the polynomial. The empty list will represent the polynomial 0. This is just one of many reasonable ways to represent a polynomial.
Now we can start to implement the operations specified in the signature POLYNOMIAL. For example, the function degree:
fun degree(p: poly): int =
  case p of
    [] => 0
  | _ => length(p) - 1
How about polynomial addition?
fun plus(p: poly, q: poly): poly =
  case (p, q) of
    (nil, q) => q
  | (p, nil) => p
  | (a::p2, b::q2) => (a+b)::plus(p2,q2)
Actually this doesn't quite work. Why? Because the result might have trailing zeros if the leading terms of the two polynomials cancel each other out, causing the degree function to return the wrong result.
- plus([1,2], [1,~2]);
val it = [2,0]: poly
- degree(it);
val it = 1: int
We can avoid this by checking as follows:
fun plus(p: poly, q: poly): poly =
  case (p,q) of
    (nil,q) => q
  | (p, nil) => p
  | (a::p2, b::q2) =>
      (case (a+b)::plus(p2,q2) of
         [0] => []
       | r => r)
- plus([1,2], [1,~2]);
val it = [2]: poly
Here is more of the implementation:
structure Polynomial :> POLYNOMIAL = struct
  type poly = int list
  val zero: poly = []
  fun singleton(coeff: int, degree: int): poly =
    case (coeff, degree) of
      (0, _) => zero
    | (c, 0) => [c]
    | (c, d) => 0::singleton(c, d-1)
  fun degree(p: poly): int = length(p)-1
  fun plus(p: poly, q: poly): poly =
    case (p,q) of
      (nil,q) => q
    | (p, nil) => p
    | (a::p2, b::q2) =>
        (case (a+b)::plus(p2,q2) of
           [0] => []
         | r => r)
  fun evaluate(p: poly, x: int): int =
    case p of
      nil => 0
    | a::q => a + x*evaluate(q, x)
  ...
end
We can provide this module to other programmers and they can then create polynomials using Polynomial.zero and Polynomial.singleton and manipulate them with Polynomial.degree and Polynomial.plus. In fact, they don't even have to know that polynomials are really lists of integers.
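To make the client's view concrete, here is a self-contained sketch. It assumes a hypothetical signature POLY_CORE, a trimmed-down POLYNOMIAL that declares only the operations implemented above, so that the example can be compiled without guessing at the elided operations. The client code at the end builds the polynomial 2 + 3*x - x^3 from the overview using nothing but the interface.

```sml
(* POLY_CORE is a hypothetical, trimmed-down version of POLYNOMIAL that
 * declares only the operations implemented in the notes. *)
signature POLY_CORE = sig
  type poly
  val zero: poly
  val singleton: int*int -> poly
  val degree: poly -> int
  val evaluate: poly*int -> int
  val plus: poly*poly -> poly
end

structure Polynomial :> POLY_CORE = struct
  type poly = int list
  val zero: poly = []
  fun singleton(coeff: int, degree: int): poly =
    case (coeff, degree) of
      (0, _) => zero
    | (c, 0) => [c]
    | (c, d) => 0::singleton(c, d-1)
  fun degree(p: poly): int = length(p) - 1
  fun plus(p: poly, q: poly): poly =
    case (p, q) of
      (nil, q) => q
    | (p, nil) => p
    | (a::p2, b::q2) =>
        (case (a+b)::plus(p2, q2) of
           [0] => []
         | r => r)
  fun evaluate(p: poly, x: int): int =
    case p of
      nil => 0
    | a::q => a + x*evaluate(q, x)
end

(* Client code: builds 2 + 3*x - x^3 using only the interface. *)
val p = Polynomial.plus(Polynomial.singleton(2, 0),
          Polynomial.plus(Polynomial.singleton(3, 1),
                          Polynomial.singleton(~1, 3)))
val d = Polynomial.degree p          (* 3 *)
val v = Polynomial.evaluate(p, 2)    (* 2 + 3*2 - 2^3 = 0 *)
```

Nothing in the client code mentions lists, so it would compile unchanged against any other structure that satisfies the same signature.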
The abstraction barrier prevents the clients of the Polynomial module from using their knowledge of what poly is. In fact, the SML interpreter will not even print out values of a type like poly. Without the signature, we can see what poly values really are:
- Polynomial.zero;
val it = []: Polynomial.poly
Once the module is protected by its signature, values of the type poly are printed only as a dash:
- Polynomial.zero;
val it = - : Polynomial.poly
Without the abstraction barrier, users might get into trouble. For example, a client using the Polynomial structure might see that polynomials are really lists and write code like this:
let val z: Polynomial.poly = [2,3,4] in ... end
It looks convenient; what's wrong with it? Two things. First, this code depends on the actual type used to represent polynomials. An implementer cannot change between int list and another representation of polynomials without breaking this code; therefore we've lost loose coupling. Second, there is nothing that prevents the client from constructing lists that violate our no-trailing-zeros condition. The operations defined on polynomials will not work properly if polynomials are constructed out of such lists. In general, a misbehaving client could cause the program to give wrong answers or even crash with an exception in a module that another programmer wrote! This is bad because it makes it hard to assign blame for bugs.
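The second problem can be seen in a short sketch. RawPoly is a hypothetical structure with no signature ascribed to it, so its representation is visible to clients; it copies only the type and the degree function from the list implementation.

```sml
(* RawPoly is a hypothetical unsealed structure: nothing hides int list. *)
structure RawPoly = struct
  type poly = int list
  fun degree(p: poly): int = length(p) - 1
end

(* The client bypasses the operations and builds a list with trailing
 * zeros, violating the implementer's representation condition. *)
val bad: RawPoly.poly = [2, 3, 0, 0]    (* intended to mean 2 + 3*x *)
val claimed = RawPoly.degree bad         (* 3, although the true degree is 1 *)
```

The wrong answer comes out of degree, which is the implementer's code, even though the mistake was made by the client.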
The abstraction barrier gives the implementer the freedom to change what the poly type is bound to and correspondingly change the implementation of degree, plus, zero, etc. to match. For example, the implementer might decide to use the SML vector type instead of list, resulting in a more efficient implementation of polynomials:
structure Polynomial = struct
  type poly = int vector
  val zero: poly = Vector.fromList([])
  fun singleton(coeff: int, degree: int): poly =
    case (coeff, degree) of
      (0, _) => zero
    | (c, d) => Vector.tabulate(d+1, fn(n: int) => if n=d then c else 0)
  fun degree(p: poly): int = (Vector.length p) - 1
  ...
end
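The following sketch shows this freedom in action under some stated assumptions. POLY_MINI is a hypothetical signature containing only zero, singleton, and degree, and ListPoly and VecPoly are hypothetical names for the list-based and vector-based implementations given in these notes, both sealed with that signature. The same client expression works against either one.

```sml
(* POLY_MINI, ListPoly, and VecPoly are hypothetical names for this sketch. *)
signature POLY_MINI = sig
  type poly
  val zero: poly
  val singleton: int*int -> poly
  val degree: poly -> int
end

(* List-based representation, as in the first implementation. *)
structure ListPoly :> POLY_MINI = struct
  type poly = int list
  val zero: poly = []
  fun singleton(coeff: int, degree: int): poly =
    case (coeff, degree) of
      (0, _) => zero
    | (c, 0) => [c]
    | (c, d) => 0::singleton(c, d-1)
  fun degree(p: poly): int = length(p) - 1
end

(* Vector-based representation, as in the second implementation. *)
structure VecPoly :> POLY_MINI = struct
  type poly = int vector
  val zero: poly = Vector.fromList []
  fun singleton(coeff: int, degree: int): poly =
    case (coeff, degree) of
      (0, _) => zero
    | (c, d) => Vector.tabulate(d+1, fn (n: int) => if n = d then c else 0)
  fun degree(p: poly): int = Vector.length p - 1
end

(* The same client computation, written against each implementation. *)
val dList = ListPoly.degree(ListPoly.singleton(5, 4))   (* 4 *)
val dVec  = VecPoly.degree(VecPoly.singleton(5, 4))     (* 4 *)
```

Because both structures are sealed with the same signature, a client that switches from one to the other only has to change the structure name.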
We have talked about what makes a specification good; a few comments about what makes an interface good are also in order. Obviously, an interface should contain good specifications of its components. In addition, a well-designed interface strikes a balance between simplicity and completeness. Sometimes it is better not to offer every possible operation that the users might want, particularly if those users can efficiently construct the desired computation by using other operations. An interface that specifies many components is said to be wide; a small interface is narrow. Narrow interfaces are good because they provide a simpler, more flexible contract between the user and implementer. The user is less dependent on the details of the implementation, and the implementer has greater flexibility to change how the abstraction is implemented. Interfaces should be made as narrow as possible while still providing users with the operations they need to get the job done.
Modules and interfaces are supported in SML by structures and signatures, but they are also found in other modern programming languages in different form. In Java, interfaces, classes, and packages facilitate modular programming. All three of these constructs can be thought to provide interfaces in the more general sense that we are using in this course. The interface to a Java class or package consists of its public components. The Java approach is to use the javadoc tool to extract this interface into a readable form that is separate from the code. Because the interface consists of the public methods and classes, these are the program components that must be carefully specified.
The C language, on the other hand, works more like SML. Programmers write programs by writing source files (".c files") and header files (".h files"). Source files correspond to ML structures and header files correspond to signatures. Header files may declare abstract types and function types, just like in SML. Therefore, the place to write function specifications in C (and in C++) is in header files.
Java-style interface extraction makes life a little easier for the implementer because a separate interface does not have to be written as in SML. However, automatic interface extraction is also dangerous, because changes to any public components of the class will implicitly change the interface and possibly break client code that depends on that interface. The discipline provided by explicit interfaces is useful in preventing these problems for larger programming projects.