# Abstraction and Specification
* * *
* abstraction by specification
* specification of functions
* * *
A *specification* is a contract between a *client* of some unit of code
and the *implementer* of that code. The most common place we find
specifications is as comments in the interface (`.mli`) files for a
module. There, the implementer of the module spells out what the client
may and may not assume about the module's behavior. This contract makes
it clear who to blame if something goes wrong: Did the client misuse
the module? Or did the implementer fail to deliver the promised behavior?

Specifications usually involve preconditions and postconditions.
The preconditions state what the client must guarantee about inputs
they pass in, and what the implementer may assume about those inputs.
The postconditions state what the client may assume about outputs
they receive, and what the implementer must guarantee about those outputs.

An implementation *satisfies* a specification if it provides the behavior
described by the specification. There may be many feasible implementations
of a given specification. The client may not assume anything
about which of those implementations is actually provided. The implementer,
on the other hand, gets to provide one of their choice.
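
To make the idea of several implementations satisfying one specification concrete, here is a small sketch. The function name `dedup` and both implementations are hypothetical, invented for illustration. The spec deliberately leaves the order of the result unspecified, so both versions satisfy it even though they return different lists for the same input.

```ocaml
(* Hypothetical spec:
   [dedup lst] is a list containing each element of [lst] exactly once.
   The order of the elements in the result is unspecified. *)

(* Implementation A: sort the list and drop adjacent duplicates. *)
let dedup_sorted lst = List.sort_uniq compare lst

(* Implementation B: keep the first occurrence of each element,
   preserving the original order. Quadratic time, but simple. *)
let dedup_first lst =
  let rec go seen = function
    | [] -> List.rev seen
    | x :: xs -> if List.mem x seen then go seen xs else go (x :: seen) xs
  in
  go [] lst
```

A client that relied on the result being sorted would be assuming more than the spec promises: its code would work with `dedup_sorted` and break if the implementer switched to `dedup_first`.
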

Good specifications have to balance two conflicting goals. They must be:

* **sufficiently restrictive**, ruling out implementations that
  would be useless to clients, as well as
* **sufficiently general**, not ruling out implementations that
  would be useful to clients.

Some common mistakes include not stating enough in preconditions, failing to
identify when exceptions will be raised, failing to specify behavior at
boundary cases, writing operational specifications instead of definitional
ones, and stating too much in postconditions.

Writing good specifications is a skill that you will work to master for the
rest of your career. It's hard because the language and compiler do
nothing to check the correctness of a specification: there's no type
system for them, no warnings, etc. (There is, though, ongoing research on
how to improve specifications and the writing of them.) The
specifications you write will be read by other people, and with that
reading can come misunderstanding. Reading specifications requires close
attention to detail.

Specifications should be written quite early. As soon as a design decision
is made, document it in a specification. Specifications should continue
to be updated throughout implementation. A specification becomes obsolete
only when the code it specifies becomes obsolete and is removed from the
code base.

Clear specifications serve many important functions in software
development teams. One of the most important is that when something goes
wrong, everyone can agree on whose job it is to fix the problem: either the
implementer has not met the specification and needs to fix the
implementation, or the client has written code that assumes something
not guaranteed by the spec, and therefore needs to fix the client code.
Or, perhaps the spec is wrong, and then the client and implementer need
to decide on a new spec. This ability to decide whose problem a bug is
prevents problems from slipping through the cracks.

The client should not assume more about the implementation than the spec
guarantees; that restraint is what allows the implementation to change. The
specification forms an *abstraction barrier* that protects the
implementer from the client and vice versa. Making assumptions about the
implementation that are not guaranteed by the specification is known as
*violating the abstraction barrier*. The abstraction barrier enforces
local reasoning. Further, it promotes *loose coupling* between
different code modules: if one module changes, other modules are less
likely to have to change to match.
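
The OCaml module system can enforce part of an abstraction barrier. In this hypothetical `COUNTER` example (all names are assumptions for illustration), the signature plays the role of the specification and declares `t` abstract, so code outside `Counter` cannot depend on the fact that a counter happens to be represented as an `int`.

```ocaml
(* A hypothetical counter abstraction. The signature is the spec;
   the type [t] is abstract, so clients cannot see its representation. *)
module type COUNTER = sig
  type t

  (* [zero] is a counter whose value is 0. *)
  val zero : t

  (* [incr c] is a counter whose value is one more than the value of [c]. *)
  val incr : t -> t

  (* [value c] is the value of [c]. *)
  val value : t -> int
end

module Counter : COUNTER = struct
  (* Representation: the count itself. The signature hides this choice,
     so it could change without affecting clients. *)
  type t = int
  let zero = 0
  let incr c = c + 1
  let value c = c
end

(* A client that uses only the operations in the signature. *)
let two = Counter.(value (incr (incr zero)))
```

A client expression such as `Counter.zero + 1` is rejected by the type checker, because outside the module `Counter.t` is not known to be `int`. The compiler checks only this representation-hiding aspect, though; whether `incr` really adds one is still stated only in the comments.
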
## Abstraction by specification
Abstraction enables modular programming
by hiding the details of implementations. Specifications are a part
of that kind of abstraction: they reveal certain information about
the behavior of a module without disclosing all the details of the
module's implementation.

*Locality* is one of the benefits of abstraction by specification.
A module can be understood without needing to examine its implementation.
This locality is critical in implementing large programs, and even
in implementing smaller programs in teams. No one person can keep the entire
system in their head at a time.

*Modifiability* is another benefit. Modules can be reimplemented
without changing the implementation of other modules or functions.
Software libraries depend upon this to improve their functionality
without forcing all their clients to rewrite code every time the library
is upgraded. Modifiability also enables performance enhancements: we
can write simple, slow implementations first, then improve bottlenecks
as necessary.
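
As a sketch of that workflow, assuming a hypothetical `FIB` signature: both modules below implement the same specification, so a client written against `FIB` can switch from the slow first version to the faster one without any change.

```ocaml
(* Hypothetical spec: [fib n] is the [n]th Fibonacci number,
   where [fib 0 = 0] and [fib 1 = 1].
   requires: [n >= 0] *)
module type FIB = sig
  val fib : int -> int
end

(* First version: a direct transcription of the definition.
   Simple, but takes exponential time. *)
module SlowFib : FIB = struct
  let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)
end

(* Later version: iterative, linear time. Same spec, so clients
   that depend only on [FIB] need not change. *)
module FastFib : FIB = struct
  let fib n =
    (* [a] and [b] hold fib i and fib (i + 1). *)
    let rec loop i a b = if i = n then a else loop (i + 1) b (a + b) in
    loop 0 0 1
end
```
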
## Specifications for functions
A specification is written for humans to read, not machines. Specs can
take time to write well, and it is time well spent. The main goal is
clarity. It is also important to be concise, because client programmers
will not always take the effort to read a long spec. As with anything we
write, we need to be aware of our audience when writing specifications.
Some readers may need a more verbose specification than others.

A well-written specification usually has several parts communicating
different kinds of information about the thing specified. If we know
what the usual ingredients of a specification are, we are less likely to
forget to write down something important. Let's now look at a recipe for
writing specifications.

How might we add a specification to `sqr`, assuming that it is a
square-root function? First, we need to describe its result. We will
call this description the *returns clause* because it is a part of the
specification that describes the result of a function call. It is also
known as a *postcondition*: it describes a condition that holds
after the function is called. Here is an example of a returns clause:

```ocaml
(* returns: [sqr(x)] is the square root of [x]. *)
```

For numerical programming, we should probably add some information about
how accurate it is:

```ocaml
(* returns: [sqr(x)] is the square root of [x].
 * Its relative accuracy is no worse than 1.0x10^-6. *)
```
Similarly, we might write a returns clause for a `find` function. It
is okay to leave the introductory "`returns:`" implicit:

```ocaml
(* [find lst x] is the index of [x] in [lst], starting from zero. *)
```

A good specification is concise but clear: it should say enough that
the reader understands what the function does, but without extra
verbiage to plow through and possibly cause the reader to miss the
point. Sometimes there is a balance to be struck between brevity and
clarity.

These two specifications use a useful trick to make them more concise:
they talk about the result of applying the function being specified to
some arbitrary arguments. Implicitly we understand that the stated
postcondition holds for all possible values of any unbound variables
(the argument variables).
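
Because a returns clause is stated for all arguments, it can at least be spot-checked on sample inputs. The sketch below assumes a hypothetical Newton's-method implementation of `sqr` and a hypothetical helper `meets_returns_clause` that compares its result with the standard library's `Float.sqrt` against the stated relative accuracy. Checking a few nonnegative samples like this can expose a violation, but it cannot prove that the postcondition holds everywhere.

```ocaml
(* returns: [sqr(x)] is the square root of [x].
 * Its relative accuracy is no worse than 1.0x10^-6. *)
let sqr x =
  (* Newton's method for g * g = x: repeatedly replace [g] by the
     average of [g] and [x /. g]. Starting from [max x 1.0], which is
     at least sqrt x, the guesses decrease toward the root. *)
  let rec newton g steps =
    if steps = 0 then g else newton ((g +. x /. g) /. 2.0) (steps - 1)
  in
  if x = 0.0 then 0.0 else newton (Float.max x 1.0) 100

(* [meets_returns_clause x] checks the returns clause of [sqr] at the
   single input [x], using [Float.sqrt] as the reference value. *)
let meets_returns_clause x =
  let expected = Float.sqrt x in
  let actual = sqr x in
  if expected = 0.0 then actual = 0.0
  else Float.abs (actual -. expected) /. expected <= 1e-6
```
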

The specification for `sqr` doesn't completely make sense because the
square root does not exist for some `x` of type `float`. The mathematical
square root function is a *partial* function that is defined over only
part of its domain. A good function specification is complete with
respect to the possible inputs; it provides the client with an
understanding of what inputs are allowed and what the results will be
for allowed inputs.

We have several ways to deal with partial functions. A straightforward
approach is to restrict the domain so that it is clear the function
cannot be legitimately used on some inputs. The specification rules out
bad inputs with a *requires clause* establishing when the function may
be called. This clause is also called a *precondition* because it
describes a condition that must hold before the function is called.
Here is a requires clause for `sqr`:

```ocaml
(* [sqr(x)] is the square root of [x].
 * Its relative accuracy is no worse than 1.0x10^-6.
 * requires: [x >= 0] *)
```
This specification doesn't say what happens when `x < 0`, nor does it
have to. Remember that the specification is a contract. This contract
happens to push the burden of showing that the square root exists onto
the client. If the requires clause is not satisfied, the implementation is
permitted to do anything it likes: for example, go into an infinite loop
or throw an exception. The advantage of this approach is that the
implementer is free to design an algorithm without the constraint of
having to check for invalid input parameters, which can be tedious and
slow down the program. The disadvantage is that it may be difficult to
debug if the function is called improperly, because the function can
misbehave and the client has no understanding of how it might misbehave.
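
One way to soften that disadvantage, sketched here as an optional practice rather than a requirement of the spec, is to check the precondition with OCaml's `assert`. A failed assertion raises `Assert_failure`, which is one of the "anything it likes" behaviors the spec permits, and such assertions are removed when compiling with the compiler's `-noassert` flag. The body delegating to `Float.sqrt` is an assumed implementation.

```ocaml
(* [sqr(x)] is the square root of [x].
 * Its relative accuracy is no worse than 1.0x10^-6.
 * requires: [x >= 0] *)
let sqr x =
  (* Debugging aid only: clients may not rely on this check, and it
     is removed when compiling with -noassert. *)
  assert (x >= 0.0);
  Float.sqrt x
```

Without the assertion, this particular implementation would silently return `nan` for a negative input, which the spec equally allows.
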

Another way to deal with partial functions is to convert them into
total functions (functions defined over their entire domain). This
approach is arguably easier for the client to deal with because the
function's behavior is always defined; it has no precondition. However,
it pushes work onto the implementer and may lead to a slower
implementation.

How can we convert `sqr` into a total function? One approach that is
(too) often followed is to define some value that is returned in the
cases that the requires clause would have ruled out; for example:

```ocaml
(* [sqr(x)] is the square root of [x] if [x >= 0],
 * with relative accuracy no worse than 1.0x10^-6.
 * Otherwise, a negative number is returned. *)
```
This practice is not recommended because it tends to encourage broken,
hard-to-read client code. Almost any correct client of this abstraction will
write code like this if the precondition cannot be argued to hold:
```ocaml
if sqr(a) < 0.0 then ... else ...
```

The error must still be handled in the `if` expression, so
the job of the client of this abstraction isn't any easier than with a
requires clause: the client still needs to wrap an explicit test around
the call in cases where it might fail. If the test is omitted, the
compiler won't complain, and the negative number result will be silently
treated as if it were a valid square root, likely causing errors later
during program execution. This coding style has been the source of
innumerable bugs and security problems in the Unix operating system and
its descendants (e.g., Linux).

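
The hazard can be seen in a small sketch. The names `sqr_or_neg`, `side_length`, and `perimeter` are hypothetical; `sqr_or_neg` follows the sentinel-style spec above, and the client forgets to test for the negative result.

```ocaml
(* [sqr_or_neg x] is the square root of [x] if [x >= 0],
   and a negative number otherwise (hypothetical sentinel-style spec). *)
let sqr_or_neg x = if x < 0.0 then -1.0 else Float.sqrt x

(* A client that forgets the [< 0.0] test: the sentinel flows onward
   as if it were a genuine square root, and the type checker is silent. *)
let side_length area = sqr_or_neg area

let perimeter area = 4.0 *. side_length area
```

Here `perimeter (-9.0)` quietly evaluates to `-4.0`, a nonsensical value that may only cause visible trouble much later.
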
A better way to make functions total is to have them raise an exception
when the expected input condition is not met. Exceptions avoid
the necessity of distracting error-handling logic in the client's code. If
the function is to be total, the specification must say what exception
is raised and when. For example, we
might make our square root function total as follows:
```ocaml
exception Negative

(* [sqr(x)] is the square root of [x]
 * with relative accuracy no worse than 1.0x10^-6.
 * raises: [Negative] if [x < 0]. *)
let sqr x = ...
```
Note that the implementation of this `sqr` function must check whether
`x>=0`, even in the production version of the code, because some client
may be relying on the exception to be raised.
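
A self-contained sketch of the exception-raising version follows. The body that delegates to the standard library's `Float.sqrt` is an assumption, and `describe` is a hypothetical client that cannot guarantee `x >= 0` and so handles `Negative` explicitly.

```ocaml
exception Negative

(* [sqr(x)] is the square root of [x]
 * with relative accuracy no worse than 1.0x10^-6.
 * raises: [Negative] if [x < 0]. *)
let sqr x = if x < 0.0 then raise Negative else Float.sqrt x

(* A client that must cope with possibly negative input catches the
   exception; a client that knows [x >= 0] can call [sqr] directly. *)
let describe x =
  match sqr x with
  | r -> Printf.sprintf "sqrt %g = %g" x r
  | exception Negative -> Printf.sprintf "%g has no real square root" x
```

For example, `describe 4.0` is `"sqrt 4 = 2"` and `describe (-1.0)` is `"-1 has no real square root"`.
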

It can be useful to provide an illustrative
example as part of a specification. No matter how clear and well written
the specification is, an example is often useful to clients.

```ocaml
(* [find lst x] is the index of [x] in [lst], starting
 * from zero.
 * example: [find ["b"; "a"; "c"] "a" = 1] *)
```
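
For comparison, here is one possible implementation that agrees with the example. The spec above is silent about what happens when `x` is absent or occurs more than once; this sketch assumes, as a hypothetical extension of the spec, that `find` raises `Not_found` in the first case and returns the first index in the second. A complete spec would need to state both choices explicitly.

```ocaml
(* [find lst x] is the index of [x] in [lst], starting
 * from zero.
 * example: [find ["b"; "a"; "c"] "a" = 1]
 * raises: [Not_found] if [x] is not in [lst]  (assumed clause) *)
let find lst x =
  (* [go i l] searches [l], whose head is at index [i] of [lst]. *)
  let rec go i = function
    | [] -> raise Not_found
    | y :: ys -> if y = x then i else go (i + 1) ys
  in
  go 0 lst
```
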
## How not to write comments
In addition to specifying functions, programmers need to provide
comments in the body of the functions. In fact,
programmers usually do not write enough comments in their code. But
this doesn't mean that adding more comments is always better. The wrong
comments will simply obscure the code further. Shoveling as many
comments into code as possible usually makes the code worse! Both code
and comments are precise tools for communication (with the computer and
with other programmers) that should be wielded carefully.

It is particularly annoying to read code that contains many interspersed
comments (typically of questionable value), e.g.:

```ocaml
let y = x+1 (* make y one greater than x *)
```
For complex algorithms, some comments may be necessary to explain how
the code implementing the algorithm works. Programmers are often tempted
to write comments about the algorithm interspersed through the code. But
someone reading the code will often find these comments confusing
because they don't have a high-level picture of the algorithm. It is
usually better to write a paragraph-style comment at the beginning of
the function explaining how its implementation works. Explicit points in
the code that need to be related to that paragraph can then be marked
with very brief comments, like `(* case 1 *)`.
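
For instance, a hypothetical `merge` function for sorted lists might carry a short paragraph explaining the algorithm, with the code itself marked only by brief case labels:

```ocaml
(* [merge xs ys] is the sorted list containing the elements of [xs]
   and [ys]. requires: [xs] and [ys] are each sorted in ascending order.

   Implementation: walk down both lists at once. If either list is
   empty, the other one is the answer (case 1). Otherwise compare the
   two heads, emit the smaller one, and merge what remains (case 2).
   Taking from [xs] on ties keeps the merge stable. *)
let rec merge xs ys =
  match xs, ys with
  | [], l | l, [] -> l (* case 1 *)
  | x :: xs', y :: ys' ->
      (* case 2 *)
      if x <= y then x :: merge xs' ys else y :: merge xs ys'
```
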

Another common but well-intentioned mistake is giving variables long,
descriptive names, as in the following verbose code:

```ocaml
let number_of_zeros_in_the_list =
  List.fold_left (fun (accumulator:int) (list_element:int) ->
    accumulator + (if list_element=0 then 1 else 0)) 0 the_list
```
Code using such long names will be very verbose and hard to read.
Instead of trying to embed a complete description of a variable in its
name, use a short and suggestive name (e.g., `zeroes` or `nz`), and if
necessary, add a comment at its declaration explaining the purpose of
the variable.

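
Applied to the verbose example, that advice might look like the following sketch, where `lst` is a hypothetical stand-in for the original `the_list`:

```ocaml
(* Sample input standing in for [the_list]. *)
let lst = [0; 3; 0; 7]

(* nz: the number of zeros in [lst] *)
let nz = List.fold_left (fun n x -> if x = 0 then n + 1 else n) 0 lst
```
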

A related bad practice is to encode the type of the variable in its
name, e.g., giving a variable `count` a name like `i_count` to show that
it's an integer. Instead, just write a type annotation. If the variable
is so far from its type that you can't see the type annotation, the
code should probably be restructured anyway.
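
A brief sketch of the alternative, with hypothetical names: OCaml lets you annotate a binding, a parameter, or a result with its type directly, so the name does not need to carry it.

```ocaml
(* Rather than [i_count], annotate the type where the name is bound. *)
let count : int = List.length ["a"; "b"; "c"]

(* Annotations also work on function parameters and results. *)
let average (total : float) (count : int) : float =
  total /. float_of_int count
```
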
## Terms and concepts
* abstraction by specification
* example clause
* partial function
* raises clause
* requires clause
* returns clause
* total function
## Further reading
* *Program Development in Java: Abstraction, Specification, and
  Object-Oriented Design*, chapters 3 and 9, by Barbara
  Liskov with John Guttag.