Lecture 25
Large Systems: Requirements, Specifications, Interfaces, Refactoring.

The programs we have written or examined in this course were at most a few hundred lines of code. This length was sufficient to illustrate the concepts, data structures, and programming techniques that we were studying. Many real-life applications, however, are much bigger, often exceeding 10,000 - 100,000 lines of code (LOC), and sometimes reaching sizes of up to a few million LOC. The size of the source code of modern operating systems is in the range of tens of millions of LOC; precise numbers are hard to come by, partly for proprietary reasons, but also for methodological reasons. (What is a 'line of code,' really? Should we accept the authors' formatting, or should we reformat the input before counting?)

Experience has shown that large programs are characterized by new behaviors that emerge due to the sheer complexity of the underlying code. In this lecture we will discuss some of the means for managing and mastering this complexity.

Large programs will most often be written by teams of programmers, over an extended period of time. The inherent complexity of a large program prevents any individual from comprehending it in its entirety; inevitably, programmers will acquire true expertise in only part of the program.  The individuals involved in development must coordinate and document their work, and must be able to adapt to changes in the composition and/or administrative structure of the team. Once written, large programs tend to be used for extended periods of time during which the original requirements almost always evolve, while the underlying technologies (software: compilers, operating systems; hardware: processor speed, memory size, hard disk size, network bandwidth and latency) change dramatically. Changes in requirements and/or the underlying technologies often impose changes in both the structure and functionality of programs.

Modularity and Interfaces

Even large, monolithic programs have an internal structure that emerges naturally; the interaction (and dependency) between certain parts of the program that address a specific sub-problem is much stronger than the interaction between these and other parts.

In languages with imperative features, where values can be changed, one further disadvantage of large, monolithic programs is that any global value might be changed from anywhere else, even inadvertently (for example, because a programmer misspells a variable name).

One solution to these problems is to carefully break up large programs into parts (subsystems) that can be designed, developed, and tested separately before being put together to solve the problem at hand. The usual name for such a subsystem is module.

Good modular design decomposes the problem into units that map naturally onto subproblems of the task at hand. A typical module will have a few hundred lines of code, and as such, it is much easier for programmers to understand it (compared to the entire program). Testing at the module level allows rapid isolation and resolution of errors, significantly speeding up development. Once debugged, modules can be put together to build and debug the final system. Note that it is possible for a system composed of individually correct modules not to be correct as a whole (e.g. when modules do not interact correctly).

It is good practice to isolate modules as much as possible from each other. Many programming languages provide specific mechanisms that enforce, or at least encourage, such strict isolation. There are several goals that can be achieved by isolating modules: (a) the source code can be maintained separately; (b) data structures in one module cannot be changed from other modules, or can only be changed in controlled ways; (c) if modules allow only for the minimum amount of interaction that is needed for the correct performance of their task, then changing the internal structure of modules becomes easier (we say that modules are loosely coupled).

Programmers control the interaction between a module and its users by defining interfaces. Simply put, an interface is a contract, in which the module guarantees that it will implement all the advertised functionality, while the users guarantee that they will only rely on advertised functionality (and they will use the module only through the respective interface - this latter requirement is often automatically enforced by compilers).

Programming languages differ widely in the way in which they support and/or enforce modular programming. The relevant language mechanisms for three well-known languages are shown below:

             C              Java                              ML
Module       source file    class                             structure
Interface    header files   interfaces, or public members     signature
                            of a class

Note that the Java interface is not identical to the notion of interface that we discuss here. The interface of a class that does not implement any interfaces is the collection of public functions and variables that the class declares. Besides its role in specifying the interaction between a module (class) and its users, a Java interface also plays a role in simulating multiple inheritance (which exists, for example, in C++).
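As a concrete illustration of the ML column above, here is a minimal sketch (the names QUEUE and Queue are our own, not part of any standard library) of how a signature acts as an interface to a structure; the opaque ascription :> hides the representation, so users can manipulate queues only through the advertised operations, and the implementer remains free to change the internals later.

(* The interface: users see only what is advertised here. *)
signature QUEUE =
sig
  type 'a queue
  val empty   : 'a queue
  val enqueue : 'a * 'a queue -> 'a queue
  val dequeue : 'a queue -> ('a * 'a queue) option
end

(* The module: the representation (a pair of lists) is hidden by
   the opaque ascription :>, so it can be changed later without
   affecting users of the module. *)
structure Queue :> QUEUE =
struct
  type 'a queue = 'a list * 'a list
  val empty = ([], [])
  fun enqueue (x, (front, back)) = (front, x :: back)
  fun dequeue ([], [])             = NONE
    | dequeue ([], back)           = dequeue (List.rev back, [])
    | dequeue (x :: front, back)   = SOME (x, (front, back))
end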

Inexperienced programmers, often under pressure to show quick and tangible results, are tempted to split a bigger problem into modules informally, and to start writing code before clearly defining interfaces. This can be an extremely dangerous mistake, as it might lead to incompatibilities between modules. Fixing such problems can be very expensive, and often involves extensive rewrites of the source code.

There is a natural tension between module implementers and users. Users are often interested in having complex, powerful functionality accessible through the interface ("give me a function for every small problem I need to solve"). Implementers, on the other hand, are interested in exposing only a small number of simple operations, which are easy to implement, test, and maintain. The situation is not always so clear cut, however, as sometimes implementers are tempted to expose functionality that is complex and requires a lot of effort just because it is "cool."   Thus good interface design generally requires both the implementers and the users of the module to balance these competing interests.

Designing good interfaces requires a good understanding of the problem at hand, good skills, and experience. In many cases complex functionality is rarely, if ever, used, but its implementation might require a lot of resources and might have forced design decisions that make the entire module much less efficient than it could otherwise be. Some principles of interface design have been distilled down to statements like "do a few things, but do them well," or "don't do it just because it's cool." While every case is different, you should be strongly biased toward simple interfaces and loose coupling between modules.

Requirements and Specifications

Most programs are written to meet some need.  The requirements for a program are a description of that need. In a business setting, the requirements are generally created by product management or marketing people, who do not have technical backgrounds.  Good requirements are difficult to write, and are generally best written by someone who knows about the domain and can work closely with users.  Requirements invariably change over time, particularly for new software where users and designers are trying to envision what the software will do, rather than reacting to actual software.

For most software the requirements are too high-level and ambiguous to be useful in guiding the development of the software.  Generally the software is described in more detail in what is called a functional specification.   A functional specification describes what the software does - its functionality - without describing how it does it.  The separation of the what from the how is important both in providing flexibility in the technical design and implementation and in providing flexibility in the user interface design.  That is, the functions are described independent of things such as the layout and design of a GUI.  Good functional specifications are critical to software development and are not easy to write.

Technical specifications provide an even more detailed description of what a program does.  Generally the functional specification is used to develop a technical design.  This design specifies aspects of how the functionality is to be implemented. It provides both a higher level description than the code itself, and documents various assumptions underlying the design.  We have seen very simple examples of technical specifications in the comments that document things such as preconditions, or constraints on the input (e.g., that an integer be non-negative).  Analogously, postconditions document constraints on the output.

Here is a simple example:

(*
   requires: x >= 0
   results: r = sqrt(x) >= 0; | r * r - x | <= 0.0001
   checks: x >=0, if not true raises exception Fail "negative argument"
   effects: prints the value of the square root
*)
fun sqrt (x: real): real = 
  if x < 0.0
  then raise Fail "negative argument"
  else
    let
      fun helper(root: real): real = 
        if Real.abs(root * root - x) <= 0.0001
        then root
        else helper(0.5 * (root + x/root))
      val root = helper 1.0
    in
      print ("root = " ^ (Real.toString root) ^ "\n");
      root
    end

- sqrt 4.0;
root = 2.00000009292
val it = 2.00000009292 : real
-
- sqrt ~4.0;

uncaught exception Fail: negative argument
  raised at: stdIn:62.14-62.38

In keeping with the general principle that an interface must expose as little functionality as possible, specifications should be kept as simple and as abstract as possible. The module implementer must assume that any information exposed in the past has been relied on by users. If a lot of information has been disclosed through the interface, then the respective module cannot be changed easily (and - if changed - must respect the specification as written originally). Such modules are said to be tightly coupled.

We say that a specification is definitional if it states what the interface does, but provides as little detail as possible about how the functionality is achieved. An operational specification provides a lot of information on how the functionality is achieved. Definitional specifications tend to lead to loose coupling, while operational specifications tend to lead to tight coupling.

Consider the following two partial specifications:

version 1: returns j such that a[j] = y
version 2: loop on j from 0 to n - 1, 
             compare a[j] to y, if equal, returns j

Note: These specifications are incomplete (what happens if there is no j such that a[j] = y?), and are only meant to illustrate the difference between definitional and operational specifications.

Version 1 should be preferred to version 2 whenever possible. To clarify why this is so, let us assume that the array a holds values of a complex data structure, whose comparison operation is performed by a user-supplied custom function. A user could define a comparison operation that has side effects, and the rest of the program might come to rely on these side effects occurring in a certain order, corresponding to the loop index growing from 0 upwards. If this happens, the modules become tightly coupled, and we cannot change the module implementation in ways that abandon the linear search pattern the interface exposes. On the other hand, version 1 would have forced the module's users not to rely on comparisons occurring in a certain order. This might require slightly more complex code at the user's end, but this is often not the case in practice. Version 1 does not disclose the existence of a linear search pattern even though it might actually use one in the implementation. The advantage is that now we can change the function more easily, and we are free to abandon the linear search pattern.

Version 1 is also an example of a non-deterministic specification, i.e., it allows for several possible results (any j such that a[j] = y is acceptable). A specification given as "returns the smallest j such that a[j] = y" would make the answer unambiguous, i.e., it would be deterministic. We note that the specification we have just given is still better than version 2 above - we do not necessarily have to search the array in a linear fashion to find the smallest j with the given property.
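As a sketch of how the definitional (and non-deterministic) version 1 might be written down in the style of the sqrt specification above (the function name find and its type are our own choices for illustration):

(*
   requires: true
   results:  SOME j, where j is a valid index with Array.sub (a, j) = y,
             if such an index exists; NONE otherwise.
             No promise is made about WHICH such j is returned, nor about
             the order in which the elements of a are examined.
*)
fun find (a: int array, y: int): int option =
  let
    val n = Array.length a
    fun loop j =
      if j >= n then NONE
      else if Array.sub (a, j) = y then SOME j
      else loop (j + 1)
  in
    loop 0
  end

Note that the implementation shown happens to use a linear search, but because the specification does not promise this, the implementer remains free to replace it (say, with a search over a sorted or hashed representation) without breaking any user.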

Natural languages are inherently ambiguous, so their use in specifications might lead to misunderstandings. Formal specifications use a special, unambiguous notation to describe clauses. Being precise, these specifications reduce the possibility of a misunderstanding. Their other big advantage is that formal specifications can be processed using specialized tools, which can determine whether they are sound, and whether a certain piece of code (say, a function) actually conforms to its specification. Automated program checking greatly speeds development and reduces costs, while offering guarantees of program correctness. Next time you fly, try to decide whether you would want the software that keeps the airplane in the air to have been debugged by a tired hacker working overnight, or by tools that used formal specifications to guarantee that there are no hidden errors. Unfortunately, developing formal specifications is not always easy, and many domains prove remarkably resistant to efforts in this direction.

Module Hierarchies

Truly large programs cannot simply be split into an unstructured collection of modules; at some point the number of modules becomes so great that the system becomes unmanageable. The answer to this problem is given by module hierarchies. Hierarchies of modules emerge naturally if one breaks down the problem into a few big subproblems first, then attacks and further decomposes each subproblem using a similar strategy, repeating this process as long as necessary to obtain subproblems of reasonable size. This is not always easy to do well, and wrong choices will lead to subproblems that exhibit many mutual dependencies. A good decomposition minimizes the number and complexity of dependencies.

In such a system, one will have modules that implement functionality which does not rely on any other module. There will be, however, modules that integrate the functionality of simpler, more basic modules. These dependencies naturally create a hierarchy of modules. Care must be taken not to introduce circular dependencies in module hierarchies; e.g., one does not want module A to depend on module B, which depends on module C, which in turn depends on module A (can you tell why?). Good module hierarchies can be represented as trees or DAGs.
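A minimal sketch of such a hierarchy in ML (the structure names are invented for illustration): Parser depends on StringUtil, App depends on Parser and Printer, and the resulting dependency graph is a DAG with no cycles (indeed, a circular chain of structure definitions would not even compile).

(* Lowest level: depends on no other module in the program. *)
structure StringUtil = struct
  fun words s = String.tokens Char.isSpace s
end

(* Middle level: Parser depends only on StringUtil. *)
structure Parser = struct
  fun parse s = StringUtil.words s
end

structure Printer = struct
  fun render ws = String.concatWith " " ws
end

(* Top level: depends on Parser and Printer; the dependencies
   form a DAG - no module, directly or indirectly, depends on App. *)
structure App = struct
  fun roundTrip s = Printer.render (Parser.parse s)
end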

Once modules and interfaces have been defined, one must proceed to implementation. There are two basic implementation methods: top-down and bottom-up.

The bottom-up method starts with the implementation of modules at the lowest level of the hierarchy, followed by the implementation of modules that depend only on modules on the lowest level, and proceeds in similar fashion up to the root of the hierarchy. The bottom-up method is preferred by many because code that is developed can be tested immediately (all the functionality a module depends on has already been implemented). Immediate testing allows for the discovery and early fix of errors, but it also allows for the early discovery of low-level efficiency problems. The downside of this method is that the "big picture" is somewhat blurred during development, and high-level, systemic design flaws are discovered only late, possibly too late for any changes to be possible.

The top-down method starts with the development of top-level modules, then implements the modules upon which top-level modules rely, and proceeds in a similar fashion toward the modules that are lowest in the hierarchy. The advantages and disadvantages of this method mirror those of the bottom-up method. High-level design flaws become visible early, and allow for quick corrections and - often - for higher-quality specifications. The major disadvantage is that testing becomes much more difficult in the early stages of a project, as not all the functionality needed by already implemented modules is available. Testing is not impossible, however; one can develop modules that simulate the behavior of as-of-yet unimplemented modules, for example (note that such simulation will often not allow for full-fledged testing of existing modules).
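For example, if a top-level report module depends on a storage module that has not been written yet, one might temporarily substitute a stub that honors the same interface (all the names below are hypothetical). This permits some early testing of the top-level logic, though, as noted, not full-fledged testing.

signature STORAGE =
sig
  val lookup : string -> int option
end

(* A stub that simulates the as-of-yet unimplemented storage
   module by returning canned answers. *)
structure StorageStub : STORAGE =
struct
  fun lookup "alice" = SOME 42
    | lookup _       = NONE
end

(* The top-level module is written against the STORAGE interface
   only, so the stub can later be replaced by the real module. *)
functor ReportFn (S : STORAGE) =
struct
  fun describe name =
    case S.lookup name of
        SOME n => name ^ " -> " ^ Int.toString n
      | NONE   => name ^ " -> not found"
end

structure Report = ReportFn (StorageStub)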

Testing

Testing is an incredibly important part of any programming process, and particularly of the development of large systems that involve many programmers over a long time period.  As you have seen even in this course, good testing makes all the difference between code that only appears to work and code that works reliably over a wide range of test conditions.  Several different types of testing are generally used in combination in any large software project; these include unit testing, regression testing, black box testing, and white box testing.  In large projects where there is a separate testing group, the software developers are still involved in the design and implementation of many of the tests.

Unit testing refers to tests that check whether a given module operates according to its specification.  These tests are local in the sense that they test the operation of a module or relatively small piece of code.  It is good practice to have a clear specification for a module and to write unit tests at the time the module is written.  There are packages such as JUnit for Java that assist with this.
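As a small illustration in the style of this lecture's ML examples, here is a hand-rolled unit-test sketch that checks the sqrt function above against its specified postcondition (the standard basis has no JUnit-style framework, so the checks are written by hand; the helper name checkSqrt is ours):

(* Check the postcondition from the specification:
   r >= 0 and |r * r - x| <= 0.0001. *)
fun checkSqrt (x: real): bool =
  let val r = sqrt x
  in r >= 0.0 andalso Real.abs (r * r - x) <= 0.0001
  end

(* A tiny unit-test suite for the sqrt module. *)
val tests = [checkSqrt 0.0, checkSqrt 1.0, checkSqrt 4.0, checkSqrt 10000.0]
val allPass = List.all (fn b => b) tests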

Regression testing refers to running tests that ensure that the system continues to operate as it did previously.  Regression tests are generally automated and run on a very regular basis.  For example, many development projects do nightly builds of the software, and run regression tests on each new build.  As developers check in code it is made part of the nightly build, and their code is expected not to cause the build or the regression tests to fail.  Tests are only rarely removed from a regression suite (only if functionality is actually removed from the software, which is rare).  New tests are added as new functionality is added, and also as bugs are fixed, because bugs are generally a good indication of something that was not tested adequately.

Black box testing refers to testing of the code without any access to the code itself - treating the code as a "black box".  For instance, user testing is a form of black box testing.  There are a number of black box testing packages that allow one to simulate sequences of user actions and check that certain outcomes occur for those sequences.  Most black box testing is at best semi-automated, however.

White box testing refers to testing that has access to the code.  This can include simple access to the code, such as via public interfaces (APIs), or can include special "hooks" that are placed in the code in order to test certain functions.  Unit tests are a form of white box test, but the term white box testing is generally used to refer to tests of multi-component interaction rather than localized unit tests.

Development Process

The methodology by which a software development team works is very important to ensuring coordination between the members of the team.  There are a wide range of software development methodologies, and often quite a bit of posturing over which methodology is best.  The best methodology depends on the kind of software being developed, the skills and experience of the team, and where the software is in its life cycle (e.g., a new system, large functionality changes to an existing system, or maintenance and bug fixing of an existing system).

It is widely recognized today that for the majority of software development projects, iterative development processes that to some degree interleave requirements gathering, functional specification, technical design and specification, and testing are the most rapid and effective way to develop good software. This is in contrast to so-called waterfall methods where there are successive stages for each such step.  Iterative methods often still have distinct stages but they are relatively short and are iterated to improve the design and implementation.

Changing Module Implementations. Refactoring.

Clearly, code is written or modified in order to add new functionality to a piece of software or to fix bugs in that software.  There is, however, a third very important reason to write code: to improve existing software without changing its functionality.  This is referred to as refactoring.

As mentioned above, big programs tend to have a long life, during which technologies change, requirements evolve, and - likely - various flaws of the original implementation become visible. Any of these can prompt the rewrite of some parts of the code base. Even over shorter time periods, for instance while a large system is being implemented, assumptions made in the implementation of certain modules can turn out not to hold, resulting in code that is inefficient, or overly complex for what it is actually used for.  Rewrites must generally be accomplished so that the program remains functional at all times. In a well-designed system changes should only occur inside modules, and all these changes should leave the interfaces unaffected. Refactoring is the internal restructuring of a module to improve the code while leaving its external behavior unchanged.  Such improvements might be in efficiency, or in simplifying the code for easier maintenance and readability.

Certain aspects of refactoring are highly structured, and consist of the systematic identification (detection) of certain patterns of problematic code. These patterns are called code smells (seriously!). Big repositories of smells, grouped by categories, are available on the web and in the published literature.

Refactoring is used to improve the structure, design, readability, efficiency, compactness, and other such aspects of the code, but not its functionality.  Refactored code is generally substantially shorter than the code it replaces.  Some of the signs that code needs to be refactored (code smells) are duplicated code, nearly duplicated code, and different modules (or objects) that share nearly all of the same operations.  In all of these cases, the refactored code is generally substantially shorter.

Assume that in many functions internal to a module a pair of arguments with the meaning "start date" and "end date," respectively, is always used together. In such a case it is a good idea to create a new type, say, a pair of dates, with the meaning of "interval." This change better expresses the logic of the situation, makes the program easier to understand, reduces the number of function arguments (making the program more readable), and reduces the possibility of errors (since the chance of pairing incorrect dates is reduced). All these advantages emerge from recognizing a simple smell! Systematically identifying smells and restructuring programs accordingly can greatly improve the performance, readability, and maintainability of a big program.
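A sketch of this refactoring in ML (the type and function names are invented for illustration, and dates are simplified to integers):

type date = int  (* e.g., days since some fixed epoch; a simplification *)

(* Before: "start date" and "end date" always travel together, but as
   two separate arguments; nothing stops a caller from pairing the
   wrong dates. *)
fun durationOld (startDate: date, endDate: date): int =
  endDate - startDate

(* After: the recurring pair is captured by a single interval type. *)
type interval = {start: date, stop: date}

fun duration ({start, stop}: interval): int = stop - start

fun overlaps (a: interval, b: interval): bool =
  #start a <= #stop b andalso #start b <= #stop a

Functions that previously took two date arguments now take one interval, and the pairing of a start date with its matching end date is made explicit at the point where the interval value is constructed.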

We have stated that refactoring should not change the external behavior or functionality of modules. But how do we know that the behavior did not change? Ideally, we would have generated formal specifications and we would have used automated tools to confirm the correctness of our implementation unambiguously. In practice, however, we need to rely on testing, particularly regression testing, as discussed above.