Abstraction

Topics:

What is abstraction/information hiding?
Function abstraction
Clients vs. implementers
Data abstraction
Abstract Data Types (ADTs)
ADTs via visibility qualifiers
ADTs via Java interfaces
Programming with interfaces

Designing and building things that work is at the heart of engineering. Engineers have to work together in teams to build complex systems, whether software, hardware, cars, planes, or bridges. This complexity is a major source of engineering cost and errors. With the wrong engineering approach, every team member can end up needing to understand every other team member's work, which means the total effort required to build a system of size n is proportional to n²!

The solution to complexity is abstraction, also known as information hiding. Abstraction is simply the removal of unnecessary detail. The idea is that to design a part of a complex system, you must identify what about that part others must know in order to design their parts, and what details you can hide. The part others must know is the abstraction.

Function abstractions

Programming languages like Java are designed to support you in creating abstractions. You are already familiar with one kind of abstraction, a function abstraction. Every time you declare a function (in Java, a method), you are creating an abstraction by giving a name to a piece of code. Other programmers can use your abstraction by invoking the function. If you have done a good job of documenting the function, they don't have to read your code to use it.

For example, you might declare a function search as follows:

/** Return an index of x in a.
 *  Requires: a is sorted in ascending order, and x is found in the array a
 *  somewhere between indices left and right.
 */
int search(int x, int[] a, int left, int right) {
    ...
}

Another programmer can now use this function without understanding how it works. A function abstraction creates two roles: the client and the implementer. In the example just given, the client is the programmer writing code that calls the function (or the code itself). The implementer is the programmer who writes the implementation: the code of search. A function abstraction works by allowing the client to call a function written by the implementer without necessarily understanding how it is implemented.

A smart implementer will probably implement the function search as a binary search, but this fact need not be part of the abstraction; the abstraction might simply specify that the run time is logarithmic in the length of the array.
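
As a minimal sketch of what such an implementation might look like, the binary search below satisfies the specification of search given earlier. It assumes that left and right are inclusive bounds, which the specification does not state, and the enclosing class name SearchSketch is hypothetical.

```java
/** Hypothetical holder class for one possible implementation of search. */
class SearchSketch {
    /** Return an index of x in a.
     *  Requires: a is sorted in ascending order, and x is found in the array a
     *  somewhere between indices left and right (assumed inclusive).
     */
    static int search(int x, int[] a, int left, int right) {
        // Loop invariant: x is in a[left..right].
        while (left < right) {
            int mid = left + (right - left) / 2;  // avoids overflow of left + right
            if (a[mid] < x) {
                left = mid + 1;   // x must lie strictly to the right of mid
            } else {
                right = mid;      // x is at mid or to its left
            }
        }
        return left;
    }
}
```

A client calling SearchSketch.search(7, new int[] {1, 3, 5, 7, 9}, 0, 4) gets back 3 without needing to know that the range is halved on every iteration.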

Data abstractions

Another important kind of abstraction, with which you may be less familiar, is data abstraction. Here the idea is to hide information about data and how it is represented in the program. For example, imagine a client using a data structure to keep track of a collection of elements. The client might not care exactly what data structure is used; it might simply need a data structure providing operations add, remove, contains, and size. The client can think of this data as simply a collection of elements -- that is the abstraction. The implementer must create an actual data structure -- for example, a linked list -- that provides the necessary operations on the data.

This data abstraction is an example of an Abstract Data Type, or ADT for short. An ADT consists of a named type whose representation is hidden (abstract), along with a set of operations that can be used on that type. The operations of the type are known as the interface of the ADT, a term that should not be confused with the Java language mechanism of the same name.

Benefits of abstraction

The interface creates a contract between the client and the implementer. The client knows the interface but should not have to know how the implementer provided that interface. This has several benefits. First, the interface protects the implementer from incorrect use of the data structure by the client. An interface creates an abstraction barrier that protects the implementation. A client that uses knowledge of the implementation not contained in the interface is violating the abstraction barrier. The interface to a data abstraction is also said to encapsulate the implementation details.

Second, the interface also gives the implementer flexibility to change the implementation, as long as the client is only using the implementation through the defined operations, and those operations are still provided by the new implementation. This flexibility results in a loosely coupled system: because both sides rely only on the contract, they are free to work more independently as long as they stick to it. In a tightly coupled system, the client and the implementation are not as free to change, because a change to either is more likely to require a change to the other.

Third, when something goes wrong in a program, the presence of an interface makes it easier to assign blame to either the client or the implementer. Either the client is using the interface incorrectly (possibly violating the abstraction barrier), or the implementer is implementing it incorrectly. A clear interface makes it possible to argue that one of the two needs to fix their code.

ADTs in Java #1: visibility qualifiers

Java has language mechanisms that make it easier to define data abstractions. One such mechanism is visibility qualifiers. Classes, fields, and methods can be labeled public or private to indicate whether they should be visible to clients. A class can be used to implement an ADT by declaring all the methods in the interface to be public, and everything else private. Methods or fields that are declared private cannot be accessed by any code outside their class. Therefore, the private qualifier enforces the abstraction barrier.

For example, in our example of a collection data abstraction implemented as a linked list, we could use visibility qualifiers as follows:

class Collection {
    private ListNode first, last;
    private int size;

    /** @return the number of elements in the collection */
    public int size() { return size; }
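    /** Add elem to the end of the collection. */
    public void add(Object elem) {
        ListNode n = new ListNode();
        n.elem = elem;
        if (last != null)
            last.next = n;   // append after the current last node
        else
            first = n;       // the list was empty, so n is also the first node
        last = n;
        size++;              // keep size equal to the number of nodes
    }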
    /** Remove elem from the collection. */
    public void remove(Object elem) { ... }

    /** @return true if elem is in the collection. */
    public boolean contains(Object elem) { ... }
}

This implementation gives some of the code for a header object of class Collection. Because the fields first, last, and size are private, clients cannot read or update them. That's a good thing because the implementation depends on certain data structure invariants holding. For example, the field size must always contain the number of elements in the list that starts at first, and last must be the last element in the list. If a client were able to change the values of these fields, it could break these invariants and cause the code to produce incorrect results.

Therefore, it does not usually make sense to include fields as part of the interface to a data abstraction; fields should be marked private. An exception is final fields, which cannot be changed and are sometimes used to keep track of shared constants. Because they cannot be changed, final fields cannot be used to violate invariants. Data structure invariants in a data abstraction are known as representation invariants (or even just rep invariants), because they govern the representation of the abstraction: in this case, the private fields.
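
One common way to make a rep invariant concrete is to write it as a method that checks it. The sketch below does this for the two invariants described above (size counts the nodes, and last is the final node). The names CheckedCollection, CheckedNode, and repOk are hypothetical, and the assert statement only runs when assertions are enabled (java -ea).

```java
/** Hypothetical stand-in for ListNode. */
class CheckedNode {
    Object elem;
    CheckedNode next;
}

/** Hypothetical linked-list collection that can check its own rep invariant. */
class CheckedCollection {
    private CheckedNode first, last;
    private int size;

    /** Return true if size equals the number of nodes reachable from first
     *  and last is the final node (or null when the list is empty). */
    boolean repOk() {
        int count = 0;
        CheckedNode prev = null;
        for (CheckedNode n = first; n != null; n = n.next) {
            count++;
            prev = n;
        }
        return count == size && prev == last;
    }

    /** @return the number of elements in the collection */
    public int size() { return size; }

    /** Add elem to the end of the collection. */
    public void add(Object elem) {
        CheckedNode n = new CheckedNode();
        n.elem = elem;
        if (last != null) last.next = n; else first = n;
        last = n;
        size++;
        assert repOk();  // every public method should re-establish the invariant
    }
}
```

Because the fields are private, only the methods of CheckedCollection can break the invariant, so checking it at the end of each public method is enough to catch such bugs during testing.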

Package (default) visibility

Some of the rest of the implementation is in the class ListNode:

class ListNode {
    Object elem;
    ListNode next;
}

If methods or fields have no explicit visibility qualifier, they obtain default (or package) visibility, which means that they can be accessed from other classes in the same Java package, but not by classes in different packages. This is useful when implementing a data abstraction that spans multiple classes, and these classes need more access to each other's internals than should be exposed in the interface. The solution is to put all the classes in the same package and use default visibility to allow them more access than is given to clients. This is why the fields of ListNode have no visibility qualifier. However, the class ListNode itself is not declared public, and therefore can only be used from within the package in which it and Collection reside.

ADTs in Java #2: Java interface declarations

Java has a mechanism called an interface that supports the idea of an interface and allows the programmer to give an interface its own name. For example, we might implement the Collection ADT by declaring an interface:

/** A Collection represents a mutable set of elements. */
interface Collection {
    /** Add an element. */
    void add(Object elem);
    /** Remove an element if it is in the set. */
    void remove(Object elem);
    /** Return true if elem is in the set. */
    boolean contains(Object elem);
    /** Return the number of elements in the set. */
    int size();
}

This Java interface says nothing about the representation or implementation of the collection: it is just an interface. The interface also says nothing about how to create a collection; it merely defines instance operations that can be performed on a collection.

We can now define an implementation of this interface. An interface is implemented by a class, in which every method declared in the interface is implemented by a public method.

public class List implements Collection {
    private ListNode first, last;
    private int size;

    public List() { size = 0; first = last = null; }

    public int size() { return size; }
    public void add(Object elem) { ...  }
    public void remove(Object elem) { ... }
    public boolean contains(Object elem) { ... }
}

We don't need to write comments on these methods to describe what they do, because that's in the interface. The interface just describes the instance methods of the class, not the static methods or constructors. The class may expose more operations (a wider interface) than its declared interface does, by declaring them public. To allow List objects to be created, this class provides a public constructor. Although the interface can't declare the constructor, it is a good idea to declare and implement it explicitly in the class because it is a way to enforce the rep invariant. To create a collection, we can use the List constructor:

Collection a = new List();
a.add(5);
a.add(3);
a.add(new List());
Collection b = new List();
b.add(a);

We can also write code that uses collections without caring or knowing that they are lists:

/** Return a collection of the even elements in c. */
Collection selectEvens(Collection c)

We can also define multiple implementations of the interface, and a function like selectEvens can work with any of them. For example, we can implement collections as a tree, too:

public class Tree implements Collection {
    private TreeNode root;
    private int size;
    public Tree() {
	root = null; size = 0;
    }
    public void add(Object elem) {
	if (root == null) {
	    root = new TreeNode(elem);
	} else {
	    root.add(elem);
	}
    }
    ...
}

Now we can put a tree into a variable of type Collection, too:

  Collection c = new Tree();
  c.add(3);
  Collection d = selectEvens(c);
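
To illustrate client code that works with any implementation, here is a self-contained sketch. It uses a trimmed copy of the Collection interface (remove is omitted for brevity) and two hypothetical implementations, LinkedCollection and ArrayCollection, which are not the List and Tree classes above. The client function countPresent is also hypothetical. The sketch assumes elements are non-null, and neither implementation checks for duplicates.

```java
/** Trimmed copy of the Collection interface from these notes (remove omitted). */
interface Collection {
    void add(Object elem);
    boolean contains(Object elem);
    int size();
}

/** Hypothetical implementation backed by a singly linked list. */
class LinkedCollection implements Collection {
    private static class Node {
        Object elem;
        Node next;
    }
    private Node first;
    private int size;

    public void add(Object elem) {
        Node n = new Node();   // push on the front; order is not part of the interface
        n.elem = elem;
        n.next = first;
        first = n;
        size++;
    }
    public boolean contains(Object elem) {
        for (Node n = first; n != null; n = n.next) {
            if (n.elem.equals(elem)) return true;
        }
        return false;
    }
    public int size() { return size; }
}

/** Hypothetical implementation backed by a growable array. */
class ArrayCollection implements Collection {
    private Object[] elems = new Object[4];
    private int size;

    public void add(Object elem) {
        if (size == elems.length) {
            // Double the capacity when the array is full.
            Object[] bigger = new Object[2 * elems.length];
            System.arraycopy(elems, 0, bigger, 0, size);
            elems = bigger;
        }
        elems[size++] = elem;
    }
    public boolean contains(Object elem) {
        for (int i = 0; i < size; i++) {
            if (elems[i].equals(elem)) return true;
        }
        return false;
    }
    public int size() { return size; }
}

class Client {
    /** Return how many of the candidates are in c. Uses only the interface. */
    static int countPresent(Collection c, Object[] candidates) {
        int count = 0;
        for (Object x : candidates) {
            if (c.contains(x)) count++;
        }
        return count;
    }
}
```

Client.countPresent never mentions LinkedCollection or ArrayCollection, so either one can be passed in, and a third implementation could be added later without touching the client.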

The interface is an abstraction barrier because once we put a list or tree into a variable of type Collection, other parts of the class that are not declared in the interface cannot even be named:

  Collection c = new List();
  ListNode x = c.first; // Java compiler rejects: Collection does not have a field first

It is possible, however, to find out what implementation was used to construct an object of a given interface type, and even to view the object at its class type, by using instanceof and a cast:

  Collection a = new List();
  ...
  if (a instanceof List) {
     List b = (List) a;
     // now call List-only methods on b
  }

Because a cast like this lets a client reach everything public in the implementing class, visibility qualifiers should be used if a very strong abstraction barrier is desired. The advantage of interfaces is that they provide very loose coupling. They say nothing about how their objects are implemented or represented, so the implementation can be completely replaced. This helps with the maintenance and evolution of code. A good design approach is to use both interfaces and visibility qualifiers to enforce a strong abstraction barrier.

Programming with interfaces

One key to designing a large system is to start by designing the interfaces between different parts of the system, before writing a lot of code. This leads to a divide-and-conquer approach to coding, in which the program is broken down into smaller pieces that are easier to understand and to implement. Good interfaces are especially helpful when working in a team, but they are valuable even when working alone, because they keep the amount that programmers have to remember about their own code manageable. We'll talk more in future lectures about how to design an interface.