Generics and More Lists

Generics are not an inherently object-oriented idea, and they were not originally part of Java; they were added in Java 5. We have already seen that object-oriented languages support subtype polymorphism; generics give us a different kind of polymorphism, parametric polymorphism. Recall that polymorphism means that something can be used at multiple types. Subtype polymorphism allows client code to be written with respect to a single type while interoperating with multiple implementations of that type. Parametric polymorphism, by contrast, allows an implementation to be written with respect to a single type while being used by differently typed clients.

Application: Collections

Java comes with a library of collection abstractions and implementations, the Collections Framework. The biggest reason generics were added to Java was to make this framework more effective. In Java 1.4, the interface Collection looked like the following (only some key methods are shown):

/** A mutable collection. */
interface Collection {
    /** Return whether object o is in the collection. */
    boolean contains(Object o);
    /** Add object o to the collection. Return true if this changes
     *  the state of the collection.
     */
    boolean add(Object o);
    /** Remove object o from the collection. Return true if this changes
     *  the state of the collection.
     */
    boolean remove(Object o);
    ...
}

All the collection knows about its contained elements is that they are objects. This loss of information leads to programmer errors and makes code more awkward. Here is an example:

Collection c = ...;
c.add(2); // no check that we are inserting the right kind of object
...
for (Object o : c) {
    Integer i = (Integer) o;
    // use i here
}

Here, we expect c to be a collection of Integers, but we have to use a downcast to use the elements of the collection at the type we expect. The downcast is not only awkward and verbose; it might also fail at run time, because nothing about the collection prevents us from accidentally putting an element of the wrong type into it.
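The failure mode described above can still be reproduced today with a raw (unparameterized) collection type. The following is a minimal sketch, assuming the standard java.util.Collection and java.util.ArrayList; the class and method names are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;

class RawCollectionDemo {
    /** Returns true if reading the elements back as Integers fails at run time. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean downcastFails() {
        Collection c = new ArrayList(); // raw type: elements are known only as Objects
        c.add(2);
        c.add("three");                 // wrong kind of element, yet the compiler accepts it
        try {
            for (Object o : c) {
                Integer i = (Integer) o; // the downcast throws when o is "three"
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

The mistake is made at the call to add, but it is only detected later, at the downcast, possibly far from the code that caused it.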

Notice that the same problem doesn't happen when using an array:

Integer[] c = new Integer[10];
c[0] = 2; // statically checked
...
for (Integer i : c) {
    // use i here
}

The key is that array is a parameterized type. We can think of the type Integer[] as the application of a type-level function (call it array) to the type parameter Integer. The idea of generics is to allow user-defined parameterized types.

Type parameterization

Generics allow programmers to define their own parameterized types. For example, we can make Collection a parameterized type that can be applied to an arbitrary type T using the “angle bracket” syntax: Collection<T>. Thus, the type Collection<Integer> is a collection of Integers, the type Collection<String> is a collection of Strings, and the type Collection<Collection<String>> is a collection of collections of Strings.

A parameterized type is declared by giving it a formal type parameter that can then be used as a type inside the type's definition—for example, in method signatures:


Interface.java

Inside the definition, the type parameter T stands for whatever actual type the client chooses to apply it to. A type like Collection<String> is called an instantiation of the parameterized type Collection on the type argument String. The signatures of the methods of Collection<String> are exactly the signatures appearing in the declaration of Collection, except that every occurrence of T is replaced with String. For example, the add method of Collection<String> behaves exactly as if its signature were boolean add(String x).
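A parameterized declaration along these lines might look like the following sketch. It is not the course's own listing: the parameter name x follows the add(String x) signature quoted above, while the choice to extend Iterable<T> and the helper class ArrayCollection (which simply delegates to java.util.ArrayList) are assumptions made so the example can run.

```java
import java.util.ArrayList;
import java.util.Iterator;

/** A mutable collection of elements of type T. */
interface Collection<T> extends Iterable<T> {
    /** Return whether x is in the collection. */
    boolean contains(T x);
    /** Add x to the collection. Return true if this changes the collection. */
    boolean add(T x);
    /** Remove x from the collection. Return true if this changes the collection. */
    boolean remove(T x);
}

/** Hypothetical implementation that delegates to java.util.ArrayList. */
class ArrayCollection<T> implements Collection<T> {
    private final ArrayList<T> items = new ArrayList<>();

    public boolean contains(T x) { return items.contains(x); }
    public boolean add(T x) { return items.add(x); }
    public boolean remove(T x) { return items.remove(x); } // resolves to remove(Object)
    public Iterator<T> iterator() { return items.iterator(); }
}
```

With this declaration, a variable of type Collection<String> exposes add(String), contains(String), and remove(String); no separate String-specific interface has to be written.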

Now, the compiler can tell when we are trying to add an element of the wrong type, and we don't have to worry about getting the wrong type of element out of the collection:


InterfaceUse.java
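As a hedged illustration of the point above, here is a small sketch that uses the standard library's generic java.util.Collection rather than the interface defined in these notes. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;

class TypedCollectionDemo {
    /** Sum the elements; no downcast is needed to use them as Integers. */
    static int sum(Collection<Integer> c) {
        int total = 0;
        for (Integer i : c) {
            total += i;
        }
        return total;
    }

    static int demo() {
        Collection<Integer> c = new ArrayList<>();
        c.add(2);
        c.add(3);
        // c.add("four"); // rejected at compile time: a String is not an Integer
        return sum(c);
    }
}
```

Uncommenting the marked line turns what used to be a run-time ClassCastException into a compile-time error.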

Implementing generics

Parametric polymorphism also helps us when we are implementing abstractions. Let's consider implementing the Collection interface using a linked list. First, we will want linked list nodes that can contain data of an arbitrary type:

Node.java
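A generic list node could be sketched as follows. The field names data and next, and the helper method length, are assumptions for illustration and are not taken from the original listing.

```java
/** A node of a singly linked list holding one element of type T. */
class Node<T> {
    T data;
    Node<T> next; // null marks the end of the list

    Node(T data, Node<T> next) {
        this.data = data;
        this.next = next;
    }

    /** Number of nodes in the list starting at n; the empty list is null. */
    static <E> int length(Node<E> n) {
        int count = 0;
        for (Node<E> cur = n; cur != null; cur = cur.next) {
            count++;
        }
        return count;
    }
}
```

Note that length has to be a static method taking the list as an argument: an instance method could not be called on the empty list, because the empty list is null.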

We can't use the Node class to implement the Collection interface directly, because an empty list is represented as a null, which we can't invoke methods on. Therefore, to implement the Collection interface, we use an additional header object to point to the rest of the list. The header object is implemented by the LinkedList class:

LinkedList.java
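One possible shape for such a header class is sketched below. It is named LList to match how the class is referred to later in these notes. The three-method Collection<T> interface is repeated so the example compiles on its own, and the decision to append new elements at the end of the list is an assumption.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;

interface Collection<T> extends Iterable<T> {
    boolean contains(T x);
    boolean add(T x);
    boolean remove(T x);
}

/** Header object: it exists even when the list it points to is empty. */
class LList<T> implements Collection<T> {
    private static class Node<E> {
        E data;
        Node<E> next;
        Node(E data, Node<E> next) { this.data = data; this.next = next; }
    }

    private Node<T> head; // null when the list is empty

    public boolean contains(T x) {
        for (T y : this) {
            if (Objects.equals(x, y)) return true;
        }
        return false;
    }

    public boolean add(T x) { // appends at the end (an assumption)
        Node<T> n = new Node<>(x, null);
        if (head == null) { head = n; return true; }
        Node<T> cur = head;
        while (cur.next != null) cur = cur.next;
        cur.next = n;
        return true;
    }

    public boolean remove(T x) { // removes the first occurrence, if any
        Node<T> prev = null;
        for (Node<T> cur = head; cur != null; prev = cur, cur = cur.next) {
            if (Objects.equals(x, cur.data)) {
                if (prev == null) head = cur.next; else prev.next = cur.next;
                return true;
            }
        }
        return false;
    }

    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Node<T> cur = head;
            public boolean hasNext() { return cur != null; }
            public T next() {
                if (cur == null) throw new NoSuchElementException();
                T result = cur.data;
                cur = cur.next;
                return result;
            }
        };
    }
}
```

Because the header object is never null, methods such as contains can be invoked on an empty LList without any special case in client code.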

Generic methods

So far we've seen that classes and interfaces can be parameterized. We can also give methods their own type parameters. For example, suppose that some non-generic code outside the implementation of linked lists or collections needs to be able to print out collections regardless of what kind of element is in the collection. We can define a generic method to accomplish this:

generic_print.java

Notice that a call to the print method does not need to specify the actual type parameter Integer. The compiler is able to infer the missing parameter automatically. It is also possible to supply type parameters to generic method calls explicitly, by putting the type parameter in angle brackets after the dot and before the method name.
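The two calling styles can be sketched as follows, assuming the standard java.util.Collection as the parameter type. The class name GenericPrint and the helper format (which builds the string so the result can be inspected) are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class GenericPrint {
    /** Join the elements of c with single spaces; T is the method's own type parameter. */
    static <T> String format(Collection<T> c) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (T x : c) {
            if (!first) sb.append(' ');
            sb.append(x);
            first = false;
        }
        return sb.toString();
    }

    static <T> void print(Collection<T> c) {
        System.out.println(format(c));
    }

    static void demo() {
        List<Integer> c = Arrays.asList(1, 2, 3);
        print(c);                       // T is inferred to be Integer
        GenericPrint.<Integer>print(c); // T is supplied explicitly
    }
}
```

The explicit form requires a qualifier before the dot (a class name, an expression, or this); writing <Integer>print(c) with no qualifier is a syntax error.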

Subtyping

Like other implements declarations, the declaration above that LList<T> implements Collection<T> generates a subtype relationship: in fact, a family of subtype relationships, because the subtype relationship holds regardless of what actual type T is chosen. The compiler therefore understands that the relationship LList<String> <: Collection<String> holds. What about these other possible relationships?

Both of these sound reasonable at first glance. But they are actually unsound, leading to possible run-time type errors. The following example shows the problem:

variance-unsound.java

The last element of the list, which is assigned to a variable of type String, is actually an Integer!
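Since the compiler rejects such an assignment, the problem can only be demonstrated by forcing the unsound view with an unchecked cast. The sketch below does this with the standard java.util.List instead of LList; treating List<String> as a List<Object> is the assumed relationship being tested, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class VarianceDemo {
    /** Returns true if the list of Strings ends up holding a non-String element. */
    @SuppressWarnings("unchecked")
    static boolean pollutes() {
        List<String> strings = new ArrayList<>();
        strings.add("hello");
        // List<Object> objects = strings; // rejected by the compiler
        List<Object> objects = (List<Object>) (List<?>) strings; // force the unsound view
        objects.add(42); // allowed through the List<Object> view
        try {
            String s = strings.get(1); // the compiler-inserted cast to String fails here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

The ClassCastException appears at a line that contains no visible cast, which is exactly the kind of surprise the subtyping rules are designed to rule out.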

The idea that there can be a subtyping relationship between different instantiations of the same generic type is called variance. Variance is tricky to support in a sound way, so Java does not allow generic types to declare variance: its generic types are invariant in their type parameters. Other languages, such as Scala, do support declared variance.

Wildcards

To make up for the lack of variance, Java has a feature called wildcards, in which question marks are used as type arguments. The type LList<?> represents an object that is an LList<T> for some type T, though precisely which type T is not known at compile time (or, in fact, even at run time).

A value of type LList<T> (for any T) can be used as if it had type LList<?>, so there is a family of subtyping relationships LList<T> <: LList<?>. This means that a method can provide a caller with a list of any type without the client knowing what is really stored in the list; the client can get elements from the list but cannot change the list:

usesite.java

Notice that the type of the elements iterated over is not really known, either, but we do at least know that the type hidden by ? is a subtype of Object. So it is type-safe to declare the variable o as an Object.
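A small sketch of this pattern, using the standard java.util.List<?> in place of LList<?>; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class WildcardDemo {
    /** Hands out a list without revealing its element type to the caller. */
    static List<?> mystery() {
        return Arrays.asList("a", "b");
    }

    /** Works on a list of any element type; elements are visible only as Objects. */
    static int countNonNull(List<?> list) {
        int n = 0;
        for (Object o : list) {
            if (o != null) n++;
        }
        // list.add("x"); // rejected at compile time: the element type is unknown
        return n;
    }
}
```

A List<String> and a List<Integer> can both be passed to countNonNull, because each is a subtype of List<?>.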

If we need to know more about the type hidden by the question mark, it is possible to add an extends clause. For example, suppose we have an interface Animal with two implementing classes Elephant and Rhino. Then the type Collection<? extends Animal> is a supertype of both Collection<Elephant> and Collection<Rhino>, and we can iterate over the collection and extract Animals rather than just Objects.

usesite2.java
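The Animal example might be fleshed out as in the following sketch. The name() method and the Zoo helper class are assumptions added so there is something to call on each Animal; java.util.Collection stands in for the Collection interface of these notes.

```java
import java.util.Collection;

interface Animal { String name(); }

class Elephant implements Animal {
    public String name() { return "elephant"; }
}

class Rhino implements Animal {
    public String name() { return "rhino"; }
}

class Zoo {
    /** Accepts a collection of any subtype of Animal. */
    static String names(Collection<? extends Animal> animals) {
        StringBuilder sb = new StringBuilder();
        for (Animal a : animals) { // elements are known to be Animals
            sb.append(a.name()).append(';');
        }
        // animals.add(new Rhino()); // rejected: the collection might hold only Elephants
        return sb.toString();
    }
}
```

Declaring the parameter as Collection<Animal> instead would reject both a Collection<Elephant> and a Collection<Rhino>.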

Limitations

Java implements generics by erasure: the compiler checks actual type parameters and then discards them, so they are not available at run time. This implementation choice leads to a number of limitations in a generic context where T is a formal type parameter:

  1. Constructors of T cannot be used; we cannot write new T(). The workaround for this limitation is to have an object with a factory method for creating T objects.
  2. Arrays with T as elements cannot be created, either. We cannot write new T[n], because the type T is not known at run time and so the type T[] cannot be installed into the object's header. The workaround for this limitation is to use an array of type Object[] instead:
    T[] a = (T[]) new Object[n];
    

    This of course creates an array that could in principle be used to store things other than T's, but as long as we use that array through the variable a, we won't. The compiler gives us an alarming warning when we use this trick because of the unsafe cast, but this programming idiom is fairly safe.

  3. We can't use instanceof to find out what type parameters are, because the object does not contain that information. If, for example, we create an LList<String> object, the object's header word only records that it is an LList. So an LList<String> object that is statically typed as an Object can be tested to see if it is some kind of LList, but not whether the actual type parameter is String:
    instanceof.java
    

    The last four lines above illustrate how downcasts interoperate with generics. Code can cast to a type with an actual type parameter, but the type parameter is not actually checked at run time; Java takes the programmer's word that the type parameter is correct. We can cast to a wildcard instantiation, but such a cast is not very useful if we need to use the elements at their actual type. Finally, we can cast to the raw type LList; casting to raw types is unsafe. It is essentially the same as casting to LList<?> except that Java allows a raw type to be used as if it were any particular instantiation. Raw types should be avoided when possible.
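The three workarounds above can be combined in one sketch. The class Pool and its methods are hypothetical, and java.util.function.Supplier is used here as one possible form of the factory object mentioned in item 1.

```java
import java.util.function.Supplier;

class Pool<T> {
    private final T[] items;

    @SuppressWarnings("unchecked")
    Pool(int n, Supplier<T> factory) {
        // new T[n] does not compile, so allocate an Object[] and cast it.
        this.items = (T[]) new Object[n];
        for (int i = 0; i < n; i++) {
            items[i] = factory.get(); // stands in for the forbidden new T()
        }
    }

    T get(int i) { return items[i]; }

    int size() { return items.length; }

    /** Tests only that o is some kind of Pool; its type argument is not recorded. */
    static boolean isPool(Object o) {
        // o instanceof Pool<String> is rejected by the compiler.
        return o instanceof Pool<?>;
    }
}
```

A client supplies the factory when constructing the pool, for example new Pool<StringBuilder>(3, StringBuilder::new). The unchecked cast stays safe here because the array is private and is only ever accessed through the field items.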

Accessing type operations

What if we want to use methods of T in a generic context where T is a formal parameter? There is more than one way to do this, but in Java the most powerful approach is to provide a separate model object that knows how to perform the operations that are needed. For example, suppose we want to compare objects of type T using the compareTo method. We declare a generic interface Comparator<T>:

Comparator.java

Now, a generic method for sorting an array takes an extra comparator parameter:

comparator_sort.java

A class can then implement the comparator interface and be used to make the right comparator operation available to the generic code.

comparator_sort.java
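Putting the three pieces together, a self-contained sketch might look like this. The method name compare, the class names Sorting and StringComparator, and the use of insertion sort are all assumptions; the original listings may differ.

```java
/** Model object that knows how to compare two values of type T. */
interface Comparator<T> {
    /** Negative, zero, or positive as x is less than, equal to, or greater than y. */
    int compare(T x, T y);
}

class Sorting {
    /** Insertion sort of a, in place, in the order defined by cmp. */
    static <T> void sort(T[] a, Comparator<T> cmp) {
        for (int i = 1; i < a.length; i++) {
            T key = a[i];
            int j = i - 1;
            // Shift larger elements one slot to the right.
            while (j >= 0 && cmp.compare(a[j], key) > 0) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }
}

/** Comparator that delegates to String's own compareTo. */
class StringComparator implements Comparator<String> {
    public int compare(String x, String y) {
        return x.compareTo(y);
    }
}
```

The generic sort never calls a method of T directly; every comparison goes through the model object, so the same code can sort any element type for which a comparator is supplied.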

Notice that here we are using String's own compareTo operation as a model for the comparator, but we don't have to. For example, we could have used the compareToIgnoreCase method to sort strings while ignoring the difference between upper and lower case. It turns out that we can also use Java's “lambda expressions” (added in Java 8) to implement the interface even more compactly. Here is how we would sort the array using a lambda expression while also ignoring case:

sort(a, (x,y) -> x.compareToIgnoreCase(y));

The lambda expression (x,y) -> x.compareToIgnoreCase(y) is essentially convenient syntactic sugar for declaring a class like the one above and instantiating it with new.
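To make the correspondence concrete, the following sketch sorts the same input twice, once with a lambda and once with an explicitly declared anonymous class. It uses the standard java.util.Comparator and Arrays.sort rather than the interface and sort method from these notes; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

class LambdaDemo {
    static String[] sortedWithLambda(String[] in) {
        String[] a = in.clone();
        Arrays.sort(a, (x, y) -> x.compareToIgnoreCase(y));
        return a;
    }

    static String[] sortedWithAnonymousClass(String[] in) {
        String[] a = in.clone();
        // The same comparator, written out as a class and instantiated with new.
        Arrays.sort(a, new Comparator<String>() {
            @Override
            public int compare(String x, String y) {
                return x.compareToIgnoreCase(y);
            }
        });
        return a;
    }
}
```

Both methods produce the same ordering; the lambda form just leaves the interface name, the method name, and the parameter types for the compiler to fill in.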

Generic classes may need to access parameter type operations too. The typical approach is to accept the model object in constructors and to then store it in an instance variable for later use by other methods:


SortedList.java
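A minimal sketch of this pattern follows. It assumes the standard java.util.Comparator as the model object and an ArrayList as the backing store; the methods shown are illustrative and not necessarily those of the original SortedList.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** A list that keeps its elements in the order defined by a comparator. */
class SortedList<T> {
    private final Comparator<T> cmp; // model object saved for later use
    private final List<T> items = new ArrayList<>();

    SortedList(Comparator<T> cmp) {
        this.cmp = cmp;
    }

    /** Insert x after every element that is less than or equal to it. */
    void add(T x) {
        int i = 0;
        while (i < items.size() && cmp.compare(items.get(i), x) <= 0) {
            i++;
        }
        items.add(i, x);
    }

    T get(int i) { return items.get(i); }

    int size() { return items.size(); }
}
```

For example, new SortedList<String>(String::compareToIgnoreCase) creates a list that keeps strings in case-insensitive order, and the comparator never has to be passed again after construction.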