We have already seen that object-oriented languages support subtype polymorphism; generics give us a different kind of polymorphism, parametric polymorphism. Recall that polymorphism means that something can be used at multiple types. Subtype polymorphism allows client code to be written with respect to a single type (as specified by, say, an interface or abstract class) while interoperating with multiple implementations of that type. Parametric polymorphism, also known as generics, solves the opposite problem, in a sense: it allows a single implementation to interoperate with multiple clients who want to use that implementation for different types. This goal is achieved by writing the implementation with respect to a type parameter, a variable ranging over types that can be instantiated with different concrete types. Support for generics is not an essentially object-oriented idea, and was not originally part of Java, having been first introduced in Java 5.
The main motivation for adding parametric polymorphism and generic types to Java was the
Java Collections Framework, a built-in library of abstractions and implementations
for collections: types that one can put things into, take things out of, test for membership in, and iterate over.
Before generics arrived in Java 5, the interface Collection looked like the following:
    /** A mutable collection. */
    interface Collection {
        /** Test whether object o is in the collection. */
        boolean contains(Object o);

        /** Add object o to the collection. Return true if this changes
         *  the state of the collection. */
        boolean add(Object o);

        /** Remove object o from the collection. Return true if this changes
         *  the state of the collection. */
        boolean remove(Object o);

        ...[other methods]...
    }
All the compiler knows about a contained element at the type level is that it
is an Object. This loss of information leads to programming errors
and makes code more awkward. Here is an example:
    Collection c = ...;
    c.add(new Integer(2)); // no check that we are inserting an Integer
    ...
    for (Object o : c) {
        Integer i = (Integer) o;
        // use i here
    }
This code only ever puts Integer objects into c, but every time
an element of c is extracted, it must be downcast before it can be used as an Integer.
These downcasts are awkward, verbose, and error-prone. Any downcast
might fail at run time, because nothing about the collection prevents
us from accidentally putting something of the wrong type into it. In addition,
the run-time type checks take time.
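To see the failure mode concretely, here is a minimal sketch that uses the raw type java.util.ArrayList as a stand-in for the pre-generics Collection. The class and method names (RawCollectionDemo, downcastFails) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;

class RawCollectionDemo {
    /** Returns true if a downcast fails at run time. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean downcastFails() {
        Collection c = new ArrayList();   // raw type: elements are known only as Object
        c.add(Integer.valueOf(2));
        c.add("two");                     // accepted: nothing checks the element type
        try {
            for (Object o : c) {
                Integer i = (Integer) o;  // throws ClassCastException on "two"
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

The mistake on the second add goes unnoticed until the loop reaches that element.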
We don't have these problems when we use an array. All the operations are statically checked:
    Integer[] c = new Integer[10];
    c[0] = 2; // statically checked
    ...
    for (Integer i : c) {
        // use i here
    }
The reason is that arrays are a parameterized type. We can think of the type
Integer[] as the application of a type-level function (we might call it array
if we were to give it a readable name) to the type parameter Integer,
returning the type Integer[].
Obviously, parameterized types are useful. The idea of generics is to allow the programmer to
define their own parameterized types, to obtain the same static checking that is available with
the built-in array type.
For example, we can make Collection a parameterized type that can be applied to an arbitrary
type T using the “angle bracket” syntax: Collection<T>. Thus,
the type Collection<Integer> is a collection of Integers,
the type Collection<String> is a collection of Strings,
and the type Collection<Collection<String>> is a collection of
collections of strings.
A parameterized type is declared by giving it a formal type parameter that can then be used as a type inside the type's definition—for example, in method signatures:
Interface.java
Inside the definition, the type parameter T stands for whatever actual type the client chooses to apply it
to. A type like Collection<String> is called an instantiation of the
parameterized type Collection on the type argument String.
The signatures of the methods of Collection<String> are exactly
the signatures appearing in the declaration of Collection<T>,
except that every occurrence of T is replaced with String. For example, the
add method of Collection<String> behaves exactly as if its signature were boolean add(String x).
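A minimal sketch of such a declaration and of one instantiation follows. It assumes that all three methods take a T (the real java.util.Collection differs), and the class StringBag is a hypothetical implementation added only to show T replaced by String.

```java
import java.util.ArrayList;

/** A mutable collection of elements of type T (simplified; assumed signatures). */
interface Collection<T> {
    /** Test whether x is in the collection. */
    boolean contains(T x);

    /** Add x. Return true if this changes the state of the collection. */
    boolean add(T x);

    /** Remove x. Return true if this changes the state of the collection. */
    boolean remove(T x);
}

/** Hypothetical implementation of the instantiation Collection<String>. */
class StringBag implements Collection<String> {
    private final ArrayList<String> items = new ArrayList<>();

    // Each signature is the one from Collection<T> with T replaced by String.
    public boolean contains(String x) { return items.contains(x); }
    public boolean add(String x)      { return items.add(x); }
    public boolean remove(String x)   { return items.remove(x); }
}
```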
Now, the compiler can tell when we are trying to add an element of the wrong type, and we don't have to worry about getting the wrong type of element out of the collection at run time:
InterfaceUse.java
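As a sketch of what such client code might look like, the following uses the standard java.util.Collection and java.util.ArrayList rather than the simplified interface; the class name TypedCollectionUse is an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;

class TypedCollectionUse {
    static int sum() {
        Collection<Integer> c = new ArrayList<>();
        c.add(2);              // statically checked: the argument must be an Integer
        c.add(40);
        // c.add("two");       // rejected at compile time: String is not an Integer
        int total = 0;
        for (Integer i : c) {  // no downcast needed
            total += i;
        }
        return total;
    }
}
```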
Parametric polymorphism also helps us when we are implementing abstractions. Let's consider
implementing the Collection
interface using a linked list. First, we will want generic linked list nodes that can contain data of an arbitrary type:
Node.java
Then we can build a generic linked list class for null-terminated lists of generic nodes:
LinkedList.java
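One way the node and list classes could be written is sketched here. The field names, the choice to add at the front, and the decision to also implement Iterable<T> are assumptions, and the simplified Collection<T> interface is included so that the sketch compiles on its own.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;

/** Simplified collection interface (assumed to take T in every method). */
interface Collection<T> {
    boolean contains(T x);
    boolean add(T x);
    boolean remove(T x);
}

/** One cell of a singly linked list holding a value of type T. */
class Node<T> {
    T data;
    Node<T> next; // null marks the end of the list

    Node(T data, Node<T> next) {
        this.data = data;
        this.next = next;
    }
}

/** A null-terminated singly linked list of T. */
class LList<T> implements Collection<T>, Iterable<T> {
    private Node<T> head; // null when the list is empty

    public boolean contains(T x) {
        for (Node<T> n = head; n != null; n = n.next) {
            if (Objects.equals(n.data, x)) return true;
        }
        return false;
    }

    /** Adds x at the front; this always changes the list. */
    public boolean add(T x) {
        head = new Node<>(x, head);
        return true;
    }

    /** Removes the first occurrence of x, if there is one. */
    public boolean remove(T x) {
        Node<T> prev = null;
        for (Node<T> n = head; n != null; prev = n, n = n.next) {
            if (Objects.equals(n.data, x)) {
                if (prev == null) head = n.next; else prev.next = n.next;
                return true;
            }
        }
        return false;
    }

    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Node<T> cur = head;
            public boolean hasNext() { return cur != null; }
            public T next() {
                if (cur == null) throw new NoSuchElementException();
                T x = cur.data;
                cur = cur.next;
                return x;
            }
        };
    }
}
```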
So far we've seen that classes and interfaces can be parameterized. We can also give methods their own type parameters. For example, suppose we would like to write a method that could print out a collection regardless of what kind of elements it contains. We can define a generic method to accomplish this. The syntax is a bit awkward in that the formal type parameter is written before the return type of the method:
generic_print.java
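A sketch of such a generic method, written against java.util.Collection, might look as follows. The class name GenericPrint and the helper render are assumptions introduced so that the result can be inspected.

```java
import java.util.Arrays;
import java.util.Collection;

class GenericPrint {
    /** Builds a space-separated rendering of c, whatever its element type T is. */
    static <T> String render(Collection<T> c) {
        StringBuilder sb = new StringBuilder();
        for (T x : c) {
            sb.append(x).append(' ');
        }
        return sb.toString().trim();
    }

    /** Generic method: the formal type parameter <T> precedes the return type. */
    static <T> void print(Collection<T> c) {
        System.out.println(render(c));
    }

    public static void main(String[] args) {
        Collection<Integer> c = Arrays.asList(1, 2, 3);
        print(c);                        // T is inferred to be Integer
        GenericPrint.<Integer>print(c);  // the same call with an explicit type argument
    }
}
```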
Notice that a call to the print method does not need to specify the
actual type parameter Integer. The compiler is able to infer the missing
parameter automatically. It is also possible to supply type parameters to generic
method calls explicitly, by putting the type parameter in angle brackets after the dot
and before the method name.
Like other implements declarations, the declaration above that
LList<T> implements Collection<T> generates a subtype relationship:
in fact, a family of subtype relationships, because the subtype relationship holds regardless
of what actual type T is chosen. The compiler therefore understands that the relationship
LList<String> <: Collection<String> holds. What about these other
possible relationships?
LList<String> <: LList<Object> ?
LList<String> <: Collection<Object> ?
Both of these look reasonable at first glance. But they are actually unsound, leading to possible run-time type errors. The following example shows the problem:
variance-unsound.java
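The problem can be reproduced in a runnable sketch by forcing the rejected alias through a raw type. Here java.util.List stands in for LList, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class VarianceUnsound {
    /** Returns true if reading the head as a String fails at run time. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean headIsReallyAnInteger() {
        List<String> strings = new ArrayList<>();
        // List<Object> objects = strings;  // what we would like to write; rejected by the compiler
        List<Object> objects = (List<Object>) (List) strings; // force the same alias anyway
        objects.add(Integer.valueOf(1));    // perfectly legal for a List<Object>
        try {
            String head = strings.get(0);   // the element is actually an Integer
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```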
The head element of the list, which is assigned to a variable of type String, is actually an Integer! This is erroneous, so the Java compiler will not allow it. A similar situation arises with arrays, although in that case the error is unfortunately only caught at run time.
variance-unsound.java
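For arrays, the corresponding sketch compiles without complaint and fails only when the store is attempted. The class and method names are made up for illustration.

```java
class ArrayCovariance {
    /** Returns true if storing an Integer through the Object[] alias fails. */
    static boolean storeFails() {
        String[] strings = new String[1];
        Object[] objects = strings;           // allowed: String[] is a subtype of Object[]
        try {
            objects[0] = Integer.valueOf(1);  // checked at run time against the real element type
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }
}
```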
The idea that there can be a subtyping relationship between different instantiations of the same generic type is called variance. Variance is tricky to support in a sound way, so Java does not support variance. Other languages such as Scala do have variance.
To make up for the lack of variance, Java has a feature called
wildcards, in which question marks are used as type arguments. The type
LList<?> represents an object that is an LList<T>
for some type T, though precisely which type T is not known at compile
time (or for that matter, even at run time).
A value of type LList<T> (for any T) can be used as if it had
type LList<?>, so there is a family of subtyping relationships
LList<T> <: LList<?>. This means that a method
can provide a caller with a list of any type without the client knowing what is
really stored in the list; the client can get elements
from the list but cannot change the list:
usesite.java
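A small sketch of this pattern, with java.util.List standing in for LList, is given here; the names WildcardUse, makeList, and countElements are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class WildcardUse {
    /** Hands out a list without revealing its element type. */
    static List<?> makeList() {
        List<Integer> ints = new ArrayList<>();
        ints.add(1);
        ints.add(2);
        return ints;             // List<Integer> <: List<?>
    }

    /** The client can read elements, but only as Objects. */
    static int countElements(List<?> lst) {
        int n = 0;
        for (Object o : lst) {
            n++;
        }
        // lst.add(7);           // rejected by the compiler: element type is unknown
        return n;
    }
}
```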
Note that the type of the elements iterated over is not really known either, but
at least we know that the type hidden by ? is a subtype of Object. So it is
type-safe to declare the variable o as an Object.
We cannot implement the type-unsafe code shown above because the wildcard type makes the collection
effectively immutable. For example, the attempt to add the integer 7 to the variable lst
fails because the compiler does not know that the actual type argument is Integer.
Operationally, when the compiler sees a type containing wildcards, like LList<?>, it invents a fresh type
name (say, T137) not used anywhere else in the program, and constructs the method signatures
for the wildcard type by replacing the formal type parameter T with the fresh type name. At the
call lst.add(7) above, the actual argument has type Integer but the expected type
is T137. Since T137 has no relationship to Integer,
the code does not type-check. Instantiation with fresh type names is
done for every distinct occurrence of a wildcard type, so the compiler does not
even assume that two objects of the wildcard type have the same underlying type
argument.
If we need to know more about the type hidden by the question mark, it is possible to
add an extends clause. For example, suppose we have an interface
Animal with two implementing classes Elephant and Rhino. Then
the type Collection<? extends Animal> is a supertype of both
Collection<Elephant> and Collection<Rhino>,
and we can iterate over the collection and extract Animals rather
than just Objects.
usesite2.java
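A sketch of the bounded wildcard in use follows. The method name() on Animal and the class BoundedWildcard are hypothetical, and java.util.Collection is used for the collection type.

```java
import java.util.Collection;

interface Animal { String name(); }
class Elephant implements Animal { public String name() { return "elephant"; } }
class Rhino implements Animal { public String name() { return "rhino"; } }

class BoundedWildcard {
    /** Accepts a Collection<Elephant>, a Collection<Rhino>, or a Collection<Animal>. */
    static String names(Collection<? extends Animal> animals) {
        StringBuilder sb = new StringBuilder();
        for (Animal a : animals) {  // elements are known to be Animals, not just Objects
            sb.append(a.name()).append(' ');
        }
        return sb.toString().trim();
    }
}
```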
Java implements generics by erasure: all actual type parameters are erased at run time. This implementation choice leads to a number of limitations in a generic context where T is a formal type parameter:
We cannot create objects of type T by writing new T(). The workaround for
this limitation is to have an object with a factory method for creating T objects.
We cannot create arrays of T by writing new T[n],
because the type T is not known at run time and so the type T[] cannot be installed
into the object's header. The workaround for this limitation is to use an array of type
Object[] instead:
T[] a = (T[]) new Object[n];
This of course creates an array that could in principle be used to store things other than Ts, but
as long as we use that array only through the variable a, we won't. The compiler
gives us an alarming warning when we use this trick because of the unchecked cast, but when used in this
limited way, this programming idiom is fairly safe. Note that if we need to
create an array of T in a context where T is known to
be a subtype of some type, then the array that should be created is an array of
that other type, rather than of Object.
Similarly, we can't create an array whose element type is a parameterized type:
HashSet<String>[] sets = new HashSet<String>[n]; // error: generic array creation
One workaround is to use a wildcard type to create the array, and dynamically cast it to the desired type:
HashSet<String>[] sets = (HashSet<String>[]) new HashSet<?>[n];
Equivalently, one can use a raw type, a type in which type parameters have been explicitly erased, to create the array:
HashSet<String>[] sets = new HashSet[n];
Raw types are an unchecked end run around the Java type system: Java allows a raw type to be used as if it were any particular instantiation without a cast. Raw types make it easy to introduce run-time type errors into programs, so they should be used sparingly, limited to idioms like this one where it is clear that they do not pose a threat.
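The two workarounds for new T() and new T[n] can be sketched together. The class ArrayStack and the use of java.util.function.Supplier as the factory object are assumptions chosen for illustration; any object with a method that creates a T would do.

```java
import java.util.function.Supplier;

class ArrayStack<T> {
    private final T[] elems;
    private int size;

    @SuppressWarnings("unchecked")
    ArrayStack(int capacity) {
        // new T[capacity] is not allowed, so allocate an Object[] and cast.
        // The array never escapes this class, so only Ts are ever stored in it.
        elems = (T[]) new Object[capacity];
    }

    void push(T x) { elems[size++] = x; }

    T pop() {
        T x = elems[--size];
        elems[size] = null; // drop the reference so it can be collected
        return x;
    }

    int size() { return size; }

    /** new T() is not allowed either, so a caller-supplied factory creates the elements. */
    static <T> ArrayStack<T> filled(int n, Supplier<T> factory) {
        ArrayStack<T> s = new ArrayStack<>(n);
        for (int i = 0; i < n; i++) {
            s.push(factory.get());
        }
        return s;
    }
}
```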
We cannot use instanceof to find out what type parameters are, because the
object does not contain that information. If, for example, we create an LList<String>
object, the object's header word only records that it is an LList. So
an LList<String> object that is statically typed as an Object
can be tested to see if it is some kind of LList, but not whether the actual
type parameter is String:
instanceof.java
The last four lines above illustrate how downcasts interoperate with generics. Code can cast to
a type with an actual type parameter, but the type parameter is not actually checked at
run time; Java takes the programmer's word that the type parameter is correct. We can cast to a wildcard
instantiation, but such a cast is not very useful if we need to use the elements at their actual
type. Finally, we can cast to the raw type LList. Casting to raw types is unsafe;
it is essentially the same as casting to LList<?> but less safe.
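The behavior of instanceof and of the three kinds of cast can be observed in a short sketch. It uses java.util.List and java.util.ArrayList in place of LList, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureChecks {
    /** Legal test: only the erased class is checked, not the type argument. */
    static boolean isSomeList(Object o) {
        // return o instanceof List<String>;  // rejected by the compiler
        return o instanceof List;
    }

    /** Returns true: every cast succeeds, even the one to the wrong instantiation. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean wrongCastGoesUnnoticed() {
        Object o = new ArrayList<String>();
        List<Integer> ints = (List<Integer>) o; // unchecked: the type argument is not verified
        List<?> unknown = (List<?>) o;          // fine, but elements are visible only as Object
        List raw = (List) o;                    // raw type: compiles, with no static protection
        return ints.isEmpty() && unknown.isEmpty() && raw.isEmpty();
    }
}
```

The cast to List<Integer> succeeds even though the object was created as an ArrayList<String>; a failure would surface only later, when an element is used as an Integer.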
What if we want to use methods of T in a generic context where T is a formal parameter? There
is more than one way to do this, but in Java the most powerful approach is to provide a
separate model object that knows how to perform the operations that are needed. For example, suppose
we want to compare objects of type T using the compareTo method. We declare a
generic interface Comparator<T>:
Comparator.java
Now, a generic method for sorting an array takes an extra comparator parameter:
comparator_sort.java
A class can then implement the comparator interface and be used to make the right comparator operation available to the generic code.
comparator_sort.java
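The three pieces (the model interface, the generic sort, and a class implementing the model) might fit together as in the following sketch. The method name compare, the class names StringComparator and Sorting, and the choice of insertion sort are all assumptions.

```java
/** Model interface: knows how to order values of type T. */
interface Comparator<T> {
    /** Negative, zero, or positive as x is less than, equal to, or greater than y. */
    int compare(T x, T y);
}

/** A model that delegates to String's own compareTo. */
class StringComparator implements Comparator<String> {
    public int compare(String x, String y) {
        return x.compareTo(y);
    }
}

class Sorting {
    /** Sorts a in place by insertion sort, using cmp to order the elements. */
    static <T> void sort(T[] a, Comparator<T> cmp) {
        for (int i = 1; i < a.length; i++) {
            T key = a[i];
            int j = i - 1;
            // Shift larger elements one slot to the right.
            while (j >= 0 && cmp.compare(a[j], key) > 0) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }
}
```

A call such as Sorting.sort(a, new StringComparator()) then orders a String[] without the sort code knowing anything about strings.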
Notice that here we are using String's own compareTo operation as a model for the comparator, but we don't have to. For example, we could have used the compareToIgnoreCase method to sort strings while ignoring the difference between upper and lower case. We can also use lambda expressions, introduced in Java 8, to implement the interface even more compactly. Here is how we would sort the array using a lambda expression while also ignoring case:
sort(a, (x,y) -> x.compareToIgnoreCase(y));
The lambda expression (x,y) -> x.compareToIgnoreCase(y) is essentially
a convenient shorthand for declaring a class like the one above
and instantiating it with new.
Generic classes may need to access parameter type operations too. The typical approach is to accept the model object in constructors, then to store it in an instance variable for later use by other methods:
SortedList.java
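A sketch of this pattern is given here, using the standard java.util.Comparator as the model type and a java.util.ArrayList for storage. The class body, including the methods add, get, and size, is an assumption about what such a class might contain.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** A list that keeps its elements in the order defined by a model object. */
class SortedList<T> {
    private final Comparator<T> cmp;              // the model, stored for later use
    private final List<T> elems = new ArrayList<>();

    SortedList(Comparator<T> cmp) {
        this.cmp = cmp;
    }

    /** Inserts x after all elements that are less than or equal to it. */
    void add(T x) {
        int i = 0;
        while (i < elems.size() && cmp.compare(elems.get(i), x) <= 0) {
            i++;
        }
        elems.add(i, x);
    }

    T get(int i) { return elems.get(i); }

    int size() { return elems.size(); }
}
```

Because the comparator is fixed at construction time, every later add uses the same ordering.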