12. Collections and Generics
Essentially all programs that we write involve two principal components: data structures and algorithms. Data structures allow us to collect related pieces of data, organizing them in a particular way so that we can process them efficiently using algorithms (computational routines). Different scenarios benefit from organizing data in different ways. On computer file systems, directories (folders) and files are arranged hierarchically, which keeps related files together and makes it easier for your operating system to manage file permissions. As we have already seen, dictionaries store their words in sorted order, enabling a fast way to look up definitions. Social networks model the connectivity of their users in a way that allows them to extract information about users’ interests and tailor their experience accordingly. Databases, which can manage many gigabytes or terabytes of data, use a variety of data structures to organize information in a way that enables fast queries in real-world systems.
In this middle section of the course, we will survey the quintessential collection types that form the basis for most programs: lists, stacks, queues, trees, heaps, hash tables, maps, and graphs. Doing this will require all of the tools that we have introduced thus far. We’ll use our understanding of Java’s memory semantics to visualize the underlying structures of these collections. We’ll use the object-oriented design principles from the past few lectures to encapsulate these collections and provide intuitive interfaces to their clients. We’ll use invariants, specifications, and testing to reason about correctness of our implementations. Finally, we’ll use asymptotic complexity to analyze the performance trade-offs of these collections.
Data Structures and Abstract Data Types
Before we can begin to implement and analyze different collections, we need to establish some basic terminology.
A collection is a type that stores one or more instances of another type.
We have already seen one example of a collection in the course, arrays. An array is a collection that consists of a fixed number of slots (i.e., indices) in which primitive values or objects of a particular type can be stored. For example, the code String[] words = new String[6];
initializes words
to refer to an array that can store 6 String
s. The state of this collection consists of the capacity of the array (its length, 6) and the six contiguous memory locations where references to these String
entries are stored. The behaviors of this String[]
array consist of querying its length
and reading and writing to each of its entries by index using Java’s built-in square bracket ([]
) syntax. The contiguous nature of the array’s storage allows the client to access the entry at any particular index of the array in \(O(1)\) time (in the background, Java can locate this address using a single multiplication and addition). This fact is often referred to as the array’s random access guarantee, and will be a central tool in our analysis of many data structures that involve arrays.
We say that arrays have a random access guarantee since their client can access (i.e., read the entry at or write a new entry to) the entry at any valid index of the array in \(O(1)\) time, independent of the array's length or the value of the index.
In other words, accessing the entry a[2]
of an array a
takes the same amount of time whether a
has length 10 or length 1 million, and in the latter case it will also take the same amount of time to access a[800000]
. From this discussion of arrays, we can start to see a distinction between the ways that the client (who writes code that uses arrays) and the implementer (in this case, the developers of Java who added support for arrays) think about collections.
The client is primarily concerned with the behaviors that the collection supports (and the runtime/space complexities required to implement these behaviors). Can they add or remove elements from the collection, and what is the syntax to do this? How quickly will the additions/removals be carried out, and does this depend on the collection’s size, the element that will be added/removed, etc.? How do they check whether a particular element is present in the collection and access its location if it is present? What, if anything, can the client do to modify particular elements within the collection? In which use-cases is this collection an appropriate choice? Are there other constraints or concerns that they need to be aware of to get the best performance out of a collection?
On the other hand, the implementer is tasked with figuring out how to support these behaviors. What variables and objects will they need to represent the state of the collection, and what class invariants should be enforced for these? How does the selection of a state representation impact how quickly different operations can be performed and invariants can be restored? What sort of encapsulation will be necessary to hide the sometimes messy inner workings of a collection’s class from the client and only expose a neat, seamless interface?
To distinguish these different views of collections, we’ll introduce two additional high-level terms, abstract data types (or ADTs) and data structures.
An abstract data type describes the behaviors that are supported by a collection without specifying the details of its underlying implementation.
A data structure is a class that realizes an abstract data type by specifying its state representation and using this to provide definitions for each of the behaviors it supports.
Since they outline a set of supported behaviors but elide implementation details, abstract data types are naturally modeled in Java using interfaces. A class that implements an ADT will utilize a data structure to represent its state. In this way, a single ADT can be realized by multiple different data structures, and each implementation can have its own performance characteristics. Arrays, being a low-level type with their own special language-supported syntax, blur the line between ADT and data structure. Therefore, we’ll use another ADT, the List, throughout the rest of this and the following lecture to better demonstrate this distinction.
The List ADT
A list is a linearly ordered collection, similar to an array, in which entries are indexed by consecutive integer indices beginning with 0. Lists differ from arrays since they do not have a fixed length. Rather, their size grows dynamically to accommodate adding an arbitrary number of elements. The Java language includes a List
ADT in the java.util.List
interface, but we will practice developing our own ADT interface CS2110List
that supports a subset of its features.
To work toward this implementation, let’s start by restricting our focus to a list that can only store (non-null
) String
s, as this will make it easier to design the method signatures. We’ll call this the CS2110StringList
interface. What operations should this interface support?
First, we’ll want to be able to add a String
to the (end of the) list.
CS2110StringList.java
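The course’s listing isn’t reproduced here; a minimal sketch of what this declaration could look like (the exact spec wording is an assumption) is:

    /**
     * Append `elem` to the end of this list.
     * Requires: `elem` is not null.
     */
    void add(String elem);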
We may also want to be able to insert a String
at some other position in the list, shifting down the Strings that currently sit at later positions to make space for it.
CS2110StringList.java
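A sketch of the insertion method (the parameter order and spec wording are assumptions):

    /**
     * Insert `elem` at position `index` in this list, shifting the elements at
     * positions `index` and beyond back by one position to make room.
     * Requires: `elem` is not null and 0 <= index <= size().
     */
    void insert(int index, String elem);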
In the “Requires” clause of this method’s spec, we see that we’ll need a way to access the current number of elements in the list, which we’ll do with the size()
method.
CS2110StringList.java
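A sketch of this accessor:

    /** Return the number of elements in this list. */
    int size();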
What other accessor methods will be useful to query properties of the list’s contents? We may want to know what String is stored at a particular index, which we’ll support with a get()
method. We might also want to know whether the list contains a particular String (supported by a contains()
method). More specifically, we may want to know the (first) index where a particular String is stored (supported by an indexOf()
method).
CS2110StringList.java
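Sketches of these three accessors (returning -1 from indexOf() for a missing element is an assumption):

    /**
     * Return the element at position `index` in this list.
     * Requires: 0 <= index < size().
     */
    String get(int index);

    /** Return whether this list contains `elem`. Requires: `elem` is not null. */
    boolean contains(String elem);

    /**
     * Return the first index at which `elem` is stored in this list, or -1 if
     * `elem` is not contained in this list. Requires: `elem` is not null.
     */
    int indexOf(String elem);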
Lastly, let’s consider some methods that modify the contents of a list in ways other than adding elements. First, a client may wish to change the element stored at a particular index, which we’ll support with a set()
method. A client may also want to remove an element from the list. We’ll support two variants of removal. First, the client can ask us to remove (and return) an element at a given index, which we’ll support with a remove()
method. Alternatively, the client can pass in an element and tell us to remove (the first instance of) it from the list, which we’ll support with a delete()
method.
CS2110StringList.java
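Sketches of these three mutators (the return types are assumptions):

    /**
     * Replace the element at position `index` with `elem`.
     * Requires: `elem` is not null and 0 <= index < size().
     */
    void set(int index, String elem);

    /**
     * Remove and return the element at position `index`, shifting later
     * elements forward by one position. Requires: 0 <= index < size().
     */
    String remove(int index);

    /**
     * Remove the first occurrence of `elem` from this list, if present, and
     * return whether an element was removed. Requires: `elem` is not null.
     */
    boolean delete(String elem);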
Note that some of the names and signatures given above deviate slightly from those in Java's List
interface. This is intentional, and meant to draw parallels between operations for different data structures that we will study, avoiding ambiguities (e.g., by treating deletion and removal as semantically different operations, rather than Java's approach of calling both "removal").
Generics
While the CS2110StringList
interface is suitable for declaring operations on a list of String
s, it will not suffice for working with a list of another data type, such as a list of Account
s or Point
s or even a list of other lists. If we want to support a list of Point
s, we’d need to define a second, parallel interface such as CS2110PointList
that had signatures involving the Point
type. The behaviors for this list of Point
s (adding Point
s, removing Point
s, checking whether the list contained a certain Point
, etc.) would be the same, leading to a lot of repeated code. This isn’t practical. We’d like a way to develop a single interface that is capable of storing any type of data. In other words, we’d like a way to create a polymorphic list interface.
Subtype polymorphism is one possibility. We could define a CS2110ObjectList
that stores Object
s. Since Object
is a supertype of every class, a client could create a list of String
s with this interface, or a list of Points
, or a list of any type they wish. However, they could also create a list containing a mix of String
s and Point
s. We’d have no way to enforce that a particular list instance only contain one type (finer than Object
) of data. An alternate approach that allows this constraint on types is to use parametric polymorphism.
Parametric polymorphism achieves polymorphic behavior by parameterizing the definition of an interface or class on one or more generic types that are specified by the client code.
At this point, we are familiar with parameterizing the methods that we write. When we declare a method, we add one or more variable names (and their static types) within parentheses in the method’s signature. For example, the rectangleArea()
method below is parameterized on width
and height
values.
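The original listing isn’t shown; a sketch (the double parameter types are assumptions):

    /** Return the area of a rectangle with the given `width` and `height`. */
    static double rectangleArea(double width, double height) {
        return width * height;
    }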
These names width
and height
become variables that we can use as we develop the method. When the method is utilized in client code (i.e., called), the client passes arguments into the rectangleArea
method that fix the values of width
and height
, and these are substituted as the method is being evaluated. Just as we can parameterize a method on values, we can parameterize a class or interface on a generic type. We do this using angle brackets, such as
CS2110List.java
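In outline (a sketch; only the declaration with the angle brackets matters here), the generic interface begins like:

    /** A list of (non-null) elements of type T, indexed consecutively from 0. */
    public interface CS2110List<T> {
        // method declarations, written in terms of the type parameter T, go here
    }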
This introduces a generic type parameter T
that can be used to describe a type throughout the interface or class. When the client declares a variable with type CS2110List
, they will specify which reference type T
will represent for that variable. For example, if they declare a list
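For instance (a sketch; the variable name words matches the surrounding discussion):

    CS2110List<String> words;  // T stands for String wherever words is used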
this tells the compiler to substitute the String
type for T
within CS2110List
whenever it is accessed through the words
variable. You can think about this as replacing T
with String
within the interface (or class) definition, just as we replace the method parameters with their values when executing the method.
Within an interface or class with a generic type, we can use that type parameter in any place where we would declare an ordinary type. For example, the add method of our generic CS2110List
type will no longer take in a String
parameter; it will take in a T
parameter for whatever type the client specified as T
.
CS2110List.java
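A sketch of the generic version of this declaration (spec wording is an assumption):

    /**
     * Append `elem` to the end of this list.
     * Requires: `elem` is not null.
     */
    void add(T elem);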
Similarly, the get()
method will no longer have a String
return type; it will have a T
return type.
CS2110List.java
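A sketch of the generic accessor:

    /**
     * Return the element at position `index` in this list.
     * Requires: 0 <= index < size().
     */
    T get(int index);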
In a generic class definition, we can also declare fields and local variables with generic types (as we will soon see). We cannot, however, construct new objects of generic types or call methods with generic targets. Intuitively, since the generic type parameter can represent any type, we don’t know whether that type supports a method (including a constructor) with a particular signature. The complete code for the generic CS2110List
interface is given below.
CS2110List.java
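Consolidating the sketches above (again, exact spec wording, parameter orders, and return types are assumptions rather than the course’s actual listing), the generic interface might read:

    /** A list of (non-null) elements of type T, indexed consecutively from 0. */
    public interface CS2110List<T> {

        /** Return the number of elements in this list. */
        int size();

        /** Append `elem` to the end of this list. Requires: `elem` is not null. */
        void add(T elem);

        /** Insert `elem` at position `index`, shifting later elements back by one.
         *  Requires: `elem` is not null and 0 <= index <= size(). */
        void insert(int index, T elem);

        /** Return the element at position `index`. Requires: 0 <= index < size(). */
        T get(int index);

        /** Return whether this list contains `elem`. Requires: `elem` is not null. */
        boolean contains(T elem);

        /** Return the first index at which `elem` occurs, or -1 if it does not.
         *  Requires: `elem` is not null. */
        int indexOf(T elem);

        /** Replace the element at position `index` with `elem`.
         *  Requires: `elem` is not null and 0 <= index < size(). */
        void set(int index, T elem);

        /** Remove and return the element at position `index`, shifting later
         *  elements forward by one. Requires: 0 <= index < size(). */
        T remove(int index);

        /** Remove the first occurrence of `elem`, if present, and return whether an
         *  element was removed. Requires: `elem` is not null. */
        boolean delete(T elem);
    }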
A generic type parameter can represent any reference type, which begs the question: "What if we want to instantiate a generic class using a primitive type?" Java allows this through the use of wrapper classes for each primitive type, which you can think of as a reference type whose object contains a single field of a primitive type. For example, the Integer
reference type wraps the int
primitive type. Java supports "auto-boxing" and "auto-unboxing" to automatically convert between primitive types and objects of their wrapper classes, allowing us to write code such as
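For instance, a minimal illustration of these conversions (the original snippet may differ):

    Integer boxed = 5;    // auto-boxing: the int value 5 is wrapped in an Integer object
    int unboxed = boxed;  // auto-unboxing: the Integer's value is extracted back into an int

A CS2110List<Integer> can therefore store int values; Java boxes them on the way in and can unbox them on the way out.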
Writing Tests for an ADT
We can use the specifications for the methods declared in the CS2110List
interface to develop tests for this ADT that we can run against any of its implementations. These tests will typically consist of constructing some CS2110List
object (i.e., an object of some class implementing the CS2110List
interface) and performing a series of mutating methods on this object, using accessor methods within JUnit assertions between these mutating methods to verify that the list is in the correct state.
One concern that arises from this testing pattern is that we will need to construct CS2110List
objects. This appears to require us to know in advance the name of the class that will implement the CS2110List
interface so we can call its constructor. This will also lock us into testing that particular class (i.e., the data structure) rather than developing a set of tests that will work for any CS2110List
implementation. This should be a somewhat familiar problem at this point, perhaps from a slightly new angle. We want to write code (tests) that will extract common behaviors from a bunch of specialized classes. We achieve this using inheritance. We’ll write a CS2110ListTest
superclass, which will contain all of our test definitions. Its subclasses will be responsible for constructing the lists that are used in the tests, which they will do by overriding a constructList()
method to call the constructor of the particular list implementation that they are testing. To ensure that all of the subclasses define constructList()
, we’ll make this an abstract
method of CS2110ListTest
(which now must be marked as an abstract
class).
To see this testing pattern in action, let’s suppose that the DynamicArrayList
class (which we’ll define shortly) implements the CS2110List
interface. Then, we’ll define
CS2110ListTest.java
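The course’s listing isn’t reproduced here; the sketch below illustrates the pattern (test content and names are placeholders, and the line numbers 15 and 25 mentioned in the remarks that follow refer to the original listing, not to this sketch):

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    public abstract class CS2110ListTest {

        /** Construct and return an empty list of the implementation under test. */
        abstract <T> CS2110List<T> constructList();

        @Test
        void testAddAndGet() {
            CS2110List<String> words = constructList(); // T is inferred to be String
            words.add("hello");
            assertEquals(1, words.size());
            assertEquals("hello", words.get(0));
        }

        // additional test cases exercising the other CS2110List methods ...
    }

    class DynamicArrayListTest extends CS2110ListTest {
        @Override
        <T> CS2110List<T> constructList() {
            return new DynamicArrayList<>(); // generic argument T is inferred
        }
    }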
When starting to work with generic types, the syntax can be a bit quirky. Let’s stop to remark on some of the quirks that arise in this testing code.
- In the signature of CS2110ListTest.constructList(), there is an extra “<T>” before the return type. This is used to introduce a new generic type within the constructList() method, required since the return type of the method will change depending on what type of objects the constructed list will store. The value of this generic type is inferred by Java each time that this method is invoked. For example, on line 15, the return value of constructList(), which has type CS2110List<T>, is assigned to a variable with static type CS2110List<String>, so T must be String in this invocation.
- In DynamicArrayListTest.constructList(), we have omitted the generic type T from the constructor call on line 25, writing only DynamicArrayList<>(). This is acceptable because the generic type can be inferred by Java; we are returning a reference to this object and the return type is CS2110List<T>, so the generic argument should be T. It would also be acceptable to write return new DynamicArrayList<T>();, though this would likely prompt a warning in IntelliJ.
- Unrelated to generics, but we have declared a second class DynamicArrayListTest within the “outer scope” of the “CS2110ListTest.java” file. This is acceptable since this class is not marked as public. Since the DynamicArrayListTest has limited functionality (its only responsibility is constructing lists for the CS2110ListTest test cases), it is appropriate to place it here rather than in its own file.
When we develop tests for ADTs, we want to make sure that they provide good coverage. Here are some tips for achieving this:
- Thoroughly test the small size “edge cases” of an ADT. Often, the way that a data structure handles operations for small sizes requires somewhat different logic. Many bugs arise when this is not handled carefully. There should be many tests focusing on adding to and querying empty collections and collections with only one element, as well as removing elements from a collection until it is empty.
- For collections with intrinsic orders, make sure that you test interacting with the “edge” elements, since often these require different logic than the “middle” elements. Include tests that add/remove/modify the first element and the element just after the first element. Similarly, try adding/removing/modifying the last element and the element just before the last element. On a related note, write tests that cover elements well within the “middle” of the structure.
- Write some tests that “stress” your implementation for larger sizes. While your code may appear to be working correctly for small amounts of data, it may have memory or performance inefficiencies that only become apparent once the collection includes hundreds or thousands of elements.
- Make sure that your tests include assertions about the return values of all of the accessor methods. It is a common pitfall to only include assertions about some properties of the collection (such as its size, or its string representation), which may hide implementation issues that are revealed by other accessors.
We have written a set of comprehensive unit tests for the CS2110List
ADT that are included in this lecture’s source code. This level of thoroughness and documentation is what you should strive for when you develop tests throughout the rest of the course. You don’t want your client to find bugs in your data structure that your tests didn’t cover!
Dynamic Arrays
Now that we have defined a List ADT and written tests that enforce its specifications, we are ready to think about its implementation. For the rest of today’s lecture, we’ll focus on one implementation using a dynamic array data structure. In the next lecture, we’ll write another implementation using a linked data structure.
A List is a linearly ordered ADT, just like an array. Therefore, it seems natural to represent the state of a list using an array, in which the i
th entry of the array stores the element at index i
in the list. We’ll call this array the backing storage for the list object. What should the size of this storage array be? One fixed size will not work because a list is allowed to hold arbitrarily many elements. If we fixed an array of (say) 1000 entries to use as our state representation, then we’d have a problem when the client tried to add the 1001st element. Instead, we will need to resize our storage array periodically when we realize it has run out of room. This is the main intuition behind the dynamic array data structure.
A dynamic array is a data structure that stores its data in an array. This array is automatically resized to add more capacity when it becomes too full (and perhaps also to remove unused capacity when it becomes too empty).
To better understand the idea behind a dynamic array, let’s distinguish two different notions. We’ll refer to the length of the backing storage array as its capacity. This is the maximum number of elements that it can currently store. Separately, the size of the structure represented by the dynamic array is the number of elements that it is actively storing. To make the array’s indices align with the list’s, we will “pack” all the elements on its left, so indices [0..size)
contain the list’s elements, and indices [size..capacity)
are “empty”, which we’ll represent with null
.
Suppose that we initialize the storage array’s capacity to 4. The following animation visualizes the changes to the size and capacity as we update the contents of the list.
By performing these periodic resizes, our dynamic array data structure is able to use bounded-length arrays to represent a list with unbounded capacity. Let’s formalize this approach by defining a class to implement the CS2110List
interface with a dynamic array data structure.
DynamicArrayList Class
State Representation
We’ll call this class the DynamicArrayList
. This is analogous to Java’s ArrayList class that also leverages a dynamic array. We’ll represent the state of our dynamic array list with two fields, the backing storage
array (with the generic array type T[]
) and the current size
of the list. The class invariant stipulates that the (non-null
) list entries occupy the first size
entries of storage
, and the remaining entries of storage
are null
.
Technically this class invariant on storage
makes the size
field redundant, since we can always compute the size by scanning over the array entries until we encounter null
. Doing this scanning is inefficient, and will result in a linear-time size()
implementation. Storing the field takes up negligible extra space and reduces this time complexity to \(O(1)\).
Let’s set up the DynamicArrayList
class with these fields and include a private assertInv()
method that will enforce this class invariant as we are developing the rest of the class.
DynamicArrayList
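A sketch of this skeleton (field names follow the discussion above; the exact assertion details are an assumption), with the interface methods still to be filled in:

    public class DynamicArrayList<T> implements CS2110List<T> {

        /** The backing storage. The list's elements occupy storage[0..size-1] (all
         *  non-null); the entries storage[size..storage.length-1] are all null. */
        private T[] storage;

        /** The number of elements currently in this list. */
        private int size;

        /** Assert that the class invariant is satisfied. */
        private void assertInv() {
            assert storage != null && 0 <= size && size <= storage.length;
            for (int i = 0; i < size; i++) {
                assert storage[i] != null;
            }
            for (int i = size; i < storage.length; i++) {
                assert storage[i] == null;
            }
        }

        // constructor and CS2110List methods are developed in the following sections
    }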
Now that we're more comfortable working with fields and instance methods, we'll start to drop the use of this
from our sample code where it is unambiguous.
Constructor
Now, let’s define a constructor DynamicArrayList
that creates an empty list. We should initialize size = 0
, but how should we initialize storage
? We want to start off with an array that has enough capacity to accommodate some additions to the list, but not too much capacity that will remain unused. Let’s add a constant INITIAL_CAPACITY
to represent this value, which we’ll set to 10 (just as Java’s implementation does).
DynamicArrayList
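A sketch of the constructor (the warning-suppression detail is an assumption; the cast is the ugly part discussed next):

    /** The initial capacity of the backing array. */
    private static final int INITIAL_CAPACITY = 10;

    /** Create an empty list. */
    @SuppressWarnings("unchecked")
    public DynamicArrayList() {
        // Java does not allow `new T[...]`, so we allocate an Object[] and cast it,
        // which produces an "unchecked" warning.
        storage = (T[]) new Object[INITIAL_CAPACITY];
        size = 0;
        assertInv();
    }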
Eww. This initialization of storage
looks atrocious! There has to be a better way, right? Nope, sorry. Working with generic arrays in Java is unfortunately pretty ugly. Support for generics was added relatively late in the development of the language, so some less-than-ideal compromises had to be made.
Next, we’ll work on defining the methods from the CS2110List
interface. As we develop these, we’ll run the corresponding test cases from our CS2110ListTest
test suite to check our progress.
Accessor Methods
Let’s start with the basic accessor methods in the class, as these are used in most of our test cases; we will need them to be defined before we can check the functionality of other methods.
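Sketches of size() and get() (precondition handling via assert is an assumption):

    /** Return the number of elements in this list. */
    @Override
    public int size() {
        return size;
    }

    /** Return the element at position `index`. Requires: 0 <= index < size(). */
    @Override
    public T get(int index) {
        assert 0 <= index && index < size;
        return storage[index];
    }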
insert() and add()
Next, we’ll write the insert()
and add()
methods, since both of these encounter the possible need to resize the backing storage. We’ll extract this out into a private
helper method increaseCapacity()
that will allocate a new T[]
array with double the capacity, copy the entries from the old backing storage to the initial indices of this new array, and reassign the storage
field. We can achieve this with a single call to Java’s Arrays.copyOf()
method, though it’s a good exercise to develop your own alternate implementation using a loop.
DynamicArrayList.java
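A sketch of this helper (assuming java.util.Arrays is imported):

    /** Double the capacity of `storage`, preserving its current entries. */
    private void increaseCapacity() {
        // Arrays.copyOf allocates a longer array, copies the existing entries into
        // its initial indices, and leaves the remaining entries null.
        storage = Arrays.copyOf(storage, 2 * storage.length);
    }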
In the insert()
method, we should first check whether storage
is full by comparing size
to its capacity. If it is full, we should call increaseCapacity()
. Next, we will need to shift later elements when we insert at an earlier position in the array, which we can do using the System.arraycopy()
method. Then, we can store elem
at the now-vacated index
. Since this is a mutating method, we should end the method with a call to assertInv()
to check that the class invariant has been maintained.
DynamicArrayList.java
When we re-run our test cases, we see that some fail with an AssertionError
in the assertInv()
method, particularly on the line where we assert that the elements at the end of the storage
array are null
. What has gone wrong? Have we re-established the invariant in the insert()
method? Take a careful look at the code and then check your answer below.
What is the mistake in the above code?
The add()
method is just the special case of insert()
with index = size
. Thus, we can call the insert()
method from within add()
to avoid duplicating code.
DynamicArrayList.java
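A sketch of add() in terms of insert() (assuming the insert(int, T) parameter order from the earlier sketches):

    /** Append `elem` to the end of this list. Requires: `elem` is not null. */
    @Override
    public void add(T elem) {
        insert(size, elem); // appending is just inserting at index `size`
    }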
set()
The set()
method is another straightforward modifying method, consisting of some precondition checking and one array entry reassignment.
DynamicArrayList.java
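A sketch of set() (a void return type and assert-based precondition checks are assumptions):

    /** Replace the element at position `index` with `elem`.
     *  Requires: `elem` is not null and 0 <= index < size(). */
    @Override
    public void set(int index, T elem) {
        assert elem != null && 0 <= index && index < size;
        storage[index] = elem;
        assertInv();
    }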
remove()
The remove()
method is similar to the insert()
method, in that it may require shifting a range of elements, now to “plug up” the hole that is left when we remove an array element. We must also make sure to reassign the now-unused array entry to null
to restore the class invariant.
DynamicArrayList.java
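A sketch of remove() following the steps described above:

    /** Remove and return the element at position `index`, shifting later elements
     *  forward by one position. Requires: 0 <= index < size(). */
    @Override
    public T remove(int index) {
        assert 0 <= index && index < size;
        T removed = storage[index];
        // Shift the elements after `index` one position to the left to plug the hole.
        System.arraycopy(storage, index + 1, storage, index, size - index - 1);
        size--;
        storage[size] = null; // clear the now-unused entry to restore the invariant
        assertInv();
        return removed;
    }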
contains(), indexOf(), and delete()
The three remaining methods all require us to locate a particular element (by value) within the list. We can extract this common subroutine into a private helper method find()
that performs a linear search (using the equals()
method to test for object equality rather than ==
). Once we have done this, we can use the CS2110List
method specifications to complete the definitions of these methods.
DynamicArrayList.java
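Sketches of these methods (returning -1 from find() and a boolean from delete() are assumptions, consistent with the earlier interface sketches):

    /** Return the first index at which `elem` occurs in this list, or -1 if it does
     *  not occur. Requires: `elem` is not null. */
    private int find(T elem) {
        for (int i = 0; i < size; i++) {
            if (storage[i].equals(elem)) { // compare with equals(), not ==
                return i;
            }
        }
        return -1;
    }

    @Override
    public boolean contains(T elem) {
        return find(elem) != -1;
    }

    @Override
    public int indexOf(T elem) {
        return find(elem);
    }

    @Override
    public boolean delete(T elem) {
        int index = find(elem);
        if (index == -1) {
            return false;
        }
        remove(index); // shifts elements and restores the invariant
        return true;
    }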
If we re-run our test cases, we should see that they all pass. We’ve finished our definition of the DynamicArrayList
, providing the client an implementation of the CS2110List
ADT. It remains to reason about the performance of this implementation.
Complexity of DynamicArrayList
Just as we did for arrays, we’ll let the variable \(N\) denote the size of our list (i.e., the number of elements it stores, not the capacity of its backing storage). Before we analyze the time complexity of the DynamicArrayList
methods, let’s consider its memory usage.
Space Complexity
Overall, how much space will a DynamicArrayList
of \(N\) elements take up (not including the space taken up by the element objects that it stores)? The size
field takes up a constant amount of space, and each stored element uses a constant amount of space for its reference in the backing storage array. In addition, the empty cells at the end of the backing array take up memory. After the INITIAL_CAPACITY = 10
(i.e., \(O(1)\)) empty cells at construction, our resizing strategy always doubles the array capacity when the array becomes full. Therefore, the number of empty cells will not exceed the number of filled cells, meaning the empty cells contribute \(O(N)\) to the memory usage of a DynamicArrayList
, for an overall \(O(N)\) size.
This analysis does not account for the effect of removals. Currently, our DynamicArrayList
does not resize down when too many elements are removed, which can cause the empty space to occupy much more than half of the backing storage array. To truly achieve an \(O(N)\) space guarantee, we'd need to refine our resizing logic. See Exercise 12.6 for more details.
Most of the DynamicArrayList
methods have an \(O(1)\) space complexity, only allocating a constant number of local variables. The one exception is increaseCapacity()
, whose Arrays.copyOf()
call allocates a temporary \(O(N)\) length array during the copying.
Time Complexity
Let’s analyze the worst-case time complexities for the DynamicArrayList
methods using the accounting strategies that we discussed a few lectures ago. Since the method definitions are relatively short, we summarize these analyses below. Note that we do not factor the runtime of any assertInv()
calls into our analysis. assert
statements are a development tool and are turned off (or omitted) in the final code that is shipped to clients.
- size(): \(O(1)\), consisting of a single memory access.
- get(): \(O(1)\), consisting of a single memory access.
- increaseCapacity(): \(O(N)\), since Arrays.copyOf() iterates over the entries to copy them to the new array.
- insert(): \(O(N)\). In the case that the array was full, the runtime is dominated by the call to increaseCapacity(). Even when a resize is not needed, we need to shift \(N - i = O(N)\) elements (where \(i\) is the insertion index) to make space for the new element.
- add(): \(O(N)\). In the case that the array was full, the runtime is dominated by the call to increaseCapacity().
- set(): \(O(1)\), consisting of a single array access and reassignment.
- remove(): \(O(N)\). In the worst case, the first element is removed and the other \(N-1 = O(N)\) elements must all be shifted over to fill this space.
- find(): \(O(N)\), since we are performing a linear search over the first \(N\) elements of the storage array.
- contains(): \(O(N)\), dominated by find().
- indexOf(): \(O(N)\), dominated by find().
- delete(): \(O(N)\), dominated by find() and the potential \(O(N)\) element shift when an early element is deleted.
Amortized Analysis
We just stated that the worst-case runtime complexity of the add()
method is \(O(N)\); however, this does not provide a very good summary of its “typical” performance. When the backing array is not full, an \(O(N)\) resize is not needed. Moreover, since the new element is inserted at the end of the array, no shifts are needed and the add()
call executes in \(O(1)\) time. Almost always, add()
will have this \(O(1)\) runtime, since resizes are infrequent (and become exponentially more infrequent as the size of the list grows). A sketch of the runtime, visualized as a histogram, is shown below.
The optimal worst-case bound for this runtime complexity is linear, since the heights of the tall “resizing” bars grow as a linear function of \(N\). However, the very small runtimes of all the other bars can “average out” these infrequent “blips” to give a more reasonable notion of runtime complexity across multiple calls to add()
. This is the idea behind an amortized analysis.
In an amortized worst-case time complexity analysis of a method, we compute the total worst-case time complexity of a sequence of method calls and divide this by the number of method calls, giving a notion of a "long-term average" runtime of the method.
Let’s consider the total work performed over the first \(N\) calls to add()
(on lists of sizes \(0,1,\dots,N-1\)) for increasing values of \(N\).
- When \(N \le 10\), each call requires only “1 unit” (i.e., \(O(1)\)) of work, so we perform a total of \(N\) units of work over these \(N\) calls, for an amortized 1 unit of work per add() call.
- When \(N = 11\), the first 10 add() calls require 1 unit of work, and the final add() call requires 11 (i.e., \(O(N)\)) units of work. This gives a total of 21 units of work over 11 add() calls, which amortizes to \(<\) 2 units of work per add() call.
- When \(12 \le N \le 20\), then \(N-1\) of the add() calls require 1 unit of work, and the 11th call requires 11 units of work, for a total of \(N + 10\) units of work over \(N\) calls. This amortizes to \( \frac{N+10}{N} < 2 \) units of work per add() call.
- When \(N = 21\), then 19 of the add() calls require 1 unit of work, the 11th call requires 11 units of work, and the 21st call requires 21 units of work. This gives a total of 51 units of work over 21 add() calls, which amortizes to \(<\) 3 units of work per add() call.
Continuing with this analysis, we’ll find that we’ll never perform more than 3 = \(O(1)\) units of amortized work per add()
call, meaning the amortized worst-case time complexity of the add()
method is \(O(1)\).
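To sketch why this pattern continues in general (under the setup above of an initial capacity of 10 that doubles on each resize): over the first \(N\) calls to add(), every call contributes 1 unit of ordinary work, while the resizes contribute an additional \(10 + 20 + 40 + \cdots\) units. Each resize copies the old capacity, so the largest of these terms is less than \(N\), and a geometric sum with ratio 2 is at most twice its largest term, so the resizes contribute at most \(2N\) units in total. The total work is therefore at most \(N + 2N = 3N\) units over \(N\) calls, matching the bound of 3 amortized units per call.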
We can observe this visually by “rearranging” the bar heights in our runtime plot. Note that we ultimately care about the average height of a bar, and this average height does not change when we move height from one bar to another. Specifically, we can “topple” all of the taller \(O(N)\) bars to their left, leaving one unit of their work in place and moving 2 units of work onto each bar to its left (until all units have been accounted for). The tall bars are spaced out in such a way that no bar will end up with more than 3 units of work after this “toppling”, giving the same amortized \(O(1)\) complexity.
Here, our choice to double the capacity of the backing storage array (or more generally, increase it by a multiplicative factor) during each resize was critical to achieve this \(O(1)\) amortized complexity, as it ensured that the "tall bars" were spaced out enough for the averaging to converge to a constant. An alternate resizing strategy, such as increasing the array capacity by a constant amount, would be insufficient. See Exercise 12.9 for more details.
An amortized time complexity gives us a different picture about the runtime of a method which may be more or less useful in certain circumstances. When we want to understand how we can expect that a method will perform over many invocations (e.g., understanding the performance of a data structure across many insertions and removals), then amortized complexity may be most appropriate. In critical applications (such as flight software or medical equipment), we may be more concerned with the absolute worst-case performance of a single method call. If a resize of a very large data logging array could take place during a critical instant in a rocket’s trajectory, we’d want to be aware of this possibility. In this case, a standard worst-case performance guarantee may be more appropriate.
Main Takeaways:
- An abstract data type (ADT) describes a set of operations that we can perform on a collection of data. We model ADTs using interfaces in Java. One example of an ADT is a list, which is a linearly ordered collection of data whose elements are accessible via their indices.
- A data structure is an implementation of an ADT using a particular state representation. One ADT may be realizable with multiple different data structures, each with its own performance characteristics.
- Generic type parameters enable the parameterization of a class or method on an unknown type that is supplied by the client. Java supports generic types using angle bracket (
<>
) syntax. - A dynamic array is a data structure that uses arrays for the backing storage of its elements, reallocating larger arrays and copying over the data when the backing storage becomes full. The list ADT can be implemented using a dynamic array.
- In amortized time-complexity analysis we report the average runtime of a method taken over a sequence of calls. This often provides a more meaningful summary of a method's "typical" performance than its worst-case time complexity.
Exercises
Consider the following generic class.
Cat
, Dog
, and Animal
are types with the following subtype relationships: Cat <: Animal
, and Dog <: Animal
. A client attempts to use this class as follows:
You are reviewing a colleague’s code printed out on paper (for some reason). You identify a variable c
of type Collection
, but because the printer was low on ink, you can’t read the generic type parameter E
that c
was declared with. Later, you see the following statement:
What can you conclude about c’s illegible parametric type E?
Consider the following method:
removeAll()?
Pairs
In this exercise, we’ll implement a generic class Pair
that holds two fields. This can be used to model a variety of things such as coordinates on the 2D plane or an alternative to the Book
record class from Assignment A4.
Implement a class called SamePair
that is generic on T
. Both fields must have the same type. Define methods first()
and second()
to get the value of these fields. Defined methods setFirst()
and setSecond()
to set the values of each element respectively.
Implement a class called Pair
that is generic on two type parameters, U
and V
, where U
is the type of the first field and V
is the type of the second field. Implement the same four methods as in part a.
Bag ADT
Another common ADT is known as a Bag
or a Multiset
. A Bag
is a collection of unordered items that can contain duplicates. We can model a Bag
ADT with the following interface:
Implement a class DynamicArrayBag
that implements the Bag
ADT using a dynamically sized array.
The main difference between the Bag
and List
ADT is the enforcement of an order. Analyze the asymptotic worst-case time complexity of each method in DynamicArrayBag
. Does the lack of ordering improve the efficiency of any operations? If so, how?
CS2110List
Suppose we add the following methods to the CS2110List
interface. Add method definitions in DynamicArrayList
to override each of the following methods. State the worst-case runtime complexity for each.
Note that NoSuchElementException <: RuntimeException
.
SortedList
Another way to implement the CS2110List
ADT is with a sorted list data structure. This data structure enforces a sorted order invariant on the elements in the array.
Implement the class by overriding all methods defined in CS2110List
, including the ones added in Exercise 12.4. State the worst-case time complexity of each method. The frequencyOf()
method should run in \(O(\log n)\) time. As a hint, view Exercise 5.5 in Lecture 5.
In your implementation, you may suppose you have access to this method that determines the order of two objects of type T
. This is similar to the Comparator
interface that will be introduced in a later lecture.
Suppose you need to store a collection of Integer
s as a field for a class. In what scenarios would you choose to use a DynamicArrayList
over SortedList
and vice versa?
What relationship should size
and capacity
satisfy to justify shrinking? Keep in mind that we want to maintain the same (amortized) runtime complexities for all the methods in DynamicArrayList
after adding this shrinking behavior.
Modify the delete()
and remove()
methods so that the backing array is properly shrunk.
CS2110List
Consider the following methods, which accept arbitrary CS2110List
implementations (that conform to their specifications). For both DynamicArrayList
and SortedList
(defined in Exercise 12.5), state the worst-case time complexity for each method, considering the case where the input parameter could have either dynamic type.
Again, assume you have access to the compare()
method defined in Exercise 12.5.a.
We can bound a generic type parameter by declaring it as <T extends Account>
, which means that any generic type T
must satisfy T <: Account
. We can substitute Account
with any class or interface. Note that with interfaces, we still use extends
.
Recall the Point
record class from the previous lecture. Let's make this record class generic on T extends Number
. Number
is an abstract
class that is a supertype of wrapper classes, such as Integer
and Double
.
Implement the method distanceTo()
. Sift through the Number
API to find an appropriate method to use.
double
.
Suppose our DynamicArrayList
increases the capacity of the backing array by \( 10 \) each time instead of doubling.
Suppose we add one element to the DynamicArrayList
each iteration. With an initial capacity of \( 10 \), after how many iterations must the backing array resize?
What is the amortized time complexity of add()
in this scenario?
insertLeft()
Suppose we add a method to DynamicArrayList
called insertLeft()
, defined below, that prepends an element to the list. We'll analyze the asymptotic runtime complexity of this method.