12. Collections and Generics
Essentially all programs that we write involve two principal components: data structures and algorithms. Data structures allow us to collect related pieces of data, organizing them in a particular way so that we can process them efficiently using algorithms (computational routines). Different scenarios benefit from organizing data in different ways. On computer file systems, directories (folders) and files are arranged hierarchically, which keeps related files together and makes it easier for your operating system to manage file permissions. As we have already seen, dictionaries store their words in sorted order, enabling a fast way to look up definitions. Social networks model the connectivity of their users in a way that allows them to extract information about users’ interests and tailor their experience accordingly. Databases, which can manage many gigabytes or terabytes of data, use a variety of data structures to organize information in a way that enables fast queries in real-world systems.
In this middle section of the course, we will survey the quintessential collection types that form the basis for most programs: lists, stacks, queues, trees, heaps, hash tables, maps, and graphs. Doing this will require all of the tools that we have introduced thus far. We’ll use our understanding of Java’s memory semantics to visualize the underlying structures of these collections. We’ll use the object-oriented design principles from the past few lectures to encapsulate these collections and provide intuitive interfaces to their clients. We’ll use invariants, specifications, and testing to reason about correctness of our implementations. Finally, we’ll use asymptotic complexity to analyze the performance trade-offs of these collections.
Data Structures and Abstract Data Types
Before we can begin to implement and analyze different collections, we need to establish some basic terminology.
A collection is a type that stores one or more instances of another type.
We have already seen one example of a collection in the course, arrays. An array is a collection that consists of a fixed number of slots (i.e., indices) in which primitive values or objects of a particular type can be stored. For example, the code String[] words = new String[6];
initializes words
to refer to an array that can store 6 String
s. The state of this collection consists of the capacity of the array (its length, 6) and the six contiguous memory locations where references to these String
entries are stored. The behaviors of this String[]
array consist of querying its length
and reading and writing to each of its entries by index using Java’s built-in square bracket ([]
) syntax. The contiguous nature of the array’s storage allows the client to access the entry at any particular index of the array in \(O(1)\) time (in the background, Java can locate this address using a single multiplication and addition). This fact is often referred to as the array’s random access guarantee, and will be a central tool in our analysis of many data structures that involve arrays.
We say that arrays have a random access guarantee since their client can access (i.e., read the entry at or write a new entry to) the entry at any valid index of the array in \(O(1)\) time, independent of the array's length or the value of the index.
In other words, accessing the entry a[2]
of an array a
takes the same amount of time whether a
has length 10 or length 1 million, and in the latter case it will also take the same amount of time to access a[800000]
. From this discussion of arrays, we can start to see a distinction between the ways that the client (who writes code that uses arrays) and the implementer (in this case, the developers of Java who added support for arrays) think about collections.
The client is primarily concerned with the behaviors that the collection supports (and the runtime/space complexities required to implement these behaviors). Can they add or remove elements from the collection, and what is the syntax to do this? How quickly will the additions/removals be carried out, and does this depend on the collection’s size, the element that will be added/removed, etc.? How do they check whether a particular element is present in the collection and access its location if it is present? What, if anything, can the client do to modify particular elements within the collection? In which use-cases is this collection an appropriate choice? Are there other constraints or concerns that they need to be aware of to get the best performance out of a collection?
On the other hand, the implementer is tasked with figuring out how to support these behaviors. What variables and objects will they need to represent the state of the collection, and what class invariants should be enforced for these? How does the selection of a state representation impact how quickly different operations can be performed and invariants can be restored? What sort of encapsulation will be necessary to hide the sometimes messy inner workings of a collection’s class from the client and only expose a neat, seamless interface?
To distinguish these different views of collections, we’ll introduce two additional high-level terms, abstract data types (or ADTs) and data structures.
An abstract data type describes the behaviors that are supported by a collection without specifying the details of its underlying implementation.
A data structure is a class that realizes an abstract data type by specifying its state representation and using this to provide definitions for each of the behaviors it supports.
Since they outline a set of supported behaviors but elide implementation details, abstract data types are naturally modeled in Java using interfaces. A class that implements an ADT will utilize a data structure to represent its state. In this way, a single ADT can be realized by multiple different data structures, and each implementation can have its own performance characteristics. Arrays, being a low-level type with their own special language-supported syntax, blur the line between ADT and data structure. Therefore, we’ll use another ADT, the List, throughout the rest of this and the following lecture to better demonstrate this distinction.
The List ADT
A list is a linearly ordered collection, similar to an array, in which entries are indexed by consecutive integer indices beginning with 0. Lists differ from arrays since they do not have a fixed length. Rather, their size grows dynamically to accommodate adding an arbitrary number of elements. The Java language includes a List
ADT in the java.util.List
interface, but we will practice developing our own ADT interface CS2110List
that supports a subset of its features.
To work toward this implementation, let’s start by restricting our focus to a list that can only store (non-null
) String
s, as this will make it easier to design the method signatures. We’ll call this the CS2110StringList
interface. What operations should this interface support?
First, we’ll want to be able to add a String
to the (end of the) list.
CS2110StringList.java
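The course’s listing isn’t reproduced here; a minimal sketch of what this declaration could look like (the exact spec wording is an assumption) is:

    /**
     * Append `elem` to the end of this list.
     * Requires: `elem` is not null.
     */
    void add(String elem);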
We may also want to be able to insert a String
at some other position in the list, shifting down the Strings that currently sit at later positions to make space for it.
CS2110StringList.java
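A sketch of the insertion method (the parameter order and spec wording are assumptions):

    /**
     * Insert `elem` at position `index` in this list, shifting the elements at
     * positions `index` and beyond back by one position to make room.
     * Requires: `elem` is not null and 0 <= index <= size().
     */
    void insert(int index, String elem);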
In the “Requires” clause of this method’s spec, we see that we’ll need a way to access the current number of elements in the list, which we’ll do with the size()
method.
CS2110StringList.java
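A sketch of this accessor:

    /** Return the number of elements in this list. */
    int size();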
What other accessor methods will be useful to query properties of the list’s contents? We may want to know what String is stored at a particular index, which we’ll support with a get()
method. We might also want to know whether the list contains a particular String (supported by a contains()
method). More specifically, we may want to know the (first) index where a particular String is stored (supported by an indexOf()
method).
CS2110StringList.java
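Sketches of these three accessors (returning -1 from indexOf() for a missing element is an assumption):

    /**
     * Return the element at position `index` in this list.
     * Requires: 0 <= index < size().
     */
    String get(int index);

    /** Return whether this list contains `elem`. Requires: `elem` is not null. */
    boolean contains(String elem);

    /**
     * Return the first index at which `elem` is stored in this list, or -1 if
     * `elem` is not contained in this list. Requires: `elem` is not null.
     */
    int indexOf(String elem);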
Lastly, let’s consider some methods that modify the contents of a list in ways other than adding elements. First, a client may wish to change the element stored at a particular index, which we’ll support with a set()
method. A client may also want to remove an element from the list. We’ll support two variants of removal. First, the client can ask us to remove (and return) an element at a given index, which we’ll support with a remove()
method. Alternatively, the client can pass in an element and tell us to remove (the first instance of) it from the list, which we’ll support with a delete()
method.
CS2110StringList.java
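Sketches of these three mutators (the return types are assumptions):

    /**
     * Replace the element at position `index` with `elem`.
     * Requires: `elem` is not null and 0 <= index < size().
     */
    void set(int index, String elem);

    /**
     * Remove and return the element at position `index`, shifting later
     * elements forward by one position. Requires: 0 <= index < size().
     */
    String remove(int index);

    /**
     * Remove the first occurrence of `elem` from this list, if present, and
     * return whether an element was removed. Requires: `elem` is not null.
     */
    boolean delete(String elem);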
Note that some of the names and signatures given above deviate slightly from those in Java's List
interface. This is intentional, and meant to draw parallels between operations for different data structures that we will study, avoiding ambiguities (e.g., by treating deletion and removal as semantically different operations, rather than Java's approach of calling both "removal").
Generics
While the CS2110StringList
interface is suitable for declaring operations on a list of String
s, it will not suffice for working with a list of another data type, such as a list of Account
s or Point
s or even a list of other lists. If we want to support a list of Point
s, we’d need to define a second, parallel interface such as CS2110PointList
that had signatures involving the Point
type. The behaviors for this list of Point
s (adding Point
s, removing Point
s, checking whether the list contained a certain Point
, etc.) would be the same, leading to a lot of repeated code. This isn’t practical. We’d like a way to develop a single interface that is capable of storing any type of data. In other words, we’d like a way to create a polymorphic list interface.
Subtype polymorphism is one possibility. We could define a CS2110ObjectList
that stores Object
s. Since Object
is a supertype of every class, a client could create a list of String
s with this interface, or a list of Points
, or a list of any type they wish. However, they could also create a list containing a mix of String
s and Point
s. We’d have no way to enforce that a particular list instance only contain one type (finer than Object
) of data. An alternate approach that allows this constraint on types is to use parametric polymorphism.
Parametric polymorphism achieves polymorphic behavior by parameterizing the definition of an interface or class on one or more generic types that are specified by the client code.
At this point, we are familiar with parameterizing the methods that we write. When we declare a method, we add one or more variable names (and their static types) within parentheses in the method’s signature. For example, the rectangleArea()
method below is parameterized on width
and height
values.
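The original listing isn’t shown; a sketch (the double parameter types are assumptions):

    /** Return the area of a rectangle with the given `width` and `height`. */
    static double rectangleArea(double width, double height) {
        return width * height;
    }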
These names width
and height
become variables that we can use as we develop the method. When the method is utilized in client code (i.e., called), the client passes arguments into the rectangleArea
method that fix the values of width
and height
, and these are substituted as the method is being evaluated. Just as we can parameterize a method on values, we can parameterize a class or interface on a generic type. We do this using angle brackets, such as
CS2110List.java
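In outline (a sketch; only the declaration with the angle brackets matters here), the generic interface begins like:

    /** A list of (non-null) elements of type T, indexed consecutively from 0. */
    public interface CS2110List<T> {
        // method declarations, written in terms of the type parameter T, go here
    }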
This introduces a generic type parameter T
that can be used to describe a type throughout the interface or class. When the client declares a variable with type CS2110List
, they will specify which reference type T
will represent for that variable. For example, if they declare a list
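For instance (a sketch; the variable name words matches the surrounding discussion):

    CS2110List<String> words;  // T stands for String wherever words is used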
this tells the compiler to substitute the String
type for T
within CS2110List
whenever it is accessed through the words
variable. You can think about this as replacing T
with String
within the interface (or class) definition, just as we replace the method parameters with their values when executing the method.
Within an interface or class with a generic type, we can use that type parameter in any place where we would declare an ordinary type. For example, the add method of our generic CS2110List
type will no longer take in a String
parameter; it will take in a T
parameter for whatever type the client specified as T
.
CS2110List.java
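A sketch of the generic version of this declaration (spec wording is an assumption):

    /**
     * Append `elem` to the end of this list.
     * Requires: `elem` is not null.
     */
    void add(T elem);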
Similarly, the get()
method will no longer have a String
return type; it will have a T
return type.
CS2110List.java
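A sketch of the generic accessor:

    /**
     * Return the element at position `index` in this list.
     * Requires: 0 <= index < size().
     */
    T get(int index);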
In a generic class definition, we can also declare fields and local variables with generic types (as we will soon see). We cannot, however, construct new objects of generic types or call methods with generic targets. Intuitively, since the generic type parameter can represent any type, we don’t know whether that type supports a method (including a constructor) with a particular signature. The complete code for the generic CS2110List
interface is given below.
CS2110List.java
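Consolidating the sketches above (again, exact spec wording, parameter orders, and return types are assumptions rather than the course’s actual listing), the generic interface might read:

    /** A list of (non-null) elements of type T, indexed consecutively from 0. */
    public interface CS2110List<T> {

        /** Return the number of elements in this list. */
        int size();

        /** Append `elem` to the end of this list. Requires: `elem` is not null. */
        void add(T elem);

        /** Insert `elem` at position `index`, shifting later elements back by one.
         *  Requires: `elem` is not null and 0 <= index <= size(). */
        void insert(int index, T elem);

        /** Return the element at position `index`. Requires: 0 <= index < size(). */
        T get(int index);

        /** Return whether this list contains `elem`. Requires: `elem` is not null. */
        boolean contains(T elem);

        /** Return the first index at which `elem` occurs, or -1 if it does not.
         *  Requires: `elem` is not null. */
        int indexOf(T elem);

        /** Replace the element at position `index` with `elem`.
         *  Requires: `elem` is not null and 0 <= index < size(). */
        void set(int index, T elem);

        /** Remove and return the element at position `index`, shifting later
         *  elements forward by one. Requires: 0 <= index < size(). */
        T remove(int index);

        /** Remove the first occurrence of `elem`, if present, and return whether an
         *  element was removed. Requires: `elem` is not null. */
        boolean delete(T elem);
    }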
A generic type parameter can represent any reference type, which begs the question: "What if we want to instantiate a generic class using a primitive type?" Java allows this through the use of wrapper classes for each primitive type, which you can think of as a reference type whose object contains a single field of a primitive type. For example, the Integer
reference type wraps the int
primitive type. Java supports "auto-boxing" and "auto-unboxing" to automatically convert between primitive types and objects of their wrapper classes, allowing us to write code such as
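For instance, a minimal illustration of these conversions (the original snippet may differ):

    Integer boxed = 5;    // auto-boxing: the int value 5 is wrapped in an Integer object
    int unboxed = boxed;  // auto-unboxing: the Integer's value is extracted back into an int

A CS2110List<Integer> can therefore store int values; Java boxes them on the way in and can unbox them on the way out.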
Writing Tests for an ADT
We can use the specifications for the methods declared in the CS2110List
interface to develop tests for this ADT that we can run against any of its implementations. These tests will typically consist of constructing some CS2110List
object (i.e., an object of some class implementing the CS2110List
interface) and performing a series of mutating methods on this object, using accessor methods within JUnit assertions between these mutating methods to verify that the list is in the correct state.
One concern that arises from this testing pattern is that we will need to construct CS2110List
objects. This appears to require us to know in advance the name of the class that will implement the CS2110List
interface so we can call its constructor. This will also lock us into testing that particular class (i.e., the data structure) rather than developing a set of tests that will work for any CS2110List
implementation. This should be a somewhat familiar problem at this point, perhaps from a slightly new angle. We want to write code (tests) that will extract common behaviors from a bunch of specialized classes. We achieve this using inheritance. We’ll write a CS2110ListTest
superclass, which will contain all of our test definitions. Its subclasses will be responsible for constructing the lists that are used in the tests, which they will do by overriding a constructList()
method to call the constructor of the particular list implementation that they are testing. To ensure that all of the subclasses define constructList()
, we’ll make this an abstract
method of CS2110ListTest
(which now must be marked as an abstract
class).
To see this testing pattern in action, let’s suppose that the DynamicArrayList
class (which we’ll define shortly) implements the CS2110List
interface. Then, we’ll define
CS2110ListTest.java
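The course’s listing isn’t reproduced here; the sketch below illustrates the pattern (test content and names are placeholders, and the line numbers 15 and 25 mentioned in the remarks that follow refer to the original listing, not to this sketch):

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    public abstract class CS2110ListTest {

        /** Construct and return an empty list of the implementation under test. */
        abstract <T> CS2110List<T> constructList();

        @Test
        void testAddAndGet() {
            CS2110List<String> words = constructList(); // T is inferred to be String
            words.add("hello");
            assertEquals(1, words.size());
            assertEquals("hello", words.get(0));
        }

        // additional test cases exercising the other CS2110List methods ...
    }

    class DynamicArrayListTest extends CS2110ListTest {
        @Override
        <T> CS2110List<T> constructList() {
            return new DynamicArrayList<>(); // generic argument T is inferred
        }
    }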
When starting to work with generic types, the syntax can be a bit quirky. Let’s stop to remark on some of the quirks that arise in this testing code.
- In the signature of CS2110ListTest.constructList(), there is an extra “<T>” before the return type. This is used to introduce a new generic type within the constructList() method, required since the return type of the method will change depending on what type of objects the constructed list will store. The value of this generic type is inferred by Java each time that this method is invoked. For example, on line 15, the return value of constructList(), which has type CS2110List<T>, is assigned to a variable with static type CS2110List<String>, so T must be String in this invocation.
- In DynamicArrayListTest.constructList(), we have omitted the generic type T from the constructor call on line 25, writing only DynamicArrayList<>(). This is acceptable because the generic type can be inferred by Java; we are returning a reference to this object and the return type is CS2110List<T>, so the generic argument should be T. It would also be acceptable to write return new DynamicArrayList<T>();, though this would likely prompt a warning in IntelliJ.
- Unrelated to generics, but we have declared a second class DynamicArrayListTest within the “outer scope” of the “CS2110ListTest.java” file. This is acceptable since this class is not marked as public. Since the DynamicArrayListTest has limited functionality (its only responsibility is constructing lists for the CS2110ListTest test cases), it is appropriate to place it here rather than in its own file.
When we develop tests for ADTs, we want to make sure that they provide good coverage. Here are some tips for achieving this:
- Thoroughly test the small size “edge cases” of an ADT. Often, the way that a data structure handles operations for small sizes requires somewhat different logic. Many bugs arise when this is not handled carefully. There should be many tests focusing on adding to and querying empty collections and collections with only one element, as well as removing elements from a collection until it is empty.
- For collections with intrinsic orders, make sure that you test interacting with the “edge” elements, since often these require different logic than the “middle” elements. Include tests that add/remove/modify the first element and the element just after the first element. Similarly, try adding/removing/modifying the last element and the element just before the last element. On a related note, write tests that cover elements well within the “middle” of the structure.
- Write some tests that “stress” your implementation for larger sizes. While your code may appear to be working correctly for small amounts of data, it may have memory or performance inefficiencies that only become apparent once the collection includes hundreds or thousands of elements.
- Make sure that your tests include assertions about the return values of all of the accessor methods. It is a common pitfall to only include assertions about some properties of the collection (such as its size, or its string representation), which may hide implementation issues that are revealed by other accessors.
We have written a set of comprehensive unit tests for the CS2110List
ADT that are included in this lecture’s source code. This level of thoroughness and documentation is what you should strive for when you develop tests throughout the rest of the course. You don’t want your client to find bugs in your data structure that your tests didn’t cover!
Dynamic Arrays
Now that we have defined a List ADT and written tests that enforce its specifications, we are ready to think about its implementation. For the rest of today’s lecture, we’ll focus on one implementation using a dynamic array data structure. In the next lecture, we’ll write another implementation using a linked data structure.
A List is a linearly ordered ADT, just like an array. Therefore, it seems natural to represent the state of a list using an array, in which the i
th entry of the array stores the element at index i
in the list. We’ll call this array the backing storage for the list object. What should the size of this storage array be? One fixed size will not work because a list is allowed to hold arbitrarily many elements. If we fixed an array of (say) 1000 entries to use as our state representation, then we’d have a problem when the client tried to add the 1001st element. Instead, we will need to resize our storage array periodically when we realize it has run out of room. This is the main intuition behind the dynamic array data structure.
A dynamic array is a data structure that stores its data in an array. This array is automatically resized to add more capacity when it becomes too full (and perhaps also to remove unused capacity when it becomes too empty).
To better understand the idea behind a dynamic array, let’s distinguish two different notions. We’ll refer to the length of the backing storage array as its capacity. This is the maximum number of elements that it can currently store. Separately, the size of the structure represented by the dynamic array is the number of elements that it is actively storing. To make the array’s indices align with the list’s, we will “pack” all the elements on its left, so indices [0..size)
contain the list’s elements, and indices [size..capacity)
are “empty”, which we’ll represent with null
.
Suppose that we initialize the storage array’s capacity to 4. The following animation visualizes the changes to the size and capacity as we update the contents of the list.
By performing these periodic resizes, our dynamic array data structure is able to use bounded-length arrays to represent a list with unbounded capacity. Let’s formalize this approach by defining a class to implement the CS2110List
interface with a dynamic array data structure.
DynamicArrayList Class
State Representation
We’ll call this class the DynamicArrayList
. This is analogous to Java’s ArrayList class that also leverages a dynamic array. We’ll represent the state of our dynamic array list with two fields, the backing storage
array (with the generic array type T[]
) and the current size
of the list. The class invariant stipulates that the (non-null
) list entries occupy the first size
entries of storage
, and the remaining entries of storage
are null
.
Technically this class invariant on storage
makes the size
field redundant, since we can always compute the size by scanning over the array entries until we encounter null
. Doing this scanning is inefficient, and will result in a linear-time size()
implementation. Storing the field takes up negligible extra space and reduces this time complexity to \(O(1)\).
Let’s set up the DynamicArrayList
class with these fields and include a private assertInv()
method that will enforce this class invariant as we are developing the rest of the class.
DynamicArrayList
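A sketch of this skeleton (field names follow the discussion above; the exact assertion details are an assumption), with the interface methods still to be filled in:

    public class DynamicArrayList<T> implements CS2110List<T> {

        /** The backing storage. The list's elements occupy storage[0..size-1] (all
         *  non-null); the entries storage[size..storage.length-1] are all null. */
        private T[] storage;

        /** The number of elements currently in this list. */
        private int size;

        /** Assert that the class invariant is satisfied. */
        private void assertInv() {
            assert storage != null && 0 <= size && size <= storage.length;
            for (int i = 0; i < size; i++) {
                assert storage[i] != null;
            }
            for (int i = size; i < storage.length; i++) {
                assert storage[i] == null;
            }
        }

        // constructor and CS2110List methods are developed in the following sections
    }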
Now that we're more comfortable working with fields and instance methods, we'll start to drop the use of this
from our sample code where it is unambiguous.
Constructor
Now, let’s define a constructor DynamicArrayList
that creates an empty list. We should initialize size = 0
, but how should we initialize storage
? We want to start off with an array that has enough capacity to accommodate some additions to the list, but not too much capacity that will remain unused. Let’s add a constant INITIAL_CAPACITY
to represent this value, which we’ll set to 10 (just as Java’s implementation does).
DynamicArrayList
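A sketch of the constructor (the warning-suppression detail is an assumption; the cast is the ugly part discussed next):

    /** The initial capacity of the backing array. */
    private static final int INITIAL_CAPACITY = 10;

    /** Create an empty list. */
    @SuppressWarnings("unchecked")
    public DynamicArrayList() {
        // Java does not allow `new T[...]`, so we allocate an Object[] and cast it,
        // which produces an "unchecked" warning.
        storage = (T[]) new Object[INITIAL_CAPACITY];
        size = 0;
        assertInv();
    }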
Eww. This initialization of storage
looks atrocious! There has to be a better way, right? Nope, sorry. Working with generic arrays in Java is unfortunately pretty ugly. Support for generics was added relatively late in the development of the language, so some less-than-ideal compromises had to be made.
Next, we’ll work on defining the methods from the CS2110List
interface. As we develop these, we’ll run the corresponding test cases from our CS2110ListTest
test suite to check our progress.
Accessor Methods
Let’s start with the basic accessor methods in the class, as these are used in most of our test cases; we will need them to be defined before we can check the functionality of other methods.
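Sketches of size() and get() (precondition handling via assert is an assumption):

    /** Return the number of elements in this list. */
    @Override
    public int size() {
        return size;
    }

    /** Return the element at position `index`. Requires: 0 <= index < size(). */
    @Override
    public T get(int index) {
        assert 0 <= index && index < size;
        return storage[index];
    }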
insert() and add()
Next, we’ll write the insert()
and add()
methods, since both of these encounter the possible need to resize the backing storage. We’ll extract this out into a private
helper method increaseCapacity()
that will allocate a new T[]
array with double the capacity, copy the entries from the old backing storage to the initial indices of this new array, and reassign the storage
field. We can achieve this with a single call to Java’s Arrays.copyOf()
method, though it’s a good exercise to develop your own alternate implementation using a loop.
DynamicArrayList.java
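A sketch of this helper (assuming java.util.Arrays is imported):

    /** Double the capacity of `storage`, preserving its current entries. */
    private void increaseCapacity() {
        // Arrays.copyOf allocates a longer array, copies the existing entries into
        // its initial indices, and leaves the remaining entries null.
        storage = Arrays.copyOf(storage, 2 * storage.length);
    }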
In the insert()
method, we should first check whether storage
is full by comparing size
to its capacity. If it is full, we should call increaseCapacity()
. Next, we will need to shift later elements when we insert at an earlier position in the array, which we can do using the System.arraycopy()
method. Then, we can store elem
at the now-vacated index
. Since this is a mutating method, we should end the method with a call to assertInv()
to check that the class invariant has been maintained.
DynamicArrayList.java
When we re-run our test cases, we see that some fail with an AssertionError
in the assertInv()
method, particularly on the line where we assert that the elements at the end of the storage
array are null
. What has gone wrong? Have we re-established the invariant in the insert()
method? Take a careful look at the code and then check your answer below.
What is the mistake in the above code?
The add()
method is just the special case of insert()
with index = size
. Thus, we can call the insert()
method from within add()
to avoid duplicating code.
DynamicArrayList.java
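A sketch of add() in terms of insert() (assuming the insert(int, T) parameter order from the earlier sketches):

    /** Append `elem` to the end of this list. Requires: `elem` is not null. */
    @Override
    public void add(T elem) {
        insert(size, elem); // appending is just inserting at index `size`
    }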
set()
The set()
method is another straightforward modifying method, consisting of some precondition checking and one array entry reassignment.
DynamicArrayList.java
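A sketch of set() (a void return type and assert-based precondition checks are assumptions):

    /** Replace the element at position `index` with `elem`.
     *  Requires: `elem` is not null and 0 <= index < size(). */
    @Override
    public void set(int index, T elem) {
        assert elem != null && 0 <= index && index < size;
        storage[index] = elem;
        assertInv();
    }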
remove()
The remove()
method is similar to the insert()
method, in that it may require shifting a range of elements, now to “plug up” the hole that is left when we remove an array element. We must also make sure to reassign the now-unused array entry to null
to restore the class invariant.
DynamicArrayList.java
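A sketch of remove() following the steps described above:

    /** Remove and return the element at position `index`, shifting later elements
     *  forward by one position. Requires: 0 <= index < size(). */
    @Override
    public T remove(int index) {
        assert 0 <= index && index < size;
        T removed = storage[index];
        // Shift the elements after `index` one position to the left to plug the hole.
        System.arraycopy(storage, index + 1, storage, index, size - index - 1);
        size--;
        storage[size] = null; // clear the now-unused entry to restore the invariant
        assertInv();
        return removed;
    }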
contains(), indexOf(), and delete()
The three remaining methods all require us to locate a particular element (by value) within the list. We can extract this common subroutine into a private helper method find()
that performs a linear search (using the equals()
method to test for object equality rather than ==
). Once we have done this, we can use the CS2110List
method specifications to complete the definitions of these methods.
DynamicArrayList.java
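Sketches of these methods (returning -1 from find() and a boolean from delete() are assumptions, consistent with the earlier interface sketches):

    /** Return the first index at which `elem` occurs in this list, or -1 if it does
     *  not occur. Requires: `elem` is not null. */
    private int find(T elem) {
        for (int i = 0; i < size; i++) {
            if (storage[i].equals(elem)) { // compare with equals(), not ==
                return i;
            }
        }
        return -1;
    }

    @Override
    public boolean contains(T elem) {
        return find(elem) != -1;
    }

    @Override
    public int indexOf(T elem) {
        return find(elem);
    }

    @Override
    public boolean delete(T elem) {
        int index = find(elem);
        if (index == -1) {
            return false;
        }
        remove(index); // shifts elements and restores the invariant
        return true;
    }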
If we re-run our test cases, we should see that they all pass. We’ve finished our definition of the DynamicArrayList
, providing the client an implementation of the CS2110List
ADT. It remains to reason about the performance of this implementation.
Complexity of DynamicArrayList
Just as we did for arrays, we’ll let the variable \(N\) denote the size of our list (i.e., the number of elements it stores, not the capacity of its backing storage). Before we analyze the time complexity of the DynamicArrayList
methods, let’s consider its memory usage.
Space Complexity
Overall, how much space will a DynamicArrayList
of \(N\) elements take up (not including the space taken up by the element objects that it stores)? The size
field takes up a constant amount of space, and each stored element uses a constant amount of space for its reference in the backing storage array. In addition, the empty cells at the end of the backing array take up memory. After the INITIAL_CAPACITY = 10
(i.e., \(O(1)\)) empty cells at construction, our resizing strategy always doubles the array capacity when the array becomes full. Therefore, the number of empty cells will not exceed the number of filled cells, meaning the empty cells contribute \(O(N)\) to the memory usage of a DynamicArrayList
, for an overall \(O(N)\) size.
This analysis does not account for the effect of removals. Currently, our DynamicArrayList
does not resize down when too many elements are removed, which can cause the empty space to occupy much more than half of the backing storage array. To truly achieve an \(O(N)\) space guarantee, we'd need to refine our resizing logic. See Exercise 12.6 for more details.
Most of the DynamicArrayList
methods have an \(O(1)\) space complexity, only allocating a constant number of local variables. The one exception is increaseCapacity()
, whose Arrays.copyOf()
call allocates a temporary \(O(N)\) length array during the copying.
Time Complexity
Let’s analyze the worst-case time complexities for the DynamicArrayList
methods using the accounting strategies that we discussed a few lectures ago. Since the method definitions are relatively short, we summarize these analyses below. Note that we do not factor the runtime of any assertInv()
calls into our analysis. assert
statements are a development tool and are turned off (or omitted) in the final code that is shipped to clients.
- size(): \(O(1)\), consisting of a single memory access.
- get(): \(O(1)\), consisting of a single memory access.
- increaseCapacity(): \(O(N)\), since Arrays.copyOf() iterates over the entries to copy them to the new array.
- insert(): \(O(N)\). In the case that the array was full, the runtime is dominated by the call to increaseCapacity(). Even when a resize is not needed, we need to shift \(N - i = O(N)\) elements (where \(i\) is the insertion index) to make space for the new element.
- add(): \(O(N)\). In the case that the array was full, the runtime is dominated by the call to increaseCapacity().
- set(): \(O(1)\), consisting of a single array access and reassignment.
- remove(): \(O(N)\). In the worst case, the first element is removed and the other \(N-1 = O(N)\) elements must all be shifted over to fill this space.
- find(): \(O(N)\), since we are performing a linear search over the first \(N\) elements of the storage array.
- contains(): \(O(N)\), dominated by find().
- indexOf(): \(O(N)\), dominated by find().
- delete(): \(O(N)\), dominated by find() and the potential \(O(N)\) element shift when an early element is deleted.
Amortized Analysis
We just stated that the worst-case runtime complexity of the add()
method is \(O(N)\); however, this does not provide a very good summary of its “typical” performance. When the backing array is not full, an \(O(N)\) resize is not needed. Moreover, since the new element is inserted at the end of the array, no shifts are needed and the add()
call executes in \(O(1)\) time. Almost always, add()
will have this \(O(1)\) runtime, since resizes are infrequent (and become exponentially more infrequent as the size of the list grows). A sketch of the runtime, visualized as a histogram, is shown below.
The optimal worst-case bound for this runtime complexity is linear, since the heights of the tall “resizing” bars grow as a linear function of \(N\). However, the very small runtimes of all the other bars can “average out” these infrequent “blips” to give a more reasonable notion of runtime complexity across multiple calls to add()
. This is the idea behind an amortized analysis.
In an amortized worst-case time complexity analysis of a method, we compute the total worst-case time complexity of a sequence of method calls and divide this by the number of method calls, giving a notion of a "long-term average" runtime of the method.
Let’s consider the total work performed over the first \(N\) calls to add()
(on lists of sizes \(0,1,\dots,N-1\)) for increasing values of \(N\).
- When \(N \le 10\), each call requires only “1 unit” (i.e., \(O(1)\)) of work, so we perform a total of \(N\) units of work over these \(N\) calls, for an amortized 1 unit of work per add() call.
- When \(N = 11\), the first 10 add() calls require 1 unit of work, and the final add() call requires 11 (i.e., \(O(N)\)) units of work. This gives a total of 21 units of work over 11 add() calls, which amortizes to \(<\) 2 units of work per add() call.
- When \(12 \le N \le 20\), then \(N-1\) of the add() calls require 1 unit of work, and the 11th call requires 11 units of work, for a total of \(N + 10\) units of work over \(N\) calls. This amortizes to \( \frac{N+10}{N} < 2 \) units of work per add() call.
- When \(N = 21\), then 19 of the add() calls require 1 unit of work, the 11th call requires 11 units of work, and the 21st call requires 21 units of work. This gives a total of 51 units of work over 21 add() calls, which amortizes to \(<\) 3 units of work per add() call.
Continuing with this analysis, we’ll find that we’ll never perform more than 3 = \(O(1)\) units of amortized work per add()
call, meaning the amortized worst-case time complexity of the add()
method is \(O(1)\).
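To sketch why this pattern continues in general (under the setup above of an initial capacity of 10 that doubles on each resize): over the first \(N\) calls to add(), every call contributes 1 unit of ordinary work, while the resizes contribute an additional \(10 + 20 + 40 + \cdots\) units. Each resize copies the old capacity, so the largest of these terms is less than \(N\), and a geometric sum with ratio 2 is at most twice its largest term, so the resizes contribute at most \(2N\) units in total. The total work is therefore at most \(N + 2N = 3N\) units over \(N\) calls, matching the bound of 3 amortized units per call.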
We can observe this visually by “rearranging” the bar heights in our runtime plot. Note that we ultimately care about the average height of a bar, and this average height does not change when we move height from one bar to another. Specifically, we can “topple” all of the taller \(O(N)\) bars to their left, leaving one unit of their work in place and moving 2 units of work onto each bar to its left (until all units have been accounted for). The tall bars are spaced out in such a way that no bar will end up with more than 3 units of work after this “toppling”, giving the same amortized \(O(1)\) complexity.
Here, our choice to double the capacity of the backing storage array (or more generally, increase it by a multiplicative factor) during each resize was critical to achieve this \(O(1)\) amortized complexity, as it ensured that the "tall bars" were spaced out enough for the averaging to converge to a constant. An alternate resizing strategy, such as increasing the array capacity by a constant amount, would be insufficient. See Exercise 12.9 for more details.
An amortized time complexity gives us a different picture about the runtime of a method which may be more or less useful in certain circumstances. When we want to understand how we can expect that a method will perform over many invocations (e.g., understanding the performance of a data structure across many insertions and removals), then amortized complexity may be most appropriate. In critical applications (such as flight software or medical equipment), we may be more concerned with the absolute worst-case performance of a single method call. If a resize of a very large data logging array could take place during a critical instant in a rocket’s trajectory, we’d want to be aware of this possibility. In this case, a standard worst-case performance guarantee may be more appropriate.
Main Takeaways:
- An abstract data type (ADT) describes a set of operations that we can perform on a collection of data. We model ADTs using interfaces in Java. One example of an ADT is a list, which is a linearly ordered collection of data whose elements are accessible via their indices.
- A data structure is an implementation of an ADT using a particular state representation. One ADT may be realizable with multiple different data structures, each with its own performance characteristics.
- Generic type parameters enable the parameterization of a class or method on an unknown type that is supplied by the client. Java supports generic types using angle bracket (
<>
) syntax. - A dynamic array is a data structure that uses arrays for the backing storage of its elements, reallocating larger arrays and copying over the data when the backing storage becomes full. The list ADT can be implemented using a dynamic array.
- In amortized time-complexity analysis we report the average runtime of a method taken over a sequence of calls. This often provides a more meaningful summary of a method's "typical" performance than its worst-case time complexity.
Exercises
Consider the following generic class.
Cat
, Dog
, and Animal
are types with the following subtype relationships: Cat <: Animal
, and Dog <: Animal
. A client attempts to use this class as follows:
You are reviewing a colleague’s code printed out on paper (for some reason). You identify a variable c
of type Collection
, but because the printer was low on ink, you can’t read the generic type parameter E
that c
was declared with. Later, you see the following statement:
What can you conclude about c’s illegible parametric type E?
Consider the following method:
removeAll()?
Pairs
In this exercise, we’ll implement a generic class Pair
that holds two fields. This can be used to model a variety of things such as coordinates on the 2D plane or an alternative to the Book
record class from Assignment A4.
Implement a class called SamePair
that is generic on T
. Both fields must have the same type. Define methods first()
and second()
to get the value of these fields. Defined methods setFirst()
and setSecond()
to set the values of each element respectively.
Implement a class called Pair
that is generic on two type parameters, U
and V
, where U
is the type of the first field and V
is the type of the second field. Implement the same four methods as in part a.
Bag ADT
Another common ADT is known as a Bag
or a Multiset
. A Bag
is a collection of unordered items that can contain duplicates. We can model a Bag
ADT with the following interface:
Implement a class DynamicArrayBag
that implements the Bag
ADT using a dynamically sized array.
The main difference between the Bag
and List
ADT is the enforcement of an order. Analyze the asymptotic worst-case time complexity of each method in DynamicArrayBag
. Does the lack of ordering improve the efficiency of any operations? If so, how?
CS2110List
Suppose we add the following methods to the CS2110List
interface. Add method definitions in DynamicArrayList
to override each of the following methods. State the worst-case runtime complexity for each.
Note that NoSuchElementException <: RuntimeException
.
SortedList
Another way to implement the CS2110List
ADT is with a sorted list data structure. This data structure enforces a sorted order invariant on the elements in the array.
Implement the class by overriding all methods defined in CS2110List
, including the ones added in Exercise 12.4. State the worst-case time complexity of each method. The frequencyOf()
method should run in \(O(\log n)\) time. As a hint, view Exercise 5.5 in Lecture 5.
In your implementation, you may suppose you have access to this method that determines the order of two objects of type T
. This is similar to the Comparator
interface that will be introduced in a later lecture.
Suppose you need to store a collection of Integer
s as a field for a class. In what scenarios would you choose to use a DynamicArrayList
over SortedList
and vice versa?
What relationship should size
and capacity
satisfy to justify shrinking? Keep in mind that we want to maintain the same (amortized) runtime complexities for all the methods in DynamicArrayList
after adding this shrinking behavior.
Modify the delete()
and remove()
methods so that the backing array is properly shrunk.
CS2110List
Consider the following methods, which accept arbitrary CS2110List
implementations (that conform to their specifications). For both DynamicArrayList
and SortedList
(defined in Exercise 12.5), state the worst-case time complexity for each method, considering the case where the input parameter could have either dynamic type.
Again, assume you have access to the compare()
method defined in Exercise 12.5.a.
We can bound a generic type parameter by declaring it as <T extends Account>
, which means that any generic type T
must satisfy T <: Account
. We can substitute Account
with any class or interface. Note that with interfaces, we still use extends
.
Recall the Point
record class from the previous lecture. Let's make this record class generic on T extends Number
. Number
is an abstract
class that is a supertype of wrapper classes, such as Integer
and Double
.
Implement the method distanceTo()
. Sift through the Number
API to find an appropriate method to use.
double
.
Suppose our DynamicArrayList
increases the capacity of the backing array by \( 10 \) each time instead of doubling.
Suppose we add one element to the DynamicArrayList
each iteration. With an initial capacity of \( 10 \), after how many iterations must the backing array resize?
What is the amortized time complexity of add()
in this scenario?
insertLeft()
Suppose we add a method to DynamicArrayList
called insertLeft()
, defined below, that prepends an element to the list. We'll analyze the asymptotic runtime complexity of this method.