14. Iterating over Data Structures

In our introductions of loop invariants and complexity, we considered a frequencyOf() method that determined the number of occurrences of a particular element in an array. We wrote a loop that made a single pass over the array’s elements to compute this frequency. Using our newer concept of generic types, we can adapt this code to work for an array storing any type of elements.

/** 
 * Returns the number of occurrences of `key` among the elements in `a`.
 */
static <T> int frequencyOf(T key, T[] a) {
  int i = 0; // next index of `a` to check
  int count = 0; 
  /* Loop invariant: `count` = number of occurrences of `key` in `a[..i)` */
  while (i < a.length) {
    if (a[i].equals(key)) {
        count++;
    }
    i++;
  }
  return count;
}


If we use \(N\) to denote the length of a, then this code will require \(O(N)\) time since each iteration of the loop runs in \(O(1)\) time (if equals() is an \(O(1)\) computation, which is a reasonable assumption since the time required to compare elements should not depend on the length of the array in which they are stored).

We can adapt this method to compute an element’s frequency within a list (in particular, a CS2110List). To do this, we’ll need to swap the array-specific operations (the length field and element access with square-bracket syntax) for their list analogues. Give this a try.

frequencyOf definition for CS2110Lists

/** 
 * Returns the number of occurrences of `key` among the elements in `list`.
 */
static <T> int frequencyOf(T key, CS2110List<T> list) {
  int i = 0; // next index of `list` to check
  int count = 0; 
  /* Loop invariant: `count` = number of occurrences of `key` among the first 
   * `i` elements of `list`. */
  while (i < list.size()) {
    if (list.get(i).equals(key)) {
        count++;
    }
    i++;
  }
  return count;
}


The runtime (i.e., worst-case time complexity) of this frequencyOf() method depends on the dynamic type of (the object referenced by) the list parameter. If this list is a DynamicArrayList, then both the size() and get() methods run in \(O(1)\) time (due to the random access guarantee of the backing storage array), so we retain the \(O(N)\) runtime. What about for a SinglyLinkedList? Last lecture, we saw that the get() method for linked lists runs in worst-case \(O(N)\) time (or more specifically, get(i) runs in \(O(i)\) time) because we must traverse the linked chain to reach the desired element. Summing this over the \(N\) iterations of the loop gives a total amount of work proportional to \(0 + 1 + \dots + (N-1) = N(N-1)/2\), which is an \(O(N^2)\) runtime.

This is not good. Doing a traversal over the elements of a list is a fairly common operation, and an \(O(N^2)\) runtime becomes prohibitively expensive even for moderate list sizes. The poor performance stems from our use of get() within the while loop. We must “restart the traversal” from the beginning in each loop iteration even though, in the previous iteration, we were at the position just before where we need to be. To improve the performance of traversals over linked lists (and other linked structures), we’d like a way to “save the state” of our traversal so we can pick up where we left off. This functionality would allow us to traverse each link in the SinglyLinkedList only once during the frequencyOf() calculation, so it would reduce its runtime to the desired \(O(N)\). Iterators are objects that support such traversals.
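To make the gap concrete, here is a minimal sketch that counts how many links are followed when every element of a linked chain is reached by index, compared with a traversal that remembers its position. The `HopCounter` class, its `Node` type, and the hop counts are hypothetical names introduced only for this illustration; they are not part of the course's SinglyLinkedList.

```java
/** Hypothetical sketch: counts link traversals for two ways of visiting a linked chain. */
class HopCounter {
  /** A node in a singly linked chain (illustrative, not the course's Node class). */
  static class Node {
    int data;
    Node next;

    Node(int data, Node next) {
      this.data = data;
      this.next = next;
    }
  }

  /** Builds a chain holding 0, 1, ..., n-1 and returns its first node. */
  static Node build(int n) {
    Node head = null;
    for (int v = n - 1; v >= 0; v--) {
      head = new Node(v, head);
    }
    return head;
  }

  /** Reaches each of the `n` elements by index, restarting at `head` every time. */
  static long hopsWithGet(Node head, int n) {
    long hops = 0;
    for (int i = 0; i < n; i++) {
      Node current = head;
      for (int j = 0; j < i; j++) { // reaching index i follows i links
        current = current.next;
        hops++;
      }
    }
    return hops;
  }

  /** Visits every element once, keeping the current position between visits. */
  static long hopsWithSavedPosition(Node head) {
    long hops = 0;
    Node current = head;
    while (current != null) {
      current = current.next; // one link per element
      hops++;
    }
    return hops;
  }

  public static void main(String[] args) {
    int n = 1000;
    Node head = build(n);
    System.out.println(hopsWithGet(head, n));        // 499500, i.e., n(n-1)/2
    System.out.println(hopsWithSavedPosition(head)); // 1000
  }
}
```

For a chain of 1000 elements, the index-based traversal follows roughly 500 times as many links as the position-saving traversal, and this ratio grows linearly with the length of the chain.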

`Iterable`s and `Iterator`s

Definition: Iterator

An iterator is an object that enables iteration over (i.e., traversal of) a data structure.

Java models iterators using the Iterator interface that we will discuss more shortly. Notably, Java extracts the functionality of iterating over a data structure into a separate object from the data structure itself. Iterators are “single-use” objects; they guarantee to “visit” each object within the data structure exactly once during their lifetime.

We need a way to connect these Iterator objects with the data structure over which they are iterating. Java defines a separate interface for this purpose. A data structure implements the Iterable interface to support the creation of iterators over its data.

Remark:

The distinction between Iterable and Iterator is confusing to many students. Data structures are Iterable since they can be iterated over. A separate Iterator object actually performs the iteration. Over the course of its lifetime, a single Iterable object will likely produce many different Iterator objects, one for each iteration that is performed.

This Iterable interface includes one (non-default) method, iterator(), which returns an Iterator.
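As a small illustration of the Iterable/Iterator distinction, the sketch below uses Java's built-in java.util.List (which is an Iterable) rather than our CS2110List classes. The class name `IterableVsIterator` and the helper `elementAfter()` are hypothetical and exist only for this example.

```java
import java.util.Iterator;
import java.util.List;

/** Sketch: one Iterable can hand out many independent Iterators. */
class IterableVsIterator {
  /** Creates a fresh iterator over `letters`, skips `steps` elements, and returns the next one. */
  static String elementAfter(Iterable<String> letters, int steps) {
    Iterator<String> it = letters.iterator(); // a new traversal on every call
    for (int i = 0; i < steps; i++) {
      it.next();
    }
    return it.next();
  }

  public static void main(String[] args) {
    List<String> letters = List.of("a", "b", "c");

    Iterator<String> first = letters.iterator();
    first.next(); // returns "a"
    first.next(); // returns "b"

    // A second iterator starts from the beginning, independent of `first`.
    Iterator<String> second = letters.iterator();
    System.out.println(second.next()); // a
    System.out.println(first.next());  // c
  }
}
```

Each call to iterator() produces a new object with its own position, so advancing one iterator has no effect on another.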

Remark:

default methods are something that we will not dive too deeply into in CS 2110, but they provide a way to include a "base" definition of an interface method. Implementing classes do not need to include their own definitions of default methods. For this reason, we will typically focus only on the non-default methods of the interfaces that we consider.

Making Classes `Iterable`

Let’s add support for iterators to our CS2110List classes. We want to guarantee that any class implementing CS2110List will also implement Iterable and provide an iterator() method. To do this, we can make Iterable a super-interface of CS2110List using the extends keyword:

CS2110List.java

public interface CS2110List<T> extends Iterable<T> { ... }


The rest of the interface definition remains unchanged. By extending Iterable, the CS2110List interface adds the iterator() method to its list of guaranteed behaviors. Now, any class implementing CS2110List (such as our DynamicArrayList and SinglyLinkedList classes) will need to provide an iterator() definition. The iterator() method returns the interface type Iterator, which cannot be instantiated directly. Instead we will need to write classes that implement the Iterator interface and then return instances of these classes. Since these classes only make sense in the context of their Iterable and may need to access its internal state to carry out the iteration, it will make sense to define these Iterator classes as inner classes (i.e., non-static nested classes). Let’s add stubs for these classes now and fill in their definitions later.

DynamicArrayList.java

public class DynamicArrayList<T> implements CS2110List<T> {
  // ... other fields and method declarations 
  
  @Override
  public Iterator<T> iterator() {
    return new DynamicArrayListIterator();
  }

  private class DynamicArrayListIterator implements Iterator<T> { ... }
}


SinglyLinkedList.java

public class SinglyLinkedList<T> implements CS2110List<T> {
  // ... other fields and method declarations 
  
  @Override
  public Iterator<T> iterator() {
      return new SinglyLinkedListIterator();
  }

  private class SinglyLinkedListIterator implements Iterator<T> { ... }
}


Remark:

We do not need to parameterize these inner Iterator classes on a generic type, since they can use the generic type parameter T of their outer class.

Defining `Iterator` Classes

An Iterator is an object that guarantees to “visit” each element of its Iterable exactly once over the course of its lifetime. To achieve this, an iterator must maintain an internal state that allows it to track which elements it has visited and which elements it still needs to visit.

Remark:

The guarantee to visit each element exactly once becomes unclear when the collection is modified during the lifetime of the iterator. If an unvisited element is deleted from the collection before it is visited, we'll be out of luck. If an already-visited element is moved to a later position in the structure that hasn't been reached yet by the iterator, should it be re-visited? To avoid these considerations, the iterator's guarantee only applies as long as no modifications have been made to the data structure during its lifetime; this is an implicit pre-condition on an iterator's methods (see Section 14.2.2 for further discussion of this).

From the documentation, we see that a class implementing the Iterator interface must define two methods.

The hasNext() method returns a boolean to signal whether there are elements of the data structure that it has not yet “visited”.
The next() method returns an element that has not yet been “visited” (i.e., an element that was not returned by a previous call to next() on this iterator object).

While the Iterator interface does not specify anything about the order of the iteration (this may be different for different Iterable data structures), we can add this guarantee by refining its specifications. In the case of lists, it makes sense (both intuitively and practically) for the iteration to happen in increasing index order.

Let’s start by defining the DynamicArrayListIterator. What state will it need to keep track of to manage its iteration? An index will be sufficient, as the random access guarantee will allow us to efficiently access the next element given this index. Therefore, we’ll add an index field to this class that represents the index of the next element that will be returned (explicitly documenting this will help avoid off-by-one errors; are we storing the next element we will visit or the most recent element that we visited?). We should initialize index = 0 within the DynamicArrayListIterator constructor; the first element we will visit is at index 0.

DynamicArrayList.java

private class DynamicArrayListIterator implements Iterator<T> {
  /** The index of the next element that will be returned by `next()`. */
  private int index;

  /** Constructs a new iterator object beginning at the start of this list. */
  public DynamicArrayListIterator() {
    index = 0;
  }

  // ... Iterator methods
}


Take some time to complete the definitions of the hasNext() and next() methods according to their specifications.

complete DynamicArrayListIterator definition

DynamicArrayList.java

private class DynamicArrayListIterator implements Iterator<T> {
  /** The index of the next element that will be returned by `next()`. */
  private int index;

  /** Constructs a new iterator object beginning at the start of this list. */
  public DynamicArrayListIterator() {
    index = 0;
  }

  @Override
  public boolean hasNext() {
    return index < size;
  }

  @Override
  public T next() {
    T elem = storage[index];
    index++;
    return elem;
  }
}


In hasNext(), we compare index to size. Once index == size, we know that we have visited all of the elements in this list. In next(), we must temporarily store the return value while we increment index, updating the iterator's state for the subsequent call to next(). It is a common bug to forget this update within the next() call, which causes the iterator to violate its class invariant. We can achieve this same behavior using the postfix behavior of the increment operator, writing "return storage[index++];", but we usually discourage using increments as expressions because of their unintuitive semantics.
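The documentation of Java's Iterator interface specifies that next() throws a NoSuchElementException when the iteration has no more elements. The following is a minimal, self-contained sketch of an array-backed list whose iterator adds this check. The `TinyArrayList` class, with its fixed capacity and package-private `add()` method, is a hypothetical stand-in for illustration and is not the course's DynamicArrayList.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical, fixed-capacity stand-in for an array-backed list (illustration only). */
class TinyArrayList<T> implements Iterable<T> {
  private final T[] storage;
  private int size;

  @SuppressWarnings("unchecked")
  TinyArrayList(int capacity) {
    storage = (T[]) new Object[capacity];
    size = 0;
  }

  /** Appends `elem`. Assumes the fixed capacity is not exceeded. */
  void add(T elem) {
    storage[size] = elem;
    size++;
  }

  @Override
  public Iterator<T> iterator() {
    return new TinyArrayListIterator();
  }

  /** Inner class: reads `storage` and `size` of the enclosing list directly. */
  private class TinyArrayListIterator implements Iterator<T> {
    /** The index of the next element that will be returned by `next()`. */
    private int index = 0;

    @Override
    public boolean hasNext() {
      return index < size;
    }

    @Override
    public T next() {
      if (!hasNext()) {
        throw new NoSuchElementException(); // behavior documented for Iterator.next()
      }
      T elem = storage[index];
      index++;
      return elem;
    }
  }

  public static void main(String[] args) {
    TinyArrayList<String> list = new TinyArrayList<>(4);
    list.add("x");
    list.add("y");
    Iterator<String> it = list.iterator();
    System.out.println(it.next());    // x
    System.out.println(it.next());    // y
    System.out.println(it.hasNext()); // false
  }
}
```

A client that always checks hasNext() before calling next() never triggers the exception; the check only turns a misuse of the iterator into an immediate, descriptive failure.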

Now, let’s turn our attention to the SinglyLinkedListIterator. What state should it keep track of? A first thought is to again keep track of the index, paralleling the DynamicArrayListIterator. However, as we noted before, accessing an element by index is a worst-case \(O(N)\) operation in a linked list, so this iterator would not provide any performance advantage to the client. Instead, let’s allow the iterator to “save its place” by storing a reference to its current position in the linked chain. We’ll define a Node field current that references the node containing the next element that will be returned by the iterator.

SinglyLinkedList.java

private class SinglyLinkedListIterator implements Iterator<T> {
  /** The Node containing the next element that will be returned by `next()`. */
  private Node<T> current;

  // .. constructor and Iterator methods
}


Take some time to finish defining SinglyLinkedListIterator by writing its constructor, hasNext() and next() methods.

complete SinglyLinkedListIterator definition

SinglyLinkedList.java

private class SinglyLinkedListIterator implements Iterator<T> {
  /** The Node containing the next element that will be returned by `next()`. */
  private Node<T> current;

  /** Constructs a new iterator object beginning at the start of this list. */
  public SinglyLinkedListIterator() {
    current = head;
  }

  @Override
  public boolean hasNext() {
    return current != tail;
  }

  @Override
  public T next() {
    T elem = current.data;
    current = current.next;
    return elem;
  }
}


The constructor initializes current = head (the first node in the list). We know that there are more nodes to visit as long as current != tail (since tail signifies the end of the list). In the next() method, we store the element (current.data) to return, advance current by reassigning the field to current.next, and then return the element.
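To see this iterator at work in a complete program, here is a minimal sketch of a linked list with the same iterator logic. It assumes that `tail` references an empty sentinel node placed after the last element, which is one representation consistent with the `current != tail` guard above. The `TinyLinkedList` class and its `add()` method are hypothetical and are not the course's SinglyLinkedList.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical linked list whose `tail` is an empty sentinel node (an assumption). */
class TinyLinkedList<T> implements Iterable<T> {
  private static class Node<T> {
    T data;
    Node<T> next;
  }

  private Node<T> head; // first node; equals `tail` when the list is empty
  private Node<T> tail; // empty sentinel node marking the end of the chain

  TinyLinkedList() {
    tail = new Node<>();
    head = tail;
  }

  /** Appends `elem` by filling the old sentinel and linking a fresh sentinel after it. */
  void add(T elem) {
    tail.data = elem;
    tail.next = new Node<>();
    tail = tail.next;
  }

  @Override
  public Iterator<T> iterator() {
    return new TinyLinkedListIterator();
  }

  private class TinyLinkedListIterator implements Iterator<T> {
    /** The node containing the next element that will be returned by `next()`. */
    private Node<T> current = head;

    @Override
    public boolean hasNext() {
      return current != tail;
    }

    @Override
    public T next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      T elem = current.data;
      current = current.next; // O(1): follows a single link
      return elem;
    }
  }

  public static void main(String[] args) {
    TinyLinkedList<Integer> list = new TinyLinkedList<>();
    list.add(10);
    list.add(20);
    Iterator<Integer> it = list.iterator();
    while (it.hasNext()) {
      System.out.println(it.next()); // 10, then 20
    }
  }
}
```

Because the iterator keeps a reference to its current node, each call to next() follows exactly one link, regardless of how far into the list the iteration has progressed.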

The release code includes additional test cases that verify the functionality of these iterators.

Using `Iterator`s as a Client

Now that we have added support for Iterators to our CS2110List classes, let’s see how we can use iterators as a client. Iterators provide three operations (a constructor and two methods) that are natural analogues of the three components of a loop.

The Iterator’s constructor sets up its state at the start of its iteration. This is analogous to the initialization of the loop variable(s).
The Iterator’s hasNext() method determines whether there are more elements that have not yet been visited. This is analogous to the loop guard, which determines whether an additional loop iteration is warranted.
The Iterator’s next() method has the dual responsibility of producing the next element of the data structure and advancing the state of the iterator. This is analogous to the work performed by the loop body.

We can leverage Iterators in our code by using loops with the following structures (where T is replaced with the type parameter value that is known by the client and ds references any Iterable data structure).

`while`-loop:

Iterator<T> it = ds.iterator();
while (it.hasNext()) {
  T elem = it.next();
  // ... do something with elem
}


`for`-loop:

for (Iterator<T> it = ds.iterator(); it.hasNext(); ) {
  T elem = it.next();
  // ... do something with elem
}


We can use this structure to give a new definition of our frequencyOf() method:

/** 
 * Returns the number of occurrences of `key` among the elements in `list`.
 */
static <T> int frequencyOf(T key, CS2110List<T> list) {
  int count = 0; 
  Iterator<T> it = list.iterator();
  /* Loop invariant: `count` = number of occurrences of `key` among the 
   * elements that have been returned by `it`. */
  while (it.hasNext()) {
    if (it.next().equals(key)) {
        count++;
    }
  }
  return count;
}


Since both of our iterators have hasNext() and next() methods that run in \(O(1)\) time, and since the number of iterations of the loop remains \(N\), this modified frequencyOf() implementation has a worst-case \(O(N)\) runtime for both the DynamicArrayList and the SinglyLinkedList!
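The same loop structure works for any Iterable, not only our CS2110List classes. As a runnable sketch, the version below accepts an `Iterable<T>` and is exercised with Java's built-in ArrayList (array-backed) and LinkedList (link-backed). The class name `FrequencyDemo` is hypothetical, and widening the parameter type to Iterable is a choice made for this illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

/** Sketch: an iterator-based frequencyOf() that accepts any Iterable. */
class FrequencyDemo {
  /** Returns the number of occurrences of `key` among the elements of `elems`. */
  static <T> int frequencyOf(T key, Iterable<T> elems) {
    int count = 0;
    Iterator<T> it = elems.iterator();
    /* Loop invariant: `count` = number of occurrences of `key` among the
     * elements that have been returned by `it`. */
    while (it.hasNext()) {
      if (it.next().equals(key)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    List<String> words = List.of("to", "be", "or", "not", "to", "be");
    List<String> arrayBacked = new ArrayList<>(words);
    List<String> linkBacked = new LinkedList<>(words);
    System.out.println(frequencyOf("to", arrayBacked)); // 2
    System.out.println(frequencyOf("to", linkBacked));  // 2
  }
}
```

The client code is identical for both list types; only the iterator obtained from each list differs.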

Enhanced `for`-Loops

Since looping over data-structures using Iterators is ubiquitous and always involves the loop structure that we introduced above, Java provides a special shorthand syntax known as an enhanced for-loop.

Definition: Enhanced for-loop

An enhanced for-loop uses special syntax to describe an iteration over the elements of an Iterable data structure.

The syntax for an enhanced for-loop is

for (T ⟨variable name⟩ : ⟨expression with Iterable<T> type⟩) { ... }


For example, we can use an enhanced for-loop to, once again, rewrite our frequencyOf() method:

/** 
 * Returns the number of occurrences of `key` among the elements in `list`.
 */
static <T> int frequencyOf(T key, CS2110List<T> list) {
  int count = 0; 
  for (T elem: list) {
    if (elem.equals(key)) {
        count++;
    }
  }
  return count;
}


We read the enhanced for-loop’s declaration, “for each T elem in list,” which is reminiscent of for-loops in Python. For this reason, some people refer to enhanced for-loops as “for-each” loops. The above code has the same behavior as our implementation using an explicit iterator. Enhanced for-loops help make code readable and decrease the burden on the programmer by hiding the iterator “behind the scenes”. Therefore, they are often preferable when iterating over data structures.

Remark:

Let's pause here to appreciate the elegance of enhanced for-loops. By extracting the common pattern of iteration into two interfaces (Iterable and Iterator), Java was able to introduce a new helpful syntax feature to its users. This is a great demonstration of the power of interfaces. We'll see another one later in the lecture.
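Any class that implements Iterable can appear on the right-hand side of an enhanced for-loop, even one that does not store its elements at all. The sketch below illustrates this with a hypothetical `Countdown` class (not part of the course code) whose iterator computes its values as it goes.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical Iterable that yields start, start-1, ..., 1. */
class Countdown implements Iterable<Integer> {
  private final int start;

  Countdown(int start) {
    this.start = start;
  }

  @Override
  public Iterator<Integer> iterator() {
    return new CountdownIterator();
  }

  private class CountdownIterator implements Iterator<Integer> {
    /** The next value that will be returned by `next()`. */
    private int nextValue = start;

    @Override
    public boolean hasNext() {
      return nextValue >= 1;
    }

    @Override
    public Integer next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      int value = nextValue;
      nextValue--;
      return value;
    }
  }

  /** Joins the values of `c` with spaces, using an enhanced for-loop. */
  static String describe(Countdown c) {
    StringBuilder sb = new StringBuilder();
    for (int n : c) { // calls c.iterator(), then hasNext()/next() behind the scenes
      sb.append(n).append(' ');
    }
    return sb.toString().trim();
  }

  public static void main(String[] args) {
    System.out.println(describe(new Countdown(3))); // 3 2 1
  }
}
```

The enhanced for-loop in describe() never mentions an Iterator; supplying an iterator() method is all that `Countdown` needs to do to support this syntax.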

In some cases, we may need to call hasNext() within the body of the loop, so we must fall back on loops that explicitly manage an Iterator. For example, consider the following method to print the contents of a list, which uses hasNext() to identify when a comma is necessary.

/** 
 * Prints the contents of `list`, separated by commas and enclosed in 
 * square brackets. 
 */
static <T> void print(CS2110List<T> list) {
  Iterator<T> it = list.iterator();
  System.out.print("[");
  while (it.hasNext()) {
    System.out.print(it.next());
    if (it.hasNext()) {
      System.out.print(",");
    }
  }
  System.out.print("]");
}


Concurrent Modification

Suppose we wish to define a method deleteNegativeEntries() that deletes all of the negative entries from a list of Integers. We might consider doing this with an enhanced for-loop, writing something like:

/** Removes all negative entries from `list`. */
static void deleteNegativeEntries(CS2110List<Integer> list) { 
  for (int i : list) { // auto-unboxing
    if (i < 0) {
      list.delete(i);
    }
  }
}


If we execute the following client code, we’d expect it to print “[1,4]”.

CS2110List<Integer> list = new DynamicArrayList<>();
list.add(1);
list.add(-2);
list.add(-3);
list.add(4);
deleteNegativeEntries(list);
print(list);


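However, it actually prints “[1,-3,4]”. Why is this? Let’s trace through the execution of deleteNegativeEntries(). The enhanced for-loop creates a DynamicArrayListIterator with index = 0. The first iteration returns 1 and advances index to 1. The second iteration returns -2 and advances index to 2; the call to list.delete() then removes -2 and shifts the later elements to the left, so the list now holds [1,-3,4] and has size 3. The third iteration returns the element that is now at index 2, which is 4, skipping over -3 (which has moved to index 1). At this point index == size, so the loop exits.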

The discrepancy arose because of the call to list.delete() within the second loop iteration. This updated the contents of list, but did not allow the iterator to update its state to reflect this change. We violated the (implicit) pre-conditions on the iterator methods by doing this concurrent modification.

Definition: Concurrent Modification

A concurrent modification is a call to a mutating method of a data structure that takes place during the lifetime of an iterator.

Therefore, we reached a state of undefined behavior. To avoid this in your code, you should never modify the contents of a data structure within an enhanced for-loop.

Remark:

Some iterators make explicit what will happen in the case of concurrent modifications. For example, fail-fast iterators detect concurrent modification and throw exceptions from the iterator's methods when they are encountered. For more information, see Exercise 14.4.
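Java's built-in java.util.ArrayList provides iterators of this fail-fast kind. As a brief sketch (using ArrayList rather than our CS2110List classes, with the hypothetical class name `FailFastDemo`), the loop below performs the same deletion as deleteNegativeEntries() and observes the resulting ConcurrentModificationException. The ArrayList documentation describes this detection as best-effort, so code should not rely on the exception for correctness.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

/** Sketch: a fail-fast iterator detects a concurrent modification. */
class FailFastDemo {
  /** Returns true if deleting inside an enhanced for-loop raised the exception. */
  static boolean deletionInLoopFails(List<Integer> list) {
    try {
      for (Integer x : list) {
        if (x < 0) {
          list.remove(x); // remove(Object): a concurrent modification
        }
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    List<Integer> list = new ArrayList<>(List.of(1, -2, -3, 4));
    System.out.println(deletionInLoopFails(list)); // true
  }
}
```

Here the exception is thrown by the call to next() that follows the deletion, rather than by the deletion itself.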

A desire to remove elements from a data structure while iterating over it is reasonable and arises frequently. For this reason, many Java Iterators include a remove() method that allows the iterator to handle the removal in a way that will allow it to update its state and remain valid. For more information, see Exercise 14.5.
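As a short sketch of this approach, the version of deleteNegativeEntries() below operates on Java's built-in java.util.List and deletes through the iterator's remove() method. The class name `IteratorRemoveDemo` is hypothetical. Note that remove() is an optional operation of the Iterator interface: ArrayList's iterators support it, while iterators over unmodifiable lists (such as those created by List.of()) throw an UnsupportedOperationException.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Sketch: removing elements during an iteration through the iterator itself. */
class IteratorRemoveDemo {
  /** Removes all negative entries from `list`. Requires that its iterators support remove(). */
  static void deleteNegativeEntries(List<Integer> list) {
    Iterator<Integer> it = list.iterator();
    while (it.hasNext()) {
      if (it.next() < 0) {
        it.remove(); // removes the element most recently returned by next()
      }
    }
  }

  public static void main(String[] args) {
    List<Integer> list = new ArrayList<>(List.of(1, -2, -3, 4));
    deleteNegativeEntries(list);
    System.out.println(list); // [1, 4]
  }
}
```

Since the iterator performs the removal, it can adjust its own position at the same time, so no element is skipped. An explicit iterator is required here because an enhanced for-loop gives the client no reference to the iterator it uses.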

The Iteratee Pattern

Consider the following method, which takes in a list of Integers and adds 1 to each of these integers.

/** Adds 1 to each integer in `list`. */
public static void incrementAll(CS2110List<Integer> list) {
  for (int i = 0; i < list.size(); i++) {
    list.set(i, list.get(i) + 1);
  }
}


We are in a situation similar to the one we faced before. If list references a DynamicArrayList object, then this method will run in \(O(N)\) time because of the array’s random access guarantee. However, if list references a SinglyLinkedList object, then this method will require \(O(N^2)\) time since calls to get() and set() each require worst-case \(O(N)\) time. Can we use an iterator to fix this?

Unfortunately, we cannot. Iterators grant the client access to the values stored in the data structure, but not the structure itself. Therefore, the client has no way to use the iterator to update the data field of the Node where the iterator is currently positioned. Instead, they must fall back on the set() method to perform this modification. set() must begin its traversal to the correct node at the head of the list since it has no knowledge of the position of the iterator. To address this, we’ll need a way for the client to specify actions that should be carried out within the data structure during the iteration. This is sometimes referred to as the iteratee pattern.

Definition: Iteratee Pattern

In the iteratee pattern, the client specifies a block of code that should be executed under the implementer's control during an iteration.

In our example, we want a way to express “please iterate over the entries of this list, and while you are visiting each one, add 1 to it.” In other words, we need a way to model a set of actions (i.e., lines of code) that we can pass off to be executed elsewhere. This ability is afforded by functional interfaces.

Functional Interfaces

Definition: Functional Interface

A functional interface is any interface that includes exactly one abstract (i.e., non-static, non-default) method.

To implement a functional interface, one needs to specify exactly one method. By the compile-time reference rule, this method will be the only one available to a variable with the functional interface type. For this reason, you can think about a functional interface as a type that wraps one particular action (the action “invoke my only method”). For example, consider the following functional interface, Transformation (which has the same semantics as Java’s UnaryOperator interface).

/** 
 * Represents a function that takes one value of type `T` as its input and 
 * produces another value of type `T` as its output. 
 */
@FunctionalInterface
public interface Transformation<T> {
  /** Transforms the `T` value `x` to its corresponding output `T` value. */
  public T transform(T x);
}


We can model our increment transformation as a class implementing Transformation (note the auto-unboxing and auto-boxing happening within the body of its transform() method).

/** Wraps the "+1" function on Integers */
public class Increment implements Transformation<Integer> {
  @Override
  public Integer transform(Integer x) {
    return x + 1;
  }
}


We can use a functional interface to realize the iteratee pattern. To do this, we’ll write a method whose parameter type is a functional interface and whose behavior is to invoke the interface method for each element during an iteration. For example, we can add a method transformAll() to our CS2110List interface, with the following specifications.

CS2110List.java

/**
 * Applies `f.transform()` to each element in this list.
 */
void transformAll(Transformation<T> f);


We can define this method in the DynamicArrayList and SinglyLinkedList classes as follows:

DynamicArrayList.java

@Override
public void transformAll(Transformation<T> f) {
  for (int i = 0; i < size; i++) {
    storage[i] = f.transform(storage[i]);
  }
}


SinglyLinkedList.java

@Override
public void transformAll(Transformation<T> f) {
  Node<T> current = head;
  while (current != tail) {
    current.data = f.transform(current.data);
    current = current.next;
  }
}


The client can use transformAll() to realize the iteratee pattern. For example, if they call transformAll() on a list of Integers, passing an Increment object as the argument, it will have the effect of adding 1 to each of these integers. The client code,

DynamicArrayList<Integer> list = new DynamicArrayList<Integer>();
list.add(1);
list.add(2);
list.add(3);
list.transformAll(new Increment());
print(list);


prints “[2,3,4]”, as we’d expect. Moreover, since we have handled the iteration from within the list classes, where we could keep track of our progress, our code achieves the desired runtime. As long as transform() is an \(O(1)\) operation, transformAll() will run in \(O(N)\) time.
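The pieces of the iteratee pattern can be assembled into one small, runnable sketch. The `Transformation` interface and `Increment` class are repeated from the text so that the example is self-contained. The `TinyChain` class is a hypothetical, null-terminated linked structure introduced only for this illustration; it is not the course's SinglyLinkedList.

```java
/** Functional interface from the text, repeated so the sketch is self-contained. */
@FunctionalInterface
interface Transformation<T> {
  /** Transforms the `T` value `x` to its corresponding output `T` value. */
  T transform(T x);
}

/** Wraps the "+1" function on Integers. */
class Increment implements Transformation<Integer> {
  @Override
  public Integer transform(Integer x) {
    return x + 1;
  }
}

/** Hypothetical null-terminated linked chain (not the course's SinglyLinkedList). */
class TinyChain<T> {
  private static class Node<T> {
    T data;
    Node<T> next;

    Node(T data) {
      this.data = data;
    }
  }

  private Node<T> head; // null when the chain is empty
  private Node<T> last; // null when the chain is empty

  /** Appends `elem` to the end of this chain. */
  void add(T elem) {
    Node<T> node = new Node<>(elem);
    if (head == null) {
      head = node;
    } else {
      last.next = node;
    }
    last = node;
  }

  /** Applies `f.transform()` to each element, following each link exactly once. */
  void transformAll(Transformation<T> f) {
    for (Node<T> current = head; current != null; current = current.next) {
      current.data = f.transform(current.data);
    }
  }

  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder("[");
    for (Node<T> current = head; current != null; current = current.next) {
      sb.append(current.data);
      if (current.next != null) {
        sb.append(",");
      }
    }
    return sb.append("]").toString();
  }

  public static void main(String[] args) {
    TinyChain<Integer> chain = new TinyChain<>();
    chain.add(1);
    chain.add(2);
    chain.add(3);
    chain.transformAll(new Increment());
    System.out.println(chain); // [2,3,4]
  }
}
```

The client supplies only the action (an `Increment` object), while the chain controls the traversal and can update each node's data in place as it passes.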

Remark:

We are just barely scratching the surface of functional interfaces in this lecture. Being able to pass around functions as arguments to other functions is very powerful and forms the basis for the functional programming paradigm (an alternative to object-oriented programming). Functional programming is a focus of CS 3110.

Remark:

Java's Iterable interface includes a default forEach() method that accepts an instance of the Consumer functional interface, which models a void method accepting a parameter of the data structure's element type. This also realizes the iteratee pattern, but since a Consumer does not return anything, it won't allow, for example, replacing the elements stored in the underlying data structure.
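As a brief sketch of these built-in counterparts, the example below passes a Consumer to forEach() and a UnaryOperator to replaceAll(), a default method of java.util.List that replaces each element with its transformed value (much like our transformAll()). The classes `ForEachDemo`, `Summer`, and `PlusOne` are hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

/** Sketch: the iteratee pattern with Java's built-in functional interfaces. */
class ForEachDemo {
  /** A Consumer that accumulates the sum of the values it is given. */
  static class Summer implements Consumer<Integer> {
    int total = 0;

    @Override
    public void accept(Integer x) {
      total += x;
    }
  }

  /** A UnaryOperator wrapping the "+1" function on Integers. */
  static class PlusOne implements UnaryOperator<Integer> {
    @Override
    public Integer apply(Integer x) {
      return x + 1;
    }
  }

  public static void main(String[] args) {
    List<Integer> list = new ArrayList<>(List.of(1, 2, 3));

    Summer summer = new Summer();
    list.forEach(summer); // observes each element; the list is unchanged
    System.out.println(summer.total); // 6

    list.replaceAll(new PlusOne()); // stores the result of apply() for each element
    System.out.println(list); // [2, 3, 4]
  }
}
```

The difference between the two calls mirrors the difference between the interfaces: accept() returns nothing, so forEach() can only observe elements, whereas apply() returns a value that replaceAll() writes back into the list.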

Lambda Expressions

While functional interfaces and the iteratee pattern are powerful tools to have in your programming toolbox, the syntax in the previous section is a bit cumbersome for the client. For example, to use the iteratee pattern to add 1 to each value in our list, we needed to define a separate Increment class that wrapped this behavior. In all, this class included only a single line of relevant code “return x+1;”, so it would be nice if this (or something similar) was all that the client needed to write. Lambda expressions make this possible.

Definition: Lambda Expression

A lambda expression is a syntactic shorthand to represent an implementation of a functional interface.

The syntax of a lambda expression for a functional interface of the form

@FunctionalInterface
public interface ⟨Interface Name⟩ {
  public ⟨Return Type⟩ ⟨Method Name⟩(⟨Parameters⟩) {
    ⟨Method Body⟩
  }
}

is “(⟨Parameters⟩) -> {⟨Method Body⟩}”. So we can replace our use of the Increment class in our client code with:

list.transformAll((Integer x) -> {
  return x + 1;
});

In fact, we can shorten this even more, removing the static type declaration of the parameter(s) and, since there is exactly one parameter here, its surrounding parentheses.

list.transformAll(x -> {
  return x + 1;
});

In this case, since the method body consists of a single return statement, we can simplify the “right hand side” of the lambda expression to just the returned expression (and remove the curly brackets).

list.transformAll(x -> x + 1);
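
To check that these progressively shorter forms really are interchangeable, here is a minimal runnable sketch. It assumes a Transformation<T> interface with a single transform() method, and it uses a hypothetical fixed-size TinyList class in place of the chapter's list classes; TinyList, LambdaFormsDemo, and applyTo123() are names invented for this example.

```java
import java.util.Arrays;

/** Assumed shape of the chapter's functional interface. */
@FunctionalInterface
interface Transformation<T> {
  T transform(T x);
}

/** Hypothetical fixed-size stand-in for a list class with `transformAll()`. */
class TinyList<T> {
  private final T[] items;

  @SafeVarargs
  TinyList(T... items) {
    this.items = items.clone();
  }

  /** Replaces each element `x` with `f.transform(x)` in a single pass. */
  void transformAll(Transformation<T> f) {
    for (int i = 0; i < items.length; i++) {
      items[i] = f.transform(items[i]);
    }
  }

  @Override
  public String toString() {
    return Arrays.toString(items);
  }
}

class LambdaFormsDemo {
  /** Returns the contents of the list [1, 2, 3] after applying `f` to each element. */
  static String applyTo123(Transformation<Integer> f) {
    TinyList<Integer> list = new TinyList<>(1, 2, 3);
    list.transformAll(f);
    return list.toString();
  }

  public static void main(String[] args) {
    // Typed parameter with a block body.
    System.out.println(applyTo123((Integer x) -> { return x + 1; })); // prints [2, 3, 4]
    // Inferred parameter type, no parentheses, block body.
    System.out.println(applyTo123(x -> { return x + 1; }));           // prints [2, 3, 4]
    // Inferred parameter type with an expression body.
    System.out.println(applyTo123(x -> x + 1));                       // prints [2, 3, 4]
  }
}
```

All three calls pass an equivalent Transformation<Integer>, so the choice between them is purely a matter of readability.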

When we write a lambda expression like this, the Java compiler does work behind the scenes to infer its meaning. It works out:

The client is calling the transformAll() method, which takes in a generic Transformation<T> object as its argument.
The generic type T of the transformAll() parameter is shared with the CS2110List interface. In this case, list was declared with static type DynamicArrayList<Integer>, which implements CS2110List<Integer>, so T must refer to Integer.
Transformation is a functional interface. By supplying a lambda expression, the client wants me to take care of the setup for them. Behind the scenes, I’ll declare a new class that implements Transformation.
Within this new class, the transform() method takes in one argument named x (the LHS of the lambda expression) of type T, which I now know is Integer.
The RHS of the lambda expression is a single expression, which must be the return value, so I’ll add this to the transform() return statement. Since x is an Integer, x+1 will also be an Integer, which agrees with the method signature, since transform() is declared to return type T.
Since everything checked out, I can instantiate this new “behind the scenes” class and pass a reference to the transformAll() method to take care of the computation.
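
The steps above can be sketched by writing out, by hand, a class like the one the compiler reasons about. This is a conceptual illustration only: the names GeneratedIncrement and DesugarDemo are invented here, the Transformation interface is re-declared with an assumed shape so that the example is self-contained, and the mechanism a real Java compiler uses differs in its details.

```java
/** Assumed shape of the chapter's functional interface. */
@FunctionalInterface
interface Transformation<T> {
  T transform(T x);
}

class DesugarDemo {
  /** Hand-written stand-in for the class inferred from `x -> x + 1`. */
  static class GeneratedIncrement implements Transformation<Integer> {
    @Override
    public Integer transform(Integer x) { // T is Integer; the parameter is named x
      return x + 1;                       // the lambda's RHS becomes the return value
    }
  }

  public static void main(String[] args) {
    Transformation<Integer> explicit = new GeneratedIncrement();
    Transformation<Integer> lambda = x -> x + 1;
    System.out.println(explicit.transform(41)); // prints 42
    System.out.println(lambda.transform(41));   // prints 42
  }
}
```

From the perspective of any code that receives them, the explicit object and the lambda are indistinguishable: both are simply Transformation<Integer> instances.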

Remark:

Lambda expressions and functional interfaces are perhaps the best illustration of the power of interfaces. The work that we did as the implementer to declare a functional interface and leverage it in our CS2110List code enables our client to express powerful computations using minimal syntax. Java takes care of a lot of the hard work in the background.

We’ve already seen lambda expressions once before in the course, when we used JUnit’s assertThrows() method. There, the second argument has type Executable, a functional interface that allowed us to pass a block of code to JUnit that it could run “on its own terms” once it was ready to detect any thrown exceptions. As we proceed in the course, we will see other use cases for lambda expressions.
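
To make the idea of running a block of code “on its own terms” concrete, here is a simplified sketch of a helper that receives a block and decides when to execute it. It is not JUnit's implementation: ThrowingBlock and throwsType() are hypothetical stand-ins written for this example, and they only report whether the expected kind of exception occurred.

```java
/** Hypothetical stand-in for a no-argument, void functional interface. */
@FunctionalInterface
interface ThrowingBlock {
  void execute() throws Throwable;
}

class DeferredExecutionDemo {
  /**
   * Runs `block` and returns whether it threw an exception that is an
   * instance of `expected`.
   */
  static boolean throwsType(Class<? extends Throwable> expected, ThrowingBlock block) {
    try {
      block.execute(); // the caller's code runs only now, inside our try-block
    } catch (Throwable t) {
      return expected.isInstance(t);
    }
    return false; // the block completed without throwing
  }

  public static void main(String[] args) {
    int[] a = new int[3];
    // Each lambda is merely passed along here; it is not executed until
    // throwsType() calls execute().
    System.out.println(
        throwsType(ArrayIndexOutOfBoundsException.class, () -> a[5] = 1)); // prints true
    System.out.println(
        throwsType(ArrayIndexOutOfBoundsException.class, () -> a[0] = 1)); // prints false
  }
}
```

Had the client written the array access directly as an argument, the exception would have been thrown before the helper was ever called; wrapping it in a lambda is what defers the execution.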

Main Takeaways:

An Iterator is a separate object that guarantees to produce each element of a data structure exactly once during its lifetime. Iterators are modeled with an interface in Java and include hasNext() and next() methods.
A data structure implements the Iterable interface when it can return an Iterator over its elements. Java's enhanced for-loops provide a convenient syntax for iterating over the elements of an Iterable.
For iterators to work correctly, you must not modify the contents of their underlying data structure during their lifetime (a concurrent modification). This means you should never modify a data structure within an enhanced for-loop over its elements.
In the iteratee pattern, a client passes code to a data structure that is executed on each element during an iteration. It enables efficient modifying operations that the client cannot realize using iterators alone.
Functional interfaces wrap a block of code in their single declared method. Lambda expressions provide shorthand syntax for instantiating functional interfaces.

Exercises

Exercise 14.1: Check Your Understanding

(a)

Which of the following are benefits of making a data structure Iterable?

(b)

Consider the following Java code snippet involving two types S and T:

S s = new S();
// Code to update the state of s
for (T t : s) {
  // Loop body
}

Which of the following must be true for this code to compile?

(c)

A website admin wants to track which pages on the site are the most popular. For each webpage, they record the user ID of each user who visits that page. The data is stored in the following variable:

CS2110List<Bag<Integer>> pageVisits;

To analyze the data, the admin starts by writing the following loop:

for (T x: pageVisits) {
  // Do something with `x`
}

What should be written for x’s type T?

(d)

Suppose we define the following interface and a class implementing this interface.

/** Represents an invertible transformation. */
public interface InvertibleTransformation<T> extends Transformation<T> {
  /** Reverses the transformation. */
  T inverse(T y);
}

/** Represents a scaling factor transformation. */
public class ScalingTransformation<T extends Number> implements InvertibleTransformation<T> { ... }

Which of the following could be passed as an argument to a method apply(InvertibleTransformation<T> t)?

Exercise 14.2: Iterator for DoublyLinkedList

Now that CS2110List<T> extends Iterable<T>, we'll need to add an iterator() method to DoublyLinkedList<T> (see Exercise 13.3). We'll begin by writing an iterator that traverses the doubly-linked list from head to tail (both exclusive). Following the pattern established in lecture, we'll define a nested class to aid our implementation.

public class DoublyLinkedList<T> implements CS2110List<T> {
  // fields, constructors, and methods

  @Override
  public Iterator<T> iterator() {
    return new ForwardDoublyLinkedListIterator();
  }

  private class ForwardDoublyLinkedListIterator implements Iterator<T> { ... }
}

(a)

Within ForwardDoublyLinkedListIterator, define fields and a constructor to initialize them. Make sure to have well-defined class invariants.

(b)

Implement the hasNext() and next() methods.

(c)

Define and implement a new iterator, BackwardDoublyLinkedListIterator, that traverses the linked list from tail to head.

Exercise 14.3: Iterator for CircularLinkedList

We'll also develop an iterator for CircularLinkedList (see Exercise 13.4). This iterator, CircularLinkedListIterator, should iterate from the head (exclusive) to the node whose next pointer is the head.

(a)

Add a private inner class called CircularLinkedListIterator to CircularLinkedList. Define fields, implement a constructor, override hasNext() and next(), and write specifications.

(b)

Override iterator() in CircularLinkedList.

Exercise 14.4: Fail-fast Iterators

Recall that an Iterator guarantees to visit each element of a collection exactly once during its lifetime, provided that the client does not modify the collection during the iteration. Unfortunately, it’s not uncommon for clients to accidentally violate this requirement, and the resulting possibility of arbitrary undefined behavior is not very friendly. Many Iterator implementations instead promote this to exceptional behavior, giving rise to the concept of a fail-fast iterator.
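
To see fail-fast behavior in practice before implementing it, the sketch below uses java.util.ArrayList, whose iterators are documented as fail-fast, in place of the chapter's list classes. The names FailFastDemo and failsFast() are hypothetical and chosen for this example.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

class FailFastDemo {
  /**
   * Returns whether adding to `list` inside an enhanced for-loop over it
   * raises a ConcurrentModificationException. Requires `list` is non-empty.
   */
  static boolean failsFast(List<Integer> list) {
    try {
      for (Integer x : list) {
        list.add(x); // modifies the list while an iterator over it is live
      }
    } catch (ConcurrentModificationException e) {
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    List<Integer> list = new ArrayList<>(List.of(1, 2, 3));
    System.out.println(failsFast(list)); // prints true
    System.out.println(list);            // prints [1, 2, 3, 1]
  }
}
```

The first add() succeeds, and the exception is raised when the loop next asks the iterator for an element, so the iteration stops immediately rather than continuing with unpredictable results.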

(a)

Suppose we want to make the DynamicArrayListIterator fail-fast. How can you keep track of whether the list has been modified since the start of the iteration? Consider adding a variable that tracks how many times the list has been modified.

(b)

Add and edit the fields and methods of DynamicArrayList and its iterator to make it fail-fast (and update its specifications accordingly). If any method in the iterator is called after a modification, the method should throw ConcurrentModificationException.

(c)

Add unit tests that verify the fail-fast behavior of the DynamicArrayList iterators.

(d)

Make SinglyLinkedListIterator fail-fast, following similar steps as above.

(e)

Make DoublyLinkedListIterator and CircularLinkedListIterator fail-fast.

Exercise 14.5: Iterator’s remove() Method

What if we want to remove items from a collection while we're iterating through it? Iterator provides this functionality through its remove() method. Note that this is a default method, so implementations of Iterator need not override it. The default implementation throws an UnsupportedOperationException and performs no other action.
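
As a point of reference for how remove() behaves from the client's side, the sketch below exercises it on a java.util.ArrayList (whose iterator does override remove()) and on a small iterator that inherits the default. The names IteratorRemoveDemo, Countdown, removeEvens(), and defaultRemoveThrows() are hypothetical and chosen for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class IteratorRemoveDemo {
  /** Removes every even number from `list` during a single iteration. */
  static void removeEvens(List<Integer> list) {
    Iterator<Integer> it = list.iterator();
    while (it.hasNext()) {
      if (it.next() % 2 == 0) {
        it.remove(); // removes the element most recently returned by next()
      }
    }
  }

  /** Hypothetical iterator producing 3, 2, 1 that does not override remove(). */
  static class Countdown implements Iterator<Integer> {
    private int remaining = 3;

    @Override
    public boolean hasNext() {
      return remaining > 0;
    }

    @Override
    public Integer next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      return remaining--;
    }
  }

  /** Returns whether the inherited default remove() rejects the call. */
  static boolean defaultRemoveThrows() {
    Iterator<Integer> it = new Countdown();
    it.next();
    try {
      it.remove();
    } catch (UnsupportedOperationException e) {
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    List<Integer> list = new ArrayList<>(List.of(1, 2, 3, 4, 5));
    removeEvens(list);
    System.out.println(list);                  // prints [1, 3, 5]
    System.out.println(defaultRemoveThrows()); // prints true
  }
}
```

Because the removal goes through the iterator itself, the iterator can keep its own bookkeeping consistent, which is why this is allowed while modifying the list directly during the iteration is not.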

(a)

Study the specification of remove() in the Iterator API. Take particular note of when IllegalStateException is thrown. Override remove() in DynamicArrayListIterator. You may need to add an additional field to the DynamicArrayListIterator class.

(b)

Override remove() in SinglyLinkedListIterator.

(c)

Add unit tests to verify that your remove() definitions conform to their specifications.

Exercise 14.6: Iterating on Iterables

Implement the following methods according to their specifications using only enhanced for-loops; do not call other CS2110List methods on the parameters.

(a)

/** Returns whether `x` is in `list`. */
static <T> boolean contains(CS2110List<T> list, T x) { ... }

(b)

/** 
 * Returns the index of `x` in `list`. Throws a `NoSuchElementException` if 
 * `x` is not in `list`. 
 */
static <T> int indexOf(CS2110List<T> list, T x) { ... }

(c)

Given a list of lists, a row-major ordering of the elements in the inner lists consists of all elements of the first inner list (in order), followed by all elements of the second inner list (in order), etc.

/** Returns a list of all elements in `matrix` in row-major order. */
static <T> CS2110List<T> flatten(CS2110List<CS2110List<T>> matrix) { ... }

(d)

A matrix is rectangular if it has at least one row and all of its rows have the same length.

/** Returns whether `matrix` is rectangular. */
static <T> boolean isRectangular(CS2110List<CS2110List<T>> matrix) { ... }

(e)

An inner join between two tables is one of the most common operations in relational databases. It considers all pairs of rows consisting of one row from the left table and one row from the right table. For this exercise, we join a pair of rows if their first elements are equal; the specification below writes this condition as left.get(0).equals(right.get(0)), where left and right stand for the two rows of the pair. The joined row is formed by concatenating the left row with the right row, excluding the first element of the right row. As an example, consider the two tables:

State	Capital
Texas	Austin
California	Sacramento
New York	Albany

State	City
New York	Ithaca
California	Los Angeles
New York	New York City

When joining the tables, we get the following table. Notice that there is no row for Texas since it did not appear in the right table.

State	Capital	City
California	Sacramento	Los Angeles
New York	Albany	Ithaca
New York	Albany	New York City

/**
 * Performs an inner join on `left` and `right` with the join condition being
 * `left.get(0).equals(right.get(0))`. Requires `left` and `right` are rectangular
 * and the entries of their outer lists correspond to table rows.
 */
static <T> CS2110List<CS2110List<T>> innerJoin(CS2110List<CS2110List<T>> left,
    CS2110List<CS2110List<T>> right) { ... }

Exercise 14.7: Iterators that Skip

Add a method skipIterator() to the CS2110List interface with the following specifications.

/**
  * Returns an iterator that produces every other element of this list, beginning
  * with the first element. For example, on the list [1,2,3,4,5], a skip iterator
  * would return only elements 1, 3, and 5 (in that order).
  */

Implement this method for the DynamicArrayList, SinglyLinkedList, DoublyLinkedList, and CircularLinkedList classes.

Exercise 14.8: Writing Lambda Expressions

Use the transformAll() method to carry out the following modifications to the described lists. The argument to transformAll() should be a lambda expression.

(a)

Given a CS2110List<String> list, convert all of the Strings to contain only uppercase characters.

(b)

Given a CS2110List<Double> inputs, apply the \(\textrm{ReLU}()\) function to all of its elements.

\[ \textrm{ReLU}(x)=\begin{cases} 0 & \textrm{if } x < 0 \\ x & \textrm{if } x \ge 0 \end{cases} \]

This is an important operation for many neural networks.

(c)

Given a CS2110List<CS2110List<Integer>> matrix, multiply all the elements of all of its inner lists by 3.

Exercise 14.9: map()

Much like transformAll(), the map() method applies an operation to each element of a list. However, the output may be of a different type than the original. To support this method, we define a functional interface Converter<T, R>, where T is the input type and R is the output type.

/** 
 * Represents a function that takes one value of type `T` as its input and 
 * produces another value of type `R` as its output.
 */
@FunctionalInterface
public interface Converter<T, R> {
  /** Converts the `T` value `x` to its corresponding output `R` value. */
  public R convert(T x);
}

(a)

Implement this map() method.

/** Applies `f` on all elements in `list`. */
static <T, R> CS2110List<R> map(CS2110List<T> list, Converter<T, R> f) { ... }

(b)

Suppose you have a CS2110List<String> list. Use map() to produce a CS2110List<Integer> of their lengths.

(c)

Implement this mapMatrix() method by invoking map().

/** Applies `f` on all elements in `matrix`. */
static <T, R> CS2110List<CS2110List<R>> mapMatrix(CS2110List<CS2110List<T>> matrix, 
    Converter<T, R> f) { ... }

Exercise 14.10: filter()

Another common operation on lists is to filter the list on some predicate. Only elements that fulfill the predicate will be in the resulting list. We'll use the functional interface Predicate to implement filter().

/** Represents a predicate (boolean-valued function) of an argument of type `T`. */
@FunctionalInterface
public interface Predicate<T> {
  /** Evaluates this predicate on `t`. */
  boolean test(T t);
}

(a)

Implement this filter() method.

/** Filters `list` on predicate `p`. */
static <T> CS2110List<T> filter(CS2110List<T> list, Predicate<T> p) { ... }

(b)

Given a CS2110List<Integer> allNums, use the filter() method to initialize lists evenNums and oddNums that contain only the even and odd (respectively) entries of allNums.

(c)

In A5, the method availableWeapons() filters the Weapon[] to select those weapons that are not equipped. We’ve refactored the class to use CS2110List<Weapon> instead. For reference, consider this partial class definition of Weapon.

/** A weapon that a Fighter can equip. */
public class Weapon {
  /**
   * `true` if this weapon is currently equipped by a player, `false` otherwise.
   */
  private boolean equipped;

  /** Returns whether this weapon is currently equipped by a player. */
  public boolean isEquipped() {
      return equipped;
  }
}

Implement availableWeapons() using filter() and a lambda expression.

public class GameEngine {
  /** The weapons that fighters can equip during this game simulation. */
  private CS2110List<Weapon> weapons;

  /**
   * Returns a list of the weapons that are currently not equipped by any player.
   */
  public CS2110List<Weapon> availableWeapons() { ... }
}

Exercise 14.11: fold()

The fold() method combines all elements from a list into a single result, step by step. You start with some initial value, then repeatedly combine the current intermediate result with the next element of the list until every element has been incorporated, at which point the intermediate result is the final result. We'll use the functional interface Accumulator to implement fold().

/** An accumulator over elements of type `T` to generate a result of type `R`. */
@FunctionalInterface
public interface Accumulator<R, T> {
  /** Combines the current accumulated value `acc` with the next element `elem`. */
  R accumulate(R acc, T elem);
}

(a)

Implement the fold() method.

/** 
 * Performs a left fold over the given `list` with initial value `init` and 
 * accumulator `f`. 
 */
public static <R, T> R fold(CS2110List<T> list, R init, Accumulator<R, T> f) { ... }

In the specs, we note that this method is a left fold. This is important when the accumulator f is not associative. To illustrate a left fold, consider a list of integers [1,2,3,4], an accumulator \(f(x, y)\), and an initial value init. The returned value of fold() should be \[f(f(f(f(\texttt{init}, 1), 2), 3), 4)\]
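
As a worked illustration of why the nesting order matters, take the non-associative accumulator \(f(x, y) = x - y\) with an initial value of \(0\) (both chosen here purely for illustration). The left fold over [1,2,3,4] evaluates from the innermost call outward as

\[ f(f(f(f(0, 1), 2), 3), 4) = (((0 - 1) - 2) - 3) - 4 = -10, \]

whereas nesting the same four applications in the opposite direction would give a different value:

\[ f(1, f(2, f(3, f(4, 0)))) = 1 - (2 - (3 - (4 - 0))) = -2. \]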

(b)

Implement the sum() method with fold() and a lambda expression.

/** Returns the sum of elements in `list`. */
public static Integer sum(CS2110List<Integer> list) { ... }

(c)

Can you implement filter() and map() with fold()?
