Discussion 7: Linked Lists

Download Handout download

In lecture, we introduced several sorting algorithms implemented on arrays, each with its own benefits and drawbacks. However, arrays are not the only linear data structure we may want to sort! Today, we will focus on developing a merge sort implementation for singly-linked lists.

Unlike arrays, linked lists do not provide an \(O(1)\) random access guarantee. This limitation makes sorting algorithms, such as Quicksort, that depend on direct access to specific positions in the structure fairly cumbersome. Merge sort, however, is well-suited to linked lists because of its divide-and-conquer structure. As a reminder, the algorithm consists of three main steps:

  1. Split the list into two halves.
  2. Recurse on each half to sort them independently.
  3. Merge the two sorted halves back together into a single sorted list.

While array-based merge sort requires creating temporary arrays during the merge step, linked lists allow us to split and re-link nodes directly. In this discussion, you’ll develop an algorithm that leverages these properties while strengthening your ability to traverse and manipulate linked structures.

Learning Outcomes

Reminder: Discussion Guidelines

The work that you complete in discussion serves as a formative assessment tool, meaning it offers the opportunity to assess your understanding of the material and for our course staff to get a “pulse” on how things are going so we can make adjustments in future classes. Your grade in discussion is based entirely on attendance and participation; if you show up and you are actively engaged with the activity (working on the activity on your computer, discussing the activity with your group, asking and answering questions, etc.) for the entire 50-minute section period, you will earn full credit. If you complete the activity early, helping other students is a great way to further your own understanding. You do not need to submit any of the work that you complete during discussion.

Since discussion activities are not graded for correctness, we do not place any restrictions on resources that you may use to complete them, which include notes, books, unrestricted conversations with other students, internet searches, and the use of large language models or other generative AI. We advise you to be pragmatic about your use of these resources and think critically about whether they are enhancing your learning. Discussion activities are intended to serve as “strength training” for programming tasks we will expect on assignments and exams (and that you will encounter in future courses and careers), and their main benefit comes from critically thinking to “puzzle” them out.

Working together in small groups is encouraged during discussion!

Exercise 1: The split() Method
The first step of the Merge Sort algorithm is to split the list into two equal-sized halves, which we will do with a split() helper method.
1
2
3
4
5
/**
 * Splits the linked list starting at `head` into two halves. Returns the head of the second
 * half. If the list has an odd number of nodes, the extra node goes in the first half.
 */
static Node<Integer> split(Node<Integer> head) { ... }
1
2
3
4
5
/**
 * Splits the linked list starting at `head` into two halves. Returns the head of the second
 * half. If the list has an odd number of nodes, the extra node goes in the first half.
 */
static Node<Integer> split(Node<Integer> head) { ... }
Notice that this method has a Node as a parameter, not an IntLinkedList. Since we are within the IntLinkedList class, we can operate directly on the underlying linked chain, rather than an encapsulated list. This lets us avoid having to construct many new list objects for the recursive calls, but it also means we lose out on some of the nice CS2110List methods. In particular, we can't easily (in \(O(1)\) time) determine the size() of the chain beginning at head. Instead, we'll locate the midpoint of the list in a clever way:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
/**
 * Splits the linked list starting at `head` into two halves. Returns the head of the second
 * half. If the list has an odd number of nodes, the extra node goes in the first half.
 */
static Node<Integer> split(Node<Integer> head) {
  Node<Integer> slow = head;
  Node<Integer> fast = head.next;

  while (fast.data != null) {
    fast = fast.next;
    if (fast.data != null) {
      fast = fast.next;
      slow = slow.next;
    }
  }

  Node<Integer> mid = slow.next;
  slow.next = new Node<>(null, null);

  return mid;
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
/**
 * Splits the linked list starting at `head` into two halves. Returns the head of the second
 * half. If the list has an odd number of nodes, the extra node goes in the first half.
 */
static Node<Integer> split(Node<Integer> head) {
  Node<Integer> slow = head;
  Node<Integer> fast = head.next;

  while (fast.data != null) {
    fast = fast.next;
    if (fast.data != null) {
      fast = fast.next;
      slow = slow.next;
    }
  }

  Node<Integer> mid = slow.next;
  slow.next = new Node<>(null, null);

  return mid;
}
Trace through the split() method on the following example list:
(a)
For how many iterations will the while loop run? To which Nodes will fast and slow point at the end of each of these iterations?
(b)
Explain why this method works. How is it locating the midpoint of the list?
Exercise 2: The merge() Method
Now that we can split a list in half, the next step is to merge two sorted halves back into a single sorted list. We will write a second helper method:
1
2
3
4
5
/**
 * Returns the head of the merged list containing all nodes from the lists `head1` 
 * and `head2`, in sorted order. Requires that `head1` and `head2` are sorted.
 */
static Node<Integer> merge(Node<Integer> head1, Node<Integer> head2) { ... }
1
2
3
4
5
/**
 * Returns the head of the merged list containing all nodes from the lists `head1` 
 * and `head2`, in sorted order. Requires that `head1` and `head2` are sorted.
 */
static Node<Integer> merge(Node<Integer> head1, Node<Integer> head2) { ... }
(We will focus on implementing this iteratively.) Using the same list from Exercise 1, suppose that we've successfully sorted both halves and we now wish to merge them together:
(a)
What local variables will we need to instantiate in order to merge head1 and head2 into a single list?
(b)
In the given example, should head1 or head2 become the head of the merged list? How do we determine this in general? After we choose the head, how do we decide which node to add next from the two lists as we continue building the merged list?
(c)
In the given example, does head1 or head2 run out of nodes first? How can we attach all of the remaining nodes to the merged list?
(d)
What should we return at the end of the merge() method?
(e)
Use your answers to complete the definition of the merge() method in IntLinkedList.java from the starter code. Use the provided tests to check that your implementation is correct.
(f)
(Bonus Challenge): Implement the merge operation recursively by completing the mergeRecursive() method. What are the base and recursive cases?
Exercise 3: The mergeSort() Method
Finally, it's time to put everything together:
1
2
3
4
/**
* Sorts the entries of `head` using the merge sort algorithm. 
*/
static void mergeSort(Node<Integer> head) { ... }
1
2
3
4
/**
* Sorts the entries of `head` using the merge sort algorithm. 
*/
static void mergeSort(Node<Integer> head) { ... }
(a)
What are the base cases for this recursive method? What should the method return in these cases?
(b)
How can you use the split() and merge() helper methods to define the recursive step?
(c)
Use your answers to complete the definition of the mergeSort() method and use the provided tests to check that your implementation is correct.
Exercise 4: Complexity Analysis
The time complexity of merge sort on linked lists follows the same breakdown as the array-based implementation, giving a total runtime of \(O(N \log N)\). However, the space complexity differs!

What is the space complexity of merge sort on linked lists, and why is it different from the array-based implementation?
(Hint: What dominates the space complexity in the latter?)