Discussion 9: Heapsort
By this point in the semester, you should be familiar with multiple sorting algorithms, each with its benefits and drawbacks. Insertion sort has great adaptivity and is stable, but has very poor average and worst-case performance. Quicksort is very fast, but is not stable and still has bad worst-case time complexity. Merge sort guarantees better performance bounds, but uses more space and sacrifices adaptivity.
In today’s discussion, we will use the binary heap data structure from lecture to explore the implementation of another sorting algorithm, named heapsort. This sorting algorithm sorts an array with the help of a binary heap, and exhibits a different set of tradeoffs we’ll analyze.
Learning Outcomes
- Translate between the tree and array representations of a heap.
- Implement operations on a heap and determine their time/space complexities.
- Use a heap to implement a priority queue and analyze its performance.
Reminder: Discussion Guidelines
The work that you complete in discussion serves as a formative assessment tool; it offers the opportunity to assess your understanding of the material and for our course staff to get a “pulse” on how things are going, so we can make adjustments in future classes. Your grade in discussion is based on your attendance, active participation, and completion of that day’s activity. More information about the grading and expectations can be found in the syllabus.
Overview
Heapsort is a sorting algorithm split into two phases. First, all the elements to be sorted are placed into a binary (max) heap. Then, the algorithm repeatedly polls the top element of the heap to recover the final sorted order. In the next few exercises, you’ll be provided an implementation of a binary heap and will be using it to write multiple implementations of heapsort.
For today’s discussion, write all code by hand and submit your final write-up. This is meant to mimic the style of an exam, to provide you with practice building simple programs based on the given specs without the help of a compiler or IDE.
Simple Implementation
Assume you are given access to a class named IntegerBinaryMaxHeap, which models a priority queue using a binary max heap and has the following methods:
|
|
|
|
Coding
Using this heap class, write a simple heapsort on an array of integers. Your code should consist of two loops: first to add every element from the array into the heap, and a second to repeatedly poll from the heap and re-insert the elements back into the array in sorted order.
Remember, the heap provided is a max-heap. Consider what order this means the elements can be polled from this heap.
|
|
|
|
Loop Invariants
Draw a loop-invariant diagram for the state of the array arr for both of the above loops. Include in your invariant what values are currently inside the heap (but it is not necessary to draw a diagram for the heap itself).
Complexity Analysis
Assuming that the creation of the empty heap can be done in \(O(1)\) time, what is the time complexity of each of your loops relative to the length of the input array \(N\)?
What is the overall worst-case runtime complexity of your sorting algorithm?
What is its space complexity?
|
|
|
|
heapify()
Start by writing a method that takes an array and rearranges its elements to satisfy the invariants of a max-heap.
Hint: Calling bubbleDown() on the root of a subtree will leave the subtree in the correct state as long as all of the children of the subtree’s root have already been bubbled down into their correct positions.
Second Hint: Start at the back of the array and move forward. The body of your loop should be a single method call.
|
|
|
|
Efficient Heapsort
Now, with the array as a max-heap, loop through the array and poll from your heap while building up the sorted region of the array. Use the following invariant for your loop:
At each iteration, consider what a standard implementation of remove() would do to the element at arr[0]. Recall that this is a max-heap. Where does that element end up? What does the invariant require of that resulting location?
Hint: The body of your loop should make two calls to the helper methods listed above.
|
|
|
|
Complexity Analysis
We can argue that this heapify() definition has an \(O(N)\) runtime:
heapify() by the total number of swaps that it does during its bubbleDown() calls, as these dominate its runtime.At most \(\frac{N}{2}\) of the entries will begin at level \(\textrm{height} - 1\) in the tree, and they will require at most one swap when they are bubbled down.
Similarly, at most \(\frac{N}{4}\) of the entries will begin at level \(\textrm{height} - 2\) in the tree, and they will require at most two swap when they are bubbled down.
Continuing this pattern, we find that the total number of swaps can be upper bounded by:
\[ \frac{N}{2} + 2 \cdot \frac{N}{4} + 3 \cdot \frac{N}{8} = N \sum_{i=1}^{\infty} \frac{i}{2^i} = 2N = O(N). \]Including the complexity of heapify() you found earlier, what is the total time complexity of your heapsortEff() method? Which of the two steps (heapifying or iterated removing) dominates this complexity?
What is the space complexity of your heapsortEff() method?
Optional: Coding Exercise
Code up your written implementations and use the provided test cases to check them for correctness.
Submission
At the end of class, your group’s primary author should create a PDF with your written answers and upload this to the “Discussion 9” Gradescope assignment, tagging all other members of the group onto the submission.