CS 410, Summer 1998
Dan Grossman

Lecture 10 Outline

Goals:
  * Heaps

Reading: CLR Chapter 7

ADT: Priority Queue
  * Insert
  * Delete-min (or max -- just flip every < or >, but not both)
    (Note: The order of deletion among ties may be unspecified.)

Implementations:
  * Lists/Arrays: O(1) for one operation and O(n) for the other
  * Balanced trees from last week: O(log n) for both
  * If the priorities are known in advance and there are m of them: an
    array of queues gives O(1) insert and O(m) delete-min.  (For insert,
    just enqueue on the right queue.  For delete-min, find the
    highest-priority queue that is not empty.)
  * Heap: O(log n) for both, with much better constants than trees.

Hence this is a good example of building a special-purpose data structure
when the operations are fewer and constants matter.

A heap is a binary tree with the heap property:
  * Every parent is less than its children (greater than, for delete-max).
  * The tree is completely balanced, with the children on the last level
    as far to the left as possible.

The shape to remember is:

        /\
       /  \
      /    \
     /  ___\
    /___|

Heaps for a particular set of keys are not unique.  For example, the
following are both legal heaps:

         0                  0
        / \                / \
       3   5              7   3
      / \ / \            / \ / \
     6  7 8  9          8  9 5  6

Insert: Put the element in the last position and bubble up:

    x = "last position"
    put the value there
    while (x.key < x.parent.key) {
        swap(x, x.parent);    // x trades places with its parent
    }

Delete-min: Take out the root, put the last element at the root, and
bubble down:

    answer = root.key;
    move the key in the last position to the root
    // the rest is called heapify:
    x = root;
    while (true) {
        smallest = x;
        if (x.left != null && x.left.key < smallest.key)
            smallest = x.left;
        if (x.right != null && x.right.key < smallest.key)
            smallest = x.right;
        if (smallest == x)
            break;
        swap(x, smallest);
        x = smallest;
    }

(Note: Finding the new last position might be a pain; as we'll see in a
minute, it's not worth discussing.)

Example:

        0                   0                  2                  3
       / \     insert 2    / \      delete    / \      delete    / \
      3   5   ========>   2   5   ========>  3   5   ========>  6   5
     / \ / \             / \ / \            / \ / \            / \  /
    6  7 8  9           3  7 8  9          6  7 8  9          9  7 8
                       /
                      6

As we've done it, this is not much faster than a BST.
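The array-of-queues implementation mentioned above (priorities known in
advance) can be sketched as real code.  This is a minimal illustration, not
code from the lecture; the class name `BucketPQ`, the String payloads, and
the use of `java.util.ArrayDeque` are my own choices.

```java
import java.util.ArrayDeque;

// Sketch of the "array of queues" priority queue: priorities are the
// integers 0..m-1, known in advance.  One FIFO queue per priority.
class BucketPQ {
    private final ArrayDeque<String>[] buckets;

    @SuppressWarnings("unchecked")
    BucketPQ(int m) {
        buckets = new ArrayDeque[m];
        for (int p = 0; p < m; p++)
            buckets[p] = new ArrayDeque<>();
    }

    // O(1): just enqueue on the queue for this priority.
    void insert(int priority, String data) {
        buckets[priority].add(data);
    }

    // O(m): scan for the first (smallest-priority) nonempty queue.
    String deleteMin() {
        for (int p = 0; p < buckets.length; p++)
            if (!buckets[p].isEmpty())
                return buckets[p].remove();
        throw new IllegalStateException("empty priority queue");
    }

    public static void main(String[] args) {
        BucketPQ pq = new BucketPQ(4);
        pq.insert(2, "b");
        pq.insert(0, "a");
        pq.insert(2, "c");
        System.out.println(pq.deleteMin());  // prints a  (priority 0 first)
        System.out.println(pq.deleteMin());  // prints b  (FIFO among ties)
    }
}
```

Note how ties leave in FIFO order here, one concrete choice for the
"order of deletion among ties may be unspecified" freedom noted above.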
After all, we glossed over "find the last position".

How everybody does it: exploit the "completely balanced" shape to store
the heap in an array:

  * root at index 1
  * left child of the node at index j at index j*2
  * right child at index j*2 + 1
  * keep the current size in a separate variable called heapSize

Notes:

  * We waste the zeroth position to make the arithmetic simpler and
    faster.
  * If the heap size is n, then exactly the first n positions in the
    array are full.
  * The last position increments on insert and decrements on delete-min.
  * We can resize the array as necessary.
  * The parent of the element at position j is at position j/2 (integer
    division).

This is really slick!  To see how a heap drawn as a tree fits in an
array, just write the nodes down top-to-bottom and left-to-right.  The
examples from above become

    0 3 5 6 7 8 9    and    0 7 3 8 9 5 6

We can now write our operations as real code rather than pseudocode.
(Although we'll be sloppy and forget that data needs to be associated
with each key.  Adding this is not hard.)

    Insert(key):
        heapSize++;
        arr[heapSize] = key;
        int i = heapSize;
        while (i > 1 && arr[i] < arr[i/2]) {
            temp = arr[i];
            arr[i] = arr[i/2];
            arr[i/2] = temp;
            i = i/2;
        }

    Delete-min():
        answer = arr[1];
        arr[1] = arr[heapSize];
        heapSize--;
        Heapify(1);
        return answer;

    Heapify(int i):
        smallest = i;
        if (i*2 <= heapSize && arr[i*2] < arr[smallest])
            smallest = i*2;
        if (i*2 + 1 <= heapSize && arr[i*2+1] < arr[smallest])
            smallest = i*2 + 1;
        if (smallest != i) {
            temp = arr[i];
            arr[i] = arr[smallest];
            arr[smallest] = temp;
            Heapify(smallest);
        }

Notice these are quite efficient.  Multiplying and dividing by two is
cheap, as is swapping two array elements.  They are clearly O(log n),
since a heap is balanced and each operation walks up or down the tree
only once.  Actually, we can eliminate the swapping too.
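The array code above assembles into a small runnable class.  This is a
minimal sketch under assumptions of my own (int keys, a fixed capacity,
and the class name `MinHeap`); index 0 is wasted as in the notes.

```java
// Minimal runnable version of the array-based heap operations above.
class MinHeap {
    int[] arr;
    int heapSize = 0;

    MinHeap(int capacity) { arr = new int[capacity + 1]; }  // waste slot 0

    void insert(int key) {
        heapSize++;
        arr[heapSize] = key;
        int i = heapSize;
        while (i > 1 && arr[i] < arr[i/2]) {   // bubble up
            int temp = arr[i]; arr[i] = arr[i/2]; arr[i/2] = temp;
            i = i/2;
        }
    }

    int deleteMin() {
        int answer = arr[1];
        arr[1] = arr[heapSize];                // last element to the root
        heapSize--;
        heapify(1);                            // bubble down
        return answer;
    }

    void heapify(int i) {
        int smallest = i;
        if (i*2 <= heapSize && arr[i*2] < arr[smallest]) smallest = i*2;
        if (i*2 + 1 <= heapSize && arr[i*2+1] < arr[smallest]) smallest = i*2 + 1;
        if (smallest != i) {
            int temp = arr[i]; arr[i] = arr[smallest]; arr[smallest] = temp;
            heapify(smallest);
        }
    }

    public static void main(String[] args) {
        MinHeap h = new MinHeap(16);
        for (int k : new int[]{0, 3, 5, 6, 7, 8, 9}) h.insert(k);
        h.insert(2);                           // the example sequence from above
        System.out.println(h.deleteMin());     // prints 0
        System.out.println(h.deleteMin());     // prints 2
    }
}
```

The main method replays the insert-2-then-delete-twice example: the two
deletes return 0 and then 2, matching the trees drawn earlier.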
In every swap, one of the two elements is the same "far-moving" element.
So rather than repeatedly swapping, we can just shift the other elements
(parents down for insert, children up for heapify) and write the
far-moving element into its final position when we're done:

    Insert(key):
        heapSize++;
        int i = heapSize;
        while (i > 1 && arr[i/2] > key) {
            arr[i] = arr[i/2];
            i = i/2;
        }
        arr[i] = key;

    Heapify(int i, key):
        smallest_val = key;
        int smallest_index = i;
        if (i*2 <= heapSize && arr[i*2] < smallest_val) {
            smallest_val = arr[i*2];
            smallest_index = i*2;
        }
        if (i*2 + 1 <= heapSize && arr[i*2+1] < smallest_val) {
            smallest_val = arr[i*2 + 1];
            smallest_index = i*2 + 1;
        }
        if (smallest_index != i) {
            arr[i] = smallest_val;
            Heapify(smallest_index, key);
        } else {
            arr[i] = key;
        }

Each iteration is now a couple of arithmetic operations and an assignment
or two.  That's it.

Build-Heap:

Build-heap is an operation that takes an unsorted array (of size n,
ignoring the zeroth element) and turns it into a heap.  A naive
implementation would essentially perform n insert operations:

    BuildHeapNaive(array) {
        for (heapSize = 1; heapSize < array.length; heapSize++) {
            bubble up arr[heapSize] as we did in Insert
        }
    }

In the worst case, every one of the bubble-ups goes all the way to the
root.  One of them starts at the root, two of them start one level away,
four of them two levels away, etc., so the total running time is

    1*0 + 2*1 + 4*2 + 8*3 + ... + (n/2)*(log n)

where the last term says that roughly half the values have to bubble up
the full height of the final tree.  Just by looking at the last term, we
can see this method is Omega(n log n).

A better solution builds the heap "bottom up" by making many heaps of
height one, then height two, then height three, until there is only one
heap of height log n.
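The swap-free versions above can also be made runnable.  A sketch under
the same assumptions as before (int keys, fixed capacity, slot 0 wasted);
the class name `HoleHeap` is mine:

```java
// Runnable sketch of the swap-free operations: instead of swapping, we
// slide elements over a "hole" and drop the moving key in at the end.
class HoleHeap {
    int[] arr;
    int heapSize = 0;

    HoleHeap(int capacity) { arr = new int[capacity + 1]; }

    void insert(int key) {
        heapSize++;
        int i = heapSize;
        while (i > 1 && arr[i/2] > key) {  // slide larger parents down
            arr[i] = arr[i/2];
            i = i/2;
        }
        arr[i] = key;                      // drop the key into the hole
    }

    int deleteMin() {
        int answer = arr[1];
        int last = arr[heapSize];
        heapSize--;
        heapify(1, last);
        return answer;
    }

    void heapify(int i, int key) {
        int smallestVal = key, smallestIndex = i;
        if (i*2 <= heapSize && arr[i*2] < smallestVal) {
            smallestVal = arr[i*2];
            smallestIndex = i*2;
        }
        if (i*2 + 1 <= heapSize && arr[i*2+1] < smallestVal) {
            smallestVal = arr[i*2 + 1];
            smallestIndex = i*2 + 1;
        }
        if (smallestIndex != i) {
            arr[i] = smallestVal;          // slide the smaller child up
            heapify(smallestIndex, key);
        } else {
            arr[i] = key;                  // the hole is key's final spot
        }
    }

    public static void main(String[] args) {
        HoleHeap h = new HoleHeap(16);
        for (int k : new int[]{0, 3, 5, 6, 7, 8, 9}) h.insert(k);
        h.insert(2);
        System.out.println(h.deleteMin()); // prints 0
        System.out.println(h.deleteMin()); // prints 2
    }
}
```

It produces exactly the same heaps as the swapping version on the example
sequence; only the constant factor per level changes.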
To make a larger heap, we take two smaller heaps (which have full bottom
levels), put a new value at the root, and bubble down as in delete-min:

    BuildHeap(array) {
        for (int i = array.length - 1; i > 0; i--) {
            Heapify(i);
        }
    }

In fact, the calls on the bottom level are trivial, so we could actually
start the for loop at i = array.length/2.

In the worst case, every one of the bubble-downs (er, heapifies) goes all
the way to the leaves.  In this case, the total running time is

    (n/2)*0 + (n/4)*1 + (n/8)*2 + (n/16)*3 + ... + 1*(log n)

Notice that compared to the previous solution, we have avoided
multiplying the larger values by each other.  The sum is

    n * (sum from h = 0 to log n of h/(2^h))

Using equation 3.6 of CLR with x = 1/2, we know the sum taken to infinity
instead of stopping at log n is only 2.  So our running time is < 2n,
that is, O(n) -- an asymptotic improvement.

Next time we will see some other applications of heaps and then move on
to hash tables.
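The bottom-up build can be sketched as a standalone function.  This is an
illustration under my own assumptions (int keys, slot 0 of the array
unused, class name `BuildHeapDemo`); heapify is repeated here so the
class compiles on its own.

```java
import java.util.Arrays;

// Sketch of bottom-up BuildHeap on the slot-0-wasted array layout.
class BuildHeapDemo {
    static void buildHeap(int[] arr, int heapSize) {
        // Bottom-level calls are trivial, so start halfway up.
        for (int i = heapSize / 2; i > 0; i--)
            heapify(arr, heapSize, i);
    }

    static void heapify(int[] arr, int heapSize, int i) {
        int smallest = i;
        if (i*2 <= heapSize && arr[i*2] < arr[smallest]) smallest = i*2;
        if (i*2 + 1 <= heapSize && arr[i*2+1] < arr[smallest]) smallest = i*2 + 1;
        if (smallest != i) {
            int temp = arr[i]; arr[i] = arr[smallest]; arr[smallest] = temp;
            heapify(arr, heapSize, smallest);
        }
    }

    public static void main(String[] args) {
        int[] a = {Integer.MIN_VALUE, 9, 8, 7, 6, 5, 3, 0};  // slot 0 unused
        buildHeap(a, 7);
        // a[1..7] now satisfies the heap property, with a[1] == 0.
        System.out.println(Arrays.toString(a));
    }
}
```

Each element moves down at most the height of its subtree, which is
exactly the (n/2)*0 + (n/4)*1 + ... sum analyzed above.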