Quicksort
---------

Today we'll look at sorting and quicksort.  We'll discuss
randomization as an algorithmic technique and average-case
complexity.

------------------------------------------------------------------

Suppose we want to sort n numbers
 -- Say, in a vector.
 -- Put them in increasing order.

We know this can be done in O(n log n) time using mergesort or
heapsort.  These programs are pretty complicated, though.  You
probably also know some simpler code which is O(n^2) (bubblesort,
say).

------------------------------------------------------------------

FACT: *Every* comparison-based sorting algorithm must make at least
about n log n comparisons on inputs of length n in the worst case.

That is because the algorithm must distinguish between the
n! ~ 2^(n log n) possible input permutations, and a binary-branching
decision tree generated by a comparison-based algorithm must have
depth at least log2(n!) ~ n log n to have that many leaves.

------------------------------------------------------------------

Here's one of the fastest sorting algorithms around:  QUICKSORT

* But its worst case is O(n^2) -- so why is it called quick?
* It is O(n log n) *expected* time.
* It's got a very small constant.

Note that big-O notation hides constant factors, and a worst-case
bound says nothing about the expected case.

Idea:  Say we want to sort the subrange of vector v between p and r,
inclusive.  Pick the first element of the subrange (i.e., v[p]) and
call it the pivot.  Move all elements in the subrange that are <= the
pivot to the beginning of the subrange, say between p and q.  Move
all elements >= the pivot to the end of the subrange, say between q+1
and r.  Recursively sort the subranges between p and q and between
q+1 and r.

-------------------------------------------------------------

; sort the subrange of vector v from p to r, inclusive
==> (define (qsort! v p r)
      (when (< p r)
        (let ((q (partition! v p r)))
          (qsort! v p q)
          (qsort! v (+ 1 q) r))))

; move small elements to the beginning of the interval,
; large elements to the end of the interval;
; return the max index of the lower subinterval
==> (define (partition! v i j)
      (letrec ((pivot (vector-ref v i))
               ;; scan down from k for an element <= pivot
               (count-down
                (lambda (k)
                  (if (<= (vector-ref v k) pivot)
                      k
                      (count-down (- k 1)))))
               ;; scan up from k for an element >= pivot
               (count-up
                (lambda (k)
                  (if (>= (vector-ref v k) pivot)
                      k
                      (count-up (+ k 1)))))
               (iter
                (lambda (i j)
                  (cond ((< i j)
                         (swap! v i j)
                         (iter (count-up (+ i 1)) (count-down (- j 1))))
                        ;; pointers have crossed: j bounds the lower piece
                        (else j)))))
        (iter (count-up i) (count-down j))))

; swap two elements of a vector in place
==> (define (swap! v i j)
      (let ((temp (vector-ref v i)))
        (vector-set! v i (vector-ref v j))
        (vector-set! v j temp)))

------------------------------------------------------------------

Example.  Suppose we're trying to sort

    v = 3 5 4 7 0 8 2 1 9 6
        ^                 ^
        p                 r

We pick v[p]=3 as the pivot.  Now we start at i=p (respectively, j=r)
and count up (respectively, down) until we find an element that is at
least as big (respectively, small) as the pivot.

        3 5 4 7 0 8 2 1 9 6
        ^             ^
        i             j

Now we swap those two elements in place.

        1 5 4 7 0 8 2 3 9 6
        ^             ^
        i             j

Now we increment i and decrement j

        1 5 4 7 0 8 2 3 9 6
          ^         ^
          i         j

and repeat.  Find the first element starting at i and moving right
that is >= the pivot and the first element starting at j and moving
left that is <= the pivot

        1 5 4 7 0 8 2 3 9 6
          ^         ^
          i         j

and exchange them.

        1 2 4 7 0 8 5 3 9 6
          ^         ^
          i         j

Repeat.

        1 2 4 7 0 8 5 3 9 6
            ^   ^
            i   j

        1 2 0 7 4 8 5 3 9 6
            ^   ^
            i   j

        1 2 0 7 4 8 5 3 9 6
            ^ ^
            j i

We're done when j is to the left of i.  Everything from j left is <=
the pivot, and everything from j+1 right is >= the pivot.

        1 2 0 7 4 8 5 3 9 6
            ^ ^
            j j+1

We recursively sort these two subarrays.

        0 1 2 3 4 5 6 7 8 9
        ^   ^ ^           ^
        p   j j+1         r
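Here's a quick sanity check of this example at the REPL (a sketch,
not part of the original notes; it assumes the qsort!, partition!,
and swap! definitions above, with 0-indexed p=0 and r=9, and shows
the results as comments):

==> (define v (vector 3 5 4 7 0 8 2 1 9 6))
==> (partition! v 0 9)
2                       ; q = j = 2; v is now #(1 2 0 7 4 8 5 3 9 6)
==> (qsort! v 0 9)
==> v
#(0 1 2 3 4 5 6 7 8 9)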
------------------------------------------------------------------

partition! selects a pivot element v[p] from v[p]...v[r], then swaps
the elements of v around until:

    Every element of v[p]...v[q]   is <= the pivot
    Every element of v[q+1]...v[r] is >= the pivot

An invariant of the loop is that everything strictly to the right of
j is >= the pivot, and everything strictly to the left of i is <= the
pivot.  You can use this to show that i and j can never point outside
the interval.

First, the running time of partition! is O(r-p): it examines each
element only a constant number of times.

qsort! calls partition! recursively until it's down to sorting
1-element subvectors.  The running time depends on how balanced the
partitions are.

BEST CASE:  If they're balanced, we cut the array in half on each
iteration.

    T(n) = O(n) + 2T(n/2) = O(n log n)

If they're unbalanced, though, it can get bad.

WORST CASE:  Suppose the pivot is always the smallest element.  Then
the partition puts 1 thing on the left and n-1 on the right.  This
happens, for example, when trying to sort [1,2,3,4,5].

    T(n) = T(n-1) + O(n)    (O(n) for the partition, plus the
    T(1) = O(1)              trivial 1-element piece)

    Time = O(n^2)

We have recursive calls:

    n --> n-1 --> n-2 --> ... --> 2 --> 1
    |     |       |       |       |
    |     |       |       |       |
    V     V       V       V       V
    1     1       1       1       1

That is bad.

------------------------------------------------------------------

Those are the extreme cases.  What's the "average case"?

* Problem -- it's very hard to tell what an average case is in
  practice.

Suppose that the partition produces a 9:1 split -- 90% in one half,
10% in the other.  That sounds pretty unbalanced, but actually it's
still O(n log n).

    T(n) = T(0.9 n) + T(0.1 n) + O(n) = O(n log n)

(details later).  As long as we split off a constant *FRACTION* of
the input, we are OK.

So, this lets us say something.  Suppose the "average case" means a
completely random permutation of the input:

* Any ordering is equally likely.
* This may NOT be reasonable!  Let's assume it for now anyway...

Quicksort will occasionally have bad partitionings at some stages,
but it's very unlikely to have enough of them to matter.

It can be shown that, if we assume the input is random and uniformly
distributed (all permutations equally likely), the probability that
the partition is *better* than a : (1-a) --- for 0 < a <= 1/2 ---
is 1-2a.

For example, if we want a 9:1 or better split, then we compute:

    * a = 0.1
    * prob = 1 - 2(0.1) = 80%

So, say, 4 splits out of every 5 are 9:1 or better.  Even if the
other splits are utterly useless, the subproblem sizes *still* decay
exponentially, and we *still* get O(n log n).

For intuition, let's pretend that good and bad splits alternate in
the recursion tree, with good splits at their best (n/2) and bad ones
at their worst (1 : n-1):

              n
             / \
            /   \
           1    n-1
                / \
               /   \
             n/2   n/2

* So every two levels, the array's been cut in half.
* That's like splitting it by a factor of sqrt(1/2) at every level
  (on the average).
* Which means it's still exponential reduction -- O(n log n).
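Before the formal argument, here's a quick numerical check of the 9:1
recurrence above (a sketch, not part of the original notes; the name
T9 and the base case T(1) = 1 are my choices):

; T(n) = T(9n/10) + T(n/10) + n, with T(1) = 1
==> (define (T9 n)
      (if (<= n 1)
          1
          (+ n
             (T9 (floor (* 9/10 n)))
             (T9 (floor (* 1/10 n))))))

; compare against n log2 n:
==> (exact->inexact (/ (T9 10000) (* 10000 (/ (log 10000) (log 2)))))
; => a small constant; the ratio stays bounded as n grows,
;    consistent with T(n) = O(n log n)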
-----------------------------------------------------------------

This isn't even vaguely rigorous, but with a bit of probability you
can make it so.  Here's the formal argument, included for the sake of
completeness, although I did not go through the details in class.
I've included it just to illustrate some of the techniques of
probabilistic analysis.

THEOREM.  The *expected* running time of quicksort, assuming the
elements of the input vector are distinct and all permutations are
equally likely, is O(n log n).

PROOF.  The *expected* running time on inputs of length n is the
running time averaged over all inputs of length n.  Let's denote this
by E(T(n)) (read: expected value of T(n)).

To get the expected running time, we compute, for each input of
length n,

    - the time required on that input, times
    - the probability that that input occurs,

and then we add all these up.  (That's the definition of average.)

Observe that if all the elements of the input vector are distinct,
then the running time does not depend on the actual values, but only
on their order.  Thus there are essentially n! different inputs, one
for each permutation, and each one is equally likely.  Let's assume
then for simplicity that p=1, r=n, and the numbers we are sorting are
1,2,3,...,n.

There are a couple of shortcuts we can take in computing the expected
running time.  First, for an input of length n, there are n possible
choices of pivot, and each one is equally likely.  If the pivot is 1,
the index returned by partition! is 1, and the breakup for the
recursive calls is 1:n-1.  If the pivot is m+1, the index returned by
partition! is m, and the breakup is m:n-m.  (Walk through the
algorithm to see why this is so.)

If we know the breakup is m:n-m, then the algorithm takes time cn for
some constant c, plus the time for the recursive calls, one on a
vector of length m, the other on a vector of length n-m.

Now here is a trick: if we somehow knew that the pivot would be 1,
then the expected running time would be E(cn+T(1)+T(n-1)).  If we
somehow knew that the pivot would be m+1, then the expected running
time would be E(cn+T(m)+T(n-m)).  We don't know what the pivot will
be, but we do know the probabilities of each possible pivot, so we
can average these expected running times over all choices of pivot.
This is done by summing the expected running time ***given that the
pivot is m+1*** times the probability that the pivot is m+1.  In
other words,

                                      n-1
    E(T(n)) = E(cn+T(1)+T(n-1))/n  + sum E(cn+T(m)+T(n-m))/n
                                     m=1

Each term is divided by n because 1/n is the probability that any
given number between 1 and n is the pivot.

Now it turns out that the expectation operator E( ) is linear.  This
means we can write

    E(T(n)) = cn/n + E(T(1))/n + E(T(n-1))/n

               n-1        n-1             n-1
            + sum cn/n + sum E(T(m))/n + sum E(T(n-m))/n
              m=1        m=1             m=1

                                       n-1             n-1
            = cn + c' + E(T(n-1))/n + sum E(T(m))/n + sum E(T(n-m))/n
                                      m=1             m=1

where c' is a fixed constant.  By renumbering the last sum,

                                       n-1             n-1
    E(T(n)) = cn + c' + E(T(n-1))/n + sum E(T(m))/n + sum E(T(m))/n
                                      m=1             m=1

                                         n-1
            = cn + c' + E(T(n-1))/n + 2 sum E(T(m))/n
                                        m=1

Now we can solve this recurrence directly for E(T(n)).  We will get
E(T(n)) <= dn log n for some constant d; we get to choose d as big as
we like to make this work.  (Logarithms here are natural logarithms;
the base only changes the constants.)

We can prove that dn log n is a solution to the recurrence by
induction on n.  For the basis, take the finitely many n with
2 <= n < 16 (we'll see below where 16 comes from); each E(T(n)) is a
fixed number, so surely d can be chosen big enough that
E(T(n)) <= dn log n for all of them.

Now suppose n >= 16 and E(T(m)) <= dm log m for all m < n.  We have

                                         n-1
    E(T(n)) = cn + c' + E(T(n-1))/n + 2 sum E(T(m))/n
                                        m=1

                                               n-1
           <= cn + c' + d(n-1)log(n-1)/n + 2d(sum m log m)/n
                                              m=1

by the induction hypothesis (the m=1 term of the sum contributes only
a constant, which we fold into c'),

                                       n-1
           <= cn + c' + d log n + 2d (sum m log m)/n
                                      m=1

(since (n-1)log(n-1) <= n log n)

                                        n
           <= cn + c' + d log n + 2d (Integral m log m dm)/n
                                        1

estimating the sum by a definite integral -- draw the graph!  Now
integrate by parts (Integral m log m dm = m^2 log m/2 - m^2/4):

            = cn + c' + d log n + 2d(n^2 log n/2 - n^2/4 + 1/4)/n

            = cn + c' + d log n + dn log n - dn/2 + d/(2n)

and we must show this is <= dn log n.  We still get to choose d as
big as we like (although d must be independent of n).

Let's assert this last expression is > dn log n and show that we can
pick d big enough to get a contradiction.

    cn + c' + d log n + dn log n - dn/2 + d/(2n) > dn log n

Subtract dn log n from both sides and add dn/2:

    cn + c' + d log n + d/(2n) > dn/2

Divide by d:

    cn/d + c'/d + log n + 1/(2n) > n/2

By picking d >= 8c and d >= 8c', and using 1/(2n) <= 1/4, we get

    n/8 + 1/8 + log n + 1/4 > n/2

    log n > 3(n-1)/8

which is false for all n >= 16 (at n=16 the right-hand side is
already 45/8 = 5.625 > log 16, and it grows faster than log n).  This
contradiction establishes the induction step, so E(T(n)) <= dn log n
for all n >= 2, and the expected running time is O(n log n).  QED.
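We can also check the recurrence numerically (again a sketch, not
part of the original notes; the name expected-cost and the choices
c = 1, c' = 0, and E(T(1)) = 0 are mine):

; tabulate E(n) = cn + c' + E(n-1)/n + (2/n)(E(1)+...+E(n-1))
; bottom-up, carrying prev = E(k-1) and s = E(1)+...+E(k-1)
==> (define (expected-cost n)          ; n >= 2
      (let loop ((k 2) (prev 0.0) (s 0.0))
        (let ((e (+ k (/ prev k) (/ (* 2 s) k))))
          (if (= k n) e (loop (+ k 1) e (+ s e))))))

==> (/ (expected-cost 1000) (* 1000 (/ (log 1000) (log 2))))
; => a small constant, as the theorem predicts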
------------------------------------------------------------------

OK, how reasonable is it to presume that the input is random?

* Not very.

Notice that the worst case is SORTED input:

* Choosing v[p] as the pivot -- if the array is sorted -- guarantees
  a 1:n-1 split.

Almost-sorted input is almost as bad.  And a very frequent
application of sorting is this:

* Take a huge mailing list (already sorted).
* Add some names to it (unsorted).
* Sort the whole thing.
* Now, there *are* better algorithms (sort the new stuff, then merge
  it with the old stuff),
* but you can't expect bulk-mailers to be very smart...

Anyway, there are a lot of applications that require sorting
almost-sorted input.

------------------------------------------------------------------

So we *want* random input, but we don't *have* it.

Answer:  Make it random.

We call an algorithm RANDOMIZED if its behavior is determined by

 (1) the input, and
 (2) a random number source.

(random n) returns an integer between 0 and n-1,
 - uniformly distributed,
 - pretty bad as a random source, actually.

One simple solution: randomly scramble the input.

* This will take us from whatever case we're in to the average case.
* So it's O(n log n).

;; (scramble! v) permutes the elements of v randomly.

; scramble a vector
==> (define (scramble! v)
      (letrec ((scram!
                (lambda (i)
                  (cond ((< i (vector-length v))
                         (swap! v i (random (+ 1 i)))
                         (scram! (+ 1 i)))))))
        (scram! 0)))

(Idea -- put each element in a random place, making sure we hit each
element at least once.  For those who are counting, we're rolling a
one-sided die, then a two-sided die, then a three-sided die, ...
That's a total of 1 * 2 * ... * n = n! equally likely outcomes, which
is exactly the number of permutations.)

If we put a call to scramble! in qsort, we ensure that our
average-case analysis is correct (a concrete wrapper appears at the
end of these notes):

* We will occasionally get worst-case behavior,
* but very, very rarely -- and MOST IMPORTANTLY, it won't depend on
  the input, and in practice it's particular inputs (like
  almost-sorted ones) that elicit the bad running time.

Randomization is a powerful tool for algorithms:

* It lets you get expected cases which are simple enough to
  understand.
* Often it's hard to give a deterministic algorithm for something,
* but randomization sometimes makes it much easier.
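To make that concrete, here's the wrapper (a sketch; the name
random-qsort! is mine, and it assumes the scramble!, qsort!, and
swap! definitions above):

; randomized quicksort: scramble first, so the expected O(n log n)
; analysis applies to *every* input -- even sorted ones
==> (define (random-qsort! v)
      (scramble! v)
      (qsort! v 0 (- (vector-length v) 1)))

==> (define u (vector 1 2 3 4 5 6 7 8 9))   ; sorted: the old worst case
==> (random-qsort! u)
==> u
#(1 2 3 4 5 6 7 8 9)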