Let's take a look at a useful algorithm in more detail and show that it is not only correct but that its worst-case performance is O(n lg n). The algorithm we'll look at is merge sort, a recursive algorithm for sorting a list of items. Merge sort is an example of a divide-and-conquer algorithm. It sorts a list by dividing it into two smaller sublists, recursively sorting the sublists, and then merging the two sorted lists together to produce the final result. Merging two lists is pretty simple if they themselves are already sorted. To prove the correctness and run time of merge sort we will want a stronger proof technique: strong induction.
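Strong induction has the same 5 steps as ordinary induction, but the induction hypothesis is a little different: instead of assuming only that P(n) holds, we assume that P(k) holds for every k ≤ n, and then prove P(n+1).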
It is often easier to prove asymptotic complexity bounds using strong induction than it is using ordinary induction, because you have a stronger induction hypothesis to work with when trying to prove P(n+1).
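Written as inference rules, the two principles differ only in what may be assumed during the inductive step. This is the standard textbook formulation, with P standing for an arbitrary property of the natural numbers:

```latex
\[
\text{Ordinary induction:}\quad
\frac{P(0) \qquad \forall n.\; P(n) \Rightarrow P(n+1)}
     {\forall n.\; P(n)}
\]
\[
\text{Strong induction:}\quad
\frac{P(0) \qquad \forall n.\; \bigl(\forall k \le n.\; P(k)\bigr) \Rightarrow P(n+1)}
     {\forall n.\; P(n)}
\]
```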
(* split(xs) is a pair (ys,zs) where half (rounding up) of the elements
 * of xs are found in ys and the rest are in zs. *)
fun split (xs: int list): int list * int list =
  let
    fun loop(xs: int list, left: int list, right: int list): int list * int list =
      case xs of
        nil => (left, right)
      | x::nil => (x::left, right)
      | x::y::rest => loop(rest, x::left, y::right)
  in
    loop(xs, [], [])
  end

(* A simpler way to write split. Recall the definition of foldl. What is
 * the asymptotic performance of foldl f lst0 lst where f is an O(1)
 * function and lst is an n-element list? O(n). *)
fun split2 (xs: int list): int list * int list =
  foldl (fn (x, (left, right)) => (x::right, left)) ([], []) xs

(* merge(left,right) is a sorted list (in ascending order)
 * containing all the elements of left and right.
 * Requires: left and right are sorted lists *)
fun merge (left: int list, right: int list): int list =
  case (left, right) of
    (nil, _) => right
  | (_, nil) => left
  | (x::left_tail, y::right_tail) =>
      (if x > y then y::(merge(left, right_tail))
       else x::(merge(left_tail, right)))
How do we know that merge works? By induction on the sum of the lengths of the two input lists (i.e., length(left)+length(right)). Clearly if that total length is zero, the function works because the first case is used and it is trivially correct. What about the general case? We are trying to show that merge works on lists left and right whose total length is n+1, and we are allowed to assume that it works on lists whose total length is n or less. If one of the two lists is empty, the function works because one of the first two cases is used. What if both lists are non-empty? By the precondition (requires clause) we know that x is the smallest element of left and y the smallest element of right, and that left_tail and right_tail are sorted lists. Our inductive hypothesis lets us assume that merge works correctly in the recursive calls because the total length of the two lists passed to each call is n, one less than the total length of left and right, and the precondition of merge in the recursive calls is satisfied (it is being applied to sorted lists). If the then branch executes, y is smaller than x and hence no larger than any element in either list; therefore, y::(merge(left, right_tail)) is a sorted list. Conversely, in the else branch x is smaller than or equal to any element in either list; therefore x::(merge(left_tail, right)) is also a sorted list. And we can see that merge doesn't "lose" any elements of left or right, assuming that the recursive calls don't either.
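The two halves of this argument (the result is sorted, and no elements are lost) can be spot-checked on a concrete input. The following is only a sketch: merge is repeated so the block stands alone, and isSorted, count and samePieces are hypothetical helpers introduced here for illustration.

```sml
(* merge as specified above, repeated so the sketch is self-contained *)
fun merge (left: int list, right: int list): int list =
  case (left, right) of
    (nil, _) => right
  | (_, nil) => left
  | (x::left_tail, y::right_tail) =>
      if x > y then y :: merge (left, right_tail)
      else x :: merge (left_tail, right)

(* isSorted xs is true iff xs is in ascending (nondescending) order.
 * Hypothetical helper, not part of the original code. *)
fun isSorted (x::y::rest) = x <= y andalso isSorted (y::rest)
  | isSorted _ = true

(* count (v, xs) is the number of occurrences of v in xs. *)
fun count (v: int, xs: int list): int =
  List.length (List.filter (fn x => x = v) xs)

(* samePieces (l, r, m) is true iff m has as many elements as l and r
 * together and each element of m occurs in m exactly as often as it
 * occurs in l and r combined, i.e. nothing was lost or added. *)
fun samePieces (l: int list, r: int list, m: int list): bool =
  List.length m = List.length l + List.length r andalso
  List.all (fn v => count (v, m) = count (v, l) + count (v, r)) m

val l = [1, 4, 4, 9]
val r = [2, 4, 7]
val m = merge (l, r)                               (* [1, 2, 4, 4, 4, 7, 9] *)
val ok = isSorted m andalso samePieces (l, r, m)   (* true *)
```

A check like this covers only one input, of course; the induction argument is what covers all of them.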
Now we can write the merge-sort function itself. Note how we explicitly separate the specification of the function from the description of the algorithm that implements it. With merge and split specified as above, we don't really need even this much description of how merge_sort works.
(* merge_sort(xs) is a list containing the same elements as xs but in
 * ascending (nondescending) sorted order.
 *
 * Implementation: lists of size 0 or 1 are already sorted. Otherwise,
 * split the list into two lists of (nearly) equal size, recursively sort
 * them, and then merge the two lists back together. *)
fun merge_sort (xs: int list) : int list =
  case xs of
    [] => []
  | [x] => [x]
  | _ => let
           val (left, right) = split xs
         in
           merge (merge_sort(left), merge_sort(right))
         end
Again, we can see by (strong) induction on the length of the input list that this function works. For lists of length 0 or 1 it clearly works. For larger lists we observe from the specification for split that both left and right must contain some elements, so each is strictly shorter than xs, and together they contain all the elements of xs. By the inductive hypothesis, merge_sort applied to each of these lists results in sorted lists. From the specification for merge the result must be a sorted list containing all the elements of xs. Therefore the merge_sort function works correctly.
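To see the pieces working together, here is a small self-contained sketch. It uses the fold-based version of split and repeats merge and merge_sort so that it can be loaded on its own; the sample inputs are arbitrary.

```sml
(* split, merge and merge_sort as specified above *)
fun split (xs: int list): int list * int list =
  foldl (fn (x, (left, right)) => (x::right, left)) ([], []) xs

fun merge (left: int list, right: int list): int list =
  case (left, right) of
    (nil, _) => right
  | (_, nil) => left
  | (x::left_tail, y::right_tail) =>
      if x > y then y :: merge (left, right_tail)
      else x :: merge (left_tail, right)

fun merge_sort (xs: int list): int list =
  case xs of
    [] => []
  | [x] => [x]
  | _ => let
           val (left, right) = split xs
         in
           merge (merge_sort left, merge_sort right)
         end

(* split puts half of the elements, rounding up, in the first list *)
val (l, r) = split [1, 2, 3, 4, 5]          (* ([5, 3, 1], [4, 2]) *)

val sorted = merge_sort [5, 2, 9, 1, 5, 6]  (* [1, 2, 5, 5, 6, 9] *)
val empty = merge_sort []                   (* [] *)
```

Note that split does not preserve the order of the elements within each half. That is harmless here because both halves are sorted before they are merged.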
Now let's show that merge_sort is not only a correct but also an efficient algorithm for sorting lists of numbers. We start by observing without proof that the performance of the split function is linear in the size of the input list. This can be shown by the same approach we will take for merge, so let's just look at merge instead.
The merge function is linear-time too, that is, O(n) in the total length of the two input lists. We will first find a recurrence for the execution time. Suppose the total length of the input lists is zero or one. Then the function must execute one of the two O(1) arms of the case expression. These take at most some time c0 to execute. So we have
T(0) = c0
T(1) = c0
Now, consider lists of total length n. The recursive call is on lists of total length n-1, so we have
T(n) = T(n-1) + c1
where c1 is a constant upper bound on the time required to execute the if expression and the operator :: (which takes constant time for usual implementations of lists). This gives us a recurrence to solve for T. We can apply the iterative method to solve the recurrence by expanding it out for the first few steps.
T(0) = c0
T(1) = c0
T(2) = T(1) + c1 = c0 + c1
T(3) = T(2) + c1 = c0 + 2c1
T(4) = T(3) + c1 = c0 + 3c1
...
T(n) = T(n−1) + c1 = c0 + (n−1)c1 = (c0 − c1) + c1n
We notice a pattern which the last line captures. Recall that T(n) is O(n) if there exist constants k and n0 such that T(n) ≤ kn for all n greater than n0. For n at least 1, this is easily satisfied by setting k = c0 + c1, since c0 + (n−1)c1 ≤ c0n + c1n. Or we can just remember that any first-degree polynomial is O(n) and also Θ(n). An even simpler way to find the right bound is to observe that the choice of constants c0 and c1 doesn't matter; if we plug in 1 for both of them we get T(1) = 1, T(2)=2, T(3)=3, etc., which is clearly O(n).
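The recurrence can also be observed directly by counting recursive calls. The sketch below uses merge_count, a hypothetical instrumented variant of merge introduced only for this illustration; it returns the merged list together with the number of recursive calls it made.

```sml
(* merge_count (left, right) is (m, k) where m is the merged list and k
 * is the number of recursive calls made while producing it.
 * Requires: left and right are sorted lists.
 * Hypothetical instrumented variant of merge, for illustration only. *)
fun merge_count (left: int list, right: int list): int list * int =
  case (left, right) of
    (nil, _) => (right, 0)
  | (_, nil) => (left, 0)
  | (x::left_tail, y::right_tail) =>
      if x > y then
        let val (m, k) = merge_count (left, right_tail)
        in (y::m, k + 1) end
      else
        let val (m, k) = merge_count (left_tail, right)
        in (x::m, k + 1) end

(* Perfectly interleaved inputs are the worst case: with total length
 * n = 8, every element but the last requires a recursive call. *)
val (merged, steps) = merge_count ([1, 3, 5, 7], [2, 4, 6, 8])
(* steps is 7, i.e. n - 1 *)
```

The count of n − 1 calls matches the (n−1)c1 term in the expansion; inputs where one list runs out early make fewer calls, so n − 1 is the worst case.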
Now let's consider the merge_sort function itself. Again, for zero- and one-element lists we compute in constant time. For n-element lists we make two recursive calls, but to sublists that are about half the size, and calls to split and merge that each take Θ(n) time. For simplicity we'll pretend that the sublists are exactly half the size. The recurrence we obtain has this form:
T(0) = c0
T(1) = c0
T(n) = 2 T(n/2) + c1n + c2n + c3
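Let's use the iterative method to figure out the running time of merge_sort. For n ≥ 1 the non-recursive work c1n + c2n + c3 is at most c4n, where c4 = c1 + c2 + c3, so we can simplify the last equation to T(n) = 2 T(n/2) + c4n. We know that any solution must work for arbitrary constants c0 and c4, so again we replace them both with 1 to keep things simple. That leaves us with the following recurrence equations to work with: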
T(1) = 1
T(n) = 2 T(n/2) + n
We expand the recurrence step by step until we notice a pattern:
T(n) = 2T(n/2) + n
= 2(2T(n/4) + n/2) + n
= 4T(n/4) + n + n
= 4(2T(n/8) + n/4) + n + n
= 8T(n/8) + n + n + n
= nT(n/n) + n + ... + n + n + n
= n + n + ... + n + n + n
Counting the number of repetitions of n in the sum at the end, we see that there are lg n + 1 of them. Thus the running time is n(lg n + 1) = n lg n + n. We observe that n lg n + n < n lg n + n lg n = 2n lg n for n > 2, so the running time is O(n lg n). Now that we've done the analysis using the iterative method, let's use induction to verify that the bound is correct. It will be convenient to use a slightly different version of the induction proof technique, known as strong or course-of-values induction.
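The pattern in the expansion can be written in closed form. As a sketch, assume n is a power of two (consistent with pretending that the sublists are exactly half the size); the index i, counting the number of expansions performed, is introduced here for illustration:

```latex
\begin{align*}
T(n) &= 2^{i}\,T\!\left(\frac{n}{2^{i}}\right) + i\,n
      && \text{after } i \text{ expansions} \\
     &= 2^{\lg n}\,T(1) + n \lg n
      && \text{taking } i = \lg n \\
     &= n + n \lg n
      && \text{since } T(1) = 1 .
\end{align*}
```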
Property of n to prove:
For all n ≥ 1, T(n) = n lg n + n. (Taking n0 = 2, this gives T(n) < 2n lg n for all n > n0.)
Proof by strong (course-of-values) induction on n
Base case: n = 1
T(1) = 1 = 1 lg 1 + 1
Induction Step:
Induction Hypothesis:
T(k) = k lg k + k for all k ≤ n
Property to prove for n+1:
T(n+1) = (n+1) lg (n+1) + (n+1)
Proof:
T(n+1) = 2 T((n+1)/2) + (n+1)
= 2 ((n+1)/2 lg ((n+1)/2) + (n+1)/2) + (n+1) (by induction hypothesis, since (n+1)/2 ≤ n)
= (n+1)(lg ((n+1)/2)) + (n+1) + (n+1)
= (n+1)(lg(n+1) − 1) + 2(n+1)
= (n+1) lg(n+1) + (n+1)
Thus we have shown that merge sort is Θ(n lg n).
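The closed form can also be checked mechanically for small inputs. The sketch below evaluates the simplified recurrence T(1) = 1, T(n) = 2T(n/2) + n directly and compares it with n lg n + n. It assumes n is a power of two, so that halving is exact; the function names t, lg and closed are chosen here for illustration.

```sml
(* t n evaluates the simplified recurrence T(1) = 1, T(n) = 2 T(n/2) + n.
 * Assumes n is a power of two so that n div 2 is exact. *)
fun t (n: int): int = if n <= 1 then 1 else 2 * t (n div 2) + n

(* lg n is the base-2 logarithm of n, for n a power of two. *)
fun lg (n: int): int = if n <= 1 then 0 else 1 + lg (n div 2)

(* closed n is the conjectured solution n lg n + n. *)
fun closed (n: int): int = n * lg n + n

val powers = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

(* true: the recurrence and the closed form agree on every listed n *)
val agrees = List.all (fn n => t n = closed n) powers
```

Agreement on finitely many values is not a proof, but it is a cheap way to catch an arithmetic slip before attempting the induction.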
Here is another way to compute the asymptotic complexity: guess the answer (in this case, O(n lg n)), and plug it directly into the recurrence. By looking at what happens we can see whether the guess was correct or whether it needs to be increased to a higher order of growth (or can be decreased to a lower order). This works as long as the recurrence equations are monotonic in n, which is usually the case. By monotonic, we mean that increasing n does not cause the right-hand side of any recurrence equation to decrease.
For example, consider our recurrence for merge sort. To show T(n) is O(n lg n), we need to show that T(n) ≤ kn lg n for large n and some choice of k. Define F(n) = n lg n, so we are trying to show that T(n) ≤ kF(n). This turns out to be true if we can plug kF(n) into the recurrence for T(n) and show that the recurrence equations hold as "≥" inequalities. Here, we plug the expression kn lg n into the merge-sort recurrence T(n) = 2T(n/2) + c4n, where c4n bounds the linear-time work done by split and merge:
kn lg n ≥ 2k(n/2) lg (n/2) + c4n
= kn lg (n/2) + c4n
= kn (lg n −1) + c4n
= kn lg n − kn + c4n
= kn lg n + (c4−k)n
Can we pick a k that makes this inequality come true for sufficiently large n? Certainly; it holds if k≥c4. Therefore this function is O(n lg n). In fact, we can make the two sides exactly equal by choosing k=c4, which tells us that it is Θ(n lg n) as well.
More generally, if we want to show that a recurrence solution is O(F(n)), we show that we can choose k so that for each recurrence equation with kF(n) substituted for T(n), LHS ≥ RHS for all sufficiently large n. If we want to show that a recurrence is Θ(F(n)), we need to show that there is also a k such that LHS ≤ RHS for all sufficiently large n. In the case above, it happens that we can choose the same k.
Why does this work? It's really another use of strong induction where the proposition to be proved is that T(n) ≤ kF(n) for all sufficiently large n. We ignore the base case because we can always choose a large enough k to make the inequality work for small n. Now we proceed to the inductive step. We want to show that T(n+1) ≤ kF(n+1) assuming that for all m≤n we have T(m) ≤ kF(m). We have
T(n+1) = 2T((n+1)/2) + c4(n+1) ≤ 2kF((n+1)/2) + c4(n+1) ≤ kF(n+1)
so by transitivity T(n+1) ≤ kF(n+1). The middle inequality follows from the induction hypothesis T((n+1)/2) ≤ kF((n+1)/2) and from the monotonicity of the recurrence equation. The last step is what we showed by plugging kF(n) into the recurrence and checking that it holds for any sufficiently large n.
To see another example, we know that any function that is O(n lg n) is also O(n^2) though not Θ(n^2). If we hadn't done the iterative analysis above, we could still verify that merge sort is at least as good as insertion sort (asymptotically) by plugging kn^2 into the recurrence and showing that the inequality holds for it as well:
kn^2 ≥ 2k(n/2)^2 + c4n
= ½kn^2 + c4n
For sufficiently large n, this inequality holds for any k. Therefore, the algorithm is O(n^2). Because it holds for any k, the algorithm is in fact o(n^2). Thus, we can use recurrences to show upper bounds that are not tight as well as upper bounds that are tight.
On the other hand, suppose we had tried to plug in kn instead of kn^2. Then we'd have:
kn ≥? 2k(n/2) + c4n
= kn + c4n
Because c4 is positive, the inequality doesn't hold for any k; therefore, the algorithm is not O(n). In fact, we see that the inequality always holds in the opposite direction (<); therefore kn is a strict lower bound on the running time of the algorithm; its running time is more than linear.
Thus, reasonable guesses about the complexity of an algorithm can be plugged into a recurrence and used not only to find the complexity, but also to obtain information about its solution.
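The three guesses can also be compared numerically. The sketch below evaluates the simplified merge-sort recurrence (with both constants set to 1, and n a power of two) and divides it by each candidate bound; the function names are chosen here for illustration.

```sml
(* t n evaluates T(1) = 1, T(n) = 2 T(n/2) + n for n a power of two. *)
fun t (n: int): int = if n <= 1 then 1 else 2 * t (n div 2) + n

(* lg n is the base-2 logarithm of n, for n a power of two. *)
fun lg (n: int): int = if n <= 1 then 0 else 1 + lg (n div 2)

(* ratios n is (T(n)/n, T(n)/(n lg n), T(n)/n^2).
 * Requires: n >= 2, so that lg n is nonzero. *)
fun ratios (n: int): real * real * real =
  let
    val tn = real (t n)
    val x = real n
  in
    (tn / x, tn / (x * real (lg n)), tn / (x * x))
  end

val (lin16, nlg16, sq16) = ratios 16           (* (5.0, 1.25, 0.3125) *)
val (lin1024, nlg1024, sq1024) = ratios 1024   (* first component is 11.0 *)
```

As n grows from 16 to 1024, the ratio to n keeps growing (kn is too small), the ratio to n lg n stays bounded (kn lg n is tight), and the ratio to n^2 shrinks toward zero (kn^2 is an upper bound that is not tight).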
The following function sorts the first two-thirds of a list, then the second two-thirds, then the first two-thirds again:
fun sort3(a: int list): int list =
  case a of
    nil => nil
  | [x] => [x]
  | [x,y] => [Int.min(x,y), Int.max(x,y)]
  | a =>
      let
        val n = List.length(a)
        val m = (2*n+2) div 3
        val res1 = sort3(List.take(a, m))
        val res2 = sort3(List.drop(res1, n-m) @ List.drop(a, m))
        val res3 = sort3(List.take(res1, n-m) @ List.take(res2, 2*m-n))
      in
        res3 @ List.drop(res2, 2*m-n)
      end
Perhaps surprisingly, this algorithm actually does sort the list. We will leave the proof that it sorts correctly as an exercise to the reader. Its run time, on the other hand, we can derive from its recurrence. The routine does some O(n) work and then makes three recursive calls on lists of length 2n/3. Therefore its recurrence is:
T(n) = cn + 3T(2n/3)
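Before analyzing the recurrence, it may help to run sort3 on a small input and confirm that it sorts. The block below repeats the definition so that it is self-contained; the sample list is arbitrary.

```sml
(* sort3 as defined above: sort the first two-thirds, then the last
 * two-thirds, then the first two-thirds again. *)
fun sort3 (a: int list): int list =
  case a of
    nil => nil
  | [x] => [x]
  | [x, y] => [Int.min (x, y), Int.max (x, y)]
  | a =>
      let
        val n = List.length a
        (* m is 2n/3 rounded up, the length of each "two-thirds" piece *)
        val m = (2 * n + 2) div 3
        val res1 = sort3 (List.take (a, m))
        val res2 = sort3 (List.drop (res1, n - m) @ List.drop (a, m))
        val res3 = sort3 (List.take (res1, n - m) @ List.take (res2, 2 * m - n))
      in
        res3 @ List.drop (res2, 2 * m - n)
      end

val small = sort3 [3, 1, 2]                 (* [1, 2, 3] *)
val sorted = sort3 [5, 2, 9, 1, 5, 6, 0]    (* [0, 1, 2, 5, 5, 6, 9] *)
```

Each non-trivial call makes exactly three recursive calls on lists of length m, which is where the 3T(2n/3) term comes from.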
Let's try plugging in possible solutions. How about F(n) = n lg n? Substituting into the right side we have
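cn + 3kF(2n/3)
= cn + 3k(2n/3) lg (2n/3)
= cn + 2kn lg n + 2kn lg (2/3)
= cn + 2kn lg n − 2kn lg (3/2)
There is no way to choose k to make the left side (kn lg n) larger: the right side exceeds it by kn lg n + (c − 2k lg (3/2))n, which is positive for all sufficiently large n whatever the value of k. So the algorithm is not O(n lg n); we must try a higher order of growth.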
By plugging in kn^2 and kn^3 for T(n), we find that kn^2 grows strictly more slowly than T(n) and kn^3 grows strictly more quickly. We can solve for the correct exponent x by plugging in kn^x:
cn + 3T(2n/3)
= cn + 3k(2/3)^x n^x
This will be asymptotically less than kn^x as long as 3(2/3)^x < 1, which requires x > log_{3/2} 3 ≈ 2.7095. Define this as a = log_{3/2} 3. Then we can see that the algorithm is O(n^(a+ε)) for any positive ε. Let's try O(n^a) itself. The RHS after substituting kn^a is cn + 3(2/3)^a kn^a = cn + kn^a ≥ kn^a. This tells us that kn^a is an asymptotic lower bound on T(n): T(n) is Ω(n^a). So the complexity is somewhere between Ω(n^a) and O(n^(a+ε)). It is in fact Θ(n^a).
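The critical exponent is easy to check numerically. In this sketch, factor is a name introduced here for the multiplier 3(2/3)^x that appears on kn^x after substitution; the guess kn^x is large enough exactly when that multiplier drops below 1.

```sml
(* a is the exponent log_{3/2} 3, computed as ln 3 / ln (3/2). *)
val a = Math.ln 3.0 / Math.ln 1.5           (* approximately 2.7095 *)

(* factor x is 3 (2/3)^x, the multiplier on k n^x after substituting
 * k n^x into the recursive term 3 T(2n/3). *)
fun factor (x: real): real = 3.0 * Math.pow (2.0 / 3.0, x)

val below = factor 2.0    (* 4/3, greater than 1: kn^2 is too small *)
val above = factor 3.0    (* 8/9, less than 1: kn^3 is more than enough *)
val atA = factor a        (* approximately 1: the break-even exponent *)
```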
To show that the complexity is O(n^a), we need to use a refinement of the substitution method. Rather than trying F(n) = n^a, we will try F(n) = n^a + bn where b is a constant to be filled in later. The idea is to pick a b so that bn will compensate for the cn term that shows up in the recurrence. Because bn is O(n^a), showing T(n) is O(n^a + bn) is the same as showing that it is O(n^a). Substituting kF(n) for T(n) in the RHS of the recurrence, we obtain:
cn + 3kF(2n/3)
= cn + 3k((2n/3)^a + b(2n/3))
= cn + 3k(2n/3)^a + 3kb(2n/3)
= cn + kn^a + 2kbn
= kn^a + (2kb+c)n
The substituted LHS of the recurrence is kn^a + kbn, which is at least kn^a + (2kb+c)n as long as kb ≥ 2kb+c, or b ≤ −c/k. There is no requirement that b be positive, so choosing k=1, b=−c satisfies the recurrence. Therefore T(n) = O(n^a + bn) = O(n^a), and since T(n) is both O(n^a) and Ω(n^a), it is Θ(n^a).
It turns out that no comparison-based sorting algorithm can have worst-case running time asymptotically lower than n lg n; that is, any such algorithm takes Ω(n lg n) time. Thus, other than constant factors in running time, merge sort is as good an algorithm as we can expect for sorting general data. Its constant factors are also pretty good, so it's a useful algorithm in practice. We can see that Ω(n lg n) time is needed by thinking about sorting a list of n distinct numbers. There are n! = n×(n−1)×(n−2)×...×3×2×1 possible orderings of the list, and the sorting algorithm needs to map all of them to the same sorted list by applying an appropriate inverse permutation. For general data, the algorithm must make enough observations about the input list (by comparing list elements pairwise) to determine which of the n! permutations was given as input, so that the appropriate inverse permutation can be applied to sort the list. Each comparison of two elements to see which is greater generates at most one bit of information about which permutation was given; at least lg(n!) bits of information are needed. Therefore the algorithm must take Ω(lg(n!)) time. It can be seen easily that n! is O(n^n); note that lg n^n = n lg n. With a bit more difficulty a stronger result can be shown: lg(n!) is Θ(n lg n). Therefore merge sort is not only much faster than insertion sort on large lists, it is actually optimal to within a constant factor! This shows the value of designing algorithms carefully.
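The claim about lg(n!) can be justified with two elementary bounds. This is one standard argument, sketched here for completeness: the upper bound replaces every factor by n, and the lower bound keeps only the largest half of the factors.

```latex
\begin{align*}
\lg(n!) &= \sum_{i=1}^{n} \lg i \;\le\; \sum_{i=1}^{n} \lg n \;=\; n \lg n, \\
\lg(n!) &\ge \sum_{i=\lceil n/2 \rceil}^{n} \lg i
         \;\ge\; \frac{n}{2} \lg \frac{n}{2}
         \;=\; \frac{n}{2} \lg n - \frac{n}{2}.
\end{align*}
```

For n ≥ 4 the last expression is at least (n/4) lg n, so lg(n!) lies between constant multiples of n lg n.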
Note: there are sorting algorithms for specialized inputs that have better than O(n lg n) performance: for example, radix sort. This is possible because radix sort doesn't work by comparing elements pairwise; it extracts information about the permutation by using the element itself as an index into an array. This indexing operation can be done in constant time and on average extracts lg n bits of information about the permutation. Thus, radix sort can be performed using O(n) time, assuming that the list is densely populated by integers or by elements that can be mapped monotonically and densely onto integers. By densely, we mean that the largest integer in a list of length n is O(n) in size. By monotonically we mean that the ordering of the integers is the same as the ordering of the corresponding data to be sorted. In general we can't find a dense monotonic mapping, so Θ(n lg n) is the best we can do for sorting arbitrary data.
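The indexing idea can be illustrated with counting sort, a simpler relative of radix sort. The sketch below is not a full radix sort: counting_sort and its bound parameter are names introduced here, and it assumes the elements are small non-negative integers, which is the dense case described above.

```sml
(* counting_sort (xs, bound) is xs in ascending order, computed in
 * O(n + bound) time without comparing elements to one another.
 * Requires: every element x of xs satisfies 0 <= x < bound.
 * Hypothetical sketch of the indexing idea, not a full radix sort. *)
fun counting_sort (xs: int list, bound: int): int list =
  let
    (* counts[v] holds the number of occurrences of v seen so far *)
    val counts = Array.array (bound, 0)
    val () =
      List.app
        (fn x => Array.update (counts, x, Array.sub (counts, x) + 1))
        xs
    (* emit (v, k, acc) conses k copies of v onto acc *)
    fun emit (v, k, acc) =
      if k = 0 then acc else emit (v, k - 1, v :: acc)
    (* collect walks the buckets from largest to smallest, so the
     * smallest values end up at the front of the result *)
    fun collect (v, acc) =
      if v < 0 then acc
      else collect (v - 1, emit (v, Array.sub (counts, v), acc))
  in
    collect (bound - 1, [])
  end

val sorted = counting_sort ([3, 0, 2, 3, 1], 4)   (* [0, 1, 2, 3, 3] *)
```

Each element is used once as an array index, so when bound is O(n) the whole sort takes O(n) time. If the values are sparse, bound dominates the cost and the advantage disappears.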