CS312 Lecture 13
Reasoning about Complexity

Let's take a look at a useful algorithm in more detail and show that it is not only correct but that its worst-case performance is O(n lg n). The algorithm we'll look at is merge sort, a recursive algorithm for sorting a list of items. Merge sort is an example of a divide-and-conquer algorithm. It sorts a list by dividing it into two smaller sublists, recursively sorting the sublists, and then merging the two sorted lists together to produce the final result. Merging two lists is pretty simple if they themselves are already sorted. To prove the correctness and run time of merge sort we will want a stronger proof technique: strong induction.

Strong induction

Strong induction has the same 5 steps as ordinary induction, but the induction hypothesis is a little different:

  1. State the proposition to be proved in terms of P(n)
  2. Base case: show P(n0) is true
  3. Induction hypothesis: Assume that P(m) is true for all m with n0 ≤ m ≤ n. This is different from ordinary induction, where we only get to assume that P(m) is true for m = n.
  4. Induction step: Using the induction hypothesis, prove P(n+1) is true.
  5. Conclusion:  P(n) is true for all n ≥ n0

It is often easier to prove asymptotic complexity bounds using strong induction than using ordinary induction, because you have a stronger induction hypothesis to work with when trying to prove P(n+1). In the merge sort analysis below, for example, the recursive calls are on lists of about half the length, so we need the hypothesis at roughly (n+1)/2, not just at n.

Implementation and correctness of merge sort

(* split(xs) is a pair (ys,zs) where half (rounding up) of the elements of xs are
   found in ys and the rest are in zs. *) 
fun split (xs: int list): int list * int list = 
    let fun loop(xs:int list, left:int list, right:int list):int list * int list =
	case xs of
	    nil => (left, right)
	  | x::nil => (x::left, right)
	  | x::y::rest => loop(rest, x::left, y::right)
    in 
	loop(xs, [], [])
    end

(* A simpler way to write split. Recall the definition of foldl. What is the
   asymptotic performance of foldl f lst0 lst where f is an O(1) function and lst is an
   n-element list? O(n). *)
fun split2(xs:int list) : int list * int list = 
  foldl (fn (x, (left,right)) => (x::right,left)) ([],[]) xs
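
As a quick sanity check, here is what the two versions produce on a small list. These particular results follow from the definitions above; the specification only constrains the sizes of the two pieces, not the order of their elements:

split [1,2,3,4,5]    (* evaluates to ([5,3,1], [4,2]) *)
split2 [1,2,3,4,5]   (* also evaluates to ([5,3,1], [4,2]) *)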
	
(* merge(left,right) is a sorted list (in ascending order)
 * containing all the elements of left and right.
 * Requires: left and right are sorted lists *)
fun merge (left: int list, right: int list): int list =
  case (left, right) of
    (nil,_) => right
  | (_,nil) => left
  | (x::left_tail, y::right_tail) => 
      (if x > y then y::(merge(left, right_tail))
                else x::(merge(left_tail, right)))

How do we know that merge works? By induction on the total length of the two input lists (i.e., length(left)+length(right)). Clearly if that total length is zero, the function works: one of the first two cases is used, and each of them is trivially correct. What about the general case? We are trying to show that merge works on lists left and right whose total length is n+1, and we are allowed to assume that it works on any pair of lists whose total length is n or less. If one of the two lists is empty, the function works. What if both lists are non-empty? By the precondition (requires clause) we know that left and right are sorted, so x is the smallest element of left, y is the smallest element of right, and left_tail and right_tail are themselves sorted lists. Our induction hypothesis lets us assume that merge works correctly in the recursive calls, because the total length of their arguments is smaller than the total length of left and right, and the precondition of merge is satisfied in the recursive calls (it is being applied to sorted lists). If the then branch executes, then y < x, so y is no larger than any element of either list; therefore y::(merge(left, right_tail)) is a sorted list. Similarly, in the else branch x is no larger than any element of either list; therefore x::(merge(left_tail, right)) is also a sorted list. And we can see that merge doesn't "lose" any elements of left or right, assuming that the recursive calls don't either.
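
For instance, a quick check on two small sorted inputs, tracing the definition above:

merge ([1,3,5], [2,4])    (* evaluates to [1,2,3,4,5] *)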

Now we can write the merge-sort function itself. Note how we explicitly separate the specification of the function from the description of the algorithm that implements it. With merge and split specified as above, we don't really need even this much description of how merge_sort works.

(* merge_sort(xs) is a list containing the same elements as xs but in
 * ascending (nondescending) sorted order.
 *
 * Implementation: lists of size 0 or 1 are already sorted. Otherwise,
 * split the list into two lists of roughly equal size, recursively sort
 * them, and then merge the two lists back together. *)
fun merge_sort (xs: int list) : int list =
  case xs of
    [] => []   
  | [x] => [x]
  | _ => let val (left, right) = split xs
	 in 
           merge (merge_sort(left), merge_sort(right))
	 end

Again, we can see by induction on the length of the input list that this function works. For lists of length 0 or 1 it clearly works. For larger lists we observe from the specification for split that both left and right must contain some elements and that together they contain all the elements of xs; in particular, each is strictly shorter than xs, so the induction hypothesis applies. By the inductive hypothesis, merge_sort applied to each of these lists results in a sorted list. From the specification for merge, the result must be a sorted list containing all the elements of xs. Therefore the merge_sort function works correctly.
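
For instance, given the specification we have just argued merge_sort satisfies:

merge_sort [6,3,5,1,4,2]    (* evaluates to [1,2,3,4,5,6] *)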

Merge sort asymptotic timing analysis

Now let's show that merge_sort is not only a correct but also an efficient algorithm for sorting lists of numbers. We start by observing without proof that the performance of the split function is linear in the size of the input list. This can be shown by the same approach we will take for merge, so let's just look at merge instead.

The merge function too is linear-time -- that is, O(n) -- in the total length of the two input lists. We will first find a recurrence for the execution time. Suppose the total length of the input lists is zero or one. Then the function must execute one of the two O(1)  arms of the case expression. These take at most some time c0 to execute. So we have

T(0) = c0
T(1) = c0

Now, consider lists of total length n. The recursive call is on lists of total length n-1, so we have

T(n) = T(n-1) + c1

where c1 is a constant upper bound on the time required to execute the if expression and the operator :: (which takes constant time for the usual implementation of lists). This gives us a recurrence to solve for T. We can apply the iterative method to solve the recurrence by expanding out the recurrence equations for the first few steps.

T(0) = c0
T(1) = c0
T(2) = T(1) + c1 = c0 + c1
T(3) = T(2) + c1 = c0 + 2c1
T(4) = T(3) + c1 = c0 + 3c1
...
T(n) = T(n−1) + c1 = c0 + (n-1)c1 = (c0 − c1) + c1n

We notice a pattern, which the last line captures. Recall that T(n) is O(n) if there is a constant k such that, for all n greater than some n0, T(n) ≤ kn. For n at least 1, this is easily satisfied by setting k = c0 + 2c1. Or we can just remember that any first-degree polynomial is O(n) and also Θ(n). An even simpler way to find the right bound is to observe that the choice of constants c0 and c1 doesn't matter; if we plug in 1 for both of them we get T(1) = 1, T(2) = 2, T(3) = 3, etc., which is clearly O(n).
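
As a quick check that this closed form is consistent with the recurrence (a one-line verification rather than a full induction): if T(n−1) = c0 + (n−2)c1, then T(n) = T(n−1) + c1 = c0 + (n−1)c1, matching the pattern, and the base case T(1) = c0 + 0·c1 = c0 matches as well.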

Now let's consider the merge_sort function itself. Again, for zero- and one-element lists we compute in constant time. For n-element lists we make two recursive calls, but to sublists that are about half the size, plus calls to split and merge that each take Θ(n) time. For simplicity we'll pretend that the sublists are exactly half the size. The recurrence we obtain has this form:

T(0) = c0
T(1) = c0
T(n) = 2 T(n/2) + c1n +  c2n + c3

Let's use the iterative method to figure out the running time of merge_sort. The constants don't affect the asymptotic answer: for n ≥ 1 we can bound c1n + c2n + c3 by c4n for some constant c4, and any solution must work for arbitrary constants c0 and c4, so we replace them both with 1 to keep things simple. That leaves us with the following recurrence equations to work with:

T(1) = 1
T(n) = 2 T(n/2) + n

Using the iterative method, we expand the time equation until we notice a pattern:

T(n) = 2T(n/2) + n
     = 2(2T(n/4) + n/2) + n
     = 4T(n/4) + n + n
     = 4(2T(n/8) + n/4) + n + n
     = 8T(n/8) + n + n + n
     = nT(n/n) + n + ... + n + n + n
     = n + n + ... + n + n + n

Counting the number of repetitions of n in the sum at the end, we see that there are lg n + 1 of them.  Thus the running time is n(lg n + 1) = n lg n + n. We observe that n lg n + n ≤ n lg n + n lg n = 2n lg n for n ≥ 2, so the running time is O(n lg n). Now that we've done the analysis using the iterative method, let's use induction to verify that the bound is correct. It will be convenient to use the strong (course-of-values) induction technique introduced above.
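
Before the proof, here is a quick numeric sanity check of that closed form. The sketch below assumes the simplified recurrence (all constants set to 1) and restricts n to powers of two, so that n/2 is exact:

(* t n computes the simplified merge_sort recurrence:
   T(1) = 1,  T(n) = 2 T(n/2) + n *)
fun t (n: int): int =
  if n <= 1 then 1 else 2 * t (n div 2) + n

(* For powers of two, t n agrees with n lg n + n:
   t 8 evaluates to 32 = 8*3 + 8, and t 1024 evaluates to 11264 = 1024*10 + 1024. *)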

Merge sort analysis using strong induction

Take n0 = 1.

Property of n to prove:

For all n ≥ n0,
T(n) = n lg n + n

Proof by strong (course-of-values) induction on n

Base case: n = 1
    T(1) = 1 = 1 lg 1 + 1     (since lg 1 = 0)

Induction Step:

Induction Hypothesis:
T(k) = k lg k + k         for all k ≤ n

Property to prove for n+1:
T(n+1) = (n+1) lg (n+1) + (n+1)

Proof:

T(n+1) = 2 T((n+1)/2) + (n+1)

    = 2 ((n+1)/2 lg ((n+1)/2) + (n+1)/2) + (n+1)             (by induction hypothesis)

    = (n+1)(lg ((n+1)/2)) + (n+1) + (n+1)

    = (n+1)(lg(n+1) − 1) + 2(n+1)

    = (n+1) lg(n+1) − (n+1) + 2(n+1)

    = (n+1) lg(n+1) + (n+1)


Thus we have shown that the running time of merge sort is Θ(n lg n).

The substitution method

Here is another way to compute the asymptotic complexity: guess the answer (In this case, O(n lg n)), and plug it directly into the recurrence. By looking at what happens we can see whether the guess was correct or whether it needs to be increased to a higher order of growth (or can be decreased to a lower order). This works as long as the recurrence equations are monotonic in n, which is usually the case. By monotonic, we mean that increasing n does not cause the right-hand side of any recurrence equation to decrease.

For example, consider our recurrence for merge sort. To show T(n) is O(n lg n), we need to show that T(n) ≤ kn lg n for large n and some choice of k. Define F(n) = n lg n, so we are trying to show that T(n) ≤ kF(n). This turns out to be true if we can plug kF(n) into the recurrence for T(n) and show that the recurrence equations hold as "≥" inequalities. Here, we plug the expression kn lg n into the merge-sort recurrence:

kn lg n ≥ 2k(n/2) lg (n/2) + c4n
        = kn lg (n/2) + c4n
        = kn (lg n − 1) + c4n
        = kn lg n − kn + c4n
        = kn lg n + (c4 − k)n

Can we pick a k that makes this inequality come true for sufficiently large n? Certainly; it holds if k ≥ c4. Therefore this function is O(n lg n). In fact, we can make the two sides exactly equal by choosing k = c4, which tells us that it is Θ(n lg n) as well.

More generally, if we want to show that a recurrence solution is O(F(n)), we show that we can choose k so that for each recurrence equation with kF(n) substituted for T(n), LHS ≥ RHS for all sufficiently large n. If we want to show that a recurrence is Θ(F(n)), we need to show that there is also a k such that LHS ≤ RHS for all sufficiently large n. In the case above, it happens that we can choose the same k.

Why does this work? It's really another use of strong induction where the proposition to be proved is that T(n) ≤ kF(n) for all sufficiently large n. We ignore the base case because we can always choose a large enough k to make the inequality work for small n. Now we proceed to the inductive step. We want to show that T(n+1) ≤ kF(n+1), assuming that for all m ≤ n we have T(m) ≤ kF(m). We have

T(n+1)   =   2T((n+1)/2) + c4(n+1)   ≤   2kF((n+1)/2) + c4(n+1)   ≤   kF(n+1)

so by transitivity T(n+1) ≤ kF(n+1). The middle inequality follows from the induction hypothesis T((n+1)/2) ≤ kF((n+1)/2) and from the monotonicity of the recurrence equation. The last step is what we showed by plugging kF(n) into the recurrence and checking that it holds for any sufficiently large n.

To see another example, we know that any function that is O(n lg n) is also O(n^2), though not Θ(n^2). If we hadn't done the iterative analysis above, we could still verify that merge sort is at least as good as insertion sort (asymptotically) by plugging kn^2 into the recurrence and showing that the inequality holds for it as well:

kn^2 ≥ 2k(n/2)^2 + c4n
      = ½kn^2 + c4n

For sufficiently large n, this inequality holds for any k. Therefore, the algorithm is O(n^2). Because it holds for any k, the algorithm is in fact o(n^2). Thus, we can use recurrences to show upper bounds that are not tight as well as upper bounds that are tight.

On the other hand, suppose we had tried to plug in kn instead of kn^2. Then we'd have:

kn ≥? 2k(n/2) + c4n
    = kn + c4n

Because c4 is positive, the inequality doesn't hold for any k; therefore, the algorithm is not O(n). In fact, we see that the inequality always holds in the opposite direction (<); therefore kn is a strict lower bound on the running time of the algorithm for every k: its running time is more than linear.

Thus, reasonable guesses about the complexity of an algorithm can be plugged into a recurrence and used not only to find the complexity, but also to obtain information about its solution.

Example: Another sorting algorithm

The following function sorts the first two-thirds of a list, then the second two-thirds, then the first two-thirds again:

fun sort3(a: int list): int list =
  case a of
    nil => nil
  | [x] => [x]
  | [x,y] => [Int.min(x,y), Int.max(x,y)]
  | a => let
      val n = List.length(a)
      val m = (2*n+2) div 3
      val res1 = sort3(List.take(a, m))
      val res2 = sort3(List.drop(res1, n-m) @
                       List.drop(a, m))
      val res3 = sort3(List.take(res1, n-m) @
                       List.take(res2, 2*m-n))
    in
      res3 @ List.drop(res2,2*m-n)
    end
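
Before analyzing it, here is a quick check on a tiny input; the result follows from tracing the definition above (with n = 3 and m = 2, the three recursive calls are on [3,1], [3,2], and [1,2]):

sort3 [3,1,2]    (* evaluates to [1,2,3] *)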

Perhaps surprisingly, this algorithm actually does sort the list; we leave the proof of correctness as an exercise to the reader. Its running time, on the other hand, we can derive from its recurrence. The routine does some O(n) work and then makes three recursive calls on lists of length about 2n/3. Therefore its recurrence is:

T(n) = cn + 3T(2n/3)

Let's try plugging in possible solutions. How about F(n) = n lg n? Substituting into the right side we have

   cn + 3kF(2n/3)
= cn + 3k(2n/3) lg (2n/3)
= cn + 2kn lg (2n/3)
= cn + 2kn lg n − 2kn lg (3/2)

There is no way to choose k to make the left side (kn lg n) at least this large: the 2kn lg n term on the right eventually exceeds kn lg n no matter what k we choose. So the algorithm is not O(n lg n); we must try a higher order of growth.

By plugging in kn^2 and kn^3 for T(n), we find that kn^2 grows strictly more slowly than T(n) and kn^3 grows strictly more quickly. We can solve for the correct exponent x by plugging in kn^x:

   cn + 3T(2n/3)   becomes   cn + 3k(2n/3)^x = cn + 3k(2/3)^x n^x

This will be asymptotically less than kn^x as long as 3(2/3)^x < 1, which requires x > log_(3/2) 3 ≈ 2.7095. Define a = log_(3/2) 3. Then we can see that the algorithm is O(n^(a+ε)) for any positive ε. Let's try O(n^a) itself. The RHS after substituting kn^a for T(n) is cn + 3(2/3)^a kn^a = cn + kn^a ≥ kn^a, since 3(2/3)^a = 1 by the definition of a. This tells us that kn^a is an asymptotic lower bound on T(n): T(n) is Ω(n^a). So the complexity is somewhere between Ω(n^a) and O(n^(a+ε)). It is in fact Θ(n^a).
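
To see where the threshold on x comes from, here is a short check of the arithmetic: with a = log_(3/2) 3 we have (3/2)^a = 3, so

3(2/3)^a = 3 / (3/2)^a = 3/3 = 1

and since 3(2/3)^x decreases as x increases, it drops below 1 exactly when x > a.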

To show that the complexity is O(n^a), we need to use a refinement of the substitution method. Rather than trying F(n) = n^a, we will try F(n) = n^a + bn, where b is a constant to be filled in later. The idea is to pick a b so that the bn term will compensate for the cn term that shows up in the recurrence. Because bn is O(n^a), showing T(n) is O(n^a + bn) is the same as showing that it is O(n^a). Substituting kF(n) for T(n) in the RHS of the recurrence, we obtain:

   cn + 3kF(2n/3)
= cn + 3k((2n/3)^a + b(2n/3))
= cn + 3k(2n/3)^a + 3kb(2n/3)
= cn + kn^a + 2kbn
= kn^a + (2kb + c)n

The substituted LHS of the recurrence is kn^a + kbn, which is at least kn^a + (2kb + c)n as long as kb ≥ 2kb + c, i.e., b ≤ −c/k. There is no requirement that b be positive, so choosing k = 1 and b = −c satisfies the recurrence. Therefore T(n) = O(n^a + bn) = O(n^a), and since T(n) is both O(n^a) and Ω(n^a), it is Θ(n^a).


Lower bounds on sorting performance

It turns out that no sorting algorithm for general data (where the only way to examine elements is to compare them pairwise) can have asymptotic running time better than Θ(n lg n), and thus, other than constant factors in running time, merge sort is as good an algorithm as we can expect for sorting general data. Its constant factors are also pretty good, so it's a useful algorithm in practice. We can see that Ω(n lg n) time is needed by thinking about sorting a list of n distinct numbers. There are n! = n×(n−1)×(n−2)×...×3×2×1 possible orderings of the list, and the sorting algorithm needs to map all of them to the same sorted list by applying an appropriate inverse permutation. For general data, the algorithm must make enough observations about the input list (by comparing list elements pairwise) to determine which of the n! permutations was given as input, so that the appropriate inverse permutation can be applied to sort the list. Each comparison of two elements to see which is greater generates one bit of information about which permutation was given; at least lg(n!) bits of information are needed. Therefore the algorithm must take Ω(lg(n!)) time. It is easy to see that n! is O(n^n); note that lg(n^n) = n lg n. With a bit more difficulty a stronger result can be shown: lg(n!) is Θ(n lg n). Therefore merge sort is not only much faster than insertion sort on large lists, it is actually optimal to within a constant factor! This shows the value of designing algorithms carefully.
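
One quick way to see the stronger bound, sketched: the largest n/2 factors of n! are each at least n/2, so n! ≥ (n/2)^(n/2) and hence lg(n!) ≥ (n/2) lg(n/2), which is Θ(n lg n). Combined with lg(n!) ≤ lg(n^n) = n lg n, this gives lg(n!) = Θ(n lg n).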

Note: there are sorting algorithms for specialized inputs that have better than O(n lg n) performance: for example, radix sort. This is possible because radix sort doesn't work by comparing elements pairwise; it extracts information about the permutation by using the element itself as an index into an array. This indexing operation can be done in constant time and on average extracts lg n bits of information about the permutation. Thus, radix sort can be performed using O(n) time, assuming that the list is densely populated by integers or by elements that can be mapped monotonically and densely onto integers. By densely, we mean that the largest integer in a list of length n is O(n) in size. By monotonically, we mean that the ordering of the integers is the same as the ordering of the corresponding data to be sorted. In general we can't find a dense monotonic mapping, so Θ(n lg n) is the best we can do for sorting arbitrary data.