CS312 Lecture 13
Reasoning about Complexity


Let's take a look at a useful algorithm in more detail and show that it is not only correct but that its worst-case performance is O(n lg n). The algorithm we'll look at is merge sort, a recursive algorithm for sorting a list of items. Merge sort is an example of a divide-and-conquer algorithm. It sorts a list by dividing it into two smaller sublists, recursively sorting the sublists, and then merging the two sorted lists together to produce the final result. Merging two lists is pretty simple if they themselves are already sorted. To prove the correctness and run time of merge sort we will want a stronger proof technique: strong induction.

Strong induction

Strong induction has the same 5 steps as ordinary induction, but the induction hypothesis is a little different:

  1. State the proposition to be proved in terms of P(n)
  2. Base case: show P(n0) is true
  3. Induction hypothesis: Assume that P(m) is true for all m with n0 ≤ m ≤ n. This is different from ordinary induction, where we only get to assume that P(m) is true for m = n.
  4. Induction step: Using the induction hypothesis, prove P(n+1) is true.
  5. Conclusion:  P(n) is true for all n ≥ n0

It is often easier to prove asymptotic complexity bounds using strong induction than it is using ordinary induction, because you have a stronger induction hypothesis to work with when trying to prove P(n+1).
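
For example, strong induction is what lets us prove that every integer n ≥ 2 can be written as a product of primes: in the step for n+1, if n+1 is not itself prime then n+1 = a·b for some 2 ≤ a, b ≤ n, and the strong hypothesis applies to both a and b even though neither need equal n.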

Implementation and correctness of merge sort

(* split(xs) is a pair (ys,zs) where half (rounding up) of the 
 * elements of xs end up in ys and the rest in zs. *) 
fun split (xs: int list): int list * int list = 
    let fun loop(xs:int list, left:int list, right:int list):int list * int list =
	case xs of
	    [] => (left, right)
	  | x::[] => (x::left, right)
	  | x::y::rest => loop(rest, x::left, y::right)
    in 
	loop(xs, [], [])
    end

(* A simpler way to write split. Recall the definition of foldl. What is the
 * asymptotic performance of foldl f v l where f is an O(1) function and 
 * l is an n-element list? O(n). *)
fun split2(xs:int list): int list * int list = 
  foldl (fn (x, (left,right)) => (x::right,left)) ([],[]) xs
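
A quick check on a small input shows both versions satisfy the specification (the particular interleaving of elements is an implementation detail; only the sizes are specified):

val (ys, zs)   = split  [1,2,3,4,5]    (* ([5,3,1], [4,2]) *)
val (ys2, zs2) = split2 [1,2,3,4,5]    (* also ([5,3,1], [4,2]) *)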
	
(* merge(left,right) is a sorted list (in ascending order)
 * containing all the elements of left and right.
 * Requires: left and right are sorted lists *)
fun merge (left: int list, right: int list): int list =
  case (left, right) of
    (nil,_) => right
  | (_,nil) => left
  | (x::left_tail, y::right_tail) => 
      (if x > y then y::(merge(left, right_tail))
                else x::(merge(left_tail, right)))
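
For example, merging two lists that are already sorted:

val example = merge ([1,3,5], [2,4,6])    (* [1,2,3,4,5,6] *)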

How do we know that merge works? By induction on the sum of the lengths of the two input lists (i.e., length(left)+length(right)). Clearly if that total length is zero, the function works because one of the first two cases is used and they are trivially correct. What about the general case? We are trying to show that merge works on lists left and right whose total length is n+1, and we are allowed to assume that it works on lists left and right whose total length is n or less. If one of the two lists is empty the function works. What if both lists are non-empty? By the precondition (requires clause) we know that x is the smallest element of left and y the smallest element of right, and that left_tail and right_tail are sorted lists. Our inductive hypothesis lets us assume that merge works correctly in the recursive calls, because the total length of the two argument lists is smaller than the total length of left and right, and the precondition of merge in the recursive calls is satisfied (it is being applied to sorted lists). If the then branch executes, y is less than x and hence no larger than any element of either list; therefore, y::(merge(left, right_tail)) is a sorted list. Conversely, in the else branch x is less than or equal to y and hence no larger than any element of either list; therefore x::(merge(left_tail, right)) is also a sorted list. And we can see that merge doesn't "lose" any elements of left or right, assuming that the recursive calls don't either.

Now we can write the merge-sort function itself. Note how we explicitly separate the specification of the function from the description of the algorithm that implements it. With merge and split specified as above, we don't really need even this much description of how merge_sort works.

(* merge_sort(xs) is a list containing the same elements as xs but in
 * ascending (nondescending) sorted order.
 *
 * Implementation: lists of size 0 or 1 are already sorted. Otherwise,
 * split the list into two lists of nearly equal size, recursively sort
 * them, and then merge the two lists back together. *)
fun merge_sort (xs:int list): int list =
  case xs of
    [] => []   
  | [x] => [x]
  | _ => let val (left, right) = split xs
	 in 
           merge (merge_sort(left), merge_sort(right))
	 end
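
As a small end-to-end check:

val sorted = merge_sort [6,2,5,3,1,4]    (* [1,2,3,4,5,6] *)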

Again, we can see by induction on the length of the input list that this function works. For lists of length 0 or 1 it clearly works. For larger lists we observe from the specification for split that both left and right must contain some elements and together they contain all the elements of xs. By the inductive hypothesis, merge_sort applied to each of these lists results in sorted lists. From the specification for merge the result must be a sorted list containing all the elements of xs. Therefore the merge_sort function will work correctly.

Merge sort asymptotic timing analysis

Now let's show that merge_sort is not only a correct but also an efficient algorithm for sorting lists of numbers. We start by observing without proof that the performance of the split function is linear in the size of the input list. This can be shown by the same approach we will take for merge, so let's just look at merge instead.

The merge function too is linear-time -- that is, O(n) -- in the total length of the two input lists. We will first find a recurrence for the execution time. Suppose the total length of the input lists is zero or one. Then the function must execute one of the two O(1)  arms of the case expression. These take at most some time c0 to execute. So we have

T(0) = c0
T(1) = c0

Now, consider lists of total length n. The recursive call is on lists of total length n−1, so we have

T(n) = T(n−1) + c1

where c1 is a constant upper bound on the time required to execute the if expression and the operator :: (which takes constant time for usual implementations of lists). This gives us a recurrence to solve for T.  We can apply the iterative method to solve the recurrence by expanding out the recurrence equations for the first few steps.

T(0) = c0
T(1) = c0
T(2) = T(1) + c1 = c0 + c1
T(3) = T(2) + c1 = c0 + 2c1
T(4) = T(3) + c1 = c0 + 3c1
...
T(n) = T(n−1) + c1 = c0 + (n−1)c1 = (c0 − c1) + c1n

We notice a pattern, which the last line captures. Recall that T(n) is O(n) if there are constants k and n0 such that T(n) ≤ kn for all n > n0. For n at least 1, this is easily satisfied by setting k = c0 + 2c1. Or we can just remember that any first-degree polynomial is O(n) and also Θ(n). An even simpler way to find the right bound is to observe that the exact values of the positive constants c0 and c1 don't matter; if we plug in 1 for both of them we get T(1) = 1, T(2) = 2, T(3) = 3, etc., which is clearly O(n).
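
A quick way to convince yourself of this (not a substitute for the proof) is to evaluate the recurrence directly; here is a small sketch with c0 = c1 = 1:

(* t_merge n computes T(n) from the recurrence above with c0 = c1 = 1.
 * The closed form predicts T(n) = n for all n >= 1. *)
fun t_merge 0 = 1
  | t_merge 1 = 1
  | t_merge n = t_merge (n - 1) + 1

(* For example, t_merge 10 evaluates to 10 and t_merge 100 to 100. *)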

Now let's consider the merge_sort function itself. Again, for zero- and one-element lists we compute in constant time. For n-element lists we make two recursive calls, but to sublists that are about half the size, and calls to split and merge that each take Θ(n) time. For simplicity we'll pretend that the sublists are exactly half the size. The recurrence we obtain has this form:

T(0) = c0
T(1) = c0
T(n) = 2 T(n/2) + c1n +  c2n + c3

Let's use the iterative method to figure out the running time of merge_sort. For n ≥ 1 we can bound c1n + c2n + c3 by c4n, where c4 = c1 + c2 + c3. We know that any solution must work for arbitrary constants c0 and c4, so again we replace them both with 1 to keep things simple. That leaves us with the following recurrence equations to work with:

T(1) = 1
T(n) = 2 T(n/2) + n

Using the iterative method, we expand out the time equation until we notice a pattern:

T(n) = 2T(n/2) + n
     = 2(2T(n/4) + n/2) + n
     = 4T(n/4) + n + n
     = 4(2T(n/8) + n/4) + n + n
     = 8T(n/8) + n + n + n
     = nT(n/n) + n + ... + n + n + n
     = n + n + ... + n + n + n

Counting the number of repetitions of n in the sum at the end, we see that there are lg n + 1 of them.  Thus the running time is n(lg n + 1) = n lg n + n. We observe that n lg n + n ≤ n lg n + n lg n = 2n lg n for n ≥ 2, so the running time is O(n lg n).  This fact can be proved using strong induction, as follows.
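
Again, evaluating the recurrence directly (under the simplifying assumption that n is a power of two) agrees with this closed form; a small sketch:

(* t_ms n computes T(n) from T(1) = 1, T(n) = 2*T(n/2) + n, for n a power of two. *)
fun t_ms 1 = 1
  | t_ms n = 2 * t_ms (n div 2) + n

(* t_ms 8 evaluates to 32 = 8 lg 8 + 8, and t_ms 16 to 80 = 16 lg 16 + 16. *)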

Consider n0 = 1.

Property of n to prove:

For all n ≥ n0,
T(n) = n lg n + n

Proof by strong (course-of-values) induction on n

Base case: n = 1
    T(1) = 1 = 1 lg 1 + 1

Induction Step:

Induction Hypothesis:
T(k) = k lg k + k         for all k ≤ n

Property to prove for n+1:
T(n+1) = (n+1) lg (n+1) + (n+1)

Proof:

T(n+1) = 2 T((n+1)/2) + (n+1)

    = 2 ((n+1)/2 lg ((n+1)/2) + (n+1)/2) + (n+1)             (by induction hypothesis)

    = (n+1)(lg ((n+1)/2)) + (n+1) + (n+1)

    = (n+1)(lg(n+1) − 1) + 2(n+1)

    = (n+1) lg(n+1) + (n+1)


Thus we have shown that merge sort is Θ(n lg n).

Example: Another sorting algorithm

The following function sorts the first two-thirds of a list, then the second two-thirds, then the first two-thirds again:

fun sort3(a: int list): int list =
  case a of
    nil => nil
  | [x] => [x]
  | [x,y] => [Int.min(x,y), Int.max(x,y)]
  | a => let
      val n = List.length(a)
      val m = (2*n+2) div 3
      val res1 = sort3(List.take(a, m))
      val res2 = sort3(List.drop(res1, n-m) @
                       List.drop(a, m))
      val res3 = sort3(List.take(res1, n-m) @
                       List.take(res2, 2*m-n))
    in
      res3 @ List.drop(res2,2*m-n)
    end
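
For instance (a small check, not a proof of correctness):

val s = sort3 [3,1,4,1,5,9,2,6]    (* [1,1,2,3,4,5,6,9] *)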

Perhaps surprisingly, this algorithm actually does sort the list. We will leave the proof that it sorts correctly as an exercise for the reader. Its running time, however, we can derive from its recurrence. The routine does some O(n) work and then makes three recursive calls on lists of length 2n/3. Therefore its recurrence is:

T(n) = cn + 3T(2n/3)

Let's try plugging in possible solutions. How about F(n) = n lg n? Substituting into the right side we have

   cn + 3kF(2n/3)
= cn + 3k(2n/3) lg (2n/3)
= cn + 2kn lg n + 2kn lg (2/3)
= cn + 2kn lg n − 2kn lg (3/2)

The right side contains the term 2kn lg n, so there is no way to choose k to make the left side (kn lg n) at least as large; the algorithm is not O(n lg n), and we must try a higher order of growth.

By plugging in kn^2 and kn^3 for T(n), we find that kn^2 grows strictly more slowly than T(n) and kn^3 grows strictly more quickly. We can solve for the correct exponent x by plugging in kn^x:

    cn + 3T(2n/3)
= cn + 3k(2/3)^x n^x

This will be asymptotically less than kn^x as long as 3(2/3)^x < 1, which requires x > log_{3/2} 3 ≈ 2.7095. Define this exponent as a = log_{3/2} 3. Then we can see that the algorithm is O(n^(a+ε)) for any positive ε. Let's try O(n^a) itself. The RHS after substituting kn^a is cn + 3(2/3)^a kn^a = cn + kn^a > kn^a. This tells us that kn^a is an asymptotic lower bound on T(n): T(n) is Ω(n^a). So the complexity is somewhere between Ω(n^a) and O(n^(a+ε)). It is in fact Θ(n^a).
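
As a quick numeric check of this exponent:

(* a = log base 3/2 of 3, computed with natural logarithms. *)
val a = Math.ln 3.0 / Math.ln 1.5    (* approximately 2.7095 *)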

To show that the complexity is O(n^a), we need to use a refinement of the substitution method. Rather than trying F(n) = n^a, we will try F(n) = n^a + bn, where b is a constant to be filled in later. The idea is to pick a b so that bn will compensate for the cn term that shows up in the recurrence. Because bn is O(n^a), showing T(n) is O(n^a + bn) is the same as showing that it is O(n^a). Substituting kF(n) for T(n) in the RHS of the recurrence, we obtain:

   cn + 3kF(2n/3)
= cn + 3k((2n/3)^a + b(2n/3))
= cn + 3k(2n/3)^a + 3kb(2n/3)
= cn + kn^a + 2kbn
= kn^a + (2kb+c)n

The substituted LHS of the recurrence is kn^a + kbn, which is at least as large as kn^a + (2kb+c)n as long as kb ≥ 2kb+c, or b ≤ −c/k. There is no requirement that b be positive, so choosing k=1 and b=−c satisfies the recurrence. Therefore T(n) = O(n^a + bn) = O(n^a), and since T(n) is both O(n^a) and Ω(n^a), it is Θ(n^a).


Lower bounds on sorting performance

It turns out that no sorting algorithm that works by comparing elements can have asymptotic running time better than Θ(n lg n), and thus, other than constant factors in running time, merge sort is as good an algorithm as we can expect for sorting general data. Its constant factors are also pretty good, so it's a useful algorithm in practice. We can see that Ω(n lg n) time is needed by thinking about sorting a list of n distinct numbers. There are n! = n×(n−1)×(n−2)×...×3×2×1 possible lists, and the sorting algorithm needs to map all of them to the same sorted list by applying an appropriate inverse permutation. For general data, the algorithm must make enough observations about the input list (by comparing list elements pairwise) to determine which of the n! permutations was given as input, so that the appropriate inverse permutation can be applied to sort the list. Each comparison of two elements to see which is greater generates at most one bit of information about which permutation was given; at least lg(n!) bits of information are needed. Therefore the algorithm must make at least lg(n!) comparisons, i.e., it takes Ω(lg(n!)) time. It can be seen easily that n! is O(n^n); note that lg(n^n) = n lg n. With a bit more difficulty a stronger result can be shown: lg(n!) is Θ(n lg n). Therefore merge sort is not only much faster than insertion sort on large lists, it is actually optimal to within a constant factor! This shows the value of designing algorithms carefully.
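
To get a feel for how lg(n!) grows, here is a small numeric sketch:

(* lgFact n computes lg(n!) by summing lg k for k = 1..n. *)
fun lgFact 0 = 0.0
  | lgFact n = Math.ln (real n) / Math.ln 2.0 + lgFact (n - 1)

(* lgFact 16 is about 44.25, while 16 * lg 16 = 64.0; both grow as n lg n. *)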

Note: there are sorting algorithms for specialized inputs that have better than O(n lg n) performance: for example, radix sort. This is possible because radix sort doesn't work by comparing elements pairwise; it extracts information about the permutation by using the element itself as an index into an array. This indexing operation can be done in constant time and on average extracts lg n bits of information about the permutation. Thus, radix sort can be performed using O(n) time, assuming that the list is densely populated by integers or by elements that can be mapped monotonically and densely onto integers. By densely, we mean that the largest integer in a list of length n is O(n) in size. By monotonically we mean that the ordering of the integers is the same as the ordering of the corresponding data to be sorted. In general we can't find a dense monotonic mapping, so Θ(n lg n) is the best we can do for sorting arbitrary data.