Order of Growth and Big-O Notation
In estimating the running time of insert_sort (or any other program) we don't know what the constants c or k actually are. We know only that they are constants of moderate size, but beyond that the exact values are not important; we have enough evidence from the asymptotic analysis to know that merge_sort (see below) is faster than the quadratic insert_sort, even though the constants may differ somewhat. (This does not always hold; the constants can sometimes make a difference in practice, but in general it is a very good rule of thumb.)
We may not even be able to measure the constant c directly. For example, we may know that a given expression of the language, such as if, takes a constant number of machine instructions, but we may not know exactly how many. Moreover, the same sequence of instructions executed on a Pentium IV will take less time than on a Pentium II (although the difference will be roughly a constant factor). So these estimates are usually only accurate up to a constant factor anyway. For these reasons, we usually ignore constant factors in comparing asymptotic running times.
Computer scientists have developed a convenient notation for hiding the constant factor. We write O(n) (read: "order n") instead of "cn for some constant c." Thus an algorithm is said to be O(n) or linear time if there is a fixed constant c such that for all sufficiently large n, the algorithm takes time at most cn on inputs of size n. An algorithm is said to be O(n^2) or quadratic time if there is a fixed constant c such that for all sufficiently large n, the algorithm takes time at most cn^2 on inputs of size n. O(1) means constant time.
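For example, an algorithm that takes 5n + 20 steps on inputs of size n runs in O(n) time: for all n >= 1 we have 5n + 20 <= 25n, so the constant c = 25 works.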
Polynomial time means n^O(1), or n^c for some constant c. Thus any constant, linear, quadratic, or cubic (O(n^3)) time algorithm is a polynomial-time algorithm.
This is called big-O notation. It concisely captures the important differences in the asymptotic growth rates of functions.
One important advantage of big-O notation is that it makes algorithms much easier to analyze, since we can conveniently ignore low-order terms. For example, an algorithm that runs in time
10n^3 + 24n^2 + 3n log n + 144
is still a cubic algorithm, since for all n ≥ 1,
10n^3 + 24n^2 + 3n log n + 144
  ≤ 10n^3 + 24n^3 + 3n^3 + 144n^3
  = (10 + 24 + 3 + 144)n^3
  = O(n^3).
Of course, since we are ignoring constant factors, any two linear algorithms will be considered equally good by this measure. There may even be some situations in which the constant is so huge in a linear algorithm that even an exponential algorithm with a small constant may be preferable in practice. This is a valid criticism of asymptotic analysis and big-O notation. However, as a rule of thumb it has served us well. Just be aware that it is only a rule of thumb--the asymptotically optimal algorithm is not necessarily the best one.
Some common orders of growth seen often in complexity analysis are
O(1) | constant |
O(log n) | logarithmic |
O(n) | linear |
O(n log n) | "n log n" |
O(n^2) | quadratic |
O(n^3) | cubic |
n^O(1) | polynomial |
2^O(n) | exponential |
Here log means log_2, the logarithm base 2, although the logarithm base doesn't really matter since logarithms with different bases differ by a constant factor. Note also that 2^O(n) and O(2^n) are not the same!
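For example, 4^n = 2^(2n) is 2^O(n), but it is not O(2^n): the ratio 4^n / 2^n = 2^n is unbounded, so no constant c makes 4^n ≤ c2^n hold for all large n.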
Analyzing Running Times of Procedures
Now we can use these ideas to analyze the asymptotic running time of SML functions. The use of order notation can greatly simplify our task here. We assume that the primitive operations of our language, such as arithmetic operations and pattern matching, all take constant time (which they do).
Consider the following multiplication routine:
fun times1 (a : int, b : int) : int =
  if b = 0 then 0
  else a + times1(a, b - 1)
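For example, times1(3, 4) computes 3 + times1(3, 3) = 3 + 3 + times1(3, 2) = ... = 3 + 3 + 3 + 3 + 0 = 12, making one recursive call for each unit of the magnitude of b.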
What is the order of growth of the time required by times1 as a function of n, where n is the magnitude of the parameter b? Note that the "size" of a number can be measured either in terms of its magnitude or in terms of the number of digits (the space it takes to write the number down). Often the number of digits is used, but here we use the magnitude. Note that it takes only about log_10 x digits to write down a number of magnitude x; thus these two measures are very different.
We assume that all the primitive operations in the times1 function (if, +, =, and -) and the overhead for function calls take constant time. Thus if n = 0, the routine takes constant time. If n > 0, the time taken on an input of magnitude n is constant time plus the time taken by the recursive call on n-1. In other words, there are constants c_1 and c_2 such that T(n) satisfies
T(n) = T(n-1) + c_1    for n > 0
T(0) = c_2
This is called a recurrence relation. It simply states that the time to multiply a number a by another number b of size n > 0 is the time required to multiply a by a number of size n-1 plus a constant amount of work (the primitive operations performed).
This recurrence relation has a unique closed form solution, namely
T(n) = c_2 + c_1 n
which is O(n), so the algorithm is linear in the magnitude of b. One can obtain this equation by generalizing from small values of n, then prove that it is indeed a solution to the recurrence relation by induction on n.
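To see where this solution comes from, unwind the recurrence: T(n) = T(n-1) + c_1 = T(n-2) + 2c_1 = ... = T(0) + c_1 n = c_2 + c_1 n.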
Now consider the following procedure for multiplying two numbers:
fun times2 (a : int, b : int) : int =
  if b = 0 then 0
  else if even(b) then times2(double(a), half(b))
  else a + times2(a, b - 1)
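The helper functions even, double, and half are used but not defined in these notes; a minimal sketch of constant-time definitions (written with integer division here, though an arithmetic shift would also work) is:
(* assumed helpers for times2; each is a single constant-time operation *)
fun even (b : int) : bool = b mod 2 = 0
fun double (a : int) : int = a + a
fun half (b : int) : int = b div 2
These declarations would have to appear before times2 for the code above to compile.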
Again we want an expression for the running time in terms of n, the magnitude of the parameter b. We assume that the double and half operations, like the standard primitives, take constant time (they could be implemented in constant time using arithmetic shifts). The recurrence for this problem is more complicated than the previous one:
T(n) = T(n-1) + c_1    if n > 0 and n is odd
T(n) = T(n/2) + c_2    if n > 0 and n is even
T(0) = c_3
We somehow need to figure out how often the first versus the second branch of this recurrence will be taken. It's easy if n is a power of two, i.e. if n = 2^m for some integer m. In this case, the first branch (the odd case) is only taken when n = 1, because 2^m is even except when m = 0, i.e. when n = 1. Note further that T(1) = O(1), because T(1) = T(0) + O(1) = O(1) + O(1) = O(1). Thus, for this special case we get the recurrence
T(n) = T(n/2) + c_2    if n > 1 and n is a power of 2
T(1) = c_3
for some constants c_2 and c_3. For powers of 2, the closed form solution of this is:
T(n) = c_3 + c_2 log_2 n
which is O(log n).
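Again this can be checked by unwinding: for n = 2^m, T(n) = T(n/2) + c_2 = T(n/4) + 2c_2 = ... = T(1) + mc_2, and since T(1) = O(1) and m = log_2 n, this gives T(n) = O(log n).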
What if n is not a power of 2? The running time is still O(log n) even in this more general case. Intuitively, this is because if n is odd, then n-1 is even, so on the next recursive call the input will be halved. Thus the input is halved at least once in every two recursive calls, which is all you need to get O(log n).
A good way to handle this formally is to charge to each call of times2 on an odd input the cost of the recursive call on an even input that must immediately follow it. We reason as follows: on an even input n, the cost is the cost of the recursive call on n/2 plus a constant, or
T(n) = T(n/2) + c_2
as before. On an odd input n, we recursively call the procedure on n-1, which is even, so we immediately call the procedure again on (n-1)/2. Thus the total cost on an odd input is the cost of the recursive call on (n-1)/2 plus a constant. In this case we get
T(n) = T((n-1)/2) + c_1 + c_2
In either case,
T(n) ≤ T(n/2) + (c_1 + c_2)
whose solution is still O(log n). This approach is more or less the same as explicitly unwinding the else clause that handles odd inputs:
fun times2 (a : int, b : int) : int =
  if b = 0 then 0
  else if even(b) then times2(double(a), half(b))
  else a + times2(double(a), half(b - 1))
then analyzing the rewritten program, without actually doing the rewriting.
Charging one operation to another (bounding the number of times one thing can happen by the number of times that another thing happens) is a common technique for analyzing the running time of complicated algorithms.
Order notation is a useful tool, and should not be thought of as being just a theoretical exercise. For example, the practical difference in running times between the logarithmic times2 and the linear times1 is noticeable even for moderate values of n.
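For instance, computing times1(a, b) for b around one million takes about a million recursive calls, while times2 on the same input makes only a few dozen.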
The key points are:
- We can use the asymptotic growth rates of functions (as n gets large) to bound the resources required by a given algorithm and to compare the relative efficiency of different algorithms.
- Big-O notation provides a way of expressing rough bounds on the resources required in a form that is meaningful yet easy to work with.
- Recurrence relations can be used to express the running times of recursive programs, and can often be solved for a closed form expression of the running time.
Exercises
1. Write an implementation of stacks using lists. What is the big-O running time of each operation? The signature for stacks is:
signature STACK =
sig
  type 'a stack
  val empty : unit -> 'a stack
  val push : ('a * 'a stack) -> 'a stack
  val top : 'a stack -> 'a option
  val pop : 'a stack -> ('a stack) option
end
2. Write an implementation of queues using lists. What is the big-O running time of each operation? The signature for queues is:
signature QUEUE =
sig
  type 'a queue
  val empty : unit -> 'a queue
  val insert : ('a * 'a queue) -> 'a queue
  val first : 'a queue -> 'a option
  val rest : 'a queue -> ('a queue) option
end
3. Write some of the functions that occur in the List structure by hand (e.g., rev, @, map, foldl, etc.) and analyze them to determine their big-O running time.