These notes cover the lectures for both 2/10 and 2/12.

Note availability of handout
  * Order of Growth

Today
  * Understanding the resource requirements of programs

We've been reasoning about programs a little before: [Think computationally]

  * CORRECTNESS
    - A program that does the wrong thing is utterly useless.
    - Substitution model
    - Mathematical induction
  * EFFICIENCY
    - A program that does the right thing too slowly isn't so good either.
    - Analysis of Algorithms

We will apply these methods to programs throughout the semester.

OVERALL GOALS:
  * Determine the space and time requirements of our algorithms.
  * Computational complexity (named by Hartmanis)

Complexity is measured in terms of the SIZE of the input.
  * Traditionally called `n'
  * For example, if the input is a list of names to sort,
        n = number of names in the list.
  * If the input is a number, size can be either the *magnitude* of the
    number or the number of *digits* in it.
    IT IS IMPORTANT TO KNOW WHICH; they are very different:
    a decimal number of magnitude x has about log_10(x) digits.
    (Try writing 10^20 in unary!)

For today we'll do the analysis in terms of the magnitude of the number
itself.

GOAL: To be able to compare the efficiency of algorithms/programs
      independent of a given machine or a given input.

QUEST: For a given program, find some function R(n) that measures the
       resources needed to run it on an input of size n.

To simplify the analysis, we investigate *asymptotic* resource requirements.
  * How they behave as n gets large

Example of an asymptote:  y = sqrt(1+x^2)

    x = 0,     y = 1
    x = 1,     y = sqrt(2) ~= 1.4
    x = 10,    y ~= 10.05
    x = 100,   y ~= 100.005
    x = 1000,  y ~= 1000.0005     (you get the point)

>>> Picture: Cartesian coords, and a hyperbola asymptotic to x=y,
>>> x=-y.

The lines y=x and y=-x are asymptotes:
    sqrt(1+x^2) - x ---> 0  as  x ---> infinity.

We generalize this notion of asymptote to mean
  * An approximation that gets CLOSER TO THE TRUTH in the limit

Functions generally have different asymptotic rates of growth.
x and x^2 both go to infinity as x --> infinity, BUT x^2 goes there a
*lot* faster.

We can measure this by [note << is scripty-<]:

                        f(n)
    f << g   iff   lim  ----  =  0
                        g(n)

(limits are to "n to infinity" unless otherwise mentioned.)

    << = "grows more slowly than"

Here are some standard functions (c is any constant > 1):

    1 << log n << n^c << c^n << n^n

REMINDER: Think BIG. Sometimes you need to take really large n:

    log n << n^0.0001

  Try a = 10^100 (logs base 10); we don't have log a < a^0.0001:
    log a = 100,     a^0.0001 is about 1.02
  But take b = 10^(10^100). Then
    log b = 10^100,  b^0.0001 is 10^(10^96)
  [Does anyone know the name for b?]

WARNING: You *cannot* generally tell f< C for some constant C;
this generalizes <<, and also handles x^2 + 1 vs x^2 + 2

What we do instead is a slightly stronger condition (how so?)

DEF: f is O(g) -- pronounced, "f is order g" -- when there is a
     constant c such that f(n) <= c g(n) for all n >= 1.

MEANS: f is at most g, up to a constant factor. Kind of a <=.

WARNING: Often written f = O(g)
         This is an abuse of `='. This is in NO WAY an equality,
         and careful people write it
             f is in O(g)

EXAMPLES:
    f(n) = (1/4)n^3 + 27n^2 + n + 1000
    f is O(n^3)

  Could determine that
    (1/4) n^3 <= c n^3 for all n>=1 when c >= 1.
    27 n^2    <= c n^3 for all n>=1 when c >= 27.
    etc.

  THIS is too much work. We don't need to compute these constants
  as long as we know they exist.

Why don't we care about the constants? Well, the truth of the matter
is that we rarely know what they are. How many instructions does it
take on a PowerPC to run the Noodle expression (+ x y)? We'd have to
count how many Java VM instructions are used to implement this, then
count, for each of the Java VM instructions, how many Motorola 68000
instructions are used to implement them, and then count how many
PowerPC instructions are used to implement those, and so on. And this
just tells us how many instructions it took -- figuring out how much
time it takes on a modern processor is hard (consider superscalar,
pipelined CPUs with caches and so forth).
The point is we don't know the constants, and the constants could
change. All we know is that they're _some_ constants, and so we should
abstract away from them.

Here are some rules for manipulating O()-notation:
[Proofs: see CS 280 and CS 410]

1. r n^a = O(n^b) whenever a <= b, for any r
   (note r, a, b independent of n)
   "Multiplicative constants don't matter"

   Another consequence: n^2 = O(n^3) as well as O(n^4); in fact it's
   O of any order larger than n^2. We want the LEAST BOUND, which is
   O(n^2).

2. O(f) + O(g) = O(f+g)

   These two rules let us simplify that polynomial example pretty
   easily. [Do this in section!]

3. f = O(f)

4. O(a) = O(b) if a and b are any two positive constants.
   Just write O(1) for "essentially constant"

5. log_b(n) = O(log n) for ANY base b > 1 (independent of n).
   Note: CSists usually take logs to the base 2: log 8 = 3
         Mathematicians like base e
         Engineers I think like base 10
   But, for order of growth, it doesn't matter. They differ by a
   *constant* *factor*, and O()-notation drops constant factors.
       log_2(n) = log_10(n) * log_2(10) is O(log n)

This stuff is pretty old == 1800's (Bachmann, 1894; popularized by
Landau)

With << and O(.) at our disposal, let's look at the complexity of
times-1 from a recent lecture:

    (define (times-1 )
      (method ((a ) (b ))
        (if (= b 0)
            0
            (+ a (times-1 a (- b 1))))))

>> Recall that this is a recursive process

Let's compute the order of growth as a function of the magnitude of b
(not the number of digits).

The operations involved are:
  * times-1
  * built-ins, like if and = and +

Most built-in primitives are CONSTANT cost, O(1).
  * To apply them once!
  * This is true in 212 of arithmetic primitives and conditionals
  * SOME PRIMITIVES AREN'T CONSTANT COST (none you've seen yet...)
    [A point that some students forget on problem sets and exams.
     We'll show you some 2 lectures from today.]
  * Note: for induction we assume them sound
  * Also note that even for arithmetic operations this isn't really
    right (See CS410/421?) but it's OK for CS212.
The big question is, how many times does times-1 get called?
  * We could trace through the substitution model and count calls by
    hand.
  * That's a real pain, so we just try expressing the running time as
    a function T(b). (You could back this up by working with the
    substitution model if you like.)

    T(b) = T(b-1) + c      ---- if b>0
    T(0) = c

c's are just O(1)'s, constants or about that.

These equations are RECURRENCE RELATIONS
  * Basically, define T on big numbers in terms of T on smaller ones.
  * Kind of like a recursive definition.

We can sometimes *solve* these:

    T(b) = T(b-1) + c
         = T(b-2) + c + c
         = ...
         = T(b-b) + c + c ... + c      ;; b c's
         = T(0) + bc
         = (b+1)c

But we usually don't care about the solutions at this level of detail,
since we're just going to blur all the details with O()-notation
anyway:

    (b+1)c = O(b)

Why? Because we are interested in the complexity as a function of b!
It's cb+c, and we basically ignore constant summands and factors.

For this class, it'll help to learn some recurrence patterns
  * Like the linear recurrence we just saw.
  * O(n), O(log n), O(n^2), O(n log n) [do some in section]

Pattern 1
    T(n) = T(n-k) + c
    T(0) = c
  is O(n), where k >= 1 and c are constants

Here's another one:

    (define (fast-times )
      (method ((a ) (b ))
        (cond ((= b 0) 0)
              ((even? b) (fast-times (double a) (halve b)))
              (else: (+ a (fast-times a (- b 1)))))))

even?, =, double, halve, +, 1 are all constant-time (why?)

    T(0) = c
    T(b) = T(b/2) + c      if b is even > 0
    T(b) = T(b-1) + c      if b is odd

Consider the easy case, b is a power of 2:  b = 2 ^ (log_2 b)

    T(b) = T(b/2) + c
         = T(b/4) + c + c
         = T(b/8) + c + c + c
         ..
         = T(1) + (log_2 b) * c
         = T(0) + (1 + log_2 b) * c
         = (2 + log_2 b) * c
         = O(log b)

This general pattern
  * Make a problem smaller by a multiplicative factor
is a common pattern for algorithms.

    >> DIVIDE AND CONQUER <<   [see CS410]

It typically has logarithmic growth, which is a very good thing:
O(log n) is *much* smaller than O(n)

    log_10(10^100) = 100
    10^100         = 10,000,000,000...
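One way to sanity-check the two recurrences above is to count calls
directly. What follows is only a rough sketch in Python (not the course
language): times_1 and fast_times are loose transliterations of the two
definitions, and the counter argument and the count_calls helper are
additions made up for this illustration. It takes every primitive as
one constant-cost step, as assumed in the analysis.

```python
def times_1(a, b, counter):
    """Multiply by repeated addition; counter[0] tallies the calls."""
    counter[0] += 1
    if b == 0:
        return 0
    return a + times_1(a, b - 1, counter)


def fast_times(a, b, counter):
    """Multiply by doubling a and halving b whenever b is even."""
    counter[0] += 1
    if b == 0:
        return 0
    if b % 2 == 0:
        return fast_times(2 * a, b // 2, counter)
    return a + fast_times(a, b - 1, counter)


def count_calls(fn, a, b):
    """Hypothetical helper: run fn(a, b) and return (result, number of calls)."""
    counter = [0]
    result = fn(a, b, counter)
    return result, counter[0]


# Powers of two only, the "easy case" of the analysis.
# b stays small so times_1 does not hit Python's recursion limit.
for k in range(9):
    b = 2 ** k
    _, slow_calls = count_calls(times_1, 7, b)
    _, fast_calls = count_calls(fast_times, 7, b)
    print(b, slow_calls, fast_calls)
```

For b = 64, for instance, this counts 65 calls for times_1 and 8 for
fast_times, matching b+1 and 2 + log_2 b with c = 1.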
Pattern 2
    T(n) = T(n/k) + c
    T(0) = c
  is O(log n), where k > 1 and c are constants

We still have to finish the analysis for fast-times when b is not a
power of two.
  * At worst, we double the number of recursive calls, even if we take
    the else-branch every other time. b and b-1 can't *both* be odd,
    so every else-branch step is followed by a halving step.
  * So, at worst, T(b) is 2c(2+log b), which is still O(log b).

So, we're estimating all over the place.
  * Are we losing too much detail?

NO: it's a good model.

Running times for fast-times and times-1 on an antique Sparc:

    b        times-1    fast-times
    2        0.017      0
    10       0.017      0.033
    100      0.25       0.05
    1000     2.45       0.067
    10000    24.283     0.067

Analyzing running time of our programs (asymptotic analysis, as n gets
large):
  * <<        "grows more slowly than"
  * f = O(g)  "f is of order g"
  * Assume constant time primitives
  * Recurrences for expressions
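The worst-case claim for fast-times (at most 2c(2+log b) when b is not
a power of two) can be checked numerically. The sketch below is an
illustration in Python, not part of the course code: fast_times_calls
and worst_case_bound are hypothetical names, c is taken to be 1, and
the calls are counted with a loop that follows the same three branches
as the recurrence.

```python
import math


def fast_times_calls(b):
    """Count the calls fast-times would make for this b (no recursion)."""
    calls = 1                  # the initial call
    while b > 0:
        if b % 2 == 0:
            b //= 2            # even branch: T(b) = T(b/2) + c
        else:
            b -= 1             # odd branch:  T(b) = T(b-1) + c
        calls += 1             # each step is one more recursive call
    return calls


def worst_case_bound(b):
    """The bound from the notes with c = 1: 2 * (2 + log_2 b)."""
    return 2 * (2 + math.log2(b))


# Check every b in a range, not just powers of two.
LIMIT = 20_000
assert all(fast_times_calls(b) <= worst_case_bound(b) for b in range(1, LIMIT))

# Same b values as the timing table: times-1 makes b+1 calls.
for b in (2, 10, 100, 1000, 10000):
    print(b, b + 1, fast_times_calls(b))
```

The call counts for times-1 grow tenfold from row to row, like its
measured times, while the counts for fast-times grow by only a few
calls per row.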