3/06: 2. (3/08 preview) analyzing program performance
_______________________________________________________________________________
2.1 terminology and concepts for analyzing program performance

+ we typically analyze speed and memory usage as functions of the "size" of
  the input data.  how we measure "size" can vary from situation to situation.

  + the function for speed is the *(run(ning))time* *complexity* of the program
  + the function for memory is the *space* *complexity* of the program
  + note: the term *complexity* is partially explained below.

+ for memory used for data storage, we roughly count the maximum number of
  scalar values that need to be stored.

+ smaller numbers are better: less time (faster) or less space (smaller)

  + bonus/technical definition that you are not required to know but might
    find helpful: *performance* = multiplicative inverse (1/function):
    smaller function ==> bigger and better performance

+ for runtime, we roughly count the number of *flops* (FLoating point
  OPerations) or the number of "steps" (e.g. the number of lines of code)
  executed.

  + part of "roughly count" is to "over-estimate to make calculations simpler"

+ we typically care only about "how quickly" these functions grow as $n$ gets
  large.

  + roughly speaking, when functions have the same relative growth pattern,
    we say they have the same *complexity* or belong to the same
    *complexity class* ("class" = "set").

  + "how quickly" can often be thought of as the answer to the question:
    "what factor of increase is there if $n$ is multiplied by 10?"

  + this emphasis means we can concentrate on "dominant terms" and ignore
    constant factors:

    + *constant factor* means an actual, fixed, constant number, not a
      variable or a quantity/function that depends on $n$.

    + functions $n^3$, $n^3/10$, and $n^3/10 + n^2$ are about equally
      good/bad: for large $n$, substituting $n <-- 10 n$ results in about the
      same factor slowdown: about 1000 times slower!

      + $n^3$ vs $n^3/10$: identical factor slowdown, same relative growth
        pattern: we don't care about the constant factor $1/10$

      + $n^3/10$ vs $n^3/10 + n^2$: diminishingly small relative error as $n$
        grows, so virtually indistinguishable for large $n$: we don't care
        about the dominated term $n^2$ or the constant factor $1/10$
        (who cares about the difference between 1 year versus
        1 year + 1 minute?)

    + function $100 n^2$ is, in the long run, better than the 3 functions
      above:

      + maybe worse for "small" $n$, but often "small" $n$ is not a problem:
        often only "big" $n$ are a problem

      + substituting $n <-- 10 n$ results in only a factor 100 slowdown, so
        this function has a slower growth pattern, so its performance
        (smaller is better) catches up and overtakes the functions above.
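to make the "factor of slowdown" idea concrete, here is a small sketch (not
part of the notes; the script name and layout are just one possible way to do
it) that evaluates each of the four functions above at $n$ and at $10 n$ and
prints the ratio:

    % growth_factors.m -- illustrative sketch: slowdown factor f(10 n)/f(n)
    % for the four functions discussed above.
    f1 = @(n) n.^3;              % n^3
    f2 = @(n) n.^3 / 10;         % n^3/10
    f3 = @(n) n.^3 / 10 + n.^2;  % n^3/10 + n^2
    f4 = @(n) 100 * n.^2;        % 100 n^2
    for n = [10 100 1000 10000]
        fprintf('n = %6d: %8.1f %8.1f %8.1f %8.1f\n', n, ...
                f1(10*n)/f1(n), f2(10*n)/f2(n), f3(10*n)/f3(n), f4(10*n)/f4(n));
    end
    % the first three ratios approach 1000 as n grows; the last is exactly 100.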
_______________________________________________________________________________
2.2 concrete example comparing $100 n^2$ and $n^3/10$:

computers these days (1 gigahertz clock speed) can do about $10^9$ (1 billion)
steps per second.  let us compare $100 n^2$ versus $n^3/10$ steps and say
anything that takes less than a second in practice still takes at least one
second, e.g. for us to type a command or double-click a program.

        n        $100 n^2$  vs  $n^3/10$     time at $10^9$ steps/sec
    -------------------------------------------------------------------
     n = 10^0     10^2     >    10^(-1)      1 sec     =   1 sec
     n = 10^1     10^4     >    10^2         1 sec     =   1 sec
     n = 10^2     10^6     >    10^5         1 sec     =   1 sec
     n = 10^3     10^8     =    10^8         1 sec     =   1 sec
     n = 10^4     10^10    <    10^11        10 sec    <   100 seconds
     n = 10^5     10^12    <    10^14        1/4 hour  <   1 day
     n = 10^6     10^14    <<   10^17        1 day     <<  3 years
     n = 10^7     10^16    <<   10^20        4 months  <<  3000 years
    -------------------------------------------------------------------
    (usually we can't perform 10^(-1) = 1/10 of a step, but nevermind)

if we change the coefficients, we will change the crossover point, but the
overall pattern is the same: a quadratic function (e.g. $n^2$ or $100 n^2$) is
always eventually better than a cubic function (e.g. $n^3$ or $n^3 / 10$).

that is, we can ignore constant factors (and dominated terms) to get the
high-level picture for large $n$.  this idea is captured by *big-Oh notation*.
_______________________________________________________________________________
2.3 big-Oh notation ("big-Oh" stands for "the capital letter `O'"):

+ we consider only functions whose values are always positive.

+ since we ignore some details (dominated terms and constant factors), we are
  treating many different functions as if they were the same.

+ $O(f(n))$ is the *complexity class* (i.e. set) of functions that are equal
  to $f(n)$ OR smaller than $f(n)$ from this point of view (namely, ignoring
  dominated terms and constant factors).

+ bonus/technical definition that you are not required to know but might find
  helpful if you are comfortable with limit-like things:

  $g(n)$ is in the complexity class $O(f(n))$ if there are constants $N_g > 0$
  and $k_g > 0$ (note: the constants depend on $g$!) such that for all large
  $n$ (i.e. all $n >= N_g$):

      $g(n) < k_g f(n)$

  is true.

examples:

+ $O(1)$ is the complexity class of "constant functions", e.g. $g(n) = 3$
  ($g$ returns 3 no matter what $n$ is) and $h(n) = 1000000$ ($h$ returns
  1000000 no matter what $n$ is)

+ $O(n^3)$, $O(n^3/10)$, and $O(n^3/10 + n^2)$ are the same complexity class
  (since we ignore dominated terms and constant factors)

+ $O(n^2)$ and $O(100 n^2)$ are the same complexity class.

+ $O(1000) = O(1)$ is better than $O(100 n) = O(n)$, which is better than
  $O(10 n^2) = O(n^2)$, which is better than $O(n^3)$.

notes:

+ be a little careful about throwing away "dominated terms" or "constant"
  factors before you get the final answer
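the bonus definition can also be checked numerically.  the sketch below is not
from the notes; the particular choices $g(n) = n^3/10 + n^2$, $f(n) = n^3$,
$k_g = 1$, $N_g = 10$, and the finite range of tested $n$ are assumptions made
just for illustration:

    % check_bigOh.m -- illustrative sketch: test g(n) < k_g * f(n)
    % for a sample of "large" n (n >= N_g).
    g  = @(n) n.^3 / 10 + n.^2;   % assumed g
    f  = @(n) n.^3;               % assumed f
    kg = 1;                       % assumed constant k_g
    Ng = 10;                      % assumed constant N_g
    n  = Ng : 1000;               % finite sample standing in for "all large n"
    ok = all(g(n) < kg * f(n));   % 1 if the inequality held for every tested n
    fprintf('inequality held for all tested n: %d\n', ok);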
_______________________________________________________________________________
2.4 simple examples of analyzing programs

example 1: $O(1)$ time and space complexity

                            number of times each line is executed:
    x = 2;                  1 +
    y = 3;                  1 +
    a = x*y + 10;           1 +
    z = .5 * (z + a/z);     1 +
    z = .5 * (z + a/z);     1 +
    z = .5 * (z + a/z);     1
                            = 6

+ space
  + store at most 4 numbers at a time: $x$, $y$, $a$, $z$
  + $O(4) = O(1)$
+ time
  + steps: $O(6) = O(1)$
  + flops: $O(11) = O(1)$
    + 4 multiplications + 4 additions + 3 divisions = 11 flops
  + note: same answer (complexity class) for #steps and #flops!

example 2: $O(1)$ space, $O(n)$ time

    function x = foo(n)     number of times each line is executed:
    x = 0;                  1
    i = 0;                  1
    while i ~= n            1+n  (for i = 0..n: once for i = 0, n times
                                  for i = 1..n)
        x = x + 2*i;        n    (for i = 0..n-1)
        i = i+1;            n    (for i = 0..n-1)
    end                     n?   (for i = 0..n-1)

+ space:
  + store at most 3 numbers at a time: $x$, $n$, $i$
  + $O(3)$ = $O(1)$
+ time, hard way:
  + steps: $O(3n+3) = O(4n+3) = O(n)$ (without or with counting the $end$s)
  + flops: $O(3n) = O(n)$
    + $n$ multiplications $2*i$
    + $n$ additions $x + 2*i$
    + $n$ additions $i+1$
  + note: same answer (complexity class) for #steps and #flops!
  + the complexity class is *robust*, i.e. resists change when "small
    details" are changed:
    + count $end$s as being executed or not?
    + count #flops or #steps?
_______________________________________________________________________________
2.5 easier way to get time complexity (when you get used to it):

+ key insight: loops (and function calls) are what make programs slow!
  without loops (or function calls), programs can run only a constant number
  of steps, which is fast compared to anything that depends on $n$,
  e.g. $O(n^2)$.

+ [3/13] start counting from outer loops and work your way in

  + count steps, ignoring constant factors
  + but do NOT ignore dependencies on input values!

example 2, time complexity revisited:

+ loop guard tested once, plus possibly one iteration of loop body:
  number of steps is bounded by some constant, say, $C$

      i ~= n
      x = x + 2*i;
      i = i+1;

+ loop, overall: $C * (n+1)$

  + do guard+body combination at most $n+1$ times
  + each time costs at most $C$

+ outside loop:

  + constant amount of work: dominated by $C (n+1)$ so ignore it

+ throw away constant factor $C$ from $C (n+1)$
+ throw away $1$ from $n+1$ since dominated by $n$
+ final answer: $O(n)$
_______________________________________________________________________________
2.6 analyzing programs that process a sequence, e.g. an input sequence

+ usually we consider the size $n$ of the input data to be the length of the
  input sequence (rather than, say, the sum of the absolute values of the
  input values).

+ for processing a sequence, usually we have to *inspect* --"look at", i.e.
  read from the vector or read as input from the user-- each value: that is
  already $n$ steps.

+ thus, the best we can do is linear, i.e. $O(n)$, i.e. proportional to the
  size of the input sequence: substituting $n <-- 10 n$ yields a factor 10
  slowdown, the same factor by which we increased the size of the input.
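as a concrete illustration of this last point, here is a small one-pass
function in the style of example 2 (this particular function, seqsum, is a
sketch made up for illustration, not something defined in the notes): it
inspects each of the $n$ input values exactly once, so it takes $O(n)$ time
while storing only $O(1)$ extra values.

    % seqsum.m -- illustrative sketch: sum the values of an input sequence v
    % by inspecting each value exactly once.
    function s = seqsum(v)
        s = 0;                 % O(1) extra space: just s and i
        for i = 1:length(v)    % loop body executes n = length(v) times
            s = s + v(i);      % 1 addition per element ==> O(n) flops
        end
    end
    % substituting a 10x longer v gives about a 10x slowdown: linear, O(n).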