3/06: 2. (3/08 preview) analyzing program performance
_______________________________________________________________________________
2.1 terminology and concepts for analyzing program performance

+ we typically analyze speed and memory usage as functions of the "size" of
  the input data.  how we measure "size" can vary from situation to situation.

  + the function for speed is the *(run(ning))time* *complexity* of the program
  + the function for memory is the *space* *complexity* of the program
  + note: the term *complexity* is partially explained below.

+ for memory used for data storage, we roughly count the maximum number of
  scalar values that need to be stored.

+ smaller numbers are better: less time (faster) or less space (smaller)

  + bonus/technical definition that you are not required to know but might
    find helpful: *performance* = multiplicative inverse (1/function):
    smaller function ==> bigger and better performance

+ for runtime, we roughly count the number of *flops* (FLoating point
  OPerations) or the number of "steps" (e.g. the number of lines of code)
  executed.

  + part of "roughly count" is to "over-estimate to make calculations simpler"

+ we typically care only about "how quickly" these functions grow as $n$ gets
  large.

  + roughly speaking, when functions have the same relative growth pattern,
    we say they have the same *complexity* or belong to the same
    *complexity class* ("class" = "set").

  + "how quickly" can often be thought of as the answer to the question:
    "what factor of increase is there if $n$ is multiplied by 10?"

  + this emphasis means we can concentrate on "dominant terms" and ignore
    constant factors:

    + *constant factor* means an actual, fixed, constant number, not a
      variable or a quantity/function that depends on $n$.

    + functions $n^3$, $n^3/10$, and $n^3/10 + n^2$ are about equally
      good/bad: for large $n$, substituting $n <-- 10 n$ results in about the
      same factor slowdown: about 1000 times slower!

      + $n^3$ vs $n^3/10$: identical factor slowdown, same relative growth
        pattern: we don't care about the constant factor $1/10$

      + $n^3/10$ vs $n^3/10 + n^2$: diminishingly small relative error as $n$
        grows, so virtually indistinguishable for large $n$: we don't care
        about the dominated term $n^2$ or the constant factor $1/10$
        (who cares about the difference between 1 year versus
        1 year + 1 minute?)

    + function $100 n^2$ is, in the long run, better than the 3 functions
      above:

      + maybe worse for "small" $n$, but often "small" $n$ is not a problem:
        often only "big" $n$ are a problem

      + substituting $n <-- 10 n$ results in only a factor 100 slowdown, so
        this function has a slower growth pattern, so its performance
        (smaller is better) catches up and overtakes the functions above.
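to make the "factor of slowdown" idea concrete, here is a small sketch (not
part of the notes; the script name and layout are just one possible way to do
it) that evaluates each of the four functions above at $n$ and at $10 n$ and
prints the ratio:

    % growth_factors.m -- illustrative sketch: slowdown factor f(10 n)/f(n)
    % for the four functions discussed above.
    f1 = @(n) n.^3;              % n^3
    f2 = @(n) n.^3 / 10;         % n^3/10
    f3 = @(n) n.^3 / 10 + n.^2;  % n^3/10 + n^2
    f4 = @(n) 100 * n.^2;        % 100 n^2
    for n = [10 100 1000 10000]
        fprintf('n = %6d: %8.1f %8.1f %8.1f %8.1f\n', n, ...
                f1(10*n)/f1(n), f2(10*n)/f2(n), f3(10*n)/f3(n), f4(10*n)/f4(n));
    end
    % the first three ratios approach 1000 as n grows; the last is exactly 100.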
_______________________________________________________________________________
2.2 concrete example comparing $100 n^2$ and $n^3/10$:

computers these days (1 gigahertz clock speed) can do about $10^9$ (1 billion)
steps per second.  let us compare $100 n^2$ versus $n^3/10$ steps and say
anything that takes less than a second in practice still takes at least one
second, e.g. for us to type a command or double-click a program.

        n        $100 n^2$  vs  $n^3/10$     time at $10^9$ steps/sec
    -------------------------------------------------------------------
     n = 10^0     10^2     >    10^(-1)      1 sec     =   1 sec
     n = 10^1     10^4     >    10^2         1 sec     =   1 sec
     n = 10^2     10^6     >    10^5         1 sec     =   1 sec
     n = 10^3     10^8     =    10^8         1 sec     =   1 sec
     n = 10^4     10^10    <    10^11        10 sec    <   100 seconds
     n = 10^5     10^12    <    10^14        1/4 hour  <   1 day
     n = 10^6     10^14    <<   10^17        1 day     <<  3 years
     n = 10^7     10^16    <<   10^20        4 months  <<  3000 years
    -------------------------------------------------------------------
    (usually we can't perform 10^(-1) = 1/10 of a step, but nevermind)

if we change the coefficients, we will change the crossover point, but the
overall pattern is the same: a quadratic function (e.g. $n^2$ or $100 n^2$) is
always eventually better than a cubic function (e.g. $n^3$ or $n^3 / 10$).

that is, we can ignore constant factors (and dominated terms) to get the
high-level picture for large $n$.  this idea is captured by *big-Oh notation*.
_______________________________________________________________________________
2.3 big-Oh notation ("big-Oh" stands for "the capital letter `O'"):

+ we consider only functions whose values are always positive.

+ since we ignore some details (dominated terms and constant factors), we are
  treating many different functions as if they were the same.

+ $O(f(n))$ is the *complexity class* (i.e. set) of functions that are equal
  to $f(n)$ OR smaller than $f(n)$ from this point of view (namely, ignoring
  dominated terms and constant factors).

+ bonus/technical definition that you are not required to know but might find
  helpful if you are comfortable with limit-like things:

  $g(n)$ is in the complexity class $O(f(n))$ if there are constants $N_g > 0$
  and $k_g > 0$ (note: the constants depend on $g$!) such that for all large
  $n$ (i.e. all $n >= N_g$):

      $g(n) < k_g f(n)$

  is true.

examples:

+ $O(1)$ is the complexity class of "constant functions", e.g. $g(n) = 3$
  ($g$ returns 3 no matter what $n$ is) and $h(n) = 1000000$ ($h$ returns
  1000000 no matter what $n$ is)

+ $O(n^3)$, $O(n^3/10)$, and $O(n^3/10 + n^2)$ are the same complexity class
  (since we ignore dominated terms and constant factors)

+ $O(n^2)$ and $O(100 n^2)$ are the same complexity class.

+ $O(1000) = O(1)$ is better than $O(100 n) = O(n)$, which is better than
  $O(10 n^2) = O(n^2)$, which is better than $O(n^3)$.

notes:

+ be a little careful about throwing away "dominated terms" or "constant"
  factors before you get the final answer
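the bonus definition can also be checked numerically.  the sketch below is not
from the notes; the particular choices $g(n) = n^3/10 + n^2$, $f(n) = n^3$,
$k_g = 1$, $N_g = 10$, and the finite range of tested $n$ are assumptions made
just for illustration:

    % check_bigOh.m -- illustrative sketch: test g(n) < k_g * f(n)
    % for a sample of "large" n (n >= N_g).
    g  = @(n) n.^3 / 10 + n.^2;   % assumed g
    f  = @(n) n.^3;               % assumed f
    kg = 1;                       % assumed constant k_g
    Ng = 10;                      % assumed constant N_g
    n  = Ng : 1000;               % finite sample standing in for "all large n"
    ok = all(g(n) < kg * f(n));   % 1 if the inequality held for every tested n
    fprintf('inequality held for all tested n: %d\n', ok);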
_______________________________________________________________________________
2.4 simple examples of analyzing programs

example 1: $O(1)$ time and space complexity

                            number of times each line is executed:
    x = 2;                  1 +
    y = 3;                  1 +
    a = x*y + 10;           1 +
    z = .5 * (z + a/z);     1 +
    z = .5 * (z + a/z);     1 +
    z = .5 * (z + a/z);     1
                            = 6

+ space
  + store at most 4 numbers at a time: $x$, $y$, $a$, $z$
  + $O(4) = O(1)$
+ time
  + steps: $O(6) = O(1)$
  + flops: $O(11) = O(1)$
    + 4 multiplications + 4 additions + 3 divisions = 11 flops
  + note: same answer (complexity class) for #steps and #flops!

example 2: $O(1)$ space, $O(n)$ time

    function x = foo(n)     number of times each line is executed:
    x = 0;                  1
    i = 0;                  1
    while i ~= n            1+n  (for i = 0..n: once for i = 0, n times
                                  for i = 1..n)
        x = x + 2*i;        n    (for i = 0..n-1)
        i = i+1;            n    (for i = 0..n-1)
    end                     n?   (for i = 0..n-1)

+ space:
  + store at most 3 numbers at a time: $x$, $n$, $i$
  + $O(3)$ = $O(1)$
+ time, hard way:
  + steps: $O(3n+3) = O(4n+3) = O(n)$ (without or with counting the $end$s)
  + flops: $O(3n) = O(n)$
    + $n$ multiplications $2*i$
    + $n$ additions $x + 2*i$
    + $n$ additions $i+1$
  + note: same answer (complexity class) for #steps and #flops!
  + the complexity class is *robust*, i.e. resists change when "small
    details" are changed:
    + count $end$s as being executed or not?
    + count #flops or #steps?
_______________________________________________________________________________
2.5 easier way to get time complexity (when you get used to it):

+ key insight: loops (and function calls) are what make programs slow!
  without loops (or function calls), programs can run only a constant number
  of steps, which is fast compared to anything that depends on $n$,
  e.g. $O(n^2)$.

+ [3/13] start counting from outer loops and work your way in

  + count steps, ignoring constant factors
  + but do NOT ignore dependencies on input values!

example 2, time complexity revisited:

+ loop guard tested once, plus possibly one iteration of loop body:
  number of steps is bounded by some constant, say, $C$

      i ~= n
      x = x + 2*i;
      i = i+1;

+ loop, overall: $C * (n+1)$

  + do guard+body combination at most $n+1$ times
  + each time costs at most $C$

+ outside loop:

  + constant amount of work: dominated by $C (n+1)$ so ignore it

+ throw away constant factor $C$ from $C (n+1)$
+ throw away $1$ from $n+1$ since dominated by $n$
+ final answer: $O(n)$
_______________________________________________________________________________
2.6 analyzing programs that process a sequence, e.g. an input sequence

+ usually we consider the size $n$ of the input data to be the length of the
  input sequence (rather than, say, the sum of the absolute values of the
  input values).

+ for processing a sequence, usually we have to *inspect* --"look at", i.e.
  read from the vector or read as input from the user-- each value: that is
  already $n$ steps.

+ thus, the best we can do is linear, i.e. $O(n)$, i.e. proportional to the
  size of the input sequence: substituting $n <-- 10 n$ yields a factor 10
  slowdown, the same factor by which we increased the size of the input.
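as a concrete illustration of this last point, here is a small one-pass
function in the style of example 2 (this particular function, seqsum, is a
sketch made up for illustration, not something defined in the notes): it
inspects each of the $n$ input values exactly once, so it takes $O(n)$ time
while storing only $O(1)$ extra values.

    % seqsum.m -- illustrative sketch: sum the values of an input sequence v
    % by inspecting each value exactly once.
    function s = seqsum(v)
        s = 0;                 % O(1) extra space: just s and i
        for i = 1:length(v)    % loop body executes n = length(v) times
            s = s + v(i);      % 1 addition per element ==> O(n) flops
        end
    end
    % substituting a 10x longer v gives about a 10x slowdown: linear, O(n).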