When analyzing the running time or space usage of programs, we usually try to
estimate the time or space as function of the input size. For example, when
analyzing the worst case running time of a function that sorts a list of
numbers, we will be concerned with how long it takes as a function of the length
of the input list. For example, we say the standard insertion sort takes
time *T*(*n*) where *T*(*n*)*= c*n*^{2}*+k * * *for some constants *c
and k*. In contrast, merge sort takes time *T* ′(*n*)* =
c*′**n*log*_{2}(*n*)
+ *k*′*.*

The **asymptotic** behavior of a function *f(n)* (such as *f(n)=c*n
*or *f(n)=c*n ^{2}*, etc.)

By this measure, a linear algorithm (*i.e., f(n)=d*n+k*) is always
asymptotically better than a quadratic one (*e.g., f(n)=c*n ^{2}+q*).
That is because for any given (positive)

When we say that an algorithm runs in time *T(n)*, we mean that *T(n)*
is an upper bound on the running time that holds for all inputs of size *n*.
This is called *worst-case analysis*. The algorithm may very well take less
time on some inputs of size *n*, but it doesn't matter. If an algorithm
takes *T(n)=c*n ^{2}+k* steps on only a single input of each size

A popular alternative to worst-case analysis is *average-case analysis*.
Here we do not bound the worst case running time, but try to calculate the
expected time spent on a randomly chosen input. This kind of analysis is
generally harder, since it involves probabilistic arguments and often requires
assumptions about the distribution of inputs that may be difficult to justify.
On the other hand, it can be more useful because sometimes the worst-case
behavior of an algorithm is misleadingly bad. A good example of this is the
popular quicksort algorithm, whose worst-case running time on an input sequence
of length *n* is proportional to *n*^{2} but whose expected running time is proportional to *n* log *n.*

In estimating the running time of ` insert_sort`

(or any other program) we don't
know what the constants *c* or *k *are. We know that it is a constant
of moderate size, but other than that it is not important; we have enough
evidence from the asymptotic analysis to know that a ` merge_sort`

(see below) is
faster than the quadratic `insert_sort`

, even though the constants may differ
somewhat. (This does not always hold; the constants can sometimes make a
difference, but in general it is a very good rule of thumb.)

We may not even be able to measure the constant *c* directly. For
example, we may know that a given expression of the language, such as `if`,
takes a constant number of machine instructions, but we may not know exactly how
many. Moreover, the same sequence of instructions executed on a Pentium IV will
take less time than on a Pentium II (although the difference will be roughly a
constant factor). So these estimates are usually only accurate up to a constant
factor anyway. For these reasons, we usually ignore constant factors in
comparing asymptotic running times.

Computer scientists have developed a convenient notation for hiding the
constant factor. We write *O(n)* (read: ''order *n*'') instead of ''*cn*
for some constant *c*.'' Thus an algorithm is said to be *O(n)* or *linear
time* if there is a fixed constant *c* such that for all sufficiently
large *n*, the algorithm takes time at most *cn* on inputs of size *n*.
An algorithm is said to be *O(n ^{2})* or

*Polynomial time* means *n ^{O(1)}*, or

This is called *big-O notation*. It concisely captures the important differences
in the asymptotic growth rates of functions.

One important advantage of big-O notation is that it makes algorithms much easier to analyze, since we can conveniently ignore low-order terms. For example, an algorithm that runs in time

*10n ^{3} + 24n^{2} + 3n log n + 144*

is still a cubic algorithm, since

*10n ^{3} + 24n^{2} + 3n log n + 144*.

<= 10n^{3} + 24n^{3} + 3n^{3} + 144n^{3}

<= (10 + 24 + 3 + 144)n^{3}

= O(n^{3})

Of course, since we are ignoring constant factors, any two linear algorithms
will be considered equally good by this measure. There may even be some
situations in which the constant is so huge in a linear algorithm that even an
exponential algorithm with a small constant may be preferable in practice. This
is a valid criticism of asymptotic analysis and big-O notation. However, as a
rule of thumb it has served us well. Just be aware that it is *only* a rule
of thumb--the asymptotically optimal algorithm is not necessarily the best one.

Some common orders of growth seen often in complexity analysis are

O(1) |
constant |

O(log n) |
logarithmic |

O(n) |
linear |

O(n log n) |
"n log n" |

O(n^{2}) |
quadratic |

O(n^{3}) |
cubic |

Here *log* means *log _{2}* or the logarithm base 2,
although the logarithm base doesn't really matter since logarithms with different bases differ by
a constant factor. Note also that

**O**- Let
*f*and*g*be functions from positive integers to positive integers. We say*f*is*O(g(n))*(read: ''*f*is order*g*'') if*g*is an upper bound on*f*: there exists a fixed constant*c*and a fixed*n*_{0}such that for all*n≥n*_{0},*f(n) ≤ cg(n)*.Equivalently,

*f*is*O(g(n))*if the function*f(n)/g(n)*is bounded above by some constant. **o**- We say
*f*is*o(g(n))*(read: "*f*is little-o of*g*'') if for all arbitrarily small real*c > 0*, for all but perhaps finitely many*n*,*f(n) ≤ cg(n)*.Equivalently,

*f*is*o(g)*if the function*f(n)/g(n)*tends to 0 as*n*tends to infinity. That is, f is small compared to g. If*f*is*o(g)*then*f*is also*O(g)* **Ω**- We say that
*f*is Ω(*g*(*n*)) (read: "f is omega of g") if*g*is a*lower*bound on*f*for large*n.*Formally, f is Ω(g) if there is a fixed constant*c*and a fixed*n*_{0}such that for all*n*>*n*_{0},*c**g*(*n*)*≤**f*(*n*)For example, any polynomial whose highest exponent is

*n*is Ω(^{k}*n*). If^{k}*f*(*n*) is Ω(*g*(*n*)) then*g(n)*is O(*f*(*n*)). If*f*(*n*) is*o*(*g*(*n*)) then*f(n)*is*not*Ω(*g*(*n*)). **Θ**- We say that
*f*is Θ(*g(n)*) (read: "f is theta of g") if*g*is an accurate characterization of*f*for large*n:*it can be scaled so it is both an upper and a lower bound of*f*. That is,*f*is both O(*g*(*n*)) and Ω(*g*(*n*)). Expanding out the definitions of Ω and*O*,*f*is Θ(*g*(*n*)) if there are fixed constants*c*_{1}and*c*_{2}and a fixed*n*_{0}such that for all*n*>*n*_{0},*c*_{1}*g*(*n*)*≤**f*(*n*)*≤**c*_{2}*g*(*n*)For example, any polynomial whose highest exponent is

*n*is Θ(^{k}*n*). If^{k}*f*is Θ(g), then it is*O*(*g*) but not*o*(*g*). Sometimes people use*O*(*g*(*n*)) a bit informally to mean the stronger property Θ(*g*(*n*)); however, the two are different.

Here are some examples:

*n + log n*is*O(n)*and*n*), because for all*n > 1*,*n*<*n + log n < 2n*.*n*is^{1000}*o(2*, because^{n})*n*tends to 0 as^{1000}/2^{n}*n*tends to infinity.- For any fixed but arbitrarily small real number
*c*,*n log n*is*o(n*, since^{1+c})*n log n / n*tends to 0. To see this, take the logarithm^{1+c}*log(n log n / n*^{1+c})

= log(n log n) - log(n^{1+c})

= log n + log log n - (1+c)log n

= log log n - c log n

and observe that it tends to negative infinity.

The meaning of an expression like *O*(*n*^{2}) is really a
set of functions: all the functions that are *O*(*n*^{2}).
When we say that *f(n)* is *O(n*^{2}*)*, we mean that *f(n)* is a
member of this set. It is also common to write this as *f*(*n*)
= *O*(*g*(*n*)) although it is not really an equality.

We now introduce some convenient rules for manipulating expressions involving
order notation. These rules, which we state without proof, are useful for
working with orders of growth. They are really statements about sets of
functions. For example, we can read #2 as saying that the product of any two
functions in *O*(*f*(*n*)) and *O*(*g*(*n*)) is in
*O*(*f*(*n*)*g*(*n*)).

*cn*for any constant^{m}= O(n^{k})*c*and any*m ≤ k*.*O(f(n)) + O(g(n)) = O(f(n) + g(n))*.*O(f(n))O(g(n)) = O(f(n)g(n))*.*O(cf(n)) = O(f(n))*for any constant*c*.*c*is*O(1)*for any constant*c*.*log*for any base_{b}n = O(log n)*b*.

All of these rules (except #1) also hold for Q as well.

A binary search tree is one in which every node *n* satisfies the
**binary search tree invariant**: its left child and all the nodes below it
have values (or keys) less than that of *n*.
Similarly, the right child node and all nodes below it have values greater
than that of *n*.

The code for a binary search tree looks like the following. First, to check for an element or to add a new element, we simply walk down the tree.

(* contains(t,x) is whether x is in the tree *) let contains(t: tree, x: value): bool = match t with Empty -> false | Node{value=value; left=left; right=right} -> (match compare(x, value) with EQUAL -> true | LESS -> contains(left, x) | GREATER -> contains(right, x)) (* add(t,x) is a BST with the same values as t, plus x *) let add(t: tree, x: value): tree = let balance(t: tree): tree = t (* what to write here? *) in match t with Empty -> Node{value=x; left=Empty; right=Empty} | Node {value=value; left=left; right=right} -> (match compare(x, value) with EQUAL -> Node{value=x; left=left; right=right} | LESS -> Node{value=value; left=add(left, x); right=right} | GREATER -> Node{value=value; left=left; right=add(right,x) } )

When a tree satisfies the BST invariant, an in-order traversal of the tree nodes will visit the nodes in ascending order of their contained values. So it's easy to fold over all tree nodes in order.

Removing elements is a little trickier. If a node is a leaf, it can be removed. If it has one child, it can be replaced with its child. If it has two children, it can be replaced with either its immediate successor (or predecessor), which requires searching in the tree.

(* Returns: a tree just like t except that the node containing x is removed. * Checks: x is in the tree. *) let remove(t: tree, x: value): tree = (* Returns: a tree in which the successor of the root is removed, * along with the value of that successor. * Checks: the root has a successor. *) let removeSuccessor(t: tree): tree*value = match t with Empty -> raise Fail "impossible" | Node {value=value; left=Empty; right=right} -> (right, value) | Node {value=value; left=left; right=right} -> let (l, v) = removeSuccessor(left) in (Node{value=value; left=l; right=right}, v) in match t with Empty -> raise Fail "value not in the tree" | Node {value=value; left=left; right=right) -> match Int.compare(x, value) with LESS -> Node{value=value; left=remove(left, x); right=right) | GREATER -> Node{value=value; left=left; right=remove(right, x)} | EQUAL -> match (left, right) with (_, Empty) -> left | (Empty, _) -> right | _ -> let (r, v) = removeSuccessor(right) in Node {value=v; left=left; right=r}

The time required to find a node in a BST, or to remove a node from a BST, is
*O*(*h*), where *h* is
the **height** of the tree: the length of the longest path from the root
node to any leaf. If a tree is perfectly balanced, so that all leaf nodes are
at the same depth, then *h* is *O*(log
*n*). This makes binary search trees an attractive data structure,
especially for implementing ordered sets and maps.

The problem with BST's is that they are not necessarily balanced. In fact, if
nodes are added to a BST in increasing order, the resulting BST will be
essentially a linked list.
The solution to this problem is to make sure that the BST is
balanced. Making the BST perfectly balanced at every step
is too expensive, but if we are interested in asymptotic complexity,
we merely need the height *h* to be proportional
to *O*(log *n*). We will say that the
BST is balanced in this case.

There are many ways to keep binary search trees balanced. Some of the more popular methods are red-black trees, AVL trees, B-trees, and splay trees. But there are many more, including 2-3 trees, 2-3-4 trees, AA trees, and treaps. Each kind of binary search tree works by strengthening the representation invariant so that the tree must be approximately balanced.

1. Write an implementation of stacks using lists. What is the big-O running time of each operation? The signature for stacks is:

module type STACK = sig type 'a stack val empty : unit -> 'a stack val push : ('a * 'a stack) -> 'a stack val top : 'a stack -> 'a option val pop : 'a stack -> ('a stack) option end

2. Write an implementation of queues using lists. What is the big-O running time of each operation? The signature for queues is:

module type QUEUE = sig type 'a queue val empty : unit -> 'a queue val insert : ('a * 'a queue) -> 'a queue val first : 'a queue -> 'a option val rest : 'a queue -> 'a option end

3. Write some of the functions that occur in the List structure by hand (e.g., rev, @, map, foldl, etc.) and analyze them to determine their big-O running time.