# CS 312 Lecture 18Amortized Algorithms

(See: Introduction to Algorithms, Cormen, Leiserson and Rivest and Stein, Chap 17 of 2nd ed.)

Amortized analysis refers to determining the time-averaged running time for a sequence of operations.  It is different from what is commonly referred to as average case analysis, because amortized analysis does not make any assumption about the distribution of the data values, whereas average case analysis assumes the data are not "bad" (e.g., some sorting algorithms do well on "average" over all input orderings but very badly on certain input orderings).  That is, amortized analysis is a worst case analysis, but for a sequence of operations, rather than for individual operations.  It uses the fact that we are analyzing a sequence to "spread out" the costs (think of insurance where everyone can pay a relatively modest amount despite some catastrophic costs).

The motivation for amortized analysis is to better understand the running time of certain techniques, where standard worst case analysis provides an overly pessimistic bound.  Amortized analysis generally applies to a method that consists of a sequence of operations, where the vast majority of the operations are cheap, but some of the operations are expensive.  If we can show that the expensive operations are particularly rare we can "charge them" to the cheap operations, and only bound the cheap operations.

The general approach is to assign an artificial cost to each operation in the sequence, such that the total of the artificial costs for the sequence of operations bounds total of the real costs for the sequence.  This artificial cost is called the amortized cost of an operation. In order to analyze the running time, the amortized cost thus is a correct way of understanding the overall running time - but note that particular operations can still take longer so it is not a way of bounding the running time of any individual operation in the sequence.

Three approaches are commonly used for amortized analysis, and have been termed the aggregate method, the banker's method (tokens) and the physicist's method (potential functions).

## Binary Counter

Consider the problem of storing a very large binary counter.  Say we decide to use an array, where each entry A[i] stores the i-th bit.

We will analyze the running time of the operation of counting using this representation, so the sequence of operations is a sequence of increments of the counter.

We will use the standard way of incrementing the counter, which is to toggle the lowest order bit.  If that bit switches to a 0 we toggle the next higher order bit, and so forth until the bit that we toggle switches to a 1 at which point we stop.

```  A[m]  A[m-1] ...    A[3]  A[2]  A[1]  A[0]      cost --------------------------------------------      ----     0     0             0     0     0     0                                                     1     0     0             0     0     0     1                                                     2     0     0             0     0     1     0                                                     1     0     0             0     0     1     1                                                     3     0     0             0     1     0     0                                                     1     0     0             0     1     0     1                                                     2     0     0             0     1     1     0```

When the result of the increment operation is n, the number of bits that change is at most 1+floor(lg n), that is the number of bits in the binary representation of n.

Thus in a traditional worst case analysis, the cost of counting up to n, which a sequence of n increments, is O(n log n).

But do you really think this is taking that much time?  Does each increment in the sequence really cost O(log n)?  How do we show its less?  This is the goal of amortized analysis.

Aggregate Method.  Let's consider how often we flip each individual bit, and sum those up to bound the total, rather than individually obtaining a worst case bound for each bit.  How often is A[0] toggled in counting up to n?  How often is A[1] toggled, A[2] toggled, etc?  Every time, every other time, every fourth time, ...

n + floor(n/2) + floor(n/4) + ...
<=   n + n/2 + n/4 + ....
<=   2n

So the amortized cost of an increment is 2, and the time to count to n (n increments) is 2n=O(n).

Banker's method. This approach is more akin to an insurance view, or an accounting view.  The idea is to see how much each individual operation needs to "pay" in order for there to be enough "money" to do all the work.  The only work here is toggling a bit, so lets say that it costs \$1 to toggle a bit.  The question is, how much money do we need to allocate to each increment to have banked up enough to pay for all the toggles as they occur.

Of course having just used the aggregate method, \$2 would be a good guess (though not a guarantee, maybe we run out along the way at some point even though there is enough money later).

What happens on an increment?  Zero or more bits change from a 1 to a 0, and then one bit changes from a 0 to a 1.  Let's say we want to leave \$1 at each 1-bit so that when it needs to turn to 0 we can pay for the work (because there may be many of those so we need to pay up in advance).   Then a simple scheme, costing \$2 per increment, is to use up the \$1 to flip each 1-bit to a 0, and then pay \$1 to flip the high-order 0-bit to a 1 and another \$1 to leave there for later use.

This shows that the amortized cost of the increment is 2, and the overall time is O(n) for n increments.

## Queue

Earlier in the semester we saw a way of implementing a queue (FIFO) using two stacks (LIFO).  Say that our stack has three operations, push, pop and empty, each with cost 1.

We saw that a queue can be implemented as

• enqueue(x): push x onto stack1
• dequeue(): if stack2 is empty then pop the entire contents of stack1 pushing each element in turn onto stack 2.  Now pop from stack2 and return the result.

We've seen earlier that this algorithm is correct, now we will consider the running time in more detail.

A conventional worst case analysis would establish that dequeue takes O(n) time but this is clearly a weak bound for a sequence of operations because very few dequeues can take that long.  That is, O(n^2) is not a very accurate characterization of how long a sequence of n enqueue and dequeue operations will take, even though in the worst case an individual dequeue can certainly take O(n) time.

To simplify the amortized analysis we will consider only the cost of the push and pop operations and not of checking whether stack2 is empty, as that operation clearly costs 1 per dequeue.

Aggregate method.  As each element is processed, it is clearly pushed at most twice and popped at most twice.  However, if an element is enqueued and never dequeued then it is pushed at most twice and popped at most once. Thus the cost of each enqueue is 3 and of each dequeue is 1.

Banker's method. Allocate \$2 to each element stored in the queue structure.  This will cover the cost of popping it and pushing it from stack1 to stack2 if that ever needs to be done.  There is an additional cost of \$1 for the initial push onto stack1, for a cost of \$3.  The dequeue operations cost \$1 to pop from stack2.

Note that the analysis in both cases seems to charge more for storing than removing, even though in the code it is the other way around.  Amortized analysis bounds the overall sequence, which in this case depends on how much stuff is stored in the data structure.  It does not bound the individual operations.

## Dynamic Array Resizing

When we use an array to implement a hash table or a stack, the array is of a fixed size and may run out of storage as elements are inserted.  One common approach is to double the size of the array when it fills up.  This is precisely the kind of technique that is amenable to amortized analysis - a sequence of operations, such as push/pop or insert/remove which are usually cheap but sometimes expensive, and where we want to understand the time bound for an overall sequence.

For concreteness lets consider the case of a stack implemented using an array.

The push operation increments a counter keeping track of where the top element is and puts the new element at that location.  If the array is full, then before incrementing and storing we create a new array of twice the size and copy all the old elements to the new array.

The pop operation simply reads the element indexed by the counter and decrements the counter.

The basic operations of incrementing or decrementing a counter and storing or reading an item in an array are constant time.

Aggregate method.  Assuming the array is arbitrarily large (no doubling is needed) the cost of all the operations is O(n) for n push and pop operations, because push is just increment and store and pop is just read and decrement, all of which are constant time.  For simplicity lets assume each of these operations costs 1 unit of time.

In the worst case a doubling happens after the first element, the second element, the fourth element, etc. Let's say copying costs 1 unit (though one might argue it costs 2 units, one to read and one to write). In that case, the total cost for doubling is at worst

1+2+4+ ... + n/2 + n < 2n

Because each doubling involves copying all the elements at that time (note: what about array initialization?  Then would be 2+4+...+2n.

Thus the overall cost is 3n, or O(n), and the amortized cost of an operation is 3 (or 5 if we account for array initialization).

What would be the amortized running time if we just added a constant amount of new storage instead of doubling?  What about if we multiplied the storage by 1.5 or 3 rather than doubling?

Banker's method.  Say we keep \$2 for each element stored in the stack that is beyond the midpoint of the current array.  Then in the normal case we need \$3 to store an item, \$1 to do the work of storing it, and \$2 that we keep for that item in the future.  Thus the cost of a push is \$3 (note the cost of a pop is just \$1).

Now when we need to do a doubling operation, there are clearly twice as many element to copy as there are elements with money allocated to them.  Again if copying an element costs \$1, we have enough money to pay for all the copies needed.

Before doubling:

```                 \$ \$ \$ \$ \$ \$ \$ \$                  \$ \$ \$ \$ \$ \$ \$ \$ +-------------------------------+ |x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x| +-------------------------------+                 ^                 |              midpoint <--------------L---------------> ```
After doubling:

```+-------------------------------+-------------------------------+ |x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x| | | | | | | | | | | | | | | | | +-------------------------------+-------------------------------+                                 ^                                 |                              midpoint```

For all of the examples we have considered here, the aggregate method worked pretty well; we didn't really need the banker's method.  Sometimes it is a useful way of thinking about things where it's not so easy to just write down a sum.  However, note that it is an "art" to figure out what operations to bank money on, and how much to bank.

## Potential Functions (Physicist's Method)

This is a more formal version of the banker's method.  Instead of distributing money across items in the data structure, we will keep money in a piggy bank, as what's really important is that we have enough money to pay for operations as they happen.  We will call the amount of money in the bank the potential function, Phi, for the problem.

For each of the problems considered above we can write down a potential function.

For the counter problem:

Phi(counter) = # of 1 bits in the counter

For the queue implemented in terms of a stack:

Phi(queue) = 2 * size of stack 1

For the stack implemented in terms of a resizing array:

Phi(stack) = 2(k-L/2) if k>=L/2, 0 otherwise

where L is the current array size and k is the number of elements currently in the stack.

Say a data structure changes from state S to state S' as the result of doing some operation, we can define the amortized cost of the operation using Phi as

amortized cost = actual cost + Phi(S') - Phi(S)

Which is simply the amount of additional money that we need to maintain our piggy bank and pay for the work.

Note that if the potential is non-negative and starts at zero, then the sum of the actual costs is <= the sum of the amortized costs.

Next we will consider splay trees, where we will see how useful this formalism is.

## Splay Trees and Amortized Analysis

A splay tree is an efficient implementation of a binary search tree that takes advantage of locality in the incoming lookup requests. We will cover splay trees in recitation.