Memoization

Even when programming in a functional style, *O*(1)
mutable map abstractions like arrays and hash tables can be extremely
useful. One important use of hash tables is for **memoization**, in which
a previously computed result is stored in the table and retrieved later.
Memoization is a powerful technique for building efficient algorithms,
especially in a functional language.

For example, consider the problem of computing the *n*th Fibonacci number,
defined by *f*(*n*) = *f*(*n*-1) + *f*(*n*-2), with
*f*(0) = *f*(1) = 1. We can translate this directly into an SML
algorithm:

fun f(n) = if n<2 then 1 else f(n-1) + f(n-2)

Unfortunately, this code takes exponential time:
Θ(φ^{n}), where φ is the golden ratio, (1 + √5)/2.
We can easily verify this asymptotic bound by using the substitution method:
*k*φ^{n} = *k*φ^{n-1} + *k*φ^{n-2},
because φ^{2} = φ + 1.

The key observation is that the recursive implementation is inefficient
because it recomputes the same Fibonacci numbers
over and over again. If we record Fibonacci numbers as they are computed,
we can avoid this redundant work.
The idea is that whenever we compute `f(n)`, we store it
in a table indexed by `n`.
In this case the keys are integers, so we can implement this
table using an array:
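A sketch of one way this might be written, using an `int option` array as the table; the array sizing with `Int.max` is an implementation choice:

```sml
(* Memoized Fibonacci: a sketch.  table holds SOME (f i) at index i
   once f i has been computed, and NONE otherwise. *)
fun f (n: int) : int =
  let
    val table = Array.array (Int.max (n + 1, 1), NONE)

    (* f_mem checks the table before doing any work *)
    fun f_mem n =
          (case Array.sub (table, n) of
             SOME result => result
           | NONE =>
               let val result = f_alg n
               in (Array.update (table, n, SOME result); result) end)

    (* f_alg is the original algorithm, recursing through f_mem *)
    and f_alg n = if n < 2 then 1 else f_mem (n - 1) + f_mem (n - 2)
  in
    f_mem n
  end
```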

The function `f_alg` contains the original recursive algorithm,
except that it calls itself via the function `f_mem`, which
checks the table first. In a call to `f(n)`, the function
`f_alg` can be called at most `n` times, and each of
those calls can call `f_mem` at most twice. So the total
time is asymptotically linear in `n`.
The speedup from memoization is more than a million at
*n*=40.

Although this code uses imperative constructs (specifically,
`Array.update`), the side effects are not visible outside the
function `f`. Therefore these are **benign side
effects** that do not need to be mentioned in the specification of
`f`.

Suppose we want to throw a party for a company whose org chart is a binary
tree. Each employee has an associated “fun value” and we want
the set of invited employees to have a maximum total fun value. However, no
employee is fun if their immediate superior is invited. There are 2^{n}
possible invitation lists, so the naive algorithm takes exponential time.
We can use memoization to turn this into a linear-time algorithm.

We start by defining a datatype to represent the employees.

datatype tree = Empty | Node of int * tree * tree

Now, how can we solve this recursively? One important observation
is that in any tree, the optimal invitation list that doesn't include
the root node will be the union of optimal invitation lists for the
left and right subtrees. And the optimal invitation list that does
include the root node will be the union of optimal invitation lists
for the left and right children that do not include their respective
root nodes. So it seems useful to have functions that optimize
the invite lists for the case where the root node is required to be
invited, and for the case where the root node is excluded. We'll call
these two functions `party_in` and `party_out`.
Then the result of `party` is just the maximum of these
two functions:
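Concretely, this might be written as follows (a sketch, treating the `int` at each node as that employee's fun value):

```sml
(* party_out t: best total fun value when the root of t is not invited.
   party_in t:  best total fun value when the root of t is invited.
   party t:     best total fun value overall. *)
fun party_out Empty = 0
  | party_out (Node (_, left, right)) = party left + party right
and party_in Empty = 0
  | party_in (Node (fv, left, right)) =
      fv + party_out left + party_out right
and party t = Int.max (party_in t, party_out t)
```

For example, `party (Node (4, Node (3, Empty, Empty), Node (2, Empty, Empty)))` evaluates to 5: excluding the root and inviting both children beats inviting the root alone.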

This code has exponential performance. But notice that there are only *n* possible distinct calls to `party`. If we change
the code to memoize the results of these calls, the performance will be linear
in *n*. Here is a version that memoizes the result of
`party` and also computes the actual invitation lists. Notice that
this code memoizes results directly in the tree.
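A sketch of such a version: since the original datatype has no place to store results, this sketch assumes a variant of the tree type with a mutable cache field at each node (the names `mtree` and `MNode`, and representing an invitation list as a list of fun values, are assumptions):

```sml
(* Each node carries a cache for the memoized answer: a pair of the
   best total fun value and the corresponding invitation list. *)
datatype mtree = MEmpty
               | MNode of int * (int * int list) option ref * mtree * mtree

fun party_out MEmpty = (0, [])
  | party_out (MNode (_, _, l, r)) =
      let
        val (fl, gl) = party l
        val (fr, gr) = party r
      in (fl + fr, gl @ gr) end
and party_in MEmpty = (0, [])
  | party_in (MNode (fv, _, l, r)) =
      let
        val (fl, gl) = party_out l
        val (fr, gr) = party_out r
      in (fv + fl + fr, fv :: gl @ gr) end
and party MEmpty = (0, [])
  | party (t as MNode (_, cache, _, _)) =
      (case !cache of
         SOME answer => answer            (* reuse the stored result *)
       | NONE =>
           let
             val (ci, gi) = party_in t
             val (co, go) = party_out t
             val answer = if ci >= co then (ci, gi) else (co, go)
           in (cache := SOME answer; answer) end)
```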

Why was memoization so effective for solving this problem? As with the
Fibonacci algorithm, we had the **overlapping subproblems**
property, in which the naive recursive implementation called the function
`party` many times with the same arguments. Memoization saves all
those calls. Further, the party optimization problem has the property of
**optimal substructure**, meaning that the optimal answer to a
problem is computed from optimal answers to subproblems. Not all optimization
problems have this property. The key to using memoization effectively is
to figure out how to write a recursive function implementing the algorithm,
with these two properties. Sometimes this requires thinking carefully.

Here is a more involved example of memoization. Suppose that we have some text that we want to format as a paragraph within a certain column width. For example, we might have to do this if we were writing a web browser. For simplicity we will assume that all characters have the same width. A formatting of the text consists of choosing certain pairs of words to put line breaks in between. For example, when applied to the list of words in this paragraph, with width 60, we want output like the following:

val it = ["Here is a more involved example of memoization. Suppose that", "we have some text that we want to format as a paragraph", ... "applied to the list of words in this paragraph, with width", "60, we want output like the following:"] : string list

A good formatting uses up most of the available width on each line, and also
gives the lines similar widths. The **greedy** approach would be to just
fill each line as much as possible, but this can result in lines with
very different lengths. For example, if we format the string
“this may be a difficult example” at a width of
13 characters, we get a formatting that could be improved:

Greedy:

this may be a
difficult
example

Optimal:

this may be
a difficult
example
The TeX formatting program does a good job of keeping line widths similar
by finding the formatting that
minimizes the sum of the *cube* of the leftover space in each line (except
for the last). However, for *n* words, there are Ω(2^{n}) possible formattings, so the
algorithm can't possibly check them all for large text inputs. Remarkably,
we can use memoization to find the optimal formatting efficiently. In fact,
memoization is useful for many optimization problems.

We start by writing a simple recursive algorithm that walks down the list and, after each word, either inserts a line break or does not:
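A sketch of such an algorithm follows. It returns a pair of the total cost and the formatted lines, where the head of the line list continues the current line and `clen` is the number of characters already on that line; the constant `maxlen` and this result representation are assumptions:

```sml
val maxlen = 60

(* Cost of leaving `unused` columns at the end of a non-final line. *)
fun cost unused = unused * unused * unused

(* linebreak (words, clen): the minimum cost and lines for formatting
   `words`, given clen characters already on the current line. *)
fun linebreak ([], _) = (0, [""])
  | linebreak (all as w :: ws, clen) =
      let
        val wl = String.size w
        (* put w at the front of the first returned line *)
        fun glue (c, first :: rest) =
              (c, (if first = "" then w else w ^ " " ^ first) :: rest)
          | glue result = result
      in
        if clen = 0 then
          glue (linebreak (ws, wl))     (* a fresh line must take w *)
        else
          let
            (* choice 1: insert a line break before w *)
            val (cb, lines) = linebreak (all, 0)
            val break = (cost (maxlen - clen) + cb, "" :: lines)
          in
            (* choice 2: keep w on the current line, if it fits *)
            if clen + 1 + wl > maxlen then break
            else
              let val cont = glue (linebreak (ws, clen + 1 + wl))
              in if #1 break <= #1 cont then break else cont end
          end
      end
```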

This algorithm is exponential because it computes all possible formattings. It is therefore much too slow to be practical.

The key observation is that in the optimal formatting of a paragraph
of text, the formatting of the text past any given point is the optimal
formatting of just that text, given that its first character starts at
the column position where the prior formatted text ends. Thus,
the formatting problem has **optimal substructure** when
cast in this way.

So if we compute the best formatting after a particular line break position, that formatting is the best for all possible formattings of the text before the break.

We can make `linebreak` take linear time by memoizing the best
formatting for the calls where `clen = 0`. (We could memoize
all calls, but that wouldn't improve speed much.)
This requires just introducing a function `lb_mem` that
looks up and records memoized formatting results:
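A sketch of this memoized version: results of `clen = 0` calls are cached in an array indexed by the number of words remaining, which uniquely identifies the suffix being formatted (that count `k` is threaded through the recursion; the wrapper `format` and the table representation are assumptions):

```sml
val maxlen = 60
fun cost unused = unused * unused * unused

fun format (words: string list) : string list =
  let
    val n = length words
    (* table[k] caches the result for the last k words with clen = 0 *)
    val table = Array.array (n + 1, NONE)

    fun linebreak ([], _, _) = (0, [""])
      | linebreak (all as w :: ws, k, clen) =
          let
            val wl = String.size w
            fun glue (c, first :: rest) =
                  (c, (if first = "" then w else w ^ " " ^ first) :: rest)
              | glue result = result
          in
            if clen = 0 then glue (linebreak (ws, k - 1, wl))
            else
              let
                (* breaking restarts at clen = 0: go through lb_mem *)
                val (cb, lines) = lb_mem (all, k)
                val break = (cost (maxlen - clen) + cb, "" :: lines)
              in
                if clen + 1 + wl > maxlen then break
                else
                  let val cont = glue (linebreak (ws, k - 1, clen + 1 + wl))
                  in if #1 break <= #1 cont then break else cont end
              end
          end

    (* lb_mem looks up and records memoized formatting results *)
    and lb_mem (words, k) =
          (case Array.sub (table, k) of
             SOME result => result
           | NONE =>
               let val result = linebreak (words, k, 0)
               in (Array.update (table, k, SOME result); result) end)
  in
    #2 (lb_mem (words, n))
  end
```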

Memoization is a powerful technique for asymptotically speeding up simple
recursive algorithms, without having to change the way the algorithm works. In
general, memoized functions have several arguments, and so hash tables are
needed to store the memoized results. Memoization is closely related to the
technique of **dynamic programming**, which you will see in CS482.
Dynamic programming requires planning the order in which results are computed,
whereas memoization automatically computes results as needed.