Speaker:  Matteo Frigo
Affiliation:  MIT, Biomedin, s.r.l.
Host:  Steve Vavasis
Date:  2/17/00
Time & Location:  4:15PM, 101 Phillips
Title: Cache-Oblivious Algorithms

In order to exploit computer memory hierarchies fully, people usually design algorithms that depend on the parameters of the hierarchy, such as cache size and cache-line length. For example, it is common to process data in ``blocks,'' where the block size is explicitly chosen so that each block fits into cache. Surprisingly, for certain problems, \emph{cache-oblivious} algorithms exist that use the cache as effectively as their ``cache-aware'' counterparts. In a cache-oblivious algorithm, no variables dependent on hardware parameters need to be tuned to achieve optimality.
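As an illustrative sketch (not the speaker's exact code), a matrix transpose can be made cache-oblivious by recursively halving the larger dimension: at some level of the recursion every subproblem fits in cache, whatever the cache size and line length happen to be. The base-case threshold below is an arbitrary small constant, not a tuned hardware parameter.

```c
#include <stddef.h>

/* Cache-oblivious transpose sketch: b = transpose(a), where a is an
 * m-by-n block stored row-major with row stride lda, and b is n-by-m
 * with row stride ldb.  No cache parameters appear anywhere. */
static void co_transpose(const double *a, double *b,
                         size_t m, size_t n,     /* dimensions of this block */
                         size_t lda, size_t ldb) /* row strides of full arrays */
{
    if (m + n <= 8) {
        /* Base case: block is small; copy element by element. */
        for (size_t i = 0; i < m; i++)
            for (size_t j = 0; j < n; j++)
                b[j * ldb + i] = a[i * lda + j];
    } else if (m >= n) {
        /* Split the rows of a (columns of b) in half. */
        size_t h = m / 2;
        co_transpose(a, b, h, n, lda, ldb);
        co_transpose(a + h * lda, b + h, m - h, n, lda, ldb);
    } else {
        /* Split the columns of a (rows of b) in half. */
        size_t h = n / 2;
        co_transpose(a, b, m, h, lda, ldb);
        co_transpose(a + h, b + h * ldb, m, n - h, lda, ldb);
    }
}
```

The recursion divides the problem until both dimensions are tiny, so each level of the memory hierarchy is exploited automatically without any block-size tuning.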

We have designed asymptotically optimal cache-oblivious algorithms for rectangular matrix transpose, FFT, and sorting on computers with multiple levels of caching. For a cache with size $Z$ and cache-line length $L$, where $Z=\Omega(L^2)$, the number of cache misses for an $m\times n$ matrix transpose is $\Theta(1+mn/L)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1+(n/L)(1+\log_Z n))$. We also have a $\Theta(mnp)$-work algorithm to multiply an $m\times n$ matrix by an $n\times p$ matrix that incurs $\Theta(1 + (mn+np+mp)/L + mnp/(L\sqrt{Z}))$ cache faults.
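The matrix-multiplication bound above can be achieved by divide and conquer on the largest of the three dimensions; the following is a hedged sketch of that idea (a simplified illustration, not the speaker's implementation), accumulating $C \mathrel{+}= AB$ for row-major matrices:

```c
#include <stddef.h>

/* Cache-oblivious matrix multiply sketch: c += a * b, where a is m-by-n,
 * b is n-by-p, c is m-by-p, all row-major with the given row strides.
 * Recursively halve the largest dimension; no cache parameters appear. */
static void co_matmul(const double *a, const double *b, double *c,
                      size_t m, size_t n, size_t p,
                      size_t lda, size_t ldb, size_t ldc)
{
    if (m + n + p <= 12) {
        /* Base case: small problem; plain triple loop. */
        for (size_t i = 0; i < m; i++)
            for (size_t j = 0; j < n; j++)
                for (size_t k = 0; k < p; k++)
                    c[i * ldc + k] += a[i * lda + j] * b[j * ldb + k];
    } else if (m >= n && m >= p) {
        /* Split the rows of a and c. */
        size_t h = m / 2;
        co_matmul(a, b, c, h, n, p, lda, ldb, ldc);
        co_matmul(a + h * lda, b, c + h * ldc, m - h, n, p, lda, ldb, ldc);
    } else if (n >= m && n >= p) {
        /* Split the shared dimension; both halves accumulate into c. */
        size_t h = n / 2;
        co_matmul(a, b, c, m, h, p, lda, ldb, ldc);
        co_matmul(a + h, b + h * ldb, c, m, n - h, p, lda, ldb, ldc);
    } else {
        /* Split the columns of b and c. */
        size_t h = p / 2;
        co_matmul(a, b, c, m, n, h, lda, ldb, ldc);
        co_matmul(a, b + h, c + h, m, n, p - h, lda, ldb, ldc);
    }
}
```

Because every recursion step halves one dimension, the subproblems eventually fit in cache at every level of the hierarchy simultaneously, which is what yields the stated miss bound under the $Z=\Omega(L^2)$ tall-cache assumption.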

We introduce an ``ideal-cache'' model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the somewhat unrealistic assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement. We also provide preliminary empirical results on the effectiveness of cache-oblivious algorithms in practice. Work done at MIT jointly with Charles~E.~Leiserson, Harald Prokop, and Sridhar Ramachandran.