Speaker:  Matteo Frigo
Affiliation:  MIT and Biomedin s.r.l.
Host:  Steve Vavasis
Date:  2/17/00
Time & Location:  4:15PM, 101 Phillips
Title: Cache-Oblivious Algorithms

In order to exploit computer memory hierarchies fully, people usually design algorithms that depend on the parameters of the hierarchy, such as cache size and cache-line length. For example, it is common to process data in ``blocks,'' where the block size is explicitly chosen so that each block fits into cache. Surprisingly, for certain problems, \emph{cache-oblivious} algorithms exist that use the cache as effectively as their ``cache-aware'' counterparts. In a cache-oblivious algorithm, no variables dependent on hardware parameters need to be tuned to achieve optimality.
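To illustrate the flavor of such an algorithm (this sketch is not taken from the talk; the function name, row-major layout with explicit strides, and base-case threshold are illustrative assumptions), here is a minimal divide-and-conquer matrix transpose in C. The recursion halves the larger dimension, so subproblems eventually fit in every level of cache even though no cache parameter appears in the code.

#include <stdio.h>

/* Divide-and-conquer transpose: B = A^T, where A is m x n with row
 * stride lda and B is n x m with row stride ldb.  No cache size or
 * line length is referenced anywhere. */
static void transpose(int m, int n,
                      const double *A, int lda,
                      double *B, int ldb)
{
    if (m <= 8 && n <= 8) {                 /* small base case (arbitrary threshold) */
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                B[j*ldb + i] = A[i*lda + j];
    } else if (m >= n) {                    /* split the rows of A (columns of B) */
        transpose(m/2, n, A, lda, B, ldb);
        transpose(m - m/2, n, A + (m/2)*lda, lda, B + m/2, ldb);
    } else {                                /* split the columns of A (rows of B) */
        transpose(m, n/2, A, lda, B, ldb);
        transpose(m, n - n/2, A + n/2, lda, B + (n/2)*ldb, ldb);
    }
}

int main(void)
{
    enum { M = 16, N = 12 };
    double A[M*N], B[N*M];
    for (int i = 0; i < M*N; i++) A[i] = i;
    transpose(M, N, A, N, B, M);
    /* spot check: B[j][i] should equal A[i][j] */
    printf("%g %g\n", B[2*M + 5], A[5*N + 2]);   /* both print 62 */
    return 0;
}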

We have designed asymptotically optimal cache-oblivious algorithms for rectangular matrix transpose, FFT, and sorting on computers with multiple levels of caching. For a cache of size $Z$ with cache-line length $L$, where $Z=\Omega(L^2)$, the number of cache misses for an $m\times n$ matrix transpose is $\Theta(1+mn/L)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1+(n/L)(1+\log_Z n))$. We also have a $\Theta(mnp)$-work algorithm to multiply an $m\times n$ matrix by an $n\times p$ matrix that incurs $\Theta(1 + (mn+np+mp)/L + mnp/(L\sqrt{Z}))$ cache faults.
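One way a $\Theta(mnp)$-work multiply can be organized cache-obliviously is to recursively split whichever of the three dimensions is largest; the C sketch below shows that divide-and-conquer structure (the names, strides, and base-case size are assumptions for illustration, not the exact algorithm analyzed in the talk).

#include <stdio.h>
#include <string.h>

/* C += A * B, with A m x n (row stride lda), B n x p (row stride ldb),
 * and C m x p (row stride ldc).  Always split the largest dimension,
 * so subproblems shrink until they fit in cache at every level. */
static void matmul(int m, int n, int p,
                   const double *A, int lda,
                   const double *B, int ldb,
                   double *C, int ldc)
{
    if (m <= 16 && n <= 16 && p <= 16) {    /* small base case: plain triple loop */
        for (int i = 0; i < m; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < p; j++)
                    C[i*ldc + j] += A[i*lda + k] * B[k*ldb + j];
    } else if (m >= n && m >= p) {          /* split the rows of A and C */
        matmul(m/2, n, p, A, lda, B, ldb, C, ldc);
        matmul(m - m/2, n, p, A + (m/2)*lda, lda, B, ldb, C + (m/2)*ldc, ldc);
    } else if (n >= m && n >= p) {          /* split the shared dimension */
        matmul(m, n/2, p, A, lda, B, ldb, C, ldc);
        matmul(m, n - n/2, p, A + n/2, lda, B + (n/2)*ldb, ldb, C, ldc);
    } else {                                /* split the columns of B and C */
        matmul(m, n, p/2, A, lda, B, ldb, C, ldc);
        matmul(m, n, p - p/2, A, lda, B + p/2, ldb, C + p/2, ldc);
    }
}

int main(void)
{
    enum { M = 20, N = 17, P = 23 };
    double A[M*N], B[N*P], C[M*P];
    for (int i = 0; i < M*N; i++) A[i] = 1.0;
    for (int i = 0; i < N*P; i++) B[i] = 2.0;
    memset(C, 0, sizeof C);                 /* the routine accumulates into C */
    matmul(M, N, P, A, N, B, P, C, P);
    printf("%g\n", C[0]);                   /* prints 34 (= 17 * 1 * 2) */
    return 0;
}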

We introduce an ``ideal-cache'' model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the somewhat unrealistic assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement. We also provide preliminary empirical results on the effectiveness of cache-oblivious algorithms in practice. Work done at MIT jointly with Charles~E.~Leiserson, Harald Prokop, and Sridhar Ramachandran.