Algorithms for Tile Size Selection

Problem Description

Tiling is one of the most important locality enhancement techniques for loop-nests since it permits the exploitation of data reuse in multiple loops in a loop-nest.

An important parameter for tiling is the size of the tiles. If there are n loops, then there are n parameters T1, T2, ... , Tn to be determined one for each loop. These parameters affect the performance of the resulting code. Tiling with these parameters set to 1, for example, is equivalent to the original code with no tiling. Reuse improves as the tile-size increases, as long as the data used in the tile resides in the cache. Tiling with very large tile-sizes also does not have improve the performance as the data will not fit into cache. Clearly there is an optimal tile-size depending on the characteristics of the cache being used and the actual code being tiled.

There has been some research in this area which attempts to determine the optimal tile-size. The fundamental idea in these attempts is to ensure that the data used by a tile fits into cache. If we assume a fully-associative cache with an optimal replacement policy, this requires that the amount of data touched by a tile should be less than the cache size. We would want the largest tile size that satisfies this condition. This condition can be used to determine bounds on the tile-size. Unfortunately, caches are not fully associative. As a result, conflicts arise when data used in the same tile map to the same cache locations. Certain forms of conflicts (due to self-interference, that is, interference due to data brought in by the same array reference) can be eliminated by making sure the tile-size chosen avoids interference.

Recent research in our group has shown how to tile imperfectly-nested loops in a similar manner by embedding the statements in a fully permutable iteration space. Once this embedding has been done, the code resembles a perfectly nested loop-nest and can be tiled.

The goals of the project are the following :-

  1. Develop a tile-size determination algorithm for imperfectly-nested loops. We will give you our code that converts imperfectly-nested loops into a fully-permutable loop nest where possible.
  2. Compare the performance of the code obtained by tiling with the tile-size predicted by your algorithm to the performance with the "opt imal" tile-size. You will determine the optimal tile-size empirically y, by executing the tiled code for various tile-sizes and picking the best one.

What you need to do

  1. Understand the program iteration space representation of an imperfectly nested loop-nest. This will be discussed in class. Details can be obtained from the tech-report.
  2. Read and understand the existing research in tile-size selection for perfectly-nested loops. Some of the papers are listed at the end.
  3. Determine how to adapt one or a combination of these methods for the imperfectly-nested case.
  4. Implement your tile-size selection algorithm.
  5. Generate tiled code (this will be provided) for your predicted tile-size and for a range of tile-sizes. How good is your predicted tile-size?
  6. Present statistics about the number of cache misses incurred by the predicted tiles-size and the optimal tile-size.

Codes to look at

You should look at matrix multiplication, Cholesky and Jacobi for this project. These codes are in the first tech-report.

References

  1. Tiling Imperfectly-nested Loops.
  2. Tile Size Selection Using Cache Organization and Data Layout.
  3. A Tile Selection Algorithm for Data Locality and Cache Interference.