CS612 Homework #3

Due date: April 14th, 2005

In this assignment, you will integrate transformations for locality enhancement into a BCCK compiler pass and evaluate the performance of the compiler-generated code. You should use the updated version of the BCCK. Read these notes very carefully!!!

1. Code Transformations

Create a BCCK pass to convert a perfectly-nested loop nest to a fully-permutable loop nest if possible, using the algorithm described in class. To keep matters simple, you do not need to implement height reduction.

To find a good value for NB, self-tuning systems like Atlas use a simple search strategy: generate code for different NB values, run them on the machine, and choose the one that performs best (for details, see Yotov et al. PLDI'03). Invent a search strategy of your choice, and implement it. One possibility is to perform exhaustive search within some interval; for example, you can use the size of the cache determined by running a micro-benchmark to find an upper bound for tile size. Note that the optimal value of  NB may be different for different loop nests.

Compile the supplied C code file with your tiling transformation and without. Compare the running times of the resulting programs. 

2. Tiling for the register file (MMM only)

Tiling for the register file is no different than tiling for the cache, except that you need to unroll the code for the register tile completely and assign array elements to scalar variables, as compilers rarely (if not never) are able to allocate registers to array subscripts. This technique, known as scalarization is described in more detail in many papers; here is one. This paper gives a general algorithm for scalarization, but for this assignment, you should think about what you need to handle just MMM, and implement only that. 

Note that because registers are few in comparisons to the capacity of the L1 data cache, it is usually not optimal to use square tiles for that level of the memory hierarchy (see Yotov et al. PLDI'03).

Implement loop unrolling and scalarization passes for BCCK. Using a search strategy, explore different register tile sizes. The use of register tiling may change the best value of NB for L1 cache tiling from the previous step, so your search strategy should search for the best value of NB as well as the best register tile.

What you must submit:


Last Updated: Monday, March 28, 2005 01:09:32 PM