CS612 Homework #3
Due date: April 1st, 2003 - In class
NEWS:
- 3/30 - More bug fixes. please down load this file
and rename it to Matrix.cs and this file and rename
it to Element.cs
- 3/28 - Some bugs in Matrix.cs have been fixed. You can find the updated
file here. UPDATE: The web server doesn't like files with .cs
extensions. Download this file here and rename it to
Matrix.cs.
- 3/24 - The assignment is now due on 4/1.
- 3/21 - The supplied C code is updated again. Each
loop-nest appears in its own procedure. Please send email to
stodghil@cs.cornell.edu if you
encounter any additional problems with this file.
- 3/18 - An updated version of BCCK is available here.
IMPORTANT: Read Kamen's release notes about the new release
here. Also the supplied C code
has been modified to work with BCCK. This involved changing "#define size 500"
to "const int SIZE = 500;".
- 3/14 - The first version of the updated BCCK is here.
The "unimodular_transform" function is not finished, but the interface is in
place. A finished version of BCCK will appear shortly.
- 3/11 - First version of the assignment
In this assignment, you will integrate transformations for locality
enhancement into a BCCK compiler pass and evaluate the performance of the
compiler-generated code. You should use the updated version of the BCCK, found
here.
1. Code Transformations
Create a BCCK pass to do all of
the following:
- Compute the locality matrix for a perfectly nested loop. In addition
to dependence vectors, your locality matrix should contain vectors for spatial
locality and for read reuse locality as explained in class. Note:
You may use any outside tools you wish, or write your own, to do the linear
algebra involved (basis of a null space, etc.). We will provide some
tools for you.
- From the locality matrix and the dependence matrix, generate a
transformation matrix as described in class. Application of the
transformation matrix should:
- perform height reduction
- make as many inner loops fully permutable as possible
- Apply the transformation using the BCCK transform library routine unimodular_transform.
- Fully tile the permuted loop nest using the BCCK transform library routine
tile_transform. Use tiles that have the same
length (say K) in all dimensions (convert K to the parameters needed by
tile_transform in your code).
2. Code Testing
Compile the supplied C code file with your tiling
transformation and without. Compare the running times of the resulting programs. Test you
code for several values of K.
3. Tile size selection
A number of models have been proposed for automatically finding the "optimal"
tile size, K. The approach used by
Atlas, a system for automatically generating performance-tuned version of
BLAS routines, is to use brute forward search.
For this part of the assignment, you are to implement a strategy for choosing
a value of K based upon search. In other words, devise a strategy whereby your
compiler generates tiled code for a number of different values of K, measures
the performance of all of them, and chooses the version with the best
performance.
What you must submit:
- Source code for your BCCK pass
- C code generated by your tiling transformation for the supplied program for some value of K
- A graph of running time as function of K (use K=N for no optimization).
Indicate which value of K your compiler choose.
- A transcript of your compiler's output while running, as appropriate.
Last Updated:
Sunday, March 30, 2003 12:59:32 PM