My research group works in the areas of programming languages and compilers for parallel architectures.
Our goal is to develop tools for generating parallel code for applications programs that deal with large sparse matrices. Most scientific applications involve the numerical solution of partial differential equations. The techniques used almost always produce a system of algebraic equations that involve large sparse matrices. Unfortunately, existing compiler technology does a poor job of parallelizing sparse matrix programs. We take a radically different approach to this problem. Our compiler produces parallel sparse-matrix programs from sequential dense-matrix programs, using information from the user about the sparsity structure of matrices in the program. This enables us to use tools from the restructuring compiler area. Preliminary experiments with some Krylov space solvers show that the code produced by our compiler is competitive with hand-parallelized code in libraries like Argonne's PetSc library. We will extend our approach to direct methods for solving linear systems and to applications that require adaptive mesh refinement.
This project builds on our earlier work on restructuring compilation techniques for dense matrix programs. We have developed restructuring techniques for compiling programs to distributed memory and non-uniform memory access (NUMA) architectures like the IBM SP-2 and CM-5, where a processor can access local memory faster than non-local memory. To get good performance, the compiler must not only parallelize but must also ensure locality of reference by matching code and data distribution; when non-local references must be made, block transfers are preferable to many small messages. We recently developed the best algorithm known for the automatic alignment of computation and data and are incorporating it into our compiler test-bed. In earlier work, we developed a novel loop restructuring technique called access normalization, which transforms loop nests for increased locality and potential for block transfers, and implemented it in the LAMBDA loop transformation toolkit - our paper summarizing these results won the best paper prize at ASPLOS V. We worked with Hewlett-Packard to transfer this technology to HP's FORTRAN compiler product line for uniprocessors and multiprocessors.
We have developed new frameworks for program analysis and optimization based on the dependence flow graph (DFG). The DFG knits together the data and control dependence information of a program, permitting the development of optimization algorithms that generate better code than is possible with competing approaches. Our results are of independent interest; for example, we recently developed optimal algorithms for control dependence problems, answering a foundational question that had been open for almost a decade. This work led to the development of a linear-time algorithm for computing the static single assignment (SSA) form of programs. These results have been incorporated into a number of compilers, including those at IBM, Microsoft, HP, and Flavors.
If you have questions or comments please contact: firstname.lastname@example.org.