• Home
  • Resume [PS] [PDF]
  • Research
  • Publications
  • Current Research

    My research centers on developing the compiler technology for high-performance systems. My recent work falls broadly into three categories: generic programming with sparse matrices, memory hierarchy optimizations, and program analysis. Currently I am considering numerical applications; however, many of the techniques are applicable in other areas such as databases and multimedia.

    Generic programming for sparse matrix computations:

      The main focus of my dissertation is a novel generic programming system for simplifying the job of writing computational science codes that use sparse matrices. What distinguishes this system from traditional generic programming systems is that it separates the API used for writing generic programs from the API used to describe data structures. Restructuring compiler technology is used to "instantiate" generic programs into efficient sparse matrix programs.
    • APIs: No single API is appropriate for generic programming with sparse matrices. A high-level API hides details of compressed formats from the compiler, resulting in poor performance; while a low-level API is not suitable for writing generic programs. In the Bernoulli generic programming system, generic programs are dense matrix programs; i.e. the programmer API is that of a random-access data structure. A low-level API describes sparse matrices as indexed-sequential-access data structures. The design of the two APIs is explained in detail in this paper. More examples of compressed formats are available here.
    • Restructuring compiler: Efficient sparse matrix code has the following properties:
      • It is organized as data-centric computation that enumerates the non-zero elements of sparse matrices and performs computations with these elements as they are enumerated.
      • It enumerates multiple sparse matrices together. There are a number of ways of performing common enumerations which are closely related to join strategies in database systems such as merge-join and hash-join.
      • It preserves the semantics of the original dense matrix code. The technology used for restructuring imperfectly-nested codes is similar to the one developed for locality enhancement.
      The restructuring compiler technology for instantiating generic programs into efficient sparse matrix programs is described in this paper.
    • C++ implementation: The Bernoulli Generic Matrix Library (BGML) is the C++ implementation of the sparse matrix abstraction. It serves as the "glue" between the APIs and the restructuring compiler. This paper discusses the implementation techniques used to get the most performance from the BGML.

    Exploiting memory hierarchies:

      The memory systems of modern computers are organized as a hierarchy in which the latency of memory accesses increases by roughly an order of magnitude from one level of the hierarchy to the next. Therefore, a program runs well only if most of its data accesses are satisfied by the faster levels of the memory hierarchy. 
    • Locality optimizations reduce memory latency by restructuring codes to enhance locality of reference. Restructuring of perfectly-nested loops (where all assignment statements are contained in the innermost loop) is well understood. In practice, however, most loop nests are imperfectly-nested. This paper develops a theory of imperfectly-nested loop transformations for locality enhancement, and this paper presents a systematic approach to tiling imperfectly-nested loops.
    • Prefetching is a technique for tolerating memory latency by moving data close to the processor before it is actually needed. While interning at HP Labs Cambridge, I studied the data cache performance of SpecInt95 and other integer benchmarks on an embedded VLIW processor with very complex memory hierarchy. The results of this study and the potential for prefetching are discussed here.

    Program analysis:

    Before locality optimizations can be performed, the source program must be analyzed to ensure that the proposed transformation preserves the semantics of the program. The most commonly used analysis technique is dependence analysis. Whenever that analysis is inexact, the compiler has to be conservative and assume that restructuring is illegal. Fractal symbolic analysis is a new analysis technique for use in restructuring compilers to verify legality of program transformations. It combines the power of symbolic program analysis with the tractability of dependence analysis, and it permits restructuring of  far more complex codes than current technology can handle. 

    Other Work

  • Watermarking MPEG / JPEG is a very interesting project I worked on for my Multimedia class at Cornell. The project involved the design, implementation, and testing of a JPEG codec with built-in digital watermarking. The watermarking was based on the idea of using the round-off error during the quantization stage of DCT compression, and did not require any changes to the standard JPEG decoder. Error-correcting codes were used to improve the robustness of the watermark. The method was demonstrated to work very well against common signal processing and geometric distortion attacks.
    • Error-correcting codes: For my M.S. thesis at Sofia University, I studied the covering radii of binary linear error-correcting codes. I computed the exact minimal covering radii t[n,k] for all self-complementary codes of dimension k<7, as well as lower and upper bounds on t[n,k] for self-complementary codes of length n<65. I also developed an efficient algorithm for computing the weight distribution of coset leaders of binary linear codes with repeated coordinates.


    Last updated March 22, 2000.