Resume [PS] [PDF]
My research centers on developing the compiler technology for high-performance
systems. My recent work falls broadly into three categories:
programming with sparse matrices, memory hierarchy
optimizations, and program analysis. Currently
I am considering numerical applications; however, many of the techniques
are applicable in other areas such as databases and multimedia.
Generic programming for sparse matrix computations:
The main focus of my dissertation is a novel generic programming system
for simplifying the job of writing computational science codes that use
sparse matrices. What distinguishes this system from traditional generic
programming systems is that it separates the API used for writing generic
programs from the API used to describe data structures. Restructuring compiler
technology is used to "instantiate" generic programs into efficient sparse
APIs: No single API is appropriate for generic programming with
sparse matrices. A high-level API hides details of compressed formats from
the compiler, resulting in poor performance; while a low-level API is not
suitable for writing generic programs. In the Bernoulli generic programming
system, generic programs are dense matrix programs; i.e. the programmer
API is that of a random-access data structure. A low-level API describes
sparse matrices as indexed-sequential-access data structures. The design
of the two APIs is explained in detail in this
paper. More examples of compressed formats are available here.
Restructuring compiler: Efficient sparse matrix code has the following
The restructuring compiler technology for instantiating generic programs
into efficient sparse matrix programs is described in this
It is organized as data-centric computation that enumerates the
non-zero elements of sparse matrices and performs computations with these
elements as they are enumerated.
It enumerates multiple sparse matrices together. There are a number of
ways of performing common enumerations which are closely related
to join strategies in database systems such as merge-join and hash-join.
It preserves the semantics of the original dense matrix code. The technology
used for restructuring imperfectly-nested codes is similar to the one developed
for locality enhancement.
C++ implementation: The Bernoulli Generic Matrix Library (BGML)
is the C++ implementation of the sparse matrix abstraction. It serves as
the "glue" between the APIs and the restructuring compiler. This
paper discusses the implementation techniques used to get the most
performance from the BGML.
Exploiting memory hierarchies:
The memory systems of modern computers are organized as a hierarchy
in which the latency of memory accesses increases by roughly an order of
magnitude from one level of the hierarchy to the next. Therefore, a program
runs well only if most of its data accesses are satisfied by the faster
levels of the memory hierarchy.
Locality optimizations reduce
memory latency by restructuring codes to enhance locality of reference.
Restructuring of perfectly-nested loops (where all assignment statements
are contained in the innermost loop) is well understood. In practice, however,
most loop nests are imperfectly-nested.
paper develops a theory of imperfectly-nested loop transformations
for locality enhancement, and
presents a systematic approach to tiling imperfectly-nested loops.
Prefetching is a technique for tolerating memory latency by moving
data close to the processor before it is actually needed. While interning
at HP Labs Cambridge, I studied the data cache performance of SpecInt95
and other integer benchmarks on an embedded VLIW processor with very complex
memory hierarchy. The results of this study and the potential for prefetching
are discussed here.
Before locality optimizations can be performed, the source
program must be analyzed to ensure that the proposed transformation preserves
the semantics of the program. The most commonly used analysis technique
is dependence analysis. Whenever that analysis is inexact, the compiler
has to be conservative and assume that restructuring is illegal. Fractal
symbolic analysis is a new analysis technique for use in restructuring
compilers to verify legality of program transformations. It combines the
power of symbolic program analysis with the tractability of dependence
analysis, and it permits restructuring of far
more complex codes than current technology can handle.
Watermarking MPEG / JPEG
is a very interesting project I worked on for my Multimedia class at Cornell.
The project involved the design, implementation, and testing of a JPEG
codec with built-in digital watermarking. The watermarking was based on
the idea of using the round-off error during the quantization stage of
DCT compression, and did not require any changes to the standard JPEG decoder.
Error-correcting codes were used to improve the robustness of the watermark.
The method was demonstrated to work very well against common signal processing
and geometric distortion attacks.
Error-correcting codes: For my M.S.
thesis at Sofia University, I studied the covering radii of binary
linear error-correcting codes. I computed the exact minimal covering radii
t[n,k] for all self-complementary codes of dimension k<7, as well as
lower and upper bounds on t[n,k] for self-complementary codes of length
n<65. I also developed an efficient algorithm for computing the weight
distribution of coset leaders of binary linear codes with repeated coordinates.