Data-centric Multi-level Blocking. [PS]
In Programming Language Design and Implementation, June 1997.
Synthesizing Transformations for Locality Enhancement of Imperfectly-nested Loop Nests. [PS][PDF]
To appear in the International Conference on Supercomputing, May 2000.
Tiling Imperfectly-nested Loops. [PS][PDF]
Technical Report TR2000-1782, Cornell University, Computer Science, February 2000.
Automatic Generation of Block-recursive Codes. [PS][PDF]
To appear in Euro-Par, August 2000.
Compiling Imperfectly-nested Sparse Matrix Codes with Dependencies. [PS][PDF]
Submitted for publication.
I interned with SGI's compiler group in Mountain View, working on data locality optimizations across loop-nests. Benchmarking had turned up a number of examples showing that data reuse across loop-nests (which the then-current versions of the compiler ignored) could have serious effects on performance. My job was to identify the cases that could be handled systematically, and to design and implement an analysis framework that would quantify the amount of reuse across loop-nests and select appropriate transformations for the code. This required interfacing with the existing framework and adding new phases to the compiler's run.
I interned at Digital's Systems Research Center in Palo Alto, working on enhancing data-locality in Java programs. The first part of the project focused on studying Java benchmarks and identifying data-locality hot-spots through tracing. The second part attacked these hot-spots through a combination of data-placement, data-allocation and garbage collection techniques. A report of this work can be found here.
Designed and implemented a system for the real-time model-based tracking of moving objects through complex scenes. The system implements a Hausdorff-fraction based tracker running on a cluster of processors (either shared-memory or connected via ATM and communicating over a user-level communication substrate). The image sequence is provided to the cluster by a video source which captures and transmits data over an ATM network. A report on this project can be found here.
Developed a fast RPC system over an ATM network, with latency and throughput comparable to those provided by the underlying user-level communication sub-system (UNET). The system included a stub-compiler and run-time environment that supported interfaces described in the Interface Definition Language (IDL) defined by the Open Software Foundation's (OSF) Distributed Computing Environment (DCE).
Designed and implemented a system intended to reduce latency and increase network efficiency of web-browsers. Taking advantage of patterns of access in local domains, virtual servers perform caching and prefetching, maintain long-lived connections, and multiplex requests over existing connections on behalf of clients. This required modification of publicly available server software as well as the browser. The virtual server system was made fault-tolerant using the Horus communication substrate.
For my undergraduate senior project, I was involved in developing a library of functions (combining features of stdio and curses) which facilitates the development of utilities that interact with users in Indian languages (Hindi, Tamil etc.). I also helped develop support utilities to generate and maintain fonts for such languages. This is part of an ongoing project, Multilingual User Interface Software.