Jaguar: Java in Next-Generation Database Systems

Johannes Gehrke and Philippe Bonnet

Department of Computer Science

Cornell University

Contact Information

Johannes Gehrke

4108 Upson Hall

Ithaca, NY 14853

Phone: (607) 255-1045
Fax: (607) 255-4428
Email: johannes@cs.cornell.edu, http://www.cs.cornell.edu/johannes

WWW Page

http://www.cs.cornell.edu/database/jaguar

Project Award Information

·        Award Number: IIS-9812020

·        Duration: 09/01/1998 – 08/31/2001

·        Title: Jaguar: Java in Next-Generation Database Systems

Keywords

Extensibility, query optimization, heterogeneous environments, database compression.

Project Summary

This project explores fundamental systems issues in query processing performance. We investigate this problem from three directions: client-server processing, heterogeneous environments, and database compression. First, we devised new query processing strategies that push processing capabilities into the client, and we devised query execution plans that can span server and clients. This allows us to trade resource usage among the client, the server, and the interconnection network. We then extended this work to parallel query processing in heterogeneous environments; we are currently implementing a parallel dataflow engine that adapts naturally to resource imbalances among the hardware components. Last, we are investigating the use of compression in database systems. We devised a new framework for database compression, along with new query processing and query optimization strategies that integrate compression into a modern query processor. All our techniques have been implemented in the NSF-funded Cornell Predator object-relational database system. We extended the system with several ways to store compressed relations, and we implemented a fully compression-aware query optimizer. To the best of our knowledge, our work is the first result on compression-aware query optimization.

Publications and Products

Project Impact

·         Technology transfer:  Praveen Seshadri is currently on leave for two years at Microsoft Corporation to transfer some of the technology developed under this grant. While at Microsoft, Praveen published the following paper, which builds upon research funded by this NSF project:
P. Seshadri and P. Garnett. SQLServer For Windows CE – A Database Engine for Mobile and Embedded Platforms. In Proceedings of the Sixteenth International Conference on Data Engineering (ICDE 2000), pages 642-644. San Diego, CA, March 2000.

·         Tobias Mayr (PhD student) finished his A-exam on “Query Processing in Heterogeneous Environments” and is scheduled to defend his PhD thesis in the summer of 2001. Zhiyuan Chen (PhD student) finished his A-exam on “Compression in Database Systems” and is expected to defend his PhD thesis towards the end of 2001. Over the course of the project, a total of eight MEng students worked on it.

·         Tobias Mayr (PhD student) visited the Microsoft Research Bay Area Research Center during the fall semester of 2000, where he worked under Jim Gray on a parallel dataflow engine.

·         Microsoft Corporation made a substantial gift of mobile devices to the Department of Computer Science at Cornell University.

·         We developed what is, to our knowledge, the first compression-aware query optimizer for compressed database systems, and we implemented a prototype in the Cornell Predator Database System.

Goals, Objectives, and Targeted Activities

The initial goal of the Jaguar project was to safely integrate server-site extensions and client-site extensions into database query processing. We introduced a new class of portable query plans for executing Java extensions together with relational database operations, and we extended the Cornell Predator object-relational system with a lightweight query execution engine capable of evaluating these portable query execution plans. Both the execution of portable query execution plans and the compression of partial query results were demonstrated at SIGMOD 1999.
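A plan that spans server and client lets the system choose where an operator runs. The following Python sketch illustrates that choice for a single filtering step under a deliberately simple cost model. It is not the Jaguar or Predator implementation; the function names, parameters, and the additive cost formula are all assumptions made for illustration.

```python
def placement_costs(n_tuples, tuple_bytes, selectivity,
                    server_cpu_sec, client_cpu_sec, net_bytes_per_sec):
    """Estimate the cost (in seconds) of running a filter at each site.

    Hypothetical model: total cost = CPU time at the evaluating site
    plus the time to ship tuples from server to client.
    """
    # Filter at the server: evaluate every tuple there, ship only survivors.
    at_server = (n_tuples * server_cpu_sec
                 + n_tuples * selectivity * tuple_bytes / net_bytes_per_sec)
    # Filter at the client: ship every tuple, then evaluate at the client.
    at_client = (n_tuples * tuple_bytes / net_bytes_per_sec
                 + n_tuples * client_cpu_sec)
    return {"server": at_server, "client": at_client}


def choose_site(**params):
    """Return the site ('server' or 'client') with the lower estimated cost."""
    costs = placement_costs(**params)
    return min(costs, key=costs.get)


common = dict(n_tuples=1000, tuple_bytes=100, selectivity=0.1,
              server_cpu_sec=1e-3, client_cpu_sec=1e-4)
# Fast network and a loaded server: shipping everything to the client wins.
fast_net_choice = choose_site(net_bytes_per_sec=1e6, **common)
# Slow network: filtering at the server before shipping wins.
slow_net_choice = choose_site(net_bytes_per_sec=1e3, **common)
```

In this toy model the preferred site flips as the network gets slower, which is the kind of trade-off between client, server, and network resources that the project studies.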

We are currently investigating compression techniques in a database system. We developed a framework for applying and combining compression algorithms depending on the structure of the partial results, which resulted in a paper published at ICDE 2000. We then concentrated on query optimization for compressed database systems. We integrated compression into the Cornell Predator database system, and we used this experimental infrastructure to implement the first compression-aware query optimizer. This study will appear at the upcoming SIGMOD 2001 conference. Our current research focuses on physical design for compressed database systems, where we are investigating the question of which attributes should be compressed given a user-specified query workload.
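The physical design question (which attributes to compress for a given workload) can be illustrated with a small Python sketch. It is only a toy heuristic under stated assumptions, not the method developed in the project: the `Attribute` class, the per-byte I/O and CPU costs, and the rule "compress when the saved I/O time exceeds the added decompression time" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    size_bytes: int           # uncompressed size of the stored attribute
    compression_ratio: float  # compressed size / uncompressed size, in (0, 1]


def net_benefit(attr, accesses, io_sec_per_byte, cpu_sec_per_byte):
    """Estimated seconds saved over the workload if `attr` is compressed."""
    io_saved = attr.size_bytes * (1.0 - attr.compression_ratio) * io_sec_per_byte
    cpu_added = attr.size_bytes * cpu_sec_per_byte  # cost to decompress on read
    return accesses * (io_saved - cpu_added)


def choose_compressed(attrs, workload, io_sec_per_byte, cpu_sec_per_byte):
    """Pick the attributes whose compression has a positive net benefit.

    `workload` is a list of (attribute-name set, frequency) pairs, one per
    query. Storage savings and operating directly on compressed data are
    ignored in this sketch.
    """
    chosen = []
    for attr in attrs:
        accesses = sum(freq for touched, freq in workload if attr.name in touched)
        if net_benefit(attr, accesses, io_sec_per_byte, cpu_sec_per_byte) > 0:
            chosen.append(attr.name)
    return chosen


attrs = [
    Attribute("comment", 1_000_000, 0.30),  # compresses well
    Attribute("id", 1_000_000, 0.95),       # barely compresses
]
workload = [({"comment", "id"}, 10)]        # one query, run 10 times
selected = choose_compressed(attrs, workload,
                             io_sec_per_byte=1e-7, cpu_sec_per_byte=2e-8)
```

With these made-up numbers only the well-compressing attribute is selected; the other attribute costs more CPU time to decompress than it saves in I/O.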

Our second area of investigation is a parallel dataflow environment. We are integrating the Cornell Predator system as a query processor in a parallel dataflow environment, and we have developed new query processing techniques that extend beyond the three-stage split-merge-join push-parallelism currently used in parallel database systems.

Project References

See the list of publications above.

Area Background

Query processing and query optimization techniques have been studied thoroughly for object-relational database systems with expensive user-defined functions (UDFs). Traditionally, query processing of these functions resides on the database server, with post-processing capabilities at the client. However, experience with object-relational database systems shows that extending the server is difficult even for experienced programmers, and impossible for non-expert users. In environments such as the WWW, users need to incorporate client-side UDFs into SQL queries run at the server. Our research addresses performance issues for such client-side extensions.

Over the past decades, improvements in CPU speed have outpaced improvements in main memory and disk access speeds by orders of magnitude. This technology trend has enabled the use of data compression techniques to improve performance by trading reduced storage space and I/O against additional CPU overhead for compression and decompression of data. Compression has been utilized in a wide range of applications, from file storage to video processing, and the development of new compression methods is an active area of research. Our research addresses how compression can be integrated into modern query processing engines.

Our research builds on the Cornell Predator System, an object-relational database system developed under recent NSF funding. More information on the motivation and background of the Cornell Jaguar Project can be found at the following URL: http://www.cs.cornell.edu/database/jaguar.

Area References

·         J.M. Hellerstein and M. Stonebraker. Predicate Migration: Optimizing queries with expensive predicates. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pages 267-276, Washington, D.C., May 1993.

·         R. Ramakrishnan and J. Gehrke. Database Management Systems, Second Edition. McGraw Hill, 1999. 

Potential Related Projects

Jaguar relates to projects on heterogeneous query processing and mediated database systems, and there are connections to work on distributed database systems and database compression.