Jaguar: Java in Next-Generation Database Systems
Johannes Gehrke and Philippe Bonnet
Department of Computer Science
Cornell University
Contact Information
Johannes Gehrke
4108 Upson Hall
Ithaca, NY 14853
Phone: (607) 255-1045
Fax: (607) 255-4428
Email: johannes@cs.cornell.edu, http://www.cs.cornell.edu/johannes
WWW PAGE
http://www.cs.cornell.edu/database/jaguar
Project Award Information
· Award Number: IIS-9812020
· Duration: 09/01/1998 – 8/31/2001
· Title: Jaguar: Java in Next-Generation Database Systems
Keywords
Extensibility, query optimization, heterogeneous environments, database
compression.
Project Summary
This project explores fundamental systems issues in query processing
performance. We investigate this problem from three different directions:
client-server processing, heterogeneous environments, and database compression.
First, we devised new query processing strategies that push processing
capabilities into the client, and we devised query execution plans that can
span server and clients. This allows us to trade resource usage between client,
server, and the interconnection network. We then extended this work to parallel
query processing in heterogeneous environments; we are currently implementing a
parallel dataflow engine that adapts naturally to resource imbalances among the
hardware components. Last, we are investigating the use of compression in
database systems. We devised a new framework for database compression and new
query processing and query optimization strategies to integrate compression
into a modern query processor. All our techniques have been implemented in the
NSF-funded Cornell Predator object-relational database system. We extended the
system with several ways to store compressed relations, and we implemented a
fully compression-aware query optimizer. To the best of our knowledge, our work
is the first result on compression-aware query optimization.
Publications and Products
Project Impact
· Technology transfer: Praveen Seshadri is currently on leave for two years
at Microsoft Corporation to transfer some of the technology developed under
this grant. While at Microsoft, Praveen published the following paper, which
builds upon research funded by this NSF project:
P. Seshadri and P. Garnett. SQLServer For Windows CE – A Database Engine for
Mobile and Embedded Platforms. In Proceedings of the Sixteenth International
Conference on Data Engineering (ICDE 2000), pages 642-644. San Diego, CA,
March 2000.
· Tobias Mayr (PhD student) finished his A-exam on “Query Processing in
Heterogeneous Environments” and is scheduled to defend his PhD thesis in the
summer of 2001. Zhiyuan Chen (PhD student) finished his A-exam on “Compression
in Database Systems” and is expected to defend his PhD thesis towards the end
of 2001. Over the course of the project, a total of eight MEng students worked
on it.
· Tobias Mayr (PhD student) visited the Microsoft Research Bay Area Research
Center during the fall 2000 semester, where he worked under Jim Gray on a
parallel dataflow engine.
· Microsoft Corporation made a substantial gift of mobile devices to the
Department of Computer Science at Cornell University.
· We developed what is, to our knowledge, the first compression-aware query
optimizer for compressed database systems, and we implemented a prototype in
the Cornell Predator Database System.
Goals, Objectives, and Targeted Activities
The initial goal of the Jaguar project was to safely integrate
server-site extensions and client-site extensions into database query
processing. We introduced a new class of portable query plans for executing
Java extensions together with relational database operations, and we extended
the Cornell Predator object-relational system with a lightweight query
execution engine capable of evaluating these portable query execution plans. Both
the execution of portable query execution plans and the compression of
partial query results were demonstrated at SIGMOD 1999.
We are currently actively investigating compression techniques in database systems.
We developed a framework for applying and combining compression algorithms
depending on the structure of the partial results, resulting in a paper
published at ICDE 2000. We then concentrated on query optimization for compressed
database systems. We integrated compression into the Cornell Predator database
system, and we used this experimental infrastructure to implement the first
compression-aware query optimizer. This study will appear at the upcoming
SIGMOD 2001 conference. Our current research focuses on physical design for
compressed database systems, where we are investigating the question of which
attributes should be compressed given a user-specified query workload.
Our second area of investigation is a parallel dataflow environment. We
are integrating the Cornell Predator system as a query processor in a parallel
dataflow environment, and we have developed new query processing techniques
that extend beyond the three-stage split-merge-join push-parallelism currently
used in parallel database systems.
Project References
See the list of publications above.
Area Background
Query processing and query optimization techniques have been studied thoroughly
for object-relational database systems with expensive user-defined functions
(UDFs). Traditionally, query processing of these functions resides on the
database server, with post-processing capabilities at the client. However,
experiences with object-relational database systems show that extending the
server is difficult even for experienced programmers, and impossible for
non-expert users. In environments such as the WWW, users need to incorporate
client-side UDFs into SQL queries run at the server. Our research addresses
performance issues for such client-side extensions.
Over the last decades, improvements in CPU speed have outpaced improvements in
main memory and disk access speeds by orders of magnitude. This technology
trend has enabled the use of data compression techniques to improve performance
by trading reduced storage space and I/O against additional CPU overhead for
compression and decompression of data. Compression has been utilized in a wide
range of applications from file storage to video processing; the development of
new compression methods is an active area of research. Our research addresses
how compression can be integrated into modern query processing engines.
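The trade-off between reduced I/O and additional CPU overhead can be illustrated with a back-of-the-envelope cost model. The following is only a minimal sketch and not the cost model used in the Jaguar project or in Cornell Predator: the names ScanParams, scan_time_uncompressed, scan_time_compressed, and compression_pays_off, as well as all parameter values, are hypothetical, and the model assumes that I/O and decompression do not overlap.

```python
from dataclasses import dataclass


@dataclass
class ScanParams:
    """Hypothetical parameters for a full scan of one relation (illustrative only)."""
    uncompressed_bytes: float        # size of the relation when stored uncompressed
    compression_ratio: float         # uncompressed size / compressed size
    io_bytes_per_sec: float          # sequential disk bandwidth
    decompress_bytes_per_sec: float  # CPU throughput, in uncompressed output bytes


def scan_time_uncompressed(p: ScanParams) -> float:
    # Only I/O cost: every byte of the relation is read from disk.
    return p.uncompressed_bytes / p.io_bytes_per_sec


def scan_time_compressed(p: ScanParams) -> float:
    # Less I/O (fewer bytes on disk) plus extra CPU time for decompression.
    # Assumption: I/O and CPU work are not overlapped.
    io_time = (p.uncompressed_bytes / p.compression_ratio) / p.io_bytes_per_sec
    cpu_time = p.uncompressed_bytes / p.decompress_bytes_per_sec
    return io_time + cpu_time


def compression_pays_off(p: ScanParams) -> bool:
    # Compression helps only if the saved I/O time exceeds the added CPU time.
    return scan_time_compressed(p) < scan_time_uncompressed(p)


if __name__ == "__main__":
    fast_cpu = ScanParams(1e9, 2.0, 50e6, 200e6)
    slow_cpu = ScanParams(1e9, 2.0, 50e6, 80e6)
    for name, params in (("fast CPU", fast_cpu), ("slow CPU", slow_cpu)):
        print(name,
              scan_time_uncompressed(params),
              scan_time_compressed(params),
              compression_pays_off(params))
```

Under these made-up numbers, a faster decompressor makes the compressed scan cheaper than the uncompressed one, while a slower decompressor reverses the outcome. This is why the decision to compress depends on the relative speeds of CPU and I/O rather than on the compression ratio alone.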
Our research builds on the Cornell Predator System, an object-relational
database system developed under recent NSF funding. More information on the
motivation and background of the Cornell Jaguar Project can be found at the
following URL: http://www.cs.cornell.edu/database/jaguar.
Area References
· J.M. Hellerstein and M. Stonebraker. Predicate Migration: Optimizing queries
with expensive predicates. In Proceedings of the 1993 ACM SIGMOD International
Conference on Management of Data, pages 267-276, Washington, D.C., May 1993.
· R. Ramakrishnan and J. Gehrke. Database Management Systems, Second Edition.
McGraw Hill, 1999.
Potential Related Projects
Jaguar relates to projects on heterogeneous query processing and
mediated database systems, and there are connections to work on distributed
database systems and database compression.