Completed projects

Youtopia: Causal Databases

Spatial Indexing

Data Privacy

Games and Simulations


DBToaster: A compiler for database engines. DBToaster is a novel SQL compiler that generates database engines for high-performance main-memory processing of streaming data. In a nutshell, DBToaster aggressively compiles aggregate queries to incremental (or delta) form, so that stream data can be processed very efficiently, in contrast to today's operator-centric query plan interpreters. The DBToaster compiler produces database engines in native code to perform incremental view maintenance of continuous queries posed on update streams. Today's systems cannot handle update streams efficiently; one clear motivating application is algorithmic trading on orderbook data, where buy and sell orders on an exchange's orderbooks are updated arbitrarily.
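
To illustrate what "delta form" means, here is a minimal hand-written sketch in Python. It is not DBToaster output and does not use any DBToaster API; the query, the class name IncrementalJoinSum, and the helper recompute are hypothetical names chosen for this example. The assumed query is SUM(b.volume * a.volume) over bids b joined with asks a on equal price. Instead of re-running the join on every update, the sketch keeps two auxiliary maps and applies a small delta per inserted or deleted order.

```python
from collections import defaultdict


class IncrementalJoinSum:
    """Hypothetical example: maintains
    Q = SUM(b.volume * a.volume) FROM bids b, asks a WHERE b.price = a.price
    under inserts (sign=+1) and deletes (sign=-1)."""

    def __init__(self):
        self.result = 0
        # Auxiliary views: total volume per price level on each side.
        self.bid_volume_at = defaultdict(int)
        self.ask_volume_at = defaultdict(int)

    def update_bids(self, price, volume, sign=1):
        delta = sign * volume
        # Delta of Q for a bid update: the changed volume times the
        # total ask volume at the same price.
        self.result += delta * self.ask_volume_at[price]
        self.bid_volume_at[price] += delta

    def update_asks(self, price, volume, sign=1):
        delta = sign * volume
        self.result += delta * self.bid_volume_at[price]
        self.ask_volume_at[price] += delta


def recompute(bids, asks):
    """Reference answer: evaluate the join from scratch."""
    return sum(bv * av for bp, bv in bids for ap, av in asks if bp == ap)


view = IncrementalJoinSum()
view.update_bids(100, 5)
view.update_asks(100, 3)
view.update_asks(101, 7)
view.update_bids(101, 2)
view.update_bids(100, 5, sign=-1)  # the first bid is withdrawn
print(view.result == recompute([(101, 2)], [(100, 3), (101, 7)]))
```

Each update touches a constant number of map entries, whereas recompute scans both relations. That gap is the intuition behind incremental view maintenance; a real compiler derives such deltas and auxiliary maps automatically.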

MayBMS: An uncertain database management system. Incompleteness of data is a problem that often arises in practice. Examples include scientific databases, data integration, sensor data management, as well as scenarios where information is manually entered and is therefore prone to mistakes and incompleteness. MayBMS is a system for the efficient management of large uncertain databases. Its main features include a powerful query language for processing uncertain data, space-efficient storage of uncertain and probabilistic data, support for data cleaning, and efficient query evaluation.

Petabyte Data Management and Analysis Services for Data-Driven Science. The rapid growth in the generation of digital data is changing computational science in a fundamental way. Traditionally, the scope of computational problems was limited by the available processing power. But today, many problems are extremely data-intensive, and the lack of large-scale storage infrastructure creates a new bottleneck. Thus, modern data-intensive applications need high-performance computational resources and a system in which the computational resources are tightly coupled with large-scale storage. The goal of this project is to develop such a system to enable users to focus on their scientific tasks, freeing them from having to set up an increasingly complex data-processing infrastructure.

Hilda: A High-Level Language for Data-Driven Web Applications. An important class of applications is data-driven web applications, i.e., web applications that are run on top of a back-end database system. Examples of such applications include online shopping sites, online auctions, and business-to-business portals. While developing data-driven web applications is a complex and challenging task, the application development interface provided by existing platforms is often too low-level or does not provide a unified model for the whole application stack. Hilda addresses the above shortcomings by providing a high-level language for developing data-driven web applications. The primary benefits of Hilda over existing development platforms are: (a) it uses a unified data model for all layers of the application, (b) it is declarative, (c) it models both application queries and updates, (d) it supports structured programming for web sites, (e) it enables conflict detection due to concurrent updates, and (f) it separates application logic from presentation.

Cayuga: Event Stream Processing. Publish/subscribe (pub/sub) is a popular paradigm for users to express their interests ("subscriptions") in certain kinds of events ("publications"). It allows efficient asynchronous interaction among distributed applications. For many years it has been an active field of research, with topics spanning active databases, event systems, high-performance implementations of pub/sub, and distributed pub/sub. Today, a pub/sub system is part of typical message-oriented middleware, and major vendors sell products with the functionality of message brokers. Traditional pub/sub systems, such as topic-based and content-based systems, have a major limitation: they only allow users to express stateless subscriptions that are evaluated over individual events that arrive at the system. However, many applications require stateful subscriptions that involve more than a single event, and users need to be notified with customized witness events as soon as one of their stateful subscriptions is satisfied. In Cayuga, we are building a system for such stateful event processing.
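
The difference between stateless and stateful subscriptions can be sketched in a few lines of Python. This is an illustration only, not Cayuga's query language or engine; the event layout (a dict with "symbol" and "price"), the function stateless_match, and the class RisingRunSubscription are assumptions made for the example. The stateful subscription fires when a symbol's price has risen strictly over a given number of consecutive events, and returns a witness event listing the prices that satisfied it.

```python
from collections import defaultdict


def stateless_match(event, threshold):
    """Stateless subscription: decided by looking at one event alone."""
    return event["price"] > threshold


class RisingRunSubscription:
    """Hypothetical stateful subscription: notify when a symbol's price
    has strictly increased over `run_length` consecutive events."""

    def __init__(self, run_length):
        self.run_length = run_length
        # Per-symbol state: the current strictly rising run of prices.
        self.runs = defaultdict(list)

    def on_event(self, event):
        run = self.runs[event["symbol"]]
        if run and event["price"] <= run[-1]:
            run.clear()  # the rise was interrupted, start a new run
        run.append(event["price"])
        if len(run) >= self.run_length:
            # Witness event: the concrete events that satisfied the pattern.
            return {"symbol": event["symbol"],
                    "prices": run[-self.run_length:]}
        return None


subscription = RisingRunSubscription(run_length=3)
stream = [
    {"symbol": "IBM", "price": 10},
    {"symbol": "IBM", "price": 11},
    {"symbol": "MSFT", "price": 5},
    {"symbol": "IBM", "price": 12},
    {"symbol": "IBM", "price": 9},
]
witnesses = [subscription.on_event(e) for e in stream]
print(witnesses)
```

The stateless predicate needs no memory between events, while the stateful one must keep per-symbol state and produce a result that spans several events.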

Quark: Unifying Database Systems and Information Retrieval Systems. The data stored in most enterprises is a mix of structured and unstructured data. Traditionally, structured data has been queried using (relational) database systems, while unstructured data has been queried using information retrieval systems. In the Quark project, we are exploring a much tighter integration (or unification) of database systems and information retrieval systems. Specifically, we are developing a novel system architecture that allows users to issue complex structured queries and ranked keyword search queries over any mix of structured, unstructured, and semi-structured data.

Cougar: The Network is the Database. The widespread distribution and availability of small-scale sensors, actuators, and embedded processors is transforming the physical world into a computing platform. Sensor networks that combine physical sensing capabilities such as temperature, light, or seismic sensors with networking and computation capabilities will soon become ubiquitous. Applications range from environmental control, warehouse inventory, and health care to scientific and military scenarios. Existing sensor networks assume that the sensors are preprogrammed and send data to a central frontend where the data is aggregated and stored for offline querying and analysis. This approach has two major drawbacks. First, the user cannot change the behavior of the system dynamically. Second, communication in today's networks is orders of magnitude more expensive than local computation; thus in-network storage and processing can vastly reduce resource usage and extend the lifetime of a sensor network. In the Cougar project, we are developing data management technology for wireless sensor networks.
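
The communication argument can be made concrete with a small Python sketch. It is not Cougar code; the routing tree, the node names, the readings, and both functions are hypothetical, and for simplicity every node (including the gateway) is assumed to hold one temperature reading. The sketch computes a network-wide average twice: once by relaying every raw reading hop by hop to the gateway, and once by letting each node merge its children's partial (sum, count) pairs before sending a single message to its parent.

```python
def centralized_average(tree, readings, root):
    """Relay every raw reading to the root; count one message per hop."""
    messages = 0
    collected = []

    def forward(node, depth):
        nonlocal messages
        collected.append(readings[node])
        messages += depth  # a reading at depth d crosses d links
        for child in tree.get(node, []):
            forward(child, depth + 1)

    forward(root, 0)
    return sum(collected) / len(collected), messages


def in_network_average(tree, readings, root):
    """Merge partial (sum, count) pairs at each node on the way up."""
    messages = 0

    def aggregate(node):
        nonlocal messages
        total, count = readings[node], 1
        for child in tree.get(node, []):
            child_total, child_count = aggregate(child)
            messages += 1  # the child sends one (sum, count) pair
            total += child_total
            count += child_count
        return total, count

    total, count = aggregate(root)
    return total / count, messages


# Hypothetical routing tree: parent -> list of children.
tree = {"gateway": ["a", "b"], "a": ["c", "d"], "c": ["e"]}
readings = {"gateway": 20.0, "a": 21.0, "b": 19.0,
            "c": 22.0, "d": 18.0, "e": 20.0}

print(centralized_average(tree, readings, "gateway"))
print(in_network_average(tree, readings, "gateway"))
```

Both strategies return the same average, but the in-network version sends one message per link, while the centralized version sends one message per reading per hop. The difference grows with the depth of the network.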

The PEPPER Peer-to-Peer Database Project. Peer-to-peer systems, such as Napster and Gnutella, provide a new paradigm for structuring massively distributed and fault-tolerant computer systems. However, existing peer-to-peer systems are mostly file systems with limited query capability. In the PEPPER project, we are building a system for evaluating complex queries over millions or billions of peers.

The HIMALAYA Data Mining Project. Innovative research techniques for analyzing large datasets.