Projects

Ongoing projects

Cayuga: Event Stream Processing. Publish/subscribe (pub/sub) is a popular paradigm in which users express their interest ("subscriptions") in certain kinds of events ("publications"). It allows efficient asynchronous interaction among distributed applications. Pub/sub has been an active field of research for many years, with topics spanning active databases, event systems, high-performance pub/sub implementations, and distributed pub/sub. Today, a pub/sub system is part of typical message-oriented middleware, and major vendors sell message-broker products that provide this functionality. Traditional pub/sub systems, such as topic-based and content-based systems, have a major limitation: they only allow users to express stateless subscriptions, which are evaluated over individual events as they arrive at the system. However, many applications need stateful subscriptions that involve more than a single event, and users need to be notified with customized witness events as soon as one of their stateful subscriptions is satisfied. In Cayuga, we are building a system for such stateful event processing.
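To make the stateless/stateful distinction concrete, here is a minimal Python sketch. It is not Cayuga's query language or implementation: the `Event` schema, the function names, and the "price jump" subscription are all hypothetical. The stateless filter looks at each event in isolation, while the stateful subscription keeps per-symbol state across events and emits the matching pair of events as its witness.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    # Hypothetical event schema: a stock quote.
    symbol: str
    price: float
    time: int


def stateless(events: Iterable[Event], predicate: Callable[[Event], bool]) -> Iterator[Event]:
    """Content-based filtering: each event is matched on its own."""
    for e in events:
        if predicate(e):
            yield e


def price_jump(events: Iterable[Event], ratio: float) -> Iterator[tuple[Event, Event]]:
    """Stateful subscription (hypothetical): match a later event whose price is at
    least `ratio` times the lowest earlier price for the same symbol.
    The (earlier, later) pair is yielded as the witness."""
    lowest: dict[str, Event] = {}  # per-symbol state carried across events
    for e in events:
        start = lowest.get(e.symbol)
        if start is not None and e.price >= start.price * ratio:
            yield (start, e)
            del lowest[e.symbol]  # restart detection after a match
        elif start is None or e.price < start.price:
            lowest[e.symbol] = e


if __name__ == "__main__":
    stream = [
        Event("IBM", 100.0, 1),
        Event("IBM", 95.0, 2),
        Event("IBM", 105.0, 3),
        Event("IBM", 110.0, 4),
        Event("ACME", 50.0, 5),
    ]
    print([e.time for e in stateless(stream, lambda e: e.price > 100)])  # [3, 4]
    print([(a.time, b.time) for a, b in price_jump(stream, 1.1)])  # [(2, 3)]
```

No single event in the stream satisfies the "price jump" condition by itself; only the combination of events 2 and 3 does, which is why a stateless filter cannot express it.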

Data-Driven Games. Game development is a particularly interesting design challenge because it requires the design team to have expertise in many different areas, such as art, music, and engineering. Historically, game developers have solved this problem by separating game content from game code; games designed this way are called data-driven. In this project we are working to further the development of data-driven games by adapting techniques from the data management community and applying them to game development. Databases have revolutionized the design of data-driven business applications because they separate the processing of data from its specification. Similarly, our goal is to develop a general framework that allows game designers to specify game behavior without having to worry about how to implement it efficiently.
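The content/code separation described above can be illustrated with a small Python sketch. It shows only the basic data-driven idea, not this project's framework: the unit types, their stats, and the `spawn`/`attack` functions are hypothetical. Designers edit the `UNIT_TYPES` table, and the generic game code never mentions a specific unit.

```python
from dataclasses import dataclass

# Game content: a designer-editable table (hypothetical unit types and stats).
# In practice this would be loaded from a data file rather than written in code.
UNIT_TYPES = {
    "archer": {"hp": 30, "damage": 7, "range": 5},
    "knight": {"hp": 60, "damage": 12, "range": 1},
}


@dataclass
class Unit:
    kind: str
    hp: int
    x: int  # position on a one-dimensional map, to keep the sketch small


def spawn(kind: str, x: int) -> Unit:
    """Create a unit whose starting stats come from the content table."""
    return Unit(kind, UNIT_TYPES[kind]["hp"], x)


def attack(attacker: Unit, target: Unit) -> bool:
    """Generic game code: the same rule works for every unit type in the table."""
    stats = UNIT_TYPES[attacker.kind]
    if abs(attacker.x - target.x) > stats["range"]:
        return False  # target is out of range
    target.hp = max(0, target.hp - stats["damage"])
    return True


if __name__ == "__main__":
    archer, knight = spawn("archer", 0), spawn("knight", 4)
    print(attack(archer, knight), knight.hp)  # True 53
    print(attack(knight, archer), archer.hp)  # False 30
```

Adding a new unit type here is a change to the data alone. The project's stated goal goes further than a lookup table: designers would specify behavior declaratively and leave its efficient execution to the system, much as a database does for queries.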

Data Privacy. The digitization of our daily lives has led to an explosion in the collection of personal data by governments, corporations, and individuals. This information is stored in large databases, which makes sensitive personal information easy to access and has led to a dramatic increase in its disclosure. Hence it is crucial to design database systems that can limit the disclosure of private information. At the Database Privacy Group @ Cornell, we study various aspects of the privacy problem, including formal definitions of privacy, efficient algorithms for checking these definitions, and the trade-off between privacy and utility, and we apply this work to settings such as privacy-preserving data mining and data publishing.
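As an illustration of what "checking a definition of privacy" can mean, the Python sketch below checks two well-known formal definitions on a published table: k-anonymity (every combination of quasi-identifier values is shared by at least k rows) and distinct l-diversity (every such group contains at least l distinct sensitive values). These definitions are chosen here as examples; the table, attribute names, and function names are made up for illustration and do not describe the group's own algorithms.

```python
from collections import Counter, defaultdict

# Made-up, already-generalized table (zip codes and ages coarsened).
TABLE = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
]
QI = ("zip", "age")  # quasi-identifier attributes (assumed for this example)


def _qi_key(row, quasi_ids):
    """Project a row onto its quasi-identifier attributes."""
    return tuple(row[a] for a in quasi_ids)


def is_k_anonymous(rows, quasi_ids, k):
    """Every quasi-identifier combination must be shared by at least k rows."""
    counts = Counter(_qi_key(r, quasi_ids) for r in rows)
    return all(c >= k for c in counts.values())


def is_distinct_l_diverse(rows, quasi_ids, sensitive, ell):
    """Every group of rows with equal quasi-identifiers must contain at least
    `ell` distinct values of the sensitive attribute."""
    groups = defaultdict(set)
    for r in rows:
        groups[_qi_key(r, quasi_ids)].add(r[sensitive])
    return all(len(values) >= ell for values in groups.values())


if __name__ == "__main__":
    print(is_k_anonymous(TABLE, QI, 2))                    # True
    print(is_distinct_l_diverse(TABLE, QI, "disease", 2))  # False
```

The example table is 2-anonymous, yet anyone known to be in the first group is revealed to have the flu, because that group has only one sensitive value. Cases like this are why stronger definitions than k-anonymity, and the trade-off they impose on utility, are worth studying.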

Petabyte Data Management and Analysis Services for Data-Driven Science. The rapid growth in the generation of digital data is changing computational science in a fundamental way. Traditionally, the scope of computational problems was limited by the available processing power. But today, many problems are extremely data-intensive, and the lack of large-scale storage infrastructure creates a new bottleneck. Thus, modern data-intensive applications need high-performance computational resources and a system in which the computational resources are tightly coupled with large-scale storage. The goal of this project is to develop such a system to enable users to focus on their scientific tasks, freeing them from having to set up an increasingly complex data-processing infrastructure.

MayBMS: An uncertain database management system. Incomplete data is a problem that often arises in practice. Examples include scientific databases, data integration, sensor data management, and scenarios in which information is entered manually and is therefore prone to errors and omissions. MayBMS is a system for the efficient management of large uncertain databases. Its main features include a powerful query language for processing uncertain data, space-efficient storage of uncertain and probabilistic data, support for data cleaning, and efficient query evaluation.
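The following Python sketch illustrates the possible-worlds semantics behind querying probabilistic data. It is not MayBMS's query language or storage representation. It assumes a simple tuple-independent table, where each tuple is present independently with a given probability, and it uses hypothetical sensor readings. The probability of a Boolean query is the total probability of the worlds in which the query is true.

```python
from itertools import product

# Hypothetical tuple-independent table: ((sensor, temperature), probability present).
READINGS = [
    (("s1", 21.5), 0.9),
    (("s2", 30.2), 0.4),
    (("s3", 31.0), 0.5),
]


def possible_worlds(table):
    """Enumerate all 2**n worlds (subsets of tuples) with their probabilities."""
    for present in product([True, False], repeat=len(table)):
        world = [t for (t, _), keep in zip(table, present) if keep]
        prob = 1.0
        for (_, p), keep in zip(table, present):
            prob *= p if keep else 1.0 - p
        yield world, prob


def query_probability(table, query):
    """Probability that a Boolean query holds: sum over worlds where it is true."""
    return sum(prob for world, prob in possible_worlds(table) if query(world))


def hot(world):
    """Boolean query: does some reading exceed 30 degrees?"""
    return any(temp > 30.0 for _, temp in world)


if __name__ == "__main__":
    print(round(query_probability(READINGS, hot), 6))  # 0.7
```

The answer agrees with the closed form 1 - (1 - 0.4)(1 - 0.5) = 0.7. Brute-force enumeration needs 2^n worlds for n uncertain tuples, which is exactly the cost that space-efficient representations and efficient query evaluation are meant to avoid.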

The WebLab Project.

Completed projects