The Cornell Database Group is exploring issues related to all aspects of data management. Our interests range from developing efficient algorithms for very large data sets to building large-scale systems for new and emerging applications. We are currently working on the following projects.

Computer games are becoming the next frontier for social interaction between humans. In our project on data-driven games, we use techniques from the database community to scale up computer games and virtual worlds. An example is the artificial intelligence of non-player characters. We made a first step towards truly scalable AI in computer games by modeling game AI as a data management problem. With our highly expressive scripting language, SGL, we can use sophisticated query processing and indexing techniques to efficiently execute large numbers of SGL scripts, thus providing a framework for games with a truly epic number of non-player characters.

The digitization of our daily lives has led to an explosion in the collection of personal data by governments, corporations, and individuals. Such information is stored in large databases, and easy access to these databases has resulted in a dramatic increase in the disclosure of private information about individuals. Similarly, many organizations need to guard their proprietary data against unauthorized access, and administrators need to be able to express various data access control policies. In the Data Privacy and Security Project, we are working on techniques to limit disclosure of information from such databases.

Traditional content-based publish/subscribe (pub/sub) systems allow users to express stateless subscriptions that are evaluated on individual events. However, many applications require the ability to handle stateful subscriptions. In the Cayuga Project, we are building a scalable complex event monitoring engine that can process queries that span multiple events. Applications of Cayuga include system monitoring, monitoring of RSS streams and stock tickers, and management of RFID data streams.
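
The difference between stateless and stateful subscriptions can be illustrated with a minimal Python sketch. This is not Cayuga's query language or implementation; the `Event` type, the `stateless_match` predicate, and the `RisingRun` class are hypothetical names chosen only for illustration. The stateless subscription is decided from a single event, while the stateful one has to remember earlier events to detect a price that rises over several consecutive events.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Event:
    """Hypothetical stock-ticker event."""
    symbol: str
    price: float


def stateless_match(event: Event) -> bool:
    """Stateless subscription: decided from one event alone."""
    return event.symbol == "ACME" and event.price > 100.0


class RisingRun:
    """Stateful subscription: fires when a symbol's price has risen
    across `length` consecutive events, so earlier events must be kept."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.last_price: Dict[str, float] = {}  # last price seen per symbol
        self.run: Dict[str, int] = {}           # current rising-run length

    def on_event(self, event: Event) -> bool:
        prev = self.last_price.get(event.symbol)
        if prev is not None and event.price > prev:
            self.run[event.symbol] = self.run.get(event.symbol, 1) + 1
        else:
            # First event for this symbol, or the price did not rise.
            self.run[event.symbol] = 1
        self.last_price[event.symbol] = event.price
        return self.run[event.symbol] >= self.length


stream = [
    Event("ACME", 99.0),
    Event("ACME", 101.0),
    Event("ACME", 102.0),
    Event("ACME", 100.0),
    Event("IBM", 50.0),
]
stateless_hits = [stateless_match(e) for e in stream]
subscription = RisingRun(length=3)
stateful_hits = [subscription.on_event(e) for e in stream]
```

In this sketch the per-symbol dictionaries are the state that a stateless pub/sub system does not keep. An engine serving many such subscriptions at once would need to share and index that state rather than hold a separate object per subscription.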

For many of the most challenging data management problems, such as data integration, data cleaning, Web information extraction, Web search, and data mining, uncertainty is the central issue that renders the problem difficult. Database systems for managing uncertain and probabilistic data will also be used for managing sensor data, tracking vehicles, and for decision support in business, crime fighting, and intelligence. In the MayBMS project, we are developing database technology and query languages for efficiently managing very large uncertain and probabilistic databases.
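
As a rough illustration of what querying probabilistic data involves, the Python sketch below computes the probability that at least one row satisfying a predicate is present. It is not MayBMS code or its query language. It assumes a simple model in which each row is present independently with a stated probability, and the row layout and function names are hypothetical. The first function enumerates every possible world, which is exponential in the number of rows; the second uses the closed form that holds under the independence assumption.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

# Hypothetical row layout: (sensor_id, reading, probability the row is present).
Row = Tuple[str, str, float]


def prob_exists_by_worlds(rows: Sequence[Row],
                          predicate: Callable[[Row], bool]) -> float:
    """Sum the weights of all possible worlds containing a matching row."""
    total = 0.0
    for present in product([False, True], repeat=len(rows)):
        # Weight of this world under the independence assumption.
        weight = 1.0
        for row, keep in zip(rows, present):
            weight *= row[2] if keep else 1.0 - row[2]
        if any(keep and predicate(row) for row, keep in zip(rows, present)):
            total += weight
    return total


def prob_exists_closed_form(rows: Sequence[Row],
                            predicate: Callable[[Row], bool]) -> float:
    """Same probability as 1 minus the chance that every matching row is absent."""
    all_absent = 1.0
    for row in rows:
        if predicate(row):
            all_absent *= 1.0 - row[2]
    return 1.0 - all_absent


readings = [("s1", "hot", 0.5), ("s2", "hot", 0.2), ("s3", "cold", 0.9)]


def is_hot(row: Row) -> bool:
    return row[1] == "hot"


p_worlds = prob_exists_by_worlds(readings, is_hot)
p_closed = prob_exists_closed_form(readings, is_hot)
```

Both functions agree on this input, but only the second scales beyond a handful of rows. That gap between the possible-worlds definition and an efficient evaluation is one reason very large probabilistic databases need dedicated techniques.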