News 2017

  • Our paper "Optimizing Voice-Based Output of Relational Data" was accepted at VLDB 2017.
  • Our paper "Solving the Join Ordering Problem via Mixed Integer Linear Programming" was accepted at SIGMOD 2017.
  • Our student project on data audiolization receives the Lockheed Martin Award at BOOM 2017; our student project on automated fact checking is covered in the news.
  • Immanuel receives a Google Faculty Award for a project on data audiolization.
  • Immanuel receives an Honorable Mention for the Jim Gray Doctoral Dissertation Award.
  • Our paper "Multi-objective Parametric Query Optimization" was selected for publication in CACM as a CACM Research Highlight.

Ongoing Projects

The Cornell Database Group is exploring issues related to all aspects of data management. Our interests range from database tuning and query optimization, through data mining and novel query interfaces, to building large-scale systems for new and emerging applications. A non-exhaustive list of recent and ongoing projects follows.

Query Optimization

Query optimization lays the foundation for declarative query languages such as SQL. The goal of query optimization is to translate a declarative query, which describes the data to generate, into an optimal query plan, which describes how to generate it. Query optimization is a hard optimization problem that typically needs to be solved at run time. Members of the Cornell Database Group have recently worked on novel approaches to query optimization that significantly extend the query sizes that can be optimized efficiently. We have proposed several new query optimization variants that capture the context of approximate and cloud-based query processing, and we are currently exploring the use of machine learning for improved execution cost estimation.
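To make the step from declarative query to query plan concrete, the following minimal sketch picks a join order with textbook dynamic programming over table subsets. It is not the group's method (the work mentioned above uses other techniques, such as mixed integer linear programming). The function name, the toy cost model (the sum of intermediate result sizes), and the restriction to left-deep plans are all assumptions made for illustration.

```python
from itertools import combinations


def best_left_deep_plan(cardinalities, selectivities):
    """Return (cost, plan) for the cheapest left-deep join order.

    cardinalities: dict mapping table name -> row count.
    selectivities: dict mapping frozenset({a, b}) -> join selectivity;
                   missing pairs are treated as cross products (1.0).
    Cost model (an illustrative assumption): sum of the estimated
    sizes of all join results produced by the plan.
    """
    tables = sorted(cardinalities)

    def result_size(subset):
        # Estimated size of joining all tables in `subset`.
        size = 1.0
        for t in subset:
            size *= cardinalities[t]
        for a, b in combinations(sorted(subset), 2):
            size *= selectivities.get(frozenset((a, b)), 1.0)
        return size

    # best[subset] = (cost, plan); single tables cost nothing to "join".
    best = {frozenset([t]): (0.0, t) for t in tables}
    for k in range(2, len(tables) + 1):
        for combo in combinations(tables, k):
            subset = frozenset(combo)
            out = result_size(subset)
            candidates = []
            for t in combo:
                # Left-deep plan: table t is joined last.
                cost, plan = best[subset - {t}]
                candidates.append((cost + out, (plan, t)))
            best[subset] = min(candidates, key=lambda c: c[0])
    return best[frozenset(tables)]


if __name__ == "__main__":
    cost, plan = best_left_deep_plan(
        {"R": 1000, "S": 100, "T": 10},
        {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1},
    )
    print(cost, plan)
```

The table of subsets grows exponentially with the number of tables, which is one reason why extending the query sizes that can be optimized efficiently is a research problem in its own right.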

Text Mining

Search engine providers such as Google regularly receive queries that contain subjective predicates. In order to answer those queries from structured data, search engine providers need to understand which properties an average user associates with which entities. In a recent collaboration with Google, we developed a system that mines the entire Web to find subjective associations via natural language analysis and unsupervised machine learning.
In another ongoing project, we analyze text documents summarizing relational data sets in order to identify false claims in a semi-automated process.

Voice-Based Interfaces

Recent research on data visualization aims at automatically identifying the best way to represent data on visual interfaces. Supported by a Google Faculty Award, the Cornell Database Group is currently studying the complementary problem of "data audiolization". The goal here is to optimize the way in which structured data is represented via audio interfaces. This problem setting is motivated by emerging devices such as Google Home or Amazon Echo that interact with users primarily through voice-based interfaces.