Thursday, March 1, 2007
4:15 pm
B17 Upson Hall

Computer Science
Colloquium
Spring 2007

Cong Yu
University of Michigan
 

Taming Complex Databases through Schema Summaries
 

Real databases are often very complex, and their schemas can comprise thousands of tables, elements, attributes, etc.  Anyone wishing to interact with such a database first faces the daunting task of understanding its schema.  In this talk, I will propose the concept of a schema summary, which provides a succinct overview of the underlying complex schema and significantly reduces the human effort required to understand the database.  I will define criteria for good schema summaries and describe efficient algorithms for producing them.

A schema summary can greatly reduce the user effort needed to locate the schema elements required to construct a structured query, because it allows the user to explore only the portions of the schema that are of interest.  Nonetheless, as query complexity increases, querying through exploration is no longer a viable option, because a significant percentage of the schema would have to be explored.  By leveraging schema summaries together with a schema-based semantics for matching meaningful data fragments with structure-free search conditions, I will propose a novel query model called Meaningful Summary Query (MSQ).  The MSQ model allows users to query a complex database through its schema summary, with embedded structure-free conditions.  As a result, an MSQ can be generated with knowledge of the schema summary alone and still retrieve highly accurate results from the database.