Tuesday, April 11, 2006
4:15 pm
B17 Upson Hall

Computer Science
Colloquium
Spring 2006

Christoph Koch
Saarland University Database Group
 

Processing Queries on Tree-Structured Data Efficiently


Queries on tree-structured data have received much research attention recently. This is due to the important role they play in enabling technologies such as XML on one hand and complex application areas such as data integration and computational linguistics on the other.
 
In this talk, I present recent results on the efficient evaluation of queries on tree-structured data, focusing on popular query languages such as conjunctive and first-order queries, XPath, and XQuery. Both (complexity-) theoretic and algorithmic results will be presented; the latter have been shown to be of immediate usefulness in practical systems.
 
I start with structural properties of tree-like data and present a dichotomy theorem for conjunctive queries over trees. This result characterizes the tractable classes of conjunctive queries in terms of tree structure relations.
 
I then move on to structural properties of queries and their use in obtaining efficient query processing algorithms. I present the first polynomial-time algorithm for XPath processing and will then discuss the syntactic properties of XPath this algorithm is based on. I also discuss lower bounds and the parallel complexity of XPath queries, and derive predictions about XPath processing algorithms from these. Furthermore, I give a lower bound from communication complexity for streaming XPath algorithms.
 
Finally, I study the substantial complexity of XQuery, the most popular tree-transformation query language, and present a natural fragment, composition-free XQuery, that can be evaluated with reasonable efficiency while requiring no loss of power to formulate queries.
 
The talk concludes with a comparison of query languages on trees in terms of their expressive power, complexity, and succinctness.
 
Christoph Koch did his doctoral research at CERN, a high energy physics research laboratory near Geneva, Switzerland, and received his PhD in 2001. He was a postdoctoral researcher first at TU Vienna and later at the University of Edinburgh. From 2003 to 2005 he was on the faculty of TU Vienna and obtained his Habilitation in 2004. Since April 2005 he is professor (W2) and Chair of Information Systems at Saarland University, Saarbruecken, Germany. He has authored or co-authored about 50 publications in the areas of data management and Artificial Intelligence, two of which won best paper awards at PODS 2002 and ICALP 2005. He co-chaired DBPL 2005 and is on the editorial board of ACM Transactions on Internet Technology. His current research interests are in database systems and database theory, in particular in queries on tree-structured data, XML, data stream processing, managing incomplete information, efficient query evaluation and query optimization, visual query languages, and scientific databases.