The Cindrew query rewriting system

Important note: This is a project that ended in 2001 and some of the description below is outdated! In particular, my notion of conjunctive inclusion dependency is nowadays better known as GLAV mapping or source-to-target dependency.

Downloads

About Cindrew

Cindrew (Conjunctive INclusion Dependency (cind) REWriter) is a prototype query rewriting tool for information integration, and is the first implemented system of its kind to work with symmetric constraints. Symmetric constraints are necessary if neither the classic global-as-view nor local-as-view assumptions hold, as is the case in environments that consist of a large number of heterogeneous information systems whose schemata and information models have to be considered "legacy" (i.e. they cannot be reengineered and integrated against one common global enterprise model) but which still need to share data, as it is the case at CERN. Symmetric constraints allow to conveniently handle concept mismatch in its most general form.

Information Integration approaches based on symmetric constraints have been previously proposed in the context of description logics. However, these approaches are confined to query answering, i.e. they require the processing of the available data in order to decide what to return as result of the integration effort. Consequently, they are not scalable. Query rewriting, on the other hand, exclusively reasons on the level of descriptions (i.e., the queries), and allows to restrict and optimize queries before they are executed. Also, since the overall integration problem is solved on the level of queries, no data that are later found not to be relevant for the result need to be transferred across the (distributed) computing environment.

The query rewriting problem is the following:

Given a number of materialized and virtual relations, a set of constraints that encode knowledge about the relative "meaning" of the relations, and a query over any materialized or virtual relations, find a rewriting (another query) which

  • only consists of materialized relations
  • will only return tuples that, under the given constraints, should also be in the result of the input query
  • is the best (most encompassing) such rewriting attainable in a given query language

Materialized relations are database relations and hold data that we may be interested in, while the virtual relations do not, just like classical Views in SQL. Constraints in Cindrew - cind's - encode subsumption (containment) relationships between pairs of conjunctive queries. This generalizes both logical views in the local-as-view and the global-as-view paradigm.

The query language that we cover are the positive relational queries. Cindrew will, given a conjunctive query, return the maximally contained positive rewriting, as a set of conjunctive queries. That means that for any conjunctive query for which we can be sure that it returns only correct answers to the input query given a number of constraints (and the semantics that we will discuss below), we can guarantee that there will be a conjunctive query in the result that is equivalent or subsumes that query.

Cindrew does not execute queries, but is meant to demonstrate novel query rewriting concepts and algorithms. Its results can be more or less directly (after translating Cindrew's results to their proper query languages, a straightforward task) used by add-on query executors such as mysql and Kweelt, or middleware such as Oracle SQL*Net/Net8. Note, however, that the current prototype of cindrew is based on the relational data model only, which requires some ingenuity when mapping from different data models such as the object-oriented or the semi-structured. Also, as the result produced by Cindrew is inherently nonrecursive, it is not suited for many Web applications. Functional dependencies, on the other hand, are on the near-term to-do list of Cindrew, which will improve its practical usefulness in the context of structured relational and object-oriented applications.

Syntax and Semantics

The syntax of constraints and queries in Cindrew is based on a generalization of datalog into (a plain-text encoding of) set-theoretic/tuple relational calculus notation, and should be familiar to information systems researchers. For instance, the cind

is encoded as

{<X, Y> | exists Z: p1(X, Z), p2(Z, Y)} >= {<X, Y> | p3(X, Y)}

Queries and cind's that can be expressed in datalog may be expressed that way. See the Cindrew Unix man page for a BNF of the syntax used.

The semantics of query rewriting in Cindrew is the classical logical. A maximally contained rewriting of a conjunctive query Q is equivalent to the set of all conjunctive queries Q' for which the constraints logically imply that Q contains Q'. In the pathological case that the constraint base is cyclic (in terms of a dependency graph containing an arc for each pair of predicates where one predicate appears in the subsumer side of a constraint and the other in the subsumed side), Cindrew will detect cyclicity but nevertheless try to rewrite the input; and it may fail to terminate. The resulting positive query may be infinite, however, this is not a necessary condition for Cindrew not terminating. If the constraint base is acyclic, Cindrew is guaranteed to produce the maximally contained rewriting and terminate.

Examples

(Use copy & paste to transfer the examples into the input box and use the "Back" button of the browser to return to this page.)

ullman97 (a pure local-as-view rewriting example)
pottinger_levy2000.1 (a pure local-as-view rewriting example)
pottinger_levy2000.2 (a pure local-as-view rewriting example)

example.01 (with explanation / SQL)
example.02 (somewhat) simple(-minded ?)
example.03
recursion_unfolding
rewrite_system
pcp.3 A straightforward but ineffient encoding of PCP.
pcp.4 An much more efficient encoding of PCP.
pcp.5 An automatically generated instance of PCP.

tiling Encoding of the Tiling problem which shows that query containment and rewriting with acyclic sets of cind's are NEXPTIME-hard (and NEXPTIME-complete due to a result on nonrecursive logic programming in [Dantsin and Voronkov, 1997])

query_optimization.1 (with explanation / SQL)
query_optimization.2

unification.1
constants.1 (with explanation / SQL)
constants.2 (with explanation / SQL)
layers.1
layers.2

bench.7 (a somewhat larger example, 100 chain query constraints, 22K)
bench_l5.1 (a layered chain query example with 5 layers, 10 predicates (two on each layer), and 148 constraints, 26K)
random.1 (random queries, many solutions)