Yet another Query Graph Model (YQGM)

Chavdar Botev - Lin Guo

Computer Science Department

Cornell University

Copyright (C) 2003-2005 Cornell University. All Rights Reserved.


Contents

Introduction

This document describes Yet another Query Graph Model (YQGM) - a general model for representing queries. YQGM is inspired by the Query Graph Model (QGM) in the Starburst project [1], the XML Query Graph Model (XQGM) in XPERANTO [3], and the RelOperator graph model in PREDATOR [6]. YQGM is more general because it is independent of the underlying data model and can support XML, relational and other data models.

The YQGM Model

In short, the model represents the query as a graph. The nodes in the graph are called operators and they are used to perform a specific kind of a computation. The edges are called quantifiers and represent the data flow in the graph. Each quantifier takes the output produced by an operator (called the output operator) and directs it to another operator (called the input operator).

The data flow in the model is encoded as a sequence of tuples, which can be viewed as a relation. Each tuple for a given operator output has a fixed number of attributes. The value of each attribute is a sequence as defined in the underlying data model [4].

As it can be seen, the data model is quite similar to the relational model. The major difference is that the value of an attribute in our model can be quite complex - it can be even a sequence of documents represented as document nodes. Nevertheless, through the rest of the exposition we will try to draw some analogies to SQL, which we hope will make the model clearer.

Let us first brush up the general structure of a SQL statement:

SELECT Expr1, Expr2, ...

FROM R1, R2, ...

WHERE Pred1 AND Pred2 ...

ORDER BY Key1, Key2, ...

GROUP BY GroupKey1, GroupKey2, ...

We now move to a more detailed description of the elements in the YQGM model.

Elements and Relationships

An YQGM Static Class Diagram (e.g. Figure [*]) consists of three main types of graphical objects: classes (represented with boxes), generalizations (represented with arrows with hollow ends), and associations between classes (represented with lines with possible diamond heads or "stick" arrows). For more information about the UML language, refer to [2].

The classes describe the YQGM elements; these are the building blocks of the graph model. The associations describe how the YQGM elements relate to each other. In our model, we call them YQGM relationships. Each relationship has a name, which is specified in both of the corresponding association's ends.

The following sections discuss the general YQGM elements (in library YQGM) and relationships grouped by type. The full YQGM class diagram is shown in section [*] on Figure [*]. The standard YQGM elements are implemented in libray YQGM_STD (Documents for the standard YQGM elements are stored in quark2/yqgm_std/doc ).

Implementation Notes

The YQGM implementation is in C++. All the classes related to it are in the "quark::yqgm:structures::" namespace.

In our implementation, each YQGM element is represented through a C++ class. The name of the class is the same as the name in the diagram.

We chose to implement relationships using template classes. There are two template classes with parameters the class of the parent element and class the children elements: Rel1N for 1-to-N relationships and Rel11 for 1-to-1 relationships. They define two subclasses, ParentAccessor and ChildrenAccessor/ChildAccesor, which represent the view on the relationship from the children's/child's perspective and the parent's perspective. The former allows access to the parent of the relationship and the latter allows access to the children/child in the relationship.

These relationships govern the way YQGM elements can be created. An element is constructed by taking as parameters all its parents in relationships where the element is a child. Therefore, the graph is always constructed top-down: first the query, then the operators, then the quantifiers, etc.

The names of the relationships are formed by the short name of the parent class (given in parenthesis in the YQGM element descriptions below), the short name of the children class, and possible additional tag identifying the relationship if there are more than one relationship between the parent and the child class.

The YQGM Query Graph


Query (Qry)

Figure: UML Class Diagram for Query
\begin{figure}\centering\psfig{file=figures/qry.ps, width = 4in, height = 5in}\end{figure}

Short Description

The Query class is used to represent a query.

Relationships

Relationship Type Role Description
qryOpr

Detailed Description

The UML static class diagram for the Query class is shown on Figure [*]. On this diagram (and the subsequent class diagrams) the classes in focus are shaded. All the other related classes are given in white boxes. Every query (without user defined functions) in YQGM is represented as an instance of the Query class. It consists of one or more operators (instances of the class Operator - see [*]), connected by means of quantifiers (the class Quantifier - see [*]), thus forming the query graph.

The operators are accessible through the qryOpr relationship. There is one designated operator, called the top operator, which produces the result of the query. It is accessible through the qryOprTop relationship.

If we go back to the analogy with SQL and the relational model, the query roughly corresponds to a compound SQL SELECT-statement with the possible use of nested sub-queries.

QueryWithFunction (Qry)

Short Description

The QueryWithFunction class is used to represent a query with user defined functions.

Relationships

Relationship Type Role Description
qryOpr

Detailed description

QueryWithFunction is a query with user defined functions -- FunctionQuery (see [*]). The qryUsrFun relationship provides access to all the user defined functions in this query.

The class QueryWithFunction inherits from Query the qryOpr relationship to all its operators and the qryOprTop relationship to the top operator.


FunctionQuery(UsrFun)

Short Description

The FunctionQuery class is used to represent a user defined function.

Attributes

Attribute Type Description
functionName

Relationships

Relationship Type Role Description
qryOpr

Detailed description

FunctionQuery is a kind of query used to represent user defined functions (UDF). It stores the function name and a parameter operator ParamOperator (see [*]) which holds all parameters of the UDF. The qryUsrFun relationship provides access to the containing query; and the PUserFunction (see [*]) which uses this UDF is accessible through the usrFunPuf relationship.

The FunctionQuery inherits from Query the qryOpr relationship to all its opertators and the qryOprTop relationship to the top operator.

Operators

Figure: UML Class Diagram for Operators
\begin{figure}\centering\psfig{file=figures/opr.ps, width=7in, height = 8in}\end{figure}


Operator (Opr)

Short Description

The Operator class is used to represent an operator in the query graph.


Relationships

Relationship Type Role Description
qryOpr

Detailed Description

The operator is the primary computational element in the YQGM graph. It may or may not take input data from its input quantifiers but it always produces some output, which is accessed by the other operators through its output quantifiers.

The Operator class is an abstract class that defines the common features of all operators (see Figure 2). There are eight types of operators that differ on the type of computation (functionality) they perform:

The common features among all these types of operators are the relationships listed in [*]. The qryOpr relationship allows access to the containing query. The oprOcl relationship encompasses the output columns (see [*]) of the operator that are used to compute the values for each attribute of the output tuple. The oprQunIpt relationship provides access to the input quantifiers for the operator, i.e. the quantifiers that contain the input sequences of tuples. The oprQunOpt relationship contains the output quantifiers of the operator; these are used by other operators to access the data produced by the operator.

Except for ParamOperator, all other specific operators are implemented in library YQGM_STD. The specifics of these operators are explained in quark2/yqgm_std/doc.


ParamOperator

Short Description

The ParamOperator class is used to hold all parameters of a FunctionQuery object (see [*]).


Relationships

Relationship Type Role Description
qryOpr

Detailed Description

ParamOperator is a specific operator. We use it to hold all the parameter parse trees (see [*]) of a FunctionQuery object. However, it doesn't need to define its own relationships, since the parameter parse trees are wrapped in the output columns (see [*]) of the parameter operator, which can be accessed by its relationship oprOcl (inherited from Operator). Please refer to section [*] for the relationships between output columns and parse trees.

Quantifiers

Figure: UML Class Diagram for Quantifiers
\begin{figure}\centering\psfig{file=figures/qun.ps, width=6in, height = 7in}\end{figure}


Quantifier (Qun)

Short Description

The Quantifier class represents the flow of data from one operator to another.

Relationships

Relationship Type Role Description
oprQunIpt

Detailed Description

Instances of the Quantifier class are used to connect one operator's output as another operator's input. The former is accessible through the oprQunOpt relationship, and the latter is accessible through the oprQunIpt relationship (see Figure [*]). The tuples are accessible by the input operator through input columns (see [*]). Each input column is used to access the value of an output column in the output operator. The input columns can be referred to using the qunIcl relationship.

There are three types of quantifiers: EachQuantifier, AllQuantifier and SomeQuantifier. All of the three subclasses are implemented in library YQGM_STD.


InputColumn (Icl)

Short description

The InputColumn class allows the access to the value of an operator's output column through a quantifier.

Relationships

Relationship Type Role Description
qunIcl

Detailed Description

The InputColumn class doesn't inherit Quantifier, but is an accessor to the value of an attribute in the tuples produced by the output operator for a quantifier. The value is accessed through the output column used for its calculation. Then the value can be used in the input operator using a PColumn(see [*]).

The containing quantifier is accessible through the qunIcl relationship. The output column producing the value of the attribute is accessible through the oclIcl relationship. The PColumns accessing the value of the input column can be found using the iclPcl relationship.

Expressions

Figure: UML Class Diagram for Expressions
\begin{figure}\centering\psfig{file=figures/exp.ps, width=6in, height = 7in}\end{figure}


Expression (Exp)

Short Description

The Expression class represents the concept of expression.

Relationships

Relationship Type Role Description
expPrt

Detailed Description

The Expression class is the abstract ancestor of all types of expressions used in YQGM. Through the expPrt relationship, it provides a parse tree (see [*]) that can be used to compute the value of the expression.

There are five types of expressions. Except for OutputColumn, all the specific expressions are implemented in library YQGM_STD.

The Expression class corresponds to the expression that can be found in the different clauses in a SELECT statement in SQL.


OutputColumn (Ocl)

Short Description

The OutputColumn is used to compute the value of an attribute in the output tuple sequence of an operator.

Relationships

Relationship Type Role Description
expPrt

Detailed Description

The output column is a kind of expression ([*]) used to compute the value of a single attribute in the output tuple sequence of an operator. This value is then accessed by some of the input columns of the output quantifiers for the parent operator.

The oprOcl relationship provides access to the containing operator. The expPrt relationship contains the parse tree used to compute the output column's value. The oclIcl relationship gives access to the input columns referring to the values produces by this output column.

The output column is analogous to the expressions in the SELECT clause of a SELECT query.


Parse Trees

Figure: UML Class Diagram for Parse Trees
\begin{figure}\centering\psfig{file=figures/prt.ps, width=5in, height = 6in}\end{figure}


ParseTree (Prt)

Short Description

The ParseTree class represents the concept of a tree generated by a parsed expression.

Relationships

Relationship Type Role Description
expPrt

Detailed Description

The ParseTree is the base class to represent a (sub-)tree used to compute the value of an expression. The name comes from the fact that this tree is constructed by parsing the string representation of the expression.

Parse trees can be composed by supplying a parse tree as a parameter to a parse tree node representing a function (see below). There are three types of parse trees (see also Figure 5).

The relationship expPrt points to the containing expression object and is set only for the root node of the parse tree. The relationship pfnPrtParent can be used to access the parent PFunction (see [*]) which takes this parse tree as an input parameter.


PColumn (Pcl)

Short Description

The PColumn is a type of ParseTree used to access the value of an input column.

Relationships

Relationship Type Role Description
expPrt

Detailed Description

The PColumn class can be used to access the value of an input column (see [*]) in an expression. This class inherits from ParseTree the relationship to the containing expression (expPrt) and the relationship to the parent PFunction object (if any) - pfnPrtParent. The PColumn class adds the iclPcl relationship to the input column whose value is accessed.

The PColumn corresponds to the reference to a column of a relation in the FROM clause in a SQL statement.


PFunction (Pfn)

Short description

The PFunction class represents the application of a function to a set of parameters in the form of ParseTrees ([*]).

Relationships

Relationship Type Role Description
expPrt

Detailed Description

PFunction is an abstract class used to relate a set of parameters (accessible by the pfnPrtChildren relationship) to some functions (implemented by its subclasses). The parameters are other ParseTrees. This class inherits from ParseTree the relationships to the containing expression (expPrt) and the relationship to the parent PFunction object (if any) - pfnPrtParent.

This class corresponds to the function application in SQL expressions. We implement two types of PFunctions (see also Figure 5).


PUserFunction (Puf)

Short Description

PUserFunction represents an application of a user defined function to the parameters.

Relationships

Relationship Type Role Description
expPrt

Detailed Description

PUserFunction is an implementation of class PFunction which applies a user defined function ( accessed by relationship usrFunPuf) to the parameters. All parameters are accessible by the relationship pfnPrtChildren (inherited from PFunction). This class indirectly inherits from ParseTree the relationships to the containing expression (expPrt) and the relationship to the parent PFunction object (if any) - pfnPrtParent.


Full Diagram

See Figure [*].

Figure: YQGM Full Class Diagram

Bibliography

1
Haas, L.M., J.C. Freytag, G.M. Lohman, H. Pirahesh, "Extensible Query Processing in Starburst", Proceedings of ACM SIGMOD, Portland, Oregon (May 1989).

2
Object Management Group (OMG). Unified Modeling Language (UML). http://www.omg.org/uml/

3
Shanmugasundaram, J. et al. Querying XML Views of Relational Data. VLDB 2001.

4
W3 Consortium. XQuery 1.0 and XPath 2.0 Data Model. http://www.w3.org/TR/xpath-datamodel/

5
W3 Consortium. XQuery 1.0: An XML Query Language. http://www.w3.org/TR/xquery

6
Praveen Seshadri, "PREDATOR Design and Implementation".

http://www.distlab.dk/predator/papers/designdochtml/designdoc.html

About this document ...

Yet another Query Graph Model (YQGM)

This document was generated using the LaTeX2HTML translator Version 2002 (1.62)

Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.

The command line arguments were:
latex2html yqgm.tex -split 0 -nonavigation

The translation was initiated by Anand Bhaskar on 2005-04-12


Anand Bhaskar 2005-04-12