Disclaimer: This survey is not finished work, and its discussion may not even reflect my current beliefs. It has been put online as a service to the web community, and should be useful as at least the references are correct!
An example where it deters optimization could involve the union of
Students and Employees, with Person being a superclass of both. If we
wanted to find all supervisors of students and employees we can not
perform the union first and then apply a supervisor() to the
result, since it may not be defined for Persons, the type of the union.
We may be forced to perform the method on each set by itself, and then
form the union of the results, which can obviously be very inefficient.
We can see that the union of Students and Employees is guaranteed to have
supervisor() defined, and safely apply it to the union, but
the requires the optimizer to provide powerful type-inference mechanisms.
2.1.2. Complex Objects
The path expressions and query closure of OODBMS query languages (see chapter
28) complicate the processing of queries in several
ways. One of these difficulties is the building of indices for path
expressions, especially in face of arbitrary methods in the path. This is in
general a very hard problem, and near unsolvable if methods can have
side-effects.
Another problem with path expressions is that they suggest an execution order
of the path methods, which may well be a very inefficient order. As an
example the path Orders.part.name, may well be best evaluated
right-to-left if there are many orders but few parts. When methods are
involved in queries this is further complicated since the optimizer may have
no idea what the method execution times and return values are.
The closure properties of OODBMS query languages support the use of nested
queries, and their use is often very desirable in order to concisely and
express complex queries. Their use, however, greatly complicates the
optimization process, turning it from a local problem to a global one,
i.e. requiring global knowledge entire query expression.
2.1.3. Methods and Encapsulation
Optimizations in the face of methods and encapsulation is self-evidently a
very dubious process, if there is no way of determining the cost of evaluating
methods, and the size of their results. To reinvoke an example from last
section, the expression july.specialorders().part.name may best
be evaluated left-to-right or right-to-left depending on the size of the
return value of specialorders().
A simple example where optimization is hopeless without knowing the method
cost is the expression a.x() AND a.t == b.t. If we don't know
the cost of evaluating method x() we cannot hope to know whether
performing the join first will optimize the query or make it worse.
There are several proposals for the solution of this dilemma in the literature. Some systems allow the optimizer to break the encapsulation of methods and examine them to determine cost, which forces the methods to be written in a language understood by the optimizer. Other systems declare the cost of methods as a part of the definition. Yet other systems optimize queries under several assumptions, and then determine at query run-time what assumptions are most valid, based on current cost statistics, and execute the version optimized for those assumptions.
Optimizing with regard to methods is even further complicated when dealing
with late binding of method calls. In that case it may not be know until at
run-time which method will get called, and thereby what the method cost is.
In this case the run-time selection of one of several pre-optimized queries,
based on the actual method type, might be the best solution.
2.1.4. Identity and Equivalence
Query optimization is complicated by the notion of object identity and
equivalence, as mentioned in section
2.3 of
chapter 28. When predicating on equivalence it
isn't always clear what semantics the equivalence has, and this the optimizer
has to know in order to perform its task.
A major factor in query optimization is whether the query language allows the
creation of new objects, and, if so, whether those objects have object
identities (OIDs). This influences the query optimization in that it can
introduce the need for duplicate elimination, deep equality comparisons where
OID compares would otherwise be sufficient, or the need to deal with objects
without OIDs. Another approach is to enforce a values versus objects
dichotomy, as is discussed in chapter 28.
2.2. An Optimization Methodology
In a triad of papers,
[Str91],
[Özs94a] and
[Özs94b], M.T. Özsu defines, with
co-authors, the query processing methodology presented in figure
1, and discusses current query-optimization research
in its light.

Figure 1:
In this methodology, the query-rewrite optimization is a high-level process where general-purpose heuristics drive the application of rewrite rules, and plan optimization is a lower level process which generates execution plans based on knowledge of relative costs, statistics and physical structure. A declarative query is thus optimized as follows:
The lack of a standard algebra and calculi has hampered research into OODBMS query optimization, prohibiting generalized conclusions to be drawn from research results. Hopefully KOLA, or some other algebra, will soon remedy this like ODMG-93 OQL did for user-level query languages.
2.4. Algebraic Optimization
Algebraic optimization has the advantage that a query can be transformed using
well-defined operators having properties such as transitivity and commutativity.
If the algebra is equivalence-preserving, which it should be, each query has a
large number of equivalent queries making up the optimization search space.
As mentioned before, simple enumeration and evaluation is usually unfeasible, because of the large size of the search space. Even so, dynamic programming methods, such as memoization and branch-and-bound can make this approach feasible. Methods using heuristics are another alternative, examples of which are randomized search, simulated annealing and hill climbing. These methods only examine a part of the search space, trying to evaluate those parts of it which look hopeful in one way or another. These methods therefore cannot guarantee an optimal solution, but, as discussed later, that may not always matter since the search is based on incomplete information anyway.
When evaluating positions in the search space, the optimizer uses a cost function, which is based on such things as cost in IO and CPU time. The cost of a query is defined by recursive addition of subquery cost. As discussed before the cost may not be available for OODBMS methods, and a critical issue is whether to allow the optimizer to break object encapsulation to gather this information. An alternative is to declare the cost of a method in its visible specification.
Compile time optimization is static, in that it cannot use information only available at run time, such as current statistics or the actual type of an polymorphic method. As discussed before one solution to this is to generate several optimized version of the query and select at run time the fastest for the current conditions. This however can involve very high overhead in run-time dispatch of queries, and may not be the best solution. Other alternatives include periodically reoptimizing queries with new statistics, perhaps based on the pace at which the relevant statistics change. Path expressions perhaps the most important aspect to inspect when trying achieve speedup by query optimization, involving the entire optimization process. Their optimization involves all the mechanisms described above, with semantic algebraic rewrite rules playing a large part. There has been considerable research into building path indices for these expressions, since indices are critical for the speed of real databases. The construction and maintenance of such indices is of course much complicated by the presence of arbitrary methods on the path.
2.5. Extensible Optimizers
Since OODBMSs have been, and still are, very much a moving target, with no
standard data model, and only recently with a standard query language
(the ODMG-93 OQL of chapter
28), any optimizer system needs to be able to
change with time. Hence a lot of work in query optimization, e.g.
[Fer93], has been in building ``Optimizer
Generators'' which build optimizers based on information on the data model,
query language, algebra, calculi and cost model. The systems going furthest
in this direction define all aspects of the OODBMS and optimizer as objects
and allow extensions through inheritance and method overloading.
The optimization of OODBMS query languages is inherently much more complicated than the optimization of conventional query languages. There are numerous issues involving types and method evaluation which are much harder to optimize due to the amount of inference the optimizer needs to make in order to be able to efficiently optimize the query.
Fortunately there has been tremendous work done on these issues already in the contexts of conventional programming languages, such as ML, and of general-purpose theorem proving and program verification. Researchers of OODBMS query optimization are beginning to acknowledge this, and when work such as [Mor95] begins to affect OODBMS query optimization to a larger extent we may expect to see some significantly improved results.
OODBMS query optimization is still very much an academic research topic. The OODBMS vendors are currently struggling to flesh out their products into real DBMSs, i.e., still in the stage where they are adding good query capabilities, not where they are trying to optimize the queries. In the academic community, however, there is little interest in query languages, per se (see chapter 28), but a great interest in query optimization. Therefore we will look at future research in this area from an academic point of view.
4.1. Candidate Research Problems
As stated above the area of OODBMS query optimization is currently in very
active academic research, and, given the extensive prior research in the field
of optimization, it is not likely that there are ground-breaking
paradigm-shifting major discoveries to be made in the area. In fact it seems
probable, at least to an outsider, that all the major problems areas have
already been identified and are being worked on today.
Some of these problem areas likely to be worked on in the present, and in the future, include the following:
It is hard to lay down a schedule for the research above. The results in the area tend to build upon prior work, with many attempts at the same goal usually taking place in parallel, often with close working knowledge of one another, proceeding partially by trial-and-error as various different methodologies are brought to bear on the problems. It is very likely that there will still be active research in this area in 5 to 10 years, just as there still is active research in RDBMSs query optimization, although the emphasis will probably have shifted to specific subproblems at that time.
It is hard to provide a scheme for evaluating the success of research into query optimization, the only real criterion being faster average execution time of queries. We can however say that for the last proposed research topic, candidate 8, that the research has been successful if powerful and complex queries can be evaluated quickly on large pools of unstructured data in five years in OODMBSs and on the World Wide Web.
This paper describes KOLA, (arguably) the most advanced query algebra and rewrite rule system for OODBMSs, in a reference manual form. The algebras variable-free rule-based rewrite calculus is based on the author's experience with the AQUA algebra [Leu93]. This algebra is used in the MIT Thor research OODBMS.
Fegaras, L. and Maier, D., ``Towards an Effective Calculus for Object Query Languages'', ACM SIGMOD International Conference on Management of Data, San Jose, California, May 1995.
This paper describes a query language calculus based on monoid comprehensions, which claims to be more effective than other calculi, and describes how to map ODMG-93 OQL, [Cat96], into this calculus.
Fegaras, L., ``Efficient Optimization of Iterative Queries'', Proceedings of the Fourth International Workshop on Database Programming Languages, Manhattan, NY, August 1993.
This paper describes a object-oriented query algebra based on fold iterations, and supplies a normalization algorithm for the algebra reducing queries into canonical forms.
Fegaras, L., Maier, D. and Sheard, T., ``Specifying Rule-based Query Optimizers in a Reflective Framework'', Proceedings of the 3rd International Conference on Deductive and Object-Oriented Databases, Phoenix, Arizona, December 1993.
This paper describes a rule-based query optimizer framework based on syntax-directed query annotation and rewriting. The paper's contribution is directed at the last two steps of figure 1, algebra optimization and plan generation.
Leung, T.W., Mitchell, G., Subramanian, B., Vance, B., Vandenberg S.L. and Zdonik, S.B., `` The Aqua Data Model And Algebra'', Technical Report CS-93-09, Brown University, March, 1993.
This paper describes AQUA, an object-oriented query algebra that served as the precursor to the KOLA algebra, [Che95]. This algebra has a rule-based calculus, like KOLA, but is not variable-free.
Mitchell, G., Zdonik, S.B. and Dayal, U., `` Object-Oriented Query Optimization: What's the Problem?'', Technical Report CS-91-41, Brown University, June 1991.
This paper is an excellent early summary of the issues in optimizing object-oriented query languages. The paper does not focus on any particular solution or methodology for optimization, but rather tries to give a clear statement of the problems involved.
Morrisett, G., `` Compiling with Types'', Technical Report CMU-CS-95-226, Carnegie Mellon University, December 1995.
This thesis advocates the use of a strongly typed intermediate language for the compilation of programming languages. Type-directed compilation with intermediate language dynamic-type dispatch, as discussed in the thesis, were used to develop an ML compiler which generated code twice as fast as that of the excellent SML/NJ compiler.
Özsu, M.T. and Blakeley, J.A., `` Query Processing in Object-Oriented Database Systems'', Technical Report, Computer Science Department, University of Alberta, Canada, 1994.
Özsu, M.T., Straube, D.D. and Peters, R., `` Query Processing Issues in Object-Oriented Knowledge Base Systems'', Technical Report, Computer Science Department, University of Alberta, Canada, 1994.
Straube, D.D. and Özsu, M.T., `` Execution Plan Generation for an Object-Oriented Data Model'', Proceedings of the 2nd International Conference on Deductive and Object-Oriented Databases, pp. 43, Munich, Germany, December 1991.
The three papers above together form a clear overview and summary of the issues and research in OODBMS query optimization. The last paper is the earliest one, and presents the query processing methodology of figure 1, while the first two present the state of 1994 query optimization research in great detail, focusing on cost-based methods.
Su, S.Y.W., Guo, M. and Lam, H., ``Association Algebra: A Mathematical Foundation for Object-Oriented Databases'', IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 5, pp. 775, October 1993.
This paper gives a very complete mathematical specification of a query algebra for the OQL of [Ala89], with proofs of completeness.