CS 732 : Spring 1997
Topics in Database Systems : Advanced Query Processing
Lectures: 3:30 -- 5:30PM Tuesday
Place: Upson 111
Contents
Course
Description
CS 732 is seminar-style course that will meet every week.
The goal is to look at an important collection of related topics in
database systems. For Spring 97, the course will focus on
"advanced query processing".
The most exciting developments in database systems over
the next decade will provide greater functionality in querying
data and greater performance in doing so. Relatively new topics
like data mining and OLAP are already buzz words, but are they
well understood? What about the supposedly well-understood
traditional area of relational query optimization? Can anybody build or buy
a query processing system with guaranteed query response
time? What does it take to build the query processing systems
of the future that will work on distributed repositories,
deal with complex data types, and provide quality and service
guarantees to the consumers?
This course will not give you
the answers, but you may get to think about some of these
questions. We will read and discuss research papers from
recent database conferences.
There is also the opportunity to actually
implement some solutions in PREDATOR,
an object-relational database system that is being developed
here at Cornell. This is actually a pretty novel opportunity to
work on a system that is small enough for you to make significant
research contributions, and yet large enough to actually test
your ideas in a realistic database environment.
In terms of workload, here's what the course involves:
- There will be one meeting every week for probably a couple of
hours -- long enough to discuss two papers. The meetings will
hopefully be lively and fun.
-
After the first few weeks, and depending on enrollment, the papers
will be presented by the students.
-
There will be one research project required of everyone.
There is a wide variety of possible project areas, and you are free to
come up with a project that interests you. I will encourage
you to implement your projects on PREDATOR.
Pre-Requisites
To participate meaningfully in CS732, you should have done
an introductory database systems course.
For PhD students, the Q-exam database syllabus is sufficient (don't
skip the paper on query optimization!!). For MEng and
undergrad students who attended CS537 (which I just
taught in Fall 96), please talk with me about joining CS732.
If you are not in any of these categories, you should certainly
talk with me before enrolling.
Possible Topics to Cover
Here is a tentative list of topics to cover; we may not be able
to cover all of them. If there are any suggestions to include other
topics not in this list, we could do that too. The current list is
just to give you an idea of the kinds of topics that we will
look at.
- Overview of Database Query Processing (Graefe93)
- Join Methods (Hashing, Sorting, Nested-Loops) and Join Orders
- Selections and Projections
- Aggregation and Grouping
- Subqueries
- "External" functions
- New data types and new data repositories
- Query Optimization
- Dynamic programming (Selinger79)
- Polynomial-time optimization (KBZ86)
- Rule-based optimization (Graefe93)
- Randomized optimization (Ioannidis90, Swami89)
- Expensive predicates (Hellerstein94)
- Object-Oriented (OO) and Object-Relational (OR) queries
- OO optimization framework (Cluet95)
- OR optimization framework (Seshadri96)
- Estimation Techniques
- Errors in Join Size Estimation (Ioannidis)
- Optimal histograms (Poosala96)
- Sampling techniques (Naughton)
- Adaptive Query Optimization
- Parametric Optimization (Ioannidis)
- Dynamic Optimization (Graefe94)
- Query Scrambling (Franklin96)
- Heuristic Optimization
- Query Rewrite (Pirahesh92)
- Eager/Lazy Aggregation (Yan95)
- Decorrelation (Seshadri96)
- Data Warehousing
- Incremental View Maintenance (Blakeley)
- Using Materialized Views for Query Processing (Gupta95)
- OLAP (Cube paper: SIGMOD96)
- Data Mining
- Association Rules (Agrawal95)
- Classifiers and Clustering
- Mining Image Data (Fayyad95)
- Non-Centralized Query Processing
- Distributed DB: Query processing in R* (Lohman85)
- Parallel DB: Gamma (DeWitt90)
- Heterogeneous DB: Garlic (Haas95)
- Client-Server Querying (Franklin95)
- Query Benchmarks
- Wisconsin Benchmark (Turbyfill85)
- TPC-D Benchmark
- Bucky OR-Benchmark
Papers to be Covered
- Graefe93: Goetz Graefe, Query Evaluation Techniques for Large Databases,
ACM Computing Surveys, June 1993.
- Selinger79: Selinger et al Access Path Selection in a Relational Database Management System, SIGMOD 1979
- KBZ86: Krishnamurthy, et al, Optimization of Non-Recursive Queries, VLDB 1986.
- Graefe93: Graefe et al, The Volcano Optimizer Generator:
Extensibility and Efficient Search, IEEE Data Engg, 1993
- Ioannidis90: Ioannidis et al
- Hellerstein94: Joseph Hellerstein, Practical Predicate
Placement, SIGMOD 1994
-
-
-
-
Professor
Office: 4108 Upson
Phone: 255-1045
E-Mail: praveen@cs
Dr.
Matt Morgenstern of the Xerox DRI, who is a Princpal Scientist at
Xerox and a senior researcher in the field of database systems,
will also provide direction to the course.