This is a tutorial and an online demonstration of PREDATOR. It leads you through 
several examples of database queries you can run on a PREDATOR 1.0 server. 
This tutorial should help you figure out the syntax for various commands, and it acts 
as the user manual for the system. We expect to keep modifying this tutorial as 
new functionality is introduced into the system. Details of design decisions and 
implementation issues are in a different document.

The layout is as follows:

  1. Overview
  2. SQL data definition
  3. SQL data manipulation
  4. SQL queries
  5. Enhanced ADTs
  6. Transactions
  7. Other Useful Commands
  8. Monitor commands
  9. Studying query processing

Overview

PREDATOR is a client-server database system. Most of the database functionality 
exists at the server, which accepts SQL queries and returns the results to the client. 
Several different kinds of clients have been implemented (including a Java applet 
client, a Java application client, and clients built on ODBC and JDBC). All the 
examples presented here work from all the clients, unless explicitly mentioned.

If you click on the query buttons, you can interactively run those queries against a 
sample database maintained on a server at Cornell. The results of the interactive 
version of the tutorial will differ slightly from the text of the tutorial, because all the 
tables are pre-created and the data is pre-loaded. Update, create, delete and drop 
statements are not provided with buttons.

In order to try these examples on your local installation, start a server and then use 
one of the provided client programs to connect to the server. Type in any of the 
sample SQL queries and commands. Most of the examples in this document are also 
provided as part of the regression test scripts in the release. The regression tests 
provide more exhaustive examples of various features.

In order to use PREDATOR, you need to take three steps:

  1. Start a PREDATOR server.
  2. Start a client program and connect it to the server.
  3. Type SQL queries and commands at the client.

Client-specific functionality is described in the individual client tutorial pages listed 
below. This tutorial will only describe the simplest client, the "Text Client".

1.2: Text Client Program

The Text Client Program is a very simple command prompt SQL interface to the 
PREDATOR
server. You can start the Text Client Program in its own window by 
typing

txtclient -s [Port Number]
         <-h [Server Host Name]>

The port number you supply should be the port number of a running PREDATOR  
server. If no host name is provided, the server is presumed to be running on the 
same host as the client. If the connection to the server is made, you will see 
the "Q>" prompt at the client, while there will be some diagnostic messages at the
server window to indicate the connection establishment. Type "list rels;" at the 
client. If the appropriate environment variables were provided to the server, 
you should see the names of the catalog files on that device. Multiple clients can 
connect at the same time to the same server. They must all connect to the same 
server port. All commands typed into the Text Client Program must be terminated 
with a semi-colon ";". You can terminate the client connection by typing "_quit;".

Troubleshooting

Here are some possible causes of problems, and their symptoms:


Section 2: SQL Data Definition

[ Listing Information | Creating Tables | Creating Indexes ]

Listing Information

The most useful initial command is "list rels" -- a sample run is shown below:



Q> list rels;

_STABLES
_SATTRS
_SSTATS
_SASTATS
_SINDXS
_SVIEWS

This tells you that there are six tables present (all of which happen to be default 
catalog tables created automatically by the system). If you want to see the schemas 
and catalog information for all the tables, type "list cats;". These two are very useful 
commands, and we recommend that you try them at various stages through the 
tutorial to understand the effects of various commands on the underlying tables. 
Other useful commands are:

list views;          ----- lists all views defined in the system
list schemas;        ----- lists all schemas of tables
list env;            ----- lists the execution environment settings

Creating Tables

New tables can be created using the CREATE TABLE command.

CREATE TABLE Foo (field1 int, field2 double) KEY(field1);

This creates a table called Foo -- check using "list rels" to see that it was indeed 
created. You can also check the schema of the new table using "list cats". The SQL 
parser is case-insensitive with respect to keywords, but table names and field names 
are case-sensitive. The key of this table is "field1". Physically, this means that a unique
 index is built on this table on "field1". The current support for keys is very minimal --- 
keys can be specified only on a single attribute, and only on integer, double or string 
data types. A table can also be created without a key specified.

CREATE TABLE Bar (field1 int, field2 double);

The fields or attributes of a table can belong to any of the data types known to the 
system. The following primitive data types are supported:

  int:        4 byte integer
  double:     8 byte double precision floating point
  char[X]:    fixed length (X) character string
  string:     variable length character string (up to 
		 XXX_TEXTSTRING_MAX: 5000 chars)
  boolean:    boolean value

The type system is extensible and new data types can be added. The current 
release also has several other data types, as described
here.

CREATE TABLE Locations (field1 int, field2 polygon("1000"), field3 string);
list rels;

Here, we create a table called Locations, which has a field of a polygon data type. 
Note that the polygon field has additional meta-information which specifies further 
information about the polygons -- in this case, it tells the system to expect an average 
of about 1000 points in each polygon.

Partial support for NULL values has been added; NULL values can be stored and 
created, but the specification of certain fields as "NOT NULL" is not supported. 
Further, indexes may not work correctly on NULL values.

Creating Indexes

An index can be created on any existing table, whether data has already been loaded 
into it or not. PREDATOR supports extensible indexing --- new kinds of indexes can 
be added to the system. Therefore, when creating an index, the nature of the index 
needs to be specified too. Currently, two types of indexes -- B+ tree ("B") and R* 
tree ("R") --- are supported. If the index type is not specified, it is assumed to be a 
B+ tree.

CREATE UNIQUE B INDEX Loc_fld1 ON Locations (Locations.field1);

CREATE B INDEX Loc_fld1plus ON Locations (Locations.field1 + 100);

CREATE R INDEX Loc_fld2 ON Locations (Locations.field2);

CREATE B INDEX Loc_fld3 ON Locations (Locations.field3);

;

B+ tree indexes can be created on integer, string, and double fields. They can also 
be built on any expression that returns one of these types. Note that the fields must
be prefixed by the table name in the indexing expression. R* trees can be built on any 
data type that provides a bounding box. This is defined in the ADT method interface, 
and we will not go into the details here. Many of the geographic data types like point, 
polygon and box support this capability. All indexes can be defined to be "unique" --- 
this means that duplicate index key values will cause an insertion error.
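For example, since Foo was created with KEY(field1), a unique index already exists on 
its first field, so a second insertion with the same key value should fail (illustrative):

INSERT INTO Foo VALUES (7, 1.0);

# This insertion fails, because field1 = 7 duplicates an existing key
INSERT INTO Foo VALUES (7, 2.0);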


Section 3: SQL Data Manipulation

[ Loading Data | Inserting Data | Updating Data | Deleting Data ]

Loading Data

There are two ways to add data to a table: by loading it from a file, or by inserting it 
one tuple at a time. Loading from a file is the preferred route, especially when there 
is a large amount of data to load. Since we have yet to implement bulk-loading of 
indexes, the tuples are still inserted into the indexes one at a time.

LOAD TABLE Foo FROM "data/foo.data";

The filename "foo.data" must represent a relative pathname from the directory where 
the server is running, or an absolute path name on the server machine. The next 
release will allow a URL to be specified instead. The format of "foo.data" is a simple 
tab-separated file with each tuple on a different line.
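For instance, a load file for the two-column table Foo (an int and a double, separated 
by a tab, one tuple per line) might look like this (hypothetical contents):

1	1.5
2	2.25
3	3.75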

Inserting Data

INSERT INTO Bar VALUES (1, 2.2);

The use of "insert into" allows specific values to be added to a table. The values can 
be constructed using any arbitrary expressions that produce the correct types. To 
further illustrate this, consider the following:

INSERT INTO Bar VALUES (1+2, 3.5*7.2);

INSERT INTO Bar VALUES ("wrong", "types");

The first statement succeeds, while the second gives an error.

INSERT INTO Bar SELECT * FROM Foo;

The last example shows that the results of a query can be treated as input for 
insertion. "Select * from Foo" is the simplest kind of SQL query --- more complex 
ones are in the later sections. However, it is useful to be able to look at the data you 
just inserted. Try:

; 

; 
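The intended queries here are simple scans; for example, to view the contents of Bar 
(illustrative):

SELECT * FROM Bar;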

Updating Data

The values of columns of tuples in a table can be updated using the "update" 
statement.

UPDATE Bar SET field2 = field2*2 WHERE field1 = 1;

SELECT * FROM Bar;

Deleting and Dropping Data

You can delete data by deleting individual tuples or by totally removing a table.

DELETE FROM Bar WHERE Bar.field1 = 1;
SELECT * FROM Bar;

This deletes all tuples in Bar that have 1 as the first field. If you wish to remove the 
entire table:

DROP TABLE Bar;
list rels;

Of course, this is a persistent database system, and this means that data persists 
across multiple invocations of the system. Try quitting the client ("_quit;") and starting 
it again. "list rels" should show you the same tables. Try quitting both the 
client("_quit;") and the server("shutdown"), and then restarting both. The state of the 
database should be unchanged.


Section 4: SQL Queries

[ Projection | Selection | Joins and Cross-Products | Aggregation and Grouping | Ordering Results | Subqueries | Views | Table Expressions ]

We will use the two tables Foo and Bar created in the previous section. 
Since we dropped Bar at the end of the section, let us recreate it as follows:

CREATE TABLE Bar (field1 int, field2 double);

INSERT INTO Bar VALUES (1, 2.2);

INSERT INTO Bar SELECT * FROM Foo;

INSERT INTO Bar VALUES (1+2, 3.5*7.2);

;

Ensure using list rels that both Foo and Bar exist.

Projection

Projections are specified in the SELECT clause target list. We have already seen 
a simple version (select *) of a projection --- this projects all the attributes of the 
tables in the FROM clause. Here are some more sample projections: try them out:

;

;

; 

; 

#note the type cast from int to double
; 

; 

; 

; 

"B.#0" refers to the 0'th field of B. Note that in almost all cases, we have referred to 
attributes using the Tablename.Fieldname syntax. We recommend sticking to this 
syntax wherever possible. Although the very last query works without it, we do not guarantee 
that other more complex queries will work if the table name is omitted. We do not yet 
support nested subqueries in the SELECT clause.
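Here are a few representative projections of the kind intended above (illustrative 
sketches over the Foo schema, not the original button queries):

SELECT F.field1 FROM Foo F;

SELECT F.field1, F.field2 FROM Foo F;

SELECT F.field1 + 10, F.field2 * 2.0 FROM Foo F;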

Selection

The WHERE clause is used to specify selection predicates. Here are some 
examples. If you like, combine them with the projection examples:

; 

; 

; 

; 

; 

; 

The predicates in the WHERE clause can get arbitrarily complicated, with nested 
AND, OR and NOT. We also support nested subqueries in the WHERE clause, but 
more about those later.
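Selections of the kind intended here might look like this (illustrative):

SELECT * FROM Foo F WHERE F.field1 > 2;

SELECT F.field2 FROM Foo F WHERE F.field1 = 1 OR NOT (F.field2 > 3.0);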

Joins and Cross-Products

The FROM clause can specify multiple tables which will be joined using predicates 
specified in the WHERE clause.

; 

; 

; 

; 

The last query is an example of a cross-product; there is no predicate in the WHERE 
clause that connects the two tables.
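Representative join and cross-product queries over Foo and Bar (illustrative):

# An equality join on the first fields
SELECT F.field1, B.field2 FROM Foo F, Bar B WHERE F.field1 = B.field1;

# A cross-product: no predicate connects the two tables
SELECT * FROM Foo F, Bar B;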

Aggregation and Grouping

Entire tables (or tables resulting from queries) may be aggregated to produce 
composite answers.

; 

; 

; 

The aggregates supported are sum(), avg(), prod(), max(), min() and count(). Note 
that the count(*) syntax of standard SQL is not supported. Also, "distinct" aggregates 
are not yet supported.

Groups of tuples may also be aggregated using a GROUP BY clause. All tuples that 
have the same value of the expression in the GROUP BY clause form a single group 
which is aggregated over, producing a single answer tuple. Duplicate elimination is a 
special case of grouped aggregation that has a special syntax: all identical tuples 
form a group, and only one of them is retained.

; 

; 

; 

; 

; 

SQL places certain restrictions on the contents of the GROUP BY, HAVING and 
SELECT clauses. Any non-aggregates in the SELECT clause must be named in the 
GROUP BY clause. Likewise for any non-aggregates in the HAVING clause. Our 
implementation does not allow aggregates in the HAVING clause.
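Representative aggregation and grouping queries (illustrative; note that count() is 
applied to a specific column, since count(*) is not supported, and the HAVING clause 
uses only a grouped non-aggregate, per the restrictions above):

SELECT max(F.field2) FROM Foo F;

SELECT F.field1, sum(F.field2) FROM Foo F GROUP BY F.field1;

SELECT F.field1, count(F.field2) FROM Foo F GROUP BY F.field1 HAVING F.field1 > 1;

# Duplicate elimination, assuming the usual SELECT DISTINCT syntax
SELECT DISTINCT B.field1 FROM Bar B;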

Ordering Results

The results of any query can be ordered for presentation to the user by adding an 
ORDER BY clause. The results can be ordered by an attribute that is projected out 
of the final result! The results can also be ordered by any complex expression that 
generates an ordered data type.

; 

;

; 

; 

; 

; 
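Representative ORDER BY queries (illustrative):

SELECT F.field1, F.field2 FROM Foo F ORDER BY F.field2;

# Ordering by an attribute that is projected out of the final result
SELECT F.field1 FROM Foo F ORDER BY F.field2;

# Ordering by a complex expression
SELECT F.field1 FROM Foo F ORDER BY F.field1 + F.field1;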

Subqueries

PREDATOR supports nested subqueries in the WHERE clause of an SQL query. 
The subqueries may be correlated or uncorrelated with the enclosing blocks. We 
begin with uncorrelated subqueries. Try the examples below and understand their 
meaning.

## Example of IN subquery
;

## Example of ANY subquery
;

## Example of ALL subquery
;
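Uncorrelated subqueries of the kinds listed above look like this (illustrative sketches, 
assuming standard SQL syntax for IN, ANY and ALL):

# IN subquery
SELECT * FROM Foo F WHERE F.field1 IN (SELECT B.field1 FROM Bar B);

# ANY subquery
SELECT * FROM Foo F WHERE F.field1 > ANY (SELECT B.field1 FROM Bar B);

# ALL subquery
SELECT * FROM Foo F WHERE F.field1 >= ALL (SELECT B.field1 FROM Bar B);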

## Example of scalar subquery (subquery cardinality must be 1)
## Not handled yet.
SELECT * FROM Foo F WHERE F.field1 = 
(SELECT max(B.field1) FROM Bar B);

Now for the correlated subqueries --- these subqueries can reference "correlation" 
columns from the enclosing query block. The best way to think of this is to mentally 
model each subquery invocation as a function call. The result of the function call is 
the table produced by computing the subquery. In the uncorrelated examples so far, 
the function takes no parameters (in other words, the subquery is constant across 
multiple invocations). In fact, many commercial systems (but not PREDATOR yet) 
will cache the results of such an uncorrelated subquery so that it does not need to 
be recomputed. Correlated subqueries, on the other hand, take one or more 
parameters. Try the examples below.

## Example of EXISTS subquery (F.field1 is the correlation value)
;

## Example of NOT EXISTS subquery 
;

## Example of UNIQUE subquery (true if subquery has exactly 1 tuple)
;
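Correlated subqueries of the kinds listed above look like this (illustrative; in each, 
F.field1 is the correlation value):

# EXISTS subquery
SELECT * FROM Foo F WHERE EXISTS (SELECT * FROM Bar B WHERE B.field1 = F.field1);

# NOT EXISTS subquery
SELECT * FROM Foo F WHERE NOT EXISTS (SELECT * FROM Bar B WHERE B.field1 = F.field1);

# UNIQUE subquery
SELECT * FROM Foo F WHERE UNIQUE (SELECT B.field1 FROM Bar B WHERE B.field1 = F.field1);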

Finally, for those of you who have endured an introductory database course, the 
famous "division" query. Assume there is a table FooBar that relates Foo tuples with 
Bar tuples. We want to find those Foo tuples that are related to all Bar tuples that 
satisfy certain conditions. Some real-world analogies

  1. Find employees who have worked in all departments with budgets > 500,000.
  2. Find those beanie baby toys that are carried by all the large toy stores in the 
    Ithaca area.

CREATE TABLE FooBar (FooField int, BarField int);

INSERT INTO FooBar VALUES (0, 0);
INSERT INTO FooBar VALUES (0, 2);
INSERT INTO FooBar VALUES (0, 3);

;
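One standard formulation of the division query uses nested NOT EXISTS (illustrative; 
the inner predicate B.field1 > 0 stands in for "satisfy certain conditions", and we 
assume FooField and BarField reference field1 of Foo and Bar respectively):

SELECT * FROM Foo F
WHERE NOT EXISTS
  (SELECT * FROM Bar B
   WHERE B.field1 > 0
   AND NOT EXISTS
     (SELECT * FROM FooBar FB
      WHERE FB.FooField = F.field1 AND FB.BarField = B.field1));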

Views

So far, all the queries have involved a single query block. This makes it tough to 
express complex queries. One option is to use views. A view is just a name for the 
result table defined by a query -- except that it is just defined, not executed. Once 
you define a view, that view may be used to construct another query, just as if it were a 
stored table. Use the
diagnostic tool described later to see how views are actually 
processed. Here is a simple example.

CREATE VIEW PartialResult AS
 (SELECT F.field1 as attr1, max(F.field2) as attr2
  FROM Foo F, Bar B
  WHERE F.field2 > B.field2
  GROUP BY F.field1);

; 

You can construct arbitrarily complicated queries using views.
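For instance, once PartialResult is defined, it can be queried like a stored table 
(illustrative):

SELECT P.attr1 FROM PartialResult P WHERE P.attr2 > 5.0;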

Table Expressions

Table expressions are supposed to be "transient views" -- in other words, if you do 
not want to have persistent views, but just want the convenience of structuring your 
query into understandable blocks, table expressions are the mechanism of choice.

; 
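A table expression places a parenthesized query block directly in the FROM clause 
(illustrative, assuming the usual derived-table syntax):

SELECT T.attr1
FROM (SELECT F.field1 AS attr1, F.field2 AS attr2 FROM Foo F) T
WHERE T.attr2 > 2.0;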

Section 5: Enhanced ADTs

[ Table and Index Creation | Load and Insert | Using Methods | ADT Aggregates | Ordering | Standalone Functions ]

Methods of ADTs

Every ADT can define methods which can then be used in SQL queries. The type 
system is extensible, of course, allowing new data types to be added. Currently, there 
are quite a few data types distributed with the system --- look
here for a brief 
description of them. In this and the next section, we will show you how to use ADTs 
using a couple of canonical examples. One is of the image ADT. The other is of the 
complex number ADT (one of the simplest possible ADTs). Issues to consider 
include:

In this section, we address these questions.

Table and Index Creation

Table creation is straightforward -- in fact, in Section 2, we had an example of 
creating a table with a polygon field. Here, we use image and complex number as 
examples.

CREATE TABLE ImageTest (id string, picture image("format jpeg")) key(id);

CREATE TABLE ComplexTest (id int, value complex) key(id);

CREATE B INDEX ComplRealIndx on ComplexTest (ComplexTest.value.real());

The key of a relation must be a field on which a unique B+ tree index can be built. 
Currently, this is restricted to a few data types (int, double and string). Consequently, 
an image or a complex number cannot be the key of a relation. We hope to ease this 
restriction in the next release. As the example shows, an index can be created on the 
result of the real() method of the complex number ADT. This method returns an 
integer, which is the real component of the complex number. Indexes can be built on 
any complex expression that returns a suitable type.

Load and Insert

Loading data to one of these tables is no different from the earlier examples. 
However, the only question is: how are the ADT values represented? The ADT 
developer usually makes a choice of whether the ADT objects are going to be 
"large" or not. If they are "large" (as in the case of the image ADT), it doesn't make 
sense to place the actual image within the load file. Instead, each image is placed 
in its own data file, and the load file only contains the file name. Similarly, in order to 
insert a single tuple, a "constant" image value can be specified by applying a type 
cast to a file name. This file name could instead be a URL and it would be 
downloaded into the database.

LOAD TABLE ImageTest from "data/renoir.data";

INSERT INTO ImageTest VALUES ("girl", [image("format jpeg")]"data/image_data/watercan.jpg");

; 

Complex number values, on the other hand, are pretty small, and it is silly to put each 
complex number into its own file. Instead, they are specified inline in the data file. It is 
the responsibility of the ADT developer to decide how the values are going to be 
represented.

LOAD TABLE ComplexTest from "data/complex.data";

INSERT INTO ComplexTest VALUES (10, [complex]"10.0+20.1i");

; 

Using Methods

We have already seen the use of an ADT method in creating an index. The use of 
methods in queries is similar. Wherever an expression, column value, or constant 
occurs, a method expression that produces a valid data type may appear. Here are 
some examples:


; 

; 

; 
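For example, using the real() and imag() methods of the complex ADT (illustrative):

SELECT C.value.real(), C.value.imag() FROM ComplexTest C;

SELECT C.id FROM ComplexTest C WHERE C.value.real() > 5;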

ADT Aggregates

ADTs can define their own aggregates. For instance, the complex number ADT 
defines "csum" which computes the sum of a group of complex numbers. These can 
be used like the other aggregates. The second example below shows the application 
of functions to the result of an aggregation.

; 

# this computes the imag() method on the sum of the complex numbers
; 
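For instance (illustrative, assuming method application to an aggregate result follows 
the usual dot syntax):

SELECT csum(C.value) FROM ComplexTest C;

# imag() applied to the result of the csum aggregate
SELECT csum(C.value).imag() FROM ComplexTest C;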

Ordering

PREDATOR uses sorting in order to compute GROUP BY clauses. Obviously, 
sorting is needed for ORDER BY clauses. The data types of grouping or ordering 
fields can belong to any ADT that is an "ordered ADT". The ADT developer specifies 
this property. If the ADT is ordered, an internal method is provided to compare two 
values. This method is used for sorting. Consequently, you can specify an ADT field 
in an ORDER BY or GROUP BY clause, as long as it is of an ordered ADT. Neither 
image nor complex fall into this category (though one could provide sort orders for 
both, our implementation of the ADTs has not). However, try to construct your own 
example with the date ADT and observe the sorting on dates.

Standalone Functions

Apart from ADT methods, there is also support for standalone functions. While the 
mechanism is in place, we have only provided a couple of such functions as a proof 
of concept, since our focus has been on ADTs. One function, intplus(), adds two 
integers, producing their sum. Another, matches(), checks if the first string argument 
is a substring of the second string argument. If so, it returns 1, else 0.

; 

; 
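For example (illustrative; recall that matches() returns 1 when its first argument is a 
substring of its second):

SELECT intplus(F.field1, 100) FROM Foo F;

SELECT * FROM ImageTest I WHERE matches("girl", I.id) = 1;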

Section 6: Transactions

[ Single Statement Xacts | Multiple Statement Xacts ]

Single Statement Xactions

By default, every "request" sent from the client to the server is treated as an ACID
transaction (i.e., it either commits on success or aborts on failure). Failure could occur 
due to errors or lock timeouts. If you are using the txtclient, a "request" is usually a 
single command terminated by a semi-colon. From the Java GUI, it is possible to 
send multiple commands as a single "request" -- however, if you want to do this, try 
using explicit multi-statement xactions instead (see next section). Here are some 
examples of single-statement transactions:

INSERT INTO Foo VALUES (10, 5.5);

#Hit a CTRL-C, kill the server, and restart it
#The inserted tuple will be in Foo
SELECT * FROM Foo;

#Now look for abort/rollback.
# Insertion of duplicate key is an error that causes
# a rollback
INSERT INTO Foo VALUES (10, 10.5);

SELECT * FROM Foo;

Multiple Statement Xactions

If you want to bracket several statements as a single transaction, you can do so.

#Begin a transaction
trans begin;

DELETE FROM Foo WHERE Foo.field1 = 10;

SELECT * FROM Foo;

#Abort the transaction
trans abort;

# Voila, the deleted tuple is back in place
SELECT * FROM Foo;

If there is an error in the middle of a multi-statement xaction, the transaction is 
aborted. Any subsequent request is then run as a separate xaction.


Section 7: Other Useful Commands

[ Setting Preferences | Sundry Commands ]

Setting Preferences

It's often convenient to set flags to specify different kinds of behavior in the system. 
This is especially useful in measuring the effects of various features. The preferences 
for the relational engine can be viewed and set as follows:

#list the current settings
list env;

#many of these are for internal use and debugging.

# Turn off display of results
setenv show_query_results 0;


# Turn on display of results
setenv show_query_results 1;

When running performance experiments, it is often very useful to set 
"show_query_results" to 0. If you are extending the system you may want to add to
 these flags. Look at the design document for details of how to do so.

Each E-ADT can provide its own set of options to control its behavior. Currently, there 
are the following default options for large E-ADTs.

# show the options for the "image" E-ADT
;

# now prevent images from being displayed
set_adtenv image show_output 0;

One of the issues with large E-ADTs like images is: how will they be displayed when 
a query result involves them. The solution to this is for the server to treat the large 
E-ADT value as a MIME-type value and refer to it as a URL. As you see if you run the 
query interactively through this page, the hyperlinks to images appear as long URLs 
which have the form of hexadecimal strings. This is a simple encoding of the binary 
image handle as an ASCII URL. The PREDATOR server also acts as an HTTP 
server purely for the purpose of serving these large objects.

Sundry Commands

#collect statistics on a specific table Foo
runstats Foo;

#collect statistics on all tables
runstats;

The runstats command collects data distribution statistics which are used in query 
optimization. As new data is inserted or existing data is deleted, the statistics become 
inaccurate. Consequently, one should periodically execute runstats for good query 
optimization.


Section 8: Monitor Commands

The window in which the server was started is called the "monitor" window --- here, 
there is a separate thread reading input requests and processing them. Actually, the 
language supported here is called "XXX" and it is processed by a different engine from 
the relational engine. Here are the commands that can be run from the monitor 
window:

        adts       : list available ADTs
        clients    : list current clients
        engines    : list available engines
        help       : shows this list
        shutdown   : terminate server
        sys_stats  : print server statistics
        sys_disk   : show disk volume stats
        sys_shore  : show SHORE stats
        sys_reset  : reset all stats

The most useful of these is, of course, "shutdown", which is the way to shut down the 
server (Ctrl-D works just as well). Some of the commands to print statistics are useful
 for system developers trying to understand where disk space is being used, etc. 
However, the only reasonable statistics currently gathered are by SHORE -- we hope 
to put in statistics for the rest of PREDATOR in the next release.


Section 9: Studying Query Processing

We have a simple tool, QPDisp, that displays the effects of query optimization and 
processing. In essence, it is a tree drawing package written in Java. When any query 
is executed at the server, a log of query optimization and processing information is 
dumped out to a diagnostic file (in a file query_proc_info.clientXX.txt in your 
XXXTMPDATADIR
directory). Any such information is in the form of query trees. 
Currently, three categories of trees are monitored, corresponding to three different 
phases of query optimization. More categories will be added in the future. I really 
encourage you to try this tool -- it is almost like opening up the innards of the system.

QPDisp is a Java applet. You should set up a web server to run on the same machine 
as the database server (there are a number of free web servers --- choose any 
one that supports Java applets). We have provided a template HTML page 
qpdisp.html
that embeds the QPDisp applet. You will need to specify the name of 
the server dump file that will be visualized.

For example, consider the query:

; 

Here is an example of the QPDisp applet when applied to the dump file generated 
when executing this query. When the applet opens up, click on the OPEN SESAME  
button. This brings up a window with a View menu. In this menu, focus on the 
transformation of the image method expression in the SELECT clause. You can 
walk step-by-step through the E-ADT transformations that occur in optimizing this 
expression.

This tool is at an early stage of development. We plan to enhance it to visualize the 
query optimization process and query execution as well. Currently, it behaves well with
Netscape 4.0, but oddly with IE 4.0 (we'll try to find out why and fix it).