STARTS
Stanford Protocol Proposal for Internet Search and Retrieval
Reference Implementation
Implementation Notes
This page describes some features of the STARTS reference implementation of interest to
developers and testers. There is a separate document for information on how to install and run the STARTS reference implementation.
The Java STARTSServer application consists of Java code organized into seven packages
and an unpackaged main class. The main class is called corbaServer,
which is a wrapper around the STARTSServer class. The packages are as
follows:
- parsing - classes used for parsing ranking and filter expressions in STARTS
queries
- query - classes used for processing of STARTS queries that is not specific to
a search engine or source
- regexp - methods for manipulating Perl-like regular expressions.
- resource - classes specific to the resource and source used in the reference
implementation.
- results - classes used for query results
- STARTSConfiguration - general server configuration data
- wais - classes specific to the freeWAIS search engine.
Note: no resource-specific or search-engine-specific code is located outside the wais and
resource packages. Full documentation on these packages (it may
be slightly out of date) is available in javadoc format.
The CORBA STARTS client source code is in the same ZIP file as the Server source code,
so it appears to be in package client. The client consists
of:
- queryInputFrame.java -- the window where STARTS queries are constructed
- resultsFrame.java -- the window where messages or results are displayed
to the user
- STARTSClientApplet.java -- the wrapper java file to run the client as
an applet
- STARTSClientGUI.java -- the guts of the client application
- startspage.htm -- a very simple html page for launching the client as
an applet (note that 1.1 event handling is required)
- symantecClasses.jar -- a handy package of the small number of Symantec
Cafe class files used by the client
The client is written so that it can be run as a Java applet or application. If
it is to be run as an applet, the applet class is called STARTSClientApplet.
If it is to be run as an application, the main class is in STARTSClientGUI.
The client requires a few class files from Symantec Visual Cafe, which have been
included in symantecClasses.jar.
The CORBA STARTS IDL is in the same ZIP file as the
Server source code, so it appears to be in package IDLfiles.
Obviously both the client and server require access to the class files generated
from the IDL.
The reference implementation was built with, and/or uses at runtime, a number of publicly
available software packages:
- CORBA software. Unfortunately, ORBs are not free, but they are available! We used
OrbixWeb 3.0 for development, so we can only claim that our code works for an OrbixWeb 3.0
flavored ORB. If you have a different ORB, you will probably need to tweak the java
code, especially that in corbaServer.java and in STARTSClientGUI.java.
Also, look for source files with import statements for packages IE.Iona.OrbixWeb
or org.omg.
- freeWAIS-sf search engine
freeWAIS-sf as used in the STARTS reference implementation is unchanged from the
distributed version.
- jb - Java Bison Parser
generator from the CU Arcadia Project.
jb is used to generate Java source files from .lex and .y files. The respective .y and
.lex definitions are included in the StartsServer distribution, in the parsing/filter
and parsing/ranking directories (holding the scanning and parsing
definitions for filter and ranking expressions, respectively). These directories also
contain the generated Java source files - YYparse.java, YYlex.java, and YYtokentypes.java.
If you want to regenerate these files, you will need to read the jb documentation,
available at jb - Java
Bison Parser. Also note that after you generate the files using jb you will have to
make a few manual changes to integrate the files into the StartsServer code:
- Add to each file the proper package definition - this is package
parsing.filter for the generated filter expression files and package
parsing.ranking for the generated ranking expression files.
- Replace the import jbf.* statement in all files with import parsing.*.
- In the generated YYparse.java file for both filter and ranking expressions,
change the declaration of yyval from protected to public.
- Jonathan Payne's Regular
Expression Package for Java
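The three manual changes to the jb-generated files listed above could be scripted rather than made by hand. The following is only a minimal sketch under stated assumptions: the class name JbPostProcess and the method patch are hypothetical and not part of the distribution, the import is assumed to appear literally as "import jbf.*;", and yyval is assumed to be declared in the form "protected <Type> yyval". Check the actual generated files before relying on it.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the STARTS distribution.
class JbPostProcess {
    // Matches an existing package declaration at the start of any line.
    private static final Pattern HAS_PACKAGE = Pattern.compile("(?m)^\\s*package\\s");

    /**
     * Applies the manual edits to the text of one generated file.
     *
     * @param source   contents of YYparse.java, YYlex.java, or YYtokentypes.java
     * @param pkg      "parsing.filter" or "parsing.ranking"
     * @param isParser true only for YYparse.java, where yyval must become public
     */
    static String patch(String source, String pkg, boolean isParser) {
        // Replace the jbf import with the parsing import.
        String out = source.replace("import jbf.*;", "import parsing.*;");

        // Change the yyval declaration from protected to public (parser only).
        // Assumes the declaration looks like "protected SomeType yyval".
        if (isParser) {
            out = out.replaceAll("\\bprotected(\\s+[\\w.\\[\\]]+\\s+yyval\\b)", "public$1");
        }

        // Add the package declaration unless one is already present.
        if (!HAS_PACKAGE.matcher(out).find()) {
            out = "package " + pkg + ";\n\n" + out;
        }
        return out;
    }
}
```

Because the package declaration is added only when missing and the other two replacements no longer match once applied, running patch twice on the same text leaves it unchanged.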
freeWAIS does not natively handle separate ranking and filter expressions, so the
reference implementation emulates this behavior as follows. Both the STARTS filter
and ranking expressions are translated to WAIS queries. The two distinguishing components
of STARTS ranking expressions are handled as follows:
- The list operator is translated to an "ored" set of terms - for
example, the STARTS ranking expression list((body-of-text
"distributed")(body-of-text "database")) is translated to the WAIS
query (bd=distributed) or (bd=database).
- The weighted ranking syntax in STARTS is converted to term repetition: each term is
repeated as many times as the integer part of the ratio of its weight to the lowest weight. That is, the
STARTS ranking expression list(("distributed" 0.7)("databases"
0.3)) is translated to the WAIS query (distributed or distributed or databases),
because 0.7 / 0.3 is about 2.33, which gives two repetitions of "distributed".
Following this translation, both the filter and ranking expressions are submitted to the
WAIS engine. All "hits" from the filter expression are returned, with their
scores modified as follows. If the "hit" also appears in the result set from the
ranking expression, its score is set to the score from the ranking expression. Otherwise, its
score is set to 0.
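The weight-to-repetition rule and the score merging described above can be sketched as follows. This is an illustration, not the code in the wais or query packages: the class and method names are hypothetical, documents are identified by plain strings, and modern Java collections are used for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not taken from the reference implementation.
class RankingTranslationSketch {

    /**
     * Turns weighted terms into an "ored" WAIS query in which each term is
     * repeated floor(weight / lowestWeight) times.
     */
    static String weightedListToWais(Map<String, Double> weightedTerms) {
        double lowest = Double.MAX_VALUE;
        for (double w : weightedTerms.values()) {
            lowest = Math.min(lowest, w);
        }
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Double> e : weightedTerms.entrySet()) {
            // Integer factor of this weight relative to the lowest weight.
            int repeats = (int) Math.floor(e.getValue() / lowest);
            for (int i = 0; i < repeats; i++) {
                parts.add(e.getKey());
            }
        }
        return "(" + String.join(" or ", parts) + ")";
    }

    /**
     * Returns every filter hit, scored with the ranking score when the
     * document also appears in the ranking results, and with 0 otherwise.
     */
    static Map<String, Double> mergeScores(List<String> filterHits,
                                           Map<String, Double> rankingScores) {
        Map<String, Double> merged = new LinkedHashMap<>();
        for (String docId : filterHits) {
            merged.put(docId, rankingScores.getOrDefault(docId, 0.0));
        }
        return merged;
    }
}
```

Documents that match only the ranking expression never appear in the merged result, since the filter expression alone decides which documents are returned.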
Since our ORB was for NT, we needed to have the client and server running on NT.
But since freeWAIS has not been ported to NT, we needed to run freeWAIS on
UNIX. We used the vestigial testing/development setup from release 1.0. (Note
that Carl already knew he could run the freeWAIS server on Solaris and communicate
with it from NT via sockets. But he didn't want to port the waissearch component to
NT.) So in order to do this, we have the following:
- A few places in the StartsServer code where the code
varies according to whether it is run on UNIX or NT (we use the NT option for the CORBA
release)
- A dummy server to bridge between an NT resident StartsServer
and Solaris resident freeWAIS.
Let's review the components of the reference implementation:
- The STARTS client application, which uses the STARTS IDL to specify STARTS
requests which it sends to ...
- The CORBA ORB, which accepts STARTS requests and communicates via IIOP with ...
- The CORBA StartsServer application, which does all STARTS processing except for
the actual searching, and which then communicates via a port specified in ...
- The dummy server Java class, which runs on UNIX and accepts the ASCII WAIS
search strings over the socket and then communicates via a Java native method call with...
- The modified freeWAIS waissearch code, which takes the ASCII WAIS search string
from the dummy server and turns it into a WAIS (quasi-Z39.50) query, and which
turns the WAIS results into an ASCII result list for use by StartsServer --
communicates via a TCP socket (hard-coded as 5000 in StartsServer) with ...
- The freeWAIS server.
There are a few code fragments that are specific to the platform that StartsServer
is running on. These fragments are all delimited by the comments:
/* !!!!! PLATFORM SPECIFIC CODE !!!!! */
/* !!!!! END PLATFORM SPECIFIC CODE !!!!! */
The NT specific code is preceded by the comment:
/* !!!!! NT VERSION !!!!! */
The Solaris specific code is preceded by the comment:
/* !!!!! SOLARIS VERSION !!!!! */
These code fragments are located in the following files:
- wais/WAISSourceDescription.java (2 fragments)
- wais/WAISSearchSend.java
- resource/CSTRDocument.java
- resource/LINUXDocument.java
- DummyServer.java (NT version only -- see below)
You should go through these files and uncomment the code for the appropriate platform
and comment out the code for the other platform.
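When switching platforms, it can help to confirm that every delimited fragment in a file has been found, for example that wais/WAISSourceDescription.java really contains two. The following is a small sketch of such a check. The class PlatformMarkerScan is hypothetical, and it assumes each delimiter comment sits on a line of its own, which should be verified against the actual sources.

```java
// Hypothetical helper, not part of the STARTS distribution.
class PlatformMarkerScan {
    static final String BEGIN = "/* !!!!! PLATFORM SPECIFIC CODE !!!!! */";
    static final String END = "/* !!!!! END PLATFORM SPECIFIC CODE !!!!! */";

    /** Counts the complete BEGIN ... END fragments in the given source text. */
    static int countFragments(String source) {
        int count = 0;
        boolean open = false;
        for (String line : source.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.equals(BEGIN)) {
                open = true;
            } else if (trimmed.equals(END) && open) {
                // Only count a fragment once its closing delimiter is seen.
                count++;
                open = false;
            }
        }
        return count;
    }
}
```

A fragment whose closing delimiter is missing is not counted, so a lower count than expected points to a damaged marker pair.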
In the main StartsServer source directory, you will find a Java source file
called DummyServer.java. This is a simple server that listens on port 6790. This
corresponds to the port opened in the NT specific code in wais/WAISSourceDescription.java.
This dummy server accepts the ASCII WAIS search strings over the socket and then uses a
native call to talk to the waissearch code.
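The general shape of such a bridge can be sketched as below. This is not the contents of DummyServer.java: the class name DummyServerSketch is hypothetical, the wire format (one ASCII query per line, reply written back on the same connection) is an assumption, and the native waissearch call is replaced by a caller-supplied function so the sketch stays self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical sketch, not the actual DummyServer.java.
class DummyServerSketch {
    static final int PORT = 6790; // port mentioned in the notes above

    /** Reads one query line, hands it to the search function, writes the reply. */
    static void handle(Reader in, Writer out, Function<String, String> search)
            throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String query = reader.readLine(); // assumed: one query per line
        if (query == null) {
            return; // client closed the connection without sending anything
        }
        PrintWriter writer = new PrintWriter(out);
        writer.print(search.apply(query)); // stand-in for the native waissearch call
        writer.flush();
    }

    /** Accepts connections one at a time and serves a single query on each. */
    static void serve(Function<String, String> search) throws IOException {
        try (ServerSocket listener = new ServerSocket(PORT)) {
            while (true) {
                try (Socket socket = listener.accept()) {
                    handle(new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII),
                           new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.US_ASCII),
                           search);
                }
            }
        }
    }
}
```

Keeping the per-connection logic in handle, separate from the socket loop, lets the request handling be exercised with in-memory readers and writers before the server is deployed next to freeWAIS.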
STARTS Release 1.1 : Extended
Attribute Set Support, with Dublin Core demonstration
STARTS 1.0.
Send questions to help@ncstrl.org