STARTS
Stanford Protocol Proposal for Internet Search and Retrieval
Reference Implementation
Implementation Overview
The STARTS reference implementation is composed of five major parts:
- A short Perl CGI script that interfaces between a WWW server and the remainder of the
reference implementation.
- The core of the reference implementation, written in Java and referred to below as StartsServer.
This Java code runs as a stand-alone server that accepts four service requests corresponding
to the services provided by STARTS:
- QUERY - Takes a SOIF input of type SQuery and returns a SOIF output of
type SQResults plus zero or more SOIF outputs of type SQDocument (depending
on the number of "hits" in the result set).
- SOURCEMETA - Returns a SOIF output of type SMetaAttributes containing
metadata for the respective source.
- SOURCECONTENT - Returns a SOIF output of type SContentSummary containing
data about the contents of the respective source.
- RESOURCEMETA - Returns a SOIF output of type SResource containing metadata
for the respective resource.
- A modified version of the freeWAIS waissearch utility, which the Java StartsServer
calls as a native method. The modifications to waissearch are of two
types:
- Rather than acting as a stand-alone program, it is a function that takes an ASCII WAIS
query and returns an array of strings, each of which is a "hit" for the query. Its argument
and return types are converted to and from Java types as required for Java native methods.
- The data returned for a search "hit" has been modified to include the pathname
of the respective document. This allows mapping from the "hit" to the actual
document, so that data such as author, title, etc. can be extracted by StartsServer
for the STARTS query return.
- The unmodified freeWAIS-sf search engine, which runs as a stand-alone server.
- Two sets of document sources.
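The four services StartsServer accepts, and the SOIF type each one returns, can be summarized in a short Java sketch. This is a hypothetical illustration, not code from the reference implementation: the class name `StartsServices`, the helpers `parse` and `toSoif`, and the exact SOIF layout (`@Type{`, one `Name{byte-length}: value` line per attribute, then a closing `}`) are assumptions made for the example, as are the attribute values used in `main`.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch (not the reference implementation's code): the four
// STARTS services accepted by StartsServer and a minimal SOIF-style writer.
public class StartsServices {

    /** Each service with its SOIF input type (QUERY only) and main output type. */
    enum Service {
        QUERY("SQuery", "SQResults"), // followed by zero or more SQDocument SOIFs
        SOURCEMETA(null, "SMetaAttributes"),
        SOURCECONTENT(null, "SContentSummary"),
        RESOURCEMETA(null, "SResource");

        final String inputType;  // null: the request carries no SOIF body
        final String outputType;

        Service(String inputType, String outputType) {
            this.inputType = inputType;
            this.outputType = outputType;
        }

        /** Map a request name such as "sourcemeta" to a Service; throws if unknown. */
        static Service parse(String name) {
            return valueOf(name.trim().toUpperCase(Locale.ROOT));
        }
    }

    /** Write "@Type{", one "Name{byte-length}: value" line per attribute, then "}". */
    static String toSoif(String type, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder("@").append(type).append("{\n");
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            int size = e.getValue().getBytes(StandardCharsets.UTF_8).length;
            sb.append(e.getKey()).append('{').append(size).append("}: ")
              .append(e.getValue()).append('\n');
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        Service s = Service.parse("sourcemeta");
        Map<String, String> attrs = new LinkedHashMap<>(); // illustrative values only
        attrs.put("Version", "STARTS 1.0");
        attrs.put("Source", "hypothetical-source");
        System.out.print(toSoif(s.outputType, attrs));
    }
}
```

Running `main` prints an `@SMetaAttributes{` object with the lines `Version{10}: STARTS 1.0` and `Source{19}: hypothetical-source`. Keeping the output type on the enum lets a dispatcher choose the reply's SOIF type from the request name alone, and the byte count in braces lets a reader consume values that themselves contain newlines.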
In summary, the control flow of the reference implementation is:
- The WWW server receives a request. For all but the QUERY service, this is simply a
GET on a URL that the WWW server maps to the Perl CGI script. For the QUERY
service, this is a POST request whose input is the SQuery SOIF that specifies the
query. The WWW server maps this POST request to the Perl CGI script as well.
- The Perl CGI script converts the WWW request into one of the four STARTS service
requests defined above and sends it via a socket to the Java StartsServer.
- The Java StartsServer receives the request via the socket and processes it. For all
but the QUERY service, this processing is done internally by StartsServer,
with data drawn, when necessary, from the freeWAIS indexing files (dictionary, inverted
index, etc.). For the QUERY service, StartsServer makes a native method
call to the modified waissearch utility, which sends the query (translated by StartsServer)
to the freeWAIS engine.
- The freeWAIS engine processes the query and
returns the query "hits" to the modified waissearch utility.
- waissearch returns the query "hits" as a string array to StartsServer.
- StartsServer processes the hits (e.g., extracting required information from the
documents) and writes the constructed SOIF(s) to the WWW socket.
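The StartsServer side of this flow can be sketched in Java as below. It is a hypothetical, self-contained illustration rather than the reference implementation: the one-line request header naming the service, the class `StartsServerSketch`, the attribute names `NumDocSOIFs` and `RawHit`, and the placeholder `Version` replies are all assumptions. The native waissearch call is replaced by a plain `Function<String, String[]>`, and the SQuery-to-WAIS translation and index-file lookups are omitted, so the sketch runs without freeWAIS. The client half of `roundTrip` stands in for the Perl CGI script.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical sketch of StartsServer's per-request flow. A plain Java function
// stands in for the native waissearch call; SQuery-to-WAIS translation and
// freeWAIS index-file lookups are omitted.
public class StartsServerSketch {

    private final Function<String, String[]> waisSearch; // query in, "hits" out

    StartsServerSketch(Function<String, String[]> waisSearch) {
        this.waisSearch = waisSearch;
    }

    /** One-attribute SOIF-style object: "@Type{", "Name{byte-length}: value", "}". */
    static String soif(String type, String name, String value) {
        int size = value.getBytes(StandardCharsets.UTF_8).length;
        return "@" + type + "{\n" + name + "{" + size + "}: " + value + "\n}\n";
    }

    /** First line names the service; for QUERY, the remaining lines are the SQuery SOIF. */
    String handle(BufferedReader in) throws IOException {
        String service = in.readLine();
        if ("QUERY".equals(service)) {
            StringBuilder squery = new StringBuilder();
            for (String line; (line = in.readLine()) != null; ) {
                squery.append(line).append('\n');
            }
            // The real server translates the SQuery into a WAIS query first.
            String[] hits = waisSearch.apply(squery.toString());
            StringBuilder reply = new StringBuilder(
                    soif("SQResults", "NumDocSOIFs", String.valueOf(hits.length)));
            for (String hit : hits) {
                // The real server opens the document named by the hit's pathname
                // and extracts author, title, etc.; here the raw hit is echoed.
                reply.append(soif("SQDocument", "RawHit", hit));
            }
            return reply.toString();
        }
        switch (service == null ? "" : service) {
            case "SOURCEMETA":    return soif("SMetaAttributes", "Version", "STARTS 1.0");
            case "SOURCECONTENT": return soif("SContentSummary", "Version", "STARTS 1.0");
            case "RESOURCEMETA":  return soif("SResource", "Version", "STARTS 1.0");
            default: throw new IOException("unknown STARTS service: " + service);
        }
    }

    /** Serve one connection on an ephemeral port and return what the client receives. */
    static String roundTrip(String request, Function<String, String[]> search) throws Exception {
        StartsServerSketch server = new StartsServerSketch(search);
        try (ServerSocket listener = new ServerSocket(0)) {
            Thread worker = new Thread(() -> {
                try (Socket s = listener.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                     Writer out = new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8)) {
                    out.write(server.handle(in));
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            worker.start();
            // Client side: plays the role of the Perl CGI script.
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), listener.getLocalPort())) {
                Writer w = new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8);
                w.write(request);
                w.flush();
                client.shutdownOutput(); // end of request: lets the server's readLine() see EOF
                String reply = new String(client.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
                worker.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Stub search returning two hypothetical document pathnames as hits.
        Function<String, String[]> stub = q -> new String[] {"/docs/a.txt", "/docs/b.txt"};
        System.out.print(roundTrip("QUERY\n@SQuery{\n}\n", stub));
    }
}
```

Running `main` sends a QUERY whose stub search returns two pathnames, and prints one `@SQResults` SOIF followed by two `@SQDocument` SOIFs. Passing the search in as a function keeps the socket and SOIF handling testable separately from the native library.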
Send questions to help@ncstrl.org