STARTS
Stanford Protocol Proposal for Internet Search and Retrieval

Reference Implementation


Modification and Customization

Adding New Attribute Sets to STARTS

The STARTS server (Release 1.1 and up) is designed to support multiple attribute sets. The reference implementation currently supports two attribute sets, basic-1 and dcore-1, implemented by the classes AttrSetBasic1 and AttrSetDcore1.

Additional attribute sets can be added easily by following these steps:

  1. Create a new attribute set class in the package query:

    This new class should extend the abstract class AttrSet. The new attribute set class should define the static array "AttrSetFields" that lists all of the fields that constitute the new attribute set. The format for this array is: {"<field>", "<boolean value>"}, as in the following example:

    public static String[][] AttrSetFields = {
        {"TITLE", "true"},
        {"CREATOR", "true"},
        {"SUBJECT", "false"}
    };

    The boolean values "true" and "false" are required-field indicators. Enter "true" if a source must recognize the field and "false" if a source may optionally recognize it. Currently, the STARTS implementation does not interrogate this boolean value, so it exists for reference only. The implementation does load this array into a hashtable and performs a simple key lookup to determine whether a field is part of a particular attribute set (see method check() in class Field).

    Additionally, the new attribute set class should define instance variables that indicate which of these fields are "activated" in a particular source-specific instance of the attribute set class. Attribute-set classes are instantiated in the context of a particular source -- each instance of a source carries around an attribute set object that lists all possible fields for the attribute set, as well as a list of those fields that map to data elements in the source. These "source supported" fields should be defined in the "fieldsTranslation" array in the source classes found in the package resource (see #2 below). This array should also contain each field's WAIS equivalent, and a list of languages supported for each field.

    When creating a new attribute set class, use the existing classes AttrSetBasic1 and AttrSetDcore1 for reference. The variables and methods implemented in these classes will serve as a guide in creating a new attribute set. Also, note that the attribute-set constructor method receives the "fieldsTranslation" array as an argument from the source:

    public AttrSetDcore1(String[][] fieldsTranslation) {
        LoadFieldTable();
        LoadSourceFields(fieldsTranslation);
        LoadFieldXlate(fieldsTranslation);
    }
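    The shape of a new attribute-set class can be sketched as follows. This is a minimal, self-contained illustration, not the reference implementation's code: AttrSetSketch is a hypothetical stand-in for the real abstract class AttrSet (whose members may differ), the attribute-set name "example-1" and the methods isField() and isSourceField() are assumptions, and a HashMap is used where the original code uses a hashtable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical new attribute set named "example-1" (name is an assumption).
class AttrSetExample1 extends AttrSetSketch {
    static final String NAME = "example-1";

    // All fields of the attribute set: {"<field>", "<required?>"}.
    public static String[][] AttrSetFields = {
        {"TITLE", "true"},
        {"CREATOR", "true"},
        {"SUBJECT", "false"}
    };

    // Field name -> required-field indicator, used for membership lookups.
    private final Map<String, String> fieldTable = new HashMap<>();
    // Field name -> WAIS translation, only for fields the source supports.
    private final Map<String, String> sourceFields = new HashMap<>();

    AttrSetExample1(String[][] fieldsTranslation) {
        for (String[] f : AttrSetFields) {
            fieldTable.put(f[0], f[1]);
        }
        // Keep only the rows that belong to this attribute set.
        for (String[] row : fieldsTranslation) {
            if (NAME.equals(row[0]) && fieldTable.containsKey(row[1])) {
                sourceFields.put(row[1], row[2]);
            }
        }
    }

    @Override boolean isField(String field) { return fieldTable.containsKey(field); }
    @Override boolean isSourceField(String field) { return sourceFields.containsKey(field); }

    public static void main(String[] args) {
        String[][] fieldsTranslation = {
            {"basic-1", "title", "ti", ""},
            {"example-1", "TITLE", "ti", ""}
        };
        AttrSetExample1 attrSet = new AttrSetExample1(fieldsTranslation);
        System.out.println(attrSet.isField("SUBJECT"));       // true: part of the set
        System.out.println(attrSet.isSourceField("SUBJECT")); // false: not mapped by this source
    }
}

// Hypothetical stand-in for query.AttrSet; the real abstract class may
// declare different members.
abstract class AttrSetSketch {
    abstract boolean isField(String field);       // field belongs to the attribute set
    abstract boolean isSourceField(String field); // field is "activated" for this source
}
```

    The sketch keeps the two roles separate: the static array describes the attribute set in general, while the constructor argument determines which fields are active for one source.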

  2. Modify the "fieldsTranslation" static array in each source class:

    Each source must identify the fields it supports. While an attribute-set class will statically define the fields that are part of that attribute set in general, a source class will statically define only those attribute-set/field combinations that are supported by the source. (See classes CSTRSourceDescription and LINUXSourceDescription.)

    When a new attribute set is defined, existing sources should be examined to determine which fields have the same meaning as the new attribute-set fields. To enable the use of a new attribute set for queries against an existing source, the new attribute-set fields must be mapped to the data elements that exist for the source. Each new attribute-set field should be entered into the "fieldsTranslation" array, along with its WAIS translation and a list of languages supported in the following format:

    {"<attribute set>", "<field>", "<WAIS translation>", "<languages supported>"}

    Example:

    protected static String[][] fieldsTranslation = {
        {"basic-1", "author", "au", ""},
        {"basic-1", "title", "ti", ""},
        {"basic-1", "linkage", "id", ""},
        {"dcore-1", "CREATOR", "au", ""},
        {"dcore-1", "TITLE", "ti", ""},
        {"dcore-1", "IDENTIFIER", "id", ""}
    };
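    To illustrate how such a table can be consulted, the following sketch resolves an attribute-set/field pair to its WAIS field. The helper waisFieldFor() and the class name are hypothetical; the reference implementation may perform this lookup differently (for example, through the attribute-set object).

```java
import java.util.Optional;

class FieldsTranslationSketch {
    // Rows copied from the example above:
    // {"<attribute set>", "<field>", "<WAIS translation>", "<languages supported>"}
    static final String[][] fieldsTranslation = {
        {"basic-1", "author", "au", ""},
        {"basic-1", "title", "ti", ""},
        {"basic-1", "linkage", "id", ""},
        {"dcore-1", "CREATOR", "au", ""},
        {"dcore-1", "TITLE", "ti", ""},
        {"dcore-1", "IDENTIFIER", "id", ""}
    };

    // Hypothetical helper: returns the WAIS field for an attribute-set/field
    // pair, or empty if the source does not support that combination.
    static Optional<String> waisFieldFor(String attrSet, String field) {
        for (String[] row : fieldsTranslation) {
            if (row[0].equals(attrSet) && row[1].equals(field)) {
                return Optional.of(row[2]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(waisFieldFor("dcore-1", "CREATOR").orElse("(unsupported)")); // au
        System.out.println(waisFieldFor("dcore-1", "SUBJECT").orElse("(unsupported)")); // (unsupported)
    }
}
```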

  3. Modify the "modifiersSupported" and "modifiersTranslation" static arrays in the WAISSourceDescription class.

    STARTS designates modifiers by attribute set, but the implementation does not currently associate modifiers with the attribute-set class. When an attribute set is added, its modifiers must therefore be added to the modifier tables. These are the two static arrays "modifiersSupported" and "modifiersTranslation" in class WAISSourceDescription.

    "modifiersSupported" contains the attribute-set qualified modifier and "true" or "false" to indicate if the modifier is supported at this field, followed by a list of languages:

    {"<attribute set> <modifier>", "<true or false>", "<list of languages>"}

    Example:

    protected static String[][] modifiersSupported = {
        {"basic-1 <", "true", ""},
        {"basic-1 =", "true", ""},
        {"basic-1 stem", "false", ""},
        {"basic-1 phonetic", "true", ""},
        {"dcore-1 <", "true", ""},
        {"dcore-1 =", "true", ""},
        {"dcore-1 stem", "false", ""},
        {"dcore-1 phonetic", "true", ""}
    };

    "modifiersTranslation" should have an entry for each supported modifier (indicated as "true" in the "modifiersSupported" array). Each entry contains the attribute-set qualified modifier and the WAIS equivalent of the modifier:

    {"<attribute set> <modifier>", "<WAIS equivalent of modifier>"}

    Example:

    protected static String[][] modifiersTranslation = {
        {"basic-1 <", "<"},
        {"basic-1 =", "="},
        {"basic-1 phonetic", "SOUNDEX"},
        {"dcore-1 <", "<"},
        {"dcore-1 =", "="},
        {"dcore-1 phonetic", "SOUNDEX"}
    };
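    Because the two modifier tables are maintained by hand, it is easy to mark a modifier as supported and forget its translation. The following sketch shows one way to check for that. It is not part of the reference implementation; the class and method names are hypothetical, and the attribute set "example-1" in the demo is made up.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ModifierTablesSketch {
    // Returns every modifier marked "true" in the supported table that has
    // no row in the translation table.
    static List<String> missingTranslations(String[][] supported, String[][] translation) {
        Set<String> translated = new HashSet<>();
        for (String[] row : translation) {
            translated.add(row[0]);
        }
        List<String> missing = new ArrayList<>();
        for (String[] row : supported) {
            if ("true".equals(row[1]) && !translated.contains(row[0])) {
                missing.add(row[0]);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical tables for a new attribute set "example-1".
        String[][] supported = {
            {"example-1 =", "true", ""},
            {"example-1 stem", "false", ""},
            {"example-1 phonetic", "true", ""}
        };
        String[][] translation = {
            {"example-1 =", "="}
        };
        // "example-1 phonetic" is supported but has no WAIS equivalent yet.
        System.out.println(missingTranslations(supported, translation)); // [example-1 phonetic]
    }
}
```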

  4. Modify the "WAISFieldSupportedForModifier" static array in each source class.

    Each source must identify the modifiers it supports for each field. As with the "fieldsTranslation" table in step 2 above, the modifiers supported for each field are statically defined at the source. (See classes CSTRSourceDescription and LINUXSourceDescription.)

    When a new attribute set is defined, existing sources should be examined to determine which modifiers should be associated with which WAIS fields. To enable the use of a new attribute-set modifier for queries against an existing source, the new attribute-set qualified modifier must be mapped to the data elements that exist for the source. Each supported attribute-set modifier (see step 3 above) should be entered into the "WAISFieldSupportedForModifier" array, along with the WAIS fields it can modify, in the following format:

    {"<attribute set> <modifier>", "<WAIS field names>"}

    Example:

    protected static String[][] WAISFieldSupportedForModifier = {
        {"basic-1 <", "dm"},
        {"basic-1 =", "au ti dm any id bd"},
        {"basic-1 phonetic", "au ti id bd"},
        {"dcore-1 <", "dm"},
        {"dcore-1 =", "au ti dm any id bd"},
        {"dcore-1 phonetic", "au ti id bd"}
    };
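    A lookup against this table might look like the sketch below. It assumes that the WAIS field names in the second column are separated by whitespace, as they appear to be in the example; the helper modifierAllowed() and the class name are hypothetical.

```java
import java.util.Arrays;

class ModifierFieldSketch {
    // Rows copied from the example above:
    // {"<attribute set> <modifier>", "<WAIS field names>"}
    static final String[][] WAISFieldSupportedForModifier = {
        {"basic-1 <", "dm"},
        {"basic-1 =", "au ti dm any id bd"},
        {"basic-1 phonetic", "au ti id bd"},
        {"dcore-1 <", "dm"},
        {"dcore-1 =", "au ti dm any id bd"},
        {"dcore-1 phonetic", "au ti id bd"}
    };

    // Hypothetical helper: true if the attribute-set qualified modifier may be
    // applied to the given WAIS field at this source.
    static boolean modifierAllowed(String qualifiedModifier, String waisField) {
        for (String[] row : WAISFieldSupportedForModifier) {
            if (row[0].equals(qualifiedModifier)) {
                // Assumption: field names are whitespace-separated.
                return Arrays.asList(row[1].trim().split("\\s+")).contains(waisField);
            }
        }
        return false; // modifier not listed for this source
    }

    public static void main(String[] args) {
        System.out.println(modifierAllowed("dcore-1 <", "dm"));        // true
        System.out.println(modifierAllowed("dcore-1 <", "au"));        // false
        System.out.println(modifierAllowed("dcore-1 phonetic", "ti")); // true
    }
}
```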

  5. Add an entry in the hashtable "attrSetsTable" in each source class:

    Each source class should contain the static hashtable "attrSetsTable" with an entry for each attribute set that the source supports. The key to the hashtable is the attribute-set name, and the value is an attribute-set object. A static initializer populates the "attrSetsTable" hashtable by creating an entry for each attribute set and instantiating a new attribute-set object. The new attribute-set object is instantiated with the source-specific field information in the "fieldsTranslation" array:

    static Hashtable attrSetsTable = new Hashtable();

    static {
        attrSetsTable.put("basic-1", new AttrSetBasic1(fieldsTranslation));
        attrSetsTable.put("dcore-1", new AttrSetDcore1(fieldsTranslation));
    }

  6. Watch for date field problems in TermToFilter() method in WAISSourceDescription and in Check() method in query/LString

    STARTS specifies that date fields must be in ISO 8601 yyyy-mm-dd format. The code handles this crudely: it treats a field as a date field if its name contains "date", "Date", or "DATE". If your attribute set has a date field whose name does not meet this criterion, you will need to change the code in both of the places named above.
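
    The name-based check described above can be sketched roughly as follows. This is an illustration only: the actual code in TermToFilter() and Check() is not reproduced here, the class and method names are hypothetical, and the use of java.time to validate the yyyy-mm-dd format is an assumption about how one might write it today.

```java
import java.time.DateTimeException;
import java.time.LocalDate;

class DateFieldSketch {
    // Mirrors the described heuristic: a field counts as a date field only if
    // its name contains "date", "Date", or "DATE".
    static boolean looksLikeDateField(String fieldName) {
        return fieldName.contains("date")
            || fieldName.contains("Date")
            || fieldName.contains("DATE");
    }

    // LocalDate.parse accepts the yyyy-mm-dd form and rejects anything else.
    static boolean isValidDateValue(String value) {
        try {
            LocalDate.parse(value);
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDateField("date-last-modified")); // true
        System.out.println(looksLikeDateField("CREATED"));            // false: heuristic misses it
        System.out.println(isValidDateValue("1997-03-05"));           // true
        System.out.println(isValidDateValue("03/05/1997"));           // false
    }
}
```

    A field such as the hypothetical "CREATED" shows the limitation: it holds a date but is not recognized by name, which is the case that requires a code change.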

Providing access to other sources

The StartsServer is structured in a manner that permits easy addition or modification of the WAIS sources. The reference implementation provides access to two sample sources, cstr and linux.

Steps for changing the sources are as follows:

  1. Follow the instructions for freeWAIS-sf for creating and indexing a new set of documents (let's call the WAIS database for these documents db) using the "fields" type for waisindex. A quick summary of the steps is:
    1. Place the documents themselves within a directory tree - let's call that directory dbdocs.
    2. Create a directory in which the WAIS indexes should be placed - let's call that directory dbind.
    3. Create the WAIS field description file (in this case db.fmt) in the dbind directory. This file must include fields that semantically match the required STARTS fields, which are title, linkage, and date/time last modified. Refer to the STARTS specification for a complete list of fields as a guide to what you might want to specify in your WAIS field description file.
    4. Index the sources using the waisindex utility.
    4. Index the sources using the waisindex utility.
  2. Create a new StartsServer class, in the package resource, to represent the new source. This class should extend the class WAISSourceDescription, the abstract superclass of all varieties of WAIS sources. The source description class defines, among other things, where the index files are and how STARTS fields map to source fields. See the classes CSTRSourceDescription and LINUXSourceDescription for examples of how to create this subclass.
  3. Create a new StartsServer class, in the package resource, to represent the documents in the new source. This class should extend the class WAISDocuments, the abstract superclass of all varieties of WAIS documents. The main function of the document class is to extract document information (e.g., title, author) from the document files you have indexed with WAIS. The code to do this is idiosyncratic to each type of document. See the classes CSTRDocuments and LINUXDocuments for examples of how to create this subclass.
  4. Modify the class ResourceDescription, in the package STARTSConfiguration, to provide access to the new source(s). Specifically, you should modify the static code that loads the sources for the resource. In the reference implementation this is coded as:
// Load the sources hashtable
static {
   sources.put("cstr", new CSTRSourceDescription());
   sources.put("linux", new LINUXSourceDescription());
}

You should modify this code so that each key in the hashtable is the name of one of your sources and each value is an instance of the class that describes that source.
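
As a rough, self-contained sketch of steps 2 and 4 together, the code below defines a source description for the example database db and registers it under that name. SourceDescriptionSketch is a hypothetical stand-in for WAISSourceDescription, whose real members are not shown in this document; DbSourceDescription, its method fieldsTranslation(), and the field rows are likewise assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Plays the role of ResourceDescription: maps source names to descriptions.
class ResourceDescriptionSketch {
    static final Map<String, SourceDescriptionSketch> sources = new HashMap<>();

    // Load the sources table; the key is the source name.
    static {
        sources.put("db", new DbSourceDescription());
    }

    public static void main(String[] args) {
        SourceDescriptionSketch db = sources.get("db");
        System.out.println(db.fieldsTranslation()[0][2]); // ti
    }
}

// Hypothetical stand-in for WAISSourceDescription.
abstract class SourceDescriptionSketch {
    abstract String[][] fieldsTranslation();
}

// Hypothetical description of the new source "db".
class DbSourceDescription extends SourceDescriptionSketch {
    // {"<attribute set>", "<field>", "<WAIS translation>", "<languages supported>"}
    private static final String[][] FIELDS = {
        {"basic-1", "title", "ti", ""},
        {"basic-1", "linkage", "id", ""}
    };

    @Override
    String[][] fieldsTranslation() {
        return FIELDS;
    }
}
```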

Using a native search engine other than freeWAIS

Using the reference implementation to support another native search engine (not freeWAIS-sf) is, by nature, a more complicated task. However, StartsServer is structured in a manner that allows this to be done via sub-classing rather than by rewriting the source-independent and engine-independent pieces of the code. All WAIS-specific code is isolated in the package wais. The two core classes in this package are:

  1. WAISSourceDescription - an abstract sub-class of the abstract class resource.SourceDescription. This class describes generic attributes and methods of a WAIS source.
  2. WAISResultDocument - an abstract sub-class of the abstract class results.Document. This class describes generic attributes and methods of a WAIS document that is part of a query result set.

You will need to create two such sub-classes for the engine to which you wish to provide access. You can then add new sources for this engine, in a manner similar to that described above.
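
The layering this implies can be sketched as three levels: an engine-independent base class, an engine-specific abstract sub-class, and a concrete class per source. Everything in the sketch is hypothetical: SourceDescriptionStandIn stands in for resource.SourceDescription, the method names and signatures are assumptions (the real TermToFilter() signature is not given in this document), and the field:"term" filter syntax belongs to an imaginary engine.

```java
class EngineLayeringSketch {
    public static void main(String[] args) {
        SourceDescriptionStandIn source = new ReportsSourceDescription();
        System.out.println(source.termToFilter("title", "retrieval")); // doc_title:"retrieval"
    }
}

// Level 1: engine-independent base (stand-in for resource.SourceDescription).
abstract class SourceDescriptionStandIn {
    // Source-specific: maps a STARTS field to the source's native field.
    abstract String nativeFieldFor(String startsField);

    // Engine-specific: renders one query term in the engine's filter syntax.
    abstract String termToFilter(String startsField, String term);
}

// Level 2: plays the role WAISSourceDescription plays for WAIS, but for an
// imaginary engine with a field:"term" syntax.
abstract class OtherEngineSourceDescription extends SourceDescriptionStandIn {
    @Override
    String termToFilter(String startsField, String term) {
        return nativeFieldFor(startsField) + ":\"" + term + "\"";
    }
}

// Level 3: one concrete source served by that engine.
class ReportsSourceDescription extends OtherEngineSourceDescription {
    @Override
    String nativeFieldFor(String startsField) {
        return startsField.equals("title") ? "doc_title" : startsField;
    }
}
```

With this split, adding a second source for the same engine only requires another level-3 class, which mirrors how CSTRSourceDescription and LINUXSourceDescription relate to WAISSourceDescription.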


Send questions to help@ncstrl.org