Dienst protocol version 4.1

DRAFT

Introduction

This document describes version 4 of the Dienst protocol, which provides an open, distributed digital library. The word "Dienst" is used in various places to refer to a conceptual architecture for digital libraries, a protocol for communication in that architecture, and a software system implementing that protocol.

The Dienst system (in all three senses) is the foundation for NCSTRL, the Networked Computer Science Technical Report Library, but the protocol in no way depends on or is specialized for this particular collection. You could just as well use the Dienst system for a distributed library of reports about highway construction or insect communication.

This version of the protocol is incompatible with the previous Dienst 3.5 protocol, yet it is interoperable with it, as explained below.

Dienst architecture Overview

In the Dienst architecture there are four classes of services. A Repository Service stores digital documents, each of which has a unique name and may exist in several different formats. An Index Service searches a collection and returns a list of documents that match the search. A single, centralized Meta Service (also called a Contact Service) provides a directory of locations of all other services. Finally, a User Interface Service mediates human access to this library. All these services communicate via the Dienst protocol.

A group of sites sharing the Dienst protocol forms a single distributed collection. Each site will typically run repository, index, and UI services for documents issued by that site. One of the sites will run a Meta service, thus defining the set of sites that make up the collection. The NCSTRL collection is currently the most significant collection using the Dienst protocol, and the only one known to the author to be available on the Internet.

handles

A handle is a string that uniquely specifies a document. Unlike a URL, a handle is location independent.

A handle has two parts, a naming authority and a string. A naming authority is an entity that is authorized to create new handles and store them in the handle system. Naming authorities are hierarchically organized, with periods used as the separator, for example handlecorp, handlecorp.research, handlecorp.sales, and handlecorp.sales.hardware. A handle is written with these two parts separated by a slash, for example handlecorp.research/doc1. The character set for handles used in Dienst is restricted to alphanumeric characters, underscore, period, and hyphen (except for the slash separator). (Note that this restriction is specific to Dienst handles and not handles in general.) Case is not significant in handles.
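The handle rules above can be sketched in Python. This is only an illustration under stated assumptions: the helper names parse_handle and authority_chain are hypothetical, and lower-casing is just one way to honor the rule that case is not significant.

```python
import re

# Characters allowed in each part of a Dienst handle: alphanumerics,
# underscore, period, and hyphen. The slash only separates the two parts.
_PART = re.compile(r"[A-Za-z0-9_.-]+")

def parse_handle(handle):
    """Split a Dienst handle into (naming_authority, name).

    Hypothetical helper: both parts are lower-cased because case is
    not significant in handles.
    """
    authority, sep, name = handle.partition("/")
    if not sep or not _PART.fullmatch(authority) or not _PART.fullmatch(name):
        raise ValueError("not a valid Dienst handle: %r" % handle)
    return authority.lower(), name.lower()

def authority_chain(authority):
    """List a naming authority and its ancestors, most specific first."""
    parts = authority.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]
```

For example, parse_handle("handlecorp.research/doc1") yields the authority handlecorp.research and the string doc1, and authority_chain walks up the period-separated hierarchy.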

Handles are similar in intention to WWW URNs, if you are familiar with them.

formats

The Dienst architecture incorporates a document model that allows a document to be stored in many "formats", e.g. PostScript or TIFF. Formats are named with reserved keywords, which are listed below. Each protocol request for a document always specifies a format. A format describes the intended purpose of a document rather than its representation; the representation is described by a MIME type. (Note that the mapping from format in a document request to MIME type of the return is not necessarily one-to-one.)

postscript
The entire body of the document, in a form suitable for display. Typically this is sent as application/postscript.
text
plain ASCII text, sent as text/plain
ocr
ASCII text produced by OCR, sent as text/plain
scanned
scanned page image, usually TIFF, at no less than 300 dots per inch.
inline
a page image, suitable for screen display. Usually a GIF, at about 72 dots per inch, four bits per pixel.
structure
A document structure file
html
an HTML document, sent as text/html

There are also some internal formats. They are not documented as part of the protocol.

Dienst HTTP embedding

The Dienst protocol is (currently) embedded in HTTP, which thus imposes some restrictions on the protocol that are specific to HTTP, not to Dienst.

HTTP request methods

All Dienst requests must be expressed with either the GET or HEAD HTTP methods. In general, GET returns full information, and HEAD returns only meta information. Not all Dienst requests support HEAD.

Special characters

The syntax rules for URLs restrict a few characters to special roles, and require that if these characters are used in any other way they be written as an escape sequence: a percent sign followed by the character code in hexadecimal. The reserved characters are:

/ - separates components in the URL.
? - separates optional arguments from the rest of the URL
# - indicates reference to a named anchor within a document
= - separates name from value in an argument list
& - separates multiple arguments after a ?

Note that the slash character used in handles must be encoded when expressed in a URL. (The encoding is %2F, by the way.)

Finally, the space character may not appear anywhere in a URL. It must be written with a "+" (or with a percent sign escape sequence).
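As a minimal sketch of these escaping rules, Python's standard urllib.parse module can produce the encodings described above. The sample strings are only illustrations.

```python
from urllib.parse import quote, quote_plus, unquote

handle = "handlecorp.sales/doc1"

# safe="" makes quote() escape the slash as well; by default quote()
# leaves "/" alone, which would break a handle used as a path token.
encoded_handle = quote(handle, safe="")

# quote_plus() writes each space as "+" and percent-escapes the
# reserved characters such as "&", "=", and "?".
encoded_words = quote_plus("robot vision")
encoded_value = quote_plus("R&D = fun?")

# Decoding restores the original handle.
decoded_handle = unquote(encoded_handle)
```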

Standard record list header

All Dienst messages that return lists of results use a common format for the lists. Such lists are always prefaced with a standard header consisting of two lines:

Version: version
Where version is a version number, e.g. "2.0".
Count:N message
Where N is the number of records that follow and message is an optional error message string.

Such records often consist of a set of data elements. It is often (but not always) the case that the separator between tokens is the ASCII FS character (octal 034). The inter-record delimiter depends upon the message. Some messages return fixed size (e.g. four lines) records, others do not.

The MIME type of a record list is always text/plain.
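A client might separate the standard header from the records roughly as follows. This is a sketch, not a reference parser: the function name parse_record_list is hypothetical, and the tolerance for whitespace around the colon is an assumption.

```python
def parse_record_list(text):
    """Split a record list into (version, count, message, body_lines).

    Assumes the header is exactly the first two lines ("Version:" then
    "Count:"); how strictly servers format these lines is a guess.
    """
    # Split on "\n" only: str.splitlines() would also break lines at the
    # FS character (octal 034) that some records use between tokens.
    lines = [line.rstrip("\r") for line in text.split("\n")]
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string left by a trailing newline
    if len(lines) < 2:
        raise ValueError("record list is missing its two-line header")
    version_key, _, version = lines[0].partition(":")
    count_key, _, rest = lines[1].partition(":")
    if version_key.strip() != "Version" or count_key.strip() != "Count":
        raise ValueError("unexpected record list header")
    fields = rest.split(None, 1)  # "N message"; the message is optional
    if not fields:
        raise ValueError("Count line has no number")
    message = fields[1].strip() if len(fields) > 1 else ""
    return version.strip(), int(fields[0]), message, lines[2:]
```

The returned body lines are left unsplit because the record layout and token separator depend on the message.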

Message format

All messages are encoded into URLs where the path portion of the URL consists of the following tokens, in the following order:

Dienst
This token appears literally in the URL.
Service Name
The name of the service which is to handle the message, one of Repository, Index, UI, or Meta. A special service name Info is used in requests that return general information about the server.
Version
The version takes the form of two integers, separated by a period. This version reflects the version of the particular message, not the Dienst system as a whole. Including a version number in the message allows for backward-compatible extension to the Dienst system. A server receiving a message with an older version number must either reply using the old syntax and semantics, or reply with an error. If a server receives a message with a newer version number, then if its reply includes a version (e.g., is a standard record list) then it may, but need not, reply with its current version. Otherwise, it must return an error.
Verb
This is the name of the message, e.g. "SearchBoolean". A verb is unique within a Service.
Fixed arguments
Each verb has a certain number of fixed arguments, which must always be supplied, and must appear in the order cited.
Optional arguments
There are two different forms for optional arguments in this protocol: keyword and positional.

The keyword syntax is used unless explicitly otherwise noted.

The separator between tokens in the path is the slash, except that the separator before the optional arguments is a question mark.

Example

If the Repository service implemented the Shred verb, and if version 1.2 of that verb accepted two optional arguments (time and amperage), then an example path is /Dienst/Repository/1.2/Shred?time=259&amperage=7.4.
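The token order described above can be assembled mechanically. The following sketch assumes keyword-style optional arguments; the function name dienst_path is hypothetical, and Shred is the same imaginary verb used in the example.

```python
from urllib.parse import quote, urlencode

def dienst_path(service, version, verb, fixed_args=(), optional_args=None):
    """Build the path portion of a Dienst request URL.

    Tokens are joined with "/"; keyword optional arguments follow a "?".
    """
    tokens = ["Dienst", service, version, verb]
    # safe="" escapes "/" inside a token, e.g. the slash in a handle.
    tokens += [quote(str(arg), safe="") for arg in fixed_args]
    path = "/" + "/".join(tokens)
    if optional_args:
        # urlencode() writes spaces as "+" and joins pairs with "&".
        path += "?" + urlencode(optional_args)
    return path

shred = dienst_path("Repository", "1.2", "Shred",
                    optional_args={"time": "259", "amperage": "7.4"})
body = dienst_path("Repository", "2.0", "Body",
                   ["handlecorp.sales/doc1", "postscript"])
```

Here shred reproduces the example path, and body shows fixed arguments with an encoded handle.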

Error returns

A normal response from a Dienst protocol message is signalled with the 200 or 302 (Redirect) reply codes. Error returns are signalled with a 4xx or 5xx response.

Compatibility with Dienst 3.5

The messages of the Dienst 4.0 protocol are completely distinct from the messages of the Dienst 3.5 protocol, even though many of them have counterparts in the older protocol. There are two reasons for this incompatibility.

  1. The conceptual model has changed. Where Dienst 3.5 used a docid as the location-independent document identifier, 4.0 uses handles.
  2. The new protocol removes the old protocol's limits on extension, in particular by adding explicit versions to all messages, which should allow future changes to the protocol to remain backward compatible.

The new protocol is nevertheless interoperable with the old because the two are totally disjoint. It is always possible to distinguish a 4.0 message from a 3.5 message. Thus a server can support both old and new messages, and new servers can send old messages to old servers.

The Messages

For each class of service we list the messages it implements. The classes of service are Repository, Index, Meta, User Interface (UI), LibMgt, Registry and Info. Each message has a Name (which is used for purposes of discussion), a Verb (a unique name for the message), a Version, a list of Fixed arguments and a list of Optional arguments. For every message we include an example. In these examples, we uniformly use handlecorp.sales%2Fdoc1 (note the encoding of the slash character) as the handle (where appropriate).

Repository Service

The repository allows a given document to be stored in many different formats, and provides messages to obtain the document or pieces of the document in any of the stored formats.

List Contents

Verb: List-Contents
Version: 2.0
Fixed args: none
Optional args: none

Returns a list of the handles for documents stored in this repository; more precisely, those handles for which at least one format is stored in this repository. Each record is exactly one line.

Example:
Dienst/Repository/2.0/List-Contents

Get Document Body

Verb: Body
Version: 2.0
Fixed args: handle, format
Optional args: none

Return the body of the document, in the selected format. The MIME type of the returned document varies according to the format specified.

Example:
Dienst/Repository/2.0/Body/handlecorp.sales%2Fdoc1/postscript

Get Page

Verb: Page
Version: 2.0
Fixed args: handle, format, page number
Optional args: none

Return a single page, where the document is available in discrete pages, in the selected format. Reasonable values for format are scanned or inline. The MIME type of the returned document depends on the format specified in the argument list.

Example:
Dienst/Repository/2.0/Page/handlecorp.sales%2Fdoc1/inline/1

Get Page Count

Verb: NPages
Version: 2.0
Fixed args: handle, format
Optional args: none

Return the number of pages for this document, when it is available in discrete pages. Pages are numbered from 1.

Example:
Dienst/Repository/2.0/NPages/handlecorp.sales%2Fdoc1/inline
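Because pages are numbered from 1, a client that already knows the page count can enumerate one Page request per page. A minimal sketch follows; the helper name page_paths is hypothetical, and the page count is passed in as an integer because the layout of the NPages reply is not specified here.

```python
from urllib.parse import quote

def page_paths(handle, fmt, page_count):
    """Build one Page request path per page, numbering pages from 1."""
    encoded = quote(handle, safe="")  # the slash in the handle becomes %2F
    return ["/Dienst/Repository/2.0/Page/%s/%s/%d" % (encoded, fmt, number)
            for number in range(1, page_count + 1)]
```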

List Formats

Verb: Formats
Version: 2.0
Fixed args: handle
Optional args: none

Returns a record list, where each record takes the form

format size content-type

where format is the format name and size is the size in bytes if it can be determined, or * if unknown. (Servers are under no obligation to take the trouble to measure the size of files.) There is no guarantee that size is the number of bytes actually transmitted when the format is retrieved, because the file might be stored compressed but transmitted uncompressed, or vice versa. content-type is the MIME content type that would be used to transmit the format.

Example:
Dienst/Repository/2.0/Formats/handlecorp.sales%2Fdoc1
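A record from the Formats reply could be decoded along these lines. The token separator is not stated for this message, so the sketch accepts spaces, tabs, or the ASCII FS character; that choice, the function name parse_format_record, and the sample values in the usage note are assumptions.

```python
import re

def parse_format_record(line):
    """Decode one "format size content-type" record into a dict.

    The size is None when the server reports "*" (unknown).
    """
    tokens = [t for t in re.split(r"[ \t\x1c]+", line) if t]
    if len(tokens) != 3:
        raise ValueError("expected three tokens, got %d" % len(tokens))
    fmt, size, content_type = tokens
    return {
        "format": fmt,
        "size": None if size == "*" else int(size),
        "content_type": content_type,
    }
```

For instance, an invented record such as "inline * image/gif" would decode with a size of None.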

Index Service

The index service searches a set of descriptions of documents and returns handles for those that match. Document descriptions (bibliographic information) are stored in the RFC 1807 format.

Get Bibliographic Records

Verb: List-Contents
Version: 2.0
Fixed args: none
Optional args: file-after

Returns a record list of bibliographic information, in RFC-1807 format, for documents on the service. The optional argument file-after limits the list to those for which the bibliographic file was added or modified since time, a universal time expressed in RFC 1036 format. Note that this is distinct from any dates encoded internal to the bibliographic record, e.g. the date the document itself was written. The MIME type of the returned document is text/plain.

Examples:
/Dienst/Index/2.0/List-Contents
/Dienst/Index/2.0/List-Contents?file-after=1+Aug+95

Get bibliographic information for a document

Verb: Bibliography
Version: 2.0
Fixed args: handle
Optional args: none

Returns the bibliographic information, in RFC-1807 format, for the document specified by the handle. The MIME type of the returned document is text/plain.

Example:
/Dienst/Index/2.0/Bibliography/handlecorp.sales%2Fdoc1

Search Boolean

Verb: SearchBoolean
Version: 2.0
Fixed args: none
Optional args: see below

Searches the collection. Optional arguments are a set of keywords and values specifying the search criteria. Returns a record list where each record begins with a blank line, then has handle, title, author, date each on a separate line.

allowable keywords
title
words from the title.
author
author's last or first name.
abstract
words from the abstract.
boolean
The connective between the above operators, either "and" (the default) or "or".

Two additional keywords may be used:

additional keywords
authority
the naming authority. Defaults to "any". This argument may be repeated.
name
The name of the document (from the handle), e.g. "TR95-259". Note that this keyword was called number in the Dienst 3.5 protocol.

Rules for bibliographic keyword matching

Words in the three bibliographic keyword fields (author, title, abstract) are matched to bibliographic entries according to the following rules:

Examples:

/Dienst/Index/2.0/SearchBoolean?author=davis+or+fox
/Dienst/Index/2.0/SearchBoolean?author=donald&title=robot
/Dienst/Index/2.0/SearchBoolean?author=donald&title=robot&boolean=or
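The five-line record layout of a SearchBoolean reply (a blank line, then handle, title, author, and date) can be decoded as sketched below. The function takes the lines that follow the standard two-line header; its name and the sample record in the usage note are invented for illustration.

```python
def parse_search_records(body_lines):
    """Group SearchBoolean reply lines into one dict per record.

    Each record is expected to be a blank line followed by the handle,
    title, author, and date, each on its own line.
    """
    if len(body_lines) % 5 != 0:
        raise ValueError("expected five lines per record")
    records = []
    for start in range(0, len(body_lines), 5):
        blank, handle, title, author, date = body_lines[start:start + 5]
        if blank.strip():
            raise ValueError("record does not begin with a blank line")
        records.append({"handle": handle, "title": title,
                        "author": author, "date": date})
    return records
```

Given the lines "", "handlecorp.sales/doc1", "Selling Handles", "Carl Snarl", "1 Aug 95", the result is a single record whose handle is handlecorp.sales/doc1.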

Meta Service

Get Publishers

Verb: Publishers
Version: 2.0
Fixed args: none
Optional args: none

Returns a record list of the publishers in the collection. Each record is a single line, and consists of three tokens:

symbolic name
The "publisher" as used in Dienst 3.5 protocol, e.g. CORNELLCS.
pretty name
A string suitable for display to people, e.g. "Cornell University Department of Computer Science"
handle naming authority
e.g. ncstrl.cornell

The token separator is the ASCII FS character (octal 034). The MIME type of the returned document is text/plain.

Example:
Dienst/Meta/2.0/Publishers
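Splitting a Publishers record on the FS character is straightforward. In this sketch the function name parse_publisher is hypothetical; the field values in the usage note are the examples given above.

```python
FS = "\x1c"  # ASCII FS: octal 034, decimal 28

def parse_publisher(line):
    """Split one Publishers record into its three FS-separated tokens."""
    symbolic_name, pretty_name, authority = line.split(FS)
    return {"symbolic_name": symbolic_name,
            "pretty_name": pretty_name,
            "authority": authority}

sample = FS.join(["CORNELLCS",
                  "Cornell University Department of Computer Science",
                  "ncstrl.cornell"])
publisher = parse_publisher(sample)
```

Because the pretty name contains spaces, splitting on whitespace would not work here; only the FS character delimits tokens.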

Get Index Servers

Verb: Indices
Version: 2.0
Fixed args: none
Optional args: none

Returns a record list of the Index services. Each record consists of five fields separated by the ASCII FS character (octal 034):

host
 
port
 
protocol
The protocol running at the server, either 3 or 4.
authorities
List of naming authorities separated by colon
priority
an integer. Low numbers are higher priority.

The MIME type of the returned document is text/plain.

Example:
Dienst/Meta/2.0/Indices
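A client could use the Indices reply to pick index servers for a naming authority, preferring low priority numbers. The sketch below makes several assumptions: authorities are matched exactly (not hierarchically), the helper names are hypothetical, and the host names in the usage note are placeholders.

```python
FS = "\x1c"  # ASCII FS, octal 034

def parse_index_record(line):
    """Split one Indices record into its five FS-separated fields."""
    host, port, protocol, authorities, priority = line.split(FS)
    return {"host": host,
            "port": int(port),
            "protocol": protocol,                    # "3" or "4"
            "authorities": authorities.split(":"),   # colon-separated list
            "priority": int(priority)}

def servers_for(records, authority):
    """Return the servers listing this authority, best priority first."""
    wanted = authority.lower()  # case is not significant in handles
    matches = [r for r in records
               if wanted in (a.lower() for a in r["authorities"])]
    return sorted(matches, key=lambda r: r["priority"])  # low number first
```

With one record for index1.example.org at priority 1 and another for index2.example.org at priority 2, both covering handlecorp.sales, servers_for returns index1.example.org first.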

Get Repositories

Verb: Repositories
Version: 2.0
Fixed args: none
Optional args: none

Returns a record list of the Repository services. Each record consists of four fields, separated by the ASCII FS character (octal 034):

host
 
port
 
protocol
The protocol running at the server, either 3 or 4.
authorities
List of naming authorities separated by colon

The MIME type of the returned document is text/plain.

Example:
Dienst/Meta/2.0/Repositories

Get Lite Sites

Verb: Lite
Version: 2.0
Fixed args: none
Optional args: none

Note: this protocol request is subject to change without notice. Returns a record list of the Lite sites. Each record consists of four fields, separated by the ASCII FS character (octal 034):

publisher symbol
(e.g. AUBURN_ENG)
pretty name
 
handle authority name
 
refer.bibs
URL of the refer.bibs file for this site.

The MIME type of the returned document is text/plain.

Example:
Dienst/Meta/2.0/Lite

UI Service

The UI service presents information from the other services in a human readable form. This section lists the minimal UI Service messages supported as external interfaces to Dienst. Each Dienst UI service is free to implement any additional user interface messages that the local site finds helpful.

Show Search Form

Verb: Search
Version: 2.0
Fixed args: none
Optional args:

Return an HTML form allowing fielded search of the collection. The MIME type of the returned document is text/html.

Example:
Dienst/UI/2.0/Search

Simple query

Verb: QueryNF
Version: 2.0
Fixed args: none
Optional args: search keywords (see below)

Search the distributed collection based on the keywords supplied as arguments. The format of the argument is keywords=words, where words is a "+"-separated list of words. Search hits are those documents in the distributed collection that have any of the words in their title, author, or abstract bibliographic fields. The hits are returned as hyperlinks to the Describe verb (see below) for the document. The MIME type of the returned document is text/html.

Example:
Dienst/UI/2.0/QueryNF?keywords=robot+vision

Describe report

Verb: Describe
Version: 2.0
Fixed args: handle
Optional args:

Return an HTML page (for human readers) summarizing information about the document. The precise contents are left undefined. Additional optional arguments are allowed but not defined. The MIME type of the returned document is text/html.

Example:
Dienst/UI/2.0/Describe/handlecorp.sales%2Fdoc1

Browse collection by year

Verb: BrowseYears
Version: 2.0
Fixed args: none
Optional args: none

Return a document with hyperlinks that permit the user to select a span of years for the ListYears verb (see below). The MIME type of the returned document is text/html.

Example:
Dienst/UI/2.0/BrowseYears

Browse documents in a span of years

Verb: ListYears
Version: 2.0
Fixed args: span of years (see below)
Optional args: none

Return a document with hyperlinks to the Describe verb (see above) for documents in the collection published within the specified year span. The method of determining this is site specific. The argument is a string consisting of two hyphen-separated years (e.g. 1975-1985). The MIME type of the returned document is text/html.

Example:
Dienst/UI/2.0/ListYears/1968-1978

Browse collection by author

Verb: BrowseAuthors
Version: 2.0
Fixed args: none
Optional args: none

Return a document with hyperlinks that permit the user to select a span of letters for the ListAuthors verb (see below). The MIME type of the returned document is text/html.

Example:
Dienst/UI/2.0/BrowseAuthors

Browse documents by authors in an alphabetic range

Verb: ListAuthors
Version: 2.0
Fixed args: span of letters (see below)
Optional args: none

Return a document with hyperlinks to the Describe verb (see above) for documents in the collection authored by people whose last names are in the range specified by the argument. The argument is a string consisting of one letter or two hyphen-separated letters specifying a range. The MIME type of the returned document is text/html.

Example:
Dienst/UI/2.0/ListAuthors/A-C

Add User to Registry

Verb: RegistryAdd
Version: 1.0
Fixed args: none
Optional args: see below

Optional arguments necessary for anything useful to happen are name, email, password and verifypwd. Other optional fields that may be specified are: institution, address, and phone.

If the optional arguments are valid, the user is added to the registry and an HTML table is returned, showing the new registry record's fields and their values. If there was a problem, an error message is returned on an HTML page. The MIME type of the returned document is text/html.

For more information, see the Add verb in the Registry service below.

Example:
Dienst/UI/1.0/RegistryAdd?name=Carl+Snarl&email=carl@cornell.edu&password=blah&verifypwd=blah

Describe Registry Entry

Verb: RegistryDescribe
Version: 1.0
Fixed args: none
Optional args: userid, password

Returns an HTML table showing the registry record's fields and their current values. The MIME type of the returned document is text/html.

Example:
Dienst/UI/1.0/RegistryDescribe?userid=carl&password=blah

Modify Registry Entry

Verb: RegistryModify
Version: 1.0
Fixed args: none
Optional args: see below

Returns an HTML table showing the indicated registry record's fields and their current (freshly updated) values or, if there was a problem, returns an error message on an HTML page. The MIME type of the returned document is text/html.

Optional fields necessary for anything useful to happen are userid and password. Other optional fields that may be specified are: name, email, institution, address, phone, newpwd and verifypwd. If a field value is specified as "none", then that field will be deleted. Note that name and email fields cannot be deleted.

For more information, see the Modify verb in the Registry service below.

Example:
Dienst/UI/1.0/RegistryModify?userid=carl&password=blah&institution=CornellLib&phone=911

Library Management Services

These are implemented by the LibMgt service, which is a new addition to Dienst.

Submit

Verb: Submit
Version: 2.0
Fixed args: none
Optional args: varies

Accepts a document to be conditionally added to the collection. The precise nature of the optional args is site dependent, but typically would include the title of the document, name of author(s), and location of a PostScript file for the document. Typically this verb will run additional verification tests before actually adding the document, and these tests might include asking a human to check the submission.

 

Registry Service

The registry service provides a persistent database of Dienst users and is required by the subscription service. It will also be used by the reviewer service and remote document submissions, when they are added to Dienst. Given a userid and password, it authenticates users, lists registry records, and processes additions, modifications and deletions to the registry database.

Add User

Verb: Add
Version: 1.0
Fixed args: password, name, email
Optional args: see below

Given a password, name and email, the user is added to the registry service if s/he isn't already in it. Returns a record list of one record, where each line is a field added to the registry database. The Dienst userid is the first field listed.

The userid may be specified by the user as an optional parameter. If the specified userid is not unique in the Registry database, it will be made so by appending the digit one to the end, or by adding one to the digit already on the end, until it is a unique id.

If no userid is supplied by the user, the service tries to use the portion of the supplied email address before the "@". If that is not unique, the same process as mentioned above is followed. For example, if "carl" and "carl1" are already in the database, and the email address is carl@cornell.edu, then the system will first try "carl", then "carl1", and will then settle on "carl2" as a unique id.
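The userid selection just described can be sketched as follows. This is not the registry's actual code: the function names are hypothetical, the set of taken ids stands in for the Registry database, and treating trailing digits as a single number (so that "carl9" would be followed by "carl10") is an assumption the text does not spell out.

```python
import re

def userid_from_email(email):
    """Use the part of the email address before the "@" as the candidate."""
    return email.split("@", 1)[0]

def unique_userid(candidate, taken):
    """Append 1, or increment the trailing number, until the id is unused."""
    userid = candidate
    while userid in taken:
        match = re.fullmatch(r"(.*?)(\d+)", userid)
        if match:
            # Already ends in digits: add one to that number.
            userid = match.group(1) + str(int(match.group(2)) + 1)
        else:
            # No trailing digit yet: append the digit one.
            userid = userid + "1"
    return userid

chosen = unique_userid(userid_from_email("carl@cornell.edu"), {"carl", "carl1"})
```

With "carl" and "carl1" already taken, chosen ends up as "carl2", matching the example in the text.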

optional fields
userid
if this is provided, the user is hoping it is a unique userid. If the provided userid is not unique, it will be changed to be unique. If it is not provided at all, Dienst will generate a userid.
institution
the user's institution, such as "Cornell University Computer Science Department".
address
the user's address (snail mail).
phone
the user's phone number.

Example:
Dienst/Registry/1.0/Add/blah/blah?userid=carl&e-mail=carl@cornell.edu&name=Carl+Snarl&institution=CornellCS

Authenticate User

Verb: Authenticate
Version: 1.0
Fixed args: userid, password
Optional args: none

Returns 1 if the user is authenticated (userid is in the Registry database and password matches). Returns a negative number otherwise:

-91: incorrect password
-92: userid is not in Registry database
-93: no userid was supplied
-94: no password was supplied
-99: unspecified error

Example:
Dienst/Registry/1.0/Authenticate/carl/blah
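A caller might turn the Authenticate result into a readable outcome as shown below. The sketch assumes the reply body is just the number as text, which is not stated above; the function name is hypothetical.

```python
# Error codes documented for the Authenticate verb.
AUTH_ERRORS = {
    -91: "incorrect password",
    -92: "userid is not in Registry database",
    -93: "no userid was supplied",
    -94: "no password was supplied",
    -99: "unspecified error",
}

def interpret_authenticate(body):
    """Return (authenticated, explanation) for an Authenticate reply."""
    code = int(body.strip())
    if code == 1:
        return True, "authenticated"
    return False, AUTH_ERRORS.get(code, "unknown code %d" % code)
```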

List Registry Entry

Verb: List
Version: 1.0
Fixed args: userid, password
Optional args: none

Returns a record list of one record, where each line is a field of the registry database entry.

Example:
Dienst/Registry/1.0/List/carl/blah

Modify Registry Entry

Verb: Modify
Version: 1.0
Fixed args: userid, password
Optional args: see below

Returns 1 if the record was updated successfully. Returns a negative number as an error code otherwise.

Optional fields that may be specified are name, email, institution, address, phone and newpwd. The userid may not be changed, and the init-time and mod-time may not be modified by the user.

newpwd changes the password to the new one specified.

If a field value is specified as "none", then that field will be deleted. Note that name and email fields cannot be deleted.

Error codes:

-91: incorrect password
-92: userid is not in Registry database
-93: no userid was supplied
-94: no password was supplied

-95: no fields to update
-96: tried to delete a name or email field
-97: illegal registry name-value pair

-99: unspecified error

Example:
Dienst/Registry/1.0/Modify/carl/blah?institution=CornellLib&phone=911


Delete Registry Entry

Verb: Delete
Version: 1.0
Fixed args: userid, password
Optional args: none

Returns 1 if the record is successfully deleted, otherwise returns a negative number as an error code. The entry will not be deleted unless all subscriptions (and reviews and document submission permissions) can be deleted as well.

Error codes:

-91: incorrect password
-92: userid is not in Registry database
-93: no userid was supplied
-94: no password was supplied

-98: there are subscriptions for this userid so it didn't delete

Example:
Dienst/Registry/1.0/Delete/carl/blah

Info Messages

These messages return general information about the server.

Version

Verb: Version
Version: 2.0
Fixed args: none
Optional args: none

Returns the version of the service, e.g. Dienst v4-0-1. The MIME type of the returned document is text/plain.

Example:
Dienst/Info/2.0/Version

Get Log file

Verb: Log
Version: 2.0
Fixed args: none
Optional args: class, start, end

Returns a document containing entries from the log file that are of type class and were written on or after start and before end. If class is omitted, then all classes are returned. If start is omitted then the start time is the first time in the log file. If end is omitted then a value of Jan 1, 2059 is used.

The value of class must be one of Admin, Error, Network, Statistics, Transactions, or Warning. The MIME type of the returned document is text/plain.

Example:
Dienst/Info/2.0/Log?class=Error&start=1+Aug+95

Log Summary Reports

Verb: Log_Summary
Version: 2.0
Fixed args: none
Optional args: report, start, end

Returns an HTML-formatted report (of the type specified by report) which summarizes the logs for the period beginning on the date specified by start and ending on the date before end. If the value of report is "DISPLAY-FILES", a listing (MIME type text/plain) of the summary files available at this server is returned.

The value of report must be one of DAILY-LOG-SUM, DAILY-TRANS-SEARCH, TRANS-SUM, SERVER-STATS, DOCUMENT-STATS, ORIGINATING_SITE-STATS, SEARCH-CRIT-STATS. The MIME type of the returned document is text/html.

Example:
Dienst/Info/2.0/Log_Summary?report=daily-log-sum&start=3+Mar+96&end=10+Mar+96

List Services

Verb: List-Services
Version: 2.0
Fixed args: none
Optional args: none

Returns a record list. Each record is a single line containing the name of one of the defined Dienst services. The intended purpose of this verb is to assist other systems to interface to Dienst.

Example:
Dienst/Info/2.0/List-Services

List Verbs

Verb: List-Verbs
Version: 2.0
Fixed args: service
Optional args: none

Returns a record list. Each record is a single line containing the name of a verb defined by that service.

Example:
Dienst/Info/2.0/List-Verbs/Repository

Describe Verb

Verb: Describe-Verb
Version: 2.0
Fixed args: service, verb
Optional args: none

Returns a record list. Each record is a single line containing two elements, version and args, separated by a single space character. The version is a version number, as used in Dienst protocol, and args is a colon-separated list of the names of the fixed arguments, if any, accepted by the verb in that version. Note that a service may implement more than one version of a verb.

Example:
Dienst/Info/2.0/Describe-Verb/Repository/Formats
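One record of a Describe-Verb reply could be decoded as in this sketch. The function name is hypothetical, and the argument names in the usage note are invented; a real server may report different names.

```python
def parse_verb_description(line):
    """Split a "version args" record into (version, [argument names]).

    The two elements are separated by a single space; args is a
    colon-separated list and may be empty when the verb takes no
    fixed arguments.
    """
    version, _, args = line.partition(" ")
    return version, [name for name in args.split(":") if name]
```

For example, a record such as "2.0 handle:format" would decode to version 2.0 with the fixed arguments handle and format.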

Acknowledgements

This work was supported in part by the Advanced Research Projects Agency under Grant No. MDA972-92-J-1029 with the Corporation for National Research Initiatives (CNRI). Its content does not necessarily reflect the position or the policy of the Government or CNRI, and no official endorsement should be inferred. This work was done at the Design Research Institute, a collaboration of Xerox Corporation and Cornell University, and at the Computer Science Department at Cornell University.


Updated February 26, 1998

NCSTRL Documentation
Any comments or questions?
Contact us at help@ncstrl.org.