Document Version 2000-05-19 08:19:57 -0400

the Santa Fe Convention : The Open Archives Dienst Subset

Jim Davis (jrd3@alum.mit.edu)
David Fielding (fielding@cs.cornell.edu)
Carl Lagoze (lagoze@cs.cornell.edu)
Richard Marisa (rjm2@cornell.edu)

Introduction
Protocol Features
    Unique Identifiers
    Partitions
    Verbs and Versions
    HTTP embedding of Dienst requests
    Dates
Protocol Messages
    Disseminate
    List-Contents
    List-Meta-Formats
    List-Partitions
    Structure

0. Changes

Any changes to this specification after February 15, 2000 will be noted here.

1. Introduction

This document describes the portion of the Dienst protocol that is used for basic interoperability among archives in the Open Archives initiative, as recommended in its Santa Fe Convention.  The goal of the Open Archives initiative is to provide the mechanisms for interoperability among distributed e-print archives.  The protocol described in this document allows harvesting of metadata for uniquely identified records in an archive.  The word document is purposely avoided and the notion of a record is purposely imprecise.  Some archives may provide access to metadata only; others may also provide access to full content in some form; still others may provide additional services associated with the metadata and content, such as access to the full content in various manifestations (formats) or structural decompositions (e.g., individual pages, chapters, and the like).

The protocol described in this document is a subset of the full Dienst protocol, which provides for communications with services in a distributed digital library. When this subset of Dienst needs to be differentiated from the full Dienst protocol, it will be referred to as the Open Archives Dienst Subset for the remainder of this document. Readers will notice the use of the word Repository in the Dienst protocol requests.  This follows from the use of the term Repository in the broader Dienst system in lieu of the term Archive.

2. Protocol Features

2.1 Unique Identifiers

All archives participating in the Open Archives Initiative have a unique archive identifier.  This identifier is restricted to alphanumeric characters.  Registration of this identifier is part of the Open Archives registration process for data providers, described in Step 6 of the Santa Fe Convention of the Open Archives Initiative.  All records in an archive have a unique record identifier - unique within the scope of that archive.  These two identifiers - the unique archive identifier and the unique record identifier - can then be concatenated (separated by any printable non-alphanumeric character) to form a unique full identifier (referred to as a fullID in the protocol documentation).  For example, the unique archive identifier handlecorp can be combined with the unique record identifier 11223 and separated with the / (slash) character to form the full identifier handlecorp/11223.   The full identifier is then used and returned by Dienst requests.  
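The concatenation rule above can be sketched in a few lines of Python. This is an illustration only, not part of the protocol: the function names make_full_id and split_full_id are hypothetical, and "alphanumeric" is assumed to mean ASCII letters and digits. Because the archive identifier is restricted to alphanumeric characters, the first non-alphanumeric character in a full identifier can be taken as the separator.

```python
import re

# Assumption: "alphanumeric" is read here as ASCII letters and digits only.
_FULL_ID = re.compile(r"([A-Za-z0-9]+)([^A-Za-z0-9])(.+)")


def make_full_id(archive_id: str, record_id: str, separator: str = "/") -> str:
    """Concatenate an archive identifier and a record identifier."""
    if not re.fullmatch(r"[A-Za-z0-9]+", archive_id):
        raise ValueError("archive identifier must be alphanumeric")
    if len(separator) != 1 or not separator.isprintable() or separator.isalnum():
        raise ValueError("separator must be one printable non-alphanumeric character")
    if not record_id:
        raise ValueError("record identifier must not be empty")
    return archive_id + separator + record_id


def split_full_id(full_id: str) -> tuple[str, str, str]:
    """Split a full identifier into (archive_id, separator, record_id).

    The archive identifier ends at the first non-alphanumeric character.
    """
    match = _FULL_ID.fullmatch(full_id)
    if match is None:
        raise ValueError(f"not a full identifier: {full_id!r}")
    return match.group(1), match.group(2), match.group(3)


print(make_full_id("handlecorp", "11223"))   # handlecorp/11223
print(split_full_id("handlecorp/11223"))     # ('handlecorp', '/', '11223')
```

Note that the record identifier itself may contain further non-alphanumeric characters; only the first one after the archive identifier is treated as the separator in this sketch.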

2.2 Partitions

The Dienst protocol defines the notion of a partition within an archive. A partition is an administrator-defined subset of the archive.  Each partition has a (one token) name and a (possibly) longer description.  Depending on the policy of an archive an individual record may exist in one or more partitions.  Note that there is, in general, no way to predict the partition in which a record appears from its full identifier, or even given full knowledge of the record.

An archive may have one or more partition hierarchies.  For example, an administrator may decide to partition an archive into two hierarchies, one based on institutional affiliation and one based on subjects.

The partition hierarchies in an archive are available via the List-Partitions request.

2.2.1 Partition specifications

The List-Contents verb includes, as an argument, a partition specification. Partition specifications are expressed in the following grammar where partitionname is the short one token name for the partition:

partitionspec := partitionlist
partitionlist := partitionsel | partitionsel;partitionlist
partitionsel  := partitionname
partitionname := [A-Za-z0-9-_]+

Example:

Institutions;Florida;Frenetics

Where Florida is the short name for the partition Valley View University of Florida and Frenetics is the short name for the partition Department of Frenetics. 
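As an illustration, a partition specification can be checked against the grammar above with a short Python function. The function name parse_partitionspec is hypothetical, and the character class is written as [A-Za-z0-9_-], which is intended to accept the same characters as the partitionname production.

```python
import re

# Same character set as the partitionname production: letters, digits, "-" and "_".
PARTITION_NAME = re.compile(r"[A-Za-z0-9_-]+")


def parse_partitionspec(spec: str) -> list[str]:
    """Split a partitionspec on ';' and validate every partition name."""
    names = spec.split(";")
    for name in names:
        if not PARTITION_NAME.fullmatch(name):
            raise ValueError(f"invalid partition name {name!r} in {spec!r}")
    return names


print(parse_partitionspec("Institutions;Florida;Frenetics"))
# ['Institutions', 'Florida', 'Frenetics']
```

An empty name, as in physics;;hep, is rejected because the grammar requires at least one character per partitionname.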

2.3 Verbs and Versions

Individual Dienst protocol requests are called Verbs. There may be more than one version of a verb, with each version differing in syntax or semantics. A version takes the form of two integers, separated by a period. This version applies to the individual verb, not the protocol as a whole. (The protocol as a whole does not have a version number.  The date on the protocol document indicates the set of verbs that are defined as of that date.)  Including a version number in the message allows for backward-compatible extension to the Dienst system. 

An archive might support verbs in various versions. An archive receiving a message with an older version number must either reply using the old syntax and semantics, or reply with an error. If an archive receives a message with a newer version number, then it must return an error.
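The version rules above might be implemented along the following lines. This is a sketch under assumptions: the names parse_verb_version and decide are hypothetical, the set of supported versions is whatever the archive chooses to keep, and versions are assumed to be ordered by comparing the two integers in turn, which this document does not state explicitly.

```python
import re


def parse_verb_version(text: str) -> tuple[int, int]:
    """Parse a verb version: two integers separated by a period."""
    match = re.fullmatch(r"([0-9]+)\.([0-9]+)", text)
    if match is None:
        raise ValueError(f"not a verb version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def decide(requested: str, supported: set[str]) -> str:
    """Decide how an archive answers a request for one verb.

    `supported` holds the versions of that verb the archive implements.
    """
    wanted = parse_verb_version(requested)
    known = {parse_verb_version(v) for v in supported}
    if wanted in known:
        # Current version, or an older one the archive still honours.
        return "serve"
    if known and wanted > max(known):
        # Newer than anything implemented: an error is mandatory.
        return "error: newer version"
    # Older (or otherwise unknown) version that the archive dropped.
    return "error: unsupported version"


print(decide("4.0", {"3.0", "4.0"}))  # serve
print(decide("5.0", {"3.0", "4.0"}))  # error: newer version
```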

Software supporting the Open Archives Dienst Subset may or may not be versioned.  If a software version number exists, that number is independent of the Dienst protocol verbs and versions of those verbs that the software supports.  

2.4 HTTP embedding of Dienst requests

Dienst protocol requests are expressed as URLs embedded in HTTP requests.  A typical implementation uses a standard Web server, such as Apache, that is configured to dispatch Dienst URLs to the software implementing these requests. The remainder of this section describes the aspects of the protocol that are specific to the HTTP embedding.

2.4.1 Message format

All messages are encoded into URLs where the path portion of the URL consists of the following tokens, in the following order:

Dienst
This token appears literally in the URL.
Service Name
The name of the service which is to handle the message.  The only service implemented in Open Archives Dienst Subset is Repository.
Version
The version of the verb being invoked. 
Verb
This is the name of the message, e.g. List-Contents. A verb is unique within a Service.
Fixed arguments
Each verb has a certain number of fixed arguments, which must always be supplied, and must appear in the order cited.
Keyword arguments
Keyword arguments take the form key=value. If there is more than one keyword argument, they are separated by an ampersand. Arguments may appear in any order.  Unless specified, keyword arguments are always optional.

The separator between tokens in the path is the slash, except that the separator before the keyword arguments is a question mark.

Example

If the Repository service implemented the Shred verb, and if version 1.2 of that verb accepted two keyword arguments (delay and volume), then an example request is:

/Dienst/Repository/1.2/Shred?delay=9&volume=7.4

The full URL for this request at a particular Web server might be:

http://bar.com/Dienst/Repository/1.2/Shred?delay=9&volume=7.4
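The token order described in this section can be illustrated with a small Python helper. The function build_request and its parameters are hypothetical names introduced for this sketch; it assumes that the caller has already escaped any special characters in the arguments.

```python
def build_request(service, version, verb, fixed_args=(), keyword_args=None, base_url=""):
    """Assemble a Dienst request URL from its tokens.

    Path tokens are joined with "/", keyword arguments follow a "?" and are
    joined with "&". Arguments are assumed to be escaped already.
    """
    tokens = ["Dienst", service, version, verb, *fixed_args]
    url = base_url + "/" + "/".join(tokens)
    if keyword_args:
        url += "?" + "&".join(f"{key}={value}" for key, value in keyword_args.items())
    return url


# The fictitious Shred verb from the example above.
print(build_request("Repository", "1.2", "Shred",
                    keyword_args={"delay": "9", "volume": "7.4"},
                    base_url="http://bar.com"))
# http://bar.com/Dienst/Repository/1.2/Shred?delay=9&volume=7.4
```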

2.4.2 Special characters

The syntax rules for URIs reserve a few characters for special roles in certain contexts and require that, when these characters are used in any other way, they be written as an escape sequence: a percent sign followed by the character code in hexadecimal. The reserved characters are:

    

Character   Role                                     Escape Sequence
/           Path Component Separator                 %2F
?           Query Component Separator                %3F
#           Fragment Identifier                      %23
=           Name/Value Separator                     %3D
&           Argument Separator in Query Component    %26
:           Host Port Separator                      %3A
;           Authority Namespace Separator            %3B

Finally, the space character may not appear anywhere in a URL. It must be written as "+" or as the percent sign escape sequence %20.

As a result, these characters must be escaped within a Dienst protocol request whenever their use does not correspond to their established URI role.  Note that the examples used throughout this document show special character escaping.
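For illustration, the escaping rules of this section can be applied with Python's standard urllib.parse module. The helper name escape_token is hypothetical; passing safe="" asks quote to escape every reserved character, including the slash.

```python
from urllib.parse import quote, quote_plus, unquote


def escape_token(value: str) -> str:
    """Percent-escape a value so that no reserved character keeps its URI role."""
    return quote(value, safe="")


print(escape_token("#oams"))       # %23oams
print(escape_token("/?#=&:;"))     # %2F%3F%23%3D%26%3A%3B
print(escape_token("a b"))         # a%20b  (space as %20)
print(quote_plus("a b"))           # a+b    (space as "+")
print(unquote("%23oams"))          # #oams
```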

2.4.3 Message Responses

Responses to messages are formatted as HTTP responses, with appropriate HTTP header fields. The return type specified for each protocol request in this document will, therefore, correspond to the MIME type included in the HTTP Content-Type header field.

2.4.3.1 MIME Types

The responses to all Open Archives Dienst Subset requests are structured streams with MIME type text/xml.  An appendix to this document lists the DTD (Document Type Definition) for every verb.  All XML responses to Dienst protocol requests have the following uniform features.  

  1. The first tag output is an XML declaration where the version is always 1.0 and the encoding is always UTF-8.
  2. The remaining content is enclosed in a root element that has the same name as the verb of the respective request.  The element has a single attribute named version, whose value is the version of the verb of the respective request.  For example, a Disseminate verb with version 2.0 will produce text/xml content wrapped in a tag like
        <Disseminate version="2.0">
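A client might verify these uniform features before processing a response further. The following Python sketch uses the standard xml.etree.ElementTree module; the function name check_envelope and the sample response are assumptions made for illustration. It checks only the root element and its version attribute, not the XML declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal response body for a Disseminate request, version 2.0.
SAMPLE = b'<?xml version="1.0" encoding="UTF-8"?>\n<Disseminate version="2.0"></Disseminate>'


def check_envelope(xml_bytes: bytes, verb: str, version: str) -> bool:
    """Return True if the root element is named after the verb and carries
    a version attribute equal to the requested verb version."""
    root = ET.fromstring(xml_bytes)
    return root.tag == verb and root.get("version") == version


print(check_envelope(SAMPLE, "Disseminate", "2.0"))  # True
print(check_envelope(SAMPLE, "Disseminate", "1.0"))  # False
```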

2.4.3.2 Status Codes

Status codes and error returns correspond to those defined for HTTP (refer to that protocol documentation). A normal response from a Dienst message in HTTP is signaled with the 200 reply code. Error returns are signaled with the appropriate 4xx code as specified in the HTTP protocol. The use of HTTP error codes is as follows:

For each error return, the HTTP reason-phrase returned with the code should provide additional information useful to a human reader.

2.5 Dates

All dates in the protocol (requests and responses) are encoded using the "Complete date" variant of ISO8601.  This format is CCYY-MM-DD where CC is the century, YY is the year, MM is the month of the year between 01 (January) and 12 (December), and DD is the day of the month between 01 and 28 or 29 or 30 or 31, depending on length of month and whether it is a leap year.
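A date in this format can be validated as sketched below in Python. The function name parse_complete_date is hypothetical. The regular expression enforces the fixed CCYY-MM-DD layout, and the standard datetime.date constructor rejects impossible days such as February 29 in a non-leap year.

```python
import re
from datetime import date


def parse_complete_date(text: str) -> date:
    """Parse a CCYY-MM-DD date, raising ValueError if it is malformed."""
    match = re.fullmatch(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", text)
    if match is None:
        raise ValueError(f"not a CCYY-MM-DD date: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() checks the month range and the day against the month length.
    return date(year, month, day)


print(parse_complete_date("1998-01-15").isoformat())  # 1998-01-15
```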

3. Protocol Messages

This section lists the messages (verbs) implemented by the Open Archives Dienst Subset. Each message has a Name (which is used for purposes of discussion), a Verb (a unique name for the message, used in the protocol to name the message), a Version, a list of Fixed arguments, a list of Keyword arguments, a Return MIME type and return status codes. The documentation for every message includes an example request and response (where appropriate) and the meaning of HTTP error codes that may be returned. These examples uniformly use the full identifier handlecorp/970101.  

To make reading of this document easier, the DTDs for responses to verbs that return text/xml are separated from the main body of the document into an appendix.

Disseminate Metadata for a Record

Verb: Disseminate
Version: 1.0
Fixed args: fullID, meta-format, content-type
Keyword args: none
Return MIME type: text/xml
Return Status Codes: 200, 400, 404

Request the metadata in a specific format from a record.  

In addition to the fullID, the required fixed arguments are meta-format and content-type.

Example Request:

/Dienst/Repository/1.0/Disseminate/handlecorp/970101/%23oams/xml

Example Response:

<?xml version="1.0" encoding="UTF-8"?>
  <Disseminate version="1.0">
    <oams:oams xmlns:oams="http://www.openarchives.org/sfc/sfc_oams.htm">
      <oams:title>A protocol for Interoperable Archives</oams:title>
      <oams:accession date="1994-06-24" />
      <oams:fullId>ncstrl.cornell/TR94-1418</oams:fullId>
      <oams:author>
        <oams:name>James R. Davis</oams:name>
        <oams:organization>Xerox</oams:organization>
      </oams:author>
      <oams:author>
        <oams:name>Carl Lagoze</oams:name>
        <oams:organization>Cornell</oams:organization>
      </oams:author>
    </oams:oams>
  </Disseminate>
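A harvester could read such a response as sketched below, using Python's standard xml.etree.ElementTree module. The function name read_oams, the returned dictionary layout, and the shortened sample response are assumptions for illustration; the namespace URI is the one used in the example response.

```python
import xml.etree.ElementTree as ET

OAMS = {"oams": "http://www.openarchives.org/sfc/sfc_oams.htm"}

# Shortened, hypothetical copy of a Disseminate response.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Disseminate version="1.0">
  <oams:oams xmlns:oams="http://www.openarchives.org/sfc/sfc_oams.htm">
    <oams:title>A protocol for Interoperable Archives</oams:title>
    <oams:accession date="1994-06-24" />
    <oams:author>
      <oams:name>
James R. Davis</oams:name>
      <oams:organization>Xerox</oams:organization>
    </oams:author>
    <oams:author>
      <oams:name>Carl Lagoze</oams:name>
      <oams:organization>Cornell</oams:organization>
    </oams:author>
  </oams:oams>
</Disseminate>"""


def read_oams(xml_bytes: bytes):
    """Extract title, accession date and authors from a Disseminate response."""
    root = ET.fromstring(xml_bytes)
    record = root.find("oams:oams", OAMS)
    if record is None:
        return None  # no oams metadata in this response
    accession = record.find("oams:accession", OAMS)
    return {
        "title": record.findtext("oams:title", "", OAMS).strip(),
        "accession": accession.get("date") if accession is not None else None,
        "authors": [
            (a.findtext("oams:name", "", OAMS).strip(),
             a.findtext("oams:organization", "", OAMS).strip())
            for a in record.findall("oams:author", OAMS)
        ],
    }


print(read_oams(SAMPLE)["authors"])
# [('James R. Davis', 'Xerox'), ('Carl Lagoze', 'Cornell')]
```

Stripping whitespace makes the sketch tolerant of line breaks inside elements, as in the first author name of the sample.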

List Contents

Verb: List-Contents
Version: 4.0
Fixed args: none
Keyword args: partitionspec, file-after, meta-format
Return MIME type: text/xml
Return Status Codes: 200, 400

Return a structured list of the full identifiers for records stored in this archive.  Without any arguments the list includes all stored records. 

The meaning of the keyword arguments is as follows:

Example Request:

List the full identifiers of records added or modified after January 15, 1998 in the high energy (hep) partition within the physics partition.

/Dienst/Repository/4.0/List-Contents
       ?partitionspec=physics;hep&file-after=1998-01-15

Example Response:

<?xml version="1.0" encoding="UTF-8"?>
  <List-Contents version="4.0">
    <record>arXiv:hep-th/9801001</record>
    <record>arXiv:hep-th/9801002</record>
  </List-Contents>

Example Request:

List the full identifiers along with the metadata in the Open Archives Metadata Set (oams) format.

/Dienst/Repository/4.0/List-Contents
        ?partitionspec=physics;hep&meta-format=oams&file-after=1998-01-15

Example Response:

Note that every record includes an oams metadata record.  If another meta-format were requested (e.g., rfc1807) there might be instances where an empty metadata record was returned (with no data between the metadata format tags) indicating that there is no metadata in that format for the record.

<?xml version="1.0" encoding="UTF-8"?>
  <List-Contents version="4.0">
    <record>
      ncstrl.cornell/TR94-1418
      <oams:oams xmlns:oams="http://www.openarchives.org/sfc/sfc_oams.htm">
        <oams:title>A protocol for Interoperable Archives</oams:title>
        <oams:accession date="1994-06-24" />
        <oams:fullId>ncstrl.cornell/TR94-1418</oams:fullId>
        <oams:author>
          <oams:name>James R. Davis</oams:name>
          <oams:organization>Xerox</oams:organization>
        </oams:author>
        <oams:author>
          <oams:name>Carl Lagoze</oams:name>
          <oams:organization>Cornell</oams:organization>
        </oams:author>
      </oams:oams>
    </record>
    <record>
      hdl://cnri.dlib/june96-varian
      <oams:oams xmlns:oams="http://www.openarchives.org/sfc/sfc_oams.htm">
        <oams:title>Pricing Electronic Journals</oams:title>
        <oams:accession date="1996-06-24" />
        <oams:fullId>hdl://cnri.dlib/june96-varian</oams:fullId>
        <oams:author>
          <oams:name>Hal R. Varian</oams:name>
          <oams:organization>UC Berkeley</oams:organization>
        </oams:author>
      </oams:oams>
    </record>
  </List-Contents>
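Both forms of the List-Contents response, with and without embedded metadata, can be read with the same loop. The Python sketch below is illustrative only: the function name read_records and the shortened sample are assumptions. The full identifier is taken from the text that directly follows the opening record tag, and a missing or empty metadata element yields no title.

```python
import xml.etree.ElementTree as ET

OAMS = {"oams": "http://www.openarchives.org/sfc/sfc_oams.htm"}

# Shortened, hypothetical response: one record with metadata, one without.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<List-Contents version="4.0">
  <record>
    hdl://cnri.dlib/june96-varian
    <oams:oams xmlns:oams="http://www.openarchives.org/sfc/sfc_oams.htm">
      <oams:title>Pricing Electronic Journals</oams:title>
    </oams:oams>
  </record>
  <record>arXiv:hep-th/9801001</record>
</List-Contents>"""


def read_records(xml_bytes: bytes) -> list:
    """Return (full identifier, title or None) for every record element."""
    root = ET.fromstring(xml_bytes)
    records = []
    for record in root.findall("record"):
        full_id = (record.text or "").strip()
        title = record.findtext("oams:oams/oams:title", None, OAMS)
        records.append((full_id, title.strip() if title else None))
    return records


for full_id, title in read_records(SAMPLE):
    print(full_id, "|", title)
```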

Get Metadata Formats

Verb: List-Meta-Formats
Version: 1.0
Fixed args: none
Keyword args: none
Return MIME type: text/xml
Return Status Codes: 200, 400

Returns the metadata formats that are supported by this archive.  Note that the fact that a metadata format is supported does not mean that it is available for all records in that archive.  For each metadata format, the following information is returned:

Example Request:

/Dienst/Repository/1.0/List-Meta-Formats

Example Response:

<?xml version="1.0" encoding="UTF-8"?>
  <List-Meta-Formats version="1.0">
    <meta-format name="rfc1807"
       namespace="http://info.internet.isi.edu/in-notes/rfc/files/rfc1807.txt" />
    <meta-format name="dc" 
       namespace="http://purl.org/dc" />
    <meta-format name="oams"
       namespace="http://www.openarchives.org/sfc/sfc_oams.htm" />
  </List-Meta-Formats>
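A client might turn this response into a lookup table from format name to namespace, for example to decide which meta-format to request later. The Python sketch below assumes a well-formed response; the function name read_meta_formats and the shortened sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical List-Meta-Formats response.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<List-Meta-Formats version="1.0">
  <meta-format name="dc" namespace="http://purl.org/dc" />
  <meta-format name="oams"
     namespace="http://www.openarchives.org/sfc/sfc_oams.htm" />
</List-Meta-Formats>"""


def read_meta_formats(xml_bytes: bytes) -> dict:
    """Map each supported metadata format name to its namespace."""
    root = ET.fromstring(xml_bytes)
    return {
        element.get("name"): element.get("namespace")
        for element in root.findall("meta-format")
    }


formats = read_meta_formats(SAMPLE)
print("oams" in formats)   # True
print(formats["dc"])       # http://purl.org/dc
```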

  

List Partitions

Verb: List-Partitions
Version: 2.0
Fixed args: none
Keyword args: none
Return MIME type: text/xml
Return Status Codes: 200, 400

Return a structured list of the administrator-defined partitions for this archive. The list contains the hierarchy of partitions and sub-partitions. For each partition, both the short name and the long description are returned. Depending on the policy for a particular archive, a record may be a member of more than one partition.

Example Request:

/Dienst/Repository/2.0/List-Partitions

Example Response:

The following response indicates a partition hierarchy with two top level partitions - Oceanside and ValleyView - each with a partition hierarchy within it.

<?xml version="1.0" encoding="UTF-8"?>
  <List-Partitions version="2.0">
    <partition name="Oceanside">
      <display>Oceanside University of Nebraska</display>
      <partition name="CompEnt">
        <display>Department of Computational Entomology</display>
      </partition>
      <partition name="MetPhen">
        <display>Department of Metaphysical Phenomenology</display>
      </partition>
    </partition>
    <partition name="ValleyView">
      <display>Valley View University of Florida</display>
      <partition name="Fren">
        <display>Department of Frenetics</display>
      </partition>
      <partition name="Hist">
        <display>Department of Histrionics</display>
      </partition>
    </partition>
  </List-Partitions>
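The nested partition elements can be flattened into partition specifications for later List-Contents requests. The recursive Python sketch below is an illustration under assumptions: the function name partition_paths and the shortened sample are hypothetical, and a specification is assumed to list partition names from the top of the hierarchy downward, joined by semicolons.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical List-Partitions response.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<List-Partitions version="2.0">
  <partition name="Oceanside">
    <display>Oceanside University of Nebraska</display>
    <partition name="CompEnt">
      <display>Department of Computational Entomology</display>
    </partition>
  </partition>
</List-Partitions>"""


def partition_paths(xml_bytes: bytes) -> list:
    """Return (partitionspec, description) for every partition, depth first."""
    root = ET.fromstring(xml_bytes)
    paths = []

    def walk(node, prefix):
        for partition in node.findall("partition"):
            names = prefix + [partition.get("name")]
            paths.append((";".join(names), partition.findtext("display", "")))
            walk(partition, names)  # descend into sub-partitions

    walk(root, [])
    return paths


for spec, description in partition_paths(SAMPLE):
    print(spec, "|", description)
# Oceanside | Oceanside University of Nebraska
# Oceanside;CompEnt | Department of Computational Entomology
```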

List Metadata Formats available for a Record

Verb: Structure
Version: 2.0
Fixed args: fullID
Keyword args: view
Return MIME type: text/xml
Return Status Codes: 200, 400, 404

This verb returns a structured response that describes the metadata formats available for a record. A client may use this information as the basis for metadata requests using the Disseminate verb. 

There is one required keyword argument, view, which can take only one value in this subset (the same verb in the full Dienst protocol has more keyword arguments that take more values).

Example Request:

/Dienst/Repository/2.0/Structure/handlecorp/970101?view=%23

Example Response:

<?xml version="1.0" encoding="UTF-8"?>
  <Structure version="2.0">
    <meta-formats>
      <rfc1807 />
      <dc />
    </meta-formats>
  </Structure>

This response says that the record can disseminate two metadata formats, rfc1807 and dc (Dublin Core).
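A client could extract the available format names from a Structure response as follows. This Python sketch is illustrative: the function name available_formats is hypothetical, and the sample mirrors the response above.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Structure version="2.0">
  <meta-formats>
    <rfc1807 />
    <dc />
  </meta-formats>
</Structure>"""


def available_formats(xml_bytes: bytes) -> list:
    """List the metadata format names a record can disseminate."""
    root = ET.fromstring(xml_bytes)
    container = root.find("meta-formats")
    if container is None:
        return []
    # Each child element is named after one available metadata format.
    return [child.tag for child in container]


print(available_formats(SAMPLE))  # ['rfc1807', 'dc']
```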

Appendix - DTDs for Messages