Dienst - An Architecture for Distributed Document Libraries

Carl Lagoze - Cornell University

James R. Davis - Xerox Corporation

As one of the five universities participating in the ARPA-sponsored Computer Science Technical Report project, we at Cornell have developed a digital library architecture called Dienst. Dienst is a protocol and implementation that provides Internet access to a distributed, decentralized multi-format document collection. The collection is managed by a set of interoperating Dienst servers distributed over the Internet. These servers provide three digital library services: repositories of multi-format documents; indexes into the document collection and search engines for these indexes; and user interfaces for browsing, searching, and accessing the collection.

Dienst models the distributed digital library as a flat set of documents, each of which has a unique location-independent identifier, exists in multiple formats (e.g., TIFF, GIF, PostScript, HTML), and consists of a set of named parts. These parts may be physical, such as pages, or logical, such as chapters and tables.
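As a rough illustration of this document model, the following Python sketch represents one document with an identifier, a set of formats, and a set of named parts. The class name, field names, and the identifier string are hypothetical and chosen only for illustration; they are not taken from the Dienst implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical model of one document in the flat collection."""

    # Location-independent identifier (the format shown below is illustrative).
    doc_id: str
    # Formats in which the document is available, e.g. "TIFF" or "PostScript".
    formats: set[str] = field(default_factory=set)
    # Named parts, mapping part name -> "physical" or "logical".
    parts: dict[str, str] = field(default_factory=dict)

    def has_format(self, fmt: str) -> bool:
        """Case-insensitive check for an available format."""
        return fmt.lower() in {f.lower() for f in self.formats}

    def part_names(self, kind: str) -> list[str]:
        """Return the sorted names of all parts of the given kind."""
        return sorted(name for name, k in self.parts.items() if k == kind)


# Example document: two physical pages and one logical chapter.
doc = Document(
    doc_id="example-site/TR-0001",
    formats={"TIFF", "PostScript"},
    parts={"page-1": "physical", "page-2": "physical", "chapter-1": "logical"},
)
```

Keeping the identifier separate from any server address mirrors the location independence described above: nothing in the record says where the document is stored.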

The architecture provides a number of helpful abstractions for the Dienst user. First, all elements of the collection are uniformly searchable and accessible without regard to their actual location. Second, multiple representations of a document are logically linked. Finally, documents are structured objects that can be viewed in part or as a whole. Using publicly available WWW clients, users may search the document collection, browse "thumbnail" images of documents, read individual documents in any of their available formats, and download or print a document.

A distinguishing feature of Dienst is that indexes are distributed and searches are processed in parallel across all index sites. The current Dienst implementation provides two types of searching - bibliographic and full-text. Users may search for documents by number, title, author, abstract keywords, or other bibliographic information using an HTML forms interface. A user may search the full text of documents through two interfaces: by entering the search text directly, or by "click-to-search", where the user selects a paragraph from a document as the basis of the search. The Dienst protocol can be extended to include other search types and engines in the future.
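The idea of sending one query to every index site at the same time and merging the results can be sketched as follows. This is a minimal sketch under stated assumptions: the site names, the in-memory dictionaries standing in for remote indexes, and the substring match on titles are all hypothetical, and a real deployment would contact each index server over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the index held at each site.
SITE_INDEXES = {
    "site-a": {
        "TR-1": "distributed document libraries",
        "TR-2": "parallel search engines",
    },
    "site-b": {
        "TR-3": "document format conversion",
        "TR-4": "network protocols",
    },
}


def search_site(site, keyword):
    """Search one site's index and return (site, document id) pairs."""
    index = SITE_INDEXES[site]
    return [(site, doc_id) for doc_id, title in index.items() if keyword in title]


def search_all(keyword):
    """Query every index site concurrently and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(SITE_INDEXES)) as pool:
        per_site = pool.map(lambda site: search_site(site, keyword), SITE_INDEXES)
        # Flatten the per-site result lists into one sorted list.
        return sorted(hit for hits in per_site for hit in hits)


results = search_all("document")
```

Because each site is queried independently, the total search time in such a design is bounded by the slowest site rather than by the sum of all sites.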

The Dienst software also provides site administrators with tools for managing their collections. These include, among others, automated document submission procedures, indexing tools, database integrity checkers, and format conversion tools.

Dienst servers are accessed through gateways from any World Wide Web (WWW) server that supports the Common Gateway Interface. Dienst protocol requests are packaged within HTTP, the WWW protocol. In this manner, Dienst exploits all the current features of the WWW - widely available multi-architecture clients, MIME typing of documents, support for embedded images, and the like - and will be able to leverage future developments in areas such as user authentication and support for new graphics standards.
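To make the idea of packaging a protocol request within HTTP concrete, the sketch below builds a URL that carries a service name, an operation, and its arguments. The article does not give the actual request syntax, so the path layout, the service and operation names, and the host name here are assumptions made purely for illustration and should not be read as the Dienst protocol itself.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_request_url(base_url, service, verb, **params):
    """Encode a hypothetical protocol request as an HTTP URL.

    The "/service/verb?arguments" layout is an illustrative assumption,
    not the documented Dienst syntax.
    """
    path = f"/{service}/{verb}"
    # Sort the arguments so the same request always yields the same URL.
    query = urlencode(sorted(params.items()))
    url = base_url.rstrip("/") + path
    return f"{url}?{query}" if query else url


# Hypothetical host, service, and operation names.
url = build_request_url("http://library.example", "Index", "Search", title="dienst")

# Any HTTP-aware component can recover the arguments from the URL.
arguments = parse_qs(urlsplit(url).query)
```

Carrying the request in an ordinary URL is what lets unmodified WWW clients and gateway-capable servers take part without knowing anything about the library protocol.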

Dienst servers are currently running at ten sites, providing common access to several thousand CS technical reports. The Cornell server is available at http://cs-tr.cs.cornell.edu. We continue to work at Cornell on the Dienst protocol and implementation. In the future we plan to provide easier installation and maintenance tools for site administrators, develop and incorporate more powerful search techniques, and extend the system to enforce copyright restrictions.


From: Communications of the ACM, April 1995, Vol. 38, No. 4, page 47

Copyright 1995 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that new copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc, fax +1 212 869-0481 or <permissions@acm.org>.

The documents contained in these directories are included by the contributing authors as a means to ensure timely distribution of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.