Dienst Architecture 2000-02-29 15:32:35 -0500

Dienst Architecture
Summary Description

Introduction

This document describes the Dienst architecture, a conceptual framework for creating distributed digital libraries.  This architecture forms the basis for two other entities that share the use of the word "Dienst":

The Dienst system (in all three senses) is the foundation for NCSTRL, the Networked Computer Science Technical Report Library.

This document gives a relatively brief overview of the major components of the architecture: the document model, the service structure, collections, and regions.

Many of these features are described in more detail in other documents.  Refer to the References for more information.

Document Model

The Dienst document model has a number of features that allow for storage of content in multiple forms (e.g., text, images, video, audio) and dissemination of that content in multiple variations.  The features of the document model are:

These features are logically exposed through the Dienst protocol requests that provide access to documents. This makes it possible to produce multiple disseminations (manifestations) from an individual document. By contrast, FTP is strictly file-based, and HTTP allows multiple disseminations but only in an unprincipled manner using CGI.

Service Structure

The Dienst architecture is built on the notion of individually defined services that, when combined, create a distributed digital library.  The word "distributed" is used because the services and the resources in a Dienst digital library may be located anywhere on the Internet.  The functionality of a Dienst digital library includes storage of and access to resources (digital objects), deposit of new resources, discovery and browsing of those resources, and user registration.

Communication with and among individual Dienst services takes place via an open protocol, which makes it possible to combine the services in innovative ways, or build other mediator services that make use of the pre-defined Dienst services.  

The services defined in the protocol are as follows:

Human interaction with these services and their protocols is mediated by a user interface service.

The following figure illustrates the interaction of repository services, index services, query mediators (labeled QM), naming services (labeled NS), and user interfaces to provide searching and access to documents.

Collections

The previous section described individual Dienst services and how they interact to provide basic digital library functionality: searching for documents and providing access to them.  Implicit in this interaction is the ability of an individual service to route requests to the appropriate services or instances of services.  Dienst introduces the notion of a collection service, which is responsible for providing the information that allows sets of services and other mediator services to interact in the fashion of a digital library.

The current specification of the collection service provides the following information:

The last item, query routing, is fundamental to the Dienst definition of a collection in a distributed digital library.  The notion of physical collocation, which is the basis of collection definition in pre-digital libraries, does not apply in the context of distributed digital artifacts.  Instead, the Dienst architecture defines a collection as a predicate on services and resources, and considers a resource to be in a digital library collection if it can be directly discovered using the resource discovery tools of the digital library.  Under this definition, a "computer science" collection might be specified by the set of index servers to which queries should be routed, together with the set of attributes that should be attached to those queries to limit their result sets to items "in" the computer science collection (those attributes might differ for each index server).  Query routing and query filtering are only one aspect of collection definition.  Future enhancements to the Dienst architecture might include other collection-specific functionality such as thesauri and stop word lists, all of which enhance collection-specific resource discovery.
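The idea of a collection as a set of index servers plus per-server filtering attributes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the server addresses, and the attribute names (`subject`, `department`) are hypothetical and are not taken from the Dienst protocol.

```python
from dataclasses import dataclass, field


@dataclass
class IndexRoute:
    """One index server and the attributes that restrict results to the collection."""
    server: str  # hypothetical index server address
    filter_attributes: dict = field(default_factory=dict)  # may differ per server


@dataclass
class CollectionDefinition:
    """A collection expressed as routing plus filtering, not physical location."""
    name: str
    routes: list

    def route(self, query: dict) -> list:
        """Return (server, restricted_query) pairs for a user query.

        Each outgoing query is the user's query with that server's
        collection-specific attributes added.
        """
        return [(r.server, {**query, **r.filter_attributes}) for r in self.routes]


# Hypothetical "computer science" collection spanning two index servers
# that describe subject matter with different attributes.
computer_science = CollectionDefinition(
    name="computer science",
    routes=[
        IndexRoute("index-a.example.org", {"subject": "cs"}),
        IndexRoute("index-b.example.org", {"department": "Computer Science"}),
    ],
)

routed = computer_science.route({"title": "digital libraries"})
for server, restricted_query in routed:
    print(server, restricted_query)
```

In this sketch, a resource is "in" the collection exactly when one of the restricted queries can find it, which mirrors the predicate-based definition given above.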

The following figure illustrates the current use of the collection service in the Dienst architecture.  As shown, the query mediator service uses information from the collection service to route queries to the appropriate index services (where "appropriate" is a function of the characteristics of the query).
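The interaction described here, a query mediator consulting the collection service and then contacting only the appropriate index services, might look roughly like the following Python sketch. The in-memory classes, their method names, and the use of a `subject` value as the query characteristic that drives routing are all assumptions made for illustration; the real services communicate through the Dienst protocol.

```python
class IndexService:
    """In-memory stand-in for an index service (hypothetical interface)."""

    def __init__(self, name, records):
        self.name = name
        self.records = records  # list of {"id": ..., "title": ...} dicts

    def search(self, term):
        """Return ids of records whose title contains the term (case-insensitive)."""
        return [r["id"] for r in self.records if term.lower() in r["title"].lower()]


class CollectionService:
    """Maps a query characteristic (here, a subject) to index services."""

    def __init__(self, routing_table):
        self.routing_table = routing_table

    def indexes_for(self, subject):
        return self.routing_table.get(subject, [])


class QueryMediator:
    """Routes a query using the collection service and merges the results."""

    def __init__(self, collection_service):
        self.collection_service = collection_service

    def search(self, subject, term):
        hits = []
        for index in self.collection_service.indexes_for(subject):
            for doc_id in index.search(term):
                if doc_id not in hits:  # drop duplicates across index services
                    hits.append(doc_id)
        return hits


i1 = IndexService("I1", [{"id": "doc-1", "title": "Distributed Searching"},
                         {"id": "doc-2", "title": "Naming Services"}])
i2 = IndexService("I2", [{"id": "doc-3", "title": "Searching Federated Libraries"},
                         {"id": "doc-1", "title": "Distributed Searching"}])
i3 = IndexService("I3", [{"id": "doc-9", "title": "Searching Protein Databases"}])

collection = CollectionService({"cs": [i1, i2], "bio": [i3]})
mediator = QueryMediator(collection)
results = mediator.search("cs", "searching")
print(results)  # ['doc-1', 'doc-3']
```

The index service I3 is never contacted for the "cs" query, which is the point of routing through the collection service rather than broadcasting to every index.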

Regions

The growth of NCSTRL, especially outside the United States, raised reliability and performance problems due to the connectivity characteristics of the global Internet.  To obtain good performance, Dienst defines a set of connectivity regions, which are sets of servers (network nodes) with relatively good mutual network connectivity.  Indexing information from servers outside the region is replicated onto servers within the region.  Regional query routing is implemented by collection views, which are collection metadata customized for a specific region, and by regional collection servers, each of which distributes the collection view for its region.  Each user interface service is then assigned, at configuration time, to a regional collection server so that its queries are routed within the connectivity region.

The following figure illustrates the functionality of connectivity regions.  In the example, the user interface server in region R1 "believes" that the primary source for indexing information on partition 1 of the collection is the index server labeled I1.  The user interface server in region R2, on the other hand, "believes" that the primary source for that information is the index server labeled I1,2.  In other words, the collection view (the meta-information about the contents of the collection) of the R1 user interface differs from that of the R2 user interface.
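The R1/R2 example can be restated as a small Python sketch in which each regional collection server holds its own collection view and each user interface service is bound to one regional server at configuration time. The class and method names are hypothetical; only the region labels (R1, R2), the partition number, and the index server labels (I1 and I1,2) come from the example above.

```python
class RegionalCollectionServer:
    """Distributes the collection view for one connectivity region."""

    def __init__(self, region, view):
        self.region = region
        self.view = view  # partition number -> label of the primary index server

    def primary_index(self, partition):
        return self.view[partition]


class UserInterfaceService:
    """Bound to a single regional collection server when it is configured."""

    def __init__(self, collection_server):
        self.collection_server = collection_server

    def index_for(self, partition):
        # The user interface only knows what its regional view tells it.
        return self.collection_server.primary_index(partition)


# Partition 1 is served by I1 in region R1 and by I1,2 in region R2.
r1_server = RegionalCollectionServer("R1", {1: "I1"})
r2_server = RegionalCollectionServer("R2", {1: "I1,2"})

ui_r1 = UserInterfaceService(r1_server)
ui_r2 = UserInterfaceService(r2_server)

print(ui_r1.index_for(1))  # I1
print(ui_r2.index_for(1))  # I1,2
```

Both user interfaces ask about the same partition of the same collection, yet each is directed to an index server inside its own region because the two collection views differ.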

References

Predicting Indexer Performance in a Distributed Digital Library, Cornell University Technical Report and draft of submission to European Digital Library Conference, May 1999.

Using Query Mediators for Distributed Searching in Federated Digital Libraries, Draft of submission to ACM DL'99, August 1999.

A Characterization Study of NCSTRL Distributed Searching, Cornell University Technical Report, January 1999.

Defining Collections in Distributed Digital Libraries, D-Lib Magazine, November 1998.

NCSTRL: Design and Deployment of a Globally Distributed Digital Library, Draft of submission to Journal of the American Society for Information Science (JASIS), 1999.

Making Global Digital Libraries Work:  Collection Services, Connectivity Regions, and Collection Views.  ACM DL'98, June 1998.

The Networked Computer Science Technical Reports Library. Cornell Computer Science Technical Report, July 1996.

Dienst: Building a Production Technical Report Server. Chapter 15 in Advances in Digital Libraries, Springer Verlag 1995.

Dienst: implementation reference manual. Cornell Computer Science Technical Report, May 1995.

Dienst - An Architecture for Distributed Document Libraries. Communications of the ACM, April 1995, Vol 38 No 4 page 47.

"Drop-in" publishing with the World Wide Web. 2nd Int'l WWW Conference 1994.

A protocol and server for a distributed technical report library. Cornell Computer Science Technical Report, June 1994.