Building Reliable Distributed Information Spaces
Carl Lagoze
CS 430
10/22/2002
Characteristics of a library
Functions
Selection
Access
Organization
User support
Preservation
Characteristics
Standardized
Professionalized
Service-oriented
In it for the long haul
Conservative
Trustworthy
Expensive (human centric)
Perspective on the Budget
Library in current environment
“I don’t do libraries” – anonymous Cornell undergrad to Bob Constable
How do you use the library?
Go to the library to study?
Go to the library to do research?
Talked to a reference librarian?
Use the library gateway or electronic resources?
Characteristics of the Web
Decentralized/Anarchic/Illegal
Agreements are technical (at best)
Roles are undefined and fluid
Immediate
Ephemeral
Integrity not established
Anonymous (or “no one knows you are a dog”)
What is a Digital Library?
Many facets of the problem/solution
Technical Trade-offs
National Science Digital Library (NSDL)
Goal: Reform science education in the US in the digital age
$25M in funding 2002-2006
Over 80 institutional grants for collections, services, core infrastructure (technical, economic, organizational)
Cornell is primary technical development partner
Carl Lagoze, Director of Technology
http://www.nsdl.org
Building service and knowledge layers over a variety of resources for a variety of users
Resources for Core Integration
Levels of interoperability
Function versus cost of acceptance
Z39.50 principles
State
Open Archives Initiative Protocol for Metadata Harvesting
Low-barrier protocol for exposing structured information (metadata) from cooperating repositories
Provides opportunity for building comprehensive service network
http://www.openarchives.org
OAI-PMH: A simple two party model for sharing structured information
Resource discovery over distributed collections
OAI-PMH Key technical features
Deploy-now technology – 80/20 rule
Simple HTTP encoding
Foundation of established XML standards
Multiple metadata formats
Repository partitioning (sets)
Selective harvesting (sets and dates)
Clean partition between core and implementation-specific extensions
Multiple item-level metadata
Collection-level metadata
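The "simple HTTP encoding" and "selective harvesting (sets and dates)" features above can be illustrated with a short sketch. It assumes the OAI-PMH 2.0 argument names (verb, metadataPrefix, set, from, until); the base URL, the set name "physics", and the helper name build_oai_request are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlencode


def build_oai_request(base_url, verb, metadata_prefix=None,
                      set_spec=None, from_date=None, until_date=None):
    """Encode an OAI-PMH request as a plain HTTP GET URL.

    Only the arguments that are supplied are added, so the same helper
    covers a full harvest and a selective (set/date) harvest.
    """
    params = [("verb", verb)]
    if metadata_prefix:
        params.append(("metadataPrefix", metadata_prefix))
    if set_spec:
        params.append(("set", set_spec))       # repository partition
    if from_date:
        params.append(("from", from_date))     # lower datestamp bound
    if until_date:
        params.append(("until", until_date))   # upper datestamp bound
    return base_url + "?" + urlencode(params)


# Hypothetical repository: harvest Dublin Core records in one set,
# changed since the start of 2002.
url = build_oai_request("http://example.org/oai", "ListRecords",
                        metadata_prefix="oai_dc", set_spec="physics",
                        from_date="2002-01-01")
print(url)
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=physics&from=2002-01-01
```

Because every request is an ordinary URL, a harvester needs nothing beyond an HTTP client and an XML parser, which is the low barrier the protocol aims for.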
OAI Verbs
Identify – repository characteristics
ListMetadataFormats – DC required
ListSets – repository partitioning
ListRecords – (selectively) harvest metadata
ListIdentifiers – (selectively) harvest metadata identifiers
GetRecord – known item retrieval
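As a rough sketch of what a harvester does with a ListRecords response, the example below parses a small, hand-written response and extracts each record's identifier and Dublin Core title. The sample XML is invented for illustration (the example.org identifiers and titles are not real records), and it assumes the OAI-PMH 2.0 and unqualified Dublin Core namespaces.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the XPath-style lookups below.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response, trimmed to the elements used here.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2002-09-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record one</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:example.org:2</identifier>
        <datestamp>2002-09-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record two</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext("oai:metadata/oai_dc:dc/dc:title", namespaces=NS)
        records.append((identifier, title))
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, (token or None)


records, token = parse_list_records(SAMPLE_RESPONSE)
```

A non-empty resumption token signals that the repository has more records to return; the harvester sends it back in a follow-up request to get the next batch.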
Metadata Repository
Importing metadata into the MR
Exporting metadata from the MR
The Metadata Repository as a Resource
Records are exposed through Open Archives Initiative harvesting protocol.
Core Integration team will provide some services based on the metadata repository.
The architecture encourages others to build services.
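One way an outside service might build on the Metadata Repository is to harvest it like any other OAI-PMH repository. The sketch below shows a minimal harvesting loop that follows resumption tokens until the list is complete. It is only an illustration: the base URL is hypothetical (the slides do not give the repository's address), and the HTTP call is passed in as a `fetch` function so the loop can be exercised here with canned responses instead of a live server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace


def harvest_identifiers(fetch, base_url, metadata_prefix="oai_dc"):
    """Yield every record identifier, following resumption tokens.

    `fetch` is any callable that takes a URL and returns the response
    body as text; it is an assumption of this sketch, not part of OAI-PMH.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.findtext(f"{OAI}ListIdentifiers/{OAI}resumptionToken")
        if not token:  # missing or empty token: the list is complete
            break
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}


# Canned two-page response standing in for a real repository.
_OPEN = '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
_CLOSE = "</ListIdentifiers></OAI-PMH>"
PAGE_1 = (_OPEN + "<header><identifier>oai:example.org:1</identifier>"
          "<datestamp>2002-09-01</datestamp></header>"
          "<resumptionToken>page2</resumptionToken>" + _CLOSE)
PAGE_2 = (_OPEN + "<header><identifier>oai:example.org:2</identifier>"
          "<datestamp>2002-09-15</datestamp></header>"
          "<resumptionToken/>" + _CLOSE)


def fake_fetch(url):
    """Return the second page only when the request carries its token."""
    return PAGE_2 if "resumptionToken=page2" in url else PAGE_1


identifiers = list(harvest_identifiers(fake_fetch, "http://example.org/oai"))
```

Against a live repository, `fetch` could wrap `urllib.request.urlopen`; a production harvester would also need to handle OAI-PMH error responses and retry requests, which this sketch omits.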
Building on the basics
Gathering resources from the open web
Automated collection aggregation
Automated metadata generation
Content of resource
Context of resource
Automated quality assessment
Annotation, review, and aggregation environment
If you find this all interesting
CS 502 – Architecture of Web Information Systems