Distributed Searching and Resource Discovery

Cornell Digital Library Research Group

We believe that searchable resources on the Internet will continue to expand, making centralized indexes and centralized searching facilities (such as those provided by Yahoo! or AltaVista) increasingly unwieldy. We are focusing on some of the issues that arise when users try to access multiple servers that may or may not interoperate smoothly. Our work with NCSTRL has illustrated some limitations of distributed systems, such as the difficulty of providing a fault-tolerant system over the Internet. Another difficulty is choosing among replicated servers: which one will provide the best service?

We are also interested in metadata schemes that inform resource discovery, such as STARTS and GlOSS.

Distributed Searching

Performance Based Query Routing

Given a set of distributed servers whose collections may be disjoint, replicated, or overlapping (as in NCSTRL), how should we route a query? Using locally gathered information about the past performance of a remote server (did it respond, and if so, how quickly?), we will predict how that server is likely to behave. These predictions can then be used to choose among servers holding replicated data when queries are routed.
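As a rough illustration of the idea, the sketch below keeps a smoothed record of whether each server responded and how quickly, then routes to the replica with the lowest expected cost. It is only a minimal sketch under stated assumptions, not the group's actual predictor: the `ServerStats` class, the exponential smoothing factor `alpha`, the `timeout` penalty, and the server names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ServerStats:
    """Locally observed behavior of one remote server (hypothetical structure)."""
    success_rate: float = 1.0  # smoothed fraction of queries that got a response
    latency: float = 1.0       # smoothed response time in seconds

    def record(self, responded: bool, seconds: float = 0.0, alpha: float = 0.3) -> None:
        """Fold one observation into the exponentially smoothed history."""
        outcome = 1.0 if responded else 0.0
        self.success_rate = (1 - alpha) * self.success_rate + alpha * outcome
        if responded:
            # Latency is only meaningful when the server actually answered.
            self.latency = (1 - alpha) * self.latency + alpha * seconds

    def expected_cost(self, timeout: float = 10.0) -> float:
        """Predicted seconds spent on a query: answer time if the server
        responds, or the full timeout if it does not (assumed cost model)."""
        p = self.success_rate
        return p * self.latency + (1 - p) * timeout


def choose_server(stats: dict[str, ServerStats], replicas: list[str]) -> str:
    """Pick the replica with the lowest predicted cost."""
    return min(replicas, key=lambda name: stats[name].expected_cost())


if __name__ == "__main__":
    stats = {"replica-a": ServerStats(), "replica-b": ServerStats()}
    stats["replica-a"].record(responded=True, seconds=0.2)
    stats["replica-b"].record(responded=False)
    print(choose_server(stats, ["replica-a", "replica-b"]))
```

Exponential smoothing is used here only because it needs no stored history and weights recent behavior more heavily; any other predictor could be substituted behind `expected_cost`.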

Resource Discovery

We have built a reference implementation of the Stanford Protocol Proposal for Internet Retrieval and Search (STARTS). STARTS provides a mechanism for unifying query interfaces to multiple search engines, while also providing metadata about each engine's index that may be used for query routing decisions.
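To make the routing use of index metadata concrete, the sketch below ranks sources by a simple estimate of how many documents match every query term. It assumes each source exports a document count and per-term document frequencies. The dictionary layout, field names (`num_docs`, `doc_freq`), and source names are hypothetical and are not taken from the STARTS specification or from the reference implementation, and the estimate treats terms as independent, which is a simplification.

```python
from __future__ import annotations


def estimated_matches(summary: dict, query_terms: list[str]) -> float:
    """Estimate how many documents at a source contain all query terms,
    assuming terms occur independently of one another."""
    n = summary["num_docs"]
    if n == 0:
        return 0.0
    estimate = float(n)
    for term in query_terms:
        # Fraction of the source's documents that contain this term.
        estimate *= summary["doc_freq"].get(term, 0) / n
    return estimate


def rank_sources(summaries: dict[str, dict], query_terms: list[str]) -> list[str]:
    """Return sources with a nonzero estimate, most promising first."""
    scores = {name: estimated_matches(s, query_terms) for name, s in summaries.items()}
    useful = [name for name, score in scores.items() if score > 0]
    return sorted(useful, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    # Hypothetical metadata for two sources.
    summaries = {
        "cs-reports": {"num_docs": 1000, "doc_freq": {"distributed": 200, "search": 100}},
        "bio-reports": {"num_docs": 500, "doc_freq": {"search": 50}},
    }
    print(rank_sources(summaries, ["distributed", "search"]))
```

A router could send the query only to the top-ranked sources and skip those whose estimate is zero, trading some recall for fewer remote requests.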

We are revising the reference implementation to use CORBA as a transport layer and plan to use it in combination with the other CORBA-based digital library services that we are developing. This work is still in progress.
