We believe that searchable resources on the Internet will continue to expand, making centralized indexes and centralized search facilities (such as those provided by Yahoo! or AltaVista) increasingly unwieldy. We focus on issues that arise when users try to access multiple servers that may or may not interoperate smoothly. Our work with NCSTRL has illustrated some limitations of distributed systems, such as the difficulty of providing a fault-tolerant system over the Internet. Another difficulty is choosing among replicated servers: which one will provide the best service?
We are also interested in metadata schemes that inform resource discovery, such as STARTS and GlOSS.
Performance Based Query
Given a set of servers that are distributed, disjoint, replicated, and overlapping (as in NCSTRL), how should a query be routed? Using local information about a remote server's past performance (did it respond? If so, how quickly?), we predict its future behavior. These predictions can then be used to choose among servers holding replicated data when queries are routed.
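As an illustration only, the sketch below shows one simple way such local performance history could drive replica selection. It is a hypothetical model, not the group's implementation: it assumes an exponentially weighted moving average (EWMA) of the response rate and of response time, and it assumes a failed query costs a fixed client timeout. The names `ServerStats`, `choose_replica`, `alpha`, and `timeout` are all invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ServerStats:
    """Locally observed history for one remote server (hypothetical model)."""
    alpha: float = 0.3          # EWMA weight given to each new observation
    success_rate: float = 1.0   # smoothed fraction of queries that got a response
    latency: float = 1.0        # smoothed response time in seconds (responses only)

    def record(self, responded: bool, seconds: Optional[float] = None) -> None:
        """Fold one query outcome into the running averages."""
        outcome = 1.0 if responded else 0.0
        self.success_rate = (1 - self.alpha) * self.success_rate + self.alpha * outcome
        if responded and seconds is not None:
            self.latency = (1 - self.alpha) * self.latency + self.alpha * seconds

    def expected_cost(self, timeout: float) -> float:
        """Predicted wait: response time if it answers, the full timeout if not."""
        return self.success_rate * self.latency + (1 - self.success_rate) * timeout


def choose_replica(stats: Dict[str, ServerStats], replicas: List[str],
                   timeout: float = 10.0) -> str:
    """Pick the replica with the lowest predicted cost.

    Servers with no history get default (optimistic) stats, so they are
    still tried; note that this inserts them into `stats`.
    """
    return min(replicas,
               key=lambda s: stats.setdefault(s, ServerStats()).expected_cost(timeout))


if __name__ == "__main__":
    history: Dict[str, ServerStats] = {}
    for _ in range(3):
        history.setdefault("replica-a", ServerStats()).record(True, 0.5)
    history.setdefault("replica-b", ServerStats()).record(False)
    print(choose_replica(history, ["replica-a", "replica-b"]))
```

Here three fast responses from `replica-a` and one timeout from `replica-b` make `replica-a` the predicted best choice. An EWMA is used because it favors recent behavior while needing only constant state per server; other predictors (sliding windows, percentiles) would slot into `expected_cost` the same way.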
We have built a reference implementation of the Stanford Protocol Proposal for Internet Retrieval and Search (STARTS). STARTS provides a mechanism for unifying query interfaces to multiple search engines, while also providing metadata about those indexes that may be used for query routing decisions.
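To show how source metadata can inform routing, here is a hedged sketch of a GlOSS-style independence estimator. It assumes each source exports its document count and per-term document frequencies, as a content summary might. The tuple layout and the names `Summary`, `estimate_matches`, and `rank_sources` are hypothetical and are not the STARTS wire format. Under the assumption that query terms occur independently, the expected number of documents matching all terms is the document count times the product of each term's fraction of documents.

```python
from typing import Dict, List, Tuple

# Hypothetical content summary: (number of documents, {term: document frequency}).
Summary = Tuple[int, Dict[str, int]]


def estimate_matches(summary: Summary, terms: List[str]) -> float:
    """Estimate how many documents contain every query term.

    Assumes terms are distributed independently, so the estimate is
    n_docs * product(df(t) / n_docs) over the query terms.
    """
    n_docs, df = summary
    if n_docs == 0:
        return 0.0
    estimate = float(n_docs)
    for term in terms:
        estimate *= df.get(term, 0) / n_docs  # unseen term => estimate drops to 0
    return estimate


def rank_sources(summaries: Dict[str, Summary],
                 terms: List[str]) -> List[Tuple[str, float]]:
    """Order sources by estimated number of matching documents, best first."""
    scored = [(name, estimate_matches(s, terms)) for name, s in summaries.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sources = {
        "cs-reports": (1000, {"digital": 200, "library": 100}),
        "physics": (5000, {"digital": 50, "library": 10}),
    }
    for name, est in rank_sources(sources, ["digital", "library"]):
        print(f"{name}: ~{est:.1f} matching documents")
```

A router could then send the query only to the top few sources, or combine this ranking with the performance predictions described above.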
We are revising the reference implementation to use CORBA as a transport layer and plan to use it in combination with the other CORBA-based digital library services that we are developing. This work is still in progress.