Dienst: distributed searching improvements - "track one"
(DRAFT: do not copy or redistribute)
Goal: new release with distributed searching improvements,
ready by September 30, 1997
Problems we're trying to solve:
- searches take too long
- poor predictions of whether an indexer will respond
before the timeout value
- continued research in distributed searching
Proposed changes:
- Improve timeout values for phase one and phase two
- Rework method for reliability retry
- Improve reliability metric
- Overlap phase one and phase two searching
- Indicate phase two searches in dienst logs
- Tabulate phase statistics in log summaries and in
Dienst/htdocs/dienst_runtime/logs.html

Improve timeout values for phase one and phase two
Determine better timeout values for each phase by
- Examining search response data in logs
- Changing value on ncstrl and/or cs-tr and examining
logs after change
New value will still be a hardcoded constant in
config_constants.pl
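One way to examine the response data is to ask, for each candidate timeout, what fraction of past responses would have arrived in time. The following is a minimal sketch in Python, not Dienst code; the function name, the sample response times, and the candidate values are all made up for illustration, and extracting the times from the actual logs is not shown.

```python
def coverage_by_timeout(response_times, candidate_timeouts):
    """Return {timeout: fraction of responses that arrived within it}."""
    if not response_times:
        raise ValueError("need at least one response time")
    total = len(response_times)
    return {
        t: sum(1 for r in response_times if r <= t) / total
        for t in candidate_timeouts
    }


# Hypothetical response times in seconds, as might be extracted from logs.
sample = [0.8, 1.2, 2.5, 3.1, 4.0, 6.5, 9.0, 12.0, 20.0, 45.0]
coverage = coverage_by_timeout(sample, [5, 10, 30])
```

Comparing the coverage before and after a change on ncstrl or cs-tr would show whether a new constant actually reduces the number of late indexers.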
Rework method for reliability retry
If an indexer is deemed "unreliable" and is demoted, we
currently put it back in service in order to retry it; that
is, we test the reliability of the indexer at the expense of
our users. The new method will fork a process that calls the
remote indexer with a dienst Version request. If the
Version request doesn't respond before the timeout, then
a) reset the retry interval for this indexer in the
indexer state database (ISDB) and b) if we are able to
learn the e-mail address of maintainer@remote.indexer and
it's not a network problem (ping works), then we might
send a message indicating failure of the dienst server at
the remote site. (Dave's notion is to have these e-mail
addresses at MMS and at RMS; also, we would need to be
careful to avoid flooding mailboxes with messages.) If
the remote indexer's Version request does respond before
the timeout, then reinitialize the ISDB for this indexer
(perhaps after doing further response time testing with a
search request at the remote indexer).
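The decision logic of the new retry method might look roughly like the sketch below. It is written in Python for illustration only; the ISDB is modeled as a plain dictionary, and `retry_probe`, `probe`, and the field names `demoted` and `next_retry` are hypothetical. Forking the process, the ping check, and the e-mail notification are left out.

```python
import time


def retry_probe(indexer, isdb, probe, timeout, retry_interval, now=None):
    """Probe a demoted indexer without involving user searches.

    probe(indexer, timeout) is a hypothetical callable that returns True
    if the remote Version request answered before the timeout.
    """
    now = time.time() if now is None else now
    if probe(indexer, timeout):
        # Responded in time: reinitialize state so the indexer is used again.
        isdb[indexer] = {"demoted": False, "next_retry": None}
        return True
    # No response: keep the indexer demoted and schedule the next probe.
    isdb[indexer]["demoted"] = True
    isdb[indexer]["next_retry"] = now + retry_interval
    return False
```

Because the probe is passed in as a callable, the same logic could later be extended with a follow-up search request before the indexer is reinstated.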
Improve reliability metric
This will involve a redesign of the data kept in the indexer state
database (ISDB) as well as a reworked reliability metric.
The goal is to greatly improve our prediction of
"will this indexer respond before the timeout
value?" and use the predictions to reduce the number
of searches that enter phase two.
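The reworked metric is not yet designed. Purely as an illustration of the kind of prediction we are after, the sketch below (Python, with a hypothetical function name and data layout) estimates the chance of an on-time response as the fraction of recent requests answered before the timeout. It is an assumption, not the proposed metric.

```python
def predicted_on_time(history, timeout, window=20):
    """Estimate the chance that an indexer answers before `timeout`.

    history holds recent response times in seconds, oldest first;
    None marks a request that never got an answer.
    """
    recent = history[-window:]
    if not recent:
        return None  # no evidence yet
    on_time = sum(1 for r in recent if r is not None and r <= timeout)
    return on_time / len(recent)
```

An indexer whose estimate falls below some threshold could be skipped in phase one, which is how the predictions would reduce the number of searches that enter phase two.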
Overlap phase one and phase two searching
Continue listening for phase one indexers after phase two
has started. We will need to maintain a list of indexers
called in phase one and remove an indexer from the list
whenever we receive valid results from said indexer. Once
we hit the phase one timeout, we keep track of the phase
one indexers (authorities?) that haven't yet
responded and we also keep track of the phase two
indexers (authorities?) we're calling. At this point
we take results from either phase one or phase two, and
finish taking results when we reach the first of the
following conditions:
- all authorities are accounted for (phase one or phase
two responses)
- phase two timeout
We need to make sure we don't deliver duplicate
results for any authority; we need to keep response time
information even if we don't use results from a
particular indexer; and we need to keep track of how
phase two ended in the logs (see next proposed change).
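The bookkeeping described above can be sketched as follows. This is an illustrative Python model, not Dienst code: responses are given as a pre-sorted list of events rather than read from live connections, and the function name, tuple layout, and the strings used for how the search ended are all assumptions.

```python
def collect_results(all_authorities, events, phase_two_timeout):
    """Merge phase one and phase two responses without duplicates.

    events: (arrival_time, phase, indexer, authorities) tuples, sorted by
    arrival time in seconds since the search started.
    """
    pending = set(all_authorities)
    used = []            # (phase, indexer, authority) results we deliver
    response_times = {}  # (phase, indexer) -> arrival time, kept even if unused
    ended_by = "timeout"
    for arrival, phase, indexer, authorities in events:
        if arrival > phase_two_timeout:
            break
        response_times[(phase, indexer)] = arrival
        for authority in authorities:
            if authority in pending:
                # First valid answer for this authority wins.
                pending.discard(authority)
                used.append((phase, indexer, authority))
        if not pending:
            ended_by = "all authorities accounted for"
            break
    return used, response_times, ended_by, pending


# Hypothetical search: a late phase one indexer (idx-b) answers after its
# phase two backup, so its results are dropped but its time is still kept.
events = [
    (2, 1, "idx-a", ["A"]),
    (8, 2, "backup-b", ["B"]),
    (9, 1, "idx-b", ["B"]),
    (11, 1, "idx-c", ["C"]),
]
used, times, ended_by, pending = collect_results({"A", "B", "C"}, events, 30)
```

The returned `ended_by` value is the piece of information the logs would need in order to record how phase two finished.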
Indicate phase two searches in dienst logs
We want to make sure we are always capturing the data to
indicate when we are entering phase two, which indexers
were used in phase two, and how phase two ended. Possible
methods for indicating this:
- Have separate STATISTICS log entries for phase one
and phase two results
- Add a fake indexer to the STATISTICS log entry to
indicate start of phase two? (indexer=phase.two)
- Add message to log indicating phase two is entered,
which indexers were called, and how it finished;
still have one STATISTICS log entry per search
Tabulate phase statistics in log summaries and in
Dienst/htdocs/dienst_runtime/logs.html
This will include both of the following:
- code to reflect the new phase two indications as
noted above, but that will work fine on older
dienst logs as well
- code that will infer phase two searching for
older dienst logs