The Indexer State Database (ISDB) and reliability metric:
Background and Current (old) Design
(DRAFT: do not copy or redistribute)
So that the dienst user interface can service different search requests simultaneously, we need a place to store the most current information about remote indexer performance. We keep this information in the indexer state database (ISDB), a file that is accessed by the dienst processes spawned by the dienst user interface.
The reliability metric is the test we apply to determine if we should use a remote indexer for a particular search. The test asks if the remote indexer is "reliable" -- will it respond before the timeout value? If the answer is yes, then we use this remote indexer. If the answer is no, we "demote" the indexer: we don't use it for this search (or for any others for a while) and we try to use another indexer for the authority we're searching.
An indexer is "demoted" if it fails the reliability test -- that is, if the algorithm predicts it will not respond before the timeout value. A demoted indexer isn't used in any searches for a period of time. Dienst remains fault tolerant in the meantime: it tries to find another indexer for the desired authority.
Once an indexer is demoted, it isn't used for any searches for a period of time indicated by $fail_retry_time, a global constant set in Config/config_constants.pl. Once that period of time has passed, we retest the reliability of the indexer: will it now respond to a search request before the timeout value? In the context of ISDB design, we need a way to tell when to perform this reliability retry.
In the current system, the ISDB has four pieces of information for each remote indexer: the host, the port, the number of consecutive failures, and the reinitialization timestamp.
The host and port can be thought of as the "key" of this data; the other information is "status" information.
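The key/status split can be pictured with a small Python sketch. This is only an illustration, not dienst's code: the in-memory dictionary, the function `get_entry`, the field names `consecutive_failures` and `reinit_timestamp`, and the example host are all hypothetical, and the actual on-disk format of the ISDB file is not described here.

```python
# Hypothetical in-memory view of the ISDB. The real ISDB is a file; its
# on-disk format is not specified in this document.
isdb = {}


def get_entry(isdb, host, port):
    """Return the status record for (host, port), creating a clean one if absent."""
    return isdb.setdefault(
        (host, port),  # the "key": host and port together
        {"consecutive_failures": 0, "reinit_timestamp": 0},  # the "status"
    )


# Example host and port are made up for illustration.
entry = get_entry(isdb, "indexer.example.org", 80)
```

A zero `reinit_timestamp` stands for "not demoted", matching the convention described later in this section.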
In the current release of dienst, the reliability metric is "has this indexer failed to respond before the timeout for the last five (default) consecutive searches?" We can tell this from the data in the ISDB -- if the number of consecutive failures is less than five, then the indexer passes the reliability metric and is used. If the number of consecutive failures is five or greater, then the indexer is used only if enough time has gone by to indicate a reliability retry is in order (see below).
If an indexer fails the reliability metric (and is not yet ready for a reliability retry), then we do not use this indexer -- a secondary indexer is sought for the authority in question.
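As a minimal sketch, the consecutive-failure test described above might look like the following in Python. The function and constant names are hypothetical; only the threshold of five (the stated default) comes from the text.

```python
# Default threshold from the text: five consecutive failures.
MAX_CONSECUTIVE_FAILURES = 5


def passes_reliability_metric(consecutive_failures,
                              max_failures=MAX_CONSECUTIVE_FAILURES):
    """True if the indexer has fewer than max_failures consecutive failures."""
    return consecutive_failures < max_failures
```

An indexer with four consecutive failures still passes; one with five or more fails and is used only if a reliability retry is due.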
Some problems with the reliability metric in the current system:
An indexer is demoted if a "failure" is recorded that causes the ISDB entry for this indexer to fail the reliability metric in the future. In other words, if the ISDB entry has four failures, and we just got a fifth one, then when we update the ISDB entry with this information, we demote the indexer. The demotion is indicated by setting the reinitialization timestamp to the time of the demotion -- it is zero otherwise.
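The update step can be sketched as follows, again in Python with hypothetical names (`record_failure`, the dictionary fields, and the `now` parameter are not taken from dienst). The sketch assumes timestamps are integer seconds and that an already-demoted entry keeps its original demotion time; neither detail is stated in the text.

```python
import time


def record_failure(entry, now=None, max_failures=5):
    """Record one more consecutive failure; return True if this demotes the indexer."""
    if now is None:
        now = int(time.time())
    entry["consecutive_failures"] += 1
    # Demote when this failure makes the entry fail the metric. The check on
    # reinit_timestamp (assumption) avoids overwriting an earlier demotion time.
    if (entry["consecutive_failures"] >= max_failures
            and entry["reinit_timestamp"] == 0):
        entry["reinit_timestamp"] = now
        return True
    return False
```

With an entry that already holds four failures, one more call sets the reinitialization timestamp and reports a demotion; an entry with fewer failures is only incremented.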
When the ISDB is read, if an indexer fails the reliability metric AND the reinitialization timestamp for its ISDB entry is more than $fail_retry_time in the past ($fail_retry_time is a global variable set in Config/config_constants.pl), then we do a reliability retry. In the current system, a reliability retry means we reset both the number of consecutive failures and the reinitialization timestamp to zero, use this indexer in the pending search, and begin tracking data for this indexer all over again.
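The read-time decision can be sketched in Python as below. The function name, the dictionary fields, and the assumption that `fail_retry_time` and the timestamps are both in seconds are hypothetical; `fail_retry_time` stands in for the `$fail_retry_time` constant named above.

```python
def should_use_indexer(entry, now, fail_retry_time, max_failures=5):
    """Decide whether to use an indexer, performing a reliability retry if due."""
    if entry["consecutive_failures"] < max_failures:
        return True  # passes the reliability metric
    if now - entry["reinit_timestamp"] > fail_retry_time:
        # Reliability retry: reset both fields and start tracking afresh.
        entry["consecutive_failures"] = 0
        entry["reinit_timestamp"] = 0
        return True
    return False  # still demoted; a secondary indexer should be sought
```

For an indexer demoted at time 1000 with a retry time of 600 seconds, a read at time 1500 leaves it demoted, while a read at time 1700 resets the entry and uses the indexer in the pending search.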
Some problems with the reliability retry in the current system:
Details on how reliability retry will be improved with track one changes.