Dienst: distributed searching improvements - "track two"

(DRAFT: do not copy or redistribute)

Here are some somewhat random, somewhat informed thoughts on which improvements to focus on after implementing the "track one" Dienst distributed searching changes.

Given Jim French's work on the statistical distribution of our response-time data, we should be able to predict appropriate timeout values more accurately from a small number of data points. That would let us adjust timeout values as conditions change.
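
As one illustration, a timeout could be set at a high quantile of a distribution fitted to a handful of recent response times. The sketch below assumes the response times are roughly lognormal. That assumption is made here for illustration only and is not necessarily the distribution Jim French's work identifies. The function name `timeout_from_samples` and the 0.95 default quantile are also hypothetical choices.

```python
import math
from statistics import NormalDist, fmean, stdev


def timeout_from_samples(samples, quantile=0.95):
    """Estimate a timeout as the given quantile of a lognormal fit.

    Hypothetical sketch: assumes response times are lognormally
    distributed. Samples must be positive response times (e.g. seconds).
    """
    if len(samples) < 2:
        raise ValueError("need at least two response-time samples")
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    # Fit the lognormal from the mean and standard deviation of the logs.
    logs = [math.log(s) for s in samples]
    mu = fmean(logs)
    sigma = stdev(logs)
    # z-score of the requested quantile under a standard normal distribution.
    z = NormalDist().inv_cdf(quantile)
    return math.exp(mu + sigma * z)


# Five observed response times (seconds) from one indexer.
recent = [1.2, 0.9, 2.5, 1.4, 3.1]
print(timeout_from_samples(recent))
```

Recomputing the estimate over a sliding window of recent samples is one way the timeout could follow changing conditions.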

At a minimum, it might be useful for the algorithm to treat non-responding indexers separately from responding indexers. Using results from the simulator, we should be able to find a way to do this that maximizes performance for users. We originally intended to have a timed low-pass filter for failure rate in addition to the timed low-pass filter for response time. However, devising a reasonable reliability metric that combined the two filters turned out to require a great deal of computation to determine the various constants and variables involved. The simulator will provide a way to perform these computations.
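
Responding and non-responding behavior could be kept separate by maintaining two timed low-pass filters per indexer: one for response time and one for failure rate. The sketch below shows one possible form. The exponential time weighting, the class names, the 300-second default time constant, and the `expected_wait` combination are all illustrative assumptions. In particular, `expected_wait` assumes a failure costs one full timeout. Constants like these are exactly what the simulator would be needed to tune.

```python
import math


class TimedLowPassFilter:
    """Exponential smoothing whose weight depends on elapsed time.

    A sample arriving long after the previous one moves the estimate
    further than a sample arriving shortly after it.
    """

    def __init__(self, time_constant):
        self.time_constant = time_constant  # seconds; hypothetical tuning parameter
        self.estimate = None
        self.last_time = None

    def update(self, value, now):
        if self.estimate is None:
            self.estimate = value  # first sample initializes the filter
        else:
            elapsed = max(0.0, now - self.last_time)
            alpha = 1.0 - math.exp(-elapsed / self.time_constant)
            self.estimate += alpha * (value - self.estimate)
        self.last_time = now
        return self.estimate


class IndexerStats:
    """Separate filters for the response time and failure rate of one indexer."""

    def __init__(self, time_constant=300.0):
        self.response_time = TimedLowPassFilter(time_constant)
        self.failure_rate = TimedLowPassFilter(time_constant)

    def record_response(self, seconds, now):
        self.response_time.update(seconds, now)
        self.failure_rate.update(0.0, now)

    def record_failure(self, now):
        # A non-response counts as a failure but contributes no response time.
        self.failure_rate.update(1.0, now)

    def expected_wait(self, timeout):
        """Hypothetical combined metric: each failure costs a full timeout."""
        rt = self.response_time.estimate
        fr = self.failure_rate.estimate or 0.0
        if rt is None:
            return timeout  # no successful responses seen yet
        return (1.0 - fr) * rt + fr * timeout
```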

It is likely that an ideal system would have no constants in the reliability metric or ISDB formula: all would be variables. We might even want a reliability metric that changes as conditions change.

If we distinguish between indexers that fail to respond at all and responses that are actual errors, and also between failures to respond caused by connectivity problems and those caused by server problems, then we can achieve better fault tolerance. Again, these distinctions inform our choice of indexers.
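
A minimal sketch of such a classification follows. It uses a hypothetical heuristic: a connection that cannot be established is blamed on the network, and a connection that never produces a reply is blamed on the server. This is a simplification, since a failed connection can also mean the server host is down. The `Outcome` categories and `classify` function are illustrative names, not existing Dienst code.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    ERROR_RESPONSE = "error response"          # indexer answered, but with an error
    SERVER_NO_RESPONSE = "server no response"  # reachable, but never answered
    CONNECTIVITY_FAILURE = "connectivity"      # could not reach the indexer at all


def classify(connected, responded, is_error):
    """Hypothetical heuristic mapping one request's result to an Outcome."""
    if not connected:
        return Outcome.CONNECTIVITY_FAILURE
    if not responded:
        return Outcome.SERVER_NO_RESPONSE
    return Outcome.ERROR_RESPONSE if is_error else Outcome.OK


# Tally outcomes per indexer so each failure mode can be weighed separately.
history = Counter()
observations = [
    ("indexer-a", True, True, False),
    ("indexer-a", True, False, False),
    ("indexer-b", False, False, False),
]
for name, connected, responded, is_error in observations:
    history[(name, classify(connected, responded, is_error))] += 1
```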

We could also potentially take server load and network load into account.

There is a whole avenue to pursue here: having remote indexers share performance metadata with one another to improve fault tolerance.

If we choose indexers based on response time, then we probably want to add a random factor to the computation of expected response time. This avoids sending every search to the one or two fastest indexers, which would overload them and slow them down. [thanks to Robbert van Renesse for this notion]
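
One way to add the random factor is to perturb each indexer's expected response time before picking the minimum. In the sketch below, the multiplicative uniform jitter, its 0.3 default, and the `choose_indexer` name are all assumptions. Other randomization schemes would serve the same purpose, for example choosing with probability inversely proportional to expected time.

```python
import random


def choose_indexer(expected_times, jitter=0.3, rng=random):
    """Pick the indexer with the lowest randomly perturbed expected time.

    Each expected response time is scaled by a factor drawn uniformly from
    [1 - jitter, 1 + jitter] (jitter assumed to be in [0, 1)), so indexers
    with similar speeds share the load instead of every search going to
    the single fastest one.
    """
    return min(
        expected_times,
        key=lambda name: expected_times[name] * rng.uniform(1.0 - jitter, 1.0 + jitter),
    )


expected = {"indexer-a": 1.0, "indexer-b": 1.2, "indexer-c": 4.0}
rng = random.Random(42)
picks = [choose_indexer(expected, jitter=0.3, rng=rng) for _ in range(1000)]
```

With these numbers, indexer-a and indexer-b split the searches. Indexer-c is never chosen, because its smallest perturbed time (2.8) exceeds indexer-a's largest (1.3). The jitter width therefore controls how much slower an indexer can be and still receive some traffic.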

Jim French has expressed interest in this work; I believe Dave Fielding is also interested.