CS5414 (Fall 2012) Homework 4
Replication Architectures

Due: 11pm, 10/15/2012
Relative weight: 10 units

General Instructions. You are expected to work alone on this assignment.

Submit your solution using CMS. No late assignments will be accepted.

To facilitate grading, format your solutions as follows.

Solutions that do not satisfy the formatting guidelines will be returned, ungraded.


Problem 1

A stage is a process that repeatedly receives a single input value, applies a function to that value, and then sends the result.
    DO FOREVER
       RECEIVE v FROM input
       val := F(v)
       SEND val TO output
    END
    
Here, F(v) denotes a function evaluation in the mathematical sense---the value of F(v) is unique, depending only on v and not on other state or history. So each output sent by a stage is completely independent of previous inputs it has read and processed.
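
For concreteness, the following Go sketch shows one possible realization of a stage; the channel-based wiring and the names runStage and F are illustrative assumptions, not part of the assignment.

    // Sketch of a single stage: channels stand in for the input and output
    // message streams, and f is the stage's pure function F.
    package main

    import "fmt"

    // runStage repeatedly receives a value, applies f, and forwards the result.
    // It keeps no state across iterations, so each output depends only on the
    // corresponding input, as required of F.
    func runStage(f func(int) int, input <-chan int, output chan<- int) {
        for v := range input { // RECEIVE v FROM input
            output <- f(v)     // val := F(v); SEND val TO output
        }
        close(output)
    }

    func main() {
        in, out := make(chan int), make(chan int)
        go runStage(func(v int) int { return v * v }, in, out) // F(v) = v*v
        go func() {
            for _, v := range []int{1, 2, 3} {
                in <- v
            }
            close(in)
        }()
        for r := range out {
            fmt.Println(r) // prints 1, 4, 9
        }
    }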

Given is a collection of stages S_1, S_2, ..., S_N, where each stage potentially employs a different function for F. We can construct a stage pipeline

    S_1 | S_2 | ... | S_N
by routing the output of each stage to the input of another, in a linear fashion. In particular, we connect the outputs stream from each stage S_i (where i < N holds) to the inputs stream of the next stage S_{i+1}. The inputs stream for the stage pipeline is defined to be the inputs stream to the first stage S_1; the outputs stream from the stage pipeline is defined to be the outputs stream from the last stage S_N.
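
Continuing the illustrative Go sketch above (and assuming its runStage helper), the pipeline construction amounts to chaining channels, with each stage's output channel serving as the next stage's input channel; buildPipeline is a hypothetical name.

    // Sketch of the pipeline construction: the output stream of S_i is used
    // directly as the input stream of S_{i+1}; the pipeline's input is S_1's
    // input and its output is S_N's output. fs holds the functions F used by
    // S_1, ..., S_N, in order.
    func buildPipeline(fs []func(int) int, input <-chan int) <-chan int {
        ch := input
        for _, f := range fs {
            next := make(chan int)
            go runStage(f, ch, next)
            ch = next
        }
        return ch // outputs stream of the last stage S_N
    }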

  1. A causal message logging approach is being considered for implementing fault-tolerance of a stage pipeline comprising N stages. Assume Lamport's "happens before" relation is used to define causality for this system. Describe the set of messages whose determinants must be logged by S_N when it reads the kth input from its input channel.

  2. Assume that a complete semantic characterization for causality in this system is available and being employed by causal logging protocols. (By thinking about the logic of the system you should be able to derive such a characterization.) Describe the set of messages whose determinants must be logged by S_N when it reads the kth input from its input channel.


Problem 2

Replication and masking can be employed to create an ensemble that has the same input/output properties as a single stage but can tolerate as many as t stage replicas that exhibit Byzantine behavior, as follows. Replace the stage with 2t+1 replicas, use a consensus protocol to ensure that every replica reads the same inputs sequence, and have a voter emit, as the ensemble's output for each input, the value produced by a majority of the replicas.

You may assume that the "consensus protocol" never fails and that no voter ever fails.
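
One way to picture the masking step is the following sketch, again in the illustrative Go style used above: a voter (here a hypothetical majorityVote function) collects the 2t+1 replica outputs for one input and emits the value reported by a majority, so up to t Byzantine replicas cannot alter the result.

    // majorityVote returns the value reported by more than half of the 2t+1
    // replica outputs for one input. With at most t faulty replicas, the at
    // least t+1 correct replicas report the same value, and that value wins.
    func majorityVote(replicaOutputs []int) int {
        counts := make(map[int]int)
        for _, v := range replicaOutputs {
            counts[v]++
            if counts[v] > len(replicaOutputs)/2 {
                return v
            }
        }
        // Unreachable if at most t of the 2t+1 replicas are faulty.
        panic("no majority: more than t replicas misbehaved")
    }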

Here are two possible approaches for using replication to create a reliable N-stage pipeline.

  1. For each stage S_i, create an ensemble by using the construction outlined above. Then, connect these ensembles in sequence. Thus, the voter output from each ensemble (except the last) is read by the consensus protocol and becomes the inputs sequence to the replicas comprising the ensemble for successor stage S_{i+1}.

  2. A stage pipeline SP viewed in toto has a single inputs sequence and a single outputs sequence, so it can itself be considered to be a single (large) stage. Construct 2t+1 replicas SP_1, SP_2, ..., SP_{2t+1} of stage pipeline SP by using the construction outlined above. Thus, this new system has (only) a single voter, which generates the outputs stream based on the outputs stream produced by each of the 2t+1 stage pipeline replicas. (Both wirings are sketched following this list.)
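
The sketch below contrasts the two wirings, building on the illustrative Go helpers above; runEnsemble, architecture1, and architecture2 are hypothetical names, and the replicas here simply recompute f rather than modeling faulty behavior. Approach 1 votes after every stage, while approach 2 votes once, at the very end.

    // runEnsemble: feed each input to 2t+1 replicas of one stage, then vote.
    func runEnsemble(f func(int) int, t int, input <-chan int) <-chan int {
        out := make(chan int)
        go func() {
            for v := range input {
                outputs := make([]int, 0, 2*t+1)
                for i := 0; i < 2*t+1; i++ {
                    outputs = append(outputs, f(v)) // each replica computes F(v)
                }
                out <- majorityVote(outputs)
            }
            close(out)
        }()
        return out
    }

    // Approach 1: one ensemble per stage, ensembles chained in sequence.
    func architecture1(fs []func(int) int, t int, input <-chan int) <-chan int {
        ch := input
        for _, f := range fs {
            ch = runEnsemble(f, t, ch)
        }
        return ch
    }

    // Approach 2: 2t+1 replicas of the whole pipeline and a single final voter.
    func architecture2(fs []func(int) int, t int, input <-chan int) <-chan int {
        replicaIns := make([]chan int, 2*t+1)
        replicaOuts := make([]<-chan int, 2*t+1)
        for i := range replicaIns {
            replicaIns[i] = make(chan int)
            replicaOuts[i] = buildPipeline(fs, replicaIns[i])
        }
        go func() { // fan each input value out to every pipeline replica
            for v := range input {
                for _, in := range replicaIns {
                    in <- v
                }
            }
            for _, in := range replicaIns {
                close(in)
            }
        }()
        out := make(chan int) // the single voter
        go func() {
            for {
                outputs := make([]int, 0, 2*t+1)
                for _, rc := range replicaOuts {
                    v, ok := <-rc
                    if !ok {
                        close(out)
                        return
                    }
                    outputs = append(outputs, v)
                }
                out <- majorityVote(outputs)
            }
        }()
        return out
    }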

For both architectures, assume that processors executing pipeline stages can exhibit Byzantine failures, but assume that voters, communication lines, etc., are fault-free.

  1. Characterize the sets of failures each architecture tolerates, as follows. What is the smallest number of faulty processors that could cause erroneous output? What is the largest number of faulty processors that could be tolerated without necessarily causing erroneous output?

  2. The delay in a stage pipeline is the time that elapses from when an input is read by the first stage until the corresponding output is produced by the last stage. Give a characterization of the delay exhibited by each of the two architectures when there are no failures, assuming that each processor runs at some fixed speed that may differ from processor to processor. Is one architecture likely to be faster than the other? Justify your answer.