CS514 Fall 2000: Homework 1. Due in class on Tuesday, Sept. 12
 
 
1. Read the following paper: H. Saltzer, D. P. Reed, and D. Clark, "End-to-end arguments in system design", ACM Transactions on Computing Systems, vol. 2, no. 4, 1984. Note that you can obtain this online through our library, which has a digital library access agreement with ACM (http://www.acm.org). Hard copies of the journal itself are also available in Carpenter Library.
a.      Suppose
that you are developing a network for connection to mobile users who have
wireless receivers.  The loss rate on
the wireless link turns out to be several orders of magnitude larger than for
links within the “land-based” part of the network infrastructure.  According to the end-to-end philosophy,
should error correction be done on the wireless link, or should all
error-correction occur end-to-end?
 
 
b.      In
class, we saw that routers drop packets to signal to TCP when overload occurs.  Recall that routers apply this policy
“early”, under the random early detection (RED) approach.  In a strict end-to-end world, shouldn’t the
router be completely indifferent to the protocol running end-to-end near the
application layer?   Does RED violate
the end-to-end philosophy?
 
Actually, I would say that RED does violate the end-to-end philosophy. Obviously, the philosophy is expressed in terms of packet loss and recovery techniques, not throwing packets away. But the fact remains that RED is more or less a part of the end-to-end flow control mechanism that has been moved down into the network. I suppose that the counter argument is that we can’t do this equally well between the end-points, but personally, my feeling is that end-to-end yielded to pragmatism in this specific case.
 
2. Remote Procedure Call (RPC). Give pseudo-code in a Java-based notation to solve the following problem, or explain why the problem cannot be solved. A certain device is controlled by a set of 5 computers, each of which operates a motor (this is common in factories, where large pieces of equipment might have many controllers and motors to operate them). Your job is to program a central controller (a sixth machine) to start all the motors at once.
 
The
procedures available are:
 
     check_status(motor-number), returns {MTR_OK,
MTR_FAULT}
(for
motor-number in 0..4)
     set_motor_status(motor-number,
status) returns {MTR_OK,
MTR_FAULT}
 (for motor-number in 0..4; status in {MTR_ON,
MTR_OFF}
 
Your
solution should work as follows: if all the motors are OK,  then set all motors to MTR_ON,  checking the returned status code for
possible new faults.  If any motor is
faulty at the outset, or if any motor fails when turned on, all motors should
be turned off.  Assume that in addition
to the normal behavior of these procedures, they can throw a timing
exception, which means that the RPC has timed out.    You may assume that no real software
failures occur – the various control programs don’t actually crash.  However, with some non-negligible
probability the network can lose a packet, programs can execute slowly for lots
of reasons.  Assume that the motors
themselves have a significant failure rate.
 
Is
a problem such as this posed correctly? 
Can you suggest a better (“practical, and adequate”) specification that
avoids the issues raised by this specification?
 
As stated, the problem
definitely can’t be solved.  Consider
just the very last step: suppose that my code decides to turn on the
motors.  As things are stated here, the
network could drop all my packets to motor A while letting all my packets
through to motors B, C, D and E.  So
there isn’t any point in giving pseudo-code for the solution; it can’t be done.
 
But if we relax the rules we
can use 2-phase commit (I won’t reproduce the code, it looks just like the one
we saw in class).  The relaxed
specification should state:
 
a)      We
assume that if a packet is retransmitted enough times, it will get through with
arbitrarily high probability.  Thus, the
network doesn’t suffer partitioning failures. 
b)      We
assume that the control program is running on a different machine than the ones
on which the motors run, and that failures can be detected with some high
probability.
c)      Our
goal is that with some specified probability of error p, either
all motors are turned on, or no motor is turned on.  For example, the factory might specify p=1-10-9.  From this probability we can then compute
the appropriate retransmission rates and retry limits, timeouts, etc, to ensure
that indeed, the problem can be solved. 
However, note that our ability to accurately detect failure ultimately
limits the probability that the problem can be solved.
 
In summary, if we weaken the
goal to become probabilistic, a 2PC protocol suffices.