Homework 6:  Due in class on Thursday, November 2
 
Because of the project, this is the last non-project homework assignment you’ll receive in CS514. Solutions to this problem should be about a page of typed material – certainly not more. If you can give a complete answer in less than a page, this will be fine with Professor Birman!
 
Non-determinism. When solving homework 4 (the Nasa assignment), quite a few people proposed mechanisms that only work for programs that are deterministic.
a) List all the sources of non-determinism that arise in the programming language and operating system with which you are most familiar.
 
I’ll focus on UNIX using
C.  My definition of non-determinism
will be that a single program, run multiple times but under identical
conditions, would give noticeably different results or outputs or behavior in a
way that might mean that if NASA actually ran the thing twice, it wouldn’t be
possible to just compare the outputs because correct runs might produce
multiple, different, outputs.  On the
other hand, I’ll assume that these programs aren’t buggy or completely flakey.
 
This is everything I can think
of, but I suspect that I’ve left a number of items off!  Notice that not all of these necessarily
result in changes to the output behavior of the program.  It really depends on what the program was
doing when the event occurs!
 
The list for other mixtures of
language and operating system may be different but my guess is that most items
would be similar and that many would be identical.
 
b) For this same setting, explain what would need to be done to write a deterministic program.
 
A very simple program that
doesn’t do I/O or use system calls except, perhaps, to read an input file and
write its results would be deterministic. 
For example, if you run the sort program on the same input file, it will
always give the same output unless someone tries to interrupt it.  Most of the issues mentioned above arise in
somewhat more complex programs, and a number of them arise only with concurrency
or message-passing.  So: a very simple
program, given the same inputs, would produce the same results.  This assumes that the library procedures
used by that program are also very simple. 
One challenge, of course, is that we don’t really know what a library
does when we call one of its procedures. 
For example, who knows when “printf” will decide to flush the output
channel?  So when we say “very simple”
procedures, this may be quite a restriction!
 
It seems plausible that one
could develop a program like “Purify” that could check for these conditions and
warn that a program might give non-deterministic behavior.  But it might give warnings on just about
everything!  And you would need source
files to run this scan program – for the libraries used, too.
 
c) Revisit the Nasa application from homework 4. How likely is it that Nasa could live with the rules you proposed in part b?
 
Very unlikely.  Nasa has a general mix of computing
applications in mind, so some will be fairly sophisticated and many will be
non-deterministic for one or another of the reasons cited. Nasa developed them
long ago and may not even have the source files – for example, big image
compression algorithms that they use to reduce the amount of data they will
transmit to the ground system.  Concurrency is a remedy to slow performance, hence Nasa (having
optimized its programs) will probably have a fairly concurrent, and so
non-deterministic, job mix.  Most
likely, this means that when Nasa “runs the same program multiple times, then
compares the output” we’ll need a special purpose comparison program, written
separately for each application.  But
this type of special purpose comparison might be feasible for lots of
applications.  So perhaps Nasa can live
with non-determinism, even if it can’t do naïve things like just running the
program twice and checking to see if the behavior was absolutely identical.