Homework 6:  Due in class on Thursday, November 2

 

Because of the project, this is the last non-project homework assignment you’ll receive in CS514.    Solutions to this problem should be about a page of typed material – certainly not more.  If you can give a complete answer in less than a page, this will be fine with Professor Birman!

 

Non-determinism.  When solving homework 4 (the Nasa assignment), quite a few people proposed mechanisms that only work for programs that are deterministic.

a)       List all the sources of non-determinism that arise in the programming language and operating system with which you are most familiar. 

 

I’ll focus on UNIX using C.  My definition of non-determinism will be that a single program, run multiple times but under identical conditions, would give noticeably different results or outputs or behavior in a way that might mean that if NASA actually ran the thing twice, it wouldn’t be possible to just compare the outputs because correct runs might produce multiple, different, outputs.  On the other hand, I’ll assume that these programs aren’t buggy or completely flakey.

 

This is everything I can think of, but I suspect that I’ve left a number of items off!  Notice that not all of these necessarily result in changes to the output behavior of the program.  It really depends on what the program was doing when the event occurs!

 

The list for other mixtures of language and operating system may be different but my guess is that most items would be similar and that many would be identical.

 

b)       For this same setting, explain what would need to be done to write a deterministic program. 

 

A very simple program that doesn’t do I/O or use system calls except, perhaps, to read an input file and write its results would be deterministic.  For example, if you run the sort program on the same input file, it will always give the same output unless someone tries to interrupt it.  Most of the issues mentioned above arise in somewhat more complex programs, and a number of them arise only with concurrency or message-passing.  So: a very simple program, given the same inputs, would produce the same results.  This assumes that the library procedures used by that program are also very simple.  One challenge, of course, is that we don’t really know what a library does when we call one of its procedures.  For example, who knows when “printf” will decide to flush the output channel?  So when we say “very simple” procedures, this may be quite a restriction!

 

It seems plausible that one could develop a program like “Purify” that could check for these conditions and warn that a program might give non-deterministic behavior.  But it might give warnings on just about everything!  And you would need source files to run this scan program – for the libraries used, too.

 

c)       Revisit the Nasa application from homework 4.  How likely is it that Nasa could live with the rules you proposed in part b?

 

Very unlikely.  Nasa has a general mix of computing applications in mind, so some will be fairly sophisticated and many will be non-deterministic for one or another of the reasons cited. Nasa developed them long ago and may not even have the source files – for example, big image compression algorithms that they use to reduce the amount of data they will transmit to the ground system.  Concurrency is a remedy to slow performance, hence Nasa (having optimized its programs) will probably have a fairly concurrent, and so non-deterministic, job mix.  Most likely, this means that when Nasa “runs the same program multiple times, then compares the output” we’ll need a special purpose comparison program, written separately for each application.  But this type of special purpose comparison might be feasible for lots of applications.  So perhaps Nasa can live with non-determinism, even if it can’t do naïve things like just running the program twice and checking to see if the behavior was absolutely identical.