Homework 6: Due in class on Thursday, November 2

Because of the project, this is the last non-project homework assignment you’ll receive in CS514. Solutions to this problem should be about a page of typed material – certainly not more. If you can give a complete answer in less than a page, this will be fine with Professor Birman!

Non-determinism. When solving homework 4 (the NASA assignment), quite a few people proposed mechanisms that only work for programs that are deterministic.

a) List all the sources of non-determinism that arise in the programming language and operating system with which you are most familiar.

I’ll focus on UNIX using C. By non-determinism I mean that a single program, run multiple times under identical conditions, gives noticeably different results, outputs, or behavior. The practical consequence is that if NASA ran the program twice, it couldn’t simply compare the outputs, because correct runs might produce several different outputs. On the other hand, I’ll assume that these programs aren’t buggy or completely flaky.

Software interrupts (so-called “signals”). These can arrive at unpredictable times, even if identical programs receive the identical signal. If the program reacts to the interrupt in a way that changes its state, it might do different things in different runs (I won’t repeat this point again and again, but this is the basic issue for all of the scenarios that follow). For example, a program that receives an “I/O is ready” signal on UNIX might then read the message and start processing it. Had it received that signal later, the data used to process the message might have been different, and the program’s behavior would have changed.
Reading the processor clock: the value is unpredictable and even identical programs won’t see the same values
Paging and scheduling delays, L1 and L2 cache hit rates. Even a single-threaded program might run at “different speeds” depending on the conditions of the computer. This could mean that in some runs, the program “sees” certain data before it changes and on other runs it “sees” that data later. It might also impact when the program poses a question to the user, what message it pops up on the screen, etc.
Locking. When a file lock or mutex lock is requested, it is unpredictable how long it will take to be granted
I/O ready. When using “select” to detect that I/O is available, one can’t predict how long it will take, after a device or channel becomes ready, before the data is visible to the application
Messages. When using message passing, even in a single machine, one can’t predict the order in which messages will be delivered, or even whether or not a message will arrive. This is a big source of unpredictable behavior!
Process id and other unique numbers. These will typically depend on the situation when the process was launched
Multithreading issues: scheduling order is unpredictable
Exceptions. Certain types of exceptional conditions, like a system call failing with EINTR because a signal interrupted it, are unpredictable and may not occur in the same way during every run
Reading data off a pipe or TCP stream. The number of bytes the read returns is not predictable: if a producer sends data at a constant rate the receiver might still see one read return 20 bytes, then the next return 1024 bytes – the outcome depends very much on the scheduling order for the processes
With multiple address spaces: relative scheduling order
With multiple address spaces: if mmap is used to share memory, the order in which writes are done is a side effect of the scheduling order
User input: the input provided by a user, the speed at which he or she types, etc
Certain types of resource limits, which might impact the behavior of a program. For example, the amount of free space in /tmp
File names automatically generated for temporary files
If output is redirected, the target file system could run out of space, triggering an exception
Inode numbers assigned to files
Current processor load, if you use system calls to ask about the load (similar for many system calls)
Device status, e.g. if you check with ioctl() or stat()
Timeout for operations on remote-mounted file systems using soft mounts
Contents of files you’ll see if you read a file that some other program is currently writing
Values in variables (especially register variables) that haven’t been initialized
Certain calls that allocate resources, like malloc (a library routine that obtains memory from the OS), might fail if the OS is running low on resources. That condition is provoked by the behavior of other programs, and hence is not identical on different runs.
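
Two of the items above, EINTR and the unpredictable byte counts returned when reading a pipe or TCP stream, can be hidden from the rest of a program with a small wrapper. The following is only a sketch under my own assumptions: the name read_full is hypothetical, and it assumes a blocking descriptor. It makes the data the caller sees independent of how the bytes happened to be chunked, although the timing of the call remains unpredictable.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `want` bytes from a blocking fd
 * unless end-of-file or a real error occurs.
 * Returns the number of bytes read (less than `want` only at EOF),
 * or -1 on error with errno set by read(). */
ssize_t read_full(int fd, void *buf, size_t want)
{
    size_t got = 0;
    char *p = buf;

    while (got < want) {
        ssize_t n = read(fd, p + got, want - got);
        if (n > 0) {
            got += (size_t)n;   /* short read: keep asking for the rest */
        } else if (n == 0) {
            break;              /* end-of-file */
        } else if (errno != EINTR) {
            return -1;          /* a real error, report it */
        }
        /* n < 0 with errno == EINTR: a signal interrupted us, so retry */
    }
    return (ssize_t)got;
}
```

With this wrapper, a receiver that asks for a 5-byte record gets the same 5 bytes whether the sender's data arrived in one piece or several.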

This is everything I can think of, but I suspect that I’ve left a number of items off! Notice that not all of these necessarily result in changes to the output behavior of the program. It really depends on what the program was doing when the event occurs!

The list for other mixtures of language and operating system may be different but my guess is that most items would be similar and that many would be identical.

b) For this same setting, explain what would need to be done to write a deterministic program.

A very simple program that doesn’t do I/O or use system calls except, perhaps, to read an input file and write its results would be deterministic. For example, if you run the sort program on the same input file, it will always give the same output unless someone tries to interrupt it. Most of the issues mentioned above arise in somewhat more complex programs, and a number of them arise only with concurrency or message-passing. So: a very simple program, given the same inputs, would produce the same results. This assumes that the library procedures used by that program are also very simple. One challenge, of course, is that we don’t really know what a library does when we call one of its procedures. For example, who knows when “printf” will decide to flush the output channel? So when we say “very simple” procedures, this may be quite a restriction!
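
As a rough illustration of what “very simple” means here, the sketch below reads integers from one stream, sorts them, and writes them to another. The names sort_ints and cmp_int are my own, not from any library. It touches no clock, no signals, and no shared state, and it flushes the output itself instead of leaving that decision to stdio. Even so, it can fail if malloc fails, so the honest claim is that any run that succeeds produces the same output.

```c
#include <stdio.h>
#include <stdlib.h>

/* Total order on ints; avoids the overflow that `a - b` could cause. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: read whitespace-separated integers from `in`,
 * sort them, and write one per line to `out`.
 * Returns the count written, or -1 if memory or the flush fails. */
long sort_ints(FILE *in, FILE *out)
{
    size_t n = 0, cap = 16;
    int *v = malloc(cap * sizeof *v);
    int x;

    if (v == NULL)
        return -1;

    while (fscanf(in, "%d", &x) == 1) {
        if (n == cap) {
            int *t = realloc(v, (cap * 2) * sizeof *v);
            if (t == NULL) {
                free(v);
                return -1;
            }
            v = t;
            cap *= 2;
        }
        v[n++] = x;
    }

    /* qsort is not stable, but equal ints are indistinguishable,
     * so the sorted output is still fully determined by the input. */
    qsort(v, n, sizeof *v, cmp_int);

    for (size_t i = 0; i < n; i++)
        fprintf(out, "%d\n", v[i]);
    free(v);

    /* Flush explicitly rather than relying on when stdio decides to. */
    if (fflush(out) != 0)
        return -1;
    return (long)n;
}
```

Calling sort_ints(stdin, stdout) from main would give a tiny numeric version of the sort example.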

It seems plausible that one could develop a program like “Purify” that could check for these conditions and warn that a program might give non-deterministic behavior. But it might give warnings on just about everything! And you would need source files to run this scan program – for the libraries used, too.

c) Revisit the NASA application from homework 4. How likely is it that NASA could live with the rules you proposed in part b?

Very unlikely. NASA has a general mix of computing applications in mind, so some will be fairly sophisticated and many will be non-deterministic for one or another of the reasons cited. NASA developed them long ago and may not even have the source files. The big image compression algorithms used to reduce the amount of data transmitted to the ground system are one example. Concurrency is a remedy for slow performance, hence NASA (having optimized its programs) will probably have a fairly concurrent, and so non-deterministic, job mix. Most likely, this means that when NASA “runs the same program multiple times, then compares the output” we’ll need a special-purpose comparison program, written separately for each application. But this type of special-purpose comparison might be feasible for lots of applications. So perhaps NASA can live with non-determinism, even if it can’t do naïve things like just running the program twice and checking to see if the behavior was absolutely identical.
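
To make the idea of a special-purpose comparison concrete, here is one possible sketch for a hypothetical application whose concurrent workers each emit one output record, so that two correct runs contain the same records in different orders. The function name same_records and the record format are assumptions of mine, not anything NASA uses. It treats each run's output as an unordered collection of strings and compares sorted copies.

```c
#include <stdlib.h>
#include <string.h>

/* Order two string pointers by the strings they point to. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical comparison: returns 1 if the two runs produced the same
 * records ignoring order, 0 if they differ, -1 if memory ran out. */
int same_records(const char *const *a, size_t na,
                 const char *const *b, size_t nb)
{
    if (na != nb)
        return 0;
    if (na == 0)
        return 1;

    /* Sort copies of the pointer arrays so the inputs stay untouched. */
    const char **sa = malloc(na * sizeof *sa);
    const char **sb = malloc(nb * sizeof *sb);
    int same = -1;

    if (sa != NULL && sb != NULL) {
        memcpy(sa, a, na * sizeof *sa);
        memcpy(sb, b, nb * sizeof *sb);
        qsort(sa, na, sizeof *sa, cmp_str);
        qsort(sb, nb, sizeof *sb, cmp_str);

        same = 1;
        for (size_t i = 0; i < na; i++) {
            if (strcmp(sa[i], sb[i]) != 0) {
                same = 0;
                break;
            }
        }
    }
    free(sa);
    free(sb);
    return same;
}
```

A naïve byte-for-byte comparison would reject two such runs, while this one accepts them. The price is that someone has to know, for each application, which differences are harmless.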