SML code examples for this lecture
So far in this class we've been talking about sequential programs. Execution of a sequential program proceeds one step at a time according to the evaluation rules, with no choice about which step to take next. We saw this in the various SML semantics that we explored earlier. Sequential programs are somewhat limited because they are not very good at dealing with multiple sources of simultaneous input. For this reason, many modern applications are concurrent (or multi-threaded, parallel): there are multiple threads of execution concurrently executing in parallel.
For example, a web browser must be simultaneously handling input from the user interface, reading and rendering web pages incrementally as new data comes in, and running embedded programs written in Java, Javascript and other languages. All these activities must happen at the same time, so separate threads are used to handle each of them. Another example of a naturally concurrent application is a web crawler, which traverses the web collecting information about its structure and content. It doesn't make sense for the web crawler to access sites sequentially, because most of the time would be spent waiting for the remote server and network to respond to each request. Therefore, a typical web crawler is highly concurrent, simultaneously accessing thousands of different web sites. This design uses the processor and network efficiently.
Concurrency is a powerful language feature that enables new kinds of applications, but it also makes writing correct programs more difficult, because execution of a concurrent program is nondeterministic: the order in which things happen is not known ahead of time. The programmer must think about all possible orders in which the different threads might execute, and make sure that the program works correctly in all of them. If the program is purely functional, nondeterminism is not a problem because evaluation of an expression always returns the same value no matter what. For example, the expression (2*4)+(3*5) could be executed concurrently, with the left and right products evaluated at the same time; the answer would not change. Imperative programming is much more problematic. For example, the expressions (!x) and (a := !a+1), if executed by two different threads, could give different results depending on which thread executed first, if it happened that x and a were the same ref.
A few modern languages directly support concurrent programming. Java is one. Languages like C and C++ don't directly support concurrency, though most operating systems allow concurrent programs to be written in these languages, somewhat awkwardly. It turns out that the SML distribution includes Concurrent ML (CML), an extension to SML that supports a relatively clean model of concurrent programming. Concurrent ML is found in the sml/src/cml directory of the distribution. To execute a program in CML, you use the function RunCML.doit:
structure RunCML = struct
  (* doit(f, t) evaluates the expression f() with thread quantum t.
   * It returns the return status of the program. *)
  val doit: (unit -> unit) * (Time.time option) -> Word32.word
  ...
end
The thread quantum is the amount of time that a processor will work on executing any one thread before switching to another thread. Although we think of the machine as running all the threads at once, it is much more efficient for a processor to execute one at a time; for one thing, the various caches work better. As long as the quantum is sufficiently small (usually, a few milliseconds), it isn't noticeable. The machine may have multiple processors that can each work on running a separate thread, but the semantics of running a program don't change depending on the number of processors. A concurrent system has a scheduler that decides what thread to run on a given processor. When the current thread's quantum expires, the scheduler is invoked.
CML provides a special operation that creates a new thread:
structure CML = struct
  (* spawn(f) creates a new thread that evaluates the expression f()
   * concurrently with the current thread. It returns the thread
   * identifier of the new thread. *)
  val spawn : (unit -> unit) -> thread_id
  ...
end
For example, we can write a program that spawns two threads that generate output:
- fun prog() = (CML.spawn (fn() => print "hello!"); print "goodbye!");
- val q = Time.fromMilliseconds(1);
- RunCML.doit(prog, SOME q);
There are two possible executions of this code: it might print "hello!goodbye!" or "goodbye!hello!", depending on whether the spawned thread gets to run first or its parent thread does. If we care which one we get, this code won't do.
You've probably noticed that the computation of a thread is given type unit->unit, which doesn't give a lot of opportunity for a thread to send a result back to its parent thread. For example, if the web browser spawns a thread to read an image embedded in a web page, it needs to get the actual image data back from that thread. One obvious way to accomplish this is using refs. Here is a circuitous way to add two numbers:
fun prog() =
  let
    val result = ref 0
  in
    CML.spawn (fn() => result := 2+2);
    print(Int.toString(!result))
  end
If we're lucky this will work, but what if the parent tries to access the contents of result before it is updated? In that case we'll read the original 0. Assuming that we know the result isn't zero, we could try to wait until it gets updated:
fun prog() =
  let
    val result = ref 0
    fun wait() = if !result = 0 then wait() else ()
  in
    CML.spawn (fn() => result := 2+2);
    wait();
    print(Int.toString(!result))
  end
This is an example of a primitive synchronization technique known as spinning. Two threads synchronize when they each figure out what the other thread is doing. In this case we don't want the printing thread to print until the computing thread is done. On a single-processor system, this is probably an unsatisfactory synchronization technique because the parent thread might waste processor time waiting for the result to arrive. It can make sense in a multiprocessor system if the expected spinning duration is small. (CML provides a function yield() that allows a thread to give up its quantum, which can be helpful.)
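For instance, the spinning loop above could surrender its quantum on each iteration, so the computing thread gets a chance to run sooner. This is only a sketch; it assumes CML.yield takes unit and returns unit, as described above:

```sml
fun prog() =
  let
    val result = ref 0
    (* spin, but give up the rest of our quantum each time
     * around the loop so the computing thread can make progress *)
    fun wait() =
      if !result = 0
      then (CML.yield(); wait())
      else ()
  in
    CML.spawn (fn() => result := 2+2);
    wait();
    print(Int.toString(!result))
  end
```

On a single processor this avoids burning whole quanta on a loop that cannot make progress until the other thread runs.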
For real programs we need more powerful synchronization techniques. Consider what happens if we write a simple web server that allows money transfers between two accounts (represented as refs). A web server typically spawns threads to handle each incoming request. We could easily end up with code with an effect like the following:
fun prog() =
  let
    val acct_from = ref 1000
    val acct_to = ref 1000
    fun transfer(n: int) =
      (acct_from := !acct_from - n;
       acct_to := !acct_to + n)
  in
    CML.spawn(fn() => transfer(100));  (* thread 1 *)
    CML.spawn(fn() => transfer(100));  (* thread 2 *)
    print(Int.toString(!acct_from) ^ " " ^ Int.toString(!acct_to))
  end
Clearly, we would expect this to print out "800 1200". But it might not, because the threads can be scheduled in other ways. Each thread does a read and a write of each of acct_to and acct_from. Consider some possible orders of execution on a single-processor machine:
thread 1                    thread 2
read acct_from (1000)
write acct_from (900)
read acct_to (1000)
write acct_to (1100)
                            read acct_from (900)
                            write acct_from (800)
                            read acct_to (1100)
                            write acct_to (1200)
Result: 800 1200
thread 1                    thread 2
read acct_from (1000)
                            read acct_from (1000)
write acct_from (900)
                            write acct_from (900)
read acct_to (1000)
write acct_to (1100)
                            read acct_to (1100)
                            write acct_to (1200)
Result: 900 1200
With the second, entirely possible schedule of execution, $100 is manufactured from thin air. Worse yet, we could test this code quite a bit and have it return the right result every time, yet when deployed as a product it would occasionally create or consume money. The problem is that we really cannot allow two threads to execute the transfer code at the same time; it is an example of a critical section, code that only one thread should be able to run at a time.
This kind of problem is the reason for the synchronized statement and attribute in Java. In Java we could wrap synchronized around the whole transfer function, preventing the interleaved executions shown above. Another language feature that can be used to prevent interleaved access is locks. One thread acquires a lock, does the transfer, and releases the lock. If a thread tries to acquire a lock that is currently held by another thread, it blocks until the first thread releases the lock. This kind of simple lock is known as a mutex, for "mutual exclusion". Locks are difficult to program with if there is more than one lock, because of the possibility of deadlock, when two or more threads each try to acquire a lock that another one holds, e.g.
thread 1        thread 2
acquire(L1)
                acquire(L2)
...             ...
acquire(L2)
                acquire(L1)
In this example both threads will block and the program will stop. Debugging programs to eliminate deadlocks can be very difficult.
These mutual exclusion features (such as synchronized and mutexes) can be implemented using just refs, but it turns out to be amazingly difficult to get right; for this reason they are usually provided as primitives.
What we have just been describing is known as a shared-memory approach to thread communication, because the state of refs is shared among the various threads. Shared-memory communication does not work in all concurrent programming models; for example, the standard programming model of Unix (Linux, etc.) is based on processes rather than threads. The major difference is that processes do not share any state; a spawned process gets a copy of the state of its parent process.
CML discourages communication through refs; instead, it takes the other major approach to managing thread communication and synchronization, called message-passing. Message passing has the benefit of being easier to reason about, and also easier to implement in a distributed system. In CML, threads communicate and synchronize using channels, mailboxes, and events. (These are terms specific to CML.) Channels and mailboxes provide the ability to deliver values from one thread to another. Events give a thread the ability to synchronize on activity by multiple other threads.
structure CML = struct
  ...
  type 'a chan
  val channel: unit -> 'a chan
  val send: 'a chan * 'a -> unit
  val recv: 'a chan -> 'a
  ...
end
A value of type T chan is a channel that transmits values of type T. A new channel is created using channel. The channel allows two threads to synchronize: a sending thread and a receiving thread. When a thread evaluates send(c,x) for some channel c and message value x, it blocks waiting for some thread to receive the value by calling recv(c). Once one thread is waiting on send and another on recv, the value x is transferred and becomes the result of the recv. The two threads then both resume execution. Similarly, if a thread performs a recv(c) but there is no other thread doing a send already, the receiving thread blocks waiting for a sender. This is known as synchronous message-passing because the sender and receiver synchronize at the moment that the message is delivered.
Here is a simple example of using channels:
open CML
fun prog() =
  let
    val c1: int chan = channel()
  in
    spawn (fn() => send(c1,2));
    spawn (fn() => print(Int.toString(recv(c1))));
    ()
  end
structure Mailbox = struct
  type 'a mbox
  val mailbox : unit -> 'a mbox
  val send : ('a mbox * 'a) -> unit
  val recv : 'a mbox -> 'a
  ...
end
Mailboxes provide asynchronous messages: the sender does not wait for the receiver before going on. Otherwise they act like channels. A mailbox provides a FIFO message queue: messages are delivered in the order they were sent. This is important because a mailbox can contain a large number of messages. Mailboxes can be implemented using channels and threads; it's a good exercise to think about how to do this.
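As a small illustration of the asynchrony, neither send below blocks, even though no receiver exists when they execute. This is a sketch using only the Mailbox signature above:

```sml
open CML
fun prog() =
  let
    val mb: int Mailbox.mbox = Mailbox.mailbox()
  in
    (* both sends complete immediately; the messages are queued *)
    Mailbox.send(mb, 1);
    Mailbox.send(mb, 2);
    (* the receiver drains the queue in FIFO order: 1, then 2 *)
    spawn (fn() =>
      (print(Int.toString(Mailbox.recv(mb)));
       print(Int.toString(Mailbox.recv(mb)))));
    ()
  end
```

With a channel in place of the mailbox, the first send would block until the receiving thread ran; here the parent continues immediately.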
Concurrent applications need the ability to select from several different possible input sources. CML provides this ability through the event abstraction:
structure CML = struct
  ...
  val recvEvt: 'a chan -> 'a event
  val select: 'a event list -> 'a
  ...
end

structure Mailbox = struct
  val recvEvt: 'a mbox -> 'a event
  ...
end
Given a channel or a mailbox, we can generate a corresponding event to synchronize on. Given a list of events, the select function blocks until one of the events arrives, then reads from the corresponding channel or mailbox. Without select the program can only test for incoming data on one channel at a time, blocking if there is no data. In Unix there is a system call select that provides similar functionality.
Using events we can write an extended version of the banking example from earlier. Since we want only one thread to be able to do the update at a time, we invent a thread whose job that is. This thread also processes requests to read the balance, because otherwise a read might be interleaved with an update, resulting in inconsistent account balances. Other threads communicate with it via channels:
open CML
fun prog() =
  let
    val c1: int chan = channel()
    val e1 = recvEvt(c1)
    val c2: int chan = channel()
    val e2 = recvEvt(c2)
  in
    spawn(fn() => send(c1,100));
    spawn(fn() => send(c2,100));
    spawn(fn() =>
      let
        val acct_from = ref 1000
        val acct_to = ref 1000
        fun server() = (
          let val amount = select([e1,e2])
          in
            acct_from := !acct_from - amount;
            acct_to := !acct_to + amount
          end;
          server())
      in
        server()
      end);
    print "main thread done"
  end
(What if we wanted the server to send back results? What kind of channel could we use then?)
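One possible answer, sketched below: each client creates its own reply channel and sends it along with the request, and the server answers on that channel. The request type and the names reqChan, reply, client, and server are our inventions for this sketch:

```sml
open CML
fun prog() =
  let
    (* each request carries the amount and a channel for the answer *)
    val reqChan: (int * int chan) chan = channel()
    fun client(amount) =
      let val reply: int chan = channel()
      in
        send(reqChan, (amount, reply));
        print("balance now " ^ Int.toString(recv(reply)) ^ "\n")
      end
    fun server(balance) =
      let val (amount, reply) = recv(reqChan)
          val balance' = balance - amount
      in
        send(reply, balance');  (* answer this client only *)
        server(balance')
      end
  in
    spawn(fn() => server(1000));
    spawn(fn() => client(100));
    spawn(fn() => client(100));
    ()
  end
```

Because the reply channel is private to one client, the server's answer cannot be intercepted by the other client.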
A thread may also want to select from a number of different channels to send output on. In this case it might want to choose a channel on which there is already a receiver waiting. Send events provide this functionality. A send event is created by using the sendEvt function:
val sendEvt: 'a chan * 'a -> unit event
Selection on a send event created with sendEvt(c,v) enables it to send the value v when there is a receiver waiting on the channel c. The select call then returns a unit value to indicate that the send has occurred.
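For example, a thread holding one value could offer it on two channels at once; select performs whichever send finds a waiting receiver first. This is a sketch: which receiver gets the value is nondeterministic, and the receiver on the other channel simply remains blocked.

```sml
open CML
fun prog() =
  let
    val c1: int chan = channel()
    val c2: int chan = channel()
  in
    spawn(fn() => print("c1 got " ^ Int.toString(recv(c1))));
    spawn(fn() => print("c2 got " ^ Int.toString(recv(c2))));
    (* offer 42 on both channels; only one send actually happens *)
    spawn(fn() => select([sendEvt(c1, 42), sendEvt(c2, 42)]));
    ()
  end
```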
In general a CML thread may want to wait on various different events, with different associated types. The events cannot be put onto a common event list because the types are not equal. Events can be wrapped to give them a different type:
val wrap: 'a event * ('a -> 'b) -> 'b event
This allows simultaneous selection on receive and send events, for example. It also helps keep track of which of several channels delivered an event. In the server example above, we might want to know which client thread sent a value, which can be accomplished by tagging the request:
let val (client: int, amount: int) =
      select([wrap(e1, fn(a) => (1,a)),
              wrap(e2, fn(a) => (2,a))])
When a value arrives on the channel, the function wrapped around the event is automatically applied to that value.
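These combinators are also enough for the mailbox exercise mentioned earlier: a buffer thread keeps the pending messages in a FIFO list and uses select either to accept a new message or to deliver the oldest one, whichever partner arrives first. This is only a sketch; the names mailbox, inChan, outChan, and buffer are ours:

```sml
open CML
fun mailbox() =
  let
    val inChan = channel()    (* senders use send(inChan, m) *)
    val outChan = channel()   (* receivers use recv(outChan) *)
    (* the buffer thread's state is the FIFO list of queued messages *)
    fun buffer([]) = buffer([recv(inChan)])  (* empty: wait for a sender *)
      | buffer(queue as m::rest) =
          buffer(select([
            (* a new message arrives: append it to the queue *)
            wrap(recvEvt(inChan), fn(m') => queue @ [m']),
            (* a receiver is waiting: deliver the oldest message *)
            wrap(sendEvt(outChan, m), fn() => rest)
          ]))
  in
    spawn(fn() => buffer([]));
    (inChan, outChan)
  end
```

Sends never block (the buffer thread is always willing to accept), while receives block only when the queue is empty, which is exactly the mailbox behavior described above.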
Concurrent ML home page
John H. Reppy, Concurrent Programming in ML, Cambridge University Press, 1999.