So far in this class we have considered sequential programs. Execution of a sequential program proceeds one step at a time, with no choice about which step to take next. Sequential programs are limited, because they are not very good at dealing with multiple sources of simultaneous input and because they can only execute on a single processor. For this reason, many modern applications are expressed as concurrent programs. There are many different approaches to concurrent programming, but they all share the fact that a program is split into multiple independent threads of execution. Each thread runs a sequential program, but the collection of threads no longer results in a single overall predictable sequence of execution steps. Rather, execution proceeds concurrently, resulting in potentially unpredictable order of execution for certain steps with respect to others.
The granularity of concurrent programming can vary widely, from coarse-grained techniques that loosely coordinate the execution of separate programs, such as pipes in Unix (or even HTTP between a Web server and its clients), to fine-grained techniques where concurrent code shares the same memory, such as lightweight threads. In this lecture we will consider the simple concurrency mechanisms provided in Jane Street's async library.
Concurrency is a powerful language feature that enables new kinds
of applications, but it also makes writing correct programs more
difficult, because execution of a concurrent program is
nondeterministic: the order in which things happen is not known ahead
of time. The programmer must think about all possible orders in which
the different threads might execute, and make sure that in all of them
the program works correctly. If the program is purely functional,
nondeterminism is easier to manage, because evaluation of an expression
always returns the same value no matter what. For example, the
expression (2*4)+(3*5) could be executed concurrently,
with the left and right products evaluated at the same time. The
answer would not change. Imperative programming is much more
problematic. For example, the expression
x := !x+1, if executed by two different threads,
could give different results depending on how the two threads' reads
and writes of x are interleaved.
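To make the imperative example concrete, the following sketch simulates two possible schedules for two threads that each execute x := !x+1. It is ordinary sequential OCaml. It assumes each thread performs the update as a separate read followed by a write. The function names sequential_schedule and interleaved_schedule are hypothetical and exist only for this illustration.

```ocaml
(* Each simulated "thread" performs x := !x + 1 in two steps:
   read x, then write back the value read plus one. *)

(* Schedule 1: thread A runs to completion, then thread B runs. *)
let sequential_schedule () =
  let x = ref 0 in
  let a = !x in   (* A reads 0 *)
  x := a + 1;     (* A writes 1 *)
  let b = !x in   (* B reads 1 *)
  x := b + 1;     (* B writes 2 *)
  !x

(* Schedule 2: both threads read before either one writes. *)
let interleaved_schedule () =
  let x = ref 0 in
  let a = !x in   (* A reads 0 *)
  let b = !x in   (* B reads 0 *)
  x := a + 1;     (* A writes 1 *)
  x := b + 1;     (* B writes 1, overwriting A's update *)
  !x

let () =
  Printf.printf "sequential:  %d\n" (sequential_schedule ());
  Printf.printf "interleaved: %d\n" (interleaved_schedule ())
```

The first schedule ends with x equal to 2 and the second with x equal to 1, even though both run exactly the same four steps.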
The async library attempts to combine the best aspects of threads and event loops. The simplest way to use async is through utop. (The first two commands update your OCaml packages and are optional.)
    % opam update
    % opam upgrade

Then invoke utop and load async:

    % utop
    utop # #require "async";;
    utop # open Async.Std;;

The library is organized around a collection of primitives built on the notion of a deferred computation. Documentation for async is available online.
A partial signature for the Async.Std module is as follows:
    module Std : sig
      module Deferred : sig
        type 'a t
        val return : 'a -> 'a Deferred.t
        val bind : 'a Deferred.t -> ('a -> 'b Deferred.t) -> 'b Deferred.t
        val map : 'a Deferred.t -> ('a -> 'b) -> 'b Deferred.t
        val both : 'a Deferred.t -> 'b Deferred.t -> ('a * 'b) Deferred.t
        val peek : 'a Deferred.t -> 'a option
        module List : sig
          val map : 'a list -> ('a -> 'b Deferred.t) -> 'b list Deferred.t
          val iter : 'a list -> ('a -> unit Deferred.t) -> unit Deferred.t
          val fold : 'a list -> 'b -> ('b -> 'a -> 'b Deferred.t) -> 'b Deferred.t
          val filter : 'a list -> ('a -> bool Deferred.t) -> 'a list Deferred.t
          val find : 'a list -> ('a -> bool Deferred.t) -> 'a option Deferred.t
          ...
        end
        ...
      end
      ...
    end

(After open Async.Std, return and the infix operators >>= and >>| are also available unqualified.)

The type 'a Deferred.t represents a deferred computation. Initially, the value encapsulated within a deferred computation will typically not be available. Such a deferred is called indeterminate. However, once the deferred becomes determined, its value can be accessed and used by the rest of the computation just like an ordinary value.
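The distinction between indeterminate and determined can be pictured with a toy model written in plain OCaml. This is only a sketch to build intuition and is not how async implements Deferred.t, which is abstract. The names state, deferred, create, and fill are hypothetical. Only peek mirrors a function from the signature above.

```ocaml
(* Toy model: a deferred is a mutable cell that starts out empty
   and can be filled with a value exactly once. *)
type 'a state =
  | Indeterminate
  | Determined of 'a

type 'a deferred = { mutable state : 'a state }

(* Create a deferred whose value is not yet available. *)
let create () = { state = Indeterminate }

(* Supply the value; a deferred may be determined only once. *)
let fill d v =
  match d.state with
  | Indeterminate -> d.state <- Determined v
  | Determined _ -> failwith "already determined"

(* Inspect the deferred without waiting for it. *)
let peek d =
  match d.state with
  | Indeterminate -> None
  | Determined v -> Some v

let () =
  let d : int deferred = create () in
  assert (peek d = None);       (* still indeterminate *)
  fill d 42;
  assert (peek d = Some 42)     (* now determined *)
```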
As an example to warm up, consider the following program, which
defines an internal function
f that prints out an integer
and then returns a deferred unit.
    open Async.Std

    let prog () =
      let f i =
        printf "Value %d\n" i;
        return () in
      Deferred.both
        (Deferred.List.iter [1;2;3;4;5] f)
        (Deferred.List.iter [1;2;3;4;5] f)

    (* Start the async scheduler (needed when run as a standalone program). *)
    let () = ignore (Scheduler.go ())

The function Deferred.List.iter applies a function that returns a deferred unit to each element of a list and combines the resulting deferred units into a single deferred unit. Similarly, the both function combines a pair of deferreds into a single deferred pair.
When read sequentially, this program would simply print the integers from 1 to 5 twice. However, when read concurrently, as in async, the calls to printf can be interleaved. For example:
    # prog ();;
    - : (unit * unit) Deferred.t = <abstr>
    Value 1
    Value 1
    Value 2
    Value 2
    Value 3
    Value 3
    Value 4
    Value 4
    Value 5
    Value 5

The cause of this behavior is that the deferred values are executed concurrently, as determined by the scheduler. Hence, the values printed to the console may appear in a different order than would be specified using the normal sequential control flow of the program.
The simplest way to create a deferred computation is to use the return function:
    # let d = return 42;;
    val d : int Deferred.t = <abstr>

It produces a deferred value that is determined immediately, as can be verified using the peek function:

    # Deferred.peek d;;
    - : int option = Some 42
A more interesting way to create a deferred computation is to combine two smaller deferred computations sequentially. The bind operator, written infix as >>=, takes the result of one deferred computation and feeds it to a function that produces another deferred computation:

    # let d = return 42 >>= fun n -> return (n,3110);;
    val d : (int * int) Deferred.t = <abstr>

Operationally, execution of this expression proceeds as follows: when the first computation becomes determined, its value is supplied to the function, which then produces another deferred computation. The overall computation is determined when this second deferred is determined. The idiom used in the above code snippet can be used to implement the both function described previously:
    let both (d1:'a Deferred.t) (d2:'b Deferred.t) : ('a * 'b) Deferred.t =
      d1 >>= fun v1 ->
      d2 >>= fun v2 ->
      return (v1,v2)

This function waits until d1 is determined and passes the resulting value to the first function as v1, then waits until d2 is determined and passes the resulting value to the second function as v2, which returns the pair (v1,v2) in a new deferred computation.
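The operational reading of bind can be sketched with a small callback-based model in plain OCaml. This is an illustration under simplifying assumptions, not the async implementation: callbacks here run immediately when a value is filled in, whereas async hands them to its scheduler. The names deferred, create, fill, and upon are hypothetical helpers for this sketch.

```ocaml
(* Toy model: a deferred holds an optional value plus the callbacks
   waiting for that value. *)
type 'a deferred = {
  mutable value : 'a option;
  mutable waiters : ('a -> unit) list;
}

let create () = { value = None; waiters = [] }

(* Determine d with v and run every callback that was waiting on it. *)
let fill d v =
  match d.value with
  | Some _ -> failwith "already determined"
  | None ->
    d.value <- Some v;
    let ws = List.rev d.waiters in
    d.waiters <- [];
    List.iter (fun w -> w v) ws

(* Run k on d's value, now if it is available, otherwise later. *)
let upon d k =
  match d.value with
  | Some v -> k v
  | None -> d.waiters <- k :: d.waiters

let return v = { value = Some v; waiters = [] }

(* When d is determined, apply f to its value; the result of bind is
   determined when the deferred returned by f is determined. *)
let bind d f =
  let result = create () in
  upon d (fun v -> upon (f v) (fun w -> fill result w));
  result

let ( >>= ) = bind

let both d1 d2 =
  d1 >>= fun v1 ->
  d2 >>= fun v2 ->
  return (v1, v2)

let () =
  let d1 : int deferred = create () in
  let d2 : string deferred = create () in
  let pair = both d1 d2 in
  assert (pair.value = None);          (* neither input is determined *)
  fill d2 "b";
  assert (pair.value = None);          (* still waiting on d1 *)
  fill d1 1;
  assert (pair.value = Some (1, "b"))  (* both inputs are determined *)
```

In this model the pair becomes determined only after both inputs have been filled, regardless of the order in which that happens.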
A more interesting example of composing deferreds arises with programs that read from and write to the file system. I/O is a particularly good match for concurrent programming using deferreds, because I/O operations can often block, depending on the behavior of the operating system and underlying devices. For example, a read may block waiting for the disk to become available, or for the disk controller to move the read head to the appropriate place on the physical disk itself. The async library includes variants of the standard functions for opening and manipulating files. For example, here is a small snippet of the Reader module:
    module Read_result : sig
      type 'a t = [ `Eof | `Ok of 'a ]
      ...
    end

    module Reader : sig
      val open_file : string -> t Deferred.t   (* optional arguments omitted *)
      val read_line : t -> string Read_result.t Deferred.t
      ...
    end

The type Read_result.t is a polymorphic variant, and uses some new notation. For the purposes of this lecture, it can be treated as an ordinary datatype (whose constructors are prefixed with the backtick symbol, "`").
Using these functions, we can write a function that reads in the contents of a file:
    let file_contents (fn:string) : string Deferred.t =
      let rec loop (r:Reader.t) (acc:string) : string Deferred.t =
        Reader.read_line r >>= fun res ->
        match res with
        | `Eof -> return acc
        | `Ok s -> loop r (acc ^ s) in
      Reader.open_file fn >>= fun r ->
      loop r ""

Note that each I/O operation is encapsulated in a deferred computation, so the async scheduler is free to interleave them with other computations that might be executing concurrently, e.g., another deferred computation also performing I/O.
Going a step further, we can write a function that computes the number of characters in a file:
    let file_length (fn:string) : int Deferred.t =
      file_contents fn >>= fun s ->
      return (String.length s)

This pattern of sequencing a deferred computation with a computation that consumes the value and immediately returns a value is so common that the async library includes a primitive for implementing it directly:

    val map : 'a Deferred.t -> ('a -> 'b) -> 'b Deferred.t

The map function can be written infix as >>|. Hence, the above function could be written more succinctly as:

    let file_length (fn:string) : int Deferred.t =
      file_contents fn >>| String.length

Note that String.length is passed directly as a function value, without being applied to an argument.
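The relationship between map and bind can be checked with a deliberately tiny stand-in for deferreds, in which a deferred is just a computation that delivers its result to a callback. This is an assumption made for illustration and not the async representation. The file_contents stub, which returns a fixed string instead of reading a file, is likewise hypothetical.

```ocaml
(* Stand-in for a deferred: a computation that passes its result
   to a callback. *)
type 'a deferred = ('a -> unit) -> unit

let return (v : 'a) : 'a deferred = fun k -> k v

let ( >>= ) (d : 'a deferred) (f : 'a -> 'b deferred) : 'b deferred =
  fun k -> d (fun v -> f v k)

(* map expressed with bind and return *)
let ( >>| ) (d : 'a deferred) (f : 'a -> 'b) : 'b deferred =
  d >>= fun v -> return (f v)

(* Stub standing in for real file I/O. *)
let file_contents (_fn : string) : string deferred = return "hello world"

(* The two formulations of file_length compute the same result. *)
let file_length_bind fn =
  file_contents fn >>= fun s -> return (String.length s)

let file_length_map fn =
  file_contents fn >>| String.length

let () =
  file_length_bind "notes.txt" (fun n -> Printf.printf "bind: %d\n" n);
  file_length_map "notes.txt" (fun n -> Printf.printf "map:  %d\n" n)
```

Both versions deliver 11, the length of the stubbed contents.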
In the next few lectures, we will see further examples of creating and programming with deferred computations using async.