BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR92-1285
ENTRY:: 1993-10-14
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: Higher-Order Concurrency
AUTHOR:: Reppy, John H.
DATE::  June 1992
PAGES:: 195
COPYRIGHT:: John Hamilton Reppy - All Rights Reserved
ABSTRACT::
Concurrent programming is a useful technique for structuring many important 
classes of applications such as interactive systems. This dissertation 
presents an approach to concurrent language design that provides a new form of 
linguistic support for constructing concurrent applications. This new approach 
treats synchronous operations as first-class values in a way that is analogous 
to the treatment of functions as first-class values in languages such as ML. 
The mechanism is set in the framework of the language Concurrent ML (CML), 
which is a concurrent extension of Standard ML. CML has a domain of 
first-class values, called events, that represent synchronous operations. 
Synchronous message passing operations are provided as the base-event values, 
and combinators are provided for constructing more complex events from other 
event values. This mechanism allows programmers to define new synchronization 
and communication abstractions that are first-class citizens, which gives 
programmers the flexibility to tailor their concurrency abstractions to their 
applications.

The dissertation is organized into three technical parts. The first part 
describes the design and rationale of CML and shows how first-class 
synchronous operations can be used to implement many of the communication 
mechanisms found in other concurrent languages. The second part presents the 
formal operational semantics of first-class synchronous operations and proves 
that the polymorphic type system used by CML is sound. The third part 
addresses practical issues. It describes the use of CML in non-trivial 
applications, describes the implementation and performance of CML on a 
single-processor computer, and discusses issues related to the use and 
implementation of CML on a shared-memory multiprocessor.
END:: CORNELLCS//TR92-1285
BODY::
Higher-Order Concurrency
John Hamilton Reppy
Ph.D Thesis
92-1285
June 1992
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
HIGHER-ORDER CONCURRENCY
A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
in Partial FulfIllment of the Requirements for the Degree of
Doctor of Philosophy
by
John Hamilton Reppy
January 1992
COPYRIGHT Oc Joim Haini1to? Reppy 1992
ALL RIGHTS RESERVED
HIGHER-ORDER CONCURRENCY
John Hamilton Reppy, Ph.D.
Cornell University 1992
Concurrent programming is a useful technique for structuring many important classes of ap-
plications such as interactive systems. This dissertation presents an approach to concurrent
language design that provides a new form of linguistic support for constructing concurrent
applications. This new approach treats synchronous operations as first-class values in a way
that is analogous to the treatment of functions as first-class values in languages such as ML.
The mechanism is set in the framework of the language Concurrent ML (CML), which
is a concurrent extension of Standard ML. CML has a domain of first-class values, called
events, that represent synchronous operations. Synchronous message passing operations are
provided as the base-event values, and combinators are provided for constructing more com-
plex events from other event values. This mechanism allows programmers to define new
synchronization and communication abstractions that are first-class citizens, which gives
programmers the flexibility to tailor their concurrency abstractions to their applications.
The dissertation is organized into three technical parts. The first part describes the
design and rationale of CML and shows how first-class synchronous operations can be used
to implement many of the communication mechanisms found in other concurrent languages.
The second part presents the formal operational semantics of first-class synchronous oper-
ations and proves that the polymorphic type system used by CML is sound. The third
part addresses practical issues. It describes the use of CML in non-trivial applications,
describes the implementation and performance of CML on a single-processor computer,
and discusses issues related to the use and implementation of CML on a shared-memory
multiprocessor.
In memory of
David A. Reppy
1962--H1988
Acknowledgements
I would first, and foremost, like to thank my parents for all of their support and love during
my never-ending education.
My committee members, Tim Teitelbaum, Bard Bloom and Anil Nerode, provided me
with feedback on this research. I would especially like to thank Tim, my committee chair-
man, for teaching me CS1OO many years ago; it was this experience that set me on the road
to becoming a computer scientist. Robert Cooper, who was on my committee in all but
name also provided useful technical criticism. Emden Gansner, Lal George, Tim Griflin,
Dave MacQueen, and Cynthia McMillin helped with the proof reading of this dissertation.
I have benefited from a long collaboration with members of the Advanced Software
Department at AT&T Bell Laboratories. I would like to thank Dave Belanger, Chandra
Kintala, and David Korn, who provided support for my projects, and Emden Gansner and
Steve North, who were my principal collaborators. Many of the ideas in this dissertation
were born out of my work at Bell. I have also spent many hours discussing these, and other,
ideas with Andrew Appel, Dave MacQueen, Chet Murthy and Prakash Panangaden. And
my work on SML implementation has benefited from discussions with Bffi Aitken.
There are many other people who have helped me along the way. Pam and Cliff Surko
provided me with both encouragement and a home away from home. And I would like to
thank Bharathi and Chandra Kintala for their hospitality while I was working at Bell Labs.
I would like to thank Liz Maxwell for helping me with the difficult task of tracking my
advisor, and Jan Batzer, who runs interference with the Cornell bureaucracy for all of the
Computer Science graduate students.
The research presented in this dissertation was supported, in part, by the NSF and ONR
under NSF grant CCR-85-14862, and by the NSF under NSF grant CCR-89-18233.
111
iv
Contents
I Introduction
1			Introduction			3
1.1			The case for concurrency3
1.1.1			Interactive systems4
1.1.2			Distributed systems						5
1.2			Overview of this dissertation5
1.2.1			Design6
1.2.2			Theory6
1.2.3			Practice7
1.3			History7
2			An			Introduction to SML			9
2.1			Basics			. . 			. 			9
2.1.1			Basic values and expressions9
2.1.2			Tuples and records10
2.1.3			Punctions and polymorphism11
2.2			Datatypes and pattern matching			12
2.2.1			Datatypes12
2.2.2			Pattern matching13
2.2.3			Lists			14
2.2.4			Abstract types15
2.3			Imperative features16
2.3.1			References			16
2.3.2			Exceptions17
2.3.3			Continuations18
v
2.4 All example --H fuuctional queues
19
II			Design			21
3 Concurrent Programming Languages			23
3.1			Processes and threads			24
3.2			Shared-memory languages25
3.2.1			Low-level synchronization mechanisms25
3.2.2			Monitors			27
3.2.3			Shared-memory concurrency and ML27
3.3			Distributed-memory languages28
3.3.1			Asynchronous message passing . . 			29
3.3.2			Synchronous message passing			30
3.3.3			Asynchronous vs. synchronous message passing31
3.3.4			Request-reply message passing			32
3.3.5			Putures34
3.3.6			Message passing and ML34
3.4			Summary			35
4			First-class			Synchronous Operations			37
4.1			Basic concurrency primitives			38
4.2			Selective communication vs. abstraction39
4.3			First-class synchronous operations40
4.4			Other			synchronous operations42
4.5			Extending PML events43
4.5.1			Guards44
4.5.2			Abort actions 46
4.6			CML			summary47
4.6.1			Thread garbage collection49
4.6.2			Stream 1/0			49
5			Building Concurrency Abstractions			51
5.1			Buffered channels . . . .			51
5.2			Multicast channels. . .			53
vi
5.3			Condition v&iables . . . 			54
5.4			Ada-style rendezvous57
5.5			Futures62
III			Theory			65
6			Theory Preliminaries			67
6.1			Notation67
6.2			Formal semantics68
6.2.1			Syntax of Av68
6.2.2			Dynamic semantics of Av69
6.2.3			Typing Av			71
6.2.4			Properties of typed Av74
7			The Operational Semantics of A?			77
7.1			Syntax . 77
7.1.1 Syntactic sugar79
7.2			Dynamic semantics			79
7.2.1			Sequential evaluation79
7.2.2			Event matching81
7.2.3			Concurrent evaluation			83
7.3			Traces84
7.4			Fairness86
7.5			Extending A??			87
7.5.1			Recursion88
7.5.2			References88
7.5.3			Exceptions			89
7.5.4			Process join92
7.5.5			Polling93
8 Typing A?			95
8.1 Static			semantics97
8.1.1			Expression typing rules			99
8.1.2			Process typings			102
vii
8.2			Type soundness . . 102
8.2.1			The Substitution and Replacement lemmas . 103
8.2.2			Subject reduction104
8.2.3			Stuck expressions			108
8.2.4			Soundness108
Iv Practice
111
9 Applications			113
9.1			eXene: A multi-threaded X window system toolkit			113
9.1.1			An overview of eXene114
9.1.2			An X window system overview114
9.1.3			The architecture of eXene			114
9.1.4			Promises in eXene118
9.1.5			Menus120
9.2			Interactive applications			121
9.3			Distributed systems progranuning123
9.3.1			Distributed ML123
9.4			Other			applications of CML			124
10 Implementation			125
10.1			The implementation of SML/NJ			125
10.1.1			First-class continuations126
10.1.2			The compiler126
10.1.3			The run-time system			127
10.2			Implementing threads129
10.2.1			Threads129
10.2.2			Preemptive scheduling			130
10.3			Implementing channels130
10.4			Impku??nting events132
10.4.1			Event value representation133
10.4.2			Synchronization . . 			133
10.4.3			Base-event constructors137
Vili
10.4.4			Event combinators139
10.5			Implementing 1/0141
10.5.1			Low-level 1/0 support			. 						141
10.5.2			Stream 1/0142
10.6			Implementation improvements144
11 Performance			147
11.1			The benchmarks147
11.1.1			Timing results			148
11.1.2			Instruction counts149
11.2			Analysis150
11.2.1			Garbage collection overhead150
11.3			Comparison with the ?System150
12 Multiprocessors			153
12.1			Parallel programming in CML154
12.1.1			Pipelining and data-flow154
12.1.2			Controllingparallellsm			. . 			155
12.1.3			Speculative parallelism157
12.1.4			I-structures			159
12.1.5			M-structures159
12.2			Multiprocessor implementation161
12.2.1			Concnrrency control			161
12.2.2			Generalized selective communication161
12.2.3			Thread scheduling162
12.2.4			Memory management163
12.3			The outlook for multiprocessor CML			.			164
v Conclusion
165
13 Future Work			167
13.1			Design167
13.2			Theory						168
13.3			Practice168
ix
14 Conclusion
Bibliography
171
173
Appendix			183
A Proofs from Chapter 8			185
Proofs from Chapter 8			185
ProofofLemma8.5185
ProofofLemma8.8191
ProofofLemma8.12 . . .			192
x
List of Tables
2.1			SML ground types10
4.1			Relating first-class functions and events			42
11.1			Benchm&k machines			147
11.2			CML benchmarks149
11.3			MIPS instruction counts149
11.4			Cost of abstraction150
11.5			iSystembenchinarks151
xi
nj
List of Figures
2.1			A queue implementation19
3.1			Asynchronous message passing			. . 			. . 						29
3.2			Rendezvous30
3.3			Request-reply rendezvous32
4.1			Basic concurrency primitives38
4.2			Basic event operations			41
4.3			Other primitive synchronous operations43
4.4			CML concurrency operations47
5.1			CML implementation of buffered channels			52
5.2			CML implementation of multicast channels55
5.3			CML implementation of condition variables56
5.4			CML implementation of Ada rendezvous			57
5.5			A lock manager using conditional accept59
5.6			CML implementation of conditional entry abstraction60
5.7			CML implementation of futures 63
6.1			Type inference rules for Av
73
7.1			Grammar for A?? 			78
7.2			Rules for event matching82
7.3			liriplementing references89
7.4			Implementing references without recursion			89
7.5			Rules for matching events in process sets93
8.1			Core type inference rules for Acv100
xiii
8.2			Other type inference rules for A??101
9.1			The display message-passing architecture115
9.2			The screen message-passing architecture			116
9.3			The top-level window message-passing architecture117
9.4			The Copylrea operation118
9.5			Synchronous text scrolling			119
9.6			The implementation of copyArea120
9.7			Asynclironous text scrolling121
9.8			Graph-o-matica screen dump122
10.1			The representation of channels131
10.2			The implementation of send131
10.3			The representation of event values			134
10.4			The representation of event status135
10.5			Event logging136
10.6			The implementation of always						137
10.7			The implementation of transmit138
10.8			Low-level 1/0 support143
12.1			Work crew job decomposition156
12.2			CML implementation of M-structure variables160
xiv
Part I
Introduction
Chapter 1
Introduction
Abstraction is perhaps the most important tool that programmers have for managing the
complexity of software design and implementation. There are various language mechanisms
for promoting abstraction, such as procedures for hiding the details of computation, abstract
data-typ?? ft, hiding r?pre??itation inforiiiatiuii, and modules for grouping related types
and operations with an abstract interface. This dissertation describes a new language
mechanism for supporting abstraction of communication and synchronization in concurrent
programs. My approach is to treat synchronous operations as first-class values in a way that
is analogous to the treatment of functions as first-class values in languages such as ML.
By doing so, a small collection of primitive operations and combinators can support a wide
range of different concurrency paradigms. I call this style of programming "higher-order
concurrent programming," as an analogy with higher-order programming in languages such
as ML.
This work is set in the context of Standard ML (SML) [MTH9O]. I have developed
a language, called Concurrent ML (CML), that extends SML with multiple threads
of control and first-class synchronous operations. CML is implemented on top of the
Standard ML of New Jersey (SML/NJ) system [AM87, AM9l]. while the discussion
of this dissertation uses CML as the archetype, the language design principles are easily
applied to other higher-order languages (e.g., Quest [Car89]), and should also be applicable
to object-oriented languages such as Modula-3.
1.1 The case for concurrency
Concurrency is often touted as a source of improved performance and rightly so, but it is a
subtext of this dissertation that concurrency is an important programming tool independent
of the performance benefits from multiprocessing. Certain classes of applications, most
3
notably interactive applications, are naturally structured as concurrent programs. The
language design presented in this dissertation is motivated by the need to support the
programming of these applications.
Before going any further, it is useful to define a nomenclature. I distinguish between
parallel and concvrrent languages by whether they provide implicit or explicit concurrency.
For example, the futures found in some dialects of Lisp are a parallel language feature,
since they only specify the possibility of concurrent computation. Because I am interested in
programming systems with explicit concurrency, the focus of this dissertation is on providing
support for concurrent programming, and not on parallel programming.1
In the remainder of this section, I examine two important classes of applications that
benefit from the use of concurrent programming. These applications share the property that
flexibility in the scheduling of computation is required. Whereas sequential languages force
a total order on computation, concurrent languages permit a partial order, which provides
the needed flexibility.
1.1.1 Interactive systems
Providing a better foundation for programming interactive systems, such as programming
environments, was the original motivation for this line of research [RG86]. Because of
their naturally concurrent structure, interactive systems are one of the most important
application areas for CML. Concurrency arises in several ways in interactive systems:
User interaction. Handling user input is the most complex aspect of an interactive pro-
gram. Most interactive systems use an event-loop and call-back functions. The event-
loop receives input events (e.g., mouse clicks) and passes them to the appropriate
event-handler. This structure is a poor-man's concurrency: the event-handiers are
coroutines and the event-loop is the scheduler.
Multiple services. For example, consider a document preparation system that provides
both editing and formatting. These two services are independent and can be naturally
organized as two separate threads. Threads also provide easy replication of services;
if the user opens a new document for editing, then the system just spawns a new edit
thread. Multiple views of the same document can also be supported by replicating
threads.
Interleaving computation. A user of a document preparation system may want to edit
one part of a document while another part is being formatted. Since formatting may
?I do e?&nine some of the issues related to a multiprocessor implementation of CML in Chapter 12.
4
take a significant amount of time, providing a responsive interface requires interleav-
ing formatting and editing. If the editor and formatter are separate threads, then
interleaving comes for free.
Output-driven applications. Most windowing toolkits (e.g., Xlib [Nye90b]) provide an
input-driven model, in which the application code is occasionally called in response
to some external event. But many applications are output driven. Consider, for
example, a computationally intensive simulation that maintains a graphical display
of its current state. This application must monitor window events, such as refresh
and resize notifications, so that it can redraw itself when necessary. In a sequential
implementation, the handling of these events must be postponed until the simulation
is ready to update the displayed information. By separating the display code and
simulation code into separate threads, the handling of asynchronous redrawing is
easy.
The root cause of these forms of concurrency is computer-human interaction: humans are
asynchronous and slow.
While the use of heavy-weight operating-system processes provides some support for
multiple services and interleaved computation, it does not address the other two sources
of concurrency. Likewise, while event-loops and call-back functions provide flexibility in
reacting to user input, they bias the application towards an input-driven model and do not
provide much support for interleaved computation. A concurrent language, on the other
hand, addresses all of these concerns.
1.1.2 Distributed systems
Another application area in which concurrent programming is useful is distributed systems.
In fact, many existing distributed programming languages and toolkits provide support for
concurrent programming (e.g., Argus [LS83], Isis [BCJ+90], and SR [AOCE88]). Con-
currency is needed because interaction with remote processes is slow and naturally asyn-
chronous. Threads provide a useful abstraction for managing outstanding interactions and
for reacting to new requests dynamically [LHG86].
1.2 Overview of this dissertation
I believe that there are three important aspects to good language design. First, there should
be a real problem that needs solving, and a design that solves it. Second, there should be
a firm theoretical foundation for the design. And third, the feasibility and usefulness of the
5
design should be demonstrated in practice. The organization of this dissertation reflects
this philosophy. It is divided into five parts: introduction, design, theory, practice, and
conclusion, with the middle three parts addressing the above aspects.
The design part presents the rationale and design of my concurrency mechanism; the
theory part provides a formal understanding of the mechanism; and the practice part ad-
dresses the issues of feasibility and usefulness of the mechanism. The other two parts are
less technical: the introduction part includes this chapter and an introduction to SML,
which may be skipped by the reader who is familiar with ML notation; the conclusion part
describes areas for future research and summarizes the results of my research.
1.2.1 Design
The design part starts off with Chapter 3, which surveys existing approaches to concur-
rent language design. Chapter 4 is the heart of the dissertation; it provides the rationale
for first-class synchronous operations and introduces them in the context of CML. The
following chapter provides several substantial examples of the use of first-class synchroniza-
tions to bulld communication and synchronization abstractions, including several found in
other concurrent languages. This part of the dissertation is fairly self contained, although
familiarity with SML syntax is useful.
1.2.2 Theory
SML has set a precedent of both being a practical language with real implementations
and of having a detailed formal semantics. I have developed a dynamic semantics for a
small language, called A? that models the concurrency features of CML ?ep91b]. This
dissertation extends the work of [Rep9lb], by presenting a static semantics for A?? and
proving that it is sound with respect to the dynamic semantics.
Following a brief summary of basic notation, Chapter 6 illustrates the style of formal
semantics using a more familiar sequential language, which is a sequential subset of A?.
Chapter 7 presents the syntax and operational semantics of A??. The main results of this
part are in Chapter 8, where I present a polymorphic type system for A?? and show that
it is sound with respect to the dynamic semantics of Chapter 7. This result is important,
since the implementation of CML uses features of SML/NJ that are not type-safe. To my
knowledge, this is the first proof of the soundness for a polymorphic typing of concurrency
primitives.
6
1.2.3 Practice
In the final analysis, the true worth of a language design can only be determined "in-the-
field." Questions about the usefulness and practicality of language features can only be
answered by actual experience. I have developed and distributed an implementation of
CML for single processor computers2 [Rep9Ob], which has been used by myself and others
to implement several non-trivial applications. This experience demonstrates that CML is
a useful programming language and that it can have efficient implementations.
Chapter 9 describes the use of CML to construct a multi-threaded X window system
toolkit, called eXene [GR9l], and its use to build interactive applications on top of eXene.
I also briefly discuss the application of CML to the programming of distributed systems, and
applications of CML by other researchers. In Chapter 10, I describe the implementation
of CML in detail and describe some possible implementation improvements. Chapter 11
presents the results ofmicro-benchmarks that demonstrate the efficiency of CML (including
a head-to-head comparison with a C thread library). These data support the conclusion that
CML is competitive with thread libraries implemented in lower-level languages. Finally,
Chapter 12 discusses the use of CML for parallel programming, possible extensions to better
support parallel programming, and sketches the design of a shared-memory multiprocessor
implementation of CML.
1.3 History
The ideas in this dissertation have been evolving for several years and there have been
several instantiations of them in language designs. I first developed this approach in the
context of PML [Rep88], an ML-like language used in the Pegasus system at AT&T Beli
Laboratories [RG86, GR92]. I reimplemented the concurrency primitives of PML on top
of SML/NJ at Corneli Unlversity [Rep89]. This implementation evolved into the current
version of CML [Rep91a], which is described in this dissertation.
2The first version was released in November 1990.
7
8
Chapter 2
An Introduction to SML
While the ideas presented in this dissertation are largely language independent, they have
been developed in the context of Standard ML (SML). I use SML both as the sequential
core of my language design and as the implementation language. This chapter provides an
introduction to SML that should allow the reader to follow the examples; for a more
complete introduction see [Har86] or [Pau91]. The formal definition of SML can be found
in [MTH9o, MT91].
In the remalnder of this chapter, 1 first introduce the basic features of SML; then I
describe the datatype and pattern matching mechanisms; I follow this by a discussion of
the imperative features of SML; and finally I present a complete example.
2.1 Basics
SML is an expression language: the traditional statement constructs, such as blocks, con-
ditionals, case statements, assignment, etc., are packaged as expressions. Every expression
has a statically determined type and will ouly evaluate to values of that type (this is called
type soundness). Computation in SML is value oriented. Because of the central role of
values, there is a much larger range of values than found in more conventional languages.
2.1.1 Basic values and expressions
SML provides a fairly standard collection of ground types and values, which are summarized
in Table 2.1. The type unit, which has exactly one value (written ()), is often used as
the result type of functions that are executed for their side effects. Negative numbers are
denoted using using a leading tilde, which is also the unary negation operator.
In imperative languages, such as C, assigament is the principal mechanism used to as-
9
Table 2.1: SML ground types
Type			Sample literal values
unit			()
bool			true, ?alse
int			...,?2,?I, 0,1,2,...
string "abc11, 11hello world!\n"
real			I.0,I.OE6
sociate values with variables. While SML does provide updatable cells (see Section 2.3.1),
it uses binding as its principal mechanism for associating values with variables. lii SML,
variables are used to name values, and are immutable (this is sometimes called single as-
signment). For example, the binding
val x = I and y = `?I'm a string"
establlshes bindings for x and y. The static environment produced by this binding assigns
the type int to x and string to y (the type iMormation is inferred by the compiler). This
static environment can be summarized by the following specification:
val x : int
val y : string
The notation of specifications, which comes from the signatures in the module system, is a
natural and concise way to describe a set of bindings.
2.1.2 Tuples and records
In addition to these ground types and values, SML provides tuples and records. For
example, the expression (I, true) is a pair of the value I and true, and has the product
type int * bool. Records are labeled tuples. For example, PI might be defined to be the
point (1,2) by
val pi = +1 = I, y =
in which case pI has the type ?x : int, y : int>. Note that the order in which labeled
fields appear is insignificant, so that
val p2 =			= 2, x =
defines the same point as PI. A field labeled 1 of a record can be selected using the notation
#L The example in Section 2.4 further illustrates the use of labeled records.
10
2.1.3 Punctions and polymorphism
Functions play a key role in SML. Functions are declared using the leading keyword fun;
for example, the factorial function can be defined as:
fun fact n = if (11 = 0) then 1 else n * fact(n-1)
which has the specification:
val fact : int -> int
Tail recursion plays the role of looping in SML.1 For example, the iterative form of the
factorial function is written as a tail recursive function:
fun fact n = let
fun loop (i, result) = if (i = 0)
then result
else loop(i--H1, i*result)
in
loop (n, 1)
end
This example also introduces the let-expression, which is used for defining local variables
(the function loop in this case). Note that instead of destructive updates to loop variables,
the new values are passed to the next invocation of iterFact; each iteration has its own
copies of i and result.
The SML compiler uses type inference to determine the types of expressions. In the
case of functions, this can often be a family of types. For example, consider the identity
function:
fun identity x = x
The meaning of this function is independent of its argument type. It can be viewed as
a function on integers, or strings, or reals, or pairs of integers, etc. Thus, it has the
polymorphic type Va.(a H a), where a is a type variable ranging over all types. In SML,
type variables are denoted by a leading apostrophe. For example, the value identity has
the specification:
val identity : `a -> `a
1Thereisa while expression, but it isjust syntactic sugar for the application ofatailrecursive function.
11
where the V is implicit. The SML compiler always infers the most general type for a given
expression.
SML is a higher-order language, which means that functions are first-class values; they
can be passed as arguments, embedded in data structures and returned as results. A simple
example is i'iflx function composition, which is defined in SML as:
fun 0 (f, g) = fn x => (f (g x))
infix 0
The form "fn x => ..." is the way that function values are written in SML (for those
familiar with the A-calculus, fn can be read as A). The second line declares 0 to be an
infix operator. An illfix operator can be used as a normal identifier by prefixing it with the
keyword op (e.g., op +). Function composition can also be defined using a cnrned form:
full 0 (f, g) I = (f (g x))
These two declarations of composition are equivalent, and have the specification:
val 0 : (`a -> `b) * (`c -> `a) -> `c -> `b
As an example of its use, the expression
(fn x => (x*x)) 0 (ffl I => (x--HI))
evaluates to a function that computes (x --H 1)2.
2.2 Datatypes and pattern matching
In addition to the basic values, SML provides recursive data structures and abstract types.
Structured values are decomposed using a powerful pattern matching notation.
2.2.1 Datatypes
The datatype declaration introduces a new, possibly recursive, type. For example, the
representation of integer binary trees can be defined as:
datatype int?tree
= Empty
I Leaf of int
Node of (int * int?tree * int_tree)
12
This declaration says that a tree is either empty, a leaf consisting of an integer value, or a
node consisting of an integer and two sub-trees. The identiflers Empty, Leaf and Node are
called constructors, and are used to construct tree values. Datatype declarations can be
parameterized to define type constructors. For example, we can define a family of binary
tree types by the definition:
datatype `a tree
- Empty
I Leaf of `a
I Node of (`a * `a tree * `a tree)
hi addition to user defined datatypes, SML has a few predefined datatypes. The type
bool is actually defined as
datatype bool = true I false
Another important datatype that is predeflned by SML/NJ is the polymorphic option
type:
datatype `a option = NONE I SOME of `a
There is also a list type, which is discussed below.
2.2.2 Pattern matching
The power of the datatype declaration mechanism is enhanced by pattern matching. Pattern
matching is a mechanism for control-flow, value decomposition, and binding. For example,
the boolean negation function can be deflued using two clauses:
fun not true = false
I not _ = true
The flrst clause says that if the argument is true, then return false. The "?" in the second
clause is a wildcard, which matches anything (in this case, false is the only possibility).
Pattern matching is the standard binding and value decomposition mechanism in SML,
and we have already seen some examples of it. For example, the definition of function
composition
fun 0 (f, g) x = f (g x)
has a tuple pattern as its flrst argument, which binds f to the flrst element of the pair and
g to the second. A more interesting example is a function to compute the height of a binary
tree:
13
fun height Empty = 0
I height (Leaf ) = 1
I height (Node(?, ti, t2)) = max(height ti, height t2) + 1
In the third clause, the variables ti and t2 are bound to the subtrees. Unlike pattern
matching ill Prolog, SML patterns are linear (i.e., a variable can occur at most once ill a
pattern).
Pattern matching can be used to extract values from records. For example, the following
function swaps the ? and ? coordinates of a point:
datatype point = PT of ?x : int, y : int?
fun swap (PT?i=x1, y=yl?) = PT?x=y1, y=xl?
The pattern binds xl to the x field and yI to the y field. There are two shorthand forms
for pattern matching records, both of which are illustrated in the following example:
fun xCoord (PT?x, .. .?) = x
Here the field name x is being used as shorthand for "x=x," and the .... ." is in lleu of the
rest of the fields.
Pattern matching can also be used against llterals. For example, the recursive factorial
function can be coded as follows:
fun fact 0 = 1
fact n = u * fact(n-l)
In addition to equational defihitions of functions, pattern matching is used in a general form
of a case expression.
2.2.3 Lists
One of the most important recursive types is the polymorphic list type, which is defined as
datatype `a list = nil :: of (`a * `a list)
infix 5 ::
The datatype declaration defines a list to be either empty (nil), or the cons of an element
and alist. The infix declaration specifies that the cons operator (: :) is a right associative
infix operator with precedence level 5. Because of the importance of lists, SML provides
special syntax for list patterns and expressions. The syntax
(ei, e2, ...,
14
is syntactic sugar for
e1::e2:::nil
and likewise for patterns. The following function, which inserts delimiters between adjacent
list elements, is an example of the use of this notation:
fun insertDelim delim 1 = let
fun insert E] =
I insert (5 as Li) = 5
I insert (x::r) = x :: delim :: (insert r)
in
insert 1
end
The second clause of this function illustrates the as pattern form, which, in this case, binds
s to the single element list matched by "E?]."
There are a number of standard list functions that are provided by SML/NJ, and used
in this dissertation. These are:
val length			`a list -> int
val map			(`a -> `b) -> `a list			-> `b list
val app			(`a -> `b) -> `a list			-> unit
val rev			: `a list -> `a list
The function length returns the length of a list; map applies a function to a list, returning
the list of results; app applies a function to a list, discarding the results; and rev reverses
a list.
2.2.4 Abstract types
The abstype declarative form is a variation on the datatype declaration that limits the
visibility of the type's representation. The time-honored example of an abstract datatype
is the stack:
abstype `a stack = STK of `a list
with
val empty = sTKE]
fun push (x, STK s) = STK(x::s)
fun pop (STK(x::r)) = (x, STK r)
end
The representation of a stack is only visible in between the with and end; outside the type
stack is abstract. I use the abstype mechanism in this dissertation in lieu of the SML
15
module facility, since it is easier to understand. A more elaborate example of abstract types
is given in Section 2.4.
In addition to the abstype declaration, the local declaration can be used to limit the
scope of declarations. For example, the stack could be declared as
local
datatype `a stack = STK of `a list
ill
type `a stack = `a stack
val empty = STh(]
fun push (x, STK s) = STK(x::s)
fun pop (STK(x::r)) = (x, STK r)
end
There are some technical differences between these two declarations, but they are beyond
the scope of this dissertation.
2.3 Imperative features
Mthough SML is mostly appllcative, it does have a small collection of imperative features.
The most important of these are references and exceptions; in addition, SML/NJ provides
first?dass continuations.
2.3.1 References
References are mutable heap cells. They are created by the function ref,2 updated using
=, and examined by the ! function. As an example, the following binds two functions that
share a common reference cell:
val (get, put) = let
val cell = ref 0
in
((fn () => cell), (fn x => cell :=
end
The reference operations have the following signature:
val ref : `_a -> `_a ref
val : `a ref -> `a
val := (`a ref * `a) -> unit
2The ref function is really a constructor and can be used in pattern matching, but that feature is not
used in this dissertation
16
The notation "`?a" in the type of ref means that it has an imperative type, which is
"less polymorphic" than a similar applicative (non-imperative) type. This is a teclmical
restriction that is required to prevent type loopholes. The full techuical details ofimperative
types is beyond the scope of this introduction; Chapter 8 has some discussion of imperative
types and Tofte describes them in great detail in [Tof88] and [Tof9Oj.
SML/NJ uses a more flexible scheme for typing polymorphic references, called weak
polymorphism. The idea is to assign a rank (or strength) to type variables. Roughiy, the
rank of a type variable is the number of abstractions that "protect" a reference value of
that type; normal type variables have rank 00. For example, the type of ref in SML/NJ
is
val ref : `Ia -> `la ref
where the integer in the type-variable name denotes its rank. Since CML is implemented
on top of SML/NJ, its interfaces are presented using weak types. The details of weak
polymorphism are not important to this dissertation; it is only necessary to recognize that
fuiictions with weak types are not fully polymorphic. The theoretical treatment (Part III),
however, uses the more standard imperative type system.
2.3.2 Exceptions
SML has an exception mechanism for signaling run-time errors and other exceptional con-
ditions. There are two aspects to the exception mechanism: the representation of exception
packets, and the control-flow of raising and handllng exceptions.
The built-in type exn is the type of exception packets, which are created using a special
kind of datatype constructors. The declaration
exception Foo and Bar of int
declares two new exception constructors (exception specifications use the same syntax).
Since exception packets are values of a datatype, the handiing of exceptions can use
the pattern matching mechanism to match exceptions. For example, the following is an
implementation of integer division that returns 0 when the divisor is 0:
fun safeDiv (a, b) = (a div b) handle Div => 0
An exception is raised using the raise expression. For example, the following function,
which computes the product of a llst of integers, uses the exception Zero to short-circuit
the evaluation if 0 is encountered:
17
fun product 1 = let
exceptiou Zero
fun loop (E], n) =
I loop (O::?, )
loop (i::r, n)
n
= raise Zero
= loop (r, i*n)
in
(loop (1, 1)) handle Zero => 0
end
Although this dissertation only uses monomorphic exceptions, it is possible to declare
polymorphic exception constructors. As with references, fully polymorphic exceptions result
in type loopholes; therefore exceptions can only be weakly polymorphic (or have imperative
types).
2.3.3 Continuations
SML/NJ provides first-class continuations as an extension, and I use them heavily in the
implementation of CML (see Chapter 10). A continuation is a function that represents
the "rest of the program" [Gor79]. The programming language Scheme [RC86] makes
continuations accessible to the programmer as first-class values.3 The Scheme function call-
with-cnrrent-continuahon (call/cc for short) calls a function with the current continuation
as the argument. First-class continuations are supported in SML/NJ via an abstract type
and two primitive fimctions [DHM91]:
type `a cont
val callcc : (`ia cont -> `Ia) -> `ta
val throw : `a cont -> `a -> `b
These can be used to implement loops, back-tracking, exceptions and various concurrency
mechanisms, such ascoroutines [Wan8O] and engines [DH89]. For example, the following is
a continuation-based version (from [DHM91]) of the product function given in the previous
section:
fun product 1 = callcc (
fn exit => let
fun loop (fl, n) = n
I loop (O::?, ) = throw exit 0
I loop (i::r, n) = loop (r, i*n)
in
loop (1, I)
end)
This function uses the continuation exit to short-circuit the evaluation if 0 is encountered.
3The idea dates back to Landin's J operator [Lan65], [FeI87b]
18
2.4			An example			functional queues
To wrap up this introduction to SML, consider
queue type. The signature of this abstraction is:
type `a queue
val empty : `a queue
val isEmpty : `a queue -> bool
val insert : `a * `a queue -> `a queue
exception EmptyQ
val remove : `a queue -> `a * `a queue
tne implementation ot an
abstract FIFO
The value empty is the empty queue; isEmpty returns true if its argument is the empty
queue; insert adds an item to the end of the queue; and remove removes the head of
the queue. The exception Emptyq is raised if remove is applied to an empty queue. This
abstraction is functional; i.e., instead of mutating a sh&ed queue object, the operations
insert and remove return new queue values as results.
The implementation of this abstraction is given in Figure 2.1. Internally, a queue is
abstype `a queue = Q of +front : `a list, rear : `a list+
with
val empty = Q?front = (], rear =
fun isEmpty (??front=fl, rear=E]1) = true
I isEmpty - = false
fun insert (x, q?front, rear?) = ??front = front, rear = x::rear?
exception EmptyQ
fun remove (q?front = E], rear = = raise Emptyq
I remove (Q?front = E], rear?) = remove(Q?front = rev rear, rear = fl?)
remove (Q?front = x::r, rear?) = (x, ?+front = r, rear = rear?)
end (* abstype *)
Figure 2.1: A queue implementation.
represented by the constructor q applied to a record of two fields: front and rear, which
are stacks (represented by lists). The insert operation pushes a value onto the rear stack,
and the remove operation pops a value from the front stack. In the case that the front
is empty, then remove pushes the elements of the rear stack onto the front in reverse
order.
19
20
Part II
Design
21
Chapter 3
Concurrent Programming
Languages
In order to understand the trade-offs ill language design, it is necessary to know the alter-
natives. In this chapter, 1 survey a representative collection of concurrency features and
languages.1 For the purpose of this dissertation, the most important language character-
istics are the synchronization and communication primitives. These can be divided into
two main classes: shared memory primitives and distribitted memory (or message-passing)
primitives. In this chapter, following a brief discussion of process creation mechanisms, I
focus on these two different classes of concurrent languages, and discuss the appropriateness
of the various design alternatives for adding concurrency to SML.
There are a number of good surveys of concurrent language design. A comparison of
different concurrency mechanisms using two example problems can be found in [BD80]. An-
drews and Schneider [AS83] survey a broad range of concurrency mechanisms; Wegner and
Smolka compare CSP, Ada and monitors in [WS83]; Andrews covers concurrent program-
ming using various different languages in [And91]. And a collection of significant reprints
of papers on concurrent languages and programming (including [AS83] and [WS83]) can be
found in [GM88].
As discussed in Chapter 1, this dissertation is about concurrent language design, there-
fore I do not survey parallel implementations of lazy languages (e.g., GAML [Mar91]),
parallel languages (e.g., Id [Nik9l]), or distributed languages (e.g., Argus [LS83], or SR
[AOCE88]). I also do not discuss concurrent logic-programming and concurrent constraint
languages [SR9O].
1There are literally hundreds of different concurrent programing languages, so a complete survey is
impossible.
23
3.1 Processes and threads
The specification and creation of processes in a concurrent programming language is usually,
although not always (see Section 3.3.5), orthogonal to the communication and synchroniza-
tion mechanis?j?. Pru???,, c?atiuii c?i be t???? satic, where the se? of processes is fixed
by the text of the program, or dynamic, where some mechanism is provided for creating
new processes on the fly. Each process in a concurrent program has an independent thread
of control, hence, the term thread is often used instead of process. This has the added
advantage of avoiding confusion with the other meanings of the word process. I favor the
term thread, except in the context of the formal semantics where process is the conventional
term.
An example of static process creation is the cobegin statement, which has the form2
COBEGIN stmt1 II stmt2 II ... I I stmt? COEND
This statement proceeds by executing the n statements in parallel and then synchionizing
on the completion of all of the statements. In a language with recursion this statement can
be used to create dynamic tree parallelism, but it is still limited in that the lifetimes of
processes are tied to their children's lifetimes.
Dynamic process creation usually involves a fork operation (sometimes called spawn),
which takes a statement (or procedure) as an argument and creates a new process to execute
it. The fork operation is often accompanied by a join operation, which allows the parent
to synchronize on the child's termination. Using fork and join, the cobegin construct from
above can be implemented as:
Pi			FORK stmt1
P2			FORK stmt2
FORK stmt?
JOIN Pi
JOIN P2
JOIN p?
Dynamic process creation allows the flexible use of processes. For example, a server
might want to create a new thread to handle each request. In a language with a static set
of processes, this requIres preallocating a pool of server threads and reusing them. This is
awkward and limits the number of simultaneous requests that can be handled, which can
2For most of this chapter, 1 use an Algol 60 style notation, since most of the languages I discuss have
roots in the Algol family of languages.
24
lead to unnecessary delays when handling requests [LHG86]. Writing concurrent programs
in a language with static process creation is similar to the problem of writing programs
with dynamic data structures in a language that only provides static memory allocation.
In conclusion, there does not seem to be any strong reason to use static process creation,
and many reasons in favor of dynamic creation.
3.2 Shared-memory languages
Shared-memory languages use mutable shared state (e.g., shared variables) to implement
process communication. The key problem in these languages is preventing processes from
interfering with each other. This problem can be characterized by the following classic
example:
x			1; COBEGIN x			x+1 Ii x			x+1 COEND
Without some guarantee of atomicity, the resulting value of x is undefined. The assignments
are examples of critical regions; that is, regions of code that are potential sources of interfer-
ence without proper concurrency control. Shared-memory languages are distinguished by
the mechanisms they use to provide synchionization and concurrency control. To illustrate
these, I use a unique ID service as a running example.
3.2.1 Low-level synchronization mechanisms
The most basic synchroulzation mechanism is the semaphore, which is a special integer
variable with two operations: P and V. Given a semaphore s, the execution of P(s) by a
process p forces it to delay until s > 0, at which point p executes s: = s--H1 and proceeds; the
test and update of s is done atomically. Execution of V(s) results in the atomic execution
of 8 : =8 + 1. Using semaphores, the unique ID service can be implemented as
VAR x : INTEGER := 0;
5 : SEMAPHORE;
PROCEDURE getuld () : INTEGER =
VAR result : INTEGER
BEGIN
P(s);
result			x; x			x+1;
V(s);
RETURN result
END
25
with the semaphore s being used to guarantee mutual exclusion on accesses of x. The
problem with semaphores is that there is no liuguistic support for their correct use. For
example, a programmer can easily forget to apply one of the operators, or might forget to
protect shared state. Furthermore, implementing patterns of synchrouization that are more
complicated than mutual exclusion can be tricky.
A restricted form of the semaphore is the mittex loc? (also called a binary semaphore),
which is a variable that can be in one of two states, either locked or unThcked. One of
the advantages of mutex locks is that they are naturally supported by the test-and-set
instruction found on many multiprocessors. The language Modula-3 [Nel9l] supports the
use of mutex locks with the special syntax.4
LOCK m DO statements END
This statement is executed by first acquiring the mutex lock m, then executing the state-
ments in the body, and then releasing the lock. ff an exception occurs during the execution
of the body, the lock is also released. In Modula-3, the unique ID service can be imple-
mented as:
VAR x : INTEGER := 0;
m : NUTEX := NEW(MUTEX);
PROCEDURE getuid () : INTEGER =
VAR result : INTEGER
BEGIN
LOCK m DO
result := x; x := x+1;
END;
RETURN result
END
Each call to getuld first acquires the mutex lock, executes the critical section, and then
releases the lock before returning.
Mutex locks are sufficient for insuring mutual exclusion in critical regions, but do not
provide a general synchionization mechanism. For example, consider producer and con-
sumer processes that share a fixed size buffer. If the buffer is empty, then the consumer
must walt for the producer to add something to it; likewise, if the buffer is full, the producer
must walt for the consumer to remove something. Using mutex locks, this requires polling
the buffer, which is inefficient. To alleviate this problem, Modula-3 provides condition vart-
ables, which allow processes to wait for specific conditions (e.g., the buffer is non-empty).
3Mute: is a contraction of mutua? exciusion.
4Modula-: inherits these primitives from Modula-2+ [RLW85].
26
Condition variables, in effect, reintroduce the counting power of general semaphores that
was lost when moving to mutex locks. The C-threads package built on top of the MACH
operating system also provides this style of concurrency support [CD88].
3.2.2 Monitors
A monitor is a module that encapsulates shared state, providing a set of exported proce-
dures for controlled access to the state [Hoa74]. Monitors provide a more structured form
Each monitor has an implicit mutex lock that is
of mutual exclusion than mutex locks.
acquired on entry and released on exit by every monitor procedure. This guarantees that
a monitor-procedure call is mutually exclusive with any other call. Using a monitor, the
unique ID server can be coded as follows:
MONITOR Uld Is
VAR x : INTEGER
PROCEDURE getuld ()
VAR result : INTEGER
BEGIN
result
RETURN
END
BEGIN
x 0
END
INTEGER =
x; I := x+1;
result
The extra syntactic support provided by monitors leaves less room for programmer error
than in the case of semaphores or mutex locks. As with mutex locks, condition variables
are used to avoid polling.
A number of languages, such as Concurrent Pascal [Bri77], Concurrent Euclid
[Hol83b], and Mesa [MMS79, LR80] provide monitors along with condition variables. It is
interesting to note that there is a trend in concurrent language design away from the syn-
tactic sugar of monitors and towards explicit mutex locks (e.g., from Mesa to Modula-2+
and Modula.3). This trend represents a simplification of language design, since it separates
two orthogonal language features (i.e, modules and mutual exclusion).
3.2.3 Shared-memory concurrency and ML
Shared-memory concurrent languages rely on mutable state for inter-process commuruca-
tion. This leads to an imperative programming style, which goes against the traditional,
mostly applicative, style of ML programs. For this reason, shared-memory primitives are
notationally unsuitable as a general purpose concurrency extension to ML (although they
27
are useful for low-level implemen?ation work). In contrast, as I show below, message passing
fits quite naturally with the ML programming style.
Cooper and Morrisett, at Carnegie-Mellon University, have developed a concurrency
package, called ML-threads, which provides threads, mutex locks and condition variables
[CM9o]. The design of ML-threads is owed to the C-threads package [CD88], which in
turn owes its design to [RLW85]. The goals and approach of their work are significantly
different from those of my research. For example, one of the principal applications of ML-
tbreads is the construction of low-level operating system services, which requires heavy
use of shared state [CHL9l]. ML-tbreads has also been used to implement a subset of
CML's primitives. There is also an implementation of ML-tbreads for the SGI 4D/38o
multiprocessor [Mor].
3.3 Distributed-memory languages
The other major class of concurrency primitives is distributed-memory (also called message
passing). The basic operations in message passing are "send a message" and "accept a
message," and are used for both communication and synchronization. Message-passing
languages are distinguished by the naming mechamsm for the communication medium and
the amount of synchronization involved in sending a message.
The naming mechanism must specify both ends of the communication (i.e., sender and
receiver). The simplest naming convention uses process names to designate the communi-
cation partner. A slightly more general scheme introduces multiple commnnication ports
associated with the receiver. This can be further generallzed by making port names into
independent values, called channels. Any process that has access to a channel may use
it to send or accept messages (a variant on this scheme differentiates between input and
output access). As with the process structure, the naming mechanism can be either static
or dynamic. Although static naming is common in a number of languages, it has severe
limitations. For example, it is impossible to write procedures parameterized by a sender or
receiver name. When adding message passing to ML the communication medium must be
strongly typed. This requirement means that the use of process names to name communi-
cations is too restrictive, since under such a scheme, each process can ouly receive messages
of one fixed type. Using ports or channels to name communications avoids this problem,
since each port (or channel) can have its own message type. Given the dynamic nature of
ML values, it seems that a dynamic port or channel creation mechanism is most suitable.
The message accept operation is usually blocking, but some languages and systems
provide a polling mechanism to check for incoming messages. There are three basic choices
28
for message sending semantics: non-blocking send (or asynchronous send), blocking send (or
synchronons send), and send-reply. The first two of these are unidirectional, while the last
is bidirectional. I discuss each of these below in increasing order of synchronization.
3.3.1 Asynchronous message passing
In asynchronous message passing the communication medium is buffered and the send
operation is non-blocking.5 Figure 3.1 gives a pictorial description of asynchronous commu-
nication between two processes P and Q. In this diagram, each process has a "time-line,"
send
P
accep?
Q
accept
send
accept
Figure 3.1: Asynchronous message passing
running down the page; a communication is represented by an arrow from the sender's
time-line to the accepter's time-line. Notice that, in this picture, P and Q have different
views of the order of events.
Actor languages are an example of programming languages based on asynchronous mes-
sage passing [Agh86]. Message passing in distributed systems is also usually asynclironous.
In fact, in systems with arbitrary message delays and failure it is not possible to distin-
?uish between the failur?			Lti			d a slow lii			d t?
of a UUliHUuiU?c			alL			-. ?A?, all ?llb byllt??ollous
communication is impossible [FLP85].
5Some systems use finite buffers, in which case the send operation wffl block if the buffer is full.
29
3.3.2 Synchronous message passing
Hoare 5 seminal paper [Hoa78] introduced the notion of a set of sequential processes running
in parallel and communicating by synchronous message passing. Hoare's language, called
CSP (for Communicating Seq?ential Processes), provides input operations, P?z (read a
value from process P and assign it to x), output operations, P!e (send the value of cx-
pression e to process P), and a labeled cobegin statement for process creation. Both the
process and communication structures in CSP programs are static, since there is no dy-
nanuc process creation and process names are used to name communications. If a process P
executes Q!v, it must block until process Q executes P?z (and vice versa). The matching of
communications is called rendezvons, and is illustrated in Figure 3.2. The dotted time-lines
P			Q			Q
Q!v:			:P?x
v			v
(a) P waits for Q
Q!v'
(b) Q waits for P
Figure 3.2: Rendezvous
in this figure represent idle periods while waiting for a matching communication.
One of the key ideas found in CSP is the notion of selective communication (also called
gitarded communication). In [Hoa78], selective communication is presented as a general-
ization of Dijkstra's guarded commands [Dii75], with input operations allowed as guards.
When an input guard is matched, its action may be chosen; if more than one input guard
is matched, then one is chosen nondeterministically. This mechanism provides the ability
for a process to communicate with multiple partners when the order of commurucations is
unknown. For example, a server process that has multiple clients may not know which client
wili send it the next request. Languages can provide polling as an alternative to selective
communication, but the use of polling can result in busy waiting and so should be avoided
[GC84].
A natural generallzation of the selective communication mechanism of CSP is to al-
low both input and output operations. This is called generalized (or symmetric) selective
30
communication. As a simple example of why this is useful, consider a system with three
processes, A, B and C, where A is supposed to send a message to both B and C. With-
out generalized select, A must a priori choose which process to send the message to first.
If B sends a message to C before accepting a message from A and C is waiting for the
message from A before accepting a message from B, then A must send to C first to avoid
deadlock. In other words, the implementation of A depends on the communication patterns
of B and C. This example illustrates that the lack of generalized selective communication
has a negative impact on program modularity.6 Other arguments for the usefulness of se-
lective communication can be found in [Hoa78], [FY85], [Rep91a], and Section 5.1. The
only significant argument against generalized selective communication is the difficulty of
implementing it on multiprocessor machines [KS79] (Section 12.2.2 discusses this problem
in more detail).
The language occam [1NM84, Bur88] is derived from CSP, but includes channels and
a Innited form of dynamic process creation. And the higher-order language Amber [Car86]
provides generalized selective communication on typed channels, as well as dynamic process
and channel creation. Other languages that owe an intellectual debt to CSP include Joyce
[Bri89] and Pascal-m [AB86]. A pared down version of CSP, called TCSP, has been used
for theoretical study of concurrent systems [Hoa85].
3.3.3 Asynchronous vs. synchronous message passing
At first glance, asynchronous communicationmay seem to be the best choice for a distributed-
memory concurrent language, since it minimizes interprocess synchronization and does not
restrict parallelism (e.g., in Figure 3.2(a), P must wait for Q). But if the language has
dynamic thread creation, then it is possible to efficiently implement an asynchronous chan-
nel by using a thread to buffer communication (cf., Section 5.1). The big problem with
asynchronous communication is that the sender has no way of knowing when a message has
actually been received; introducing acknowledgement messages loses the parallelism that
was the main benefit of asynchronous communication. In synchronous message passing, the
sender and receiver have common knowledge of the message transmission (e.g., the sender
knows that the receiver knows that the sender knows that the message was accepted). This
property makes synchronous message passing easier to reason about [AS83].
This is also reflected in the typical failure modes of erroneous programs. In asynchronous
systems, the typical failure mode is an overflow of the memory used to buffer messages, which
is likely to be far removed in time (and possibly place) from the source of the problem. In
a language with dynamic thread creation, this example could be programmed by A forking two
threads to send the messages, but there are other examples where dynamic process creation is not sufficient.
31
synchronous systems, the typical failure mode is deadlock, which is immediate and easily
detected. Thus, detecting and fixing bugs is easier in a synchronous system.7
Using asynchronous message passing also increases the likelihood of timing sensitiv-
ity and race conditions. In a producer-consumer protocol, for example, if the producer is
faster than the consumer, then the number of buffered messages can grow arbitrarily. ff
the buffer is flnite, the system eventually degrades to a synchronous system; while, if the
buffers are unbounded, memory overflow may occur. This means that additional acknowl-
edgment messages must be used, which reduces the efficiency gains from using asynchronous
communication.
3.3.4 Request-reply message passing
A procedure call style interaction, called remote procedure call [Nel8I], can be implemented
using asynchronous or synchronous message passing. The procedure entry corresponds to a
request message from the client to the server and the procedure return corresponds to the
reply message from the server to the client. Figure 3.3 shows the timing diagrams for this
mechanism (assuming synchronous message passing). While the server is handling a call it
Client
call:
request
Server			Client
--H--H accep?
repLy
(a) The client waits
request
Server
caiJ --H			accep?
reply
(b) The server waits
Figure 3.3: Request-reply rendezvous
cannot accept other requests; thus, calls are necessarily mutually exclusive. Some languages,
such as Ada [DoD83] and Concurrent C [GR86], as weli as concurrency libraries such as
the ?System [BS9o], use this style of bidirectional message passing as their communication
7The author's personal experience backs this up. An early version of the Pegasus system [RG86] used
asynchronous message passing, but we had great difficulty in debugging our programs. Our experience with
the implementation of eXene [GR9l], on the other hand, demonstrates that large synchronous message
passmg prograrns can be debugged fairly easily, even without debugging tools.
32
mechanism. In these languages, a server thread plays a role very similar to a monitor (see
Section 3.2.2). In Ada, for example, a task (the Ada term for process) exports a collection
of transactions (called entriesin Ada) that clients can invoke like normal procedure calls.
The server uses a SELECT statement to enable multiple entries simultaneously (this is similar
to the CSP selective communication in that only input operations are allowed in a select
statement). To illustrate, the unique ID server example from Section 3.2 can be programmed
in Ada as follows:
TASK BODY Uniqueld IS
x INTEGER;
BEGIN
LOOP
SELECT
ACCEPT getuniqueld (result : OUT INTEGER) DO
result x; x x+1;
END getuniqueld;
END SELECT;
END LOOP;
END UniqueId
Since this example only has one operation, the SELECT statement is not really necessary, but
if the server supported other operations, then additional accept clauses would be added.
The language Concurrent C provides a richer form of Ada's select mechanism. The
syntax of an entry clause in a select statement is
accept entry [suchthat pred] [by e] statment
where the phrases enclosed in []are optional, predis a boolean expression and eis an integer
expression. ff the optional suchthat clause is present then only those requests that satisfy
predare accepted. If the optional by clause is present then the expression e is evaluated for
each outstanding request and the request with the minimum value is selected.
While the request-reply paradigm is qulte useful in concurrent programming, I believe
that it is too heavy-weight a mechanism to be the basis of a concurrent language. For exam-
ple, if one needs to program unidirectional communication, then a bidirectional mechanism
is unnatural. Of course, one might argue that pWgramn?g bidirectional communication
using unidirectional message passing is unnecessarily complex, but, as I show in the next
chapter, it is possible to support higher-level abstractions, such as RPC, as first-class citi-
zens in a language based on unidirectional message passing. The key is to provide a flexible
mechanism for bnilding new communication and synchronization abstractions.
33
3.3.5 Futures
Various concurrent dialects of Lisp and Scheme use a mechanism called futures [Hal85,
KH88] for specifying the parallel evaluation of expressions. Futures combine thread creation,
communication and synchronization into a single mechanism. The Lisp expression
(let
((: (future ezp)))
body)
evaluates by first spawning a thread to evaluate exp and binding aplaceholder to x in body.
When the computation of e?p is complete, the result is put into the placeholder. When
a thread attempts to access: (called touching), it must synchronize on the availability of
the value. An important aspect of this mechanism is that any variable can be bound to a
future (i.e., touches are implicit), and thus a run-time check is required on every variable
access (although compiler optimization can reduce this cost).
Futures are not designed to support concurrent programming, rather they are designed
to be a parallel programining mechanism. Their main limitation as a concurrent program-
ming notation is that they ouly provide one chance for communication and synchronization
between the parent and chlld threads. Multilisp [Hal85] provides shared memory and low-
level locking mechanisms (essentially test-and-set) for supporting other patterns of commu-
nication and synchronization. Using other communication and synchronization mechanisms
in conjunction with futures can lead to problems, since some implementations consider it
optional as to whether a new thread actually gets spawned for each future (for example,
Mul-T [KH88]). Although futures might be a useful addition to ML to support parallel
programing (see Chapter 12), they are not a reasonable base for a concurrent language
design.
3.3.6 Message passing and ML
In addition to my own work, there have been several other efforts to integrate message pass-
ing and ML; e.g., [Hol83a], [Mat89] and [Ram9O]. All of these have supported CSP-style
message passing (i.e., synchronous). Message passing is a useful base for concurrent pro-
gramming, because it can support the two most common styles of concurrency: pipelining,
in which threads are arranged in a data-flow network [KM77], and server-client interac-
tions. IBy treating communication channels as infinite streams, the individual threads can
be written in an applicative style (e.g., [AB8O]), which is consistent with the ML style of
programming. In fact, CML programs tend to use far fewer references than sequential
SML programs; this point is illustrated in the following two chapters.
34
A common argument against message passing is that, compared to shared-memory prim-
itives, it provides inferior performance. While this is true for single-processor systems, there
is recent empirical evidence that suggests message-passing programs can provide better per-
formance on shared-memory multiprocessors [LS90]. The reason for this is that message-
passing programs typically have better locality, and thus map better onto the non-uniform
memory structure of modern shared-memory muitiprocessors.
3.4 Summary
There are a number of design criteria that can be drawn from the above discussion. A
concurrent extension to ML shouid have the following characteristics:
o+ Dynamic thread creation.
o+ Synchronous communication on typed channels or ports. Since channels are more
general than ports, we prefer them.
o+ Dynamic channel creation.
o+ Support for generalized selective communication.
The way existing languages support these mechanisms is not completely satisfactory. The
probkm i? that th?y ?upput uwrnuuiutation by special operations (and often with special
i?1Ai?			?ir?ti			?
syntax) without providing any mechanism for bu???ig new commm ???on anc? synt??o-
nization abstractions. In the following chapter, 1 describe these limitations in more detail,
and present my approach to concurrent language design, which addresses them.
35
36
Chapter 4
First-class Synchronous
Operations
This chapter describes the central result of this dissertation: a new approach to concurrent
language design in which synchronous operations are treated as first-class values. I first de-
veloped this approach as part of the design of the concurrent language PML [Rep88]. PML
provided a collection of concurrency features similar to those found in Amber [Car86]:
typed channels, dynamic thread and channel creation, and rendezvous with generalized
selective communication. The design of PML broke new ground, however, by providing
first-class synchronotts operations.
The basic idea of first-class synchronous operations is to introduce a domain of first-
class values, called events, for representing synchronous operations. Constructor functions
are provided to build base-event values that represent primitive operations such as channel
1/0, and combinators are provided to combine event values into higher-level synchronous
operations. The design of CML [Rep9Ob, Rep91a] builds on this approach by providing a
more powerful version of events. In addition, CML provides a number of other features
not found in PML, such as garbage collection of threads and integrated 1/0 support.
This chapter is organized chronologically; that is, according to the historical evolution
of the language design. First, I introduce a basic set of concurrency primitives, which are
similar to what is found in Amber. I then motivate and present a subset of CML, called
PML events, that is sufficient to implement the primitives found in CSP-style languages.
The PML subset has llmitations, which I use to motivate the extensions that I have de-
veloped as part of CML. Finally, I summarize the features of CML to provide a basis for
the examples found in later chapters. 1 leave the presentation of extended examples to the
next chapter, where I present a series of examples of the use of events to build higher-level
synchrouization and communication abstractions.
37
4.1 Basic concurrency primitives
We start with a discussion of the basic concurrency operations provided by CML. A
running CML program consists of a collection of threads, which use synchronous message
passing on typed channels to communicate and synchronize. In keeping with the flavor
of SML, both threads and channels are created dynamically (initially, a program consists
of a single thread). The signature of the basic thread and channel operations is given in
Figure 4.1. The function spawn takes a function as an argument and creates a new thread
val spawn : (unit -> unit) -> thread_id
val channel : unit -> `la chan
val accept : `a chan -> `a
val send : (`a chan * `a) -> unit
Figure 4.1: Basic concurrency primitives
to evaluate the application of the function to the unit value. Channels are also created
dynamically using the function channel, which is weakly polymorphic.1 The functions
accept and send are the synchronous communication operations. When a thread wants
to communicate on a channel, it must rendezvous with another thread that wants to do a
?umpkrnentary communication on the same channel (this is the mechanism described in
Section 3.3.2). SML's lexical scoping is used to share channels between threads, and to
hide channels from other threads (note, however, that channels can be passed as messages).
A simple example of these primitives is the unique ID service used in the previous
chapter. In CML, this can be implemented as follows:
abstype unique?id?src = UlD of int chan
with
fun maxeUldSrc () = let
val ch = chaunel()
fun loop i = (send(ch, i); loop(i+1))
in
spawn (fn () => loop 0);
UlD ch
end
fun getuid (UlD ch) = accept ch
end
This abstraction provides a function for creating a new source of unique IDs (makeuldsrc)
1The weak polymorphism is necessary to avoid loo?holes in the type system (see Chapter 8 for details).
38
and an operation for getting a unique ID from a source (getuld). A source of unique IDs is
represented by a channel; the function makeUldSrc dynamically creates this channel, and
also a thread that sends a stream of unique IDs on the channel. The function getUld reads
the next ID in the stream. The implementation is an example of how threads can be used
to encapsulate state; note that the only side-effects are in the concurrency operations. This
style of programming is much more applicative than that of shared-memory primitives (cf.,
Section 3.2).
4.2 Selective communication vs. abstraction
`a Section 3.3, I discussed the arguments for providing generalized selective communication;
in this section, I describe a significant limitation with the forms of selective communication
found in existing languages.
The problem is that there is a fundamental conflict between selective communication and
abstraction. For example, consider a server thread that provides a service via a req?est-repty
(or RPC) protocol. The server side of this protocol is something like:
fun serverLoop () = if serviceAvailable()
then let
val request = accept reqCh
in
send (replych, doit request);
serverLoop ()
end
else dosomethingElse()
where the function doit actually implements the service. Note that the service is not always
available. This protocol requires that clients obey the following two rules:
1. A client must send a request before trying to read a reply.
2. Following a request the client must read exactly one reply before issuing another
request.
If all clients obey these rules, then we can guarantee that each request is answered with the
correct reply, but if a client breaks one of these rules, then the requests and replies will be
out of sync. An obvious way to improve the reliability of programs that use this service is
to bundle the client-side protocol into a function that hides the details, thus ensuring that
the rules are followed. The following code implements this abstraction:
fun clientCall x = (send(reqch, x); accept replych)
39
While this insures that the protocol is observed, it hides too much. If a client blocks on
a call to clientCall (e.g., if the server is not available), then it cannot respond to other
communications. Avoiding this situation requires using selective communication, but the
client cannot do this because the function abstraction hides the synchronous aspect of the
protocol. This is the fundamental conflict between selective communication and the existing
forms of abstraction. If we make the operation abstract, we lose the flexibility of selective
communication; but if we expose the protocol to allow selective communication, we lose the
safety and ease of maintenance provided by abstraction. To resolve this conflict requires
introducing a new abstraction mechanism that preserves the synchronous nature of the
abstraction. First-class synchronous operations provide this new abstraction mechanism.
4.3 First-class synchronous operations
The traditional select construct has four facets: the individual 1/0 operations, the actions
associated with each operation, the nondeterministic choice, and the synchrouization. The
approach of this dissertation is to unbundie these facets by introducing a new type of
values, called events, that represent synchronous operations. By st&ting with base-event
values to represent the communication operations, and providing combinators to associate
actions with events and to build nondeterministic choices of events, a flexible mechanism
for building new synchrouization and communication abstractions is realized. Event values
provide a mechanism for building an abstract representation of a protocol without obscuring
its synchronous aspect.
To make this concrete, consider the following loop (using an Amber style select con-
struct [C&86]), which implements the body of an accumniator that accepts either addition
or subtraction input commands and offers its contents:
fun accum sum =
select addCh?x =>
or subCh?x =>
or readch!sum
accum(sum+x)
accuin(sum-x)
=> accum sum)
The select construct consists of three 1/0 operations: addCh?x, subCh?x,and readCh!sum.
For each of these operations there is an associated action on the right hand side of the =>.
Taken together, each 1/0 operation and associated action define a clause in a nondeter-
ministic synchronous choice. It is also worth noting that the input clauses define a scope;
the input operation binds an identifler to the incoming message, which has the action as its
scope.
Figure 4.2 gives the signature of the event operations corresponding to the four facets
of generalized selective communication. The functions receive and transmit build base-
40
val receive : `a chan -> `a event
val transmit : (`a chan * `a) -> unit event
val choose : `a event list -> `a event
val wrap : (`a event * (`a -> `b)) -> `b event
val sync : `a event -> `a
Fignre 4.2: Basic event operations
event values that represent channel 1/0 operations. The wrap combinator binds an action,
represented by a function, to an event value. And the choose combinator composes event
values into a nondeterministic choice. The last operation is sync, which forces synchroniza-
tion on an event value. I call this set of operations "PML events," since they constitute
the mechanism that I originally developed in PML [Rep88].
The simplest example of events is the implementation of the synchronous channel 1/0
operations that were described in the previous section. These are defined using function
composition, sync and the channel 1/0 event-value constructors:
val accept = sync 0 receive
val send			= sync 0 transmit
A more substantial example is the accumulator loop from above, which is implemented as:
fun accum sum = sync (
choose (
wrap (receive addCh, fn x => accum (sum+x)),
wrap (receive subCh, fn x => accum (sum-x)),
wrap (transmit (readch, sum), fn () => accum sum)
1)
Notice how wrap is used to associate actions ?;t?
icat
---- CommuiL???ions.
The great benefit of this approach to concurrency is that it allows the programmer to
create new first-class synchronization and communication abstractions. For example, we
can define an event-valued function that implements the client-side of the RPC protocol
given in the previous section as follows:
fun clientCallEvt x = wrap (transmit(reqch, x), fn () => accept replych)
Applying cljentCallEvt to a value v does not actually send a request to the server, rather
it returns an event value that can be used to send v to the server and then accept the server's
reply. This event-value can be used in a choose expression with other communications; in
41
E[bv]
E[A?(e) v]			I,'			E[e[x i,' v]]
E[letx=vine]			i,'			E[e[x?v]]			(A??-let)
E[sync (Ge)]			,			E[sync e]			(A?-guard)
Note that the rule (A?-guard) forces the expression delayed by guard. As u5ual, ?` *is
the transitive closure of ?. The evaluation of the other new forms (e.g., spawn) is defined
as part of the concurrent evaluation relation in Section 7.2.3.
7.2.2 Event matching
The key concept in the semantics of concurrent evaluation is the notion of event matching,
which captures the semantics ofrendezvous and commuhication. Informally, if two processes
synchionize on matching events, then they can exchange values and continue evaluation.
Before we can make this more formal, we need an auxiliary definition
Definition 7.2 The abort action of an event value ev is an expression, which, when eval-
uated, spawns the abort wrappers of ev. The map
AbortAct : EVENT H ExP
maps an event value to its abort action, and is defined inductively as follows:
AbortAct(A) =
AbortAct(??) =
AboftAct(?!v) =
AbortAct(ev ? e) = AbortAct(ev)
AbortAct(evi ? ev2) = (AbortAct(evi); AbortAct(ev2))
AbortAct(ev v) = CAbortAct(ev); spawn v)
With this definition we can formally define the matching of event values:
Definition 7.3 (Event matching) The matching of event values is defined as a family of
binary symmetric relations (indexed by CII). For ? E CII, define
ev1			ev2 with (e1,e2)
(pronounced "ev1 matches ev2 on channel ? with respective resiLits e1 and e2) as the smallest
relation satisfying the six inference rules given in Figure 7.2. This relation is abbreviated
to ev1 0 ev2 when the results are unimportant.
An example of event matching is:
(???A?((x.x)))o?(?17?(???Ax()))wfth(Ax((x.x)) 17,())
81
?? with(C),v)
ev1 0 ev2 with (e1, e2)
ev1 ? (ev3 ? ev2) with (e1, (AbortAct(ev3); e2))
ev1 0 ev2 with (e1,e2)
ev1 0 (ev2 ? v) with (e1,v e2)
ev1			ev2 with (ei,e2)
ev1 0 (ev2 ? ev3) with (e1, (AbortAct(ev3); e2))
Figure 7.2: Rules for event matching
ev1 0 ev2 with (e1, e2)
ev1 0? (ev2 v) with (ei, e2)
and
Informally, if two processes attempt to synchionize on matching event values, then we can
replace the applications of sync with the respective results. This is made more precise in
the next section where the concurrent evaluation relation is defined.
Note that event matching is nondeterministic; for example, both
? ?!29) with (17, ())
?!29) with (29, C))
It is also worth noting that even if one of the wrappers of an event value is non-
terminating, the necess&y abort actions for that event will be executed (assuming falr
evaluation). This property is important because a common CML idiom is to have tall-
recursive calls in wrappers (e.g., the buffered channel abstraction in Section 5.1).
ev10ev2with(e1,e2)
ev2 0ev1 with(e2,e1)
82
7.2.3 Concurrent evaluation
Concurrent evaluation is defined as a transition system between finite sets of process states.
This is similar to the style of the "Chemical Abstract Machine" [BB90], except that there are
no "cooling" and "heating" transitions (the process sets of this semantics can be thought
of as perpetually "hot" solutions). The concurrent evaluation relation extends "?" to
finite sets of terms (i.e., processes) and adds additional rules for process creation, channel
creation, and communication. We assume a set of process identifiers, and define the set of
processes and process sets as:
p =
PRoCID			process IDs
e) E PRoc (PRocTh x Exp) processes
P e Fin(P?oc)			process sets
We often write a process as (?; E[e]?, where the evaluation context serves the role of the
program counter, marking the current state of evaluation.
Definition 7.4 A process set P is well-formed if for all (?; e) ? P the following hold:
o+ FV(e) = ? (e is closed), and
o+ there is no e' $ e, such that (?; e'? ?
It is occasionally useful to view well-formed process sets as finite maps from PRoCID to
Exp. If P is a finite set of process states and K is a finite set of channel names, then K, P
is a configuration.
Definition 7.5 A coiiflguration K, P is well-formed, if FCN(P) c K and P is well-formed.
The concurrent evaluation relation "?" extends "?" to coiiflgurations, with addi-
tional rules for the concurrency operations. It is defined by four inference rules that define
single step evaluations. Each concurrent evaluation step affects one or two processes, called
the selected processes. I first describe each of these rules independently, and then state the
formal definition.
The first rule extends the sequential evaluation relation (HH) to coiiflgurations:
e
K,?+(ir; e? ? K,P+(?; e'?
The selected process is ?.
The creation of channels requires picking a new channel name and substituting for the
variable bound to it:
K, P+(r; E[chan x in e]) ? Ks? P+(?; Ek[x ?
83
(A?-chan)
Again, r is the selected process.
Process creation reqnires picking a new process identifier:
r' ?dom(?)$r
K,P+(?; E[spawnv]? ? K,P+(?; E[()])t(ir'; v
This rule has two selected processes: r and ir?.
(A??-spawn)
The most interesting rule describes communication and synchionization. If two processes
are attempting synclironization on matching events, then they may rendezvous --H
exchange a message and continue evaluation:
ev1 0 ev2 with (e1,e2)
K,?t(?i; E1[sync evij)+(r2; E2[sync ev2]?			(A?-sync)
? K,?+(ri; Eikii)+(?2; E2k2]?
The selected processes for this rule are ri and r2. We say that ? is used in this transition.
More formally, concurrent evaluation is defined as follows:
Definition 7.6 (?) The concurrent evaluation relation is the smallest relation "?"
satisfying the rules: (A?-?), (A?-chan), (A??-spawn), and (A??-sync).
Under these rules, processes live forever; i.e., if a process evaluates to a value, it will
never again be selected, but it remains in the process set. We could add the following rule,
which is similar to the evaporation rule of [B1390]:
K,P+(ir; [v]? ?
This rule is not included because certain results are easier to state and prove if the process
set is monotonicly increasing.
7.3 Traces
Unlike in the sequential semantics of Section 6.2.2, a program can have many (often inunitely
many) different evaluations. Furthermore, there are many interesting programs that do not
terminate. Thus some new terminology and notation for describing evaluation sequences
is required. This is used to describe some reasonable fairness constraints (see Section 7.4)
and to state type soundness results for A? (see Chapter 8).
First we note the following properties of ?:
Lemma 7.2 If K, P is well-formed and K, ? ? ?C', P' then the following hold:
84
1. K', P' is well-formed
2. K c K'
3. dom(P) C dom(P')
Proof. ?
y examination ot tlie rutes tor ?.
Corollary 7.3 The properties of Lenna 7.2 hold for ??.
Proof. By induction 011 the length of the evaluation sequence.
Note that property (1) implies that evaluation preserves closed terms.
Definition 7.7 A trace T is a (possibly ilifinite) sequence of well-formed configurations
T			((K0,?; K1,?;..
such that Ki, Pi ? Kj+1, Pi+i (for i < n, if T is finite with length n). The head of T is
K0,P0.
Note that if a configuration K0, Po is well-formed, then any sequence of evaluation steps
starting with K0, Po is a trace (by Corollary 7.3).
The possible states of a process with respect to a configuration are given by the following
definition.
Definition 7.8 Let P be a well-formed process set and let p E P, with p = (ir; e?. The
state of ? in P is either zombie, blocked, or ready, depending on the form of e:
o+ if e = [v], then p is a zombie,
o+ if e = E[sync ev] and there does not exist a (?`; E'[sync ev']? E (P \ [p1), such that
ev 0 ev', then ? is blocked in P.
o+ otherwise, 7r is ready in P.
We define the set of ready processes in P by
Rdy(P) =			? is ready in P?
A configuration K, P is terminal if Rdy(P) = ?. A terminal configuration with blocked
processes is said to be deadlocked.
85
Definition 7.9 A trace is a comptttationif it is maximal; i.e., if it is inunite or if it is finite
and ends in a terminal configuration. If e is a program, then we define the computations of
e to be
Comp(e)			fTIT is a computation with head (ro; e)1
Note, I follow the convention of using ro as the process identifier of the initial process in a
computation of a program.
Definition 7.10 The set ofprocesses ofa traceT is defined as
Procs(T) =			1 ?Kj,Pj E T with ir E dom(?)?
Since a given program can evaluate in different ways, the sequential notions of conver-
gence and divergence are inadequate. Instead, we define convergence and divergence relative
to a particular computation of a program.
Definition 7.11 A process 7r c Procs(T) converges to a value v in T, written r???, if
K,P+(r; v) E T. We say that ir diverges in T, written 7r?T? if for every K, ? ? T, with
? E dom(P), ir is ready or blocked in P.
Divergence includes deadlocked processes and terminating processes that are not evaluated
often enough to reach termination, as well as those with infinite loops. It does not include
processes with run-time type errors, which are called stuck (see Section 8.2.3).
7.4 Fairness
The semantics presented above admits unfair traces, and thus is not adequate as a spec-
ification of CML implementations. It is necessary to distinguish the acceptable traces.
Informally, we require that ready processes make progress and that communication on a
single channel is fair (see [Kwi89] for a survey of fairness issues).
A couple of definitions are required before formalizing the notions of fairness. I have
already defined the notion of a process being ready in a configuration; a similar definition
is required for channels.
Definition 7.12 A channel ? is enabledin a configuration K, ? if there are two distinct
processes (ir; E[sync ev]) E P and (r?; E'[sync ev']) E ?, such that ev
The acceptable computations of a program are defined in terms of fairness restrictions.
86
Definition 7.13 A computation T is acceptable if it ends in a terminal configuration, or if
T satisfies the following fairness constraints:
(1) Any process that is enabled infinitely often is selected infinitely often.
(2) Any channel that is enabled infinitely often is used infinitely often.
In the taxonomy of [Kwi89], the first restriction is strong process fairness and the second is
strong event fairness.
An implementation of CML should prohibit the possibility of unacceptable computa-
tions. In practice this requires that an implementation satisfy some stronger property on
finite traces. As an example, consider the following property.
Definition 7.14 A finite trace T of length n is k-bottnded fair (for k a fixed positive
integer), if every intermediate configuration Ki, Pi, satisfies one of the following (where
m= z+kjRdy(Pi)):
o+ m > n, or
o+ for every 7r ? Rdy(Pi), Ir is a selected process at least once in the evaluation subse-
quence Kj,Pj ?.. ? Km, P,n.
An infinite trace T is k--Hboitnded fair, if every finite prefix of T is k--Hbounded fair.
A k--Hbounded fair trace obviously satisfies restriction (1) (but not necessarily (2)). The k--H
bounded fairness restriction is realizable using fairly stand&d implementation techniques.
For example, an implementation that uses fair preemptive scheduling2 and FIFO queues for
the process ready queue and for channel waiting queues will produce only k--Hbounded fair
sequences, where k is determined by the length of the time-slice and speed of the processor.
Simil& notions can be defined for event fairness.
7.5 Extending Acv
The language A?? lacks a number of features found in CML; in this section 1 show how A??
might be extended to model some of these features. This is not meant to be a complete
development of the formal semantics of a more complicated language, rather it is to illustrate
that a formal treatment of full CML is possible.
2By fair, I mean that a thread is guaranteed some progress before being preempted.
87
7.5.1 Recursion
Dynamic process and channel creation is powerful enough to implement the call-by-value
Y? combinator. This combinator has the following evaluation rule:
E[Yvv]			E[vAx((Y?v) x)]
The following CML code implements Yv using ouly those features found in A?? (it is adopted
from [GMP89]):
val Y? = fn f => let
val a = channel()
val g = fn v => let val h' = accept a
in
spawn (fn () => send(a, h'));
f h' v
end
in
spawn (fn () => send(a, g));
let val h = accept a
in
spawn (fn () => send(a, h));
fh
end
end
This code is somewhat mysterious, but what it actually does is falrly simple. The channel
a is used to cache the function g for the next iteration off; each time g (renamed h) is read
from a, a new thread is spawned to send the copy for the next iteration. For CML, which
is statically typed (see Chapter 8), this definition implements recursion at all imperative
types. As an alternative, we could add the Y? combinator as a built-in term constructor (as
is done in [WF91b]), which would provide recursion at all types.
7.5.2 References
It is well-known that processes and channels can be used to mimic updatable references.
The standard technique is to use a process (or thread) to hold the state of the reference cell,
with messages to implement reading and writing of the cell. Figure 7.3 gives the CML code
for this. One can define a formal translation from programs with references to programs
using this scheme. This is done in [BMT92], and the translation is shown to be falthful to
the expected semantics of references.
The implementation of Yv described in Section 7.5.1 is similar to the imperative Y--H
combinator (Y) defined by Felleisen [Fel87a]. This suggests the following implementation
88
datatype `a ref = REF of (`a chan * `a chan)
fun mkRef initi = let
val inCh = chaunel() and outCh = channel()
fun cell x = sync (choose E
wrap (transmit (outCh, x), fn () => cell x),
wrap (receive inCh, fn newi => cell newi)
in
spawn (fn () => cell initi);
REF(inCh, outCh)
end
fun assign (REF(inch, ), x) = send (inch, x)
fun deref (REF(?, outCh)) = accept outCh
Figure 7.3: Imnlementin? referenc
of references, which uses channels to represent references directly and does not reqnire
explicit recursion. Figure 7.4 gives this alternative representation of references. Note that
this version of references can be directly coded in A?.
fun mkRef initi = let val ch = channel()
in
spawn (fn () => send (ch, initi));
ch
end
fun assign (ch, x) = (accept ch; spawn (fn () => send (ch, x)))
fun deref ch = let val x = accept ch
in
spawn (fn () => send (ch, x));
x
end
Figure 7.4: Implementing references without recursion
7.5.3 Exceptions
One of the most important features of SML (and CML) is the exception mechanism. CML
adds further support for exceptions with the wrapHandler event-value combinator, which
89
handles exceptions that are raised during evaluation of an event's wrappers. Exceptions are
another feature that requires imperative types to achieve sound typing.
Wright and Felleisen provide a semantics of SML's exception mechanism in [WF9lb],
but applying this techuique to A?? requires some care. The problem is that the soundness
of their semantics relies on limiting the scope of exception identiflers to within the scope
of their binding site (the rewrite rules allow the binding sites to migrate up to the top of
the term, thus expanding the scope of the binding). Since processes can include exceptions
in messages, and thus send them out of scope, a different approach is needed. The best
approach seems to be to bind exception identiflers in an implicit global environment (as is
done with chaunel names). In the remainder of this section, 1 sketch the changes to the
syntax and semantics of A? that are required to support exceptions.
Adding exceptions to A?? requires a set of exception names:
ex ? ExNNAME
The syntax of A? must be extended to support the declaration, raising, and handling
of exceptions. A raised exception is represented by an exception packet (ExN c ExP).
Exceptionpackets areirreducible terms,but for techuicalreasons they are not values. The
syntactic extensions are:
e
exceptionx ine			exception binding
raise e1 e2			raise exception
e1 handle x withe2			exception handler
exn			exception packet
exn ::= (ex, v)
exception packet
exception name
v ::= ex
ev ::--H--H (evllv)			wrapped handler
I...
The terms for exception packets, exception names and wrapped handlers are intermediate
forms; i.e., they do not appear in programs.
Sequential evaluation is extended in several ways. Since there is no pattern matching in
A?, exception matching must be explicitly coded in the semantics. For wrapHandler, this
means that the wrapper is a pair of the exception name and the handler. This is reflected
in the 6--Hrule for wrapHandler:
6(wrapHandler,(ev.(ex.v))) (evll(ex.v))
90
The presence of an exception mechanism means that function constants such as div can be
supported. Assuming the existence of the exception name Div ? ExNNAME, then integer
division can be defined by:
6(div,(z.O))			=			(Div, ()]
6(div,(z.y))			=			LwJ
Additional evaluation contexts for the new syntactic terms &e required:
E
raise E e raise ex E
ehandle ex withE I Ehandle ex withv
Note that the handier of a handle term is evaluated before the body. The sequential
evaluation relation (Definition 7.1) must be extended. Most of the new clauses for "?"
&e for short circuiting evaluation when an exception is raised and propagating the resulting
packet up to a handier. A sampling of these is:
E[exn e]			,			E[exn]
E[v exn]			,			E[exn]
E[e handle ex with exn]			E[exn]
There &e simil& rules for pairs, let, sync and raise. The other new clauses have to do
with the raising and catching of exception packets:
Efraiseexv]			E[(ex,v]]
E[ (ex, v] handle ex withv']			HH			E[v' v]
E[(ex, v] handle ex' withv']			E[(ex, v]] (ex # ex')
As is the case with channel names, the binding of new exception names is left to the
concurrent evaluation relation.
The event matching relation (Definition 7.3) must be extended with a clause for wrapped
handiers:
ev1			ev2 with (e1, e2)
ev1 0 (ev2 H (ex v)) with (e1, e2 handle ex withv)
Configurations must now include a set of bound exception names. They have the form
K, ?, P, where X c ExNNAME is a finite set of exception names. A configuration K, X, ?
is well-formed, if FCN(?) c K, P is well-formed, and any exception name that occurs in?
is in X.
The inference rules for concurrent evaluation relation (Definition 7.6) are modified in
light of the new form of configurations. In addition, the concurrent evaluation relation is
extended to allow the decl&ation of exceptions:
ex			W
K,X,?+(ir; E[exceptionx iue]? ? K,X+ex,?+(ir; E[e[x
91
7.5.4 Process join
CML provides the event constructor threadwait that creates an event for synchrouizing
on the termination of a given thread. There are a couple of ways to extend A? to model
this. One approach is to define a distinguished set of channel names, f? 7r c P?ocTh?,
to represent process IDs in the dynamic semantics. lii this approach, the rule for process
creation wraps the body of a process ? with code to repeatedly send () on the channel ??:
where
dom(?)
K,P+(ir; E[spawnv]) ? K+??,?t(?; E[??]?+(?'; Fork(?',v)?
Fork(ir,v) (v (); Y, Af(send (??.()); I ()) ())
Waiting for a process' termination is implemented in this scheme by waiting for a message
on the process' channel; i.e., threadWait is implemented directly by receive. While this
is a reasonable implementation technique, it has the disadvantage that it becomes hard to
distinguish the zombie processes.
A better approach is to support threadwait directly. As a side effect of this approach,
the event constructor always can be directly supported. The direct approach requires
adding PROCID to the domain of values and adding two new event value terms:
v ::--H--H
ev ::--H--H (W?) I A I
The implementation of the always function is defined by the following 6--Hrule:
6(always, v) (A ? Ax(v))
Matching a base event created by threadwait or always differs from rendezvous in that
only one process is selected. This requires a new relation between an event and a set of
processes.
Definition 7.15 Define the ternary relation
ev ? e
(pronounced as "ev is matched in ? with result e") as the smallest relation satisfying the
inference rules in Figure 7.5.
The concurrent evaluation relation is changed slightly in the case of spawn, which now
returns the identifier of the new process:
? dom(P)
K,P+(?; E[spawnv]?			K,?+(?; E[?']?+(ir'; v
92
v
A ?
v
?
v
ev ?
v
(ev ? v) ? (v e)
v
ev ? e
(eveev')? (AbortAct(ev'); e)
v
ev ? e
(ev' ? ev) ? (AbortAct(ev'); e)
v
ev ? e
v
(ev v) ? e
Figure 7.5: Rules for matching events in process sets
And there is an additional concurrent evaluation rule for sync that handles the matching
ofthreadWait and always events:
7.5.5 Polling
v
ev ? e
K, ?+(r; sync ev? ? K, P+(r; e?
As noted in Section 4.6, CML supports a polling mechanism. Recall that the polloperation
is a non-blocking form of sync, which returns NONE if sync would have blocked, and SOME
wrapped around the synchionization result otherwise.
It is falrly stralghtforward to add poll to A??. To start with, the syntax of expressions
and the definition of evaluation contexts is extended with a new form:
e ::= polle
E ::= pollE I
93
Since A? does not have the option datatype, we need another way to encode the result of
polling an event value. To do this, the poll operation takes two arguments: an event value
to poll and a pair of functions. Informally, the evaluation of poll ( ev. . g)) will either
apply f to the result of matching ev, or else it will apply g to ().
Since polling is supposed to be non-blocking, we need a formal notion of when synchro-
mzing on an event would block. The following two definitions do this.
Definition 7.16 An event value ev is offered by 7r in a configuration K, P, if P(?) is of the
form E[sync evi or E[po11(ev v)]. The set of offered events in P is defined to be
Offered(P) fev I ?? ? dom(P) such that ? offers ev in P?
Definition 7.17 The set of matched events in a set of processes P is defined to be
Match(?) = fev I ?ev' ? Offered(?) such that ev 0 ev', for some ?
And, we need three additional concurrent evaluation rules. The first two handle the transi-
tion in which the event is matched by some other process, the third handles the transition
for when sync would have blocked:
ev			ev1 with (e, e')
K, P+(r; E[poll(ev. (v1. v2) )])+(ir'; E'[sync ev']?
? K,?+(?; E[v1 e])+(?'; E'[e']?
ev			ev1 with (e, e')
K, P+(?; E?oll(ev. (vi v2) )]?+(?; E'[poll(ev'. (v11 v21))])
? K,?+(ir; E[v1 e])+(r'; E'[v11 e'])
ev ? Match(?)
K,?+(r; E[poll(ev.(v1.v2))]) ? K,?+(?; E[v2 ()])
To make these rules sensible requires the following fairness constraint:
If p = (?; E[poll(ev.v)]), then a transition K,P+p ? K',?'+p, is acceptable
if:
o+ ev E Match(?) and ev ? Match(P'), or
o+ ev ? Match(P) and ev ? Match(P').
This constraint captures the notion that poll is non-blocking by forcing the polling opera-
tion to complete before the state of the polled event can change.
94
Chapter 8
Typing Acv
In this chapter, 1 present a polymorphic type discipline for A? and prove that it is sound
with respect to the operational semantics presented in the previous chapter. Proofs of
the main results are provided in this chapter; additional proof details can be found in the
appendix.
CML uses SML's polymorphic type inference system, which is an extension of the one
presented in Section 6.2.3 for Av. It has been long known that the naive extension of this
system for polymorphic references is unsound [GMW79, Dam85, To?8]. For example, under
the assumptions
ref H+ Va.(a?aref)
Va.((axaref)?unft)
VoL.(oLref			a)
the following erroneous program has the type bool:
let val r = ref (fn x => x)
in
r (fn x => x + 1);
(! r) true
end
Tofte, in [To?8] and [Tofl)O], shows that the source of the problem is the rule for let
bindings. Recall from Figure 6.1 that this rule is
TE F e1 : YTE??xCL05TE(Y)?Fe2:r
TE F let x = e1 in e2 : r
(r-let)
Tofte points out that the closure operation generalizes too many type variables. In partic-
ular, there are type variables that are free in the implicit typing of the store, but which are
being generalized in the rule for let. For example, in the code above, closure causes r to
be assigned the type scheme Va.(a a) ref in the body of the let, which is instantiated
to both (int H int) ref and (bool bool) ref.
95
Since the typing of the store is undecidable at compile time, a more conservative scheme
is necessary to avoid generalization of variables that are free ill the store typing. Tofte
proposed a system that distinguishes between applicative and imperative type variables,
and between let bindings that are expansive (i.e., may introduce new store objects) and
those that are not. Expansiveness is a syntactic property that conservatively approximates
those expressions that introduce new store objects. Basically, irreducible terms, such as
abstractions and constants, are non-expansive, and all other terms are expansive. For
example, using SML notation,
let val x = ref 1 in ... end
and
let val x = I + 2 in ... end
are both expansive let bindings, while
let val x = fn x => ref x in ... end
is a non-expansive binding. There are two typing rules for let: when the binding is ex-
pansive, then only the applicative type variables can be generalized; when the binding is
non-expansive, then any variable can be generalized.
SML/NJ uses a scheme developed by Dave MacQueen, called weak types, to deal with
imperative features. The basic idea is to assign a rank to each type variable, which is an
approximation of the number of levels of abstraction protecting the variable. When the rank
of a variable gets to zero, it must be instantiated to a monotype. Applicative type variables
have a rank of inlinity, and thus are not weak. It is conjectured, although not proven, that
MacQueen's scheme is sound and strictly more polymorphic than Tofte's. Although CML
inherits this typing scheme from SML/NJ, I use Tofte's scheme in this chapter, because it
has a well-defined inference system and because it is the type system used in the definition
of SML [MTH9o].
Since channels and processes can be used to implement references (as shown in Sec-
tion 7.5.2), it is clear that the typing problems of polymorphic references also exist for
polymorphic channels. One might naively view the implementation of CML as a proof
of the soundness of polymorphic channels, since it is written in SML (plus callcc) and
it typechecks, but it has recently been discovered that the typing rules given for callcc
in SML/NJ are not sound (Bob Harper, personal communication, July 1991). A simple
counter-example (owed to Harper and Lillibridge) is the expression:
96
let val (a, b) = (callcc (fn k =>
(fn x => x, fn f => throw k (f, fn f => ()))))
in
print (a "hello");
b (fn x => x+2)
end
The typing of callcc has been changed in SML/NJ to fix this problem (the correct
typing is given in Section 2.3.3). The impltni?i?atio? of CML, however, uses the unsound
typing,1 which means that the soundness of polymorphic channels is a serious concern. The
remainder of this chapter presents the type system for A?? and proves it sound.
8.1 Static semautics
The type terms of A?? are richer than those of Av. Let  E TYCoN fint,bool,. ..)
designate the type constants. Type variables are partitioned into two sets:
u ? IMPTYVAR
t ? APPTYVAR
E TYVAR IMPTYVAR U APPTYVAR
The set of types, r E Ty, is defined by
T
I			type constants
a			type variables
I			(r1			r2)			function types
I			(T1			x r2)			pair types
r			chan			channel types
event types
imperative type variables
applicative type variables
type variables
r event
and the set oftype schemes, a C TYScHEME, are defined by
a ::= T
Va.a
As with Av, we write ..... a?.T for the type scheme a ..... Van.T, and write FTV(a)
for the free type variables of a. We define the set of imperative types by
O? IMPTY = fT FTV(r) c IMPTYVAR?
Note that all of the free type variables in an imperative type are imperative.
As with Av, type environments assign type schemes to variables in terms. Since we are
interested in assigning types to intermediate stages of evaluation, channel names also need
1The reasons for this `sre discussed in Section 10.1.1.
97
to be assigned types. Therefore, a typing environment is a pair of finite maps: a variable
typing and a channel typing:
VT ? VARTY VAR TYScHEME
CT E CHANTY=CH?IMPTY
TE (VT, CT) E TYENv = (VARTY x CHANTY)
We use FTV(VT) and FTV(CT) to denote the sets of free type variables of variable and
channel typings, and
FTV(TE) = FTV(VT) u FTV(CT)
where TE = (VT, CT). Note that there are no bound type variables in a channel typing,
and that FTV(CT) c IMPTYVAR. The following shorthand is useful for type environment
modification:
TE + f? i?' a)			=?q			(VT I fx:,' a), CT)
TEIfc?0)			Ede!			(VT,CT+f??0))
where x e VAR, `: ? CH, and TE = (VT, CT).
Because of the need to preserve imperative types, we require that substitutions map
imperative type variables to imperative types. As before, we allow substitutions to be
applied to types and type environments.
Definition 8.1 A type r1is an instance ofa type scheme a = Va1 a?.T, written a ?
if there exists a finite substitution, 5, with dom(S) = ???...., a?) and Sr = Y. If a ?
then we say that a is a generalization of r1. We say that a ? a' if whenever a' ? r, then
a ?
Definition 8.2 The closnre of a type r with respect to a type environment TE is defined
as: CLOSTE(T) = ..... a?.r, where
......., an) = FTV(r) \ FTV(TE)
And the applicative dositre of r is defined as: APPCLoSTE(r) = ..... a?.T, where
????..., a?) = (FTV(r) \ FTV(TE)) n APPTYVAR
The following important facts about type closure and generalization are used later:
Lemma 8.1 The following two properties hold for any TE, a, a', r, and x:
o+ If a ? a', then CL0STE+(???1(r) ? CLOSTE?f:??I1(T)
98
o+ If ? ? dom(TE), then CLOSTE(r) ?
Proof. These both follow from the observation that if FTV(TE) c FTV(TE') then
CL0sTE(r) ? CL0STE'(r)			I
8.1.1 Expression typing rules
As before, the function TypeOf assigns types to the constants. For the concurrency related
constants, TypeOf assigns the following type schemes:
never
receive
transmit
wrap
choose
guard
wrapibort
Va.(unit a event)
Va.(a chan a event)
Va.((a chan x a) unit event)
VaP.((a event x (a P)) P event)
Va.((a event x a event) H a event)
Va.((unit H a event) H a event)
Va.((a event x (unit unit)) a event)
We also assume that there are no event-valued constants. More formally, we require that
there does not exist any b such that TypeOf(b) r event, for some type r.
The typing rules for A? are divided into two groups. The core rules are given in
Figure 8.1. These are a modification of the rules in Figure 6.1. There are two rules for let:
the rule (r-app-let) applles in the non-expansive case (in the syntax of A?, this is when the
bound expression is in VAL); the rule (r-imp-let) applles when the expression is expansive
(not a value). There are also rules for typing channel names, and pair expressions. The
rule (r-chan) restricts the type of the introduced channel to be imperative. In addition to
these core typing rules, there are rules for the other syntactic forms (see Figure 8.2). Given
the appropriate environment, these rules can be derived from rule (r-app) (rule (r-const)
in the case of A). It is useful, however, to include them expllcitly. As before, it is worth
noting that the syntactic form of a term uniquely determines which typing rule applles.
In order that the typing of constants be sensible, we impose a typahillty restriction on the
definitions of 6 and TypeOf. If TypeOf(b) ? (r' r) and TE F v : r', then 6(b, v) is defined
and TE F 6(b, v) : r. It is worth noting that the 6 rules we defined for the concurrency
constants respect this restriction.
The following lemma defines a derived typing rule for the sequencing syntax:
Lemma 8.2 The typing rule for the sequencing is
TEFe1:r1TEFe2:r2
TEH (e1; e2) :r2
99
TypeOf(b)?r
TE F			r
xEdom(VT)VT(x)?T
(vT,CT) F x :r
= 0
(vT,CT) F ?: 0
TE L e1 : H r) TE F e2 : T1
TE F e1 e2 : r
TEIfx:?`Fe:
TE F Ax(e) : H
TEFe1:r1TEFe2:T2
TEF (e1.e2) :(ri xr2)
TEFv:YTEI?:CLOSTE(Y)?Fe:r
TE F let z = v in e : r
TEFe1:r1TE I(z?APPCL0STE(Y)?Fe2:r
TE F let z = e1 in e2 : T
TElfx:,'0chan?Fe:r
TE F chan x in e : T
Figure 8.1: Core type inference rules for A?
(r-const)
(r-var)
(r-chvar)
(r-app)
(r-abs)
(r-pair)
(r-app-let)
(r-imp-let)
(r.chan)
loo
TEFe:(unit
TE F spawn e : unit
TE?.revent
TE F sync e : T
TE F e : r event
TE F (Ge) : r event
Va.aevent?r
TE F A : r
TEF?:rchanTEHv:r
TE H ?!v : unit event
TE H ? : r chan
TE F ?? : r event
TE ? ev : r' event TE H e : (r'
TE F (ev ? : r event
TE?ev1:reventTEFev2:revent
TE ? (evi ? ev2) : T event
TEFev:reventTEFv:(unitHunit)
TE F (ev v) : r event
Figure 8.2: Other type inference rules for A?
(r.spawn)
(r-sync)
(r-guard)
(r-never)
(r-output)
(r-input)
(r-wrap)
(r-cboice)
(r-abort)
101
Proof. This follows from the definition of sequencing and the type rules above:
Typeof(snd)?((T1xT2)r2) TEFe1:r1TEFe2:r2
TEHsnd:((r1xr2)?r2)TEF(e1.e2):(rixr2)
TEFsnd(e1.e2) :r2
8.1.2 Process typings
A process typingis a finite map from process identifiers to types:
PT ? PitocT? PRoCID A
Typing judgements are extended to process configurations by the following definition.
Definition 8.3 A well-formed configuration K, P has type PT under a channel typing CT,
written
CT F K,?: PT
if the following hold:
o+ K C dom(CT),
o+ dom(P) C dom(PT), and
o+ for every (r; e? E P, ((?,CT) H e :
For CML, where spawn requires a (unit unit) argument, the process typing is PT(?)
unit for all r E dom(?).
8.2 Type soundness
This section presents a proof of the soundness of the type system given in Section 8.1 with
respect to the dynamic semantics of Section 7.2. As discussed in Section 6.2.4, I use the
approach of [WF91b]. The basic idea is to show that evaluation preserves types (also called
sibject redttction); then characterize run-time type errors (called "stnck states") and show
that stuck states are untypable. This allows us to conclude that well-typed programs cannot
go wrong.
102
8.2.1 The Substitution and Replacement lemmas
Before we can prove the main results, we need several important lemmas. The following
lemma states that any v&iable or channel name in the domain of a typing environment,
which is not free in an expression e, can be ignored in the typing of e.
Lemma 8.3 If x ? FV(e), then TE F e : r iff TE I ?x a? F e : r. Likewise, if ?
FCN(e), then TEF e : r iff TEl (?H+ O? Fe :T.
Proof. The proof is a straightforward induction on the height of the typing deduction. I
Note that the v&iable convention insures that x ? FV(e) whenever this lemma applies.
The following lemma is very important; it allows us to replace a subexpression with
another expression of the same type, without affecting the type of the whole term.
Lemma 8.4 (Replacement) Let C[] be a single-hole context. If the following hold:
1. V is a type deduction concluding TE F C[ei] :
2. V1 is a subdeduction of V, which concludes TE' F e1 :
3. V1 occurs in V in the position corresponding to the hole in C, and
4. TE' F e2 :
then TE F C[e2] :
Proof. The basic idea is that the term C[e1] and type deduction V have isomorphic
structure, thus the replacement of e1 by e2 is p&alleled by a replacement of the deduction
of TE' F e1 : r' by the deduction of TE' F e2 : r, giving the deduction of TE F C[e2] :
This is proven by induction on the structure of the deduction. See [HS86] or [WF9lb] for
detailed proofs.			I
The following lemma essentially says that P-reduction preserves types.
Lemma 8.5 (Substitution) If x ? FV(v), TE F v : T, and
TEI(x ?Va1...a?.r? Fe :r'
with ....... , a?? A FTV(TE) ?, then TE F e[x i, v] :
Proof. The proof is by induction on the height of the typing deduction; the detailed proof
is given in the appendix.			I
103
The following lemma is useful in showing that spawn preserves types.
Lemma 8.6 If (VT, CT) F e : r and FV(e) = $, then (?J, CT) F e :
Proof. This is a more specific version of Lemma 8.3 and follows immediately.
8.2.2 Subject reduction
We are now ready to state and prove the first subject reduction theorem, which says that
sequential evaluation preserves types.
Theorern 8.7 (Sequential type preservation) For any type environment TE, expres-
sion e1 and type r, such that TE F e1 : r, if e1 ,` e2 then TE F e2 :
Proof. Let E[e] = e1 and E[e'] = e2, and assume that TE' F e : r1 with TE' = (vT', CT).
Then, by the Replacement Lemma (8.4), it is sufficient to show that TE' F e' : T1. This is
done by case analysis of the definition of (i.e., the structure of e).
Case E[b v]			E[?(b, v)].
Rules (r-app) and (r-const) apply:
TypeOf(b)?H
TE'Hb:(rr')TE'Lv:r
TE' F b v :
Thus, by the typability restriction on S, we have TE' F 6(b, v) : T1.
Case E[A?(e) v] i,' E[ek:,' v]].
Rules (r-app) and (r-abs) apply:
TE'IfxY'?Fe:Y
TE'FA?(e):(r"Hr')???Fv:
TE' H Ax(e) v :
Applying the Substitution Lemma (8.5), gives us
TE' F e[x			v] : Y
Case E[let 2: v in e] E[e[z v]].
Rule (r-app-let) applies:
TE'Fv:r"TE'Ifxi,'CL0STEI(Y')?Fe:
TE' F let 2: = v in e:
104
Let CLOSTEI(T") = ..... a?.Y', then, by definition,
?ai,...,a??nFTV(TE') =
Then, by the Substitution Lemma (8.5), we get
TE' ? e[x i,' v] :
Case E[sync (Ge)] HH E[synce].
Rules (r-sync) and (r-guard) apply:
Hence, by rule (r-sync),
TE'			r1 event
TE'F(Ge):T1event
TE' F sync (Ge) :
TE'Fe:r?event
TE' F sync e :
Lemma 8.8 If ev1 0 ev2 with (e1,e2) and TE F ev? : Tj event, then TE F e? : Tj (for i ?
Proof. This is proved by induction on the definition of event matching; the details are
given in the appendix.			I
We &e now ready to prove the second subject reduction theorem, which says that concurrent
evaluation preserves process typing.
Theorem 8.9 (Concurrent type preservation) If a configuration K, P is well-formed
with
and, for some channel typing CT,
K\P'
CT F K,P: PT
Then there is a channel typing CT' and a process typing PT', such that the following hold:
o+ CT c CT'
o+ PT c PT' and
`05
o+ CT' F K', ?`: PT'.
o+ CWFK,?:PTh
Proof. The fourth property follows from the others; the proof of the first three properties
proceeds by case analysis of the left hand side of the ? relation.
Case CT F K,?+(ir; e) : PT.
ff e :` ??, then, by sequential type preservation (Theorem 8.7), we have
(??,CT) He':
and hence
CT H K,Pt(? e') : PT
Letting CT' CT and PT' = PT satisfies the theorem.
Case CT F K,P+(ir; E[chan x in e]) : PT.
Then there is a type environment TE = (VT, CT) and types r and 0 (with 0 ?
IMPTY), such that
TE + ? &chan? F e : T
TE H chan x in e : r
((?, CT) F E[chan x in e] :
Let ? be the name of the new channel (hence ? ? K) and define CT' = CT I f? H+
0 chan? (obviously CT c CT'). Then, by Lemma 8.3,
(fy, CT') H E[chan x in e] :
Thus, by the Replacement and Substitution lemmas,
((?, CT') F E[e[x H* ?] :
and, therefore, CT' F K+?, (?; E[e[x ? ic]]) : PT. Letting PT' = PT satisfies the
theorem.
Case CT H K, P+(?; E[spawn v]) : PT.
Then there is a v&iable typing VT and a type T, such that
(vT, CT)Hv:(unitH
(VI, UT) H spawn v : unit
F E[spawnv] :
106
By Lemma 7.1, we know that FV(v) = ?, and thus, by Lemma 8.6,
((?,CT) Hv : (unit H
Applying rule (r-app), we get
(?bCT) F (v ()) : r
Let 7r' be the process identifier of the new process (hence r' ? dom(P)), then
CT F K,?+(? E[()]?+(ir'; v ()? : PT ? ?r' He 4
Letting PT' = PT I f?' r? and CT' = CT satisfies the theorem.
Case CT F K,P+(?; E1[sync evi]?t(?; E2[sync ev2]? : PT.
Then, for i E (1, 2?, there is a type environment TEj and a type Tj, such that
TEjFev?:Tjevent
TEj sync ev? : Tj
(f?, CT) F E?[sync ev?] :
If ev1 0 ev2 with (e1,e2), then, by Lemma 8.8,
TEj F e? : Tj
Thus, by the Replacement Lemma (8.4), we have
(fy, CT) H Eiki] : PT(?j)
hence,
CT F K,P+(ri; Ei[ei]?+(?2; E2k2]? : PT
Letting CT' = CT and PT' = PT satisfies the theorem.
This theorem leads immediately to the following fact about traces:
Corollary 8.10 Let ((K1,P1; ...; Kn,Pn)? be a finite trace, with
CT F K1,P1 : PT
Then there is a channel typing CT' and process typing PT', such that:
o+ CT c CT'
o+ PT c PT' and
o+foriEf1,...,n?CT'FK?,??:PT'.
Proof. This follows by a simple induction on n.
107
8.2.3 Stuck expressions
lii order to show that well-typed programs do not have run-time type errors, we first need
to characterize such errors.
Definition 8.4 A process p (?; e? is stick if e is not a value and there do not exist well-
formed configurations K, ?+p and K', ?` such that K, ?+p ? K', P', with ? a selected
process. A well-formed configuration is stnck if one or more or its processes are stuck.
The notion of being stuck is a semantic one; in Section 6.2.4 and [WF91b], this is con-
servatively approximated by the syntactic notion of faulty expressions. For A??, I take a
somewhat different approach that focuses more on stuck expressions.
Lemma 8.11 (Uniform evaluation) Let e be a program, T ? Comp(e), and 7r
Procs(T), then either ??T? 7r?TV, or Pi(r) is stuck for some Kj, Pi ?
Proof. This follows immediately from the definitions.
It remains to show that stuck expressions are untypable.
Lemma 8.12 (Untypability of stuck configurations) If ? is stuck in a well-formed
configuration K, P, then there do not exist CT E CHANTY and PT E PROCTY, such that
((?, CT) F P(?) :
In other words, K, P is untypable.
Proof. The proof is given in the appendix.			I
8.2.4 Soundness
We are now in a position to state the main result of this chapter: that well-typed programs
do not go wrong. This result is stated in terms of the computations of a program. (recall
from Section 7.3 that a computation is a maximal trace).
Theorem 8.13 (Syntactic soundness) Let e be a program, with F e : r. Then, for any
T E Comp(e), ir ? Procs(T), with Kj, Pi the first occurrence of ? in T, there exists a CT
and PT, such that
CT F Kj,Pj : PT
and PT(r0) = r. And either
108
o+ 7r?T? or
o+ ??TV and there exists an extension CT' of CT with ((),CT') F v :
Proof. The existence of CT and PT follows from Concurrent Type Preservation (Theo-
rem 8.9). By Uniform evaluation (Lemma 8.11), we know that either ir?T, 7r?TV, or
is stuck for some Kj,Pj ?
Assume that 7r is stuck in Kj, ?. By Lemma 7.2, Kj, ? is well-formed and, by
Lemma 8.12, it must be untypable. But, since the configuration ??, f(?o; e?? is typable,
by Concurrent Type Preservation (Theorem 8.9), there is a CT' ? CHANTY and PT' e
PROCTY such that CT' F Kj, Pj : PT'. Which means that (f), CT') F Th(?) : PT'(r), hence
? cannot be stuck and either 7r?T or 7r?TV.
ir?T then we &e done.
Assume that ??TV and let Kj, ?j e T such that ??(r) v. Concurrent Type Preser-
vation means that there exists an extension CT' of CT and an extension PT' of PT
such that CT' F Pj : PT' Since PT' is an extension of PT, PT'(?)			PT(r), and hence
(f?, CT') ? v : PT(r).			I
To state more traditional soundness results, we first need to define a notion of evaluation
that distinguishes those processes that have run-time type errors.
Definition 8.5 For a computation T, define the evaluation of a process ? in T as
WRONG if P?(?) is stuck for some Ki,Th ? T
eval?(7r)
v			if???v
Note that for sequential programs, this is essentially the same as the definition on page 75.
Using this definition we can now state weak and strong soundness results for A??.
Theorern 8.14 (Soundness) If e is a program with ? e : r, then for any T E Comp(e)
and any 7r ? Procs(T), the following hold:
(Strong soundness) If eval?(7r) = v, and Kj, P? is the first occurrence of 7r in T, then for
any CT and PT, such that CT F Kj,Pj : PT and PT(?) = r, there is an extension
CT' of CT, such that (f?, CT') F v :
(Weak soundness) eval?(?) $ WRONG
Proof. This follows immediately from Syntactic soundness (Theorem 8.13) and the defini-
tion of eval.			I
109
hi other words, a well-typed CML progra? can never have a rnn-time type error. It is also
worth noting that for the sequential subset of A??, Theorem 8.14 reduces to the Soundness
theorem of Section 6.2.4 (Theorem 6.4).
110
Part IV
Practice
111
Chapter 9
Applications
This part of this dissertation addresses the question of the usefulness and practicality of
the proposed language mechanisms. While Chapter 5 describes a number of abstractions
that can be implemented using CML, it does not fully address the question of how useful
CML is for real applications and whether it can be efficiently implemented.
To address these questions, I have implemented CML on top of sML/NJ. This im-
plementation has been used by a number of people, including myself, for various different
applications. This practical experience demonstrates the validity and usefulness of my de-
sign as well as the efficiency of my implementation. In this chapter, 1 describe some of
these applications. I describe the implementation in Chapter 10, and its performance in
Chapter 11. The final chapter of this part (Chapter 12) describes further research related
to the implementation and use of CML on multiprocessors.
9.1 eXene: A multi-threaded X window system toolkit
As argued in Section 1.1, concurrency is a useful tool for structuring interactive applica-
tions. To this end, Emden Gansner of AT&T Bell Laboratories and I have been developing
a multi-threaded X window system toolkit [SG86], called eXene [GR9I], which is imple-
mented using CML. This implementation serves two roles: it provides a strenuous test of
the performance of CML in a real-world setting, and it serves as a platform for interac-
tive applications (discussed in Section 9.2). Because the X window system is a distributed
system, the implementation of eXene also involves distributed systems prograruming (dis-
cussed in Section 9.1.4). This section describes the architecture of eXene and gives a couple
of exaruples of the use of CML primitives in its implementation.
113
9.1.1 An overview of eXene
EXene provides a similar level of function as Xlib [Nye90b], but with a substantially
different model of user interaction. Windows in eXene have an environment, consisting
of three streams of input from the window's parent (mouse, keyboard and control), and
one output stream for requesting services from the window's parent. For each child of the
window, there are corresponding output streams and an input stream. The input streams
are represented by event values and the output streams by event valued functions. A window
is responsible for routing messages to its children, but this can almost always be done using
a generic router function provided by eXene. Typically, each window has a separate thread
for each input stream as well as a thread, or two, for managing state and coordinating the
other threads. By breaking the code up this way, each individual thread is quite simple.
This event-handiing model is similar to those of [Pik89] and [Haa90].
There are other differences between eXene and more traditional X toolkits. For ex-
ample, eXene uses immutable pens to specify the semantics of drawing operations, instead
of the mutable graphics contexts provided by the X-protocol. Since pens are immutable,
concurrency control issues are avoided when two threads share the same pen.
9.1.2 An X window system overview
The X window system is a distributed system with the appllcation clients communicating
with the X server process. The core X-protocol consists of 211 different messages, divided
into 119 request messages, of which 42 have replies, 33 event messages and 17 error messages
[Nye90a]. Each request to the server has an implicit sequence number (i.e., the first message
sent is number 1, etc.). Messages from the server to the client are tagged with the sequence
number of the last request processed by the server; this is used to match replies with
requests.
9.1.3 The architecture of eXene
Unlike some non-C language bindings for X, eXene is implemented directly on top of
the X-protocol. The only non-CML code involved is the run-time system's support for
This implementation approach has the advantage of avoiding the C
Furthermore, it provides a demonstration that CML can be used
socket communication.
language bias of Xlib.
to implement low-level systems programs without significant loss of performance.
A counection to an X-server is called a display. In eXene a display consists of seven
threads; Figure 9.1 gives the message-passing architecture of these threads. The input and
output threads provide buffering of th? uu?uuiiiiication with the server. The sequencer
114
X Server
Socket
Display
Buffer			Buffer
I			Top-level
Window
Registry
Window			KeySym			Font
Event Streams			Translations			Requests
Request/
Reply
Figure 9.1: The display message-passing architecture
115
thread generates sequence numbers and matches replies with requests. All error messages
are logged with the error handler; in addition, errors on requests that expect a reply are
forwarded to the requesting thread. The sequencer sends X-events to the event bnffer,
which decodes and buffers them. The top-level window registry is a thread that keeps track
of the top-level windows in the application and their descendants. It manages a stream of
events for each top-level window in the application. The other two display threads manage
global resources: the keymap server provides translations from keycodes to keysyms; the
font server keeps track of the open fonts used by the application.
A display has one or more screens, each of which can support different visvals and
depths (e.g., black and white or 8-bit color). Each visual and depth combination of a screen
is supported by two threads; Figure 9.2 shows the message architecture for these. The draw
Req?est/			Font
Rep1?			Reqtests
Screen
Display
Pixmap			CC Reqtests
Draw Reqvests
Figure 9.2: The screen message-passing architecture
master is a thread that encodes and batches drawing requests for a particular visual and
depth combination; the draw masters at the screen level are used for operations on pixmaps
(off screen rectangles of pixels). The CC server handles the mapping of eXene's immutable
116
pens to X's mutable graphics contexts.1
Windows are displayed with a particular visual and depth on a screen. Internally,
windows are organized into a tree hierarchy with a top-level window at the root. Figure 9.3
gives the message-passing architecture for the top-level window threads. As described above,
?ent			Ke			est/
?ream			Tran			:			GC Reqitests
Window Tree
r
I			Display			Screen I
asam			Req?
stations			Re;
I			Top-level
I			Tree Hierarchy
J
Figure 9.3: The top-level window message-passing architecture
each top-level window in an application has a dedicated stream of X-events from the display.
This stream is monitored by the top-level window ronter thread. This thread provides the
transition from the X view of events to the eXene view (i.e., a window environment).
There is a draw master thread for each window tree as well.
lit is an unpleasant artifact of X that pilmaps and graphics contexts must be associated with a particular
screen, visual and depth.
117
9.1.4 Promises in eXene
The Copylrea operation ill the Xli protocol can be used to copy a rectangle of pixels
from one place on the screen to another. A complication arises if a portion of the source
rectangle is obscured by another window. For example, Figure 9.4 shows a use of Copylrea
to translate a rectangle on the screen; here the cross-hatched region of the destination
corresponds to the obscured region of the source. While some window system maintain a
Source rectangle			I
L
Obscuring
Window
Figure 9.4: The Copylrea operation
backing store (or virtual bitmap) to handle these situations, the standard X policy is to notify
the client that the CopyArea operation was not able to completely fill in the destination.2
This policy is called damage control, since it is up to the client to repair the damage.
A typical use of Copylrea is in inserting a line of text. In this case the client thread
might issue the following sequence of operations: a Copylrea to create space for the new
text, followed by a Clearlrea to erase the old text and lastly a DrawText to insert the new
line. The following picture illustrates these steps:
2Some X servers do support backing store as an option, but applications must be designed to function
correctly when it is not available.
118
CopyArea			Clearlrea			DrawText
THIS IS			THIS IS			THIS IS			THIS IS
TEXT			TEXT			SOME
TEXT			TEXT			TEXT
It is important that the user of the system see this sequence as a single smooth transition.
This has implications for the implementation of operations using CopyArea.
ff Copyirea is treated as a normal X RPC, which returns a list of damaged rectangles,
then the user is subjugated to screen fflcker. To understand the reasons for this examine
Figure 9.5, which shows the timing information for the client doing the text scrolling, the
thread handling the buffering of communication with the server,3 and the X-server. Because
Client			Btt?r			X Server
CopyArea
CopyAck
ClearArea
DrawText
Display in
transition
Figure 9.5: Synchronous text scrornng
the other drawing operations are postponed until an acknowledgement of the copylrea is
received, the period of time the display is in transition can be quite lengthy.
Because of these performance concerns, the X protocol does not use the standard reply
mechanism for Copylrea. Instead there are two special X-events, GraphicsExpose and
NoExpose, which are used to notify the client of the result of a Copylrea request.4 For single-
3For purposes of this discussion, I have collapsed the buffer and sequencer threads into a single thread.
4Things are a little more complicated, since mnltiple GraphicsExpose events can be generated for a single
119
which case the transmit base-event value is in selecting the event. This example shows
that we can use first-class synchronous operations to abstract away from the details of the
client-server protocol, without hiding the synchronous nature of the protocol.
This approach to synchronization and communication leads to a new programming
paradigin, which I call higher-order concttrrent programming. To understand the higher-
order nature of this mechanism, it is helpful to draw an analogy with first-class function
values. Table 4.1 relates the features of these two higher-order mechanisms. Values of
Table 4.1: Relating first-class functions and events
Property			Function values			Event values
Type constructor			->			event
Introduction			A-abstraction			receive
transmit
etc.
Elimination			application			sync
Combinators			op 0			choose
map			wrap
etc.			etc.
function type are introduced by A abstraction, while event values are created by the base-
event constructors. Eunction values are eliminated by application, analogously event values
are eliminated by the sync operator.2 And both types have combinators for building new
values. This analogy does not hold completely, since the various function combinators are
derived forms, while the event-value combinators are primitive.
4.4 Other synchronous operations
The event type provides a natural framework for accommodating other primitive syn-
chronous operations.3 There are three examples of this in CML: synchronization on thread
termination (sometimes called process join), low-level 1/0 support and time-outs. Figure 4.3
gives the signature of the CML base-event constructors for these other synchronous opera-
tions. The function wait produces an event for synchronizing on the termination of another
thread. This is often used by servers that need to release resources allocated to a client
in the case that the client terminates unexpectedly. Support for low-level 1/0 is provided
by the functions synconinput and synconoutput, which allow threads to synchronize on
2??troduction" and "elimination" are being used in a type theoretic sense. They refer to the syntactic
constructs that introduce or eliminate the type constructor.
3This is the reason that the I use the term "event" to refer to first-class synchronous operations instead
of using communication."
42
val wait : thread_id -> unit event
val syncOninput : int -> unit event
val synconoutput : int -> unit event
val waitUntil : time -> unit event
val timeout : time -> unit event
Figure 4.3: Other primitive synchronous operations
the status of ffie descriptors [UN186]. These operations are used in CML to implement a
multi-threaded 1/0 stream library (Section 10.5.2). There are two functions for synchro-
nizing with the clock: waitUntil and timeout. The function waitUntil returns an event
that synchronizes on an absolute time, while timeout implements a relative delay. The
function timeout can be used to implement a timeout in a choice. The following code, for
example, defines an event that waits for up to a second for a message on a channel:
choose [
wrap
wrap
(receive ch, SONE),
(timeout(TIME?sec=1, usec=O1), fn () => NONE)
By having a uniform mechanism for combining synchronous operations, CML provides a
great deal of flexibility with a fairly terse mechanism. As a comparison, Ada has two
different timeout mechanisms: a time entry call for clients and delay statement that servers
can include in a select.
4.5 Extending PML events
Thus far, I have described the PML subset of first-class synchronous operations. In this
section, 1 motivate and describe two significant extensions to PML events that are provided
in CML.
Consider a protocol consisting of a sequence of communications: c1; ..... ; C?. When
this protocol is packaged up in an event value, one of the Cj is designated as the commit
point, the communication by which this event is chosen in a selective commurucation (e.g.,
the message send operation in the clientCallEvt abstraction above). In PML events,
the only possible commit point is c1. The wrap construct allows one to tack on ..... ; C?
after ci is chosen, but there is no way to make any of the other Cj the commit point. This
asymmetry is a serious limitation to the original mechanism.
43
A good illustration of this problem is a server that implements an input stream abstrac-
tion. Since this abstraction should be smoothiy integrated into the concurrency model, the
input operations should be event-valued. For example, the function
val input : instream -> string event
is used to read a single character. In addition, there are other input operations such as
input?1ine. Let us assume that the implementation of these operations uses a request-
reply protocol; thus, a successful input operation involves the communication sequence
send (chreg, RE??INPUT); accept(chrepi?)
Packaging this up as an event (as we did in Section 4.3) will make the send communication
be the commit point, which is the wrong semantics. To illustrate the problem with this,
consider the case where a cllent thread wants to synchronize on the choice of reading a
character and a five second timeout:
sync (choose E
wrap (timeout(TINE?sec=5, usec=O?), fn () => raise Timeout),
input instream
The server might accept the request within the five second limit, even though the wait for
input might be indefinite. The right semantics for the input operation requires making the
accept be the commit point, which is not possible using only the PML subset of events.
To address this limitation, CML provides the guard combinator.
4.5.1 Guards
The guard combinator is the dual of Wrap; it bundies code to be executed beforethe commit
point; this code can include communications. It has the type
val guard : (unit -> `a event) -> `a event
A guard event is essentially a suspension that is forced when sync is applied to it. As
a simple example of the use of guard, the timeout function, described above, is actually
implemented using waitUntil and a guard:
fun timeout t = guard (
fn () => waituntil (add?time (t, currentTime()))
where currentTime returns the current time. Some languages support guarded clauses in
selective communication, where the guards are boolean expressions that must evaluate to
44
true in order that the communication be enabled. CML guards can be used for this purpose
too, as illustrated by the following code skeleton:
sync (choose (
guard (fn () => if pred then evt else chooseE])
Here evi is part of the choice only if pred evaluates to true. Note that the evaluation of
pred occurs each time the guard function is evaluated.
Returning to the RPC example from above, we can now build an abstract RPC operation
with the reply as the commit point. The two different versions are:
fun clientCallEvti x = wrap (transmit(reqch, I), fn () => accept replych)
fun clientCallEvt2 x = guard (fn () => (send(reqch, x); receive replych)
where the clientCallEvtl version commits on the server's acceptance of the request, while
the clientCallEvt2 version commits on the server's reply to the request. Note the duality
of guard and wrap with respect to the commit point. Using guards to generate requests
like this raises a couple of other problems. First of all, if the server cannot guarantee
that requests will be accepted promptly, then evaluating the guard may cause delays. The
solution to this is to spawn a new thread to issue the request asynchionously:
fun clientCallEvt3 x = guard (fn () => (
spawn(fn () => send(reqch, x));
receive replych)
Another alternative is for the server to be a clearing-house for requests; spawning a new
thread to handie each new request.
The other problem is more serious: what if this RPC event is used in a selective com-
munication and some other event is chosen? How does the server avoid blocking forever on
sending a reply? For idempotent services, this can be handied by having the client create
a dedicated channel for the reply and having the server spawn a new thread to send the
reply. The client side of this protocol is
fun clientcallEvt4 x = guard (fn () => let
val replych = chaunel()
in
spawn(fn () => send(reqch, (replych, x)));
receive replyCh
end)
45
When the server sends the reply it evaluates
spawn (fn () => send(replych, reply))
ff the client has already chosen a different event, then this thread blocks and will be g&bage
collected. For services that &e not idempotent, this scheme is not sufficient; the server needs
a way to abort the transaction. The wrapibort combinator provides this mechanism and is
described in the next section.
4.5.2 Abort actions
The wrapAbort combinator associates an abort action with an event value. The semantics
&e that if the event is not chosen in a sync operation, then a new thread is spawned to
evaluate the abort action. The type of this combinator is:
val wrapAbort : (`a event * (unit -> unit)) -> unit
where the second &gument is the abort action. This combinator is the complement of wrap
in the sense that if you view every base event in a choice as having both a wrapper and
an abort action, then, when sync is applied, the wrapper of the chosen event is called and
threads &e spawned for each of the abort actions of the other base events.
Using wraplbort, we can now implement the RPC protocol for non-idempotent services.
The client code for the RPC using abort must allocate two chanuels; one for the reply and
one for the abort message:
fun clientCallEvt5 x = guard (fn () => let
val replyCh = chaunel()
val abortCh = channel()
fun abortFn () = send (abortch, ())
in
spawn(fn () => send (reqch, (replych, abortCh, x)));
wrapAbort (receive replyCh, abortFn)
end)
When the server is ready to reply (i.e., conunit the transaction), it synchronizes on the
following event value:
chooseE
wrap (receive abortch, fn () => abort the transaction),
wrap (transmit (replych, reply) fn () => commit the transaction)
This mechanism is used to implement the concurrent stream 1/0 library in CML (see
Section 10.5.2).
46
4.6 CML summary
So far, I've touched on the highlights of CML's concurrency mechanisms. lii this section,
I give a summary of the features of CML. This provides the background for the rest of
this dissertation. This section is not a language tutorial; for such a discussion see [Rep90b].
Figure 4.4 gives the signature of most of the CML concurrency operations, including those
already described above.
val spawn : (unit -> unit) -> thread_id
val channel : unit -> `ta chan
val sameThread : (thread_id * thread_id) -> bool
val sameChannel : (channel * channel) -> bool
val accept : `a chan -> `a
val send : (`a chan * `a) -> unit
val choose
val guard
val wrap
val wrapHandler
val wrapAbort
`a event list -> `a event
(unit -> `a event) -> `a event
(`a event * (`a -> `b)) -> `b event
(`a event * (exn --H> `a)) -> `a event
(`a event * (unit -> unit)) --H> `a event
val sync : `a event -> `a
val select : `a event list -> `a
val poll : `a event -> `a option
val always : `a -> `a event
val receive : `a chan -> `a event
val transmit : (`a chan * `a) -> unit event
val waitUntil : time -> unit event
val timeout : time -> unit event
val synconinput : int -> unit event
val synconoutput : int -> unit event
Figure 4.4: CML concurrency operations
The two functions saineThread and sameChannel can be used to test equality of thread
IDs and channels. In addition to the wrap combinator, the combinator wrapHandler wraps
an exception handler around an event. For example, syncOninput raises an exception if the
ffie specified by its argument has been closed. Using wrapHandler, a more robust version
47
of syncOnlnput is defined as:
fun waitForiuput fd = wrapHandler (
wrap (synconiuput fd, fn () => true),
fn _ => false)
Upon synchronization, this returns true if input is available and false if the ffie is closed.
The operation select is a short-hand for the common idiom of applying sync to a
choice of events; i.e.,
val select = sync 0 choose
The operation poll is a non-blocking form of sync; it returns NONE in the case that sync
would have blocked. This form of polling is different from those of [Rep88], [Rep89] and
[Rep9la]. In these earlier versions, polling was handled by constructing polling event
values.4 The semantics of these approaches is more difficult to specify and the imple-
mentation is more complicated; furthermore, in practice, the few rare uses of polling have
always been in combination with immediate application of sync. For these reasons, I have
adopted the simpler polling operation.
The base-event constructor always takes an argument and builds an event that is always
available with the argument as its synchronization result. For example, an infiulte stream
of Is can be implemented as (always I). It is useful to compare the function
fun poll' evt = select E
always NONE,
wrap (evt, SOME)
with poll when supplied to the following function:
fun pollLoop pollfn = let
fun loop () = (case (pollfu (always I))
of NONE => loop ()
I (SOME ) => ())
in
loop ()
end
Applying pollLoop to poll' can result in infinite execution sequences, while applying it to
poll will always terririnate. Section 7.5.5 describes the semantics of the poll function.
4Specifically, in [Rep88] and [Rep89] this was done by a special base-event value called anyevont,which
was lower priority than other events. In [Rep9la] this was done by a special event-value constructor.
48
4.6.1 Thread garbage collection
All important property of CML programs is the automatic reclamation of concurrency
objects (i.e., threads and channels). In general, a thread that communicates inunitely
often will block and be garbage collected if it is disconnected from the active part of the
system. This property has two benefits. First, it allows threads to be used to implement
objects, such as the unique ID source above, without having to worry about termination
protocols. If the object representation (i.e., the channels connecting to it) are discarded,
then the channels and thread are reclalmed by the garbage collector. Second, it allows use
of speculative message passing in complex protocols; i.e., the spawning of a thread to send a
message that may never be accepted. If the channel is local to the instance of the protocol,
then it is guaranteed to be garbage collected (e.g., clientCallEvt4 in Section 4.5.1).
4.6.2 Stream 1/0
CML provides a concurrent version of the SML stream 1/0 primitives. Input operations
in this version are event-valued, which allows them to be used in selective communication.
For example, an appllcation may give a user at most 60 seconds to supply a password. This
can be programmed as:
full getpasswd () = sync (choose E
wrap (timeout(TINE?sec=6O, usec=O+),
fn () => NONE),
wrap (input?1ine std_in, SOME)
This will return NONE, if the user fails to respond within 60 seconds; otherwise it wraps SOME
aronnd the user's response.
The 1/0 streams are implemented on top of the other primitives described in this chap-
ter; Section 10.5.2 describes their implementation in some detall.
49
50
Chapter 5
Building Concurrency
Abstractions
Different applications require different abstractions and prograniming styles. Modern pro-
gramming languages provide mechanisms that allow programmers to design and implement
the appropriate data and procedural abstractions for their applications, but when it comes
to concurrency operations, programmers are stuck with the decisions of the language de-
signer. First-class synchronous operations allow programmers the flexibility to design and
implement the right concurrency abstractions for their applications. In this chapter, 1
demonstrate the utility of first-class synchronous operations by showing how v&ious useful
abstractions can be implemented. These abstractions include mechanisms found in other
languages, such as asynchronous channels, Ada-style rendezvous, and futures. In addition,
I present some other abstractions that have proven useful in real applications. These ex-
amples also provide further illustration of the use of CML as a programming notation.
Chapter 9 includes additional examples from applications that &e implemented in CML.
5.1 Buffered channels
Bnffered channels provide a mechanism for asynchronous communication that is similar to
the actor mallbox [Agh86]. The source code for this abstraction is given in Figure 5.1.
The function buffer creates a new buffered channel, which consists of a buffer thread, an
input channel and an output channel; the function bufferSend is an asynchronous send
operation; and the function bufferReceive is an event-valued receive operation. The buffer
is represented as a queue of messages, which is implemented as a pair of stacks (lists). This
example illustrates several key points:
51
abstype `a buffer_chan = BC of +
inch : `a chan,
outch : `a chan
with
fun buffer () = let
val inCh = channel() and outCh = channel()
fun loop (E], E]) = loop(Faccept inCh], E])
I loop (front as (x::r), rear) = select E
wrap (receive inCh,
fn y => loop(front, y::rear)),
wrap (transmit(outCh, x),
fn () => loop(r, rear))
I loop (E], rear) = loop(rev rear, E])
in
spawn (fn () => loop((], E]));
BC+inch=inCh, outch=outCh]
end
fun bufferSend (BC+inch, ...], x) = send(inch, x)
fun bufferReceive (BC+outch, ...]) = receive outch
end (* abstype *)
Figure 5.1: CML
implementation 0			??reu cnannels
o+ Buffered channels are a li?W tulililiuluLatiuli ?bstraction, which have first-class citi-
zenship. A thread can use the bufferReceive function in any context that it could
use the built-in function receive, such as selective communication.
o+ The buffer loop uses both input and output operations in its selective communication.
This is an example of the necessity of generalized selective communication. ff we
have only a multiplexed input construct (e.g., occam's ALT), then we must to use a
request/reply protocol to implement the server side of the bufferReceive operation
(see pp. 37--H41 of [Bur88], for example). But if a request/reply protocol is used, then
the bufferReceive operation cannot be used in a selective communication by the
client.
o+ The buffer thread is a good example of a common CML programming idiom: using
threads to encapsulate state. This style has the additional benefit of hiding the state
of the system in the concurrency operations, which makes the sequential code cleaner.
These threads serve the same role that monitors do in some shared-memory concurrent
languages.
52
o+ This implementation exploits the fact that unreachable blocked threads are garbage
collected. If the clients of this buffer discard it, then the buffer thread and channels
will be reclaimed by the garbage collector. This improves the modularity of the
abstraction, since clients do not have to worry about explicit termination of the buffer
thread.
A more complete version of this abstraction is included in the CML distribution and is
used in a number of applications.
5.2 Multicast channels
Another useful abstraction is a bnffered multicast channel, which builds on buffered channels
by providing fan-ouL A multicast channel has a number of ontpnt ports. When a thread
sends a message on a multicast channel, it is replicated once for each output port. In
addition to the standard channel operations (create, send and accept), there is an operation
to create new ports. The following signature gives the multicast channel interface:
type `a mchan
val toChannel : unit -> `la mchan
val newPort : `a nichan -> `a event
val multicast : (`a mchan * `a) -> unit
New multicast channels are created using rnchannel and new ports using newPort. The
multicast operation asynchronously broadcasts a message to the ports of a multicast chan-
nel. A port is represented by an event value; synchronizing on a port event will return the
next multicast message.
A multicast channel consists of a server thread, which initiates the broadcast and creates
new ports and a chaiu of ports. Each port consists of a buffer and a "tee" thread that inserts
the incoming message in the buffer and propagates it to the next port. The port buffer is
implemented using the buffered channel from above. The following picture gives a schematic
view of a multicast channel with four ports:
53
newPort
multicast
Multicast server
Port			Port			Port			Port
The implementation of the multicast abstraction is given ill Figure 5.2. The function
mCharnel is the most interesting, as it includes the code for the server thread. A multicast
channel value is represented by a request/reply channel pair that provides an interface to
the server thread. A request is either a message to be broadcast, or a request for a new port.
The interface between the server thread and the first port in the chain and the interface
between a tee thread and the next port is an output function. The output function at the
end of the chain is a sink.
5.3 Condition variables
A simple new abstraction is the condition variable, which is a write once variable.1 A
condition variable is initially empty; after a thread writes a value to it, it is fEtiL Reading
an empty condition variable is a blocking operation, while writing to a full one is an error.
In CML condition variables have the following interface:
type `a cond_var
val condVar : unit -> `Ia cond?var
val readVarEvt : `a cond_var -> `a event
val readVar : `a cond_var -> `a
val writevar : (`a cond_var * `a) -> unit
exception WriteTwice
1The name is motivated by the conditionsfound in some sh&ed-memory concurrent languages.
54
abstype `a mchan = NChan of (`a request chan * ?? event chan)
and `a request = Nessage of `a NewPort
with
fun mChannel () = let
val reqch = channel() and respch = channel()
fun mkPort outFn = let
val buf = buffer()
val inCh = channel()
fun tee () = let val m = accept inCh
in
buffersend(buf, m);
outFn m;
tee()
end
in
spawn tee;
(fn m => send(inch, m), bufferReceive buf)
end
fun server outFn = let
fun handleReq NewPort = let
val (outFn', port) = mkPort outFn
in
send (respch, port);
end
I handleReq (Nessage m) = (outFn m; outFn)
in
server (sync (wrap (receive reqCh, handleReq)))
end
in
spawn (fn () => server (fn --H => ()));
NChan(reqch, respch)
end
fun newPort (NChan(reqch, respCh)) =
send (reqch, NewPort);
accept respCh)
fun multicast (MChan(ch, ?), m) = send (ch, Message m)
end
Figure 5.2: CML implementation of muiticast channels
55
Since reading a condition variable is a synchronous operation, an event-valued form of the
operation is provided. The exception WriteTwice is raised when a thread attempts to write
to a full variable. Condition variables are an example of what are called I-strtcttres in the
parallel language Id [ANP89, Nik9l]; they can also be regarded as a weak form of logic
variable.
A condition variable can be implemented in CML using a thread to hold the state of
the variable, as shown in Figure 5.3. Recent versions of CML provide condition variables
datatype `a cond_var = CV of ?
put?ch : `a chan,
get?ch : `a chan
fun condvar () = let
val putCh = channel() and getch = channel()
fun cell () = let
val v = accept putch
fun loop () = (send (getch, v); loop())
in
loop ()
end
in
spawn cell;
CV?put?ch = putCh, get?ch = getCh?
end
exception WriteTwice
fun writeVar (CV?put?ch, get?ch?, x) = select (
wrap (receive get?ch, fn - => raise WriteTwice),
transmit (put?ch, x)
fun readvarEvt (CV?get?ch, ...?) = receive get_ch
fun readVar (CV?get?ch, ...?) = accept get?ch
Figure 5.3: CML implementation of condition variables
as a primitive concurrency object. This is in part because they are used internally to
implement threadwait, but also because they provide a significant performance boost (see
Chapter 11) in the case that exactly one message must be sent (e.g., for abort messages,
see Section 10.5.2).
56
5.4 Ada-style rendezvous
In this section 1 describe the implementation of the communication mechanisms found ill
Ada and Concurrent C. As described ill Section 3.3.4, the basic operation is the extended
rendezvous, which consists of an entry call by a client to a server thread. In Ada (and
Concurrent 0) this call is asymmetric; i.e., the server can nondeterministically select
from a choice of accept clauses, but a client's entry call cannot be involved in a selective
communication. There is no problem with supporting entry calls in selective communication
in CML, but there is the question of which of the two synchronous operations (i.e., sending
the request and accepting the reply) should be the commit point. The various alternatives
are discussed in some detail in Section 4.5; in this example, I arbitrarily choose the sending
of the request as the commit point. Figure 5.4 gives the CML code for an abstraction of
the basic Ada communication mechanism. This implementation is more general than the
abstype (`a, `b) entry = ENTRY of ((`a * `b chan) chan)
with
fun entry () = ENTRY(channel())
fun entrycall (ENTRY reqch) x = guard (fn () => let
val replych = chaunel()
in
wrap
end)
fun entrylccept (ENTRY reqch)
wrap (receive reqch, fn
end
(transrnit(reqch, (x, replych)), fn () => accept replych)
(x, replych) => (x, fn y => send(replych, y)))
Figure 5.4: CML implementation of Ada rendezvous
Ada mechanism in several ways. It allows nested transactions, since the reply channels
are dynamically allocated, and it permits selective entry calls, since entryCall is an event-
valued function. It also allows multiple servers for a given entry. The interface of this
abstraction is
type (`a,'b) entry
val entry : unit -> (`Ia,'Ib) entry
val entrycall : (`a,'3b) entry -> `a -> `3b event
val entryAccept : (`a,'b) entry -> (`a * (`b -> unit)) event
Note that the entryiccept function returns the entry-call argument and the reply function.
As an example of the use of this abstraction, the following is the CML version of the
implementation of the unique ID server given in Section 3.3.4:
57
fun mkUidServer () = let
val e = entry()
fun loop x = let
val ((), reply) = sync (entryiccept e)
in
reply x; loop (x+1)
end
in
spawn (fn () => loop 0);
entrycall e
end
hi systems prograrnrning it is often necessary to deal with the possibility that some
expected event might not actually occur. To this end, Ada supports several variations 011
the basic rendezvous mechanism; namely, a delay clause ill the server's select statement, a
timed entry call, and a conditional entry call. These can be easily implemented in CML.
A timeout event can be used to implement a delay clause; a choice of an entry call and a
timeout event implements a timed entry call; and applying poll to an entry call implements
a conditional entry call.
The language Concurrent C provides a couple ofadditional twists on Ada's rendezvous
mechanism. Recall from Section 3.3.4 that Concurrent C entry clauses may include a
predicate on requests and/or a priority ordering of requests. It is possible to implement
these operations in CML, but, for reasons discussed below, it is not possible to do so while
supporting entry calls in generallzed selective commuuication. As an ilius?ration, I describe
the implementation of an entry abstraction that supports conditional acceptance of entry
calls based on the argument value. The signature of this abstraction is:
type (`a,'b) entry
val entry : unit -> (`Ia,'Ib) entry
val condAccept : (`3a,'3b) entry -> (`3a -> bool)
-> (`3a * (`3b -> unit)) event
val call (`a,'2b) entry -> `a -> `2b
The function entry builds a new entry object, the function condiccept take an entry object
and a predicate and returns an entry event, and the function call is used by clients to call
an entry. The Thck manager given in Figure 5.5 is an example of the use of conditional
accept (taken from [GR86]). A request to acquire a lock is only accepted if the lock is not
currently held. The status of the locks is represented by a list of lock IDs of the currently
held locks.
The implementation of the conditional accept abstraction is given in Figure 5.6. An
entry object is realized as a buffer thread that matches calls with conditional accept offers.
58
fun lockServer () = let
val lockEntry = entry()
val lockReqEvt = condiccept lockEntry
fun serverLoop locks = let
fun isLocked id = is id in locks?
fun unlock id = remove id from locks.
in
select E
wrap
fn
wrap
fn
end
(lockReq isLocked,
(id, reply) => (reply(); serverLoop (id::locks))),
(receive unlockCh,
id => serverLoop (unlock id))
in
spawn (fn () => (serverLoop 5));
? acquireLock = call lockueqEntry,
releaseLock = fn id => send(unlockCh, id) 1
end
Figure 5.5: A lock manager using conditional accept
Two channels are used to communicate with the buffer thread: clients use the call?ch to
request an entry call, and the servers2 use ofier?ch to offer a conditional acceptance. A
call consists of an argument and a reply operation (a curried application of send to a reply
channel), while an acceptance offer consists of a predicate, a channel for sending a call to
the server, and an abort event for notifying the buffer that the offer has been withdrawn.
An acceptance offer matches a call if the predicate contained in the offer returns true
when applied to the argument of the call. The buffer keeps a list of outstanding calls and
outstanding offers with the invariant that none of the buffered calls and offers match. When
a new call comes in, an attempt is made to match it against an outstanding offer; likewise,
when a new offer comes in, an attempt is made to match it against an outstanding call. If a
match is actually found, the buffer must also check to see if the offer has been withdrawn.
This is done by the function doMatch, which synchronizes on the choice of the offer's abort
event and transmitting the call to the server. The semantics of abort actions guarantee that
exactly one of this choices will be available for selection. The other point of interest is that
the buffer thread's main loop synchronizes on the choice of receiving a new call, receiving
a new offer, or being notified of the withdrawal of an offer.
This implementation can easily be extended to order requests by some priority function.
The timed entry call can also be supported by allowing clients to request that their call be
2This abstraction allows multiple servers to share the same entry object.
59
local
datatype (`a,
pred
req?ch
abort_evt
in
`b) offer_t = OFFER of ?
`a -> bool,
(`a * (`b --H> unit)) chan,
unit event
abstype (`a, `b) entry = ENTRY of ?
accept?ch : (`a, `b) offer_t chan,
call_ch : (`a * (`b -> unit)) chan
with
fun entry () = let
val acceptch = channel() and callCh = channel()
exception NoNatch
fun buffer (calls, offers) = let
fun doMatch (call, OFFER?req?ch, abort_evt, .?) = select E
wrap(abort_evt fn () => false),
wrap(transmit(req?ch, call), fn () => true)
---1) = let
= if (pred x)
then r else (matchcall r)
contintted.
fun handleOffer (offer as OFFER?pred,
fun matchCall E] = raise NoNatch
I matchCall ((call as (x, _)) :: r)
then if (doNatch (call, offer))
else call :: (matchcall r)
val arg = (matchcall calls, offers)
handle NoNatch => (calls, offersCEoffer])
in
buffer arg
end
fun handleCall (call as (x, )) = let
fun matchOffer fl = raise NoNatch
I matchOffer ((offer as OFFER?pred, -?)::r) =
if (pred x)
then if (doNatch (call, offer))
then r
else (matchOffer r)
else offer :: (matchOffer
val arg = (calls, matchOffer offers)
handle NoMatch => (calls?Ecall], offers)
in
buffer arg
end
Figure 5.6: CML imp1eme?tation of conditional elltry abstraction
60
Figure 5.6 (continued)
fun withdraw (OFFER?req?ch = reqch, ...?) = let
fun remove ((off as OFFER?req?ch, ...1) :: r) =
if (samechannel(reqch, req?ch))
then rest
else off :: (remove r)
in
remove offers
end
fun withdrawEvt (offer as OFFER?abort?evt, . .?) =
wrap (abort?evt,
fn () => buffer (calls, withdraw offer))
in
select E
wrap (receive acceptCh, handleoffer),
wrap (receive callCh, handlecall),
choose (map withdrawEvt offers)
end
in
ENTRY?accept_ch = acceptch, call_ch = callchl
end
fun condAccept (ENTRY?accept?ch, ...1) pred = guard (
fn () => let
val reqCh = channel() and abortCh = channel()
in
send (accept?ch, OFFER?
pred = pred,
req?ch reqCh,
abort?evt = receive abortCh
1);
wrapAbort (receive reqCh,
fn () => send(abortch, ()))
end)
fun call (ENTRY?call?ch, ...1) x = let
val replych = chaanel()
in
send (call?ch,
accept replyCh
end
end (* abstype *)
end (* local *)
(x, fn y => send(replych, y)));
61
withdrawn after a timeout. When the buffer receives such a request, if it has not already
matched the call with an offer, then it would discard the outstanding call.
The lock manager example given above is cited in [GR86] as an example of the need for
the suchthat clause. The claim is that without this mechanism, the lock manager requlres
a separate thread and several transactions per lock and unlock request. The real problem is
that Concurrent C's (and Ada's) rendezvous mechanism does not allow a server thread
to accept more than one entry call at a time. In CML, however, this is not a problem;
the lock server can keep a list of pending lock requests for each lock. Thus, the conditional
entry abstraction is not a particularly useful one, but as an example it illustrates some
interesting points, including the most serious limitation with CML's primitives.
For the basic Ada rendezvous, it is possible to implement entrycall as an event-valued
function, and it would be nice to do the same for this richer abstraction. Unfortunately, this
requlres a way to insure that three threads (i.e., the client, buffer and server) simultaneously
reach agreement (i.e., rendezvous) on the acceptance of a particular call. But, if both the
client and server are involved in selective communication, then either might back out at
the last minute (i.e., by selecting some other choice). This limitation is not inherent in the
mechanism of first-class synchronous operations, but rather is because synchronous channel
communication provides only a 2-way rendezvous. If a primitive synchronous operation is
supplied for multiway rendezvous3 [Cha87], then abstractions such as the conditional accept
can be supported for generallzed communication.
While I have argued that conditional accept is not a useful abstraction, there are other
examples where this problem arises. Typically, they involve using a thread to implement
a synchronous channel with richer semantics. For example, it might be nice to have a
version of channels that logged all messages for debugging purposes. The natural way to do
this is to use a thread to implement the logging channel abstraction, but without a 3-way
rendezvous the logging channels cannot support generallzed selective communication. This
problem is a topic for future research.
5.5 Futures
The final example of this chapter is the future mechanism of Multilisp (see Section 3.3.5).
Since touching a future is a synchronous operation, we represent futures directly as event
values. The future operation has the type:
val future : (`a -> `2b) -> `a -> `2b evellt
3Multiway rende?vous is an instance of the committee coordination problem [CMS8].
62
and sync is the touch operator. The implementation of future (see Figure 5.7) is straight-
forward: we spawn a new thread to evaluate the application and create a condition variable
for reporting the result. Since the evaluation of a future might result in a raised exception,
fun future f I = let
datatype `a msg?t = RESULT of `a ElN of CXII
val resVar = condvar()
in
writeVar (resvar, RESULT(f x) handle ex => ElN ex);
wrap (
readVarEvt resVar,
fn (RESULT x) => x			(ElN ex) => raise ex)
end
Figure 5.7: CML implementation of futures
the result condition variable (resvar) holds either the result or an exception.
63
64
Part III
Theory
65
Chapter 6
Theory Preliminaries
This part of the dissertation focuses on a small concurrent A-calculus, called A??, that
models the significant concurrency mechanisms of CML. I present both a dynamic and
static semantics for A? and prove that the static semantics, which is a polymorphic type
discipline, is sound with respect to the dynamic semantics.
Before diving into the semantics of A?, it is necessary to review notation. This chapter
first describes the basic notation used in this part, a mix of the notations found in [Tof88]
and [WF9ib], and then introduces the style of semantic specification used by defining the
semantics of Av, a sequential subset of A??.
6.1 Notation
If A and B are sets, then A u B is their union, A A B is their intersection, and A \ B is their
difference. The notation A B denotes the set of finite maps from A to B (i.e., partial
functions with finite domains). If f is a map, then the domain and range of f are defined
as
The notation
dom(f) fx I f(x) is defined?
rng(f) =			I x E dom(f)?
fa1?b1,...,a??b??
denotes a finite map with domain ?.?...., a??, such that a? is mapped to b?; we write f?
for the map with the empty domain. If f and g are maps, then f 0 g is their composition,
and J 1 g, called f modified by g, is a map with domain dom(f) u dom(g), such that
g(x) if x?dom(g)
f(x) otherwise
67
The symbol I is used because something is added and, when dom(g) n dom(f) $ ?, some-
thing is taken away [Tof88]. It is sometimes useful to view maps as sets of ordered pairs
and write f C g, if dom(!) C dom(g) and for x E dom(!), f(x) g(x). In this case, g is
called an extension of f. If S is a set, then we write S+x for S U fx?. When S is a map,
and x = (a, b), then S+x is only defined if a ? dom(S). The operator + associates to the
left, so S+?+z is read as (S+y)+z. The notation Fin(S) denotes the set of finite subsets
of 5 (the finite power set of 5). If p is a binary relation, then p* is the reflexive transitive
closure of p.
6.2 Formal semantics
To further introduce the notation of the following chapters, as well as to survey the impor-
tant concepts and results related to this style of semantics, the rest of this chapter presents
the syntax and semantics of a simple call-by-value A-calculus. This calculus, which is es-
sentially a polymorphically typed version of Plotkin's Av calculus [Plo75], forms the core
of the concurrent language presented in the following chapters. 1 use a style of semantics
developed by Wright and Felleisen [WF9lb]; other versions of this presentation can be found
in many places, such as [DM82] and [TolS8]. The presentation proceeds by first defining the
syntax and dynamic semantics of the untyped Av calculus, and then defining the standard
Hindley-Milner polymorphic type inference system for Av [DM82]. Finally, I state, without
proof, the standard theorems that relate the static and dynamic semantics.
6.2.1 SyntaxMA?
The ground terms of Av are variables, base constants and t'Lnction constants:
xE
VAR
CoNsT = BCoNsT U FCoNsT
BCoNsT = ?(),true,fa1se,Qi,..j
FCoNsT=f+,-,..
variables
constants
base constants
function constants
There are two syntactic classes of terms, expressions (e ? ExP) and values (v ? VAL c ExP),
defiried by the following grammar:
e
v
v			value
e1 e2			appllcation
let x = e1 in e2			let
b			constant
x			variable
Ax (e)			A-abstraction
68
Values are the irredtcible (or canonical) terms in the dynamic semantics. The free variables
of a term are defined inductively:
FV(b) =
FV(x) = fx?
FV(ei e2) = FV(ei) u FV(e2)
FV(let x = e1 in e2) FV(ei) u (FV(e2) \ fx?)
FV(Ax(e)) = FV(e)\?x?
A term e is closed if FV(e) = $. A variable is bound in a term if it appears as the variable
of a let or A. We identify terms up to a-conversion of bound identifiers; for example,
Ax(x) =a Ax'(x')
The s?bstitution of a term e' for a variable x' in a term e, where x' is not bound in e, is
written as e[x':,' e'], and is defined inductively as
= b
= x (x $ x?)
(e1 e2)[x' ? e'] = e1[x' H* e'] e2[x' I,'
(letx=e1 ine?)k'? e'] = letx=eik'? e'] ine?k'? e']
(Ax(e))[x' e'] = Ax(efr' i,'
Note that, because of the assumption that xt is not bound in e, x $ x' in the last two
cases. This is a reasonable assumption, since bound variables can be renamed. In general,
to avoid the problems of free variable capture, we adopt Barendregt's variable convention:
ff .1...., Mn occur in a certain mathematical context (e.g., definition, proof),
then in these terms all bound variables are chosen to be different from the free
variables. (p. 26 of [Bar84])
6.2.2 Dynamic semantics of Av
There are a number of different ways to specify the dynamic semantics of programming
languages. I use the style of operational semantics developed by Felleisen and Friedman
[FF86], because it provides a good framework for proving type soundness results [WF9lb].
In this approach, the objects of the dynamic semantics are the syntactic terms in Exp.
The meaning of the function constants is defined by a partial function
?: (FCoNsT x BCoNsT) CoNsT
69
For example, assuming that ?not,+,I+? C FCoNsT, then
?(not,true)			false
1+
2
Here 1+ is a special function constant that represents the partial application of + to I.
An evalnation context is a single-hole context where the hole marks the next redex (or
is at the top if the term is irreducible) The evaluation contexts of Av are defined by the
following grammar:
I Ee vE letx=Eine
The evaluation relation is defined in terms of these contexts.
Definition 6.1 (?) The evaluation relation is the smallest relation satisfying the follow-
ing three rules:
E[bv]			?
E[Ax(e) v]			HHV			E[e[x v]]			(Av?P)
E[let x =v ine]			?			E[e[x H+ v]]			(A?-1et)
It is easily shown that a given expression has a unique evaluation context under these rules,
which results in left-to-right call-by-value evaluation; i.e., a function application is evaluated
by first evaluating the function position, then the argument position and lastly by applying
the function, and similarly for let expressions. For example, in the expression
Ax(x I)(Ay(y) Az(z))
the evaluation context is
and the redex is
Ax(x i)([])
Ay(y) Az(z)
As an example of evaluation, consider the following evaluation, where [...] is used to mark
the context/redex boundary.
v
I,
I',
Ax(I) ([Ax(x 10) Ay(y)])
Ax(I) ([Ay(y) 10])
[AxCI) 10]
[I]
70
6.2.3 Typing Av
This section describes a standard polymorphic type system for Av. The purpose of the type
system is to provide a static characterization of the possible results of a computation (e.g.,
"the expression e evaluates to an integer"). The type system is a deductive proof system
that assigns types to Av terms. The most interesting aspect of this system is the rule for
let, which is the source of polymorphism. I start by defining the set of types, then I present
the type system and discuss the rule for let. Finally, I describe the standard soundness
results that hold for this system.
Type terms are built up from type constants and type variables:
a ? TYVAR			type variables
t ? TYCON fbool, int,.. .? type constants
The set of types (r E TY) is defined by:
T
--H			t			type constant
a			type variable
I			(r1 H r2)			function type
and the set of type schemes (? ? TYScHEME) is defined by:
a ::= r
I			Va.a
The type schema 0 Va1.Va2.. Va?.r is abbreviated as Va1a2.. a?.r. The type variables
a1,..., a? are said to be bound in a. A type variable that occurs in r and is not bound
is said to be free in a. We write FTV(a) for the free type variables of a. ff FTV(r)
then r is said to be a monotype. A type environment is a finite map from variables to type
schemes
TE ? TYENv VAR ? TYScHEME
It is also useful to view a type environment as a finite set of assumptions about the types
of variables. The set of free type variables of a type environment TE is defined to be
FTV(TE) =			U			FTV(a)
?Erng(TE)
The closure, with respect to a type environment TE, of a type r is defined as
CLOSTE(r) = ..... a?.r
where ?a1,...,a?? = FTV(r)\FTV(TE).
71
A snbstittttion is a map from type variables to types. A substitution 5 can be naturally
extended to map types to types as follows:
5			=
Sa =			5(a)
S(r1			r2) = (Sr1			Sr2)
Application of a substitution to a type schema respects bound variables and avoids capture.
It is defined as:
S(Va1 . . . a?.r) = VPi,... ,P?.5(r[aj I,'
where p? ? dom(S) u FTV(rng(S)). Application of a substitution 5 to a type environment
TE is defined as S(TE) = SoTE. A type r' is an instanceofa type scheme a = .....
written a ? r', if there exists a finite substitution, 5, with dom(S) = ....... a?? and
Sr = r'. ff a ? r', then we say that a is a generalization of r'. Some examples are:
Va.a ? r, for any r E T?
V?P.(a?P)			?			(a " a)
Va,P.(a?P)			?			(a?int)
The typing system is given as a set of rules from which sentences of the form "TE F e:
can be inferred. This sentence is read as "e has the type r under the set of typing assump-
tions TE." We write F e : r for f? F e : r. To associate types with the constants, we assume
the existence of a function
TypeOf: CoNsT			TYScHEME
Figure 6.1 contains the typing rules for Av. The rule (r-let), called the Milner let rule,
plays an important role in this system. It is the rule that introduces polymorphism (via the
closure operation), which is the reason for including the let construct in A,,. For example,
although the operational semantics of A,, equates
with
Af((f f) 1) A?(x)
let f=A?(x) in (f f) I
The former is untypable, while the latter has the typing
? (let f=Ax(x) in (f f) I): int
The reason for this is that (r-let) rule assigns to f the type schema Va.(a H a), which is
instantiated to both (int int) and ((int int) (int int)) in the body of the
let.
72
TypeOf(b)?r
TE F b : r
x?dom(TE)TE(?)?T
TE H x :
TE F e1 : H r) TE F e2 :
TE F e1 e2 : r
TEI?x`?`Fe:Y
TEFAx(e) :(r?Y)
TEFe1:r1TEIfxI,'CL0STE(Y)YFe2:T
TE F let x = e1 in e2 : r
Figure 6.1: Type inference rules for Av
(r-const)
(r.var)
(r-app)
(r-abs)
(r-let)
For this type system to make sense with respect to the set of constants, we place the
following restriction on the definition of ?:
If TypeOf(b) ? (r' T) and F v : r', then 6(b, v) is defined and K 6(6, v) :
This restriction insures that any well-typed application of a function constant has a 6
reduction. Unfortunately, this restriction rules out some useful function constants, such
as integer division, that are not total. In a calculus with exceptions this restriction is
unnecessary (see Section 7.5.3 or [WF91b]).
it is worth noting that there is exactly one typing rule for each syntactic form; thus, if
we have a proof of TE F e : r, for some e, the form of e uniquely specifies which typing rule
was the last applied in the deduction. This is the formulation of [TofS8] and differs from
the system of [DM82], which has judgements that infer type schemas for expressions and
rules for instantiating and generalizing type schemas. A proof of the equlvalence of these
two systems can be found in [CDDK8G].
This type inference system is decidable; there exists an algorithm, called algorithm
w [DM82] that infers the principal type (i.e., most general under the relation ?) of an
expression. Algorithm W is both sound and complete with respect to the inference system.
See [DM82] or [Tof88] for detalls and proofs.
73
6.2.4 Properties of typed Av
The purpose of static typechecking is to provide compile-time guarantees about the run-
time behavior of a program. The most important property of the typing system for Av is
type soundness; i.e., well-typed programs do not have run-time type errors. As with the
dynamic semantics, I follow Wright and Felleisen's approach [WF9ib], which is a purely
syntactic treatment (recall that the objects of the dynamic semantics in Section 6.2.2 are
the syntactic terms). Other approaches to this problem can be found in [DM82], [Dam85],
and [Tof88]. The key result in the approach of [WF91b] is proving that evaluation preserves
types. This is stated ill the following type preservation lemma:
Lemma 6.1 (Type preservation) ff TE Fe: r and e ? e1, then TE Fe':
This lemma is also known as subject reduction.
An expression e ? VAL is said to be stuck if there is no e' such that e ? e'. Because the
notion of stuck expressions is a semantic one, Wright and Felleisen define a syntactic notion
that is a conservative approximation of the potentially stuck expressions. An expression is
faulty if it contains a subexpression of the form "b v," where ?(b, v) is not defined.1 The
following expression is an example of a faulty expression that cannot become stuck:
AxCi) Ay(true 2)
Faulty expressions are shown to be untypable in [WF9lb].
We say an expression e diverges, written e?, if, for all ?? such that e ? e', there exists
an e" such that e1 ? e". An expression e converges to a value v, written e?v, if e ?* v.
Given these definitions, the behavior of evaluation is characterized by the following lemma:
v *
Lemma 6.2 (Uniform evaluation) For any closed expression e, either e?v, e?, or e
e', with e' being faulty.
The subject reduction and uniform evaluation lemmas then give us the following theorem:
Theorem 6.3 (Syntactic soundness) If F e : r, then either e?, or e?v and F v :
To state the soundhess of the type system in the traditional way, we define the partial
function eval by:
eval(e) =			WRONG			if e ?? e', with e' being faulty
v			if e?v
1Examining the evaluation relation and the definition of evaluation contexts, it is clear that the only way
that an expression can be stuck is if it has the form E[b v], where s(b, v) is undefined.
74
Note that if e?, then eval(e) is undefined. Using this definition, we can state strong and
weak soundness results, which are corollaries of Theorem 6.3.
Theorem 6.4 (Soundness) If F e : r, then the following hold:
(Strong soundness) if evai(e) v, then F v : r
(Weak soundness) eval(e) $ WRONG
This theorem means that well-typed programs produce results of the right type (if they
terminate) and do not have run-time type errors. See [WF9lb] for proof details.
75
76
Chapter 7
The Operational Semantics of Acv
In this chapter, 1 present the syntax and dynamic semantics of a small concurrent language
with first-class synchronous operations. This language, which I call A??, is Av (from the
previous chapter) extended with pairs and the concurrency primitives of CML. While
A?? lacks a number of features of SML (and thus of CML), it embodies the essential
concurrency mechanisms of CML. In particular, it includes events, channels, the channel
1/0 event constructors, and the choose, wrap, guard, wrapAbort combinators. I also
discuss how A? might be extended to model additional features found in CML, such as
exceptions and polling. In Chapter 8, I present the static semantics of A?.
7.1 Syntax
As with Av, the ground terms consist of variables, base constants and function constants;
in addition there are channel names. The ground terms are:
ze
bE
VAR
CoNsT BCoNsT U FCoNST
BCoNsT			f(),true,false,Ql,. .
FCoNsT			f+, -,fst,snd,..
CH
variables
constants
base constants
function constants
channel names
The sets VAR, CoNsT, and CH are assumed to be pairwise disjoint. The set FCoNsT
includes the following event-valued combinators and constructors:
choose, guard, never, receive, transmit, wrap, wrapibort
In addition to the syntactic classes of expressions, e E Exp, and values, v E VAL, A??
has a syntactic class of event values, ev E EVENT c VAL. The terms of A? are defined by
the grammar in Figure 7.1. Pairs have been included to make the handling of two-argument
77
e
v
ev
v
e1 e2
(ei .e2)
let x = e1 in e2
chanx in e
spawne
sync e
b
x
(vi v2)
AxCe)
ev
(Ge)
A
(ev?v)
(evi ? ev2)
I(eviv)
value
application
pair
let
channel creation
process creation
synchronization
constant
variable
pair value
A-abstraction
channel name
event value
guarded event
never
channel output
channel input
wrapper
choice
abort wrapper
Figure 7.1: Grammar for A??
function
functions easier. Note that the syntactic class of the term (vi v2) is either Ex? or VAL;
this ambiguity is resolved in favor of VAL. There are three binding forms in this term
language: let binding, A-abstraction and channel creation. Unlike CML, new channels are
introduced by the special binding form for channel creation. This is done to simplify the
presentation of the next chapter, and the channelfunction of CML can be defined in terms
of A?? (see Section 7.1.1). The set VAL0 is the set of closed value terms (i.e., those without
free variables); note, however, that closed values may contain free channel names. The free
channel names of an expression e are denoted by FCN(e). Note that, since there are no
channel name binding forms, FCN(e) is exactly the set of channel names that appear in e.
Channel names and event values are not part of the concrete syntax of the language;
rather, they appear as the intermediate results of evaluation. A program is a closed term,
which does not contain any guarded event functions (i.e., (G e) terms), or any subterms in
the syntactic classes EVENT or CH. In other words, programs do not contain intermediate
values.
78
7.1.1 Syntactic sugar
The syntax of A? differs from CML in several ways, but in many cases the CML syntax
can be viewed as syntactic sug& for A? terms.
CML uses the function channel to allocate new channels and provides the more tradi-
tional synchronous operations sendand accept. These functions can be used by embedding
a A? term e in the following context:
let channel = A:(chankink) in
let send = A:(sync (transmit?)) in
let accept = A?(sync(receivez)) in
The chooseand select functions of CML work on lists of events (instead ofjust palrs).
Although A? does not have SML's recursive datatypes, event lists can be implemented
using the following translation:
?il?			never()
[ev::r?			choose(ev.??)
There is no term for sequencing, but we use "(ei; e2)" as syntactic sug& for the term
"snd Ce1 . e2)." Since A? uses a left-to-right call-by-value evaluation order, this has the
desired semantics.
7.2 Dynamic semantics
The dynamic semantics of A? is defined by two evaluation relations: a sequential evaluation
relation ???,?? and a concurrent evaluation relation ???.?? The relation ???? is ?????
with a richer ? function and a reduction rule for palrs. Concurrent evaluation is an extension
of sequential evaluation to finite sets of processes.
7.2.1 Sequential evaluation
As before, the meaning of the function constants is given by the partial function
FCoNsT x VAL0 H VAL0
Since a closed value v ? VAL0 can have free channel names in it, we require, that if b e
FCoNsT and ?(b, v) is defined, then
FCN(S(b,v)) C FCN(v)
79
In other words, ? is not allowed to introduce new channel names. For the standard built-in
function constants, the meaning of 6 is the expected one. For example:
=			I
=			2
6(fst,(v1.v2))			=			v1
6(snd,(vi.v2))			=			v2
The meaning of 6 is straightforward for most of the event-valued combinators and construc-
tors:
6(never, ()) = A
6(transmit,(tc.v)) =
6(receive, ?) =
6(wrap,(ev.v)) = (ev?v)
6(choose,(evi.ev2)) = (ev1?ev2)
6(wraplbort,(ev.v)) = (evlv)
The only complication arises in the case of guarded-event values:
6(guard,v) =
6(wram((Ge).v)) = (G(wrap(e.v)))
6(choose,((Ge1).(Ge2))) = (G(choose (e1.e2)))
6(choose, ((Ge1).ev2)) = (G (choose (ei.ev2)))
6(choose,(evi.(Ge2))) = (G(choose (evi.e2)))
6(vraplbort, ((Ge).v)) = (G (wrapibort (e.v)))
These rules reflect guard's role as a delay operator; when another event constructor is
applied to a guarded event value, then the guard operator (G) is pulied out to delay the
event construction.1
Like Av, evaluation of A? is call-by-value, but there is the additional constraint that
pairs are evaluated left-to-right. This leads to the following grammar for the evaluation
contexts of A?:
E ::=			Ee I vE			(E.e) I
I			letx=Eine			spawuE I syncE
The following fact about terms and contexts is useful in Chapter 8:
Lemma 7.1 ff E[e] is a closed term, then e is a closed term.
Proof. Examining the above definition, it is clear that if x is free in e, then x must also
be free in Ek]. Hence, FV(e) C FV(E[e]) = ?.			I
Definition 7.1 (?) The seqnential evalnation relation is the smallest relation "?"
satisfying the following four rules:
lin Algol 60 terminology, (Ge) is a thunk.
80
threaded C clients (which make up the vast majority of X clients), this means that the code
using the Copylrea operation must also scan the event stream for the acknowledgement. In
eXene, where we have concurrency and events, we can handle this operation in a much more
elegant way. Our solution is to implement CopyArea as an asynchronous RPC operation,
also known as a promise [LS88]. EXene provides an event-valued function with the type
val copylrea : arg-type -> rect_t list event
where arg-type is the type of the arguments that specify the actual operation. The event that
is returned is the promise of the result. Figure 9.6 gives the implementation sketch of this
operation, where request sends the operation to the buffer thread and ?lushtells the buffer
thread to flush any buffered messages to the server. The guard is optimized to first check to
fun copyArea arg = let
val replyCh = channel()
in
spawn (fn () => request (COPY?AREA(reply?ch, arg)));
guard (fn () => (
case (poll (receive replych))
of (SOME rects) => always rects
NONE => (flush(); receive replych)
(* end case
end
Figure 9.6: The implementation of copyArea
see if the acknowledgement is already available. The buffer code is more complicated, since
it must match the acknowledgements with outstanding CopyArea requests. The advantage
of this approach can be seen by comparing its timing diagram, given in Figure 9.7, with
Figure 9.5.
9.1.5 Menus
Another example of the way concurrency is used in eXene is in the way that popup menus
are attached to windows. This is done by interposing a thread on the window's mouse
stream. When the thread sees a down transition on the appropriate mouse button, it
creates the menu window and starts tracking the mouse (the X semantics cause all mouse
events until the up transition to be directed at the window). Other mouse events are passed
through without action. This is a form of delegation, and the window wrapped by the menu
thread can be viewed as a "sub-class" of the window.
Copylrea request.
120
Client			Bitffer			XSe?er
CopyArea
ClearArea
DrawText
Flush
CopyAck
Display in
transition
Figure 9.7: Asynchronous text scroliing
9.2 Interactive applications
The combination of eXene and CML provides a foundation for building interactive appli-
cations in the spirit of Pegasus [R086, GR92]. In this section, 1 describe an application of
eXene and how it uses the features of CML.
Currently, the most sophisticated application built on top of eXene is Graph-o-
matica, which is an interactive tool for viewing and analyzing directed graphs.5 Graph-
o-matica was originally implemented on top of Pegasus by Emden Gansner and Steve
North at AT&T Bell Laboratories; Emden Gansner ported it to eXene.
Graph-o-matica provides the user with two kinds of windows: command windows,
which provide a terminal-style, language interface to a command shell, and viewers, which
provide a view on a 2D layout of a graph. At any time a user can have multiple command
windows and multiple viewers. Each viewer is associated with a particular layout of a
particular abstract graph. Different graphs can have different layouts, and each layout can
have multiple views. Figure 9.8 is a screen dump from a sample session with Graph-o-
matica. The bottom window is a command window; the two windows above provide two
views of a single layout of a graph. A viewer allows the user to pan and zoom (using menus
5Huimin Lin at the University of Sussex has built an interactive theorem prover on top of eXene, but I
do not know the details of its ianplementation.
121
o+ - --H --H _ mi?
i> view mikel mik
x13			not found
i> view mike2 mike
x13			not found
I> ?
Figure 9.8: Graph-o-matica screen dump
and the scrollbars) over the p&ticular layout. The user can manuaily change a layout using
editing operations such as moving a node, or elision of a subgraph.
The implementation of Graph-o-rnatica exploits the features of CML in several ways.
ff a graph is edited, this information needs to be propagated to the layouts and views of the
graph. We use the multicast channel abstraction (described in Section 5.2) to manage the
propagation of update notifications to the layouts and from the layouts to the views. This
simplifies the implementation of the graph object, since it does not need to know anything
about multiple layouts. The layout objects, if they decide a given change affects them, can
122
query the graph object for more detalled information.
The command shell is a thread that communicates with a virtital terminal (vtty) widget.
The vtty widget is a good example of the need for both communication abstraction and
selective communication. At any time, the vtty must be able to handle both input from
the user and output from its client (the command shell). EXene provides an abstract
interface to the input stream, but since it is event-valued, it can still be used in selective
communication.
Concurrency is also used in the structuring of the application code. Layout algorithms,
for example, run as separate threads, thus allowing the user to continue other activities
while walting for a new layout.
9.3 Distributed systems programming
Many distributed programmIng languages have concurrent languages at their core (e.g.,
SR [AOCE88]), and distributed programming toolkits often include thread packages (e.g.,
Isis [BCJ+90]). This is because threads provide a needed flexibility for dealing with the
asynchronous nature of distributed systems.
The flexibility provided by CML should is a good base for distributed programming. Its
support for low-level 1/0 is sufficient to build a structured synchronous interface to network
communication (as was done in exene). Higher-level linguistic support for distributed
programming, such as the promise mechanism described in Section 9.1.4, can be built using
events to define the new abstractions.
Another example is Chet Murthy's reimplementation of the Nupri environment [Con86]
using CML. His iniplemen?atiun i? ??ructured a? a collection of "proof servers" running on
different workstations. When an expensive operation on a proof tree is required, it can be
decomposed and run in parallel on several different workstations. This system uses CML
to manage the interactions between the different workstations.
9.3.1 Distributed ML
Another project involving CML is the development of a distributed programming toolkit
for ML that is being done at Cornell University [Kru91]. This work builds on the mecha-
nisms prototyped in Murthy's distributed Nupri and on the protocols developed for Isis
[BCJ+90]. A new abstraction, called a port groip has been developed to model distributed
communication. The communication operations provided by port groups are represented
by event-value constructors. For details see [Kru91].
123
9.4 Other applications of CML
CML has beeu used by various people for a uumber of other purposes. Audrew Appel has
used it to teach concurrent progranmiing to undergraduates at Princeton University (Appel,
personal communication, January 1991). Gary Lindstrom and Lal George have used it to
experiment with functional control ofimperative programs for parallel programming [GL9l].
And Cle'ment Pellerin has implemente(L a compiler trom a concurrent constralnt language
to CML.
124
Chapter 10
Implementation
There have been several implementations of first-class synchronous operations. I wrote the
first implementationin C as part of the Pegasus/PML run-time system [Rep88]. I laterim-
plemented the concurrency mechanisms of PML 011 top of SML/NJ in a coroutine1 library
[Rep89], and Norman Ramsey has implemented a similar system at Princeton [Ram9O].
More recently, I implemented CML on top of SML/NJ [Rep9ia]. Of these implementa-
tions, CML provides the richest programming notation and the best performance. It is
written entirely in SML, using two non-standard extensions provided by sML/NJ, first-
class continuations [DHM91] and asynchrono?s signals ?Rep90a], and one minor compiler
modification. This chapter describes the implementation of CML in some detail (a brief
sketch was given in [Rep91a]), and discusses implementation techniques that might further
improve performance. Some specific performance measurements of this implementation are
reported in the next chapter. This implementation runs on single processor computers; the
issues related to a multiprocessor implementation of CML are discussed in the Chapter 12.
10.1 The implementation of SML/NJ
SML/NJ is a high-performance implementation of SML [AM87, AM9i]; it uses a combi-
nation of sophisticated compiler techniques and clever run-time system support to provide
a level of performance that is competitive with C on large examples. In this section, 1
describe the aspects of SML that have a direct bearing on the implementation of CML.
1By "coroutine," I mean that this system does not use preemptive thread scheduling.
125
10.1.1 First-class continuations
As discussed in Section 2.3.3, SML/NJ provides continuations as first-class values. Until
recently the type of callcc was fully polymorphic; i.e.,
val callcc : (`a cont -> `a) -> `a
As discussed in Chapter 8 (p. 97), it has been discovered that this typing of first-class
continuations is unsound. The type of callcc in SML/NJ is now:
val callcc : (`Ia cont -> `la) -> `la
which corrects the soundness problem [WF9la]. Unfortunately, using this weakly polymor-
phic type has the effect of reducing the polymorphism of the CML primitives. For example,
the type of sync is
val sync : `Ia event -> `Ia
using the weakly polymorphic version of callcc. For this reason, I use the unsafe, fully
polymorphic, version of callcc in my implementation. The typing of the resulting primi-
tives, however, is proven sound in Chapter 8.
10.1.2 The compiler
The SML/NJ compiler is a multi-pass compiler. The front-end is fairly conventional (scan-
ning, parsing, type-checking, etc.); it is the back-end (optimization and code generation)
that interests us. The back-end uses a representation called continuation-passing style, or
CPS for short [Ste78, KKR+86, AJ89, App92]. The CPS representation is a specialized
form of A-calculus that has a uniform representation for all transfers of control (conditional
branches, loops, function calls and function returns). This representation is a "goto with
arguments," better known as a tail-recursive call. Since a function return is represented
as a tail-recursive call, functions must be parameterized by the return continuation. It is
from this explicit passing of continuations that CPS gets its name. The advantage of this
approach is that the compiler can concentrate on making function calls as fast as possible,
which is one of the keys to good performance for languages like SML and Scheme. Uullke
other ?unturnatiwi-pa??ing style compilers, such as [Ste78] and [KKR+ 86], the code gener-
ated by the SML/NJ compiler does not use a run-time stack; instead, return continuations
are heap allocated. This means that the code generated for callcc and throw is essentially
the same as that for finction calls. The ouly difference is that the current continuation
created by callcc must restore the current enclosing exception handler. Unfortunately,
126
this means that callcc breaks tail-recursion (ill the same way that exception handlers do),
which has implications for the implementation of sync (see Section 10.4).
10.1.3 The run-time system
The SML/NJ run-time system provides automatic memory management, an interface to
the underlying operating system (UNIX), and a mechanism for building stand-alone ML
worlds. An older version of the run-time system is described in [App9O], but I and others
have revised it several times since then.
The run-time system is logically divided into two coroutines: the ML program and the
C program that provides run-time support. The actual implementation uses procedure call
and return to implement the coroutine switches, with a global C struct used to hold the
ML state in the run-time system.2 Two assembly routines, restoreregs and saveregs,
are used to (respectively) call and return from the ML program. When the ML program
needs a service from the run-time system, it loads the global variable request with name
of the needed service and jumps to saveregs.
Memory management
Each object has a one-word descriptor at its beginning that contains the object's length and
a 4-bit tag. There are four kinds of objects: tuples, arrays, strings and bytearrays (mutable
strings). Code objects in the heap are represented by strings. A single code object is used
to hold all of the code for a compilation unit (e.g., ML structure or functor), which requires
a mechanism for supporting pointers into the middle of strings. This is accomplished by
an embedded-string descriptor, which is used to mark substrings of a code object; preceding
the embedded-string descriptor is a back-pointer descriptor, which tells the garbage collector
how to find the beginning of the code object. For more details on run-time representations
see [App90].
Memory allocation is the dominating cost of ML execution, thus it must be as cheap
as possible. SML/NJ uses inline allocation with minimal overhead (allocation of a tuple
requires 3 or 4 instructions over the cost of object initiallzation). The run-time system uses
two dedicated registers to support allocation: the allocation pointer, which points to the
start of the next object to be allocated; and the heap-limit pointer, which is used to test for
the need for garbage collection. Instead of testing the heap-limit on every allocation, the
compiler tests only at the beginning of an extended basic block.3 The compiler computes
2For releases of SML/NJ since 0.70, the ML state is no longer global. Instead, each run-time routine
takes it as an &gument, which allows multiple ML states to exist (e.g., on a multiprocessor).
3An extended basic block is an acyclic graph of basic blocks with a single entry-point, but multiple exit-
127
the maximum possible allocation ill an extended basic block and generates a heap-limit test
at the root, which will insure that the execution of the block does not run out of allocation
space (allocation of dynamic sized objects, such as arrays, is done by hand-coded assembly
routines in the run-time system). To simplify the test, the heap limit is set at 4096 bytes
below the actual top of the allocation space, so that the test for any block that allocates less
than 4096 bytes only involves a pointer comparison. On many machines, this can be cleverly
coded using one register-register instruction; for blocks that might allocate more than 4096
bytes, a more expensive test involving pointer arithmetic is required. When a heap-limit
overflow is detected a trap is generated, which the operating system maps to a UNIX signal
[UM86] that is caught by a handler in the run-time system. The signal handler saves the
program counter of the ML program, replaces it with the address of saveregs, and returns
to the operating system, which causes program execution to resume in saveregs. This
technique of using a UNIX signal handler to vector to an assembly routine that saves the
register state is owed to Cormack [Cor88]. The compiler generates an embedded-string
descriptor prior to the entry-point of each extended basic block, thus the program counter
at the heap-limit test is treated like a normal ML value by the garbage collector.
Signals
SML/NJ provides an asynchronous signal mechanism [Rep90a], which has semantics sim-
ilar to that of UNIX signals [UN186]. When a signal occurs, the current continuation is
grabbed and passed to the appropriate ML signal handler. The signal handler executes
atomically with respect to signals; it returns a continuation that is used to resume exe-
cution. Signal handlers provide a natural mechanism for implementing preemptive thread
scheduling (see below).
The actual translation of a UNIX signal to an ML signal is more complicated than
described. The principal dlfliculty is that constructing a continuation to pass to the signal
handler at any arbitrary point in the execution is not feasible. The solution to this problem
is to delay capturing the continuation to a safe point where the state of execution can
be easily captured. The heap-limit checks used to trigger garbage collection conveniently
provide such safe points. Thus, when a UNIX signal occurs, the UNIX signal handler in
the run-time system records it and modifles the heap-limit pointer to insure that the next
heap-limit check will trigger a garbage collection. The garbage collector then recognizes
that the request for garbage collection is actually a pending signal, builds a continuation
closure out of the ML state,4 and passes the signal and continuation to an ML routine that
points [Ros8l].
4Recall from above that the hea?limit check is preceded by a descriptor, thus the garbage collector wffl
be able to deal with the code address of the continuation built by the run-time system.
128
dispatches the appropriate ML signal handler. The use of heap-limit checks as safe points
is similar to the preemption teclmique used in Argus [LCJS87]. For a complete description
of the SML/NJ signal mechanism, see [Rep90a].
10.2 Implementing threads
The implementation of threads exploits the fact that first-class continuations are exactly
the thread state that needs to be saved and restored on context switches [Wan80]. This
section describes the implementation of threads and preemptive scheduling.
10.2.1 Threads
Internally, a thread is represented by two pieces of information: a thread ID and a contin-
uation. The thread ID serves as a unique identifier for the thread, as well as providing a
handle for implementing the threadWait operation, while the continuation represents the
suspended state of the thread's computation. Threads are either ready (able to execute) or
blocked (walting to synchronize on some event). At any time, one of the ready threads is
designated as the currently runningthread; the IDs and current continuations of the other
ready threads are kept in the ready quette. The global variable rurningmreadld is used to
refer to the currently running thread's ID.5 Switching thread contexts involves putting the
current thread's continuation and ID into the ready queue and dispatching the next thread
in the queue. The following code illustrates the mechanics of a context switch:
fun contextSwitch runningK = let
val _ = rdyqlnsert (!runningmreadld, runningK)
val (newid, newK) = rdy?Reniove ()
in
runningmreadld := nevid;
throw nevK ()
end)
where runningK is bound to the running thread's current continuation. Variations on this
scheme are used throughout the implementation.
Although the SML/NJ compiler knows nothing about threads and concurrency, the
fact that callcc and throw are used to implement threads means that the implementa-
tion gets many of the benefits of specialized compiler support for free. In particular, the
compiler knows exactly which registers are live at the point of a context switch, thus only
the minimum amount of thread state required is actually saved and restored. Furthermore,
?In the most recent version of CML (0.9.6)1 switched to using the vazptr register to refer to the current
thread's ID. This register is a dedicated per-processor register provided by the compiler.
129
the fact that continuations are heap allocated means that thread creation is a very fast,
constant time, operation.6 For the MIPS processor, a thread context switch is about 190
instructions, and thread creation is about 490 instructions; there is some hope that these
numbers can be substantially reduced (see Section 11.1.2 for more details).
10.2.2 Preemptive scheduling
In order to prevent a thread that is executing a long (or infinite) computation from monop-
olizing the processor, CML uses preemptive thread scheduling. This is done in a straight-
forward manner using the UNIX interval timer [UN186] and the signal mechanism described
above. The interval timer is set to generate a SIGALRM every n milliseconds (n is typically
in the range from 10 to 50), and a signal handler that forces a context switch is installed
for SIGALRM. The only complication is the possible interference between the runnlng thread
and signal handler. To avoid this problem, a global flag is used to mark when execution is
in a critical region:
datatype atomic_state = NonAtomic I Atomic I Signalpending
val atomicState : atomic_state ref
val atomicBegin : unit -> unit
val atomicEnd : unit -> unit
The function atomicBegin, which sets the flag to Atomic, is called just prior to entering a
critical region, and the function atomicEnd, which resets the flag to NonAtomic, is called
on exit. If a signal occurs while atomic?state is Atomic, then the signal handler does
not force a context switch. Instead it sets atomic?state to SignalPeuding and returns.
The function atomicEnd checks the flag before resetting it; if it is SignalPending, then a
context switch is performed.7 Note that this mechanism is internal to the implementation
of CML; user programmers have no access to these operations.
10.3 Implementjng channels
Channels are represented by a pair of queues; one for threads waiting for input and one
for threads waiting for output (see Figure 10.1). Each item in a channel queue is a triple,
6While constant time callcc is possible m stac? basect miplementations (e.g. [HDB9o]), it is not clear
that these techniques are fast enough to implement true light-weight threads on today's hardware. For
example, Haahr reports that a number of Scheme implementations were unsatisfactory for anything more
than prototypes of his multi-threaded window system [Haa9O].
7The reader may recognize that there is a potential race when exiting a critical region between the time
of the test for a pending signal and the resetting of the flag. The delaying of signals to heaplimit check
points, however, means that this race can not occur in practice, since the test and resetting of the flag is
done without any intervening heaplimit checks.
130
type `a chanq = (bool ref * thread_id * `a) queue
datatype `a chan = CHAN of ?
inq : `a cont chanq,
outq : (`a * unit cont) chanq
Figure 10.1: The representation of channels
consisting of a dirty flag (described below), a thread ID, and an offered communication.
In the input queue (inq), an offered communication is represented by a continuation that
will accept a message; in the output queue (outq), an offered communication consists of
the message being sent and the continuation to resume the thread when the message is
accepted. As an example, the implementation of send is given in Figure 10.2. This code
fun send (CHAN?inq, outq?, msg) = callcc (fn send_k => (
atomicBegin();
case (cleanandRemove inq)
of SONE(rid, rkont) => (
rdyQlnsert (!ruuningmreadld, send?k);
runningmreadld := rid;
atomicEnd();
throw rkont msg)
I NONE => (
insert(outq, (ref false, getTid(), (msg, send?k)));
atomicDispatch())
(* end case
Figure 10.2: The implementation of send
works by first capturing the rendezvous point continuation using callcc. Since the channel
queue and ready queue are going to be manipulated, atomicBegin is called to mark the
start of a critical region. The call to cleanAndRemove returns the ID and communication
of the first "clean" item from the input queue8 if one is available, otherwise it returns NONE.
If there is an offered communication avallable (i.e., matching accept or receive), then the
sending thread is added to the ready queue and the message is thrown to the receiving
thread's continuation. If no matching communication is available, then the sender is added
to the output waiting queue, and another thread is dispatched (atomicDispatchdispatches
a thread while exiting the critical region). The implementation of accep? is essentially a
mirror image of send.
8The notion of cleanliness is related to the dirty flag and is explained below.
131
10.4 Implementing events
The implementation of events is moderately complex, so it is useful to first consider a very
simple subset of events without choice, guards or abort actions. In particular, consider the
P operation on binary semaphores, which is one of the simplest synchronous operation. An
implementation of binary semaphores (ignoring issues of atomic regions and thread IDs) is
quite simple:
datatype semaphore = SENAPHORE of +
fig : bool ref,
waitq			unit cont list ref
fun V (SENAPHoRE+fig, waitq?) = (case !waitq
of (] => fig := true
(k::r) => (waitq := r; enqueue k))
fun P (SENAPHoRE+fig, waitq?) = let
fun Pbody resumek = if (`fig)
then (fig := false)
else (waitq := !waitq ? (resumek]; dispatch())
in
callcc Pbody
end
where the body of the P operation is factored out for pedagogical reasons. The thing to
notice about this code is that the resumption continuation of the calling process is a free
variable in the body of the operation. This observation, which holds for all synchronous
operations, is the key to the implementation of events. In this simple setting, it means that
event values can be represented as
type `a event = `a cont -> `a
Using this representation, the event-valued implementation of P is:
fun P (SENAPHORE+fig, waitq]) = let
fun Pbody resumek = if (!flg)
then (fig := false)
else (waitq := !waitq C (resuinek]; dispatch())
in
Pbody
end
It follows that sync is implemented directly by callcc. The implementation of wrap must
feed the value produced by synchronizing on its first argument to its second argument,
which is done as follows:
132
fun wrap (evt, f) = fn k => (throw k (f (callcc evt)))
The continuation that applies f to its argument is passed to the event value being wrapped;
the result of evaluating f is then thrown to the continuation that is the argument to the
event value constructed by Wrap. The astute reader will recognize this as a convoluted form
of function composition.
10.4.1 Event value representation
Unfortunately, this simple representation of events is unable to support choice, guards or
abort actions. In the more general setting of CML events, there are five distinct aspects
to a thread synchronizing on an event values:
Forcing. If the event is a guard event, then it must be forced(i.e., the guard function is
applied).
Polling. For a non-guard event, the first step is to poll the base events to see if any of
them are immediately satisfiable.
Selection. If one or more of the base events is immediately satisfiable, then one of these
is selected and executed.
Logging. If there are no immediately satisfiable events, then the synchronizing thread must
be added to the waiting queues of the base events.
Unlogging. Once one of the base events is satisfied, the thread must be removed from the
other base events' waiting queues and their abort actions (if any) must be spawned.
Figure 10.3 gives the representation of event values, which reflects the five aspects described
above. An event value is either a guard function, or a list of base-event descriptors. A base-
event descriptor is a record of four fields: the function pollfn is used to test if the base
event is immediately satisfied; the function dofn is used to execute the base event if it is
selected; the function blockfn is used to log the base-event value; and the field abortfn is
either NO?ABORT or the abort action. In [Rep88], the informal semantics of PML events is
defined in terms of a rewriting system that converts events to a "canonical" form; this form
is essentially the above representation and the rewrite rules are the implementation of the
various combinators.
10.4.2 Syncbronization
As described above, there are five aspects to applying sync to an event value. These can
be divided into two phases. The first phase is forcing guards, and corresponds to the
133
datatype abort_fn = NO_ABORT I ABORT of (unit -> unit)
datatype `a base_evt = BASE_EVT of +
pollfn : unit -> bool,
dofn : abort_fn -> `a
blockfn : (bool ref * abort?fn * (unit -> unit)) -> `a,
abortfn : abort_fn
datatype `a event
= EVT `a base_evt list
I GUARD of (unit -> `a event)
Figure 10.3: The representation of event values
sequential evaluation rule for "sync (G e)" ill Chapter 7. The second phase corresponds to
the notion of event matching (Definition 7.3), and consists of polling and either selection
or logging (unlogging is done as part of the selection step). This involves accessing shared
data structures and so must be done inside an atomic region. The fact that the second
phase is done as an atomic operation greatly simplifies the implementation of sync (cf.,
Section 12.2.2). The actual implementation of the second phase is tuned for various common
special cases, such as singleton events and events without any abort actions, but, to simplify
the discussion, I describe the general case.
Forcing guards
The recursive forcing of guards is done by the following function:
fun forceGuard (GUARD g) = forceGuard (g ())
I forceGuard (EVT evts) = evts
Once the guards (if any) have been forced, forceGuard returns a list of base-event descrip-
tors (evts).
Polling
Polling the base events involves traversing the base-event list and calling the pollfn for
each element, while extracting the abort action. The polling step produces a status value
for each base event in the list. A base event's status is either ready or blocked, and either
with or without an abort action (see Figure 10.4). ff one or more of the base events is ready,
then the blocked base events are irrelevant. Thus, once the polling loop sees a ready base
134
type `a block_fn = (bool ref * abort_fn * (unit -> unit)) -> `a
datatype `a bevt_status
= BLK of `a block_fll
I BLK_ABORT of (`a block?fn * (unit -> unit))
I ROY of (abort_fu -> `a)
I ROY_ABORT of ((abort_fn -> `a) * (unit -> unit))
Figure 10.4: The representation of event status
event, it can discard the status of any blocked base events. The polling of an individual
base event's status is done by the following function:
fun pollBaseEvt (BASE?EVT?pollfn, dofn, blockfn, abortfn?) =
case (pollfn(), abortfn)
of (false, NO?ABORT) => BLX blockfn
(false, ABORT a) => BLK_ABORT(blockfn, a)
I (true, NO?ABORT) => ROY dofn
I (true, ABORT a) => ROY_ABORT(dofn, a))
Selection
If the resulting list of base-event statuses includes one or more ready base events, then one
of these is selected. The implementation uses a pseudo-random selection policy that gives
probabilistic guarantee of falrness. A global counter is malntalned; its value modulo the
number of ready events is used to select one of the events. The counter is incremented after
each selection and by the preemptive scheduler; the latter introduces a random element
that helps avoid any kind of resonance in the selection patterns. Once a ready base event
is selected, the abort actions of the other base events must be spawned and the dofn of
the selected base event must be executed. The order in which this is done is tricky, since
the dofn must be executed before leaving the atomic region and there is no guarantee
that it will ever return (e.g., if a tail-recursive wrapper is involved). The solution is to
pass the abort actions as an argument to the dofn, which invokes them immediately after
leaving the atomic region. The actual argument is a single abort action that spawns all of
the required abort actions. Section 10.4.3 describes the internals of the dofns of several
base-event constructors.
Logging
If no base event is ready then the base events must be logged. Logging a base event requires
c?apturing a continuation that, when thrown to, will spawn the abort actions of the other
135
base events and apply the base event's wrapper functions. The base event, thread ID, and
continuation together constitute a base-event instance. In order to understand the logging
process it is also necessary to see how the blocking function works. Figure 10.5 gives the
code for the logging loop and the skeleton of a typical blocking function. The logging loop
(* Logging loop *)
System.Unsafe.capture (fn k => let
val escape = System.Unsafe.escape k
val dirtyFlg = ref false
fun log (fl, ) = atomicDispatch ()
I log ((BLK bfn) :: r, i) = escape (
bfn (dirtyFlg, allAborts, fn () => (log(r, i); error [log]')))
I log ((BLK?ABORT(bfn, _)) :: r, i) = escape (
bfn (dirtyFlg, mkAbortFn i, fn () => (log(r, i+I); error "[log]')))
I log - = error "(log]"
in
log (sts, 0)
end)
(* A typical blocking function *)
fun blockFn (dirty, abort, next) = let
fun block k =
add the thread to the waiting list;
next())
in
case abort
of NO_ABORT => (callcc block)
I (ABORT a) => ((callcc block) before (a ()))
end
Figure 10.5: Event logging
is implemented in continuation-passing style; the third argument to a block function, called
next in the skeleton version, is a function that continues the logging loop. The function
error reports an internal error by ralsing an exception; its principal purpose is to make the
types work out. The functions Capture and escape are unsafe versions of callcc that do
not save or restore the exception handler continuation. They are required here in order that
the logging of base events not break tail recursion, which is important since the wrapper
functions often contaln tall-recursive calls (e.g., the buffered channel in Section 5.1).
136
Unlogging
Once a particular base-event instance of an event is selected, the other base-event instances
of that event must be unlogged. To support uniogging there is a boolean reference, called
the dirtyflag,for each event instance that is shared by its base-event instances. When one
of the base-event instances is chosen, the flag is set to true, which marks all of the instances
as being dirty. For the channel operations, the marking of the flag is done by the functions
that remove items from the waiting queues (e.g., cleanlndRemove).
This trick of using a shared reference to mark the dirty instances was invented by Norman
Ramsey [Ram90]. The reason for using this technique, instead of an explicit unlogging loop,
is that it is simple and is constant time (since the cleaning of dirty items can be charged
to their insertion operation). Unfortunately, there are certain situations in which this trick
can result in unbounded heap growth. For example, if a thread is continuously selecting
between communication on two channels, where one of the chaunels is never used, then the
unused channel's waiting queue will be filled with dirty requests that are never removed. To
avoid this problem, the channel must be cleaned when items are inserted, which destroys the
constant-time bound on cleaning overhead. The dirty-flag technique still has the advantage
of simplicity.
10.4.3 Base-event constructors
The simplest example of a base-event constructor is always, which builds an event that is
always ready for synchronization. Its implementation is given in Figure 10.6. As expected,
fun always I = let
fun doFll abortAct =
atomicEud();
case abortAct of (ABORT a) => a() 1 - => ();
in
EVTLBASE?EvT?
pollfn			=
dofn			=
blockfn =
abortfn =
end
(fn () => true)
doFn,
(fn - => error "Ealways]?),
NO_ABORT
Figure 10.6: The implementation of always
the pollfn always returns true. The dofn is minimal; it leaves the atomic region, spawns
the abort action (if any), and returns the argument with which the event value was created.
137
Since pollfn always returns true, blockfn is never called, and since this is a base event,
there is no abort action.
A more complicated base-event constructor is transmit; the code for this is given in
Figure 10.7. The implementation of transmit should be comp&ed to the implementation
fun transmit (CHAN?inq, outq?, msg) = let
fun pollFn () = (clean inq; isEmpty inq)
fun doFn abortfn = let
val (rid, rkont) = remove inq
fun doit k =
rdy?Insert (!ruuningmreadld, k);
runningTnreadld			rid;
atomicEnd();
throw rkont msg)
in
case abortfn
of NO_ABORT => callcc doit
I (ABORT f) => (callcc doit; f())
end
fun blockFn (flg, abortfn, next) = let
fun block k =
insert(outq, flg, (getTid(), msg, k));
next(); error `(transmit]")
in
case abortfn
of NO_ABORT => (callcc block)
I (ABORT f) => (callcc block; f())
end
in
EVT(BASE?EVT?
pollfn = pollFn,
dofn = doFn,
blockfn = blockFn,
abortfn = NO_ABORT
end
Figure 10.7: The implementation of transmit
of send in Figure 10.2. The pollin plays the role of the case in send; it cleans an the
head of the channel's inq and returns true if there is an outstanding input request. The
dofn corresponds to the case in send where there is a clean item in the queue; the sending
thread is enqueued in the ready queue, a request is removed from the queue (note that
remove takes care of marking the dirty flag) and the message is thrown to the accepter. If
there are abort actions, then they are spawned by the sending thread's continuation. The
blockfn corresponds to the case in send where there are no pending input requests. The
138
blockfn inserts its continuation (which embodies any wrappers) in to the channel's outq
and continues the logging loop. As with the dofn, any abort actions are spawned by the
sender's continuation.
10.4.4 Event combinators
In tenu? of implenie??ation, the simplest combinator is guard, which has the implementa-
tion:
val guard = GUARD
The implementations ofthe various other event combinatorsmust deal with guarded event
values. As discussed in Sections 4.5.1 and 7.2, the guard function is essentially a delay
operation. When an event combinator is applied to a guard event, the guard is lifted to the
top level. For example, the implementation of the wrap combinator handies the GUARD case
as follows:
fun wrap (GUARD g, f) = GUARD(fn () => wrap (g(), f))
I wrap ...
When the guarded wrapper is forced, g() will be evaluated to an event value that will be
wrapped by f. The implementations of wrapHandler and wraplbort handie GUARD in a
similar fashion. The implementation of chooseis a little more complicated and is discussed
below.
The actual implementation of the wrap combinator is semantically similar to that de-
?cnbed p?vwuisly, but th? ?uplen??i??tion details &? quite diffe?L?. The wrapper function
must be composed with the dofn and blockfn fields of each base-event descriptor. This is
done by mapping the following function across the list of base-event descriptors:
fun wrapBaseEvt (BASE?EVT?po11fn, dofn, blockfn, abortfnl) =
BASE_EVT?
pollfn = pollfn,
dofn = (f 0 dofn),
blockfn = (f 0 blockfn),
abortfn = abortfn
where f is the wrapper function.
The wrapHandlercombinator must also compose its wrapper with the doin and blockin
fields of each base event, but the composition involves interiecting an exception handier.
The function for wrapping a handier is:
139
fun wrapBBaseEvt (BASE?EVT?pollfn, dofn, blockfn, abortfni) =
BASE?EVT?
pollfn = pollfn,
dofn = fn x => ((dofn i) handle e => h e),
blockfn = fn I => ((blockfn x) handle e => h e),
abortfn = abortfn
where h is the handler function being wrapped.
The choose combinator is fairly straightforward to implement. It essentially takes a list
of lists and flattens them into a single list. ff any one of the events in the argument list
passed to choose is a guard event, then the guard is lifted to the top-level. Also, care must
be taken to preserve the left-to-right order of evaluation of guards.
fun choose 1 = let
fun f ((], el, fl) = EVT el
f (E], el, gl) = let
val applyGuards = revmap (fn g => (g ()))
in
GUARD(fn () =>
choose ((EvT el) :: (applyGuards gl)))
end
((EvT el?) :: r, el, gl) = f (r, el' ? el, gl)
((GUARD g) :: r, el, gl) = f (r, el, g::gl)
If
If
in
f (1, (], E])
end
The implementation of wrapibort is the most interesting of the combinators. When
wrapibort is applied to a singleton event (i.e., an event consisting of exactly one base
event), the implementation simply adds the abort action to the base event, which is done
by the foliowing function:
fun addAbortFn (BASE?EVT?pollfn, dofn, blockfn, abortfn?, a) =
BASE?EvT?
pollfn = pollfu,
dofu = dofn,
blockfn = blockfn,
abortfn = (case abortfn
of NO?ABORT => a
I (ABORT a') => fn () => (spawn a'; a()))
The more complicated case is when wrapibort is applied to an event vaiue consisting of n
base events, where n > 1. The semantics of wrapibort require that the abort action be
spawned only in the case that none of the base events is chosen. This must be implemented
140
ill terms of the individual base-event abort actions, which are spawned if their base event
is not chosen. lii other words, the abort functions of the base events must coordinate to
implement the abort action of the wrapped event. The way this works is that the abort
actions are partitioned into a single leader action and n --H 1 follower actions. A special
channel is allocated for these threads to communicate by. Each follower sends an "I am
here" message to the leader; the leader attempts to read n --H I messages and then executes
the abort action. ff any one of the base-event actions is not spawned, i.e., because the
corresponding base event is the selected one, then the abort action does not get executed.
The channel used by these threads must be allocated anew for each synchronization attempt,
so the creation of the abort actions is protected by a guard. The actual implementation of
the resulting event value is:
GUARD(fn () => let
val ackCh = channel()
fun addFollowerAbortFn b =
addAbortFn (b, fn () => send(ackch, ()))
val n = length followers
fun leaderAbort 0 = abort()
I leaderibort i = (accept ackch; leaderAbort(i-1))
in
EvT(
(addAbortFn (leader, fn () => (leaderAbort II)))
(map addFollowerAbortFn followers))
end)
where leader and followers are, respectively, the base-event descriptors of the leader and
follower abort actions.
10.5 Implementing 1/0
1/0 operations pose two problems for concurrent programming systems: first, the 1/0
devices (ffle descriptors in UNIX systems) are a form of shared state, and thus require
concurrency control; and, second, input operations (and in some cases output operations)
have the potential to block. As described in Sections 4.4 and 4.6, the implementation of
CML supports 1/0 at two levels: synchronization on UNIX ffle descriptor conditions and a
concurrent version of the SML stream 1/0 interface.
10.5.1 Low-level 1/0 support
The low-level 1/0 base-event constructors (e.g., synconinput) provide a mechanism similar
to that of the UNIX system call select [UN186], and, in fact, are implemented using this
141
system call.
A global 1/0 waiting list is maintained by the implementation, with each entry corre-
sponding to a particular instance of an 1/0 base-event value. Each time the preemptive
scheduler is called, it dispatches a continuation that checks the status of the file descriptors
in the 1/0 waiting list. This is done by projecting out the file descriptors of the non-dirty
event instances in the waiting list and building the corresponding file descriptor sets. A
call to the select system call is made to poli the file descriptors, which returns the set
of ready descriptors. The threads waiting on the ready descriptors are then added to the
ready queue.
The only complication to this scheme is handling 1/0 errors; e.g., if one of the files has
been closed. In such a case, the system call select returns an error code, but no specification
of which file descriptor is the source of the error. Since this situation is relatively rare, a
moderately expensive, but simple, linear search for the bad file descriptors is used. Each file
descriptor is tested by a call to the ftype system call, which returns an error if, and only if,
the file descriptor is the source of the error. This error is then mapped back to the blocked
thread by raising an exception in its context. This requires saving two continuations in the
block function; one for successful synchronization and one for error condition. To make this
all concrete, the 1/0 waiting list data structure and the syuchonlnput event constructor's
block function are given in Figure 10.8. Each io?item in the waiting list corresponds to
a pending 1/0 base-event instance, and contains the file descriptor, type of operation, the
waiting thread ID and the two possible continuations.
10.5.2 Stream 1/0
CML includes a structure ClO, which implements a concurrent version of SML 1/0 streams
(see [Rep9Ob] for a complete description). There are two types of 1/0 streams, in-streams
and out-streams, which provide buffered input and output operations. Each open stream
is represented by a thread, which implements the buffering. For out-streams, the protocol
is straightforward and uninteresting.9 The in-stream protocols, however, are an example of
the advanced use of events to build complex communication abstractions.
Because input operations might block indefinitely (e.g., while waiting for the user to
enter a line of text), it is necessary to provide an event-valued interface to in-streams. The
ClO structure includes the foliowing operations:
val inputEvt : instream * int -> string event
val inputLineEvt : instream -> string event
val lookaneadEvt : instream -> string event
?The implementation makes the simplifying assumption that write operations are non-blocking.
142
datatype io?operation_t = IO?RD			10_WR
type io_item =
fd			: int,			(* the
io?op			: io?operation?t,			(* the
id			: thread_id,			(* the
kont			: unit cont,			(* the
err_kont			: unit cont,			(* the
dirty			: bool ref			(* the
file descriptor *)
kind of operation *)
waiting thread's id *)
successful continuation *)
error continuation *)
dirty bit *)
val ioWaitList = ref ((] : io_item list)
fun inputBlockFn (flg, abort, next) =
callcc (fn okay?k => (
callcc (fn err?k => (
ioWaitList
fd=fd, io?op=IO?RD, dirty=flg, kont=okay?k,
err_kont=err?k, id=getTid()
:: !ioWaitList;
next()));
(* continue here on an error
applyAbortFn abort;
raise (TnvalidFileflesc fd)));
(* continue here on success
applylbortFn abort)
Figure 10.8: Low-level 1/0 support
The function inputEvt builds an event value for reading a specified number of characters,
inputLineEvt builds an event value for reading a line of input, and lookaheadEvt builds an
event value for examining the next character to be read from the buffer. As an illustration,
the following function either reads a line of input or times out:
fun getlnswer t = select [
wrap (inputLineEvt std?in, SONE),
wrap (timeout t, fn () => NONE)
In order for code of this sort to work properly, the implementation of the in-stream event
constructors must satisfy the following two requirements:
(1) The corumit point of the event must correspond to the availability of input that
satisfies the request.
(2) Input must never be lost or discarded.
The implementation uses a request-reply protocol (a simple version of this is described in
143
Section 4.5). lii order to meet requirement (1), the commit point must be the server's
reply, which means that the request must be generated by a guard. Meeting requirement
(2) means that the server thread must be informed that the request has been aborted.
This is the scenario discussed in Section 4.5.2, and the client-side code is similar to that of
clientCallEvt5. For example, the client-side implementation of inputLineEvt is:10
fun inputLineEvt (INSTRM?req?ch, ...?) = guard (fu () => let
val abortCh = channel() and replych = channel()
in
spawn (fn () =>
send (req?ch, INPUT?LN(receive abortCh, replych)))
wrapAbort (receive replych, fn () => send(abortch, ()))
end)
The event value constructed by this function is a guard that sends a request to the server
consisting of the operation, an abort event and a reply channel. The commit point of this
event is receiving a reply from the server. If the client synchrouizes on some other event,
then the abort action sends an abort message to the server. On the server side, when a
request comes in the server attempts to satisfy it --H either from the input buffer or by
requesting input from the operating system. Once the server can satisfy the request, it
synchrouizes on the choice of sending the input as a reply and receiving an abort message
on the abort channel. lii the latter case, the input is reinserted into the buffer.
10.6 Implementation improvements
The current implementation of CML uses practically no specialized run-time or compiler
support. There are a number of techniques that could be used in the run-time system and
compiler that would improve performance.
A very simple modification, which requires little work, is to introduce a dedicated register
for referring to the thread ready queue. This would have two benefits: it would reduce
memory traffic, and it would eliminate the store-list allocations associated with ready queue
updates. Using the varptrregister to refer to the current thread ID, instead of a global ref
variable, improved the speed of thread context switching by over 10%. This suggests that
thread context switching might improve by as much as 20% by use of a dedicated ready
queue register. And as the clock speed of RJSC processors increases, the potential savings
become larger.
In the PML compiler, the event-value constructors and channel operations are explicitly
represented by primops. The PML compiler does very aggressive intermodule inlining, thus
10The aclual implemen?ahon uses a condition variable for the abort message (see Section 5.3).
144
it is able to recognize applications of syncto static event values, which can be optimized into
more efficient operations. For example, sync (receive ch) can be replaced by accept ch,
which is about 40% faster. A more modest improvement (about 5%) is achieved by inline
expansion of sync, transmit and receive.
It is also possible for compilers to recognize more complex communication patterns. A
common example is the use of a thread to encapsulate state, with an RPC interface. In
this case, each time the server thread is dispatched to handle some operation, there is a
client thread that is suspended, waiting for the reply. A more efficient implementation
of this pattern is to use a monitor (see Section 3.2.2) to encapsulate the state.11 When
there is no contention, monitors avoid the necessity of context switches on entry and exit;
a comparison of the costs in the context of Ada can be found in [EHP80]. Instead of
providing monitors at the language level, it may be possible for the compiler to detect when
a monitor is snitable and translate the RPC operations to monitor calls; this has been done
for Ada rendezvous [HN80]. Unlike Ada, CML does not have the syntactic signposts
that mark RPC-style interactions, since they are derived operations. Thus, automatically
recognizing that an RPC operation can be single-threaded is a very difficult problem. Other
communication patterns that might be recognized by the compiler include channels that
are used exactly once and channels that are used for point-to-point commun1cation. For
example, consider an R?PC protocol in which the reply channel is dynamically allocated for
each request (e.g., the implementation of input streams described in Section 10.5.2). If the
reply channel is used exactly once, then it can be replaced with a condition variable, which
reduces communication overhead by 30%.
T??? may not be the case on non-uniform memory access multiprocessors, where the client and encap?
11
sulated data are on different processors (see Chapter 12).
145
146
Chapter 11
Performance
In the pr?Vioub' chap?e, I described th? nupThui?ntation of CML jit detail; ill this chapter,
I report various performance measurements that I have made of this implementation. The
measurements include timing results for a collection of small benchmarks on three different
workstations, and instruction counts for these benchmarks on the MIPS R3000 processor. In
addition, I compare the performance of CML with the `zSystem, a C-based thread package.
These measurements show that CML provides a high-level notation at a competitive price.
11.1 The benchmarks
I have conducted a series of benchmarks on three different machines, representing three
different processor architectures. Table 11.1 summarizes the features of these computers.
The benchmarks measure the cost of low-level concurrency operations, such as sending a
Table 11.1: Benchmark machines
NeXT			SPARC 2			DEC 5000
Full name			NeXT Cube			SPARCstation 2			DECstation 5000/120
Processor			25MHz 68040			40MHz SPARC			20MHz R3000
Memory			24Mb			64Mb			16Mb
Operating System NeXTstep 2.1			SunOS 4.1.1			ULTRIX 4.2
message, so to get accurate numbers I measured the time to perform 100, 000 operations.1
For each benchmark, I measured the CPU time spent executing the program (both user
and system) and the time spent in the garbage collector. All times are in micro-seconds.
The benchmarks were run using version 0.75 of SML/NJ (released November 11, 1991),
and version 0.9.6 of CML (released October 11, 1991).
110,000 iterations of a loop of 10 operations.
147
The benchm&ks &e logically divided into two groups; the first measure the basic con-
currency primitives:
Thread switch. This measures the cost of an explicit context switch.
Thread spawn/exit. This measures the time it takes to spawn and run a null thread. It
includes the cost of two context switches: the spawn operation switches control to the
newly spawned thread and a terminating thread must dispatch a new thread.
Rendezvous. This measures the cost of a send/accept rendezvous between two threads.
Event rendezvous. This is an implementation of the rendezvous benchm&k using sync
composed with transmit and receive, instead of send and accept.
The second group measures the cost of several different versions of an RPC implementation
of a simple service. The service is essentially a memory cell; a transaction sets a new value
and returns the old value.
RPC. This uses send and accept to implement the protocol. The client-side code is:
fun call x = (send (reqch, x); accept replych)
Event RPC. This implements the protocol as an event value. The client-side code is:
fun call x = sync (
wrap (transrnit(reqch, x), fn () => accept replych))
Fast ILPO. This uses a condition variabTh (see Section 5.3) to implement a fast, asyn-
chronous reply. The client-side code for a call is:
fun call x = let val replyvar = condvar()
in
send (requn, (x, replyvar));
readvar replych
end
11.1.1 Timing results
The measured times for all of the benchm&ks are given in Table 11.2. Each entry has the
form t + g, where t is the combined user and system time for the operation and g is the
amortized garbage collection overhead. Each entry is the average of five test runs, and there
was little deviation between runs. Real-time measurements were only slightly higher than
than the CPU time measurements.
148
Table 11.2: CML benchmarks
Time in 1LS I Operation
(program + garbage collection)
Operation			NeXT			SPARC			2			DEC 5000
Thread switch			23+4			18+4			15+7
Thread spawn/exit			47+7			46+7			42+13
Rendezvous			54+10			50+7			49+20
Event rendezvous			110+12			90+9			88+23
RPC			105+20			95+15			90+38
Event RPC			171+21			134+15			125+39
Fast RPC			79+11			68+9			65+23
11.1.2 instruction counts
Andrew Appel modified the MIPS code generator to generate instrumentation that counts
the number of executed instructions. Using this mechanism, I measured the instruction
counts for the various benchmarks. Table 11.3 gives the results of these measurements.
These numbers do not include the loop or garbage collection overhead.
Table 11.3: MIPS instruction counts
Operation Instructions / Operation
Thread switch			197
Thread spawn/exit			483
Rendezvous			558
Event rendezvous			934
RPC			?? 008
Event RPC			1,346
Fast RPC			711
These numbers are higher than one might expect (particularly for the thread switch
and creation operations). In [Rep91a], I reported that a thread switch requlred around
100 instructions on the SPARC processor, which is about half of what I measured for the
MIPS. Since the MIPS and SPARC are both RISC processors, the difference is not one
of instruction sets, but rather is because of changes in the SML/NJ compiler (Appel,
personal communication, December 1991). There is some hope that future improvements
in the compiler will reduce the instruction counts to more reasonable values (for example,
a thread context switch should requlre no more than 75 instructions).
149
11.2 Analysis
The measurements show that the penalty for using abstract interfaces (i.e., hiding channel
communication ill event values) is acceptable. Table 11.4 gives the ratio between the non-
GC time of the event version and the non-event version of the two communication protocols
I benchmarked. For a simple rendezvous, the performance cost of using events is about 80%;
Table 11.4: Cost of abstraction
Protocol
Rendezvous			RPC
Machine			cost (1LS)			ratio			cost (?S)			ratio
SPARC 2			40			1.8			39			1.4
DEC 5000			39			1.8			35			1.4
for the RPC it is 40%. The reason for the lower impact on the RPC protocol cost is that
only o?e of the twu ?u?imunications is being represented by an event value. In general, only
the communication that is the commit point needs to be implemented using an event value;
communications in tfle guard and wrapper can be implemented using send and accept.
11.2.1 Garbage collection overhead
The high garbage collection overhead in these benchmarks is mostly a result of the way the
current SML/NJ collector, which is a simple generational collector, keeps track of inter-
generational references [App89]. Each time a mutable object is updated, a record of that
update is added to the store list. This store list is examined for potential roots at the begin-
ning of each garbage collection. The implementation of CML uses a small number of very
frequently updated objects: the thread ready queue, current thread pointer and channel
waiting queues. This "hot-spot" behavior is the worst-case scenario for SML/NJ's col-
lector, destroying the 0(1LIVE1) normally expected from copying collection. The collector
also suffers from the problem of poor "real-time" responsiveness.
11.3 Comparison with the ?System
To put these measurements into perspective, I implemented a similar set of benchmarks in
version 4.4 of the ?System, which is a 0 light-weight process library ?BS90]. The ?System
provides threads and a request/reply communication primitive (it also has shared-memory
primitives), but it does not have selective communication. It runs on the SPARCstation
and DEOstation, but not on the NeXT. Table 11.5 reports the results for the SPARCstation
150
Table 11.5: ijsystem benchmarks
SPARC 2			DEC 5000
Operation			Time (ijS)			Ratio			Time (iS)			Ratio
Task switch			62			2.8			13			0.6
Task create			161			3.0			59			1.1
Send/receive			127			2.2			42			0.6
Send/receive/reply			128			1.7			43			0.5
and DECstation. As before, the times are given in micro-seconds and represent the sum
of the user and system CPU times; obviously there is no garbage collection overhead. The
column labeled "ratio" gives the ratio of the ?System and CML times (including g&bage
collection overhead); a ratio greater than 1.0 means that CML is faster.
On the SPARCstation, CML is unilormly faster than the `tSystem. The principal
reason for this is that SML/NJ does not use the SPARC's register windows, and thus does
not have to flush them on a thread switch. The comp&ison for the DECstation is not as
favorable, but CML is still competitive, even though the ?System provides a lower-level
concurrency model (no selective communication, lor exampl( . This shows that we can have
the advantages of the higher-level language without sacrificing performance.
151
152
Chapter 12
Multiprocessors
While the main thrust of this dissertation is the study of concurrency as a tool for structur-
ing programs, it is worth considering the issues associated with a possible multiprocessor
implementation of CML. In this chapter, 1 survey various parallel language features and
parallel programming techniques and discuss how they might apply to parallel programming
in CML. I also discuss the implementation issues that must be addressed in a multipro-
cessor implementation of CML, and finally summarize the prospects for multiprocessor
CML.
For purposes of this chapter, 1 assume a multiple-instruction multiple-data (MIMD)
machine with a shared address space. There are many experimental and commercial ex-
amples of these machines, and it is reasonable to expect that they will appear on desks
in the near future. Although these machines provide a shared-memory model, they usu-
ally also have non-uniform memory access (NUMA); e.g., each processor may have a local
memory or at least a cache. Because of NUMA, maintaining locality is important for good
performance, and as the number of processors increases NUMA effects become more pro-
nounced. Programs written in a message-passing language typically have good locallty, and
can out-perform the shared-memory versions [LS9o].
There are several benefits to be obtained from a multiprocessor implementation of CML.
Although most existing applications, such as eXene, have fairly limited amounts of par-
allelism (typically ouly a few ready threads at a time), a multiprocessor implementation
should result in noticeable performance improvements for many existing applications. Ow-
icki reports improvements for these kinds of applications written in Modula-2+ running
on a Firefly multiprocessor [Owi89]. In particular, she mentions that the Trestle window
system exploits the multiprocessor by pipelining graphics operations. EXene is designed
with some pipelining, so that running on a multiprocessor should improve its performance.
And, of course, all programs, including highly sequential ones, will benefit from parallel
153
garbage collection.
With a multiprocessor implementation available, it becomes reasonable to implement
parallel algorithms. Some examples are parallel attribute grammar evaluation [Zar90], par-
allel theorem provers [BCLM89], and parallel graphics algorithms [Gre9l].
12.1 Parallel programming in CML
This section examines the use of CML as a parallel programming language. There are two
parts to this discussion: first, I discuss several parallel programming teclmiques and how
they might apply to CML; and second, I describe possible extensions to provide better
support for parallel programming. The proposed extensions do not represent changes in
the semantics of CML; rather, my approach is to define new commuhication operations
that can be derived from the CML primitives, but that are also amenable to efficient
implementation on multiprocessors.
12.1.1 Pipelining and data-flow
One class of parallel programs naturally expressed in CML are those programs that can
be structured as data-flow networks [KM77]. A data-flow network consists of a graph of
computation nodes, where the edges are commuhication llnks (Landin's streams [Lan65] are
a precursor to this). A data-flow graph does not have to be static; a computation node can
be replaced by subgraph having the same 1/0 interface.
Data-flow graphs provide parallellsm in two ways. If two computation nodes do not
depend on each other for data, then they can compute in parallel. Even if there is a
dependency, if they operate on a sequence of values then they form a pipeline. In order for
a data-flow network to be efficient, the granularity of the operations at individual nodes
must be large enough to compensate for the commuaication overhead.
Using threads for the computation nodes and channels for the edges, a data-flow network
can be directly implemented in CML. For example, eXene's thread network is essentially
a data-flow network (see Section 9.1.3). Another example is given in [R?ep90b], where I
describe a pipellued implementation of the Sieve of Eratosthenes using CML.
A nice fflustration of the use of data-flow networks is given by Mcllroy in [McI90], where
he describes the use of processes and channels to compute power series (this was actually
suggested in [KM77], but without implementation details). Mcllroy represents a power
series as a stream of rational coefficients. For example, the power series for the exponential
154
function
i			12			13
2			6
1 + x + -z + -z +..
can be implemented in CML as follows:
fun e			= let
val ch = channel()
fun loop (i, ifact) =
send(ch, fnum=1, denom=ifactl);
loop(i+1, (j+1)*ifact))
in
spawn (fn () => loop(O, I));
ch
end
Using this representation, operations such as addition and multiplication of power series
can be coded up as networks of threads (see [McI9O] for details).
12.1.2 Controlling parallelism
One of the key problems in writing parallel programs is avoiding excessive parallelism. A
basic technique in many parallel programs is to divide a problem into two or more pieces and
to spawn a thread for each piece. If there are many more pieces than processors, then this
technique leads to excessive parallelism and the cost of thread management can dominate
the execution time. Premature limiting of parallelism, however, can result in starvation;
i.e., idle processors without any work.
Work crews
One approach to limiting excessive parallelism in concurrent programs is the work crew
abstraction [RV89]. In this scheme, a fixed set of threads, called workers, execute jobs
taken from a queue,1 where a job is a piece of computation. When a worker gets a job that
can be computed in n parallel pieces, it chooses one piece for itself and generates n --H 1
help requests for the remainder. When the worker finishes the its piece of work, it then
checks to see if the help requests have been answered. If not, then the worker computes the
next piece of the job, and so on until the job is completed. After completing the job, the
worker looks for help requests from other workers; if it finds one, that becomes its next job.
Figure 12.1 sketches the code for a job consisting of three pieces. In addition to providing a
mechanism for limiting parallelism, work crews also have the important property of breadth-
first parallel decomposition, which results in coarse-grain parallelism. And, since jobs are
lin [RV89], the term tasks is used to refer to jobs.
155
fun job () =
requestReip
requestHelp
do job1;
if nollelp(job2)
if nollelp(job3)
(job2);
(job3);
then dojob2 else ();
then dojob3 else ())
Figure 12.1: Work crew job decomposition
decomposed eagerly, worker starvation is avoided. When no extra workers are available for
a job, execution reduces to the standard sequential order.
The structuring of job decomposition in a breadth-first manner is probably the most
important benefit that work crews would provide CML. Since thread creation and space
overhead are low, a CML implementation of work crews could use a thread for each job,
but ouly enable a subset of threads to run at any time. A "token" mechanism could be
used for this, where each job (i.e., thread) would wait for a token before executing, and
would pass the token to the next job when it completed. In this scheme, a token holder
corresponds to a worker.
Futures
The semantics of futures in concurrent Lisp systems provide another opportunity for lim-
iting parallelism. Consider the general form of a future creation:
K (future e)
where K is the context of the future call. The keyword future can be viewed as an
annotation, which tells the compiler that e is a good candidate for parallel evaluation. The
actual evaluation of e,however, can be immediate (called inline evaittation), in parallel with
or when K demands its value. One approach is to choose dynamically between inline
and parallel evaluation of e based on the current load; this is called load-based inlining
[KH88, MKH91]. A problem with this approach is that the rate of thread creation in a
program may not be uniform, so a decision to inline a future at one point may lead to
starvation later. Furthermore, load-based inlining can introduce deadiock [MKH91].
An alternative to load-based inlining is lazy task creation, which is is a scheme that
always inlines the evaluation of e, but saves enough information to spawn a thread to
evaluate K in parallel if the number of ready tasks falls below the number of processors
[MKH9l]. This scheme is qulte similar to work crews in its effect, but requlres less effort
156
by the programmer. Futures with lazy task creation can be implemented in CML fairly
easily. Assuming that we have a global channel
val continue : unit chan
for sparking new threads, then the future operation is implemented as follows:2
fun future f I = let
val resVar = condvar()
in
spawn (fn () => writeVar (resvar, f x));
select E
receive continue,
wrap (readVarEvt resVar, fn - => ())
readVarEvt resVar
end
The idea is that an idle processor sends a message on the continue channel to wake up
some waiting thread. Since channel communication is FIFO, this results in the desired
breadth-first problem decomposition. Of course, it would be much more efficient to directly
implement futures and lazy task creation using callcc and the techniques of [MKH91]. For
CML, a principal advantage of lazy task creation is that it doesn't introduce deadlock; even
in the case when the body of a future attempts to synchronize with the future's parent.
12.1.3 Speculative parallelism
Certain classes of parallel programs, such as parallel search, use speculative parallelism to
improve performance [Osb89]. For example, consider the problem of finding an item in a
balanced binary tree; by searching subtrees in parallel, the running time of the search is
reduced from 0(n) to O(logn), given a sufficient supply of processors. Although this exam-
pie is trivial, it is illustrative of problems arising in appilcations such as theorem provers.
In addition to the problem of controlling excessive parallellsm (discussed in Section 12.1.2),
there is the problem of terminating unnecessary computations (e.g., once one thread has
found the item, there is no reason for the others to keep searching). CML does not cur-
rently support the asynchronous termination of threads, thus it would be necessary to add
kill operation on thread IDs. The other aspect of thread termination is recognizing which
threads need to be killed. It is also important to note that the speculative threads must be
referentially transparent, otherwise killing them changes the semantics of the program.
2Note that this version is simplified by ignoring the issue of exceptions (cf., Section 5.5).
157
One attractive, but tricky, approach is to garbage collect those threads that are able to
run, but are irrelevant to the future execution of the program.3 One way to do this is to give
the garbage collector special knowledge about channel and thread objects, which allows it
to trace thread interconnections [BH77, KNW90]. The problem with using this technique
for CML is that it does not interact well with the implementation of threads as ordinary
SML values. Another strategy, which might be more suitable for CML, is judicious use
of weak pointers in the representations of some of the concurrency objects [ME89]. Weak
pointers, which are already supported by SML/NJ, are a way to hold a reference to an
object while allowing the garbage collector the option of collecting it. If, during a garbage
collection, the only references to an object are weak pointers, then the garbage collector
collects the object and nullifies the weak pointers that refer to it. Using weak pointers,
a future mechanism can be implemented that gives the parent thread a strong pointer to
the future object and gives the child thread a weak pointer to it. If the parent discards its
reference to the future object, then the child's weak pointer is nullified, and, using object
finalization [Rov85], the child thread is collected. A similar scheme is described in [ME89].
A simple technique, which has similar utility to the weak pointer scheme above, is to
exploit the guard and wrapibort combinators to implement a speculative fork operation:
fun fork f x = guard (fn () => let
val cv = condvar()
val id = spawn (fn () => writevar(cv, f x))
in
wrapAbort (readvarEvt cv, fn () => kill id)
end)
Using multiple instances of this in a select implements or-parallelism. The first thread to
finish provides the answer and triggers the abort actions of the other choices, which kill the
other threads. For example, the following function sorts a list while testing to see if it is
already sorted in parallel:
fun fastSort 1 = select (
fork (fn () => sort 1),
fork (fn () => if (isordered 1) then 1 else exit())
where sort is some sorting function and isOrdered tests a list to see if it is sorted.
3The CML implementation collects unreachable blocked threads, but not threads that are ready (i.e.,
threads that are reachable via the ready queue).
158
12.1.4 I-structures
The parallel programming functional language Id provides a form a mutable state called
I-structnres [ANP89, Nik9l]. I-structures come ill various flavors, including aggregate struc-
tures such as arrays.
Condition variables are essentially the value-return mechanism of a future. Eutures have
been promoted as a useful mechanism for parallel progranirning by the Lisp community
(e.g., [Hal85] and [KH88]). Although, as discussed in Section 5.5, futures can be imple-
mented using channels, an implementation based on condition variables has the significant
advantage of avoiding a context switch each time the value of the future is read.4
Condition variables are an example of what are called I-strnctnres in the parallel lan-
guage Id [ANP89, Nik9i]. Id provides I-structures in various flavors, including aggregate
structures such as arrays. A discussion of the use of I-structures in parallel programs and
some small example programs can be found in [ANP89].
12.1.5 M-structures
Another form of state supported by Id is the M-strnctttre [BNA91]. Like I-structures, M-
structures are either empty or full. There are two basic operations on M-structures: pnt,
which initiallzes a cell and, llke I-structures, raises an exception if the cell already has a
value; and take, which removes and returns the contents of a cell (making it empty). The
take operation forces synchionization, since a thread may have to wait for another thread
to put a value into the cell. This is similar to the "1/0 ports with memory" described
in [KS 79]. M-structures can also be viewed as finitely buffered asynchronous channel with
only one slot. In CML, M-structured variables have the following interface:
type `a mstnict
val mstruct : unit -> `la rnstruct
val put : (`a mstxuct * `a) -> unit
exception Put
val takeEvt : `a mstruct -> `a event
val take : `a rnstruct -> `a
Since M-structures are mutable, the allocation function is weakly polymorphic. As men-
tioned above, the take operation involves synchronization, so an event-valued form is also
provided. If a put is attempted on a full cell, the exception Put is raised.
4Condition variables have proved quite useful in CML programs when "single-shot" corumunication is
required (e.g., for abort messages, see Section 10.5.2).
159
M-structures can be updated atomically if threads use the following update protocol:
put (m, f (take m))
where f computes the new value of m from the previous value. The reason why this
update is atomic is that the take operation locks the variable against take operations by
other threads (this is similar to the safety of shared request-reply channels discussed in
Section 4.2). Paul Barth at MIT has developed a number of parallel algorithms in Id using
M-structures (Barth, personal communication, August 1991); some examples can be found
in [BNA91].
M-structures can be defined as a derived feature in CML; Figure 12.2 gives an imple-
mentation of the above interface. In a parallel implementation of CML, M-structures might
datatype `a mstruct = M of ?
full_ch : unit chan,
take?ch : `a chan,
put?ch : ?? chan
fun
mstruct () = let
val fuliCh = chaunel()
val takeCh = chaunel() and putch = chaunel()
fun undefined () = defined (accept putch)
and defined v = select (
wrap (transmit(takech,
wrap (transmit(fullCh,
in
v), fn () => undefined()),
()), fn () => defined v)
spawn undefined;
M?full_ch = fullCh, take_ch = takeCh, put?ch = putchl
end
fun takeEvt (N?take?ch, .. .1) = receive take_ch
fun take (N<take?ch, ...1) = accept take_ch
exception Put
fun put (N?full?ch, put_ch 1, x) = select [
wrap (receive full?ch, fn () => raise Put),
transmit (put?ch, x)
Figure 12.2: CML implementation of M-structure variables
be implemented directly on top of the low-level shared-memory primitives, making them
very efficient. Other operations that Id supports on M-structures, such as a non-destructive
read operation, could also be directly supported.
160
12.2 Multiprocessor implemeutation
Designing and building a high-performance multiprocessor CML implementation is a major
research project in its 0Wll right, and I leave it for future work. It is possible, however, to
identify and discuss some of the major implementation issues.
12.2.1 Concurrency control
The uniprocessor implementation described in Section 10.4 relies on a single global mutex
lock for guaranteeing atomic access to the channel and thread data structures (see Sec-
tion 3.2.1 for a description of mutex locks). On a uniprocessor, using a global lock is the
most efficient approach, since it reduces locking overhead and does not cause any loss of
parallelism. For multiprocessors, however, a single global lock is likely to cause contention
and idie processors. For example, on a four processor machine (P?1,2,3,4?), a thread on P1
should be able to communicate with a thread on P2 in parallel with communication between
threads on P3 and P4. This means that channels must be locked independently.
Different multiprocessors provide different kinds of support for locking. A common
mechanism is the test-and-set instruction, which atomicaliy applies the foliowing function
to a word:
fun testlndSet w = if !w then true else (v true; false)
This operation can be used to implement a spin-lock, which is busy-waiting mutex lock:
fun aquirespinLock v = if (testindSet v)
then (aquirespinLock v)
else ()
fun releaseSpiuLock v = (v false)
A more sophisticated implementation might use exponential back-off or other techniques
to improve performance (see [And89] and [CS9l] for a comparison of locking techniques).
Some multiprocessors do not provide hardware support for locking, but Lamport has de-
veloped an algoritlim for these cases, which is optimal in the number of memory reads and
writes [Lam87]. Other machines only provide test-and-set on a limited number of mem-
ory locations, in which case software locks must be implemented on top of the hardware
supported spin-locks.
12.2.2 Generalized selective communication
With the introduction of a separate lock on each channel's data structure, the implemen-
tation of sync applied to a choice of multiple communications becomes significantly more
161
complicated. For example, a naive implementation of select on a list of communications
is to first grab the locks of all the channels and then do the operation. This fails in the
following situation. Assume there are two threads t1 and t2, running on different processors,
with t1 attempting
select Ereceive c1, transmit (c2 v)]
while simultaneously t2 attempts
select Ereceive c2, receive c1]
This can result in a situation in which t1 holds a lock on c1 and needs a lock on c2, while
t2 holds a lock on c2 and needs a lock on c1 --H i.e., deadlock. There are various known
algorithms for this problem (e.g., see [BS83], [Bor86], or [13ag89]). The basic strategy is to
first make tentative offers of communication; when two tentative offers match, one thread
must freeze its state until the other thread either corumits or rejects the communication. The
choice of which thread will fix its state is based the order of the threads' IDs; this avoids the
possibility of cyclic dependencies and deadlock. Greg Morrisett has implemented a protocol
similar to [Bor86] on top of ML-threads (Greg Morrisett, personal communication, July
1991).
12.2.3 Thread scheduling
The techniques and data structures used for thread scheduling can have a significant im-
pact on multiprocessor performance, because of contention for thread queues and cache
consistency effects.
A single global scheduling queue would be a significant source of contention. Further-
more, a single queue does not provide any mechanism for keeping a thread on a single
processor, which is important for preserving cache consistency. Accordingly, as a first cut,
it is clear that each processor should have its own queue of ready threads. Some policy
is needed to baiance out the load. One possibility is to balance the scheduling queues at
garbage collection time. Since typical memory allocation rates in ML programs are high
(on the order of 5 to 10 megabytes per second on a SPARCstation-2), a processor that runs
out of work would not have to wait long for load balancing. A fall-back would be to allow
an idle processor to force a garbage collection if it has been idle for more than a few millisec-
onds. This scheme has the advantage of insuring that the scheduling queue is ouly accessed
by its processor (except during load balancing), which means that a light-weight locking
mechanism, such as that used in the single processor implementation (see Section 10.2.2),
can be used to protect the queue. Since the scheduling queue is the single most heavily
162
accessed shared data structure, this scheme might provide good performance. The question
that needs to be answered by empirical tests is how often do processors run out of threads
to schedule?
The implementation of Mul-T uses two thread queues per processor; one for threads
that have never run, called the new thread q?eue, and one for threads that have been
suspended, called the suspended thread queue. When selecting a new thread to dispatch,
the processor's scheduler first looks in its own suspended thread queue, then in its own new
thread queue, then in other processors' new thread queue and lastly, if it has not found a
thread, it looks in the other processors' suspended thread queues. By selecting new threads
over suspended threads when migrating threads, the impact on cache consistency of thread
migration is reduced.
12.2.4 Memory management
Implementations of heap-based languages, such as SML or CML, live or die by the per-
formance of their memory allocation and garbage collection techniques. An efficient multi-
processor implementation of CML must address several memory management issues. The
most important of these is avoiding contention during memory allocation. The standard
scheme to address this problem is to divide the allocation space into multiple chunks and to
give each processor its own allocation chunk (e.g., [AEL88], [KH88], [ME89] and [Mar91]).
When a processor fills its allocation chunk, it grabs another from the global list of free
chunks. The ouly source of allocation contention are the accesses to the global chunk list,
which are relatively rare.
When the allocation chunks are exhausted, it is necessary to perform a garbage collec-
tion. For a "stop-the-world" collector, this first requires synchronizing the processors, so
that they are all in collection state. One possible technique to force synchronization is to
have the processor that notes the need for garbage collection use a UNIX signal to notify
the other processors [KH88]. Another approach is to walt for the other processors to ex-
haust their allocation chunks [Mar91]. To avoid problems in the unlikely case of an infinite,
non-allocating, computation, a global flag is set that is checked by the SIGINT signal han-
dier. Since allocation rates in SML/NJ are very high (typically one 4-byte word per 5-10
instructions), the idle-time of the processor that initiated the garbage collection might be
less costly than the overhead of using signals to interrupt the other processors.
Once all of the processors are in collection state, the garbage collection can begin. The
simplest technique is to run a standard collection algorithm on a single processor. This has
the clear disadvantage, however, of increasing the cost of garbage collection relative to the
rest of the program. A sequential collector is a performance bottleneck; it is much more
163
desirable to garbage collect in parallel. There are a number of systems using parallel garbage
collection (e.g., [KH88], [ME89] and [Mar91]). The techniques of [Mar91] seem to fit the
SML/NJ memory management system fairly well. In this scheme, each processor has its
own to-space. When a collector process encounters a reference to a from-space object while
sweeping its to-space, it examines the object's descriptor. If the descriptor is a forward
pointer, then the collector process updates the reference in its to-space. If the object has
not been forwarded, then the collector process locks the descriptor word, allocates space in
its to-space, sets the forward pointer, unlocks the descriptor, and then copies the object. For
machines, like the SGI 4D/380, which have a limited number of hardware locks, a hashing
scheme on the object's address can be used to multiplex the hardware locks. Since the lock
on the object's descriptor is only held for a few instructions, contention should be rare.
These techniques still suffer from the problem that they stop the world during garbage
collection. Although the use of generational techniques reduces the frequency of noticeable
pauses [Ung84], providing uniform responsiveness for real-time applications, such as user
interfaces, requires interleaving garbage collection activities with mutator computation. On
a multiprocessor, the most obvious approach is to dedicate one or more processors to the
task of garbage collection. The principal technical problem with interleaving mutator and
garbage collection activity is synchionization. If synchronization overhead is high, then any
performance benefits will be lost. In lieu of special purpose hardware, the virtual memory
system can be used to implement synchionization [AEL88].
12.3 The outlook for multiprocessor CML
This chapter has described a number of issues related to the implementation and use of
a multiprocessor version of CML. Some work has already been done towards supporting
CML on multiprocessors. Greg Morrisett has implemented a low-level library of multipro-
cessmg primitives, such as spin-locks, for SML/NJ on the SGI 4D/38O [Mor]. This should
provide a suitable base for implementing a multiprocessor version of CML.
Once a multiprocessor implementation exists, it will be possible to experiment with
different styles of parallel programming. The flexibility provided by first-class synchronous
operations means that CML can accommodate different parallel programming paradigms
without serious disruption or incompatibilities with existing code. Condition variables and
some form of M-structures should provide the right primitives for programming parallel
algorithms, while being semantically consistent with CML's other primitives, and the tech-
niques of work crews and lazy futures should provide reasonable mechanisms for controlling
parallelism.
164
Part V
Conclusion
165
Chapter 13
Future Work
Although this dissertation is a comprehensive treatment of the design, semantics, application
and implementation of a concurrent language, there is still room for additional research and
implementation. Following the structure of this dissertation, the topics for future research
are divided into design, theory and practice.
13.1 Design
The design of CML has evolved for a number of years based on practical experience and
is now falrly mature. Given the amount of practical experience with the mechanisms, it is
ulIlikely that CML will change in any radical way. There are, however, a couple of areas
for exploration.
One of the attractive aspects of CML is that it supports a wide range of concurrency
mechanisms using a small set of core primitives. A possible area of exploration is to take a
reductionist approach in the choice of primitives. CML uses synchronous message passing
as the basic synchronous operation, but perhaps there are other, more primitive, choices.
For example, some variation on low-level shared-memory primitives might be possible. This
would factor out the communication from the primitive synchronous operations. While this
exercise would be intellectually interesting, I suspect that the resulting language design
would be too low-level. Synchronous message passing seems to provide a happy medium
between low-level performance and high-level abstraction.
As I discussed in Section 5.4, choosing synchronous message passing as the primitive
synchronization mechanism limits rendezvous to two threads. This limitation interferes
with a potentially useful class of abstractions --H the use of threads to implement active
channels. An example of an active channel is a channel that logs all message traffic for
debugging purposes. One approach to supporting such abstractions is to add a multiway
167
rendezvous primitive [Cha87]. The implementation details remain to be worked out, but
solutions to this problem are discussed in Chapter 14 of [CM88], where it is called the
committee coordination problem.
The original prototype of first-class synchronous operations was implemented in C and
included support for using events in 0 programs. The lack of closures, however, limited
the usefuiness of events in 0. Adding events to a language such as Modula-3 might prove
more satisfactory, since objects can be used to provide a closure-like mechanism1 [Nel91].
13.2 Theory
In Section 7.5, I described a number of ways to enrich the A?? calculus to more fully model
CML. It remains to prove type soundness results for the extended calculus. In particular,
th			I c?			o+4s. both			atial			3f t			- loo?hole?
ombination of )t
should be shown to be sound.
Wlib cuLl Lcu.Ult			-			bUU??b'			yjlt byb??u			I,
The operational semantics that I presented in Chapter 7 could be used as the basis for
a "theory" of first-class synchronous operations. There are a number of transformations on
event values that should be shown to be semantics preserving. For example, the representa-
tion of event values in the implementation can be described by a rewriting system of event
values. Showing that the rewriting of an event value preserves its meaning would be a sig-
nificant step toward showing that the implementation is correct. In Section 10.6, a number
of optimizations are suggested (e.g., replacing use-once channels with condition variables).
A theory of events would provide a framework for showing that these optimizations are
"safe."
Proving such results requires a notion of event value equivalence: the obvious definition
is that two event values are equivalent if they are indistinguishable in all contexts. This
definition requires, in turn, some notion of process equivalence. This is an active area of
theoretical research (e.g., [B1o89]) and there are many different notions of what it means
for two processes to be equivalent. For various reasons, I think that a modified notion of
testing eqtivalence [Hen88] is the most suitable for developing these results.
13.3 Practice
As we gain more experience with CML, certain common abstractions may emerge. By
supporting these abstractions directly as primitives, performance can be improved substan-
tially. The condition variables discussed in Section 5.3 is an example of this; using them for
lin fact, a closure in Modula-3 is an object type with an apply method.
168
replys ill an RPC abstraction reduces overhead by about 30% (see Chapter 11). Another
possible candidate is buffered channels, which are often used in interactions with external
processes (e.g., the X-server). It is important to note that adding these new primitive op-
erations does not change the semantics of CML, since semantically they are still derived
operations.
The most glaring weakness of CML is the lack of debugging facilities. A short term
solution is to provide a version of the CML primitives that allows monitoring of com-
munication, thread scheduling, etc. A more ambitious scheme is to provide an interactive
debugger. Andrew Tolmach, who is responsible for the SML/NJ debugger [TA90], is work-
ing on a concurrent version for a "safe" version of ML-threads [TA91]. It is likely that his
work can be adapted to CML; in fact, CML may be a better target than ML-threads,
since the shared state is more clearly defined.
Chapter 12 discussed many of the issues relating to the implementation and use of CML
on multiprocessors. I view this as the most important direction for future implementation
work. Multiprocessor server machines are already common, and that technology is likely
to trickle down to single-user workstations in the next few years. CML provides a natural
migration path for SML applications to benefit from the parallel processing capabilities of
multiprocessor workstation.
There are a number of active ongoing projects that are using CML. Emden Gansner
and I are continning to develop eXene and plan use it as part of a foundation for interac-
tive programming environments [RG86, CR92]. The DML project at Cornell University is
exploring issues in distributed systems, using CML as starting point [Kru9l]. These appli-
cations and others will help to gnide future evolution of CML and its implementation.
169
170
Chapter 14
Conclusion
Concurrent programming is an area of growing importance, but there has been little recent
progress in the design of concurrent languages. For example, Modula-3 encompasses many
recent ideas in sequential language design, but uses concurrency features that date back to
the 1970s [Nel9l]. In this dissertation, I have presented a new approach to concurrent
language design that supports a higher level of concurrent programming. The key new
idea is to treat synchronous operations as first-class values that can be composed into
new synchronous operations. This allows many different styles of communication to be
supported in the same linguistic framework. I call this new style of programming "higher-
order concttrrent programming," as an analogy with higher-order programming in languages
such as ML. This dissertation is a broad look at this new approach to concurrent language
design, exploring the design, theory and practice of first-class synchronous operations.
The ideas of this thesis are presented in the context of the language CML, which is
an extension of SML that supports first-class synchronous operations. I use CML to
illustrate the usefulness and practicality of my approach. In Chapter 5, I show how a
number of synchronization and communication abstractions found in other languages can
be implemented in CML as first-class citizens. This demonstrates that CML can support
different concurrent paradigms in a single linguistic framework.
I have also developed the formal underpiunings of first-class synchronous operations. In
Chapter 7,1 give the operational semantics of a simple untyped language, called A?, that has
first-class synchronous operations. This language includes most of the concurrency features
of CML, and is a substantial step toward a formal definition of CML. In Chapter 8, I
define a polymorphic type discipline for Acv that is in the tradition of ML type systems,
and I prove that this type system is sound with respect to the operational semantics of A??.
To my knowledge, this is the first proof of type soundness for a polymorphic concurrent
language.
171
CML has ?
I ??I ??? been used to build several non-trivial applica-
?1
Jeen im???men?eL
tions. The most significant of these is eXene, which is a multi-threaded X window system
toolkit. EXene and some other applications of CML &e described in Chapter 9. CML
has also been publically distributed since November of 1991, and is being used by a num-
ber of other rese&chers. The implementation of CML is described in Chapter 10 and
performance measurements &e reported in Chapter 11. The use of CML to implement
substantial applications, as well as the performance of the implementation (which is com-
petitive with lower-level concurrency packages), demonstrates that CML is a useful and
practical language for systems prograninling. In many respects, the CML system is the
most important result of this research, and I expect that it will provide a solid basis for
other research and development for years to come. In the future, I plan to implement CML
on a shared-memory multiprocessor (Chapter 12 discusses issues related to this).
172
Bibliography
[AB8O]
[AB86]
[AEL88]
Arvind and J. D. Brock. Streams and managers. In Operating Systems Engi-
neering; Proceedings of the 14th IBM Computer Science Symposium, vol. 143 of
Lecture Notes in Computer Science. Springer-Verlag, October 1980, pp. 452--H465.
Abramsky, 5. and R. Bornat. Pascal-m: A language for loosely coupled dis-
tributed systems. lii Y. Paker and J.-P. Verjus (eds.), Distributed Computing
Systems, pp. 163--H189. Academic Press, New York, N.Y., 1986.
Appel, A. W., J. R. Ellis, and K. Li. Real-time concurrent collection on stock
multiprocessors. In Proceedings of the SIGPLAN'88 Conference on Program-
ming Language Design and Implementation, June 1988, pp. 11--H20.
[Agh86] Agha, G. Actors: A Model of Concurrent Computation in Distributed Systems.
The MIT Press, Cambridge, Mass., 1986.
[AJ89]
[AM87]
[AM91]
[And89]
Appel, A. W. and T. Jim. Continuation-passing, closure-passing style. In Con-
ference Record of the 16th Annual ACM Symposium on Principles of Program-
ming Languages, January 1989, pp. 293--H302.
Appel, A. W. and D. B. MacQueen. A Standard ML compiler. In Functional
Programming Languages and Computer Architecture, vol. 274 of Lecture Notes
in Computer Science. Springer-Verlag, September 1987, pp. 301--H324.
Appel, A. W. and D. B. MacQueen. Standard ML of New Jersey. In Pro-
gramming Language Implementation and Logic Programming, vol. 528 of Lecture
Notes in Computer Science. Springer-Verlag, August 1991, pp. 1--H26.
Anderson, T. E. The performance of spin lock alternatives for shared-memory
multiprocessors. Technical Report 89-04-03, Department of Computer Science,
University of Washington, August 1989.
[And91] Andrews, G. R. Concurrent Programming: Principles and Practice. Ben-
jarnin/Cuminings, Redwood City, California, 1991.
Arvind, R. 5. Nikhil, and K. K. Pingali. I-structures: Data structures for parallel
computing. ACM Transactions on Programming Languages and Systems, 11(4),
October 1989, pp. 598--H632.
[ANP89]
173
[AOCE88] Andrews, G. R., R. A. Olsson, M. Coffin, and I. Elshoff. An overview of the SR
language and implementation. A CM Transactions on Programming Languages
and Systems, 10(7), January 1988, pp. 51--H86.
[App89] Appel, A. W. Simple generational garbage collection and fast allocation. Soft-
ware --H Practice and Experience, 19(2), February 1989, pp. `275--H279.
[App90] Appel, A. W. A runtime system. Lisp and Symbolic Computation, 4(3), Novem-
ber 1990, pp. 343--H380.
[App92] Appe1, A. W. Compiling with Continuations. Cambridge University Press, New
York, N.Y., 1992.
[AS83] Andrews., G. R. and F. B. Schneider. Concepts and notations for concurrent
programming. ACM Computing Surveys, 15(1), March 1983, pp. 3-43.
[Bag89]
Bagrodia, R. Synchronization of asynchronous processes in CSP. A CM Trans-
actions on Programming Languages and Systems, 11(4), October 1989, pp. 585--H
597.
[Bar84] Barendregt, H. P. The Lambda Calculus, vol. 103 of Studies in Logic and the
Foundations of Mathematics. North-Holland, revised edition, 1984.
[BB90]
[BCJ+90]
Berry, G. and G. Boudol. The chemical abstract machine. In Conference Record
of the 17th Annual ACM Symposium on Principles of Programming Languages,
January 1990, pp. 81--H94.
Birman, K., R. Cooper, T. A. Joseph, K. Marzullo, M. Makpangou, K. Kane,
F. Schinuck, and M. Wood. The 1515 system manual, version 2.0. Computer
Science Department, Cornell University, Ithaca, N.Y., March 1990.
[BCLM89] Bose, 5., E. M. Clarke, D. E. Long, and 5. Michaylov. Parthenon: A parallel the-
orem prover for non-horn clauses. In Proceedings of the ?th Annual Sympos?um
on Logic in Computer Science, June 1989, pp. 80--H89.
[BD80]
[BH77]
[Blo89]
Bryant, R. and J. B. Dennis. Concurrent programming. In Operating Systems
Engineering; Proceedings of the 1?th IBM Computer Science Symposium, vol.
143 of Lecture Notes in Computer Science. Springer-Verlag, October 1980, pp.
426--H451.
Baker, Jr., H. G. and C. Hewitt. The incremental garbage collection of processes.
In Proceedings of the Symposium on Artificial Intelligence and Programming
Languages, August 1977, pp. 55--H59.
Bloom, B. Ready Simulation, Bisimulation, and the Semantics of CCS-like
Languages. Ph.D. dissertation, Massachusetts Institute Technology, Laboratory
for Computer Science, October 1989. Available as MIT/LCS/TR-491.
Berry, D., R. Milner, and D. N. Turner. A semantics for ML concurrency prim-
itives. In Conference Record of the 19th Annual ACM Symposium on Principles
of Programming Languages, January 1992. To appear.
[BMT92]
174
[BNA91]
Barth, P., R. 5. Nikhll, and Arvind. M-structures: Extending a parallel, non-
strict, functional language with state. In Functional Programming Languages
and Computer Architecture, vol. 523 of Lecture Notes in Computer Science.
Springer-Verlag, August 1991, pp. 538--H568.
[Bor86] Bornat, R. A protocol for generalized occam. Software - Practice and Expert-
ence, 16(9), September 1986, pp. 783--H799.
[Bri77] Brinch Hansen, P. The Architecture of Concurrent Programs. Prentice-Hall,
Englewood Cliffs, N.J., 1977.
[Bri89] Brinch Hansen, P. The Joyce language report. Software - Practice and Expert-
ence, 19(6), June 1989, pp. 553--H578.
[BS83]
[BS9O]
Buckley, G. N. and A. Silberschatz. An effective implementation for the gen-
eralized input-output construct of CSP. ACM Transactions on Programming
Languages and Systems, 5(2), April 1983, pp. 223--H235.
Buhr, P. A. and R. A. Stroobosscher. The ?System: Providing light-weight con-
currency on shared-memory multiprocessor computers runalug UMX. Software
--H Practice and Experience, 20(9), September 1990, pp. 929--H963.
[Bur88] Burns, A. Programming in occam ? Addison-Wesley, Reading, Mass., 1988.
[Car86]
Cardelli, L. Amber. In Combinators and Functional Programming Languages,
vol. 242 of Lecture Notes in Computer Science. Springer-Verlag, July 1986, pp.
21--H47.
[Car89] Cardelli, L. Typeful programming. Technical Report 45, DEC Systems Research
Center, May 1989.
[CD88] Cooper, E. C. and R. P. Draves. C threads. Technical Report CMU-CS-88-54,
School of Computer Science, Carnegie Mellon University, February 1988.
[CDDK86]
Clement, D., J. Despeyroux, T. Despeyroux, and 0. Kahn. A simple applicative
language: Mini-ML. In Conference record of the 1986 ACM Conference on Lisp
and Functional Programming, August 1986, pp. 13--H27.
[Cha87] Char1esworth, A. The multiway rendezvous. A CM Transactions on Program-
ming Languages and Systems, 9(2), July 1987, pp. 350--H366.
[CHL91]
Cooper, E. C., R. Harper, and P. Lee. The Fox project: Advanced development
of system software. Technical Report CMU-CS-91-178, School of Computer
Science, Carnegie Mellon University, August 1991.
[CM88] Chandy, K. M. and J. Misra. Parallel Program Design: A Foundation. Addison-
Wesley, Reading, Mass., 1988.
Cooper, E. C. and J. 0. Morrisett. Adding threads to Standard ML. Technical
Report CMU-CS-90-186, School of Computer Science, Carnegie Mellon Univer-
sity, December 1990.
[CM9O]
175
[Con86] Constable, R. et aL Implementing Mathematics with The Nupri Development
System. Prentice-Hali, Englewood Cliffs, N.J., 1986.
[Cor88] Cormack, G. V. A rnicro-kernel for concurrency in C. Software - Practice and
E:perience, 18(5), May 1988, pp. 485--H491. Short Communication.
[CS91]
Crummey, J. M. M. and M. L. Scott. Algorithms for scalable synchronization
on shared-memory multiprocessors. A CM Transactions on Computer Systems,
9(1), February 1991, pp. 21--H65.
[Dam85] Damas, L. M. M. Type assignment in programming languages. Ph.D. disserta-
tion, Department of Computer Science, University of Edinburgh, April 1985.
[DH89] Dybvig, R. K. and R. Hieb. Engines from continuations. Computing Languages,
14(2), 1989, pp. 109--H123.
[DHM91]
Duba, 13., R. Harper, and D. MacQueen. Type-checking first-class continuations.
In Conference Record of the 18th Annual ACM Symposium on Principles of
Programming Languages, January 1991, pp. 163--H173.
[Dii75] Dijkstra, E. W. Guarded commands, nondeterminacy, and formal derivation of
programs. Communications of the ACM, 18(8), August 1975, pp. 453--H457.
[DM82]
Damas, L. and R. Milner. Principal types for functional programs. In Confer-
ence Record of the 9th Annual ACM Symposium on Principles of Programming
Languages, January 1982, pp. 207--H212.
[DoD83] Reference Manual for the Ada Programming Language, January 1983.
[EHP80]
[Fel87a]
Eventoff, W., D. Harvey, and R. J. Price. The rendezvous and monitor con-
cepts: Is there an efficiency difference? In Proceedings of the ACM-SICPLAN
Symposium on the Ada Programming Language, December 1980, pp. 156--H165.
Felleisen, M. The Calculi of Lambda-v-CS Conversion in Imperative Higher-
order programming languages. Ph.D. dissertation, Computer Science Depart-
ment, Indiana University, 1987. Available as Technical Report Nr. 226.
[Fel87b] Felleisen, M. Reflections on Landin's J-operator: A partly historical note. Com-
puter Languages, 12(3/4), 1987, pp. 197--H207.
[FF86]
[FLP85]
Felleisen, M. and D. P. Friedman. Control operators, the SECD-machine, and
the A-calculus. In M. Wirsing (ed.), Formal Description of Programming Con-
cepts --H III, pp. 193--H219. North-Holland, New York, N.Y., 1986.
Fischer, M. J., N. A. Lynch, and M. 5. Paterson. Impossibility of distributed
consensus with one faulty process. Journal of the ACM, 32(2), April 1985, pp.
374--H382.
Francez, N. and 5. A. Yemini. Symmetric intertask communication. ACM
Transactions on Programming Languages and Systems, 7(4), October 1985, pp.
622--H636.
[FY85]
176
[GC84]
[GL91]
Gehani, N. H. and T. A. Cargill. Concurrent prograirning ill the Ada language:
The polling bias. Sofiware - Practice and Experience, 14(5), May 1984, pp. 413--H
427.
George, L. and G. Lindstrom. Using a functional language and graph reduction
to program multiprocessor machines. Technical Report UUCS-91-02O, Depart-
ment of Computer Science, University of Utah, October 1991.
[GM88] Gehani, N. H. and A. D. McGettrick (eds.). Concurrent Programming. Addison-
Wesley, Reading, Mass., 1988.
[GMP89]
Giacalone, A., P. Mishia, and 5. Prasad. Facile: A symemetric integration of
concurrent and functional programming. In TAPSOFT'89 (vol. ?, vol. 352 of
Lecture Notes in Computer Science. Springer-Verlag, March 1989, pp. 184--H209.
[GMW79] Gordon, M. J., R. Milner, and C. P. Wadsworth. Edinburgh LCF, vol. 72 of
Lecture Notes in Computer Science. Springer-Verlag, New York, N.Y., 1979.
[Gor79] Gordon, M. J. C. The Denotational Description of Programming Languages.
Springer-Verlag, New York, N.Y., 1979.
[GR86] Gehani, N. H. and W. D. Roome. Concurrent C. Sofiware - Practice and
Experience, 16(9), September 1986, pp. 821--H844.
[GR91] Gansner, E. R. and J. H. Reppy. eXene. In Third International Workshop on
Standard ML, Carnegie Mellon University, September 1991.
[GR92]
Gansner, E. R. and J. H. Reppy. A foundation for user interface construction.
In B. A. Myers (ed.), Languages for Developing User Interfaces, pp. 239--H260.
Jones & Bartlett, Boston, Mass., 1992.
[Gre91] Green, 5. Parallel Processing for Computer Graphics. The MIT Press, Cam-
bridge, Mass, 1991.
[Haa90] Haahr, D. Montage: Breaking windows into small pieces. In USENIX Summer
Conference, June 1990, pp. 289--H297.
[Hal85] Halstead, Jr., R. II. Multilisp: A language for concurrent symbolic computa-
tion. ACM Transactions on Programming Languages and Systems, 7(4), October
1985, pp. 501--H538.
[Har86]
[HDB90]
Harper, R. Introduction to Standard ML.
Laboratory for Foundations of Computer
ment, Edinburgh University, August 1986.
Technical Report ECS-LFCS-80-14,
Science, Computer Science Depart-
Hieb, R., R. K. Dybvig, and C. Bruggeman. Representing control in the presence
of first-class continuations. In Proceedings of the SIGPLAN'90 Conference on
Programming Language Design and Implementation, June 1990, pp. 66--H77.
[Hen88] Hennessy, M. Algebraic Theory of Processes. The MIT Press, Cambridge, Mass.,
1988.
177
[HN8O]
Habermann, A. N. and I. R. Nassi. Efficient implementation of Ada tasks. Tech-
nical Report CMU-CS-80-103, Computer Science Department, Carnegie Mellon
University, January 1980.
[Hoa74] Hoare, C. A. R. Monitors: An operating system concept. Communications of
the ACM, 17(10), October 1974, pp. 549--H557.
[Hoa78] Hoare, C. A. R. Communicating sequential processes. Communications of the
ACM, 21(8), August 1978, pp. 666--H677.
[Hoa85] Hoare, C. A. R. Communicating Sequential Processes. Prentice-Hall, Englewood
Cliffs, N.J., 1985.
[Hol83a] Holmstr5m, 5. PFL: A functional language for parallel progranming. In Declar-
ative programming workshop, April 1983, pp. 114--H139.
[Hol83b] Holt, R. C. Concurrent Euclid, the UNIX System, and Tunis. Addison-Wesley,
Reading, Mass., 1983.
[HS86] Hindley, R. J. and J. P. Seldin. Introduction to Combinators and the A-calculus.
Cambridge University Press, New York, N.Y., 1986.
[1NM84] INMOS Limited. Occam Programming Manual. Prentice-Hall, Englewood Cliffs,
N.J., 1984.
[KH88]
[KKR+ 86]
Kranz, D. A. and R. H. Halstead, Jr. Mul-T: A high-performance parallel
Lisp. In Proceedings of the SICPLAN'89 Conference on Programming Language
Design and Implementation, June 1988, pp. 81--H90.
Kranz, D., R. Kelsey, J. Rees, P. Hudak, J. Philbin, and N. Adams. Orbit: An
optimizing compiler for Scheme. In Proceedings of the SIGPLAN'86 Symposium
on Compiler Construction, July 1986, pp. 219--H233.
[KM77] Kahn, G. and D. B. MacQueen. Coroutines and networks of parallel processes.
In Information Processing 77, August 1977, pp. 993--H998.
[KNW90] Kafura, D., J. Nelson, and D. Washabaugh. Garbage colection of actors. In
OOPSLA/ECOOP`90 Proceedings, October 1990, pp. 126--H134.
[Kru91] Krumvieda, C. D. DML: Packaging high-level distributed abstractions in SML.
In Proceedings of the 1991 CMU Workshop on Standard ML, September 1991.
[KS79]
Kieburtz, R. 13. and A. Silberschatz. Comments on `Communicating sequential
processes'. ACM Transactions on Programming Languages and Systems, 1(2),
April 1979, pp. 218--H225.
[Kwi89] Kwiatkowska, M. Z. Survey of fairness notions. Information and Sofiware Tech-
nology, 31(7), September 1989, pp. 371--H386.
[Lam87] Lamport, L. A fast mutual exclusion algorithm. A CM Transactions on Com-
puter Systems, 5(1), February 1987, pp. 1--H11.
178
[Lan65] Landin, P. J. A correspondence between Algol 60 and Churli's lanibda notation:
Part I. Communications of the ACM, 8(2), February 1965, pp. 89--H101.
[LCJS87]
[LHG86]
Liskov, B., D. Curtis, P. Johnson, and R. Scheifler. Implementation of Argus.
lii Proceedings of the 11th ACM Symposium on Operating System Principles,
November 1987, pp. 111--H122.
Liskov, B., M. Herlihy, and L. Gilbert. Limitations of synchronous commuru-
cation with static process structure in langauges for distributed programming.
In Conference Record of the 13th Annual ACM Symposium on Principles of
Programming Languages, January 1986, pp. 150--H159.
[LR80] Lampson, B. W. and D. D. Redell. Experience with processes and monitors in
Mesa. Communications of the ACM, 23(2), February 1980, pp. 105--H116.
[LS83] Liskov, B. and R. Scheifler. Guardians and actions: Linguistic support for
robust, distributed programs. ACM Transactions on Programming Languages
and Systems, 5(3), July 1983, pp. 381--H404.
[LS88]
[LS90]
[Mar91]
Liskov, B. and L. Shrira. Promises: Linguistic support for efficient asynchronous
procedure calls in distributed systems. In Proceedings of the SIGPLAN'88 Con-
ference on Programming Language Design and Implementation, June 1988, pp.
260--H267.
Lin, C. and L. Snyder. A comparison of programming models for shared memory
multiprocessors. In 1990 International Conference on Parallel Processing, vol. 2,
1990, pp. 163--H170.
Maranget, L. GAML: A parallel implementation of lazy ML. In Functional
Programming Languages and Computer Architecture, vol. 523 of Lecture Notes
in Computer Science. Springer-Verlag, August 1991, pp. 102--H123.
[Mat89] Matthews, D. C. J. Processes for Poly and ML. In Papers on Poly/ML, Technical
Report 161. University of Cambridge, February 1989.
[Mc190] Mcllroy, M. D. Squinting at power series. Software - Practice and Experience,
20(7), July 1990, pp. 661--H683.
[ME89]
[MKH91]
Miller, J. 5. and B. 5. Epstein. Garbage collection in multiscneme (preliminary
version). In Parallel Lisp: Languages and Systems, vol. 441 of Lecture Notes in
Computer Science. Springer-Verlag, June 1989, pp. 138--H160.
Mohr, E., D. A. Kranz, and R. H. Haistead, Jr. Lazy task creation: A technique
for increasing the granularity of parallel programs. IFEE Transactions on Par-
allel and Distributed Systems, 2(3), July 1991, pp. 264--H280. A longer version is
available as DEC CRL report 90/7, November 1990.
[MM579] Mitchell, J. G., W. Maybury, and R. Sweet. Mesa Language Manual (Version
5.0). Xerox PARC, April 1979.
179
[Mor] Morrisett, J. G. A multi-processor interface for SML. CMU technical report (in
preparation).
[MT91] Milner, R. and M. Tofte. Commentary on Standard ML. The MIT Press,
Cambridge, Mass, 1991.
[MTH90] Milner, R., M. Tofte, and R. Harper. The Definition of Standard ML. The MIT
Press, Cambridge, Mass, 1990.
[Nel81]
Nelson, B. J. Remote Procednre Call. Ph.D. dissertation, Computer Science
Department, Carnegie Mellon University, May 1981. Available as Xerox PARC
Report CSL-81-9.
[Nel91] Nelson, G. (ed.). Systems Programming with Mod?a-3. Prentice-Hall, Engle-
wood Cliffs, N.J., 1991.
[Nik91] Nikhil, R. 5. ID Language Reference ManuaL Laboratory for Computer Science,
MIT, Cambridge, Mass., July 1991.
[Nye90a] Nye, A. X Protocol Reference Manual, vol. 0. O'Reilly & Associates, Inc., 1990.
[Nye90b] Nye, A. Xlib Programming Manual, vol. 1. O'Reilly & Associates, Inc., 1990.
[Osb89] Osborne, R. B. Speculative computation in Multilisp. In Parallel Lisp: Lan-
guages and Systems, vol. 441 of Lecture Notes in Computer Science. Springer-
Verlag, June 1989, pp. 101--H137.
[Owi89] Owicki, 5. Experience with the firefly multiprocessor workstation. Technical
Report 51, DEC Systems Research Center, September 1989.
[Pau91] Paulson, L. C. ML for the Working Programmer. Cambridge University Press,
New York, N.Y., 1991.
[Pik89] Pike, R. A concurrent window system. Computing Systems, 2(2), 1989, pp.
133--H153.
[Plo75] Plotkin, G. D. Call-by-name, call-by-value and the A-calculus. Theoretical Com-
puter Science, 1, 1975, pp. 125--H159.
[Ram90] Ramsey, N. Concurrent programming in ML. Technical Report CS-TR-?62-9O,
Department of Computer Science, Princeton University, April 1990.
[RC86] Rees, J. and W. Clinger (Eds.). The revised3 report on the algoritlirnic language
Scheme. SIGPLAN Notices, 21(12), December 1986, pp. 37--H43.
[Rep88] Reppy, J. H. Synchronous operations as first-class values. In Proceedings of the
SIGPLAN'88 Conference on Programming Language Design and Implementa-
tion, June 1988, pp. 250--H259.
Reppy, J. H. First-class synchronous operations in Standard ML. Technical Re-
port TR 89-1068, Computer Science Department, Cornell University, December
1989.
[Rep89]
180
[Rep90a] Reppy, J. H. Asyncbronous signals in Standard ML. Technical Report TR 90-
11??, Computer Science Department, Cornell University, August 1990.
[Rep90b]
[Rep91a]
[Rep91b]
[RG86]
[RLW85]
[Ros81]
[Rov85]
[RV89]
Reppy, J. H. Concvrrent programming with events --H The Concnrrent ML man-
itaL Computer Science Department, Cornell University, Ithaca, N.Y., November
1990. (Last revised October 1991).
Reppy, J. II. CML: A bigher-order concurrent language. In Proceedings of the
SIGPLAN'91 Conference on Programming Language Design and Implementa-
tion, June 1991, pp. 293--H305.
Reppy, J. H. An operational semantics of first-class synchronous operations.
Technical Report TR 91-1?3? Computer Science Department, Cornell Univer-
sity, August 1991.
Reppy, J. H. and E. R. Gansner. A foundation for programming environments. In
Proceedings of the A CM SIGSOFT/SIGPLANSoftware Engineering Symposium
on Practical Software Development Environments, December 1986, pp. 218--H227
Rovner, P., R. Levin, and J. Wick. On extending Modula-2 for building large,
integrated systems. Technical Report 3, DEC Systems Research Center, January
1985.
Rosen, B. K. Degrees of avallabillty as an introduction to the general theory
of data flow analysis. In 5. 5. Muchnick and N. D. Jones (eds.), Program Flow
Analysis: Theory and Applications, pp. 55--H76. Prentice-Hall, Englewood Cllffs,
N.J., 1981.
Rovner, P. On adding garbage collection and runtime types to a strongly-typed,
statically-check, concurrent language. Technical Report CSL-84-7, Xerox PARC,
July 1985.
Roberts, E. 5. and M. T. Vandevoorde. WorkCrews: An abstraction for con-
trolling parallellsm. Technical Report 4? DEC Systems Research Center, April
1989.
[5G86] Scheifler, R. W. and J. Gettys. The X window system. ACM Transactions on
Graphics, 5(2), April 1986, pp. 79--H109.
[5R90]
Saraswat, V. A. and M. Rinard. Concurrent constraint programming. In Confer-
ence Record of the 17th Annual ACM Symposium on Principles of Programming
Languages, January 1990, pp. 232--H245.
[Ste78] Steele Jr., G. L. Rabbit: A compiler for Scheme. Master's dissertation, MIT,
May 1978.
Tolmach, A. P. and A. W. Appel. Debugging Standard ML without reverse
engineering. In Conference record of the 1990 A CM Conference on Lisp and
Functional Programming, June 1990, pp. 1--H12.
[TA90]
181
[TA9l]
Tolmach, A. P. and A. W. Appel. Debuggable concurrency extensions for Stan-
dard ML. In Proceedings of the ACM/ONR Workshop on Parallel and Dis-
tributed Debugging, May 1991, pp. 120--H131.
[TofB8] Tofte, M. Operational semantics and polymorphic type inference. Ph.D. disser-
tation, Department of Computer Science, University of Edinburgh, May 1988.
[Tof9O] Tofte, M. Type inference for polymorphic references. Information and Compu-
tation, 89, 1990, pp. 1--H34.
[Ung84]
Ungar, D. Generation scavenging: A non-disruptive high-performance storage
reclamation algorithm. In Proceedings of the ACM SICSOFT/SIGPLAN Soft-
ware Engineering Symposium on Practical Software Development Environments,
April 1984, pp. 157--H167.
[UN186] University of California, Berkeley. UNIX Programmer's Reference Manual
(4.3bsd), 1986.
[Wan80] Wand, M. Continuation-based multiprocessing. In Conference Record of the
1980 Lisp Conference, August 1980, pp. 19--H28.
[WF91a] Wright, A. and M. Felleisen. Corrigendum to "A syntactic approach to type
soundness", July 1991.
[WF91b] Wright, A. and M. Felleisen. A syntactic approach to type soundness. Technical
Report TR9l-160, Department of Computer Science, Rice University, April 1991.
[WS83] Wegner, P. and 5. A. Smolka. Processes, tasks and monitors: A comparative
study of concurrent programung primitives. IEEE Transactions on Software
Engineering, 9(4), July 1983, pp. 446--H462.
Zaring, A. K. Parallel Evaluation in Attribute Grammar-based Systems. Ph.D.
dissertation, Computer Science Department, Cornell University, August 1990.
Available as Technical Report 90-1149.
[Zar9O]
182
Appendix
183
Appendix A
Proofs from Chapter 8
This appendix contains the detailed proofs of some of the lemmas ill Chapter 8. It also
includes some additional definitions and lemmas needed for these proofs.
Proof of Lemma 8.5
Before proving the Substitution Lemma, we need a couple of minor lemmas. The following
lemma extends substitution to type judgements.
Lemma A.1 If 5 is a substitution and TE F e : r, then S(TE) F e : ST.
Proof. Proofs of this for a similar system can be found in [Tof88] (Lemma 5.2, p. 48) and
in [Tof9O] (Lemma 4.2, p. 18).			I
The following lemma says that the typing assumptions (i.e., the type environment) of a
type derivation can be generallzed without affecting the result.
Lemma A.2 If TE + f?			F e: r and a' ? a, then TE + fx I; a'? F e :
Proof. The proof is by induction on the height of the deduction of
TEI(x ? a? Fe :T
and by case analysis of the last step (i.e., analysis of the shape of e). The interesting
cases are those involving the variable typing component of TE. Recall that the variable
convention means that x is not bound in e.
Case e =
If ? $ x', then Lemma 8.3 means that TE F x : T. Applying Lemma 8.3 again, we get
TE I a'? F x' :
185
If x = x', then a ? r, and since a' ? a, a' ? r. Then TE I fx a'? F X : r, by
rule (r-var).
Case e =
Rule (r-abs) applies:
TE I ?x a, x' r'? F e' : r
TE I (x a? F Ax1Ce') : (r'
So, by the induction hypothesis,
TE I fx I,' a', x' ? F e' : r
And, thus, applying rule (r-abs), we get
TE + f: ? a'? F Ax'(e') : (T' H
Case e = let x' = v in e'.
Rule (r-app-let) applies:
TE I(x:?`alFv:r'TEI?xi,'a,x'i,'CL0STE+?????(Y)?He:r
TE I fx H+ a? H let x' = v in e : r
Then, by the induction hypothesis,
TEl (x			a'? Fv :r'
and
TE I fx a', x' CLOSTE+(:??gy(Y)? F e : T
Since a' ? a, Lemma 8.1 gives us
OLOSTE+?:?+gI?(Y) ? CLOSTE+??cyJ(T')
We can then apply the induction hypothesis again to get
TE I fx ? a', x' H+ CL0STE+(:??I?(Y)? F e : r
And then, by rule (r-app-let), we get
TE I fx ? a'? F let X1 = v in e : r
Case e = let z1 = e1 in e2.
This follows the argument of the previous case.
186
Case e = chan x`in e'.
This case is similar to the case e = Ax'(e') above.
Lemma 8.5 (Substitution) If x ? FV(v), TE k v : r, and
TE I fx i,' ..... Q?.r? F e : r1
with (ai, . . ., a?? fl FTV(TE) = ? then TE F e[x i,' v] : r1
Proof. The proof is mostly from [WF9lb], and proceeds by induction on the height of the
deduction of
TE I ?z ?,` ..... a?.r? F e :
and by case analysis of the last step. Let TE = (vT, CT), VT' = VT If? .....
and TE' = (vT', CT) in the following discussion. We skip the cases for the terms covered
by the rules in Figure 8.2, since these cases follow those for (r-app) and (r-const). As
before, recall that the variable convention means that x is not bound in e.
Case e =
The last step is rule (r-const), so TypeOf(b) ? r'. Applying rule (r-const), we get
TE k b: r'. Since b[x I' v] = b, we are done.
Case e =
If ?? # x, then, by rule (r-var), VT'(x') ? r'. Since z'[x i,' v] = :? and VT(x') ?
TE F x' :
If x' = x, then VT'(x) ..... Q?.r. By rule (r-var), ..... Q?.r ? r', which
means that there is a substitution 5, such that dom(S) = ?..,..., Q?? and Sr =
r'. Lemma A.1 gives us S(TE) F v: Sr, which implies that S(TE) F v : r'. Since
dom(S) n FTV(TE) = ?, we have S(TE) --H TE thus, TE ? v :
Case e =
Rule (r-chvar) applies, thus CT(?) = r'. Since ?[::?` v] = ?, we can apply (r-chvar)
to get TE F ?[? v] :
Case e = e1 e2.
Rule (r-app) applies, so we have
TE'Fe1:(r"r')TE'?e2:
TE' F e1 e2
187
By the induction hypothesis and rule (r-app), we have
TEre1[xv]:(r"Y)TEFe2[xv]:
TE F ei[x			v] e2[x			v] :
Therefore, TE H e1 e2[x			v] : T1.
Case e = (e1.e2).
This case is very sirnilar to the previous case.
Case e =
Rule (r-abs) applies:
TE'I?x'T1?Fe':r2
TE' F ??I(??) :
with r' = (T1 H r2). Let 5 be the substitution
(a1?P1,...,cn?Pn?
where the a?, ?? and FTV(TE) &e all distinct. Then, by Lenuna A.I,
S(TE') I fx' ST1? F e': ST2
Note that S(TE') --H TE' since dom(S) A FTV(TE') = ?, hence
TE' I fx' ? ST1? F e' : ST2
The v&iable convention insures x' ? FV(v), so Lemma 8.3 gives us
TE + fx' ST1? H v : T
And the choice of 5 means that
FTV(TEI ?x' STil) A ??i,...,?n1 =
These facts, coupled with the induction hypothesis gives us
TE I fz'			ST1? F e'[x			v] : ST2
The substitution 5 is a bijection, so ?--H1 exists; hence, by Lemma A.1,
S--H1(TE I ?x'			ST1?) F e'[x :? v] : S?1ST2
simplifying, we get
TE I fx' ? T1?) F e'[x			v] : T2
thus, applying (r-abs), we get
TEI?x'rilFe'[xv]:T2
TE F Ax'(e'[x v]) : (T1
and therefore, TE ? (Ax'(e'))[x v] : (T1 T2).
188
Case e let :`=v1in e'.
This is the case of a non-expansive let, so the first step of the type deduction must
be rule (r-app-iet):
TE'Fv':T"TE'I(x'CLOSTEI(Y')??:T1
TE' F let x? = v'in e1 : r1
Since TE F v : r, Lemma 8.3 gives us
TE I ?x' CLosTEi(r")) F v : T (*)
Recall that ?.??..., a?? A FTV(TE) ? in the following:
f0Li, ., 0L?? A FTV(TE I ?x'			CL0STEI(Y')?)
fai,			., Q?? A (FTV(TE) u FTV(CL0STEI(r")))
fai,.			. , an? A FTV(CLOSTEI(T"))
=			?0Li,.			. , 0L?? A (FTV(r") \ (FTV(r") \ FTV(TE')))
=			fai,.			., 0L?? A FTV(r") A FTV(TE')
=			fai,.			., ?n1 A FTV(T") A (FTV(TE) u FTV(Va1... an.r))
= tai,. ., an) A FTV(r") A FTV(Va1...
= fai,. ., an) A FTV(r") A (FTV(r)\?ai,..., an))
The second premise of (r-app-let) and (*) with the induction hypothesis give us:
TE I ?x' CLOSTEs(T11)) F e'[z i' v] :
Note that CLOSTE(r") ? CLOSTE'(r"), so we can apply Lemma A.2 to get:
TE I fx' CL0STE(Y')) ? e'[x i,' v] :
Thus, by the induction hypothesis and (r-app-let), we have:
TEFv'[xv]:r"TEItx'CLosTE(Y'))Fe'[xv]:
TE F let x? = v'[x ` v] in e?[x			v] :
and, therefore, TE F (let x'=v' in e')[x i,' v] :
Case e = let x'=e1 ine2.
This is the case of an expansive let, so rule (r-imp-let) applles:
TE'Fe1:r"TE'If?'APPCL0STE'(Y'))Fe2:
TE' F let x' = e1 ine2 : Y
Choose a substitution
S: ((0Li,.. ,an) A IMPTYVAR) H (vi,
189
such that ..... ,Um are distinct imperative type variables, 5 is a bijection, and
fui, .,uml fl (FTV(TE) u FTV(r) U ?a1,. -,
Then, Lemma A.1 tells us that
S(TE' I fx' :, APPCL0STEI(Y')?) F e2 : Sr'
Since dom(S) A FTV(TE') ? and dom(S) c IMPTY, we have
TE' I ?x' `?` APPCLOSTEJ(Sr11)? F e2 : Sr'
Since x' ? FV(v) and since TE F v : r, we have
TE I fx' ? APPCL0STE'(SY')? F v : r
Let APPCLOSTE'(Sr") Vt1, ,t?.SY'; i.e.,
....... , ti?			(FTV(Sr") \ FTV(TE')) A APPTYVAR
then
fai,
?a1,
(a1,
o+ .., Q?? A FTV(TE I (x' :?` ????...? t1.SY'?)
a?) A (FTV(TE) u FTV(Vt1,... , tt.Sr"))
o+ ., a?? A (FTV(Sr")\fti,... , t1?)
By the inductive hypothesis with (*) and (**), we have
TE I (z'			APPCLosTEI(SY')? F e2[x:; v] : Sr'
Since 5 was chosen to be a bijection, 5-1 exists, so by Lemma A.1, we have
S-1(TE + (x' APPCL0STEI(SY)?) F e2[x :,` v] : S-1(SY)
simplifying, we get
TE + (x' APPCL0STEI(Y')? F e2[x ? v] :
Since APPCL0STE(r") ? APPCL0STEI(T"), Lemma A.2 applies:
TE + (x' APPCLosTE(Y)Y F e2[x ? v] :
By the induction hypothesis, we have
TE F e1[x H+ v] :
and, thus, we can apply (r-app-let):
TE?e1[xi,'v]:r"TEIfx'APPCL0STEJ(Y')?Fe2[xv]:
TE F let x'=e1[x v] in e2[x " v]: Y
and, therefore, TE F (let ?`=e1 in e2)[z v] :
190
Case e chan z1in ??
Rule (r-chan) applies:
TE'Ifx'0chan?He1:
TE' F chan z`in e1 :
By the variable convention, xl ? FV(v), so Lemma 8.3 gives us
TEI?x' i, Ochan? F v :r
Thus, by the induction hypothesis and rule (r-chan)
TEIfx'i,'0chan?Fe'[xv]:
TE F chan x' in e'[x			v] :
and therefore, (VT, CT) F chan x' in e'[x H+ v] :
Proof of Lemma 8.8
In this section, 1 show that the matching of event values preserves the parameter type of
the events. This requlres the following fact about abort actions:
Lemma A.3 If TE F ev : T event, then TE F AbortAct(ev) : unit.
Proof. The proof is by induction on the structure of event values and the definition of
AbortAct.			I
Lemma 8.8 If ev1 0 ev2 with (e1, e2) and TE F ev? : Tj event, then TE F e? : Tj (for i ?
fi, 2?).
Proof. This is proved by induction on the definition of event matching. Let TE = (vT, CT)
below.
Base case: ?!v ?? with ((),v). For i --H 1 the claim follows immediately from the type
of () and rule (r-output). For i --H 2 we must examine the type of ?. We have the
following judgement5:
TE F ?!v : unit event			(1)
TE ?			: r event			(2)
By rule (T-input) and (2), we have TE H ?: r chan, thus, the deduction of (1) by
rule (r-output) requires that TE F v :
191
Inductive cases. For the inductive cases, the i			1 case follows immediately from the
induction hypothesis. The i 2 case is proven by case analysis:
Case ev2 0ev1 with(e2,ei).
This case follows immediately.
Case ev1 0? (ev' ? v) with (e1,v e').
Rule (r-wrap) applies:
TEFev':r'eventTEFv:(YHT2)
TE F (ev' ? v) : r2 event
Thus, applying the induction hypothesis and rule (r-app) we get:
TEFe':r'TEFv:(r'r2)
TE F v : r2
Case ev1 0 (ev2 ? ev3) with (e1, (AbortAct(ev3); e2)).
Rule (r-choice) applies:
TEFev2:r2eventTEFev3:T2event
TE F (ev2 ? ev3) : r2 event
Then, by the induction hypothesis, and Lemmas 8.2 and A.3, we get
TEFAbortAct(ev3):unitTEFe2:T2
TEF (AbortAct(ev3); e2) :r2
Case ev1 0 (ev3 ? ev2) with (ei, (AbortAct(ev3); e2)).
This is the same as the previous case.
Case ev1 0 (ev2 v) with (e1, e2).
This case follows immediately from the induction hypothesis.
ProofofLemma 8.12
In this section, 1 show that stuck expressions are untypable. First, we need to characterize
the syntactic form of stuck expressions.
Definition A.1 The set of acceptable argtmentsto sync is defined as
SYNcARG = EvENTU ((Ge) e E ExP?
192
Lemma A.4 A process (ir; e?, with e closed, is stuck iff e has one of the following forms:
(1) E[b v], such that ?(b, v) is undefined.
(2) E[v v'], where v has the form (vi v2), ? ev, or (Ge').
(3) E[sync v], such that v ? SYNcARG.
Proof.
(?) Let E[e'] = e, then this direction proceeds by case analysis of the possible forms of e'
Case e' v.
Then E[e'] = [v], thus 7r it is not stuck.
Case e' = v v.
This case proceeds by analysis of the form of v:
Case v =
If S(b, v) is defined, then 7r is not stuck, otherwise it is stuck and has form 1.
Case v =
Then e is not closed, which is a contradiction.
Case v = A?(e11).
hi this case, ? is not stuck.
Otherwise.
lii the other cases, e is stuck and has form 2.
Case ?? = let x = v in e".
`a this case, ir is not stuck.
Case e' = sync v.
This case proceeds by analysis of the form of v:
Case v = ev.
7r is not stuck.
Case v = (Ge").
7r is not stuck.
Otherwise.
`a the other cases, e is stuck and has form 3.
Case e' = spawn v.
`a this case, ? is not stuck.
Case e? = chan x in e".
`a this case, ? is not stuck.
193
Thus, for each possible form of e', either 7r is not stuck (a contradiction), or the lemma
holds.
(?) This direction follows immediately from the definitions.
Before we can prove that stuck configurations are untypable, we need a lemma that char-
acterizes the values that have event types.
Lemma A.5 If TE F v: 7, for v ? VAR, then 7 = 7, event, for some 7', iff v ? SYNcARG.
Proof.
(?) This direction proceeds by examination of the terms in the set
VAL\(VARu SYNcARG)
None of these terms has an inference rule that can derive a judgement of the form
TE ? ev : 7' event. Thus, since v ? VAR, v C SYNcARG.
This direction is by examination of the terms in SYNcARG. The inference rules for
these terms are (?-never), (7-output), (r-input), (r-wrap), (r-choice), (?-abort),
and (r-guard), all of which derive judgements of the form
Thus, 7 = 71 event, for some 71
TE F ev : 71 event
Finally, we are ready to the main proof.
Lemma 8.12 (Untypability of stuck configurations) If ir is stuck in a well-formed
configuration K, ?, then there do not exist CT ? CHANTY and PT ? PRoCTY, such that
(f?, CT) F ?(7r) :
In other words, K, P is untypable.
Proof. Let ir be stuck in K, P, with P(r) = E[e'], and assume that there exist CT E
CHANTY and PT E PROCTY, such that (??, CT) F P(?) : PT(?). It suffices to show that
e'is untypable, which is a contradiction. Let 7 be the type of e1; i.e., TE' F e1 : 7, for
some TE'. Note that since K, P is well-formed, e' is closed; and thus Lemma A.4 gives the
possible forms of e1. The proof proceeds by case analysis of e', showing that e' is untypable
in each case.
194
Case e' = v v'. Rule (r-app) applies:
TE'Fv:(r'r)TE'Fv':
TE' F v v' : r
There are five subcases, depending on the structure of v.
(*)
Case v = b, with ?(b, v') undefined.
By the 6--Htypability restriction, 6(b, v') is defined, which contradicts e' being
stuck.
Case v = (v2.v3).
Rule (r-pair) requires that
TE' F (vi .v2) : (Ti X r2)
where TE' F v? : Tj, which contradicts the first premise of(*), thus e' is untypable.
Case v = ev.
By Lemma A.5,
TE' F ev : T1 event
but this contradicts the first premise of (*), thus e'is untypable.
Case v =
Rule (r-chvar) requires that ? have the type ri chan, for some r1, but this
contradicts the first premise of (*), thus e' is untypable.
Case v = (Ge").
By Lemma A.5,
TE' F (Ge") : r1 event
but this contradicts the first premise of (*), thus e' is untypable.
Case e' = sync v, with v ? SYNcARG.
Rule (r-sync) applies:
TE'Fv:r1event
TE' F sync v:
but, by Lemma A.5, v E SYNcARG, which is a contradiction.
195
