BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR92-1290
ENTRY:: 1993-10-14
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: Abstract Value Constructors: Symbolic Constants for Standard ML
AUTHOR:: Aitken, William 
AUTHOR:: Reppy, John H.  
DATE:: June 1992
PAGES:: 31
ABSTRACT::
Standard ML (SML) has been used to implement a wide variety of large systems, 
such as compilers, theorem provers, graphics libraries and even operating 
systems. While SML provides a convenient, high-level notation for programming 
large applications, it does have certain deficiencies. One such deficiency is 
the lack of a general mechanism for assigning symbolic names to constant 
values. In this paper, we present a simple extension of SML that corrects this 
deficiency in a way that fits naturally with the semantics of SML. Our 
proposal is a generalization of SML's datatype constructors: we introduce 
constants that generalize nullary datatype constructors (like nil), and 
templates that generalize non-nullary datatype constructors (like ::). 
Constants are identifiers bound to fixed values and templates are identifiers 
bound to structured values with labeled holes. Templates are useful because 
they allow users to treat the representation of structured data abstractly 
without having to give up pattern matching.
END:: CORNELLCS//TR92-1290
BODY::
Abstract Value Constructors
Symbolic Constants for Standard ML*
William E. Aitken**
John H Reppy***
TR 92-1290
June 1992
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
*A shorter version of this paper was presented at the ACM SlCPLAN Workshop on
ML and its Applications [AR92].
**This work was supported, in part, by the ONR under contract N0001 4-88-K-0409.
***This work was supported, in part, by the NSF and ONR under NSF grant CCR-85-
14862 and by the NSF under grant CCR-89-18233.
Abstract Value Constructors
Symbolic Constants for Standard ML*
William E. Aitkent			J0h11 II. Reppy?
Cornell University			Cornell University
aitken?cs cornell. edu			jhrCresearch.att corn
June 12, 1992
Abstract
Standard ML (SML) has been used to implement a wide variety of large systems,
such as compilers, theorem provers, graphics libraries, and even operating systems.
While SML provides a convenient, high-level notation for programming large applica-
tions it does have certain deficiencies. One such deficiency is the lack of a general
mechanism for assigning symbolic names to constant values. In this paper, we present
a simple extension of SML that corrects this deficiency in a way that fits naturally with
the semantics of SML. Our proposal is a generalization of SML's datatype construc-
tors: we introduce constants that generalize nullary datatype constructors (like nil),
and tempiates that generalize non-nullary datatype constructors (like : :). constants
are identifiers bound to fixed values, and templates are identifiers bound to structured
values with labeled holes. Templates are useful because they allow users to treat the
representation of structured data abstractly without having to give up pattern match-
ing.
1 Introduction
Standard ML (SML) is a modern high-level language with both a formal definition ([MTH9o,
MT9l]) and good implementations (e.g., [AM87]). SML is being widely used to implement
substantial systems; including compilers, theorem provers, graphics libraries, and even oper-
ating systems. Although SML is generally well designed, there are certain language features
that are known to be useful for system prograllinling, but are not included in SML. One of
the more glaring omissions is the lack of a general purpose mechanism for assigning symbolic
names to constant values.
A shorter version of this paper was presented at the ACM SIGPLAN Workshop on ML and its Appli-
cations [AR92].
tThis work was supported, in part, by the ONR under contract Nooo1?88-K-o4o9.
?This work was supported, in part, by the NSF and 0NR under NSF grant CCR-85-14862, and by the
NSF under NSF grant CCR-89-18233.
1Current address: AT&T Bell Laboratories, Murray Hill, NJ, 0T9T4.
The use of symbolic identifiers for constant quantities is an essential tool for writing
understandable and maintainable software, Every systems programming language provides
support for symbolic constants, from the very powerful *define mechanism in the C pre
processor ([KR88]), to the constant declarations of Modula-3 ([Nel9lj). For example, an
elevator control program that uses the values 0 and 1 to represent direction is harder to
read than one that uses the symbolic constants UP and DOWN. In C we might write:
typede? i?t directio??t;
#de?i?e UP 0
#de?i?e DOWN 1
SML provides no mechanism that allows users to specify names for constant values and to
use these names in pattern matching. This problem is significant because pattern matching
is the principal mechanism used for case analysis and the decomposition of structured
values. The lack of a general symbolic constant mechanism limits pattern matching to
concrete representations (unlike expressions, which can involve abstract symbolic values).
This sharply limits the programmer's ability to define abstract objects, since there is a
trade-off between abstraction and programmer convemence.
This paper presents data temptates, a powerful new mechanism for defining symbolic
constants. Templates are a generalization of SML's data constructors, and can be used
in both expressions and patterns. Furthermore, they work in the presence of separate
compilation and parameterized modules. Our mechanism is type-safe and fits naturally
into the semantics of SML ([MTH90]).
Our implementation technology has the added benefit of solving an outstanding problem
with datatype representation and abstraction. Furthermore, our semantic framework and
implementation technology admit a number of extensions, which make the language and
semantics more uniform. Mthough we set our presentation in the context of SML, our
mechanism and techniques are applicable to other languages with datatypes and pattern
matching.
In the remainder of the paper, we give examples of the utility of our mechanism, describe
the interaction between our mechanism and the module system, explain its semantics, sketch
iLb uupleine??ativ?, and discuss related work.
2 ML datatypes and patterns
The SML datatype mechanism provides a high-level notation for declaring new structured
types. For example, the declaration
datatype `a list = uji o? (`a * `a list)
2
defines the standard predefined polymorphic list type. This declaration defines two data
constructors: nil is a nullary constructor representing the empty list, and :: (which is an
inlix operator) is the list constructor. The expression 1: :2: :3: :nilevaluates to an integer
list ofthree elements (SML provides the derived notation 11, 2, 3] for this construction).
Datatypes can also be used to define enumerations; for example,
datatype direction_t = NORTH I SOUTH I EAST I WEST
The utility of datatypes is greatly enhanced by the use of pattern matching in function
definitions to do case analysis and value decomposition. For example, the following function
takes a list of strings and returns a list with commas inserted between adjacent elements:
commas 0 = nil
commas Es] = Es]
commas (s::r) = 5 :: , :: (commas r)
This function consists of three rules (or equations). The left side of a rule is a pattern,
the right side is an expression to be evaluated when the pattern is matched. The first rule
in commas matches the empty list. The second matches the singleton list and binds its
single element to 5. Theoretically, the third matches any list of one or more elements, but,
since the order of equations defines their precedence, it can only match lists of two or more
elements; it binds s to the head of the list and r to the tail. Patterns in SML are Linear,
i.e., a variable may occur at most once in a pattern.
The data constructors defined by datatypes have a dual nature; they are used both to
construct values (when they are used in expressions), and to destruct values (when they are
used in patterns). We use the term valte constructors to refer to identifiers with this dual
nature. If no confusion with type constructors is possible, we abbreviate this to constructors.
Our extension of SML provides another mechanism through which identifiers with this dual
nature may be defined. Our mechanism allows the definition of constants, which are value
constructors that refer to fixed pre-existing values, and the definition of templates, which
are value constructors that refer to fixed, structured values with named holes. Templates
can be thought of as the constructors for families of pre-existing values in the same way
that non-nullary data constructors may be thought of as constructors for families of new
values.
3 Constants
The use of symbolic names to refer to constant values is an essential tool for the writing
of unde??aijdable, maintainable software. In SML, val bindings allow values to be given
symbolic names, but these names cannot be matched against in patterns. This is a serious
3
limitation because pattern matching is the principal mechanism for case analysis and the
decomposition of structured values. It is also possible, using the datatype declaration,
to declare identifiers that may be matched against in patterns, but these are not symbolic
names for existing values. While these two mechanisms are adequate for many applications,
sometimes what is required is a mechanism that allows identifiers both to be bound to
particular values and to be matched against in patterns.
For example, consider the implementation of an X Window System library. The X Win-
dow System uses a device independent representation of keyboard keys called keysyms. We
need to provide users with a symbolic name for each of them. Clearly, users need to be able
to pattern match against these names so that they can conveniently write programs that
respond to different keystrokes differently. Thus, val bindings are not suitable to provide
these names. Using a datatype to represent keysyms is also unsuitable. The tuire represen-
tajion of a keysym, that is, the representation used by the X server, is a 29 bit integer. If the
library uses a datatype to represent keysyms, it must also include a function to convert from
wire representations to library representations, and another to convert library representa-
tions back into wire representations. It is essential that these functions be mutually inverse.
Since there are literally thousands of keysyms defined --H even minimtm English language
support requires 512 --H these functions present a maintenance nightmare. Furthermore,
the use of such huge functions adds significant run-time overhead to the marshaling and
unmarshaling of messages. Using our mechanism, we can write definitions such as
datatype keysym?t = KEYSYN o? int
const KS?a = KEYSYM 97
const KS?b = KEYSYN 98
Here the representation of a keysym is the constructor KEYSYM wrapped around the wire
representation. (This is a standard idiom for creating new types isomorphic to existing ones.
Compilers can represent keysym?t and int identically, and treat the constructor KEYSYM
as a no-op.) The identifiers KS?a and KS?b are declared as constants that represent the
characters `a' and `b'. They can be used both in patterns and in expressions. The identifier
KS?a stands for the structured value (KEYSYM 97). Note that this representation of keysyms
has a further advantage over the datatype representation in that it allows keysyms from
different keysets to be defined in different modules, enabling users to import only those
keysets they actually need.
This use of our mechanism is reminiscent of the *de?ine mechanism of the C language
[KR88] (arguably, this is one area in which C provides a higher-level notation than SML).
SML has a more general pattern matching facility than the C switch statement in that
patterns can match structured values (e.g., tuples). Therefore, it is natural to allow symbolic
names for structured constant values. For example, a calendar program might include the
4
following definitions:
datatype date = DATE of ?mollth : illt, day : illt+
const CHRISTNAS = DATE?moiith=I2, day=2S+
full getpresellts CHRISTNAS = true
I getpresellts - = false
4 Templates
Templates are a natural generalization of symbolic constants to allow labeled holes1. They
provide a mechanism to define a concise syntax for a collection of similarly structured values,
A template is defflied by a declaration of the form
collst id trivpat = pate:p
where trivpat is a pattern that involves only variables and record construction, and pate:p
is a pattern that contains only variables, record construction, special constants and other
value constructors. Every pate:p can be viewed both as a pattern and as an expression
(they are the syntactic intersection of patterns and expressions), Every variable appearing
in trivpat must also appear exactly once in pate:p. This ensures that it is always possible
to map back and forth between instances of a template and its expansion.
For example, the template mechanism allows us to define a template for the days of the
month July using the declaration
const JULY(x) = DATE?month=7,day=x>
which allows expressions such as JULY(17) to be used to create values for the days in July.
It also allows code such as
full nameOfflay (JULY 4) = "independence Day"
I nameOfDay CHRiSTNAS = "Christmas"
in which JULY is used as a constructor in a pattern match.
A more substantial and more useful example of templates arises in systems that do
term manipulation (such as the Nuprl proof development system, or code optimizers). For
the sake of concreteness, we set our example in the context of a generalized A-calculus.
1The term template was suggested by Dave MacQueen.
5
A termis either a variable or the application of an operatorto a sequence of bo?nd terms.
For example, A and ap are operators and A(?.a) is a term with ? bound in the sub-term
a, and ap(a; b) is term with sub-terms a and b (but no bound variables). One possible
representation of this term language uses a different constructor for each operator:
datatype term
= VAR o? var
I LANBDA o? (var * term)
I AP o? (term * term)
PLUS o? (term * term)
This representation has a two major deficiencies. First, functions that are largely indepen-
dent of the operators, such as computing the free variables of a term or substitution, require
many similar cases. For example, the code for computing the free variables of a term is:
iiill ireeVars (VAR v) = [v]
I freeVars (LANBDA(x, t))
?reeVars (AP(ti, t2)) =
?reeVars (PLUS(ti, t2))
= setNinus (?reeVars t, [1])
setUnion (freeVars ti, freeVars t2)
= setUnion (?reeVars ti, ?reeVars t2)
where free variable sets are represented by lists. Second, often it is desirable to make the
set of operators extensible (for example, as in the Nupri system), but if this representation
of terms is used, the datatype needs to be changed to extend the operator set, and this re-
quires a complete recompilation of the program. Furthermore, extension of the operator set
exacerbates the problem with functions like substitution --H the addition of a new operator
requires that a new case be added to each such function.
An alternative is to define a regular representation that is extensible. The following
datatype is such a representation; the set of terms is extended by treating more strings as
valid operator names:
datatype operator = OP o? string
datatype term
= VAR o? var
I TERM o? operator * ((var list * term) list)
With this representation, the free variable function requires only two cases:
?un ?reeVars (VAR v) = [v]
I ?reeVars (TERN(?, args)) = let
?un ?vb (bndVars, subTerm) = setMinus (freeVars subTerm, bndVars)
in
fold (fn (binding, 5) => setUnion (fvb binding, 5)) args []
end
6
Furthermore, this code is independent of any extensions to the term system.
Unfortunately, the syntax of expressions and patterns using this representation is quite
ugly. For example, the pattern that matches (3-redices (i.e., terms of the form ap(A(:.s); t)
is
TERN(OP "AP'1, E(F?, TERN(oP "LAMBDA", ECEx], 5)])), (E], t)])
compared to
AP(LAMBDA(i, s), t)
in the first representation. Moreover, the second representation scheme does not provide
the syntactic checking given by the first representation. The expression
TERM (OP "LAMBDA", [])
is a perfectly acceptable member of the type term even though terms formed with the
A operator should always have exactly one subterm in which exactly one variable is newly
bound. Lastly, the use of strings to name operators adds the overhead of string comparison
to pattern matching. All of these problems are elegantly solved using our mechanism. For
example, with the following declarations
datatype operator = OP o? int
datatype term = VAR o? var I TERM oi operator * ((var list * term) list)
const LAMHDA_OP = OP 0 and AP_OP = OP 1 and PLUS?OP = OP 2 and ...
const AP(p, q) = TERM(AP?OP, ECE], p), CE], q)])
and LANBflA(x, t) = TERM(LAMBDA?OP, E(Ex], t)])
and PLUSCa, b)			= TERM(PLUS?OP, [CE], a), ([], b)])
the pattern for P-redices is again AP(LAMBDA(x, s), t), but the code for computing the
free variables remains unchanged. While the type term still includes many unintended
values, disciplined use of the templates IP, LlMBDA and PLUS makes it impossible for users
to produce these values accidentally, and as described in the next section, the required
discipline can be enforced using the module facility. Thus, we obtain both the succinctness
and syntactic checking of the first representation and the flexibility and regular structure
of the second.
7
5 Constructors and the module system
The module system is an important feature of SML. Our proposal meshes elegantly with the
module system; moreover, the module system, because it allows the separation of specifica-
ti d imnl? ntati- ?eatlv increases the abstraction achievable with our mechanism.
ull CLI- - -- lU -
In particular, because the module system allows the programmer to limit the externally
visible definitions of a structure, it is possible to limit the constructors available to users
of the structure. This is particularly important when a type used to represent a class of
objects contains extra elements that do not represent any object. For example, the second
datatype for terms defined in Section 4 includes many ill-formed elements. A signature of
the form
sig
eqtype term
const AP : term of term * term
const LANBDA : term of var * term
const PLUS : term of term * term
end
would ensure that users of this term type only used values corresponding to well-formed
terms. Moreover, because data constructors are simply a special kind of value constructor,
it is possible to provide an intefface to a structure in which they are made available as
constructors without having to make the datatype declaration in which they are declared
visible. This in turn allows the constructors of a datatype to be selectively exported.
Our proposal adds constructor specifications to the syntax of signatures. A nullary
constructor specification has the form
const id : type
while a unary constructor specification has the form
const id : type of type'
In this specification, type' is the domain and type is the range of the constructor. The syn-
tactic distinction between unary and nullary constructors is required because, in patterns,
unary constructors must always appear with arguments and nullary constructors may never
appear with arguments. The legality of code such as
signature SIG =
Big
type un?owu
const K : unknown
end
8
functor F (A : SIG) =
struct
fun f A.K = 17
f - = 12
end
depends on K being a nullary operator. Thus it is essential that structures such as
structure 5 = struct
type unknown = int -> int * int
const K x = Cx, 12)
end
not be allowed to match SIG.
6 Projections
One of the asyrnmetries of the design of SML is that one can define a view of a constructor
that may only be used in expressions (by binding it to a variable using a val declaration),
but cannot define a view that can only be used in patterns. As an exarnple of the utility
of such a mechanism, consider the implementation of points in a graphics toolkit. We may
want to restrict points to some sub-range of the representable values, but still allow users
to decompose points using pattern matching. This might be done as follows
abstype pt?t = PTREP of (int * int)
with
exception PtRange
fun mkpt (a, b) =
if (a and b art in `ange)
then PTREP(a, b)
else raise PtRange
proj PT (x,y) = PTREP (x,y)
end
where the proj declaration defines an identffler that can be used only in patterns. It is safe
to allow wildcards on the right-hand side of such projection declarations.
Just as val specifications in signatures are used to export only the value of a constructor
defined in a structure, so also proj specifications can be used to export only its pattern
matching behavior. For the same reason that it is required for constructors, we must
malntain explicit arity information for projections.
9
7 Semantics
In the appendix, we give a rigorous formal description of the semantics of SML extended
with abstract value constructors in the style of [MTH90, MT91]. Obviously, a certain
amount of new syntax is introduced, and new semantic rules are required to describe its
meaning. In addition, some minor changes need to be made to the semantics to provide
the extra function given by our extensions. The substantive changes required are small,
but the need to propagate their effects means that many of the semantic rules appearing
in [MTH9O] must be updated. This section briefly outlines and motivates the changes we
made to the semantics of the Definition in adding our mechanism to SML.
Our semantics includes a treatment of exceptions. It does not provide an interpretation
of SML's other constructor ref. We ignore it because allowing ref as a constructor would
(of course) allow it to appear in constant and template declarations. Since the semantics
of ref depends on the store, this would mean that the semantics of all value constructors
might potentially depend on the store. Moreover, it is not entirely clear what the semantics
of templates defined using ref should be. Consider the declaration:
const strange = ref I
Should every use of strange in an expression result in a new reference cell being allocated,
or should all occurrences refer to the same object? We do not view this ambiguity as a
deficiency of our proposal, rather we view it as evidence that SML's treatment of ref as a
constructor is a mistake.
In the Definition, the only constructors avallable are data constructors, which evaluate
to themselves, and exception constructors, which can be looked up in the exception envi-
ronment. In our proposal, constructors denote arbitrary values; therefore, their semantics
must depend on a more general environment. For reasons described below, we actually use
two environments to give their semantics. We use the value environment, that is used to
give the semantics of variables, to give the semantics of constructors in expressions, and
we use a new environment, called the projection environment, to give their semantics in
patterns.
The semantics of a non-nullary value constructor C is given using a pair of functions
(C?, C?): an injection and a projection. The injection is used to construct values in expres-
sions, and the projection is used to perform the data destructuring in pattern matching.
This is a very general mechanism, and allows many extensions. For example, if C1r and
Ct are arbitrary ML functions, we have views in the sense of [Wad87]. The injection and
projection functions of our mechanism are quite restricted, and can be efficiently compiled.
Moreover, we believe that our proposal provides most of the function that programmers ac-
tually desire. Non-nullary data constructors and exception constructors can also be treated
10
in this manner. Using ML-like notation, the functions corresponding to the data constructor
TERM defined above are
TERMt			=			fn x => TERM x
TERM?			=			fn (TERM x) => I			- => FAIL
where FAIL is a special value used to denote match failure. Similarly, the functions associ-
ated with the identifler LAMBDl in the template definition given above are
LAMBDAL = fn Cx, t) => TERMCLAMBDA?OP, [(lx], t)])
LAMBD? = fn CTERMCLAMBDA?OP, [Clx], t)])) => Cx, t) I - => FAIL
Note the appearance of LAMBDA?OP in these functions. Since the scope of LAMBDA?OP
may differ from that of LAMBDA, these functions must record the environment in which
they were defined. A template declaration, const id trivpat = patexp, executed when the
value environnient is VF and the projection environment is PE, associates an injection
clostre (trivpat, patezp, VE) with id in the value environment, and a projection ciosnre
(trivpat, pate:p, VE) with id in the projection environment. This discussion is formalized
by the following rule:
((SF, VF, PE, FE) ? constbind ? VE', PE')
(SE, VE, PE, FE) F con trivpat = pate:p(and constbind) ?
(con i' (trivpat,pate:p, VE)?(+ VE'), (con i? (trivpat,patexp, PE)?(+PE')
We choose not to use ordinary closures to emphasize the restricted nature of the functions
denoted.
Intuitively, the semantics of constants is straightforward. We associate each constant
identifler c with its value v. Nullary data constructors can be treated similarly: they are
associated with themselves. So too can nullary exception constructors: the evaluation of an
exception declaration generates a new exception name, with which the exception constructor
is associated.
In practice, to avoid the need to define equality on structured objects, we do something
slightly different. Nullary data and exception constructors are treated just as described:
they are bound to their value in both the value and projection environments. Constants are
bound to their value in the the value environment, and to a closure consisting of the right
side of their declaration and the current projection environment. Now, patteru matching
against a constant can be defined in terms of pattern matching against their right hand
side. This discussion is formalized by the following rule:
(SF,VE, PE, FE)Fpatexpin Exp?v((SF,VE, PE, FE)Fconstbind?VE',PE')
(SF, VF, PF, FE) F con = patezp(and constbind) ?
(con i, vl(+ VE'), (con i, (pate:p, PE)?(+PE')
11
The environment must be recorded to ensure that constructors appearing in a constant's
expansion are correctly interpreted. Once again, we could use ordinary closures, but choose
to use a special closure representation to emphasize the special, restricted nature of the
represented functions.
Projection declarations produce a binding in the projection environment, but no binding
in the value environment. The binding produced is analogous to that produced by the
constructor declaration with the same body. Variable declarations are treated exactly as
before, in particular they have no effect on the projection environment.
The most important aspect of the module system semantics is the appropriate definition
of signature-structure matching. Informally a signature ? matches a structure 5 if it is less
specific: that is, if it has fewer components, is less polymorphic, or elides the constant
nature of values. For example, the specification
val nil : int list
matches the standard list constructor nil (which has type `a list). The specification
const nil : int list
also matches the constructor nil. When a structure is constrained by a signature (e.g.,
when used as the argument to a functor), it is necessary to thin it by removing the unused
components. Thinhing also involves mapping constructors to values or projections by dis-
carding their projection or injection functions. It is to facilitate these operations that we
use separate environments to maintain the associations between identifiers and their values,
and between identifiers and their projections, rather than associating injection-projection
pairs with constructors in the value environment. The association between constructors
and their injections is maintained using the ordinary value environment. This ensures that
the appropriate binding are automatically available when a signature exports an injection
ouly view of a constructor.
The value of an identifier appearing in an expression is always determined by lookup in
the value environment. This contrasts with the Definition, where value constructors merely
evaluate to themselves without reference to the environment and exception constructors are
looked up in the exception environment.
We must include a rule in the semantics to give the value of an application e:p1 e?p2 in
which the value of e:p1 is an injection closure (trivpat, pate:p, VE). Here trivpat is treated
as a pattern, and matched with the value of e:p2. Because trivpat consists only of variables
and tupling, this match always succeeds in type correct programs. Matching produces
an environment that records the required association between values and the variables of
trivpat. The term pate:p is treated as an expression, and evaluated using this environment
12
and the environment VE stored in the closure. The resulting value is the value of the
application. This description can be formalized as follows:
E H e:p ? (trivpat,patexp, VE)			E ? ate:p ? v
f?,vFtrivpatinPat?VE'VE+VE'Fpate:pinExp?v'
E ? e:p atexp ? v'
Pattern matching proceeds as in the Definition, except in its treatment of constructors,
where the projection environment is used to give their meaning, rather than the current
approach in which the semantics of exception constructors is given using the exception
environment, and in which datatype constructors are given meaning without reference to
an environment.
First, we consider the matching of a value v against a nullary constructor C. If C is
bound to either a constructor or an exception name in the projection environment --H that is,
if it is declared by either a datatype or an exception declaration --H then v matches c if and
only if it is equal to the projection associated with C. This means that these constructors
receive the same interpretation as they do in the Definition, although the mechanism used
to assign this interpretation is rather different. Otherwise, C is bound to a constant closure
(pate:p, PE). To match a value against this sort of constructor, we treat pate:p as a pattern,
and match the value against it, using PE to interpret any constructors appearing in patexp.
The match against the constructor succeeds if and only if this match succeeds. No variable
bindings are produced. This description is formalized as follows:
PE' = PE of E PE'(tongcon) = (patezp, PE")
v F pate:p in Pat ? VE1/FAIL
E,v ? longcon ? VE'/FAIL
Matching a value v against a pattern C p, where C is a constructor and p is a pattern, also
requires that C be looked up in the projection environment. If C is bound to a constructor
or an exception name, then the match succeeds if and only if the value is a pair consisting
of a tag equal to the exception name or constructor associated with the constructor and
a value v' that matches p. The bindings produced are those produced by the match of v'
against p. Otherwise, C is bound to a projection closure (tiivpat, pate:p, PE). The term
patezp is treated as a pattern, and matched against v. If this match fails, so does the match
against C p. Otherwise, the set of bindings it produces, which includes bindings for every
variable of trivpat, is used to evaluate trivpat (interpreted as an expression). The resulting
value v', is matched against p. The match of v against Cp succeeds if the match of v'
against p does, and when successful it produces the same bindings, otherwise, it fails. This
informal description of pattern matching against template constructors is made formal by
the following rules:
PE' = PE of E			PE'(longcon) = (trivpat, pate:p, PE")
PE", vFpate:pin Pat?VE'VE'?trivpatinExp?v'E, v'Hatpat?VE11/FAIL
E, v F longcon atpat ? VE11/FAIL
13
PE' = PE of E			PE'(longcon) = (trivpat, patezp, PE")
v ? pate:p in Pat ? FAIL
v F Longcon atpat ? FML
In the static semantics, we must show how the newly added declarations assign types to
identifiers. We must also refine the definition of structure-signature matching to explicitly
maintain the status and arities of constructors,
We intend that constant and template declarations assign types to the constructors
similarly to the corresponding variable declarations. That is, we intend that the following
pairs of declarations assign the same type to id
val id = pate:p			and const id = pate:p
val id = fn trivpat => pate:p and coust id trivpat = pate;p
This requires that we refine the definition of scope of explicit type variables to account for
coust declarations, that we extend the definition of type closure, and that we introduce the
appropriate rules to the static semantics.
The definition of signature matching requires information about the status of identifiers,
that is whether they are projections, variables or constructors. In the Definition, this
information can be deduced implicitly: an identifier is a constructor if and only if it is
datatype constructor, in which case it appears in the entry for the datatype in the type
environment, or if it is an exception constructor, in which case it appears in the exception
environment. All other identifiers are variables. This implicit determination of identifier
status is impossible in our setting. Thus, we must maintain the status of variables explicitly
in the environment. We do this by maintaining a new environment, the stat? enrn'??nment
that associates identifiers with their statuses. The definition of signature enrichment is
refined to account for status information. For one status environment to enrich another,
the latter environment must associate fewer identifiers with statuses, and each identifier
given a status by the latter environment must be given a less restrictive status than in
the former environment. Constructor status is more restrictive than either projection or
variable status, neither of which is more restrictive than the other. Exception constructor
status is the most restrictive status of all. It is tempting to treat identifier status solely in
the static semantics, but doing so is quite messy because various syntactic conditions, in
particular, the linearity of patterns, depend on the status of identifiers.
For reasons discussed earlier, we must also explicitly maintain the arity of constructors
and projections. We include this arity information along with the status information in the
status environment.
14
8 Implementation
We have built a simple testbed implementation of our proposal. At compile time, template
identiflers are bound to (trivpat, pate:p) pairs (the environment component of an injection
or projection closure is coded in the representation choices for the constructors). When a
known template occurs in a pattern, we iniine expand it. Say that C is bound to (tp,pe)
and we have the pattern C p. We symbolically apply tp to p. This yields a substitution
on the variables of tp (which are the same as the variables occurring in pe). Applying the
substitution to pe yields a new pattern, which replaces C p.
For a functor parameterized by a structure with constructors, there are two implemen-
tation issues: what is the representation of the abstract constructors and how are they used
in patterns. To handle the first question, we use implicit structure members for the injec-
tion and projection functions. Note that these only need to be added when the structure is
made abstract (i.e., by functor application), and can be generated as part of signature thin-
ning. The injection function is an ordinary function, while the projection is a function that
returns either the sub-values or FAIL. When building a decision tree, the compiler treats
abstract templates as ordinary constructors, which allows merging of matches against them.
This ensures that when CONST is abstract, code such as
case v
o? coHsT(fl) => 0
COJST(a::b) => a
is compiled so that the projection function associated with CONST is called only once. A
test against an abstract constructor is a call to the projection function. Of course, the
use of functions to implement the construction and destruction associated with abstract
constructors may result in a certain degradation of performance. Of particular concern is
the loss of merging when two abstract constructors have common structure. (For example,
the templates LAMBDA and AP defined earlier share the structure TERM(OP -` : ). Any
value not matching this pattern cannot possibly match either of them). Other costs include
the loss of inline tests and the replacement of branch tables with trees of conditionals.
Note that these costs are incurred only when constructors are actually abstract, that is,
when functors are used. In particular no penalty is incurred when structures, which have
transparent signatures, are used rather than functors. Moreover if macro expansion is used
in the implementation of functor application (a reasonable thing to do when compiling a
production version of a system), then functors reduce to structures and abstract constructors
become concrete.
Our implementation of abstract constructors can also be applied to solve an outstanding
problem, described in more detail in [App90], with datatype representation and abstrac-
15
tion. In some implementations of SML, the representation of the datatype defined by
datatype d = A I B of t, depends on the representation of t. ff the representation of
t is appropriate, it is possible to represent the value B e?p with the representation of e:p.
Problems arise when t is abstract, since its representation is known at the definition of d
but not elsewhere. Extending our technique to cover abstract datatype constructors that
fall into the danger zone solves this problem. Although our solution to the problem incurs a
performance penalty, a less speedy program is better than one that does not run correctly.
9 Related work
Wadler's view mechanism ([Wad87]) shares the objective of allowing data abstraction and
pattern matching to cohabit. Views were once part of the Haskell definition ([HW88]),
but were dropped because of technical difficulties. Conceptually, a view of a type T, is a
datatype T' together with a pair of mappings between T and T'. Ostensibly these maps
are isomorphisms, but since they are defined by the user, there is no assurance that the
types are truly isomorphic. Views and templates differ in several significant ways. The
principal difference is that Wadier's views define maps between concrete representations,
whereas templates provide abstract views of a single representation. Because views define
different types, a given pattern match can involve only one view. In addition, once a view is
defined, it is not possible to add additional constructors (even if other representations admit
additional objects). Templates, on the other hand, do not suffer these restrictions. The im-
plementation of views uses the user defined maps to convert between representations; thus,
pattern matching can incur arbitrarily large performance penalties.2 In our scheme, most
uses of templates incur no run-time cost, and the worst cost associated with their use is that
they may force patterns to be matched serially rather than in parallel. Our presentation of
the semantics of templates is more detailed than that of views given in [Wad87]; further-
more, we address the semantic and implementation issues related to separate compilation
and parameterized modules.
The CAML system ([WAL+]) provides a mechanism for defining new concrete syntax,
by specifying a grammar to map quoted phrases to the internal representation of programs.
This mechanism could be used to implement our template mechanism, although the imple-
mentation details appear non-trivial. Recently, a qttotation mechanism has been proposed
for SML, which allows terms in some object language to be included in expressions ([Sli9l]).
This provides some of the syntactic convenience of our mechanism, but it provides no help
for pattern matching against terms of the object language.
2In fact, there is no guarantee that the maps even terminate.
16
References
[AM87]
A. W. Appel and D. B.. MacQueen. A Standard ML compiler. In F?nctional
Programming Lang?ages and Computer Architecture, volume 274 of Lecture Notes
in Computer Science, pages 301--H324. Springer-Verlag, September 1987.
[App90] Andrew W. Appel. A runtime system. Lisp and Symbolic Computation, 4(3):343--H
380, November 1990.
[AR92]
[HW88]
William E. Aitken and John H. Reppy. Abstract data constructors. In Proceedings
of the ACM SIGPLAN Workshop on ML and its Applications, pages 1--H11, June
1992.
Paul Hudak and Philip Wadler. Report on the functional programming language
haskell (draft proposed standard). Technical Report YALEU/DCS/RR-666, Yale
University, Department of Computer Science, December 1988.
[KR88j Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language.
Prentice Hall, Englewood Cliffs, N.J., 2nd edition, 1988.
[MT91] R. Milner and M. Tofte. Commentary on Standard ML. The MIT Press, Cam-
bridge, Mass, 1991.
[MTH90] R. Milner, M. Tofte, and R. Harper. The Definition of Standard ML. The MIT
Press, Cambridge, Mass, 1990.
?Nel91] Greg Nelson, editor. Systems Programming with Modula-3. Prentice-Hall, Engle-
wood Cliffs, N.J., 1991.
[Sli91] Konrad Slind. Varieties of object language embedding in standard ML, 1991.
unpublished.
[Wad87]
Philip Wadler. Views: A way for pattern matching to cohabit with data abstrac-
tion. In Conference Record of the 1?th Annual ACM Symposium on Principles of
Programming Languages, pages 307--H313, January 1987.
[WAL+j Pierre Weis, Marfa-Virginia Aponte, Main Laville, Michel Mauny, and Ascander
Suarez. The CAML Reference Manual (Version 2.6). Projet Formel, INRIA-ENS.
17
A Appendix
A.1 Introduction
In this appendix, we give the semantics of our proposed mechanism. This semantics is presented as
a modification of the formal semantics of Standard ML published in [MTH9O], referred to as "the
Definition," and [MT9ij referred to as "the ?omment?yfl For reasons discussed in the body of the
paper, our semantics assumes a language without references (but with exceptions). In section A.5,
we show how references may be partially restored to the language.
A.2 Syntax
A.2.1 Core
Two new reserved words, const and proj must be added to the language, so must a new class Proj
of identifiers, containing those identifiers that denote value projections. We define a number of new
phrase classes:
ConstBind			constant declarations
ProjBind			projection declarations
TrivPat			template arguments
TrivPatRow template arguments
AtPatExp			atomic template expansions
PatExpRow template expansion rows
PatExp			template expansions
We define these phrase classes, and extend a few phrase classes of the Definition as follows: (The
symbol +::--H--H is used to denote extension of existing phrase classes.)
conatbind			::--H--H			con (trivpat) = pate:p(and conatbind)
projbind			::=			proj (trivpat) = pat(and projbind)
atpate:p
scon			special constant
(op) var			variable
(op)Longcon			value constructor
(op) Longe:con			exception constructor
? (pate:prow ) ?			record
trivpat : ty			typed
(pate?p)
pate:prow ::= Lab = pate:p(, pate:prow)
pate:p . .--H
dec
atpate:p			atomic
(op) iongcon atpate:p			value construction
(op) Longe:con atpate:p			exception construction
pate:p1 con pate:p2			infixed value construction
patezp1 e:con pate:p2			infixed exception construction
pate:p :
+::= conat con'tbind			constant declaration
proj projbind			projection declaration
+::= (op)Longproj			projection
atpat
18
pat			+::= (op) iongproj atpat			projection
pat1 proj pat2			infixed projection
tri'vpat			. . --H
(op) var			variable
? trivpatrow+			record
trivpat : ty			typed
(trivpat)
trivpatrotv			tab = trivpat(, trivpatrotv)
Certain new syntactic restrictions are required:
o+ No template argument or template expansion may contain the same variable twice.
In a constant declaration con = pate:p, no variables may occur in patezp, while in a constant
declaration contrivpat = patezp, the set of variables occurring in trivpat must be identical to
the set of those appearing in pate:p.
In a projection declaration proj = pat, no variables may occur in pat, while in a projection
declaration proj trivpat = pat, the set of variables occurring in trivpat must be identical to the
set of those appearing in pat.
0 No constant or projection declaration may bind the same identifier twice.
A.2.2 Modules
Two new phrase classes are required.
ConstDesc			constant specification
Proj Desc			projection specification
They are defined, and the necessary changes made to the Definition's phrase classes by the following
productions.
spec
+::--H--H const constdesc			constant specification
proj projdesc			projection specification
constdesc			::--H--H			con : ty(o? ty)(and constdesc)
projdesc			::=			proj : ty(oi ty)(and projdesc)
One new syntactic restriction is required.
o+ No constant description constdesc or projection description projdesc may describe the same
identifier twice.
19
A.2.3 Derived Forms
Certain new derived forms, analogous to derived forms given by the Definition are introduced.
Template Arguments trivpat
(trivpat1,...,trivpat?)?t=trivpat1,--?n=trivpat?? (n?>2)
Template Argument Rows t'ivpatrow
id : ty? (, trivpatrow, id = id :ty , trivpatrow
Template Expansions patezp
(pate:p1,			, pate:p?)			?t = pate:p1, . ,n= pate:p?>			(n >2)
-			pate:p1 ::			:: pate:p? :: nil (n > 0)
Lpate:p1,,patezp?i
Template Expansion Rows pate?prow
id :ty , paterproin id = id : ty? , patezprow',________
A.2.4 Identifier Status
The status of an identifier, that is whether it is an exception constructor, an ordinary constructor,
a variable, or a projection is determined by the rules of Appendix B of the Commentary and the
following new rules. Rule 4 of Appendix B must be modified as below to account for the possibility
that an identifier is already declared as a projection. A new status p is added for projections.
3'. A const declaration (specification) assigns id : c for each id which it declares (specifies).
3". A proj declaration (specification) assigns id : p for each id which it declares (specifies).
4. A pattern pat assigns id : v for every id which does not already have c, p or e status and
occurs in pat.
4'. A template argument trivpat (template expansion pate:p) assigns id : v for every id which
does not already have c, p or e status and occurs in trivpat (pate:p).
A.3 Static Semantics
A.3.1 Core
We define a new class of simple semantic objects called Status. We use it to represent the status
and arity of identifiers. It has seven members: one for variables, and two for each of constructors,
projections and exceptions (one for each possible arity).
3 E Status = fv,cQcI,pO,pI,eO,eI?
This set is ordered by the partial order E which is defined as the least partial order in which the
following hold: v E cO, po E cO, cO E sO v E cl, pi E cl, and cl E el.
Only one new set of compound semantic objects is needed, the status environments StatEnv.
These are finite mappings from identifiers to their statuses. We must also modify the definition of
a (static) environment to include the status environment.
XE E StatEnv = Id ? Status
E or (SF, TE, VE, FE, XF) E Env = StrEnv x TyEnv x VarEnv x ExConEnv x StatEnv
20
Type variables may occur in constant and projection declarations, so we must explain their
binding structure. Recall that constant and projection declarations are typed analogously to value
declarations. This means that we must ensure that free type variables occurring in constant and
projection declarations become bound at the correct point. For this reason, we must extend the
definitions of unguarded type variables and type variable scope that appear in Section 4.6 to account
for projection and constructor declarations. An occurrence of a type variable a in a value, projection
or constant declaration dec is unguarded if it is not part of a smaller value, projection or constant
declaration contained in dec. Since constant and projection declarations never contain declarations,
this means that every type variable occurring in a constant or projection declaration is unguarded. A
type variable a is scoped at a given value, projection or constant declaration if it appears unguarded
in that declaration, but not in any larger value declaration containing that declaration. Associated
with each value, projection and constant declaration is a (finite) set of the explicit type variables
that are scoped there. We think of these declarations as being implicitly decorated with this set,
which we indicate in the static semantics by subscripting the reserved words const, proj and val.
Because we intend the static semantics of constant declarations to be easily explicable in terms
of the static semantics of variable declarations, we must be careful in closing an environment VE
that arises from a constant declaration. Since no constant declaration may bind the same identifier
twice, for each constructor con bound by VE there is a unique con (trivpat) = pate:p that binds con.
If VE(con) r, let C105c,constbindVE(COfl) = Va(k).r, where
a(k) =			tyvars T \ tyvars C			if patezp is a variable or constructor, or trivpat is present.
apptyvars r \ tyvars C			otherwise.
Similarly, for projection declarations, each proj that is bound by VE is bound by a unique clause
proj (trivpat) = pat. If VE(proj) = r, let Closc,pwj?j1L?VE(pr0j) = Va(k).r, where
tyvars r \ tyvars C			if pat is a variable, constructor, or projection,
or trivpat is present.
apptyvars r \ tyvars C			otherwise.
We adopt the convention that for any iongid = strid1. . . stridk . id and any environment E,
E(iongid) = (VE'(id),XE'(id)) where VE' and XE' are defined by
VE'			=			VE of ((SE of ... ((SE of E)stridi).. .)stri4)
XE'			=			XE of ((SE of ... ((SE of E)stridi).. )strid?).
In what follows, inference rules that are numbered are to be read as replacements for the cor-
responding rules in the Definition, and those without numbers are to be read as additional rules.
Unless explicitly mentioned, the remaining rules of the Definition should be used without change.
Declarations			[Aec?\
C+UHvatbind?VE,XEVE'=O105C,,,aThindVEUntyvarsVE'=
C H valo vaibind ? (VE',XE) in Env			(17)
C+UHconatbind?VE,XEVE'=Cl0SC,co?tbtndVEUntyvarsVE'=
C ? constu constbind ? (VE',XE) in Env
C+UHpiojbind?VE,XEVE'=Cl0SC,projbindVEU ntyvarsVE'=
- proju projbind ? (VE',XE) in Env
Ce TE H datbind ? VE, TE,XE			V(t, GE) E Ran TE,t ? (T of C)
TE maximizes equality
C ? datatype datbind ? (VE, TE,XE) in Env
(19)
21
C e TE H datbind ? VE, TE, XE			V(t, GE) c Ran TE, t ? (T of C)
C e (VE, TE, XE) H dec ? E			TE maximizes equality
C H abstype datbind with decend ? Abs( TE, E)
CH e:bind ? EE,XE			VE EE
(20)
(21)
C H exceptione:bind ? (YE, EE, XE) in Env
The only changes made in these rules were those required to propagate the status environment.
The rule for constant declarations, together with the rule given below for the static semantics of
constant bindings ensures that the following pairs of declarations assign the same type to id (recall
that PatExp C Exp and TrivPat C Pat).)
val id = pate:p			and const id = pate:p
val id = fn trivpat => pate:p and const id trtvpat = patezp
Value Bindings			[GHvatbind?VE,
G?pat?(VE,XE,r)GHe:p?r(GHvalbind?VE',XE') (26)
C H pat: e:p( and vaibind) ? VE(+VE'), XE(+XE')
G+VEHvaibind?VE,XE			(27)
CH recvalbind ? VE,XE
Once again the only changes are those required to propagate the status environment.
Constant Bindings			G H constbind ? VE, XE
(C H trivpat in Pat ? (VE, XE, r')) G (+( VE, XE)) ? patezp in Exp ? r
H con'tbind ? VE', xE'?)
C H con(trivpa4) = patezp((and constbind)) ?
?con :? r?(+fcon:: r'			r1)((+VE')),?con ,?: co?(+?con ?
This rule formalizes the earlier discussion of the static semantics of const declarations. The trivpat
(if present) is treated as a pattern, and the pate:p is treated like an expression. Note the assignment
of constant status and the appropriate arity (1 if trivpat is present, 0 otherwise) to each constant
bound.
Projection Bindings			C H projbind ? VE, XE
(GH trivpat in Pat => (VE,XE,r')) GH pat ? (VE,XE,r)
H p?jbind ? VE', xE'))
C H proj(trivpat) = pat((and projbind)? ?
?p?oj :? r?(+?proj:: r' r?)((+VE')),?proj?: po?(+?proj pll)((+XE'))
Note that the same variable and status environments are produced for both trivpat and pat. This
ensures that their variables are typed identically. Recall that the syntax requires that they contain
exactly the same variables. Note the assignment of projection status and the appropriate arity to
each declared projection identifier.
C Hdatbind?VE,TE, XE
tyvarseq --H a(k)			C, a(k)t H conbind => GE, XE
?datbind?VE,TE, XE'V(t',GE)ERanTE,t#t')			(29)
G H tyvarseq tycon = conbind ( and datbind) ?
Clos CE(+ VE), ?tycon			(t, ClosGE)?(+ TE), XE(+XE')
22
Datatype Bindings
This rule propagates the state environment produced by its hypothesis. It is otherwise the same as
the corresponding rule in the Definition.
Constructor Bindings
datbind ? CE,XE
H tv ? Y?			((C,r H conbind ? CE,XE??			(30)
C,TH con(of t??(( I conbind)) ?
(con			r?((con			r'			r))((+CE)),(con :? c01((con H+
Note the assignment of constructor status, and the appropriate arity to each declared datatype
constructor. Otherwise, this rule is the same as the corresponding rule in the Definition.
Exception Bindings			CHe:bind?EE, XE
H ty ? r r is imperative) ((C H e:bind ? EF,XE))
C H e:con(o? ty)((and e:bind)) ?
(e:con			exn?(+(e:con			r			exn?)((+EE)),(e:con			eo?(+(e:con			eIl)((+XE))
(31)
(32)
C(Longexcon)=(r,s)(C?e:bind?EE,XE)
C H e:con = Longe:con(and ezbind) ?
(e:con			r)(+EE),(e:con:: s?(+XE)
Note the assignment of exception status in rule 31. In rule 32, s is either e0 or el, because excep-
tion constructors are declared (specified) only by exception declarations (specifications), and these
always assign exception status. Otherwise, these rules are the same as the corresponding rules in
the Definition.
Atomic Patterns
CH --H? ((),(?r)
C H scon ? ((),(?,type(scon))
C H var ? ((var			ri,(var H4 v?, r)
C(longcon)=(QcC)??r
CH Longcon ?
C(iongproj)=(QpO)??T
C H tongproj ? ((l?(?,r)
C(longe:con) = (eni, eC)
C ? ?nge:con ? ((l?(),exn)
(C H patrow ? (vE,xE,?)
C H ?(patrow)1 ? ((l(+VE),(1(+xE),(i(+? in Type)
CF pat ? (VE,XE,r)
CF atpat ? (VE,XE,r)
(33)
(34)
(35)
(36)
(37)
(38)
(39)
CF (pat) ?(VE,XE,r)
Rule 35 and rule 46 (below) are the only rules concerning patterns that assign status to identifiers.
Both only assign variable status. Note how the three rules for constructors, and exceptions con-
structors use the status to ensure that the identifier really is nullary. The only changes to rules 33,
34, 38 and 39 are those required to propagate the status environment properly.
23
Pattern Rows
WH patrow ? (vE,xE,T\)
(40)
CH pat ? (VE,XE,r)
? patrow ? (vE',xE',?			tab ? Dom?
C H tab = pat(, patrow) ? (vE(+YE'),xE(+xE'),ftab H* r?(+?) (41)
The only changes to these rules are those required to propagate the status environment properly.
Patterns			CH pat ? (VE,XE,T)
CHatpat?(YE,XE,v)
CH atpat?(YThXThr)
C(tongcon)=(?c1)??v'?vC?atpat?(YE,XE,v')
C H Longcon atpat ? (VE,XE,v)
C(tongproj)=(a,pI)cT?v'?+vCHatpat?(VE,XE,v')
C H longproj atpat ? (YE,XE,v)
C(longe:con)=(vexn,el)CHatpat?(YE,XE,v)
C H Longe:con atpat ? (YE, XE, exn)
C?pat?(YE,XE,v)CHii'?v
CHpat : ty?(VE,XE,v)
C H pat ? (YE,XE,v)			(CH ty ? v)
(42)
(43)
(44)
(45)
(46)
c id(:ty)aspat?(YE+fid??v1,XE+?id?v?,v)
Note how the three rules for constructors, projections, and exception constructors use the status to
ensure that the identifier really does accept an argument. Rule 46 has been modified to generate a
variable status for the identifier it binds, and to propagate the status information derived from its
sub-pattern. Rules 42 and 45 now propagate the status environment but are otherwise unchanged.
A.3.2 Modules
We need to refine the definition of enrichment so that it accounts for the status environment. All
that needs to be changed is the definition of environment enrichment, and the only change required
there is the addition of a further clause describing the appropriate relationship between the two
status environments. An Environment E1 (5E1, TE1, YE1, EE1, xE1) enriches an environment
E2 = (SE2, TE2, YE2,EE2,xE2), written E1 ? E2 if:
1. DomSE1 ? DomSE2 and SE1(strid) ? SE2(strid) for each strid E DomSE2.
2. Dom TE1 D Dom TE2 and TE1(tycon) ? TE2(tycon) for all tycon E Dom TE2.
3. Dom YE1 D Dom YE2 and VE1(id) ? VE2(id) for all id E Dom YE2.
4. DomEE1 ? DomEE2 and EEi(e:con) = EE2(e:con) for all e:con E DomEE2.
5. DomXE1 D DomXE2 and XE1(id) m XE2(id) for all id E DomXE2.
Otherwise, the only changes required are in the semantics of specifications: we show how the
status of identifiers is inferred and propagated.
24
Specifications
Value Descriptions
C of B H constdesc ? VE, XF
B H const constdesc ? (Clos VE, XF) in Env
C of B H projdesc ? VE, XF
B H proj projdesc ? (Clos VE, XF) in Env
C of B H datdesc ? TE, VE, XF
B H datatype datdesc ? (TF, Clos VE, XF) in Env
Cof B H e:desc ? FF,XF
B H exception e?desc ? (??, ??, FE, FE, XF)
H spec ? FJ
(73)
(74)
Fc Hvaldesc?VE, XF
(82)
CFty?r			(CHvaLdesc? vF,XF)
C H var : ty(and vaLdesc? ? fvar :? rl(+ VE), ?var v?(+XF)
Constant Descriptions			CHconstdesc?VE,XF
C H ty ? r ? ty' ? r') ((C H constdesc ? VE, XE))
C H con : ty(of ty')((and constdesc)) ?
fcon			r?(+?con			r1			r?((+VF)),?con:: cO?(+?con			clb((+XF))
Projection Descriptions			Fc Hp??jdesc ?VE, xEl
C ? ty ? r ? ty' ? Y) ((C ? projdesc ? VE, XF))
C H proj : ty(oi ty')((andprojdesc)) ?
?proj H+ r?(+?proj r' r?)((+VE)),?proj i,' po?(+?pr0j pIl)((+XE))
Datatype Descriptions			CHdatdesc?TE,VE,XE
tyvarseq --H a(k)			c, a(k) ? condesc ? CF, XF
(C H datdesc ? TE, VE,xF') (84)
C H tyvarseq tycon = condesc(and datdesc) ?
ftycon (t, ClosCE)?(+ TE), ClosCE(+ VE), XF(+XF')
Constructor Descriptions
? ty ? r') ((Cr ? condesc ? CF, XF))
C,rH con(oi ty)((Icondesc)) ?
(con:: r?(+(con:: r' H r?)((+CF)), (con H+ cOl(+(con:: cI?)((+XF))
C, r?H condesc ? CF, XF
(85)
Exception Descriptions			C e:desc ? FE, XF
H ty ? r tyvars(r) ?) ((C H e:desc ? FE, XF))
C H e:con(os ty)((and e:desc)) ?
(e:con :: exnl(+(e:con I: r exn?)((+FF)),(e:con:: eOi(+(ezcon:: et?)((+XF))
(86)
25
A.4 Dynamic Semantics
A.4.1 Core
The values that expressions may yield are changed. We remove Addr and : = because they are
required only for the support of references. We also need to add a class ConClosure of values to
represent the values of templates. There is no need for a new class to represent the values of symbolic
constants, since they are evaluated when they are declared.
We also need to define an environment in which constant identifiers are mapped to their pro-
jection value. The possible values include constructors, exception values, and projection values for
symbolic constants. The projection values of symbolic constants are represented using two classes,
Constant and ProjClosure; the first is used for unparametrized constants, and the second, for tem-
plates.
(trivpat, pate:p, PE) ?
(trivpat,pate:p, VE) ?
(pate:p,PE) E
PE
(SE, VE,PE,EE) orE E
Atomic Expressions
Val = SVal U BasVal u Con
u (Con x Val) u ExVal u Record
u Closure u ConClosure
ProjClosure = TrivPat x PatExp x ProjEnv
ConClosure = TrivPat x PatExp x VarEnv
Constant = PatExp x ProjEnv
Proj = Constant u Con U ExVal U ProjClosure
ProjEnv = (Con u ExCon u Proj) ? Proj
Env = StrEnv x VarEnv x ProjEnv x ExConEnv
VE(iongvar) = v
(SE, VE, PE, EE) ? iongvar ? v
VE(Longcon) = v
(SE, VE, PE, EE) H longcon ? v
VE(longe:con) = v
(SE, VE,PE,EE) H Longescon ?v
H ate:p ?
(104)
(105)
(106)
We consistently obtain the value of an identifier by looking it up in the vaLve environment. We include
all three rules for emphasis, but in fact only rule 105 is changed (while a note in the Definition states
that exceptions are to be looked up in the exception environment, the value of every expression must
be stored in both the value and exception environments, so this note may be safely ignored).
EHe:p?conEHate:p?v
E H ezp aie:p ? (con, v)
E H e:p ? (trivpat, pate:p, VE)			E ? ate:p ? v
vHtrivpatinPat?VE'VE +VE'HpatezpinExp?v'
E H e:p ate:p ? v'
e:p ? vIp
(112)
Expressions
26
Rules 114 and 115, which give the semantics of re? and need to be eliminated. For the same
reason, the restriction that the constant not be rei in rule 112 is dropped. The second rule gives
the semantics of template application. Note the use of the fact that both TrivPat and PatExp are
subsets of Pat A Exp to avoid the need for new rules to explain their semantics.
Declarations
E H va?bind => VE
EHvalvaibind ? (?l? VE,fl,?l)
E H conatbind ? VE, PE
E H const constbind ? (??, VE, PE, f?)
E F projbind ? PE
EHprojprojbind ? (fl,fl,PE,?Y)
H datbind ? GE
E H datatype datbind ? (f?, GE, GE, (?)
E H e:bind ? EE
H dec ? m\'i?
(129)
(130)
E H exceptionezbind ? (??, EE, EE, EE)
These rules are concerned with ensuring that the projection (and value) environments are correctly
maintained. Note that these rules take into account the discussion in section 2.9 of the Commentary.
Constant Bindings			[EHconstbind?VE,PE/p
(SE, VE, PE, EE) H pate:p in Exp ? v ((SE, VE, PE, EE) H conatbind ? VE', PE')
(SE, VE, PE, EE) H con = pate:p(and constbind? ?
fcon:: v?(+ VE'), ?con (pate:p, PE)?(+PE')
((SE, VE, PE, EE) H conatbind ? VE', PE')
(SE, VE, PE, EE) H con trivpat = paterp(and constbind) ?
fcon :. (trivpat, pate:p, VE)?(+ VE'), fcon:: (trivpat, pate:p, PE)l(+PE')
Two rules describe the semantics of symbolic constant declarations, one for templates and the other
for constants. Note that constants are evaluated as soon as they are declared.
Projection Bindings
E ? projbind ? PEIp
((SE, VE, PE, EE) H p?jbind ? PE')
(SE, VE, PE, EE) H proj = pat(and projbind) ? fproj (pat, PE)?(+PE')
((SE, VE, PE, EE) H projbind ? PE')
(SE, VE, PE, EE) H proj trivpat = pat (and prolbind) ? ?proj :?` (trivpat, pat, PE)l(+PE')
Two rules describe the semantics of symbolic projection declarations, one for nullary projections
and the other for non-nullary projections
Datatype Bindings
H conbind ? GE			(H datbind ? GE')
H t?varseq tycon = conbind(and datbind) ? GE(+GE')
datbind ? GEIp
This rule, which gives the dynamic semantics of datatype declarations, is required, in essentially this
form, even in the semantics of the Definition. It is included here only because it does not appear in
the Definition. See section 2.9 of the Commentary, for a discussion of the issues involved.
27
Constructor BIndings
conbind?cE/p]
conbind ? CE'?
H con((o? ty??(I conbind? ? (con			con)(+CE')
Like the rule that gives the dynamic semantics of datatype declarations, a rule of this form is required
in the semantics of the Definition, but not included there. See section 2.9 of the Commentary, for a
discussion of the issues involved.
Atomic Patterns
E,vHatpat?VE/FAIL1
PE'FEofEPE'(Longcon)=conv=con 144
v H Longcon ?
PE'=FEofEPE'(Longcon)=convcon (145)
E,v H Longcon ? FAIL
=FEofEpE?(longcon)=env=en
v H Longcon ? (1
PE'=PEofEFE'(Longcon)=env#en
E, v H Longcon ? FAIL
FE' = FE of E			FE'(iongcon) = (pate:p,FE")
FE", v H pate:p in Pat ? VE'/FAIL
E,v H tongcon ? VE/FAIL
FE'=FEofEFE'(iongproj)=conv=con
E,vH Longproj ?
FE'=FEofEFE'(Longproj)=convcon
E,v ? iongproj ? FAIL
FE'=FEofEFE'(iongproj)=env=en
E,v H Iongproj ?
FE'=FEofE FE'(tongproj)=en v#en
E,v H Longproj ? FAIL
FE' = FE of E			FE'(tongproj) = (pate:p,FE")
FE", v ? patezp in Pat ? VE'/FAIL
v ? Longproj ? VE'/FAIL
FE'=FEofEFE'(Longe:con)=env=en
v H ionge:con ?
FE'=FEofEFE'(longe:con)=enven
(146)
(147)
E, v ? Longe:con ? FAIL
The first five rules give the only substantive change. The remaining rules echo these rules for
projections and exception constructors. We cannot simply use longid because variables are treated
differently. Note the use of the projection environment in the interpretation of constants. In rule *,
VE' is always empty, because pate:p contains no variables.
28
Patterns
PE' FE of E FE'(Longcon) = con
v=(con,v')E,v'Hatpat?VE1/FAIL
v F Longcon atpat ? VE1/FAIL
FE'=FEofEFE'(Longcon)=conv?fconlxVal
v H iongcon atpat ? FAIL
FE' = FE of E FE'(Longcon) = en
v=(en, v')E, v'Hatpat?VE1/FAIL
v H iongcon atpat ? VE'/FAIL
FE'=FEof EFE'(iongcon)=env??en1xVal
E,v k
FE' = FE of E
FE", v H pate:p in Pat ?
[E,vHpat=>VE/FAILl
longcon atpat ? FAIL
FE'(Longcon) = (trivpat, pate:p, FE")
VE' H trivpat in Exp ? v' E, v' ? atpat ? VE11/FAIL
v H Longcon atpat ? VE"/FAIL
FE' = FE of E FE'(tongcon) = (trivpat, pate:p, FE")
FE". v H pate:p in Pat ? FAIL
E7v F Longcon atpat ? FAIL
FE' = FE of E			FE'(Longproj) = con
v=(con, v')E, v'Hatpat?VE'/FAIL
v H Longproj atpat ? VE'/FAIL
FE' = FE of E FE'(Iongproj) = con v ? ?con1 x Val
v H iongp'oj atpat ? FAIL
FE' = FE of E FE'(iongproj) = en
v = (en,v') E,v' H atpat ? VE'/FAIL
v H Longp?j atpat ? VE'/FAIL
FE' = FE of E FE'(longproj) = en v ? fen? x Val
FE' = FE of E
FE", v H pate:p in Pat ?
v ? ?ngproj atpat ? FAIL
FE'(Longp?j) = (trivpat, pate:p, FE")
VE' F trivpat in Exp ? v' E, v' H atpat ? VE"/FAIL
v ? tongp'oj atpat ? VE"/FAIL
FE' = FE of E FE'(longproj) = (trivpat, patesp, FE")
FE" v H pate:p in Pat ? FAIL
v H iongproj atpat ? FAIL
FE' = FE of E FE'(Longe:con) = en
v = (en, v') E, v' F atpat ? VE'/FAIL
v H Longe:con atpat ? VE'/FAIL
FE'=FEof EFE'(1onge?on)=env??en?xVal
E,v
(154)
(155)
(154)
(155)
(156)
(157)
longe:con atpat ? FAIL
Rule 158, which gives the semantics of re? when it appears in patterns, must be omitted. The
first five rules above give the only other substantive changes. The remaining rules echo these rules
for projections and exception constructors. We could have just used longid (since the syntax of the
language doesn't allow variables to be applied in patterns), but chose not to for the sake of symmetry
with the atomic case. Again, note the use of the projection environment in the interpretation of
constants.
29
A.4.2 Modules
The only changes required here are some minor modifications to the way interfaces are defined and
built to ensure that structures are cut down appropriately by signatures.
Interfaces now contain an injection (value) and a projection component.
(IF, vars, projs, e:cons) or I E Int = IntEnv ? Fin(Id) x Fin(Id) x Fin(ExCon)
Inter and ? are changed to handle projection environments and the projection component of inter-
faces.
Inter(SF, VE, PE, FE) = (IF, Dom VE, DomPE, DomEF)
where IF is as before, IF = ?strid , InterE; SE(strid) = Fl.
(SF, VE, PE, FE) ? (IF, vars, e:cons, prois) = (SF', VE', PE', FE')
where SE' is as before, SF' = ?strid :?` E I I; SF(strid) = F and IF(strid) = I?, and
VE'			=			VElvars
PE'			=			PEiprois
FE'			=			FE I e:cons
with I now denoting function domain restriction.
We also provide the following semantic rules that ensure that the appropriate identifiers are
added to the interface for each specification.
Specifications
Constant Descriptions
Projection Descriptions
H valdesc ? vars
IB H val va?esc ? (f?, vars, ?, ?)
H constdesc ? cons
IB H const constdesc ? (?1 cons, cons,
H proideac ? prois
IB H proj projdesc ? (f1? ? prois ?)
C H datdesc ? cons
IB H datatype datdesc ? (?l? cons, cons,
C H e:desc ? e:cons
IB H exception e:desc ? (?1? e:cons, e:cons, e:cons)
(H constdesc ? cons?
H con(and constdesc) ? con(Ucons?
(H projdesc ? prois)
H proj (and projdesc) ? proj (Uprois)
30
[IF H spec ?_II
(176)
(177)
constdesc?conal
[H projdesc ? projs?
Datatype Descriptions
Constructor Descriptions
A.5 References
%Fdesc ? consi
Hcondesc?cons(Hdatdesc?cons')
F condesc(and datdesc) ? cons(Ucons?)
(H condesc ? cons)
H con( I condesc) ? con(Ucons)
condeac?consi
As commented at the beginning of the appendix, we have restricted our attention to a language
without references. This is actually a greater restriction than is really needed. It suffices to treat
re? as a variable rather than a constructor. The required changes to the dynamic semantics presented
earlier are quite simple. First, addresses and the special value : = must be restored to the set of
values. Additionally, two new special values are required, ref and !. Rules 114 and 115 are restored.
A new rule for is also required.
s,EHezp?s,EFate:p?a
s, E H e:p ate:p ? s(a)
The initial status of ref is v rather than c, the initial value environment, VE01, is defined by
vEo'=fid??id;id?Basval?+f:=?:=1+?ref?ref1+?!?!1.
The Definition's declaration of ! in terms of ref, which is no longer valid, is also removed.
31
