BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR93-1401
ENTRY:: 1994-03-17
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: AML: Attribute Grammars in ML
AUTHOR:: Efremidis, Sofoklis G.
AUTHOR:: Mughal, Kahlid A.
AUTHOR:: Reppy, John H.
DATE:: December 1993
PAGES:: 27
ABSTRACT::
Attribute grammars are a valuable tool for constructing compilers and 
building user interfaces. This paper reports on a system we are developing, 
called AML (for Attribution in ML), which is an attribute grammar 
toolkit for building such applications as language-based programming 
environments using SML. This system builds on the proven technology of 
efficient attribute evaluation, while using a higher-level foundation for the 
implementation of interactive systems. It supports a general and uniform 
platform for building applications that can manipulate attributed terms and 
allow access to attribute values. We describe the design of the AML system, 
its current implementation status, and our plans for the future.
END:: CORNELLCS//TR93-1401
BODY::
AML: Attribute Grammars in ML*
Sofoklis G. Efremidis
khalid A Mughal
John H. Reppy
TR 93-1401
December 1993
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
This report is available as Report No. 89, Reports in Informatics, University of
Bergen.
AML: Attribute Grammars in ML*
Sofoldis 0. Efremidis
Cornell University
Kilalid A. Mughal
University of Bergen
John II. Reppy
AT&T Bell Laboratories
December 1993
o+This reportis available as Technical Report 93-1401,DepartmentofComputer Science, Cornell University
and as Report No. 89, Reports in Infor'matics, University of Bergen.
AML: Attribute Grammars in ML
Sofoklis 6. Efremidis*
Cornell University
sofoklis?cs cornell. edu
Khalid A. Mughalf
University of Bergen
khalld?ll .uib.no
December 17, 1993
Abstract
John H. Reppy
AT&T Bell Laboratories
jhr?research. att corn
Attribute grammars are a valuable tool for constructing compilers and building user inter-
faces. This paper reports on a system we are developing, called AML (for Aunbution in ML),
which is an attribute gramnur toolkit for building such applications as language-based pro-
gramming environments using SML. This system builds on the proven technology of efficient
-ttrit?1llatichile ?iqin? s h-			.1 f?			.			- -			-
a.?w CVL????Jfl, w??			- .JgneHeve? ?anciation tor tne Implementaflon otinteractlve
systerus. It supports a general and uniform platform for building applications that can manipu-
late attributed terms and allow access to attribute values. We describe the design of the AML
it
system, -? current Implementation status, and our plans tor the ftture.
1 Introduction
Attribute warnmars provide a formalism for assigning meaning to parse trees of a context-free
language [Knu68]. Because of their syntax-directed form and declarative style, they provide a
useful notation for specifying compilers [KHZ82] and language-based editors [RT88j. This paper
reports on a system we are developing, called AML (for Attributirn? in ML), which is an attribute
granunar toolkit for building applications such as language-based editors using SML [MTH9O].
AML is a spiritual heir to the Synthesizer Generator project at Cornell University [RT88],
which focused on efficient incremental evaluation techniques, and the Pegasus project at AT&T
Bell Laboratories [RG86J, which focused on providing a high-level foundation for interactive
systems. In our system, we are building on the evaluation technology of the Synthesizer Generator,
while using a higher-level foundation for the implementation.
In the next section, we give some background about attribute grammars. Then we describe
our specification language for attribute grammars, followed by a description of the internals of our
system. Lastly, we discuss the project's status and future plans. An earlier description of this project
was presented in [EMR92].
This work was supported, in part, DARPA-ONR Grant N()()O14-9l4-4123
tsupport for this work provided, in part, by the Norwegian Research Council and the University of Bergen. The
author would like to express his sincere thanks to the Dept. of Computer Science, Cornell University, for giving him the
opportunity of spending his sabbatical (academic year 91-92) at Cornell.
Let: eo			ID el e2
Const: eo ::= NUM
Use:			?			::=			ID
Sum:			eo			::=			ej e2
ei.env			=			e().env
e2.env			=			insert(eo.env,(ID, ei.valne))
eo.value			=			e?.value
eo.val'ue			=			NUM
eo.value			=			lookup(e0.emv.ID)
ei.env			= e().env
env			=			.env
value = e1.value + e2.value
Figure 1: A simple attribute grammar
2 Attribute grammars
An attribute grammar is a context-free grammar (CFO), together with a set of attributes for each
nontermmal and a set of attribute evaluation rules for each production. An attribute is either
synthesized or inherited. For each production p : X0 : := .... . Xnp there are evaluation rules that
define the synthesized attributes of X0 and the inherited attributes of X1.... Xnp. These attributes
are known as the output attributes of p. Each evaluation rule defines the value of an output attribute
in terms of other attributes in the production. Systems based on attribute grammars tend to extend
this basic model in various ways. Two common extensions, which are supported by AML, are local
attributes and syntactic references. Local attributes are attributes that are associated with a specific
Syntactic references are references to grammar symbols in the attribute evaluation
production.
rules.
Figure 1 gives an example of a simple attribute grammar. There is a single nonterminal (e)with
two attributes (value, env), and four productions (Let, Const, Use, Sum). The symbols NUM
and ID are terminals, representing integer literals and identifier names. The attribution rules are
given to the right of the productions. This grammar computes the value of expressions involving
integer constants, addition, and a simple variable scheme. Note the use of references to the terminal
values. The environment is passed down the expression tree using the inherited env attribute, and
the resulting value is passed up via the synthesized value attribute. An expanded version of this
grammar is given as an AML specification in Appendix B.
2.1 Attribute evaluation
Each node of a parse tree in the underlying CFO is labeled with instances of the attributes associated
with the nonterminal at the node. The evaluation rules define a set of constraints on the attribute
mstances, and computing the attribute values can be viewed as a constraint solving problem. An
2
Inherited attributes
Synthesized attributes
Semantic flirictions
Figure 2: The attributed tree for the expression `?1et x = 1 in x + 2 end"
attributed tree is said to be consistent if its attribute values satisfy the constraints defined by the
attribution rules. Figure 2 gives the attributed tree for the expression:
let x = 1 in x + 2 end
using the grammar from Figure 1. The graph formed from the attribute instances and the dependen-
cies between them is called the attribute dependency graph. Most evaluation strategies require that
the dependency graph be acyclic, although there are fixed-point techniques for handling grammars
with cyclic dependencies [Far86, WJ88, Jon9O].
The simplest way to consistently attribute a tree is to topologically sort the attribute dependency
graph, and then to evaluate the semantic rules in topological order. This strategy guarantees that the
mputs to a semantic rule will be available when the rule is evaluated. For many grammars, however,
more efficient and specialized evaluation strategies can be used. Attribute grammars are classified
by their evaluation strategies; for example, the parser generator yacc implements a grammar in
which all attributes are synthesized and can be evaluated in a singie bottom-up pass.
Techniques for evaluating attribute grammars can be divided into dynamic and static [Alb9la].
Dynamic evaluators use the dependency graph of the specific tree that they are evaluating to order
their work, while static evaluators use a static ordering of dependencies between attributes of
productions in the grammar. One can also distinguish between batch and incremental evaluation. A
batch evaluator takes an unattributed tree and attributes it, whereas an incremental evaluator takes
3
Productions
Let: ?			IDe1e2 Const: ?			NUM Use: e0 ::= ID Sum: e0			ele2
EVAL eo.valne			EVAL eo.valtce
sUSP			SUSP
EVAL el.env			EVAL e2.env
VISIT e1			VISIT e?
EVAL e2.env			EVAL e?.env
VISIT e?			VISIT e?
EVAL eo.valne			EVAL eo.va[ue
SUSP			SUSP
Figure 3: Evaluation plans for the simple attribute grammar.
an already attributed tree plus a modification to the tree, and re-attributes it. Incremental evaluation
is useful in uptimiz?ig compilers, where optimizations rewrite the parse tree, and is crucial in
programming environments, where edits are represented as sub-tree replacements (see Section 2.3).
2.2 Ordered attribute grammars
One important class of attribute grammars is the class of Ordered Attribute Grammars (OAOs)
[Kas8O]. This class of grammars is interesting because it includes most useful "real-world" gram-
mars, and it is possible to generate efficient evaluators for them.
OA(3s have the characteristic that for each symbol in the grammar, there is a partial order over
the attributes of that symbol, such that in any context of the symbol, the attributes instances are
evaluable in that order. This property allows the construction offt?edLptan evaluators, which consist
of statically determined evaluation plans for each production. These plans consist of a sequence of
three kinds of instructions:
EVAL: evaluate an attribute,
VISIT?: visit a child for the ith time (i.e., transfer control to the child's plan), and
SUSP: suspend evaluation (i.e., transfer control back to the parent).
The VISIT and SUSP instructions are essentially "call" and "return." Kastens gives a number of
run-time evaluation techniques for these plans in [Kas8O].
The grammar given in Figure 1 is an OAO. The attributes of e are ordered by env ? v. The
evaluation plans for this grammar are given in Figure 3. Note that since each node is visited exacfly
once, the subscripts on the VISIT instructions have been omitted. Also note that the children of
Sum are visited from right to left; this is just an artifact of the planning algorithin, the left to right
4
order would also be correct
2.3 Subtree replacement and incremental evaluation
in his seminal thesis, Reps showed that attribute grammars provide a useful formallsm for defining
language-based editors [Rep82]. Such systems use attributed abstract syntax trees to represent
programs being edited and map editing operations to subtree replacements (e.g., a delete operation
is implemented by replacement with a null tree). Mter a subtree replacement operation, the tree will
no longer be consistent; Reps and others have described so called ?`optimal" evaluation algorithins
for restoring attribute consistency [Rep82, Yeh83, Hoo87, Hud9l]. These algorithms basically work
by changepropagation; i.e., by starting with a set ofpossibly inconsistent attributes and propagating
changes through the tree (when an attribute value is changed to become consistent, its successors
may become inconsistent). Applications of this technology include interactive systems, such as
structured editors for progranuning languages [FJM+84, RT88], interactive theorem provers [6ri87]
and incremental code generators [Mug88], as well as optimizing compilers [Alb9lb].
3 The design of AML
We have been developing a system, called AML, for supporting attribute evaluation in ML. Our
approach to supporting attribute grammars m SML is based on compiler (or generator) technology.
Our system takes a specification as input, and generates a collection of SML modules that implement
the specification and support code. These are then combined with additional grammar-independent
modules to construct a complete system. We chose this approach, rather than trying to embed AML
in the interactive SML system, because it provides more flexibility for analysis and optimization.
in addition, we have a number of design goals:
o+ Easy interaction between SML and AML. It should be straightforward to feed attributed trees
into SML code, and, likewise, to attribute the results of SML computations.
o+ Attribute grammars can be large (for example, the specification of a Pascal editor in the
Synthesizer Generator system is over 9,000 lines of SSL code), soit is important to support
modular specifications.
o+ The specification language should be a minimal extension of SML; we have tried to avoid
unnecessary new keywords, and to follow the syntactic style of SML.
o+ There are many different classes of attribute grammars, and extensions to attribute grammars,
as well as different run-time evaluation techniques. The system should be designed to support
experimentation in all of these areas.
4 The AML specification language
An AML specification consists of a collection of related declarations; in many ways, it is similar to
an SML structure definition. It has the form
5
grammar name = struct declarations end
where name is the name of the grammar being specified. There are six basic kinds of declarations
in an AML specification:
o+ terrntype declarations define terminal symbols.
o+ prodtype declarations define nonterminals and their productions.
o+ root declarations define root nonterminals.
o+ attribute declarations define attributes.
o+ attribution declarations define semantic rules.
o+ SML top-level declarations are used to define auxiliary types and functions.
In the remainder of this section, we describe the first five of these in more detail, and illustrate them
with examples. A complete syntax for ANIL specifications can be found in Appendix A.
4.1 Syntax
The syntax ofthe grammar is defined by the terrntype, prodtype,and root declarations. AML
specifications deal with abstract syntax and most terminals, such as keywords and punctuation, are
omitted from the grammar. Some terminals, however, carry semantic information, and are included
in the grammar. The terrntype declaration is used to define these terminals. For example:
termtype int = int
termtype ident = Name.name
Ingeneral,anymonomorphicSMLtypeexpressionisallowedonthenght?hand-sideofaterrntype
declaration.
Thenonterminals andproductions ofagrammarare definedusingthe prodtypedeclaration.
prodtype exp
= Use of ident
Const of int
Sum of (exp * exp)
Diff of (exp * exp)
Mutuallyrecursiveprodtypes are declaredusingthekeyword and, as inthefollowingexample:
prodtype stmt
= Block of stmt_list
I Assign of (ident * exp)
While of (exp * stmt)
and stmt_list
= StmtListNil
I StmtListcons of (stmt * stmt_list)
6
In addition to defining the syntax of the grammar, the termtype and prodtype declarations
also result in the generation of equivalent SML type and datatype dedarations.1
The other syntax declaration is the root declaration. This is used to distinguish certain
nonterminals as roots of free standing abstract syntax trees. The root declaration does not affect
other aspects of the specification, but is used in the interface to the outside world. For example, the
evaluator generator may define an evaluation function for each root.
4.2 Attributes
Once a nonterminal has been defined, attributes can be declared for it. For example:
attribute exp
with
synth value : int
inher env : (ident * int) list
end
Attributes can have any monomorphic SML type. To allow for more concise specifications, factored
declarations are allowed, which define a collection of attributes for several nonterminals:
attribute stmt, stmt_list
with
inher env : (ident * int) list
end
The attributes of a nonterminal can be defined by several different attribute declarations.
The local attributes of productions are also defined using attribute declarations. For
example:
attribute Let, Use
with
local error			string option
end
defines the local error attribute for the Let and Use productions.
The AML specification language also supports combined prodtype and attribute decla-
rations as a syntactic extension. The declaration:
prodtype nonterm with attrs end
= Prod of rhs
with local-aars end
isequivalenttothe declarations:
1le., replacetermtype withtype. andprodtypewithdatatype?
7
prodtype nonterm
= Prod of rhs
attribute nonterm with attrs end
attribute Prod with Iocal-aars end
4.3 Semantic rules
The semantic rules of a grammar are defined using attribution declarations. In its simplest
form, an attribution declaration defines names for the symbols of a production using a SML pattern,
and evaluation rules for the attributes. For example:
attribution eO : exp
= (Surn(el, e2))
with
rule
rule
rule
end
el$env = eO$env
e2$env = eO$env
eO$value = el$value + e2$value
In this declaration, the identifier e 0 is bound to the exp nonterminal on the left-hand-side of the Sum
production, and the identifiers el and e2 are bound to the right-hand-side children. The notation
nonterm$attr is used to refer to the attribute attr of the nonterminal referred to by nonterm.
As with the attribute declarations, the semantic rules of a production can be split across
several attribution declarations, and factoring is supported to reduce the size of the specifi-
cation. We use the "or-pattern" notation to support this.2 For example, the following declaration
defines the rules for passing the environment to the children of a binary operator:
attribution eO : exp
= (Surn(el, e2) Diff(el, e2)
with
rule el$env = eO$env
rule e2$env = eO$env
end
The set of symbol identifiers bound on the right-hand-side must be the same for each production
in a factored attribution rule. Furthermore, only those local attributes that are defined for all of the
productions may be referenced.
Local attributes and syntactic references are referred to by name in semantic rules. This is
illustrated in the following example:
2Or-patterns are an extension of SML supported in SML?NJ (version 0.94 and later).
8
attribution eO : exp
= (Use id)
with
rule (error, e0$value) = (case (lookup (id, e0$env))
of (SOME v) => (NONE, v)
I NONE => (SOME ??<* undeclared identifier ?>,,, 0)
(* end case *))
end
This is also an example of multiple attribution; i.e., the definition of multiple attributes in a single
semantic rule. In general, the left-hand-side of an semantic rule can be any SML pattern, as long as
the bound variables are attributes, local attributes, or syntactic references.
5 Attribute evaluation in ML
A number of issues m			?di			th?			?1i			hn? ?f			?i1
- --			? dLW?SS?U m			un???menta??			tile tree e?tor anci evaluator:
0
0
It should be possible to convert between attributed and non-attributed versions of terms.
Attribution of a non-attributed term is done by the evaluator; the converse should also be
possible.
A mechanism for associating attributes with tree nodes must be provided. It is especially
useful for this mechanism to support sparse attribution (e.g., because of copy rules). In
addition, operations for accessing and setting the values of attributes must be provided.
o+ Navigation and subtree replacement operations for attributed terms must be supported.
The Synthesizer Generator uses a heavy-weight tree-node representation that relies heavily on
mutable fields (attribute instances, parent and child pointers, and other status fields). This represen-
tation supports efficient navigation, tree editing and attribute evaluation but does not support sparse
attribution, sharing of trees and easy mapping between values computed by user code and abstract
syntax trees. Furthermore, the heavy reliance on mutable fields does not map well to ML, since it
requires ref cells, which add extra space overhead.
Our approach is light-weight: we use the datatypeequivalentofthe prodtypedeclarations
as our tree representation. We use paths in the tree to label nodes and support navigation, and
auxiliary hash tables to hold the attribute values. Subtree replacement operations can be supported
by the use of efficient incremental attribute evaluators. These features are described in more detail
below.
6 The AML compiler
We have implemented a prototype of the AML compiler in SML. Our compiler is designed in a
modular fashion to allow easy experimentation with different classes of grammars and different
9
Front-end
AML
Specification			Parser			Typechecker
Grammar
fleschption
Evaluation
Plan
?va1uatsr --			Attributes
otneteor
Evaluator
Figure 4: Organization of the AML compiler
Lines of code
Component			Grammar specific			Run-time specific			General purpose
Front-end			3,764
OAG analyzer			718
Back-end			344			1,166			386
Misc.			73
Total			1,262			1,166			4,223
Table 1: Compiler size break-down
evaluation techniques. Figure 4 gives a schematic view of the compiler. The shaded components
are specific to particular choices of evaluation strategy and run-time representation; the other
components are reusable. Table 1 gives a break-down of the number of source lines (including
comments) in each major component for our current implementation. The first column refers to
code in modules that are specific to the class of grammar (OAOs in this case); the second column
refers to code in modules that are specific to the run-time representations; and the last column refers
to code that is common across different versions. Note that almost two thirds of the code is general
purpose; thus, we predict that the effort to retarget the system to a different class of grammars, or
run-time system should be no more than a couple of weeks.
6.1 The front-end
The front-end of the AML compiler is responsible for translating an AML specification into an
abstract grammar description. The abstract grammar description is a compiled representation of
10
the grammar description. It allows subsequent phases to efficiently extract information about the
grammar. The front-end uses two passes, which are discussed in the remainder of this section.
The first pass parses the specification producing an abstract syntax tree, which contains em-
bedded SML abstract syntax for the SML code in the specification. We do not analyze the fixity
or precedence of SML identifiers, so the SML abstract syntax contains fairly detailed syntactic
information, such as parenthesization information, to allow the back-end to emit the equivalent
code.
The second pass typechecks the abstract syntax tree, producing the abstract grammar descrip-
tion. For most parts of the specification, the analysis is straightforward. It records the declared
nonterminals, terminals, production identifiers, and attributes. The tricky part is the analysis of
the semantic rules. An identifier in the SML code for a rule might be a syntactic reference, local
attribute or an SML constructor or variable identifier. This requires keeping track of the binding and
scoping of SML identifiers while analyzing a semantic rule. During the analysis, the SML abstract
syntax tree is rewritten, by attribute references and syntactic references being replaced with new
SML variables. The resulting expression is then wrapped up as a function expression. For example,
the attribution rule (taken from the Calculator example in Appendix B):
attribution e : exp
= (Use id)
with
rule (error, eO$value) = (case (lockUp (id7 eO$env))
of (SOME v) => (NONE, v)
NONE => (SOME ,,<* undeclared identifier ?>?, 0)
(* end case *))
end
is translated into the following SML expression:
fn
(5id1, aeOenv21) => let
val (1error1, a eO value20) =
case (lookup (5id1, aeOenv21))
of (SOME v) => (NONE, v)
NONE => (SOME ,<* undeclared identifier ?>,,, 0)
(* end case *)
in
(1error1, aeOvalue20)
end
The semantic rule defines two results (error, and e0$value), and has two free attribute refer-
ences (id, and eO $env). Thus, the resulting function takes a pair of arguments, and returns a pair
of results. We first bind the results and then build the pair, since the leftThand-side pattern might be
more complicated than a simple tuple.
The other complication in analyzing the semantic rules is the factoring allowed in the specifi-
cation. This means that a given semantic rule may be used in diiferent productions, with different
argument types. We represent this sharing in the grammar description by splitting the representation
11
of a semantic action into two pieces: the semantic function (which is independent of the produc-
tion), and the semantic rule associated with each production. in the resulting SML code, we bind
the function expression that implements the rule to an identifier in the beginning of the evaluator
module, and then apply that function in the evaluator steps that use the rule.
The analysis of the semantic actions also detects and identifies "copy rules," which allows
subsequent phases to employ copy-rule optimizations [Hoo86].
6.2 The grammar analyzer
The grammar analyzer taa:es the abstract grammar produced by the front-end, and generates an
evaluation stategy for the grammar. The form that the evaluation strategy takes depends on the class
of AGs being supported, and the run-time evaluation model. For example, a topological-order batch
evaluator requires no additional analysis, while an incremental OAG evaluator must determine a
static evaluation plan, as well as change-propagation information.
Our current implementation supports batch evaluation of OAGs. The grammar analyzer applies
Kasten's five-step analysis algorithm [Kas80].3 It produces a static set of plans, one per production,
for evaluating the grammar. See [Kas80] or Chapter 12 of [RT88] for more information about OAG
evaluation.
6.3 The back-end
The back-end is responsible for generating the evaluator code and various support modules. in our
current implementation, this output consists of four SML modules:
o+ The user module, which consists of the SML equivalents of the prodtype and terrntype
declarations, as well as any SML declarations in the AML specification.
o+ The tree module, which supports a tree machine interface to the abstract syntax trees defined
in the specification.
o+ The attributes module, which supports the storage of attribute values.
o+ The evaluator module, which implements the tree-walk evaluator.
These are discussed in more detail in Section 7.
The back-end consists of a general-purpose pretty-printer for SML code, a translator for gen-
erating the support modules from the abstract grammar description; and a translator for generating
the evaluator from the evaluation strategy and grammar description. For the most part, this code is
boiler-plate. The one exception is the evaluator generator, which attempts to optimize the evalua-
tion code by caching values. For example, fetching an attribute value might involve computing a
3The actual implementation is a mix ofKasten's algoritlim and the one presented by Reps and Teitelbaum [RT881.
12
hash-key and then doing a hash-table look-up; if the attribute is used more than once, then caching
its value can avoid the hash-table overhead. For batch evaluation, this optimization is straight-
forward, but for incremental evaluators, the analysis and resulting program structure can be quite
complicated.
7 A prototype attribution system
There are many possible design choices that will satisfy the requirements discussed in the previous
section. In this section, we describe the straighfforward scheme for OAG evaluation that we have
implemented as a first prototype.
Compiling an AML specification results in four modules: the w?er-types module, the tree
module, the attributes module, and the evaluator module. These modules are discussed in the
sequel.
7.1 User types
The various SML definitions given in the specification, plus the definitions generated from the
prodtype and termtype declarations are collected together in the user-types module. The user
definitions are analyzed for syntactic correctness by the AML compiler, but are not typechecked.
7.2 The tree machine
Operations such as navigation and subtree replacement require a uniform interface to the abstract
syntax represented by the prodtypes and termtypes of the grammar. The back-end generates a
module, called the tree module, that provides a typesafe collection of basic operations on trees.
This structure matches the abstract signature given in Figure 5. The type tree is a union of the
nonterminal and terminal types (i.e., a datatype with one constructor per nonterminal and terminal
type). The types symb_kind and prod_kind are enumerations of the symbols (terminals and
nonterminals) and productions, respectively. The operations are defined as follows:
symbKind t returns the symbol kind of the root of the tree.
prodKind t returns the production kind of the root of the tree.
sameSymb (ti, t2) returns true, if the two trees have the same symbol at their roots.
sameProd (ti, t2) returns true, if the two trees have the same production at their roots.
isTenninal sk returns true, if the symbol kind sk is a terminal symbol.
isRoot t returns true, if the root of the tree t is the root nonterminal.
isLeaf t returns true, if the root of the tree t is a leaf.
13
signature TREE =
sig
type tree
eqtype symb_kind
eqtype prod_kind
val synibKind
val prodKind
val samesymb
val sarneProd
val isTerininal
exception Child
tree -> synib_kind
tree -> prod_kind
(tree			* tree)			--H> bool
(tree			* tree)			-> bool
syffib_kind -> bool
val isRoot : tree -> bool
val isLeaf : tree -> bool
val nChildren : tree -> int
val children : tree -> tree list
val nthchild : (tree * int) -> tree
val replaceNth : (tree * int * tree)
val nodeName : tree > string
end
Figure 5: The signature of the tree operations
-> tree
nChildren t returns the nuinher of children of the tree t.
children t returns alist of the children of the tree t.
nthChild (t, n) returns the nth child oft; it raises the exception Childif n is out of range.
replaceNth (t, n, s) replaces the nth child of t with s; it raises the exception Childifn is out of
range.
nodeName t returns a string representation of the t's root; this is mainly provided for debugging
purposes.
7.3 Storing attribute values
The most cornmon place to store attributes is in the abstract syntax tree nodes. This approach has the
advantage that the attributes are immediately accessible, and is used by the Synthesizer Generator.
The main disadvantage of this approach is that it makes the translation between attributed and
unattributed values difficult. Unattributed values may have sharing, which must be broken before
attribution [TC9O], and attributed terms may have a different representation than their unattributed
counterparts. We have chosen a different approach, which is to store the attributes in an auxiliary
structure. We use paths in the tree to define unique names for the nodes (Hood describes a similar
14
approach in [Hoo85]). This has the advantage that our abstract syntax trees are represented as plain
SML datatypes, which means that mapping between attributed and unattributed terms is trivial.
our current unpiementation, we use a Ilasn table as the auxiliary attribute stora? mechanism.
For hash keys, we use a function of the path from the root to the node. The hash keys are represented
as a pair of the integer hash value and the path from the root to the node (represented as a list of
integers). From the hash key of a node, we can compute the hash key of one of its children in
constant time. Thus, as we walk the tree, we can incrementally maintain the key of the current tree
node, which we can use to access the current node's attributes. The values in the hash table are
members of a tagged union, which is generated from the specification. For example, the grammar
given in Appendix B produces the following datatype declarations:
datatype attr_types
ATTR_exp of
value : int attr,
env : (string * int) list attr,
local attrs : locals_of?exp ref
ATTR_calc of ( value : int attr
and
locals_of?exp
LOCAL_Quot of ( error : string option attr
LOCAL_Use of ( error : string option attr
LOCALS_exp_VOID
where the attrtype constructor is defined as:
type `a attr = `a option ref
Each nonterminal (exp and cal c in this example) has a constructor of a record of its attributes.
The localattrs field is for those productions that have local attributes (Quote and Use in
this example).
For each attribute, there are get and put functions generated, which take an attribute table and
a hashed path as arguments. Since an attribute record consists of references to attribute values,
functions to fetch the reference for a particular attribute of a nonterminal are also provided. This can
be useful in reducing the number of table lookups; for example, in incremental evaluation, where
the old and new attribute values need to be compared before storing the new value. Figure 6 gives
the signature of these functions for the Calculator example in Appendix B. The types attr_tbl
and key are the attribute table and hash keys, respectively.
The first tinie an attribute of a production is stored in the table, it is necessary to allocate an
attribute-value record for the production. The AML compiler generates default records for every
nonterminal and every production that has locals. These default records are passed to the generic
lookup routine as part of the iniplementation of the attrRef_nt_attrfunctions.
15
val
val
val
val
val
val
val
val
val
val
val
val
val
val
val
attrRef_calo_value : (attr_tbl * key) -> int attr
attrRef_exp_env : (attr_tbl * key) -> (string * int) list attr
attrRef_exp_value : (attr_tbl * key) -> int attr
attrRef_Quot_error : (attr_tbl * key) -> string option attr
attrRef_Use_error (attr_tbl * key) -> string option attr
get_calc_value : (attr_tbl * key) -> int
get_exp_env : (attr_tbl * key) -> (string * int) list
get_exp_value : (attr_tbl * key) -> int
getLocal_Quot_error : (attr_tbl * key) -> string option
getLocal_Use_error : (attr_tbl * key) -> string option
put_calo_value : (attr_tbl * key * int) > unit
put_exp_env : (attr_tbl * key * (string * int) list) -> unit
put_exp_value : (attr_tbl * key * int) -> unit
putLocal_Quot_error : (attr_tbl * key * string option) -> unit
putLocal_Use_error : (attr_tbl key * string option) -> unit
Figure 6: The attribute functions for the Calculator example
7.4 The treewalk evaluator
The treewalk evaluator is implemented as a collection of mutually recursive visit functions. For
each visit i down to a non4erminal X, there is a function visit_X_i. These functions take three
argunients: the tree node being visited, the attribute table, and the hash key for the node being
visited. A visit function consists of a pattern match on the productions of the nonterminal.
In general, the evaluation ofan attribute involves first fetching the arguments to the semantic rule
from the attribute table, then computing the function, and finally storing the result in the attribute
table. Fetching and storing the attributes of a node requires the hash key for the node, which must
be computed for the child nodes. Likewise, visiting a child requires computmg its hash key. As an
optimization, the evaluator generator keeps track of already computed values, such as hash keys,
so that it can avoid computing the same thing twice. Figure 7 gives the visit function code for the
Sum production in the Calculator example in Appendix B. Note that the visits are numbered from
o in the generated code. The function r_0004 is the generated code for the semantic rule, and the
function down computes a child's hash key from its parent's hash key.
7.5 Incremental evaluation
We also have an incremental evaluator for OAGs. This evaluator is based on the algorithm described
in Chapter 12 of [RT88j. Unlike the batch evaluator described above, the incremental evaluator is
not compiled. Rather, it is a generic interpreter that executes plans represented as actions, where an
action is defined as:
16
fun visit_exp_0 (tbl, lhs_key, Sum(cl, c2)) =
(* Sum : eo ::= ej e2 *)
(* EVAL e2.env *)
val lhs?env = get_exp_env(tbl, lhs_key)
val (c2_env) = (lhs_env)
val c2_key = dawn (lhs_key, 2)
val = put_exp_env(tbl, c2?key, c2_env)
(* visit1 e? *)
val = visit exp_0 (tbl, c2_key, c2)
(* EVAL e1.env *)
val (cl_env) = (lhs_env)
val cl_key = dawn (lhs_key, l)
val = put_exp_env(tbl, cl?key, cl_env)
(* ViSit1 e1 *)			_
val = visit exp_0 (tbl, cl_key, cl)
(* EVAL eo.v *)
val cl_value = get_exp_value(tbl, cl_key)
val c2_value = get_exp_value(tbl, c2_key)
val (lhs_value) = r_0004 (cl_value, c2_value)
val = put_exp_value(tbl, lhs_key, lhs_value)
(*			*)
in () end
I...
Figure 7: Visit function code for Sum production
datatype action
= Eval of (attr_tbl
Visit of (int * int)
I Suspend of int
Endofplan
* path * key) -> key list
The AML back-end generates the plans in this form, which are then passed to the interpreter by
functor application.
As described in Section 2.3, we model editing as subtree replacement. After a consistent tree
has been edited (i.e., a subtree in it has been replaced with a new subtree), the evaluator is called
with a path to the root of the new subtree as an argument. The evaluator maintains a set of active
productions, called the REACTIVATE set. Initially, the root of the new subtree and its parent are the
only members of REACTIVATE. The Eva 1 instructions in the plan contain an attribute evaluation
function. This function checks to see if the evaluated attribute has changed, and, if so, returns a
list of hash keys of productions that should be added to the REACTIVATE set. The Visit and
Suspend instructions check to see if the node to be visited is in the REACTIVATE set; if not, they
do not move the locus of control. Thus, the number of attributes re-evaluated is O(IAFFEcTEDI),
where AFFECTED is the set of attributes that change value.
Each plan is terminated with an EndOfP ian instruction (following the last Suspend instruc-
tion). When one of these is encountered, the evaluator stops.
17
8 Current status and future work
We have implemented a prototype of our system, consisting of a front-end, OAO plan generator,
and back-end for the runtime model described in Section 7. We have tested this prototype on
several simple grammars, including a typechecker for a subset of C, and the Hoare-logic proof
checker described in [RA84]. We have also experimented with connecting the generated evaluator
with an incremental pretty-printer being developed for eXene [(3R91, GR93], resulting in a simple
structured editor. While the resulting editor is far from being a useful tool, it is a good demonstration
of the technology.
The work reported in this paper is a snapshot of an ongoing research effort. While we now have
the basic infrastructure in place, there many issues that we intend to explore in the future:
o+ Design and implement support for modular specifications. We feel that for AML to be a
practical tool, modular specifications must be supported. Attribute grammars as such do not
provide any means for modularization. As pointed out in [Kas9l], an effective strategy is
to allow a module for an AG to contain the attribution of one semantic aspect only. Several
approaches have been advocated to modularize AGs [DC88, FMY92, Far92]. While these
approaches address the problems of partitioning a grammar specification, they do not provide
the concept of a module. We hope that this can be done in a way that builds on the SML
module system and interacts well with the SML/NJ separate compilation mechanism.
Lists of items are a common structure in most grammars. Currently, each different kind of
list requires a different prodtype declaration, and attribution rules. We would like to add
a general-purpose mechanism for defining the syntax of lists, and higher4evel syntax for
defining the attribution of lists. Such a mechanism might build on the polymorphic lists of
SML.
o+ Lists are one exarnple of polymorphic structure, but there may be others. It may be useful
to have a general-purpose mechanism for parameterizing prodtypes. One possibility is
polymorphic prodtype declarations, but it is not clear how attribution rules would work.
Prodtypes are interpreted types (i.e., they have associated semantics), thus a mechanism based
on functors is probably the right way to support parameterization.
o+ Our front-end supports the syntax of higher-order attribute grammars [VSK89, TC90]. We
plan to implement support for these in our OAO analyzer and back-end.
o+ Our compiler is designed for experimentation. We plan to use it to explore other evaluation
strategies, including demand-driven evaluation [Hud9l] and parallel attribute evaluation
[Jou9l, KG92].
o+ Our current technique for storing attributes is simple, but not very space efficient. Also,
removing subtrees requires expensive hash table operations, since the attributes of the subtree
18
must be removed from the table. We have some ideas for other storage schemes for attributes
that we would like to experiment with. We would also like to examme techniques for
supportmg sparse attribution; i.e. only caching a subset of the attributes in the incremental
evaluator. The most obvious example of this is so-called copy rules (i.e., semantic functions
that are the identity).
o+ We view language-based editors as the most important application area for the AML technol-
ogy. We have already experimented with hooking an evaluator to an incremental pretty-printer
based on eXene, and we plan to continue this work by exploring editing issues (text vs. struc-
ture), and automating the generation of pretty-printers from higher-level specifications.
o+ ffwe generalize the patterns allowed in attribution rules, we would have something very close
to Farnum's attribute patterns [Far92]. We plan to explore this similarity.
o+ The implementation of real applications using AML; in particular, a programming environ-
ment for SML.
.9 Acknowledgements
We would like to thank Tim Teitelbaum, David Gries, Sanjiva Prasad, Chet Murthy and Aswin van
den Berg for stimulating discussions pertaining to this project. Lars S?aas implemented much of
the AML compiler back-end.
19
References
[AIb9la]
[AIb9lb]
Albias, H. Attribute evaluation methods. In H. Alblas and B. Melichar (eds.), Attribute
Grammars, Applications and Systems, pp. 48-113. Springer-Verlag, 1991. LNCS rir.
545.
Albias, H. Incremental attribute evaluation. In H. Albias and B. Melichar (eds.),
Attribute Grammars, Applications and Systems, pp. 215-233. Springer-Verlag, 1991.
LNCS rir. 545.
[DC88] Dueck, G. D. and 0. V. Cormack. Modular attribute granunars. Technical Report
CS-88-19, Faculty of Mathematics, University of Waterloo, Canada., May 1988.
[EMR92] Efremidis, 5. 0., K. A. Mughal, and J. H. Reppy. Attribute grammars in ML. In
Proceedings qf the ACM SiGPLAN'92 Workshop on ML and its Applications, June
1992, pp. 194-200.
[Far86j
Farrow, R. Automatic generation of fixed-point-findmg evaluators for circular, but
well-defined, attribute grammars. In Proceedings of the SiGPLAN'86 Symposium on
Compiler Construction, June 1986, pp. 85-98.
o+ [Far92] Farnum, C. Pattern-based tree attribution. In Conference Record ofthe 19th AnnualACM
Symposium on Principles ofProgramming Languages, January 1992, pp. 211-222.
[FJM+84] Fischer, C., 0. Johnson, J. Mauney, A. Pal, and D. Stock. The POE language-based editor
project. In ACM SIGSOFT/SiGPLA? Symposium on Practical Software Development
Environments, April 1984, pp. 21-29.
[FMY92j Farrow, R., T. J. Marlow, and D. M. Yellin. Composable attribute grammars: Support
for modularity in translator design and implementation. In Conference Record of the
19th Annual ACM Symposium on Principles ofProgramming Languages, January 1992,
pp. 223-234.
[GR91] Gansner, E. R. and J. H. Reppy. eXene. In Third international Workshop on Standard
ML, Carnegie Mellon University, September 1991.
[GR93] Gansner, E. R. and J. H. Reppy. A Multi-threaded Higher-order User interface Toolkit,
vol. 1 of Software Trends, pp. 61-80. John ?iley & Sons, 1993.
[0ri87] Griffin, T. 0. An environment for formal systems. Technical Report 87-846, Department
of Computer Science, Cornell University, June 1987.
[Hoo85] Hood, R. Effit;l?1L au?trat;tions for th? unplernentation (
[Hoo86]
)f structured editors. In Pro-
ceedings of the ACM SiGPLAN'85 Symposium on Language issues in Programming
Environments, June 1985, pp. 171-178.
Hoover, R. Dynamically bypassing copy rule chains in attribute granunars. In Con-
ference Record of the 13th Annual ACM Symposium on Principles of Programming
La?guages, January 1986, pp. 14-25.
Hoover, R. incremental Graph Evaluation. Ph.D. dissertation, Department of Computer
Science, Cornell University, Ithaca, New York 14853, May 1987. Also Tech Report
87-836.
20
[Hoo87]
[Hud91] Hudson, 5. Incremental attribute evaluation: A flexible algoritbin for lazy update. ACM
Transactions on Programming Languages and Systems, 13(3), 1991, pp. 315-341.
[Jon90] Jones, L. G. Efficient evaluation of circular attribute grammars. ACM Transactions on
Programming Languages and Systems, 12(3), July 1990, pp. 429-462.
[Jou91]			In H. AIblas and B. Melichar
pp. 234-255. Springer-Verlag,
Jourdan, M. A survey of parallel evaluation methods.
(eds.), Attribute Grammars, Applications and Systems,
1991. LNCSnr.545.
[Kas80] Kastens, U. Ordered attribute grammars. ACTA Informatica, 13(3), 1980, pp. 229-256.
[Kas91j
Kastens, U. Attribute grammars as a specification method. In Attribute Grammars,
Applications andSystems, pp. 16-47. Springer-Verlag, 1991. Lecture Notes in Computer
Science 545.
[KG92] Klaiber, A. and M. Ookhale. Parallel evaluation of attribute grammars. Transactions on
Parallel and Distributed Systems, 3(2), March 1992, pp. 206-220.
[KHZ82] Kastens, U., B. Hutt, and E. Zimmermann. GAG: A Practical Compiler Generator, vol.
141 of Lecture Notes in Computer Science. Springer-Verlag, 1982.
[Knu68] Knuth, D. E. Semantics of context-free languages. Mathematical Systems Theory, 2,
1968, pp. 127-145.
[MTH90] Milner, R., M. Tofte, and R. Harpen The Definition of Standard ML. The MIT Press,
Cambridge, Mass, 1990.
[Mug88] Mughal, K. A. Generation ofRuntime Facilitiesfor Program Editors. Ph.D. dissertation,
University of Bergen, Norway, 1988.
[RA84]
[Rep82]
[RG86j
Reps, T. and B. Alpem. Interactive proof checking. In Conference Record of the 11th
Annual ACM Symposium on Principles ofProgramming Languages, January 1984, pp.
36-45.
Reps, T. W. Generating Language-Based Environments. Ph.D. dissertation, Department
ofComputerScience, Cornell University, Ithaca, NX 1982. Published by The MIT Press,
1984.
Reppy, J. H. and E. R. Oansner. A foundation for programming environments. In
Proceedings qf the ACM SJGSOFTISIGPLAN Software Engineering Symposium on
Practical Software Development Environments, December 1986, pp. 218-227.
[RT88] Reps, T. and T. Teitelbaum. The Synthesizer Generator: A System for Constructing
Language-based Editors. Springer-Verlag, New York, NY; 1988.
[TC90]
Teitelbaum, T. and R. Chapman. Higher-order attribute grammars and editing environ-
ments. In Proceedings oftheACMS1GPLAN'90 Conference on Programming Language
Design and Implementation, June 1990, pp. 197-208.
Vogt, H., 5. Swierstra, and M. Kuipen Higher order attribute grammars. In Proceed-
ings of the ACM S1GPLAN'89 Conference on Programming Language Design and
Implementation, June 1989, pp. 131-145.
21
[VSK89]
[WJ88]
Walz, J. A. and 0. F. Johnson. Incremental evaluation for a general class of circular
attribute grammars. In Proceedings of the SIGPLAN'88 Conference on Programming
Language Design and Implementation, June 1988, pp. 209-221.
[Yeh83] Yeh, D. On incremental evaluation of ordered attributed grammars. BIT, 23, 1983, pp.
308-320.
22
A The collected syntax of AML
The syntax of AML is an extension of the SML syntax (see Chapter 2 and Appendix B of the
Definition [MTH9O]). In presenting the grammar, we use an extended-BNF style with the following
conventions:
o+ grammar nonterminals are given in italic font; e.g., Grammar.
o+ terminal symbols are written in typewriter font; e.g., val,
o+ optional items are enclosed in angle brackets; e.g., (;?.
o+ a sequence of zero or more items is denoted by braces; e.g., 1Dec1
A.1 Reserved words
In addition to the those of SML, AML has the following reserved words, which may not be used as
identifiers:
attribute attribution grammar prodtype root rule
termtype $
A.2 Identifiers
AML requires a number of new identifier classes; all of which are restricted to be aiphanumeric
identiflers. Some of these classes overlap with the SML identifier classes. The identifier classes are
as follows (overlaps with SML identifier classes are given in parentheses):
Gramid Grammar name
Nontermid Nonterminals
Termid Terminals
Prodid Production constructors
Attrid Attribute names
Localld Local attribute names
Symbid Symbols (nonterminal and terminal)
(TyCon)
(TyCon)
(Con)
(Var)
(Var)
In addition, there is the class Attrinstance of attribute instances, which are written
Symbid $ Attrid
These can appear in patterns and expressions, as described in the next section.
23
A.3 SML nojiterminals
As mentioned above, the AML grammar is an extension of the SML grammar. We use a small
collection of nonterrriinals to refer to parts of the SML grammar. These are summarized in the
following table:
AMLnonterminalSMLnonterminalDescnpflon
SMLVec			dec
SMLType			ty
Pat			pat
Exp			exp
?ML declarations
Monomorphic SML types
Extended SML patterns
Extended SML expressions
The extended patterns and expressions allow attribute instances (Attrliistance), symbols (Symbid),
and local attributes (Locaild), wherever variables (var) are allowed.
A.4 Grammar
Grammar
Dec
grammarGramld= structDeci(;? Decl end (;)
rootNontermid
prodtypePrnffvDec ?andPrndTyDecl
termtypeTermTyDec ?andThrmTyDec?
attributeAttributeDec fandAttributeDec?
attributionSemanticRules ?andSemanticRulesl
SMLDec
PrnJFyDec			::--H			Nontermid (WithAttrs) = ProdDec f PrndDec?
ProdDec			:--H			Prodid (ofPrndRHS? (WithLocaIs?
ProdRHS			::--H			(Nontermid ? * NontermJd?
I			Nontermid f * Nontermldf
TermTyDec			..			Termid = SMLType
AttributeDec			::--H			Nontermid f, NontermId? WithAttrs
I			Prodid ?, Prodldj WithLocals
WithAttrs ::--H withAttribute f(; Attributel end
Attribute			::--H			synthAttrid: SMLType ?andAttrid : SMLTypeJ
I			inherAttrld: SMLThpe fandAttrid: SMLThpeY
WithLocals			withLocalAttribute f(; ? LocaMaributel end
LocalAttribute			::=			localLocaild : SMLType ?andLocalld : SMLTypeJ
24
SemanticRules Symbid : Nontermid = Rules ? I Rules1
Rules ProdPat withRule f(;? Rule1 end
rule Pat = Exp tandPat = ExpJ
Rule
ProdPat
(ProdPat ? I PrndPati
I Prodid AtomicProdPat
I AtomicProdPat
I Symbid
I Prodid
I (ProdPat ?, ProdPat1
25
AtomicProdPat
B An example grammar
(* Calc.aml
* A simple calculator example
grammar Calc
struct
termtype int = int
termtype ident = string
prodtype exp
--H NullExp
I Sum of (exp * exp)
Diff of (exp * exp)
Prod of (exp * exp)
Quot of (exp * exp)
Const of int
Let of (ident
Use of ident
prodtype calc
- NullCalc
Top of exp
root caic
* exp * exp)
attribute caic, exp with
synth value : int
end
attribute exp with
inher env : (string * int) list
end
and Quot, Use with
local error : string option
end
(* look up an identifier in an environment *)
fun lookup (id, []) = NONE
I lookup (id, (x, v)::r) =
if (id = x) then (SOME v) else lookup(id, r)
26
attribution cO			caic
--H NuilCaic
with rule cO$value = 0 end
Top(el)
with
rule cO$value = el$value
rule el$env =
end
attribution eO : exp
- NullExp
with rule eO$value
I (Sum(el, e2)
with rule eO$value
(Diff(el, e2))
with rule eO$value
(Prod(ei, e2))
with rule eO$value
I (Quot(el, e2))
with
rule
= 0 end
= ei$value + e2$value end
= el$value - e2$value end
= el$value * e2$value end
(error, eO$value) = if (e2$value = 0)
then (SOME wv<* division by zero *>vv, e1?value)
else (NONE, ei$value div e2$value)
end
(Const n)
with rule eO$value
(Let (id, el, e2))
with rule eO$value
(Use id)
with
rule
= n end
= e2$value end
(error, eO$value) = (case (lookup (id, eO$env))
of (SOME v) => (NONE, v)
NONE => (SOME ?V<* undeclared identifier *>??, 0)
(* end case *))
end
attribution eC : exp
--H (Sum(el, e2) I Diff(el, e2) Prod(el, e2) I Quot(el, e2))
with
rule el$env = eC$env
rule e2$env = eO$env
end
I (Let (id, el, e2))
with
rule el$env = eO$env
rule e2$env = (id, ei$value) eC$env
end
27
end (* Caic *)
