BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR93-1396
ENTRY:: 1994-03-17
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: Developing Efficient Interpreters Based on Formal Language 
        Specifications
AUTHOR:: Poetzsch-Heffter, Arnd
DATE:: October 1993
PAGES:: 16
ABSTRACT::
The paper reports on extensions to the MAX system enabling the generation and 
refinement of interpreters based on formal language specifications. In these 
specifications, static semantics is defined by an attribution mechanism that 
allows to enrich syntax trees by control flow graphs. The dynamic semantics is 
defined by evolving algebras, a framework that has been successfully used to 
specify realistic programming languages.

We apply the combined framework to a non-trivial example language and show how 
the resulting language specification can be refined in order to improve the 
efficiency of the generated interpreters. The framework provides enough 
modularity and flexibility so that such refinements can be carried out by 
local changes. We explain the implementation of the extensions to MAX and 
discuss applications of the system.
END:: CORNELLCS//TR93-1396
BODY::
Developing Efficient Interpreters Based on
Formal Language Specifications
Arnd Poetzsch-Heffter*
TR 93?1396
October 1993
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
* Supported by DFG grant Po 432/2-1. Author moving to Munich; thus please use the
Munich address.
Developing Efficient Interpreters Based on
Formal Language Specifications
Arnd Poetzsch--Hlleffter ? 2
Cornell University
4112 Upson Hall
Ithaca, NY 14853
poetzschfflccs.cornell.edu
Abstract
institut fiir Informatik
Technische Universita?t
D--H80290 Miinchen
poetzsd???inf6rmatik.tu-muenchen.de
The paper reports on extensions to the MAX system enabling the generation and re-
fluement of interpreters based on formal language specifications. In these specifications,
static semantics is defined by an attribution mechanism that allows to enrich syntax
trees by control flow graphs. The dynamic semantics is defined by evolving algebras, a
framework that has been successfully used to specify realistic programming languages.
We apply the combined framework to a non-trivial example language and show how
the resulting language specification can be refined in order to improve the efficiency
of the generated interpreters. The framework provides enough modularity and flexi-
bility so that such refinements can be carried out by local changes. We explain the
implementatioii of the extensions to MAX and discuss applications of the system.
1 Introduction
Motivation Certainly, many agree with Tofte that "... a realistic language definition is
a very delicate object" ([Tbf9o], p. 109). Nevertheless, different tasks and levels in lan-
guage design and implementation are til now solved and supported based on different, often
unrelated frameworks; consequently, for each task a specification has to be produced from
scratch. E.g. there are frameworks for providing very high--Hlevel, readable, and formal lan-
guage specifications (e.g. [Mos92J); and frameworks to specify compilation tasks focussing
on efficiency: attril?ute grainmars to express identification, typing, and context checking;
control flow based specifications for data fiow analyses to support optimizations; and pattern
driven code generators.
Our ideal picture to relate the different specifications is a tree the root of which is the
high--Hlevel language specification; a step to a child represents refinements or enrichments
eventually leading to implementations of specific language-dependent tools. And a child
should use as much as possible from the parent specification. The framework we present in
this paper should be understood as a step towards this picture, even though we focus here
on a specific example, the development of interpreters.
Approach In [P1193J, we showed how first--Horder recursive functions defined on occur-
rence structures can be used like attributes in an attribute grammar specification. In fact,
recursive functions on occurrence structures provide a more expressive framework than at-
tribute grammars ii' that they allow to formally specify an enrichment of the syntax tree by
non--Hlocal edges. (On the other hand, attribute grammars allow in general for more efficient
implementations; cf. relation to other work.) Here, we use a slightly extended version of oc-
currence structures to specify control flow and use evolving algebras developed by Gurevich
to specify the dynamic semantics of programming languages.
Figure 1 shows the control flow graph for the expression 7 + f(a, 0) as syntax tree
enrichment. The evolving algebra specifies how to interprete these graphs; in particular it
`Supported by DFG grant Po 432/2-1.
2moving to Munich; thus please use the Munich addresses
specifies the semantics of the graph nodes called tasks. These tasks may have parameters
(omitted in the figure); e.g. the BinOp-task has the operator Add, the arguments expressions
"Tnt" and "CallExp" and the result expression "J3inExp" as parameters (cf. section 3). The
graph nodes denoted by "0" are called program points; they are an auxiliary device to relate
the syntax tree and the tasks.
Add
0
Ident
MovePar
xpList 0
o VarExp 0 0 Tnt 0
Ident
"a"
Figure T: Control flow for expression 7 t f(a, 0)
After providing the formal background, we illustrate the framework by specilying a non
trivial example language. Tn section 4, we refine this language specification to an interpreter
specification. Section 5 explains how the ?TAX system ([PH93]) was extended to generate
interpreters from such specifications.
Relation to other Work Our work should be understood as a step towards filling the
gap between very high-4evel language specification frameworks that are optimized versus
readability and formality and specification frameworks that are designed to generate tools
that can compete in efficiency with hand--Hwriten tools. Compared e.g. to action semantics,
our framework is less complete: e.g. it does not allow to specify the basic data types of the
programming languages (but to our knowledge this is true as well for the action seman-
tics implementations; see e.g. [BM'v92]). And our specifications are not trimmed versus
readability. On the other band, our framework allows to express stepwise refinement of spec-
ifications: We can express e.g. the direct relation between a jump and its target and that
binding should be performed only once in statically scoped languages; even more important
are the refinements based on evolving algebras (see e.g. [13R92b]).
Compared to attribute grammar based system, our system supports the specification
of dynamic semantics and is more expressive as far as the formal par? of specifications is
concerned. On the other hand, if e.g. the flow graphs are encapsulated in semantic actions,
attribute grammar specifications can be understood as refining our specifications towards
efficiency. In addition to that, our work is related to systems enabling the implementation of
data flow analyses ([Wil81]) and allows to bring data flow analysis and language semantics
together (see [PH94]; for a denotational approach cf, [Ven89J).
Another very important relation is to works using programming language specifications
to define the semantics of programming logics. In particular, it is related to positional se-
mantics as defined in [CO78]. In deed, our framework can be understood as a generalization
of positional semantics in that it provides very flexible formal methods to define positions
and the transition relation. Thereby it makes the logic approach of [CO7SJ applicable to
realistic programming languages.
2 Formal Background
This section introduces the formal concepts our specifications are based on. First, we define
occurrence algebras, an extension of term algebras. Occurrence algebras enable the rich
program representations. Then, we give a short introduction to evolving algebras that are
used to specif,y the dynamic semantics. Both concepts use the notion of 4'algebras": An
algebra is given by a set U, called the universe of the algebra, and a set of functions that
take their arguments in U? and yield values in U
2.1 Occurrence Algebras
Occurrence algebras extend order--Hsorted term algebras by providing sorts for the occur-
rences of terms. They simplify and generalize3 the concept of occurrence structures as pre-
sented in [PH93]. Before the formal definition, we give an informal introduction. Consider
the abstract syntax for the expressions of our example language:
Exp			-			Int
VarExp			(			Ident
CallEip			(			Ident
ExpList			*			Exp
BinExp			(			Operator
Operator			=			Add
Add
I VarExp I CallExp I BinExp
ExpList )
Exp Exp )
i.e. an expression is in integer constant, a variable occurrence, a function call, or a binary
expression; and e.g. a call has two components: the identilier of the called function and the
actual parameter list. To define control flow graphs as shown in the introduction, we must be
able to refer to a "father" of a term. But in term algebras there is no `4father" of a term. That
is why we consider occurrences in terms. Occurrences are represented as usual by a pair (t, 1),
where t is a term and arid I a list of positive integers i1,. . ., ?? describing the path from the
root of I to the subterm occurrence. ?Ve use the notation CQ(I; iiin) and write ?(I; e)
for the root occurrence of I. E.g. G(1( CallExp(4f",ExpLsl(VarExp("x"),VarExp("x")) );
2,1,1 ) represents the first occurrence of identifier x in the abstract syntax term for f(x, x).
The father of this occurrence is the first occurrence of the corresponding variable expression:
?( CallExp( "f" ,ExpList (VarExp( `4x" ),VarExp( `4x")) ); 2,1).
It is useful to extend the sorting on terms to occurrences: An occurrence of a subterm
5 of sort S is said to be of sort SGQ?; e.g. the occurrences given above are of sort Ident? and
VarExpCQ? respectively. Using occurrence sorts, we can define new sorts. For example the
program points before and after expression occurrences as shown in figure 1 can be defined
by the productions:
ExpPoint			=			Before			After
Before			(			Exp?
After			(			ExpC )
?The appendix contains examples demonstrating the generalization and simplification.
In the following, we define what the occurrence algebra for a set ofproductions is.
Definition 2.1 Let PRODSORTS and PRIMSORTS be two disjunct sets of symbols called
production and primitive4 sort symbols. A sort symbol is a production or primitive sort
symbol followed by a possibly empty sequence of C(i?symbols, i.e. has the form S? where
5 E PRODSORTS u PRIMSORTS. The set of sort symbols is denoted by SORTS.
A production has one of the following forms: 5...... Srn) (called tuple production),
or 5 * T (called list production), or 5 = so.... JSn (called class production), where
5 E PROD5ORTS, Sj,T E SORTS, and m,n are natural numbers; 5 is called the left
hand side of the respective production.
Let II be a finite set of productions such that for each 5 E PRODSORTS there is
exactly one production with left hand side 5. For each sort in 5ORT5\PRIMSORTS
we inductively define a set of elements assuming that the sets for the primitive sorts
are given:
--H S(t.tn) is an element of sort 5 if 5 (5i . Sn) ? II and each tj is of sort S?;
--H 5(t1,. . ., tn) is of sort 5 if 5 * T E II and each tj is of sort T;
--H t is of sort 5 if 5 = So . . . JSn E II and t is of sort S? for one `i 0 < i < n;
--H ?(t; c) is of sort 5? if t is of sort 5;
--H ?(t; iiirn) is of sort 5(? if t = T(t1, .., tj1, .. In) is of sort T and
?(ti1;i2,...,?m) is of sort SCO.
Elements generated by the first two rules are called terms; elements generated by the
last two rules are called occurrences.
The universe of the occurrence algebra defined by II consists ofthe elements as defined
above, the elements ofthe primitive sorts, and the extra element nil; the functions of
the occurrence algebra defined by II are:
--H functions defined on occurrences: fath(x), lbroth(x), rbroth(x), son(i, x), the
father, left/right brother, i-th son of occurrence x; term(x), the subterm that
corresponds to occurrence r.
--H functions defined on terms: subterm(i, I), the i-th subterm of I; append(e, 1),
first(l), rest(l) with the usual meaning on list terms; for each list production
5 * T the empty list constructor S(); and for each tuple production 5 * T the
constructor S(t1,... , 1n) (For convenience, we use the san? name for the sort
(roman font) and the corresponding constructor (italic).)
--H for each sort 5 a boolean function is5(x) testing whether x is of sort 5.
--H the functions defined on the primitive sorts.
Whenever a function is applied to elements where the meaning as described5 above is
not defined, it yields the extra element nil.
4Throughout this paper we assume the primitive sorts Ident tnt and Bool
5A technical report spelling this out in more rigorous terms by aigebraic laws is iii preparation; the
appendix shows how to do that for the function lath.
2.2 Evolving Algebras
Evolving algebras are a powerful framework for specifying programming language semantics.
They have been used to specify the semantics of many different programming languages
including C ([GH92]), PROLOG ([BR92a]), and Occam ([GM89J). Here, we summarize the
definition of evolving algebras given in [Gur9l].
In an evolving algebra specification, the computation states are described by algebras.
The evolving algebra specifies how the slates are related to their successor states; i.e. it
specifies a successor relation on algebras: Given an algebra A, it describes how the functions
of A may be "updated" to get a successor algebra reflecting the intuition that in a
computation step only a small part of the state is changed.
As an introductory example let us specify the semantics of lists of assignments of the
form ?variable? (var?abie?. We assume the list functions isempty, first, and rest, and the
selector functions lhs, rhs yielding the left/right hand sides of assignments. The state infor-
mation that changes during execution is represented by the unary function VAL mapping
variables to values and the ?0-ary function" AL holding the rest of the assignment list. In
the following, we call functions that may change during execution dynamic; and dynamic
0-ary functions are called dynamic variables. Here is an evolving algebra specifying the
semantics of assignment lists:
IF			?isempty(AL)			THEN			AL := rest (AL)			Fl
IF			rnisempty(AL)			THEN			VAL(lhs (first (AL))) := VAL(rhs(first(AL)))			Fl
The evolving algebra has two rules. Rules are executed in parallel: Given an algebra A,
evaluate the guards, the right hand side expressions of the updates, and the arguments on
the left hand sides of the updates (here lhs(f Irst(AL))). Then, perform those updates
with a true guard. The following definition makes this more precise:
Definition 2.2 An evolving algebra EA is given by a finite set of rules of the form
IF guard THEN f(arg1aig?) := rhsexp Fl, where guard, aig?, and rhsexp are variable
free expressions and n is the arity of f.
o+ Let A be an algebra containing the boolean values and a function for all function
symbols occurring in EA. An algebra A' is an EA--successor of A if updating the
functions of A according to the following procedure yields A': Let UPDATES be
the set of updates of those rules the guard of which evaluate to true in A. Evaluate
all arguments and right hand side expressions occurring in UPDATES and replace
them by their values. The resulting set may contain subsets of contradicting updates,
eg. g(b1,bn) := r and g(bibn) := q with r ? q. From these subsets of
contradicting updates eliminate non--Hdeterministically all updates but one, resulting
in a set of non--contradicting updates. Change the functions of A according to this set
of updates, i.e. change the value of fimction f at point a1,..., a? to r if there is an
updatef(a1,...,a?):= r.
o+ A sequence (Ai)iElN of algebras such that Aj+1 is an EA?successor of Aj for all ? 15
called a computation of EA with initial algebra A0.
For practical purposes, the restricted syntax of evolving algebras as defined above is rather
inconvenient. Assuming that the considered algebras always contain a binary function A
yielding true exactly if both arguments are true and a unary function yielding true for the
argument false, we can introduce the following four syntactical extensions the associated
rules show how to transform the extended syntax into the restricted syntax above:
o+ unguarded assignments as rules: f(J) := rhsexp
? IF true THEN f(J) := rhsexp Fl,
o+ nested if-rules: IF guardi THEN IF guard2 THEN body Fl Fl
? IF guardi A guard2 THEN body Fl
o+ rule lists in the body of if-rules: IF guard THEN R1. . 1?n Fl
? IF guard THEN ....... IF guard THEN Rn Fl
o+ if-then-else-rules: IF guard THEN bodyI ELSE body2 Fl
? IF guard THEN bodyI Fl IF guard THEN body2 Fl
It goes without saying that any combination of these extensions is allowed.
3 Specif?ing Control Flow and Dynamic Semantics
In this section, we apply occurrence algebras and evolving algebras to specify the semantics
of an imperative example language called TOY in the following. Even though necessar-
ily small, TOY contains some features that are difficult to handle in many other language
specification frameworks, namely return statements (showing how to handle jumps or ex-
ceptions) and recursive function procedures with side effects (showing how to handle such
procedure calls in expressions). The TOY specification demonstrates how to enrich syntax
trees by control flow information and how to use this information to specify dynamic SC-
mantics. This modularization of the dynamic semantics specification into two parts may
seem complicated at first sight. However, its advantages become clear when applying it to
realistic size languages (cf. the discussion in section 5) or to the implementation of efficient
interpreters (cf. section 4). Iii addition to this, the flow information or similar enrichments
of the syntax tree can be used for data flow analysis and code generation in compilers.
The semantics specification in the following two subsections proceeds as follows: First,
we specify the state space of TOY programs. Then, we introduce a set of atomic actions
so-called tasks and give an evolving algebra seiriantics for them. Finally, we specify the
control flow of TOY in terms of program points and tasks. But before that, we have to give
the missing productions of the abstract syntax of TOY (the expression syntax was given in
the section 2):
Program
DeciList
Deci
Var
Proc
Body
ParamList
Param
*
*
*
DeciList )
Deci
Var Proc
Ident )
Ident ParamList.params
StatList )
Param
Ident )
Body )
StatList Stat
Stat WhileStat I lfStat I AssignStat I ReturnStat
WhileStat			Exp			StatList			)
lfStat			Exp			StatList			StatList )
AssignStat			ident Exp )
ReturuStat			Exp			)
As shown in the production for Proc, we allow to define selectors for tuple components;
i.e. if P is of sort Proc, the expressions "P.params?' and "subterm(2,P)" are equivalent.
3.1 The Dynamic Semantics of Tasks
This section has three parts: The first defines objects and locations; the second provides
the sort definitions for tasks; and the third gives the evolving algebra for TOY.
Objects and Locations TOY has global objects and objects that are local to a procedure
incarnation. Global objects need exactly one location to store the corresponding value; local
objects need a location for each procedure incarnation. The global objects of TOY are the
variables and integer constants6. Parameters are local objects. Are there other local objects?
As we have to deal with procedure calls in expressions, we must keep track of those parts
of expressions that are evaluated before a call. The easiest way to do this is to consider
expression occurrences as local objects (this is called a store semantics in [MS76] and reflects
the notion of temporary objects in compiler construction). Beside this, it is convenient to
consider procedures as local objects for passing their return values. These considerations
are formally expressed by the following productions:
Object			-
Globalobject			-
Localobject			-
Location			=
Automatic
Incar
I Localobject
Globalobject
Var?			I 1nt?
VarExp? I CallE?p? I BinExp? I Param? I ProcC
Globalobject I Automatic
Localobject incar )
Nat
The relation between locations and values is captured by the flinction VAL,
DYN vAL( Location L ) Int: IF Int? L: term(L) ELSE ??
where the keyword DYN indicates that VAL is a dynamic function. The body of a dynamic
function defines its values in the initial stale. The term "??" stands for an arbitrary value
of the result sort, here Int
Tasks and Control Information The atomic actions the TOY semantics is based on are
the tasks defined by the following productions. A branch task has three components: the
expression node of the corresponding condition and the two possible successors. A return
task has the returning procedure as component. A call task the expression where it is called.
A move task its source and destination etc. The precise usage of the tasks is specified in
section 3.2 in particular the successors of tasks.
Task
SinglSucc
Branch
Return
Call
Move
MovePar
Binop
Start
End
TaskStack
IncarStack
*
*
Branch			Return I Singlsucc I Start			End I Point
Move I MovePar I Binop			Call
ExpC.cond			Task.ttsucc			Task.ffsucc )
Proc?.rproc
CallExp?.cexp			Task.succ
Object? src			Object? dst
LocalObject?.src			LocalObject?.dst
Operator.op Exp?.left ExpC.right
Task
Incar
Treating constants as objects simplifies the spedfication.
Task.succ )
Task.succ )
Exp?.dst Task.succ )
The control information ill a state consists of the current task CT the control stack
CTR recording the return points of the active procedures, the current incarnation CI, the
stack of active incarnations INCAR?S, and an incarnation counter NEXTINCAR:
DYN			CT			()			Task:			Start()
DYN			CTR_5			()			TaskStack:			push( End(),
DYN			CI			()			Incar:			0
DYN			INCAR?S			()			IncarStack:			IncarStack()
DYN			NEXTINCAR()			Incar:			I
TaskStack() )
As explained above, the bodies of these dynamic variables specify their initial values.
Evolving Algebra for TOY The evolving algebra for TOY essentially contains for each
task sort one rule. The first rule specifies the semantics of a branch:
IF isBranch(CT) THEN
IF VAL(loc(CT.cond,CI)) = 0 THEN CT := CT.ffsucc ELSE CT := CT.ttsucc Fl
Fl
As in C, the value faise is represented by 0 in TOY. The function loc yields for ead? object
its current location:
FCT loc( ObjectO 0BJ, Incar IC ) Location:
IF Loca10bject? 0BJ : Autotnatic( OBJ, IC ) ELSE 0BJ
The start task initializes CT to the point before the main procedure and the parameter of
the main procedure to the input value (the specification of mproc and the declaration of the
dynamic variables PROGRAM and INPUT are given in the appendix):
IF
isStart(CT)			THEN
CT			:= Before(mproc(PROGRAN))
vAL( loc( son(I,mproc(PROGRAN) pararns)
Fl
CI ) ) := INPUT
The rules for the tasks with a single successor are rather straightforward. The function eval
evaluates a binary operator on two arguments (cf. appendix). In order to understand the
rule for the return task below, one should notice here that a call task pushes the point after
the call expression on the control stack:
IF
isSinglSucc(CT) THEN
CT := CT.succ
IF isNove(CT)			THEN
IF isMovePar(CT) THEN
IF isBin0p(CT)			THEN
Fl
IF
isCall(CT)
CTR?S
INCAR_5
CI
NEXTINCAR
Fl
Fl
VAL(loc(CT.dst,CI)) := vAL( loc(CT.src,CI) ) Fl
VAL(loc(CT.dst,CI)) := vAL( 1oc(CT.src,top(INCAR?S))) Fl
VAL(loc(CT.dst,CI))
eval( CT.op, VAL(1oc(CT.1eft?CI)), VAL(1oc(CT.right?CI)) )
THEN
push( After(CT.cezp)
push( CI? INCAR?S )
NEXTINCAR
NEXTINCAR + 1
CTR?S)
On return of a procedure: If the end task is on top of the control stack, then the return value
of the procedure is copied to the dynamic variable OUTPUT. Otherwise the return value
is copied to the expression node corresponding to the continuation point of the returning
procedure:
IF
isReturn(CT) THEN
CT := top(CTR?S)
CTR?S := pop(CTR?S)
IF top(CTR?S) = End()
THEN			OUTPUT			VAL( loc(CT.rproc,CI) )
ELSE			vAL( loc( top(CTR?S).node, top(INCAR?S)) )			vAL( loc(CT.rproc,CI) )
CI			top(INCAR?S)
INCAR?S			pop(INCAR?S)
Fl
Fl
IF
IF
isEnd(CT)			THEN			skip			Fl
isPoint(CT)			THEN			CT			succ(CT)			Fl
The spedfication of the successor function for program points (used in the last line) is the
topic of the next section.
3.2 Control Flow
As illustrated in the introduction, control flow can be understood as an enrichment of the
syntax trees. Such enrid?ments are specified in two steps: First, we specify the set of program
points; then, we specify by an attribution process how the syntax trees are enriched. To
describe control flow for TOY, we introduce program points before and after statements,
statement lists, expressions, expression lists, and procedures:
ExecNode			-
Before
After
Point
Exp? 1 ExpList? 1 StatC			StatList? 1 Proc?
ExecNode.node )
ExecNode.node )
Before 1 After
We specify the control flow as a function succ that returns for each program point the
successor task. For reasons that become clear when we discuss implementation aspects
(cf. section 5), we consider succ to be an attribute of the points, i.e. a function the values of
which are computed once for a given program and stored for use during program execution.
From a formal point of view, an attribute is just a unary function. The successor of a point
depends on the context of the node to which the point belongs. We use the pattern notation
of the MAX system to refer to the different contexts in which such a node may occur.
E.g. let P be the point after a node N and N be the node representing the condition of a
while statement; then the successor of P is the branch task that has N as first component, the
point before the body of the loop as tt?successor, and the point after the loop as ff--Hsuccessor;
this is expressed as follows (for conditional expressions in the specification language, we use
a colon to seperate conditions from then?branches in order to distinguish it from EA-rules):
IF WhileStatC<N,BD> W : Branch( N, Before(BD), After(W) )
The following specification shows the other cases for points after nodes (the cases for points
before nodes are trivial and given in the appendix). The function deci maps every identifier
to its declaration and the function proc maps each element of sort ExecNode to the enclosing
procedure (cf. appendix). The function r??xt yields the next program point in a left to right
tree traversal (cf. section 5). The most in?eresting case is a call with parameters, because
ATT succ( Point P ) Task:
LET N == P.node
IF P = After(N)
IF ProcC N
I Whi1eStatC<N?BD> W
I WhileStatC<E,N>
IfStatC<N,TB,EB>
I IfStatC<?,N??> IS
I AssignStatc<ID,N> A
Return( N )
Branch( N, Before(BD)? After(W) )
Before( E )
Branch( N? Before(TB), Before(EB) )
After( IS )
Nove( N, decl(ID), After(A) )
I			ReturnStatO<E> N			:			Nove( E? proc(N), After(proc(N)) )
I			BinExpc<O,L,N> E			:			Binop( terin(O), L? N? E, After(E))
I			CallExpC<ID,<>N> C			:			CaliC C, Before(decl(ID))
I			Ca11ExpC<ID?<PAR,*> N>			C:
Call( C, parcopy( PAR? son(1,decl(ID).params), Before(decl(ID)) ) )
ELSE next(P)
IP = Before(N)
see appendix
ELSE nil()
it demonstrates how to construct task structures that are not directly related to the syntax
tree (cf. the picture in the introduction): In the case here, we build up the sequence of
parameter moves using the function parcopy:
FCT parcopy( ExpC APAR, Param? FPAR, Task SUCC ) Task:
IF rbroth(APAR) = nil() : NovePar(APAR,FPAR,SUCC)
ELSE Novepar( APAR, PPAR, parcopy(rbroth(APAR),rbroth(FpAR),SUcc) )
4 Towards Specification of Efficient Interpreters
If we add a concrete syntax to the TOY specification given in the previous section and the
appendix ([PHE93] shows how to do that), we can execute TOY programs using the proto-
type system described in section 5. However, the stepping from program point to program
point would make these executions slow and the storage consumption would make them
almost unfeasible, because we would need storage proportional to the total number of pro-
cedure incarnations occurring in an execution (and not proportional to the maximal number
of simultaneously active incarnations). In this section, we refine the language specification
of TOY to a specification of a reasonably efficient interpreter by eliminating unnecessary
stepping and by switching to a stack organisation for the automatic storage. The point we
want to make here is that both refinements can be achieved by small and local modifica-
tions of the TOY specification and that these modifications can be carried out within the
presented framework.
Avoiding Unnecessary Stepping The stepping from program point to program point
without executing a "real" task can be avoided by using the attribute tasksucc instead of
ATT tasksucc( Point P ) Task.
IF isPoint( succ(P) ) : tasksucc(succ(P)) ELSE succ(P)
succ in the evolving algebra rule for points: tasksucc yields for each point the next task to
be executed. We could even do a little better by avoiding all program points except those
that break up loops; in TOY, these are tile points before conditions of whlle statements and
before procedures (because of recursion). To do this, we essentially have to make tasksucc
and succ mutually recursive.
Stacks for Automatic Storage This paragraph shows how the TOY specification can be
refined to implement automatic storage by a standard stack organization (cf. e.g. [AU77]).
Locations will be simply natural numbers (i.e. the dynamic function VAL can be considered
as an array). We assume the two attributes size arid addr with the following signature:
ATT			sizeC ProcC )			Nat
ATT			addr( Object? )			Nat
size(P) is the number of locations needed for an incarnation of procedure P; addr yields for
a global object its location, for a local object its relative address, i.e. the offset from the first
location in a procedure incarnation. (It should be clear that the well?known algorithms to
compute size and addr can be expressed in our framework.)
Using the dynamic variable CI as first location in a procedure incarnation (i.e. as "stack
pointer"), we need the lollowing changes to the TOY specification:
o+ replace the definition for Location by "Location = Nat" and the definition for Incar
by `4lncar = Location";
o+ change the body of the function loc to
IF Localobjecte OBJ : IC + addr(OBJ) ELSE addr(OBJ)
o+ replace the updates of CI in the rules for the start and call task by:
CI			numberofvars(PROGRAN)			//			for the start task
CI			size( decl( son(I,CT.cexp) ) )			//			for the call task
where the function numberofvars yields the number of variable declarations.
After these changes all occurrences of NEXTINCAR can be removed. (And a renaming of
Incar, etc. is advisable for muemonic reasons.)
Further Improvements Of course, further improvements are possible approximating
more and more what a compiler does. To mention onVv one7, the costly calls to loc can be
eliminated by splitting the move task into three tasks reflecting the "addressing modes":
MoveGlobToLoc, MoveLocToLoc, and MoveLocToGlob. After a replacement of all occur-
rences of the move task, the calls to loc can be replaced either by the global address or by
the relative address plus CI.
The focus of this section was to demonstrate how systematic refinements can be carried
out in the presented formal framework. It goes without saying that the illustrated method
apply to compiler construction as well.
5 Implementation Aspects and Experiences
?7e implemented a prototype on top of the MAX system that generates interpreters from
specifications based on occurrence algebras, attribute and fui'ction specifications, and evolv-
ing algebras. This section summarizes the implementation aspects of this prototype and
discusses the experiences we made so far.
5.1 Implementation Aspects
This section sketches the extensions to the MAX system and the prototype implementation
used for executing evolving algebras. lbr the following it is helpful to consider the run of a
generated interpreter as a three step process:
1. Reading: inchides parsing and encoding of the nodes; see below.
2. Attribution: includes attribute evaluation (and context checking).
3. Interpretation: execute the input program according to the evolving algebra.
7According to our performance analysis, the most profitable one.
Extensions to MAX The efficiency of our occurrence algebra implementation stems from
the fact that during the attribution and interpretation phase for a program PROG we only
have to deal with occurrences of one term or a very small number of terms, namely the syntax
tree of PROG and possibly some intermediate representations. This allows us to work with
efficient encodings8 for these occurrences (instead of using a direct implementation of the
term?natlist?pairs). Let us call this set of occurrences the nodes9 of a program. Encodings of
nodes were already explointed in the MAX implementation (see [PH93j p. 144). As program
points showed up to be helpful for many purposes (cf. section 5.2), we implemented program
points as well by encodings. Essentially, the system provides a predefined sort Predefpoint
and functions before(n), after(n) yielding the point before and after a node n as well as the
functions node(p), next(p?) yielding the node corresponding to a point p and the next point
according to a left to right tree traversal. The encoding allows us to implement before, after,
and node by an array access and next by an addition. User-defined program points are then
treated as subsorts of sort Predefpoint.
In the specifications, we distinguished functions and attributes (e.g. sncc was specified
as an attribute). As mentioned earlier, the semantics of attributes and unary functions are
the same. But the implementation is different. The relevant values of the attributes, i.e. the
attribute values for the nodes and points, are computed once and cached for later use. One
of the main advantages of encodings is that cad?ing becomes simple: As the encodings of
all elements of one sort form an integer interval, the attribute values can be stored in an
array having this interval as range. In particular, we can handle attributes on points in an
efficient way, in particular control flow information as given by suce.
Executing Evolving Algebras The implementation of evolving algebras has two as
pects: the representation of the dynan?c functions and the stepwise execution of the rules.
In our prototype, we only allow dynamic variables and unary10 dynamic functions. Dynamic
functions are implemented by arrays. If the parameter sort has an encoding, this is done
the same simple way as for the attributes. If the parameter sort is tnt or Nat, the needed
parts of the array are dynamically allocated in blocks; the mapping from indices to blocks is
recorded by a binary tree. For all other parameter sorts, we hash the term representation of
the parameters. Of course, this can drastically slow down reads and updates (and by that
the interpretation): E.g. an update to VAL in the version of section 3 can be more than
50 times slower than an update to VAL in the version of section 4 with parameters of sort
Location = Nat.
As all rules of an evolving algebra have to be executed in parallel, we divide each com
putation step in two substeps: Compute all arguments and right hand sides of updates with
true guard and then perform the updates in some order. To avoid testing the guards twice,
we keep track of the updates that have to be performed in a list of pointers to parameterless
procedures; each of these procedures performs the set of updates corresponding to one node
in the tree of conditionals that reflects the nesting of rules in the evolving algebra. As a
small optimization, we use case statements to implement branching according to sort tests.
5.2 Generation of Language--HSpecific Programming Tools
Until now, the experiences with the prototype are still rather limited: ?Ve generated some
tools for small languages like TOY and have an almost11 coniplete specification for a C subset
that includes the entire expression and statement syntax. The experiences we made so far
8Our implementation uses a four byte integer for all codings mentioned in the following.
9This definition is consistent with the informal use of `?node" in the previous sections.
10Functions with greater arity can be handled by defining a tuple sort for their parameter sort tuple.
11At the conference, this specification will be finished.
are very promissing. The control flow graphs based on tasks are not only a good technique
for closing the gap between formal language specifications and interpreter specifications, but
provide as well a flexible basis for the development of other language specific programming
tools. E.g. it showed to be fairly simple to enrich interpreter specifications to get source
level debuggers; and for them program points are a very natural way to specify the stepping.
In another application we built a (hand-coded) data flow analysis on top of control flow
graphs; and again we could reuse almost the whole static part of the language/interpreter
specification. Last but not least, some aspects of code generation become simpler in our
approach; in particular the generation of jumps and instruction labels is simplified.
Perhaps one of the most interesting observations was that a good design of the task
structure allows to seperate the semantic specification into a big static part and a comparably
small dynamic part (the evolving algebra). This is not only important for the efficiency of
interpreters but as well for proving program properties: Having the rich graph representation
for programs, the proves can concentrate on the dynamic aspects; and to prove them, small
specifications for the dynamic semantics are helpful. E.g. in [PH94], we exploited this
advantage to prove data flow analyses correct.
6 Conclusions
We proposed a combination of occurrence algebras. recursive first??rder functions, and
evolving algebras as a programming specification framework. Even though its formal foun-
dations are comparably simple, the framework is sufficiently expressive to specify all kinds of
programming languages including parallel ones (cf. [GM89]). Its flexibility makes it attrac-
tive for stepwise refinements of specifications. We attempted to demonstrate this by refining
a non--Htrivial language specification. Finally, we described a prototype implementation that
generates interpreters or other languag?dependent programming tools from specifications.
As described in the introduction, one of the motivations for this research was to fill the
gap between higliflevel language specifications and efficiency oriented specification tech-
niques and to learn how to refine the former towards the latter. Whereas we consider the
relation between high--Hlevel specifications and our framework a topic for future research,
the refinements needed to acheive more efficient language implementations are much better
understood. E.g. to develop a production quality compiler from the refined TOY speci-
fication of section 4, add optimization tecliniques, refine the set of tasks w.r.t. a suitable
intermediate representation for code generators and then adapt this refined specification to
more specialized and more efficient compikr construction tools. The refinement approach
not only helps to systematize compiler development and allows for early prototyping, but
should eventually provide the techriiques to prove realistic compilers correct.
The development of the MAX system is still in progress. Currently, we improve the
efficiency of the attribute evaluation ([Mer93]). The next envisaged extension is a fixpoint
evaluator to prototype high-level data flow analyses based on control flow graphs.
References
[AU77] A. V. Aho and J. D. Ullman. PrincjpThs of Compiler Design. Addison--HWesley,
1977.
[BMW92]
[BR92a]
[BR92b]
D. Brown, H. Moura, and D. ?Vatt. ACTRESS: an action semantics directed
compiler generator, In U. Kastens and P. Pfahler, editors, Compiler Construction,
1992. LNCS 641
E. B5rger and D. Rosenzweig. A simple mathematical model for full Prolog.
Technical Report TR--H33/92, Dipartimento di Informatica, Universita di Pisa,
1992.
E. B5rger and D. Rosenzweig. The ?VAM-definition and compiler correctness.
Technical Report TR--H14/92, dipartimento di informatica. universita di Pisa,
1992.
[CO78] R. L. Constable and NI. J. O'Donnell. A Programming Logic: With an Introduc-
tion to the PL/CV Verifler. ?Vinthrop Publishers, 1978.
[GH92]			C programming language. In
pages 274--H308,1992. LNCS
[GM89]
Y. Gurevich and J. Huggins. The semantics of the
E. B5rger et al., editor, Computer Science Logic,
702.
NT. Gurevich and L. Moss. Algebraic operational semantics and Occam. In
E. B5rger et al., editor, Computer Science Logic, pages 176--H192,1989. LNCS
440.
[Gur91] Y. Gurevich. Evolving Algebras, volume 43, pages 264--H284. EATCS Bulletin,
1991.
[Mer93] R. Merk. Generierung von MAX--HAttributauswertern. Master's thesis, Technische
Universitat Miinchen, 1993.
[Mos92] P. Mosses. Action Semantics. Cambridge University Press, 1992. "Tracts in
Theoretical Computer Science
[MS76] R. Milner and C. Stratchey. A Theory of Programming Language Semantics.
Chapman and Hall, 1976.
[PH93]
A. Poetzsch-Heffter. Programming language specification and prototyphig using
the MAX system. In J. Penjam M. Bruynooghe, editor, Programming Language
Implementation and Logic Programming, 1993. LNCS 714.
[PH94] A. Poetzsch-Heffter. Interprocedural data flow analysis based on temporal spec-
ifications. To appear, 1994.
[PHE93] A. Poetzsch-Heffter and T. Eisenbarth. The MAX system: A tutorial introduc-
tion. Technical Report TUM-19307, TU, April 1993 1993.
[Tof90] M. Tofte. Compiler Generators. Springer--HVerlag, 1990.
[Ven89]
G. A. Venkatesh. A framework for construction and evaluation of high--Hlevel
specification of program analysis techniques. AC?I Conference on Progamming
Language Design and Impfrmentation, 1989.
R. Wilhelm. Global flow analysis and optirnization in the MIiG2 compiler gener-
ating system. In 5. Muchnick and N. D. Jones, editors, Program Flow Analysis:
Theory and Applications, pages 132--H159. Prentice?Hall, 1981.
[Wil8l]
A Completing the TOY Specification
The following parts of the TOY specification describe the identification and complete the
specification given in section 3:
DeclOrParas			= Deci I Param
EnvSite			= DeclorParam I Body
DeclorParamList			* Dec1OrParas?
FCT idstr( DeclOrParanC D ) String: idtos( term(son(1,D)) )
ATT
env( EnvSiteC ES ) DeclOrparamList
IF			Programe< <ES?*> >			: append(
I			ProcC<??<ES,*>,?> FD			append(
I			ProcC<?,<>,ES> FD			env(FD)
I			ProcC<?,<*,PD>,ES>			env(PD)
I			Dec1ListC<*?DC,ES,*>			: append( ES, env(DC)
ELSE			nil()
ES, DeclorparamList()
ES, env(FD)
FCT corr?env( ExecNode N ) DeclorParamList:
IF Bodyc<N> : env(N) ELSE corr?env( fath(N)
FCT lookup( String ID, DeclOrParamList ENV ) Dec1OrParam?:
IF ENV = DeclorparamList() : nil()
I			ID = idstr(first(ENV)) :			first(ENV)
ELSE			lookup( ID, rest(ENV)
ATT
decl( IdentC IDN ) Dec1OrParam?:
IF Dec1OrParam?<IDN,*> D : D
ELSE lookup( idtos(term(IDN))
corr?env(fath(IDN)) )
ATT proc( ExecNode N ) ProcC : IF BodyC<N> B: B ELSE proc( fath(N) )
FCT mproc( Programe P ) ProcC : lookup( "main", env(son(-i,son(1,P))) )
Task:
ATT succ( Point P )
LET N == P.node
IF P = After(N)
see section 3
I P = Before(N)
IF ProcC<?,?,<SL N
Assignstate<ID,E> N
I VarExpC<ID> N
I CallExpC<ID,EL>
I BinExpC<?,LO,?>
ELSE next(P)
ELSE nil()
N
N
Before(SL)
Before(E)
Nove( deciCID),
Before(EL)
BeforeCLo)
FCT eval( Operator OP, Int L, Int R ) Int:
IF OP = Add() L + R
ELSE nil()
DYN			INPUT () Int :
DYN			OUTPUT () Int
DYN			PROGRAN() Program?:			??
N, After(N) )
B More Insight into Occurrence Algebras
Here we give two small examples illustrating that occurrence algebras simplify and generalize
occurence structures as defined in [P1193]. Consider the class production
ccc = SSS I TTTC
Occurrence structures do not define semantics for the sort CCC? and in general it can
become rather complex to give a semantics for occurrence sorts without defining them in
layers.
Whereas an occurrence structure as defined in [PH93] contains only the occurrences of
one term, occurrence algebras as defined here are much richer: They contain the occurrences
of all terms of a sorted term algebra. This enables to express the relation between two
different programs in a simpler way and allows to define its functions by algebraic laws;
e.g. the basic law for the function fath is
fath(G?(t; i1,...,?n)) = Q?(t; i1,...,in?1)
