BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR93-1363
ENTRY:: 1993-10-14
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: Principled Optimization of Functional Programs
AUTHOR:: Webber, Adam Brooks
DATE:: June 1993
PAGES:: 221
COPYRIGHT:: Adam Brooks Webber 1993 - All Rights Reserved
ABSTRACT::
Automatic optimizers for computer programs work with a fixed list of rote 
transformations, while human programmers can go on to derive new optimizations 
from broad and intuitive principles if known transformations prove inadequate. 
This dissertation investigates the possibility that principled optimization 
can be automated, focusing on a single principle (the principle that programs 
should not do anything unnecessary) and a single program domain (the domain of 
purely functional, first-order programs). Three questions are explored: How 
can the principle be formalized? How can violations of the principle be 
detected? How can violations be repaired?

The trace grammar is a new representation for first-order functional 
programs. It permits a simple formal statement of the optimizing principle. A 
trace grammar is a kind of graph grammar. An individual graph represents a 
path of execution through the program, without any loops or conditionals. The 
grammar for a program generates a language of graphs representing the possible 
paths of execution through that program. Trace grammars provide unique 
leverage for the twin problems of identifying and repairing violations of the 
principle.

Detecting violations of the principle is a problem in semantic analysis. A 
new method of inference, the relational constraint method, helps make 
this tractable. The method treats a trace graph as a system of constraints on 
a lattice of binary relations, and uses those constraints to develop the 
strongest relation it can find for each pair of values in the graph.

Repairing violations is not easy: given an example of an unnecessary 
computation performed by the program, one wants to modify the program so that 
it never makes that mistake. This grammar thinning problem for trace 
grammars corresponds to an interesting open problem on context-free grammars. 
An (approximate) solution to this CFG problem yields an optimization technique 
for trace grammars.

An optimizer called Thinner is the proof-of-concept for these ideas. Using the 
techniques outlined above, Thinner rediscovers a variety of common compiler 
optimizations. It also finds other, more exotic transformations, including the 
well-known Fibonacci reformulation. Thinner demonstrates the potential of the 
principled approach as a high-powered optimizing tool.
END:: CORNELLCS//TR93-1363
BODY::
Principled Optimization of
Functional Programs
Adam Brooks Webber
Ph?D Thesis
93-1363
June1993
Department of Computer Science
Cornell University
Ithaca NY 14853-7501
PR?INCIPLED OPTIMIZATION
OF'
FUNCTIONAL P?OGR?AMS
A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
by
Adam Brooks Webber
August 1993
Qc Adam Brooks Webber 1993
ALL RIGHTS RESERVED
PRINCIPLED OPTIMIZATION
OF
FUNCTIONAL PROGRAMS
Adam Brooks Webber, Ph.D.
Cornell University 1993
Automatic optimizers for computer programs work with a fixed list of rote trans-
formations, while human programmers can go on to derive new optimizations from
broad and intuitive principles if known transformations prove inadequate. This
dissertation investigates the possibility that principled optimization can be auto-
mated, focusing 011 a single principle (the principle that programs should not do
anything unnecessary) and a single program domain (the domain of purely func-
tional, first-order programs). Three questions are explored: How can the principle
be formalized? How can violations of the principle be detected? How can violations
be repaired?
The trace grammar is a new representation for first-order functional programs.
It permits a simple formal statement of the optimizing principle. A trace grammar
is a kind of graph grammar. An individual graph represents a path of execu-
tion through the program, without any loops or conditionals. The grammar for a
program generates a language of graphs representing the possible paths of execu-
tion through that program. Trace grammars provide unique leverage for the twin
problems of identifying and repairing violations of the principle.
Detecting violations of the principle is a problem in semantic analysis. A new
method of inference, the relational constraint method, helps make this tractable.
The method treats a trace graph as a system of constraints on a lattice of binary
relations, and uses those constraints to develop the strongest relation it can find
for each pair of values in the graph.
Repairing violations is not easy: given an example of an unnecessary compu-
tation performed by the program, one wants to modify the program so that it
never makes that mistake. This grammar thinning problem for trace grammars
corresponds to an interesting open problem on context-free grammars. An (ap-
proximate) solution to this CFG problem yields an optimization technique for
trace grammars.
An optimizer called Thinner is the proof-of-concept for these ideas. Using
the techniques outlined above, Thinner rediscovers a variety of common compiler
optimizations. It also finds other, more exotic transformations, including the well-
known Fibonacci reformulation. Thinner demonstrates the potential of the princi-
pled approach as a high-powered optimizing tool.
Biographical Sketch
Adam Brooks Webber spent his youth in Princeton, Illinois, where he attended
Princeton High School and met his wife, Kelly. He went to Dartmouth College
in 1980 intending to study drama, but it didn't turn out that way. He worked
instead on the Dartmouth College Timesharing System as a systems programmer,
and graduated in 1984 with a B.A. in mathematics.
He worked as a software engineer at True BASIC Inc. in Hanover, New Hamp-
shire for four years. Convinced that he had been a more interesting person when in
school, he entered the graduate program in Computer Science at Cornell University
in 1988. He received the M.S. in 1991 and the Ph.D. in 1993.
iii
dedicated to
Robert Aifted Webber
1895-1989
iv
Acknowledgements
I would like to thank my advisor, Devika Subramanian. Her encouragement and
faith have been unfailing. She suggested the problem solved in this thesis, and
she taught me respect for Al. I would also like to thank my other thesis advisors,
Miriam Leeser and Keshav Pingali.
The research reported here was supported in part by NSF grant IRI-8902721.
Thanks also to General Electric for the fellowship that supported me in my first
year.
I was sustained in this work by my parents' belief that I could do it. This is a
kind of support one never outgrows.
My wife Kelly worked as a waitperson to make this thesis possible. Greater
love hath no one than this. Yet my debt to her is no greater than it was five years
ago; it was already infinite, and has been ever since she had the goodness to marry
me.
v
Table of Contents
2
Introduction
1.1			Motivation			.
1.2			Outline. .			. .			.
1.3 Thinner project in context .
1.3.1 Irrelevance reformulation
1.3.2 Hall's optimizer
1.3.3 Partial evaluation
1.3.4 Redundancy elimination
1.3.5 Advanced functional optimization
1.4			Advice to readers			.
Trace Grammars
2.1			Overview. . 						. .
2.2			Trace			graphs and the PLC . 			.
2.2.1			Trace graphs
2.2.2			Vertex labels and executions
2.2.3			Thinning trace graphs
2.2.4			The PLC for trace graphs			. 			. . .
2.2.5			Discussion
2.3			Sets of trace graphs and the PLC			. 			. .
2.3.1			Sets of trace graphs
2.3.2			The thinning relations for trace graph sets
2.3.3			The PLC for executable sets.
2.3.4			Discussion
2.4			Trace			grammars and the PLC . 						.			.			.			. .
2.4.1			Trace grammars.			.			.
2.4.2			Correspondence to CFG's . 			.			.			.
2.4.3			Thinning and trace grammars			.			.			.
2.4.4			The PLC for trace grammars			. .			. .
2.5			Applying the formal principle . . .
2.5.1			Weak conservative thinning
2.5.2 Strong conservative thinning of right-hand sides
vi
3
5
7
7
9
10
10
12
14
14
19
19
21
24
28
29
32
32
35
39
41
43
43
49
55
57
57
59
60
2.5.3 Weak conservative thinning of right-hand sides
Thinner's Language
3.1 Overview.
3.1.1 Informal
3.2 Concrete syntax
3.3 Trace graph constructors
3.4 Types and type inference
3.5 Abstract syntax
3.6 Semantic sets
3.7 Semantic schema
3.8 Standard semantics
3.9 Example
description of TL .
63
3
4
5
Finding Violations
4.1			Overview. . .
4.1.1			The semantic analyzer			.
4.2			Relational constraint
4.2.1			The vocabulary of binary			relations
4.2.2			Finding strong relations
4.2.3			The relational constraint			algorithm
4.2.4			Examples
4.2.5			Comparison with other			methods .
4.2.6			Abstract interpretation .			. .
4.2.7			Discussion . . . . .			. .
4.3			Normal forms
4.3.1			Integers . . . . .
4.3.2			Bitvectors
4.3.3			Booleans . . . . . .			. .
4.3.4			Lists
Repairing			Violations
5.1			Overview. .			. .
5.2			Fully terminal CFG thinning			. .
5.2.1			Thinning by FSM .			. .
5.3			General CFG thinning
5.4			The thinning-set method .			. . .
5.5			The method of kernels . .
5.5.1 A weakness of the thinning-set method
5.5.2 Outline of the method of kernels . .
5.5.3 Shortcomings of the method of kernels
5.6 The method of kernels for trace grammars
vii
65
65
66
67
70
73
73
74
76
77
77
79
79
80
83
83
89
100
108
113
116
120
121
122
122
123
124
126
126
128
131
134
136
140
140
143
147
149
6
7
Thinner
6.1			Overview. .			. . .			. .
6.2			Thinner fundamentals
6.2.1			Memory management
6.2.2			Doubly-linked circular lists
6.2.3			Iteration macros . .
6.3			The compiler
6.4			The decompiler . . . .
6.5			The explorer . . . . .
6.6			The analyzer . . .
6.6.1			Representing relations
6.6.2			Constructing constraint sets . .
6.6.3			Relational constraint implementation
6.7			The transformer
6.7.1			Graph and subgraph matching
6.7.2			Thinning right-hand sides . . . .
6.7.3			Replacing a group of vertices
6.7.4			Constructing a kernel set . . .
6.7.5			Cleaning up the results .
6.8			Thinner transcripts
6.8.1			Common subexpression elimination . .
6.8.2			Dead code elimination and constant			folding
6.8.3			Code sinking . . . . .
6.8.4			Loop invariant removal
6.8.5			Loop splitting
6.8.6			Loop jamming . . .
6.8.7			Fibonacci reformulation
6.9			What			Thinner can't do
Conclusion
7.1 Summary
7.2 Open problems
7.3 The benefits of principled optimization
7.3.1 The science of optimization
7.3.2 The tools of optimization
7.3.3 Practical applications .
Epilogue
Bibliography
viii
156
156
159
159
159
159
160
162
164
166
166
167
171
175
175
176
180
184
185
186
187
190
192
193
194
195
201
205
207
207
209
211
211
212
213
215
216
List of Tables
3.1 Primitive functions and their types
3.2 Extended BNF grammar for TL
4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
4.9
4.10
4.11
4.12
4.13
4.14
Notation for the primitive relations in xint . .
Examples of common expressible relations on integers .
The expressible relations on booleans. . . . . .
The set Xbitv of primitive relations on bitvectors
Some common expressible relations on bitvectors
The primitive relations in xlist . .
Some common expressible relations on lists . . . . .
Operators that can be captured in a single relation . .
The constraint set for boolean conjunction: a = (and b c)
The constraint set for bitvector conjunction: a = (and b c)
Initial relations for the simple integer example . .
Final relations for the simple integer example . . .
Initial relations for the tricky list example
Final relations for the tricky list example
68
69
85
86
86
88
88
89
89
91
94
95
110
110
113
114
ix
List of Figures
1.1 The structure of the Thinner project
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2.9
2.10
2.11
3.1
3.2
3.3
3.4
3.5
3.6
6
A comparison of three representations
Example of a simple trace graph 			. .			.			. .			. . .
Execution of a trace graph on (2,2)			. . .
A thinning.			.
A non-executable set of trace graphs			.			.
Trace graph sets ?C, H, I, J? z?G', H', I', J'?, with decision trees
A set ?G, Hi not quite in violation of the PLC
Another comparison of three representations . . . . . .
The replacement of a vertex in & by a graph D
Example of a trace grammar and generated set
The trace graph component corresponding to x ?
A trace graph constructed by Identity .
A trace graph constructed by Constant
A trace graph constructed by Select
A trace graph constructed by Single .
A trace graph constructed by Cat.
A trace graph constructed by Meld . .
4.1 Trace graph for the simple integer example
4.2 Trace graph for the tricky list example
5.1 Construction of a grammar for L6(C)
5.2 Some thinning examples classified .
5.3 Construction of a grammar for LA(G) .
5.4 A problem that defeats the thinning-set method
5.5 A trace grammar thinning problem
5.6 Productions for fib after step 1, showing group P
5.7 Productions for fib after step 2
5.8 Productions for Xp generated in step 3
5.9 Productions for Xp after step 4 .
5.10 Final grammar after step 5
x
17
21
23
25
33
40
42
44
46
48
50
70
71
71
72
72
73
109
112
133
133
141
142
152
153
153
154
154
155
6.1
6.2
6.3
6.4
6.5
6.6
6.7
6.8
6.9
6.10
6.11
6.12
6.13
6.14
6.15
Structural overview of Thinner
The function structure
The function for addition 			.
Graphs, vertices and objects
Top-level thinning function
Definitions for relations on integers .
Constructor for the cdr constraint set
Testing the cdr constraint set
Constraining triples of relations
Examining the relations for a graph
The representation for a thinning
Four thinning possibilities
Weighing a thinning, with obligations . .
A thinning problem that makes grouping difficult
Result of naive grouping: a cycle
xi
157
160
161
163
165
168
169
170
172
174
177
179
181
182
183
Chapter 1
Introduction
1.1 Motivation
Consider the following programs:
(defun cse (a)
(f (* a 10) (* a 10)))
(defun sink (a)
(let ((x (* a 10)))
(if (P a) x 0)))
(defun fib (a)
(if (< a 2)
a
(+ (fib (- a 1))
(fib (--H a 2)))))
What would an ordinary optimizer do with these programs? Of course it de-
pends on the optimizer, but here's a reasonable guess: common subexpression
2
elimination probably will be performed for the first program, code sinking might
be performed for the second, and the third almost certainly will not be optimized
at all. One thing that can be said for certain is that the ordinary optimizer's
repertoire, whether large or small, is fixed: giving it another hour to think about
the problem will not help.
What if we give these programs to a human programmer to optimize? Again,
it depends on the individual. Some of us have seen all three of these patterns of
optimization so often that we can perform the optimizations without much thought.
But even a less experienced pwgrammer--Hsay, someone who has only recently
learned their first programming language will be able to discover optimizations
in all three cases if given enough time.
This is a challenge to computer science. It's something that people can do
better than machines; and unlike, say, many natural language or vision or robotics
problems, it doesn't seem like the sort of thing for which evolution has prepared us.
Machines should have a home-court advantage on problems like this. In fact, me-
chanical optimizers are superior to human ones in several respects. They're faster,
they're cheaper, and they make fewer mistakes. But in their ability to discover
new optimizations when known ones prove inadequate, human programmers are
unchallenged.
An important component of this wonderful competence is the ability to derive
specific optimizations from high-level, intuitive principles. It seems impossible to
explain what's wrong with the three programs above without appealing to a par-
ticular intuitive principle. In describing this principle we use words like redundant,
wasteful, unnecessary; and we feel no need to explain why it is that redundant,
wasteful and unnecessary computations should be avoided. There is no lecture in
Computer Science 100 devoted to the subject. Even computer neophytes can be
3
expected to know enough to avoid unnecessary computations: after all, we are at
least in part computational engines ourselves, and we seem to make it through the
day without getting bogged down in pointless computation.
There may be many specific powers behind human superiority at the task of
optimization, but the ability to reason from high-level principles to individual
optimizations seems like an important one. There may be many principles of op-
timization, but the principle that programs should not perform any unnecessary
computation seems paramount. I call this principle the Principle of Least Compu-
tation or PLC.1
1.2 Outline
The capstone of this research is a novel optimizer called Thinner. Thinner optimizes
a Lisp-like language of first-order functional programs. It works by detecting and
correcting violations of the PLC. It discovers traditional compiler optimizations and
other, more exotic transformations, including the reformulation of the Fibonacci
function from exponential to linear time. Like a human programmer, it can find
more optimizations the more time you give it.
To build Thinner I needed answers to three big questions about the Principle
of Least Computation:
1. How can the principle be formalized?
2. How can violations of the principle be detected?
3. How can violations of the principle be repaired?
1Prom Feynman's Principle of Least Action [FLS64].
4
How can the principle be formalized? To answer this first question I developed a
new representation for first-order functional programs, the trace gramman A trace
grammar is a node-replacement graph grammar. The grammar corresponding to a
program generates a language of fully terminal trace graphs, each of which describes
a single path of execution through the program. The trace grammar representation
supports a simple, rigorous expression of the principle. A simple transformation of
trace graphs called a thinning yields a partial ordering on graphs: one graph can
be "thinner than" another, and a similar partial order can be defined for graph
grammars. The PLC is formalized in terms of this partial order. These ideas are
developed in Chapter 2.
More than just an interesting formalism, the trace grammar representation is
the actual intermediate form used by Thinner. Thinner compiles a program into
a trace grammar, detects and corrects violations of the principle in that grammar,
and then, if necessary, decompiles the reformulated grammar. The trace gram-
mar formalism is used in Chapter 3 to give a formal semantics for the language
recognized by Thinner.
How can violations of the principle be detected? This is, of course, an infer-
ence problem, one that the reader will not be surprised to learn is undecidable.
The construction of Thinner demanded a practical approach to this problem: a
technique for finding as many violations of the principle as possible, as quickly as
possible. For this purpose I developed a new inference technique called relational
constraint.
Relational constraint starts with a fixed, finite lattice of binary relations on
the values that occur during execution of a trace graph. It treats a trace graph as
a set of constraints on tuples of binary relations, and it applies these constraints
to find the strongest relation it can for any pair of values in the graph. It is,
5
ill effect, an 0(n2) forward-chaining inference method. It is combined with more
traditional methods of semantic analysis (like the computation of canonical forms
for expressions) in Thinner. Relational constraint is discussed in Chapter 4.
How can violations of the principle be repaired? Using the method of inference
described above Thinner can find violations of the PLC. These violations are graphs
generated in the grammar that Thinner knows how to thin. But knowing how to
transform an individual graph isn't enough. The problem is to transform the
grammar so that it never generates the violation apparent in the example. This
is non-trivial because the violation can be generated by a grammar in a variety of
ways, perhaps infinitely many.
This grammar-thinning problem has an analogue in context-free grammars on
strings, and Chapter 5 explores this case. One approximate solution to the string
case, the method of kernels, also applies to the graph case. This is the transforma-
tion method used by Thinner.
My answers to the three questions outlined above are the major contributions of
this research, and Thinner is a demonstration that the ideas and methods developed
in the following pages do work. Chapter 6 describes the construction of Thinner
and presents a selection of timed examples showing some of Thinner's strengths
and weaknesses.
The structure of the Thinner project is shown in Figure 1.1.
1.3 Thinner project in context
This section establishes the position of the Thinner project among related research
efforts.
6
Thinner
Relational			Grammar
Constraint			Thinning
(How to			(How to
detect			repair
violations)			violations)
Trace Grammar
Formalism
Intuitive Principle of Least Computation
Figure 1.1: The structure of the Thinner project
7
1.3.1 Irrelevance reformulation
The Thinner project is a descendant of Subramanian's work on irrelevance in first-
order theories [Sub89,SG87]. Subramanian characterized a formal principle cor-
responding to the intuitive idea of irrelevance. This formal principle served as
justification for the automatic reformulation of first-order theories. Subramanian 5
thesis mentions the possibility of applying of an irrelevance principle to the prob-
lem of eliminating repeated computation in a logic-programming formulation of
the Fibonacci function.
1.3.2 Hall's optimizer
The Thinner project is related to the optimizer described in Hall's thesis [Hal9O].
I had already completed an early prototype of Thinner when I first heard of Hall's
work; and I confess to some alarm when I heard his title, "Program Improvement
by Automatic Redistribution of Intermediate Results". It turns out, however,
that the two research projects, although undertaken with similar goals, moved in
completely different directions.
In discussing his optimizer, Hall appeals to an intuitive principle of optimiza-
tion similar to my PLC, though he does not derive optimizations from a formal
principle as I do. The chief transformation of Hall's optimizer is the redistribution
of intermediate values and the removal of disconnected functions in a representa-
tion similar to the Plan Calculus [Ric86,RW88], an operation very like the thinning
transformation on trace graphs discussed here. Hall's optimizer achieves consider-
able power by making use of sources of information other than the raw program: for
example, user-supplied test inputs, optimization invariants, and proofs of correct-
ness, as well as syntactic clues like iteration macros for series expressions. Another
8
source of power is the fact that optimizations are not guaranteed correct (except
with respect to the given test inputs). The chief transformation, and minor ex-
tensions like the addition of new parameters and returned values to subroutines,
are introduced with a high-level, intuitive appeal to the underlying principle. In
keeping with my goal of showing that optimization can start with an optimizing
principle, I have taken a more circumscribed approach in the Thinner project. My
only input is the raw, purely functional program, and the optimizations produced
by the Thinner are guaranteed correct. Within this circumscribed domain I show a
clear derivation of optimizations from a rigorous statement of the principle, using
a consistent graph-grammar treatment of recursive program structure.
In comparing Hall's work with the Thinner project, it is useful to re-examine
the three fundamental questions mentioned above:
o+ How can the prnnciple be formalized? Hall opted to retain an intuitive princi-
ple, and used a Plan-Calculus-like intermediate representation. This has the
advantage of fitting in well with the existing body of Plan Calculus tools:
Hall's optimizer is closer to being a practical tool than Thinner. I chose to
develop a new representation and to give a formal expression of the principle
being automated. This representation has the advantage of allowing deeper
grammatical transformations.
o+ How can violations of the principle be detected? Hall developed very power-
ful inference techniques, drawing on user-supplied test inputs, optimization
invariants, and proofs of correctness. The inference system is not guaranteed
correct, but relies on an (unspecified) external verifier. It has the advantage
of being able to use information about a program from various sources. The
techniques used for inference in Thinner are less powerful, but they have the
9
advantage of being faster, self-contained and correct.
o+ How can violations of the principle be repaired? Hall uses several transforma-
tions: the redistribution of intermediate values and removal of disconnected
functions; the elimination of unnecessary copy operators (when destructive
operators are used); the combination of series expressions [Ste9O] in "gener-
alized loop fusion"; the addition of new parameters to a function; and the
addition of new returned values from a function. Hall's system does not per-
form deeper grammatical transformations like the Fibonacci reformulation
(although he suggests ways to extend his optimizer for that particular exam-
ple). Thinner uses the method of kernels, which in many cases (including the
Fibonacci case) eliminates all instances of unnecessary computation covered
by a given example of violation.
1.3.3 Partial evaluation
Expressions whose values are constant certainly violate the PLC, so the Thinner
project is related to the considerable body of work on partial evaluation [CD93].
Thinner identifies and eliminates violations of the PLC in programs, so it is, in
part, a partial evaluator. But constancy is a special case of PLC violation, and
Thinner is a much more general tool.
However, within their own domain, modern partial evaluators are superior to
Thinner in several ways. For example, Thinner is not self-applicable. The partial
evaluation community has been quite interested in self-application, at least since
the 1970's [Ers82] [Ers78] [Fut71J and continuing to the present [JCB+90]. But the
language on which Thinner operates (TL, a purely functional, first-order language)
is much smaller than the language in which it is implemented (Common Lisp).
10
Thinner generalizes partial evaluation to a larger class of transformations. In
this respect it resembles the "driving" technique used in Turchin's Supercom-
piler [Tur86,TNT82,Tur80]. This technique arrives at a variety of transformations
through symbolic evaluation of expressions in the target language (Refal).2
1.3.4 Redundancy elimination
The term "redundancy elimination" occurs in the literature surrounding a par-
ticular kind of transformation for imperative programs. This transformation is a
generalization of common subexpression elimination with code motion.
Value numbering [CS70] is the parent of more modern methods [DST8O] of
identifying redundant expressions. A computation is fully redundant if there is
an equivalent computation before it on every path of control flow that reaches it
(remember that we are talking about imperative programs here). Computations
may also be partially redundant: redundant on some but not all paths. Morel and
Renvoise addressed the problem of eliminating partial redundancies [MR79j, and
later work built on this idea [RWZ88].
This kind of redundancy elimination is only loosely related to the Thinner
project. Many of the problems of the area are unique to imperative programs.
1.3.5 Advanced functional optimization
Other methods of advanced optimization for functional programs include the trans-
formational approach of Burstall and Darlington, the algebraic approach of Backus,
and the patterns-of-redundancy approach of Cohen. These are outlined below.
21 believe there is a connection between the "kernels" used by Thinner (see Chapter 5) and the
configurations taken as "basic" in the Supercompiler. But as the source of basic configurations
in Turchin's method is unclear, the connection may be only superficial.
11
The transformational method was pioneered by Burstall and Darlington [BD77].
They presented a small set of transformations including the fold and unfold opera-
tions for systems of recursion equations. This set of transformations is sufficiently
fine-grained that a wide range of program transformations can be composed from
it. The missing piece in this approach is a fully automatic way of deciding when
to apply these transformations. Field and Harrison comment on this deficiency:
The choice of the sequence of rules to apply is not precisely speci-
fied in the methodology, the next rule to apply in any transformation
process being selected by the program designer, perhaps under the guid-
ance of some informal transformation heuristic. [FH88, page 450]
The algebraic approach of Backus is another advanced method of optimization
for functional programs [Bac85]. The foundation of this approach is an axiomatic
semantics for the language FP. From these axioms Backus proved several theorems
about identities: one of these is a recursion removal" theorem which justifies a
class of transformations of recursive functions to iterative form. Kieburtz and Shul-
tis used a similar approach and developed additional theorems in their 1981 paper
[KS8l]. The mathematical elegance of the algebraic approach is attractive, but for
our purposes the question of whether automatic transformations are proved correct
by this or by some other formal means is moot. Is the optimizer based on broad
intuitive principles or on an ad hoc collection of transformations? Is it flexible,
or does it run through a catalog of known transformations and then quit? These
questions depend entirely on the actual transformation theorems and optimization
strategies employed; neither of these questions was addressed by Backus, and I
believe that they are relatively independent of the algebraic approach.
Another relevant method of optimization is the patterns-of-redundancy ap-
12
proach advanced by Cohen [Coh83]. He gave a taxonomy of several types of re-
dundancy: explicit, common-generator, commutative, and so forth. In Cohen's
approach, a program can be classified in one of these categories by fitting it into
a fixed recursion schema and identifying properties (like commutativity) of its
primitive functions. Cohen identified several important coarse-grained patterns of
unnecessary computation with this approach. He did not attempt to give a flexi-
ble collection of primitive transformations, and he did not address the problem of
automation. Indermark and Klaeren took a similar approach in their 1987 paper:
they were content to identify a specific high-level pattern of redundant computation
(Fibonacci-like recursion) [1K87]
1.4 Advice to readers
Readers who want to skim: the first section of each of the following chapters dis-
cusses the ideas in that chapter, omitting most of the technical details. The tran-
scripts of Thinner examples in the last section of Chapter 6 are probably also worth
skimming.
Readers who are interested in artificial intelligence: Thinner is a step toward an
automatic system that can challenge human competence in optimization, and for
this reason I do think of it as an Al project. But the techniques that proved most
suitable for the task are not standard Al techniques, and you will find few heuristics
in the following pages. Do not despair. It would not be difficult to cobble together
a collection of optimizing heuristics that performed as well as (or better than) my
Thinner on a particular collection of test cases. But what then could you claim
for its performance outside that collection? How could you describe the space
of transformations of which it was capable? By following a more rigorous path
13
from the formal statement of an optimizing principle, through semantic analysis
and grammatical transformation, to design and implementation decisions, we will
gain not only a principled optimizer, but also a respectable understanding of its
strengths and weaknesses.
Readers who are interested in formal languages: the material in Chapters 2 and
5 should be of most interest, especially the string case of the grammar-thinning
problem presented in Chapter 5. This interesting open problem in context-free
grammars has potential practical applications.
Readers who are interested in programming language semantics: Chapters 2,
3 and 4 should be of interest. Chapter 4 includes a treatment of the relational
constraint inference method in the framework of abstract interpretation.
Readers who are interested in building systems: I recommend the first section
of Chapter 2, then Chapters 4 and 6. I believe the relational constraint inference
method will be useful in other contexts: it strikes a reasonable bargain between
effectiveness and speed, and it is eminently parallelizable.
Readers who want to read everything carefully and tell the author about any
errors: bless you.
Chapter 2
Trace Grammars
2.1 Overview
Principled optimization begins with the formal statement of a principle; in this
case, the Principle of Least Computation. But first things first: here's an informal
statement of the principle.
Definition 2.1 (PLC Informal) A functional program should not make a corn-
putation if:
1. the result is already available, being either constant or equal to some other
available result; or
2. the result is not used, being neither an input to another computation nor an
output of the program.
Naturally, there can be no proof that a formal definition agrees with an intuitive
one. There can only be attempts to persuade the intuition. Thinner itself is one
kind of persuasion: evidence that, working from the formal definition developed in
14
15
this chapter, an automatic optimizer can discover many of the optimizations that
we normally explain in terms of the informal principle.
Formal statement of the PLC in the domain of purely functional, first-order pro-
grams requires a program representation that meets two important requirements.
First, it must make all the primitive computations, and all the intermediate values,
explicit; a representation that conceals the flow of data from computation to com-
putation will also conceal violations of the PLC. This requirement favors graph-like
representations, in which each intermediate value corresponds to an arc, over tex-
tual representations. The second requirement is that the representation must not
impose any unnecessary structure. Once an optimizer has detected a violation of
the PLC, the program transformation that removes that violation should be as
straightforward as possible.
To satisfy the requirements of PLC-based optimization I have developed a new
representation for purely functional programs [Web92a]. The basis of the represen-
tation is the trace graph, a directed acyclic graph structure that describes a single
path of computation, a single mapping from inputs to outputs by way of inter-
mediate values, without any loops or conditionals. The vertices of a trace graph
represent primitive functions and the arcs represent intermediate values. Of course
some of the computations in the course of a normal program's execution will be
predicates: comparisons, tests, or other primitives, each with a single true-or-false
output. These predicates determine which one of the many paths of execution
implicit in a program is actually followed. Any predicate evaluated in the course
of a particular path of execution will appear in the trace graph for that path, but
the output of that predicate will be fixed by the graph structure. A trace graph is
like a partial evaluation of a program a partial evaluation with respect to a fixed
control path, as in the work of Perlin [Per89]. It has an execution only for those
16
inputs that result in the proper predicate values.
The trace graph isn't powerful enough by itself to represent a program: a
program may have infinitely many possible paths of execution, so we need some
way to generate potentially infinite sets of trace graphs. I have developed a kind of
graph grammar, the trace 9rammar, for this purpose. It is a specialization of the
directed, node-label controlled (DNLC) graph grammars studied by Jannsens and
Rozenberg [Roz87], structurally (though not semantically) similar to Feder's plex
grammars [Fed71] [BH9o]. It works this way: a trace graph may contain vertices
whose functions are nonterminals, and each nonterminal appears on the left-hand
side of some production in the trace grammar. The right-hand side is a "daughter"
trace graph whose input and output arities match those of the "mother" vertex.
The host trace graph is transformed into a new graph by vertex replacement: the
mother vertex is removed and the daughter graph is inserted in its place. The
set of fully terminal trace graphs that can be generated from the start graph by
repetitions of this process is the set generated by the trace grammar.
This representation meets the first criterion: it makes each computation and
each intermediate value explicit. Thinner compiles purely functional programs
directly into trace grammars, which can then be unfolded at will to expose more of
the computation to inference and optimization. The trace grammar representation
also meets our second criterion, as shown in Figure 2.1. (In this and other trace
graph illustrations, computational vertices are represented by rectangles, input
and output vertices by lines, and the vertices that fix the values of predicates by
squares; dots indicate the input and output arities of each vertex.) The program
shown would benefit from code sinking: the evaluation of g is only needed in one
branch and should be performed there. The "code movement" normally involved in
this optimization is an artifact of the representation: it is required in the Lisp code
17
and ill the dataflow graph because they make a commitment to the exact point
at which the conditional parts of the computation diverge. The trace grammar
representation makes it clear that this optimization merely removes an intermediate
computation (the circled vertex) without affecting any other values. In short, it
repairs a violation of the PLC, much like dead code removal.
Lisp fragment:
(defun alpha (a)
(let ((x (g a)))
(if (P a) (h x) 0)))
Dataflow graph:
Trace grammar productions:
P
alpha H			alpha H
p			g
Figure 2.1: A comparison of three representations
18
The purpose of the trace grammar representation is to allow a rigorous expres-
sion of the PLC and to support the automatic derivation of optimizations from
it. The PLC sanctions the removal of unnecessary computations from a program;
this corresponds to the removal of vertices from a trace graph. Every vertex input
in a trace graph must be supplied by an edge, so if you remove vertices you may
also have to add edges to bypass the gap. I call this transformation a thinning.
Obviously, most thinnings alter the graph's function. We are chiefly interested in
conse?ative thinnings, which thin a graph without changing the surviving inter-
mediate values.
The conservative thinning transformation on trace graphs induces a relation
Ec: G'EcG (read "G' is conservatively thinner than G") if and only if G' can be
derived from & by removing one or more vertices and adding edges to bypass
the gap, so that G and 0' agree on every surviving intermediate value. The Ec
relation can be defined similarly for trace grammars, which is the basis of our
formal definition of the PLC: for a trace grammar Q there should be no Q'EcQ.
Informally speaking, this principle sanctions the removal of vertices which can be
bypassed (which happens when there is some redundant computation, some other
source for the same value), as well as the removal of vertices whose outputs are
unused and, in the case of trace grammars, the removal of an entire trace graph if
it represents a path through the program than can never be taken.
The remainder of this chapter deals with trace graphs and trace grammars, and
explores their properties in more detail.
19
2.2 Irace graphs and the PLC
A program foreshadows a class of potential paths of execution, only one of which is
realized for any given input. These individual paths of execution are simply graphs
of the flow of data, mapping inputs to outputs by way of intermediate values. By
treating individual paths we avoid, for the moment, a variety of thorny issues about
conditional and recursive structures in the source language.
2.2.1 Trace graphs
Definition 2.2 A trace graph G = (V? E) consists of a finite vertex set V and an
edge set E. Each vertex v E V has a fixed input arity and a fixed output arity,
which give the extents of the input vector v?? and the output vector vO?t. The
vertex set V is partitioned into five disjoint subsets:
a singleton with input arity 0, standing for the producer of a computation's
inputs.
V0: a singleton with output arnty 0, standing for the consumer of a computation's
outputs.
V?: a set of vertices with input arity 1 and output arity 0. (Each vertex in VT
represents a true-or-false decision which, in this path of execution, gets the
boolean value true on its input edge. These vertices are called true predicates.)
a set of vertices with input arity 1 and output arity 0. (Each vertex in VF
represents a true-or-false decision which, in this path of execution, gets the
boolean value false on its input edge. These vertices are called false predi-
cates.)
20
Vc: a set of vertices with input arity > 1 and output arity > 1. (These vertices
represent primitive functional computations.)
The edge set E is an acyclic set of directed edges; each edge is either a pair
(v??t, Wj?), representing a value produced by one vertex and required by another, or
a pair (c, wJ?), representing a constant supplied to a vertex input. There is exactly
one edge supplying every vertex input. Each vertex v in Vc U VT U VF has a vertex
label label(v)
The vertices Vi and V0 represent the external source and sink, the ultimate
progenitor of inputs and devourer of outputs. Between V and V0 is a computation
that maps inputs to outputs by way of intermediate values. A program may contain
conditionals, in which case a part of the computation is the evaluation of a boolean
function which is used to choose one of two paths for the subsequent flow of data.
The trace graph, however, represents a single path of execution: the computation
producing the boolean value is present, but the value is consumed by a predicate
and the decision, the form of the graph, is fixed.
Graphs that have some structure in common will play a key role in later dis-
cussions, so we will make use of the following definition:
Definition 2.3 Vertex v in trace graph G matches vertex v' in trace graph G' if
and only if the subgraph of C consisting of v and all its ancestors is isomorphic to
the subgraph of G' consisting of v' and all its ancestors.
In this and all subsequent appeals to trace graph isomorphism, it is assumed that
isomorphism considers vertex arities, partitions, and labels as well as edge struc-
ture, so that isomorphic trace graphs are identical up to vertex renaming.
Figure 2.2 shows a simple example of a trace graph. Observe that the input
and output vertices are drawn as horizontal lines, the predicates as squares, and
21
the other vertices as rectangles; also, the vertices are marked with dots to show
their input and output arities.
a
b			-			c equal
e
dT
Vj			=
v0			=			?e1
VT			=			tdl
VF			=
Vc			=			?b,c1
E			=			?(a10?,b1??), (ai0?,ci?),
(a20?, b2??), (a20?, c2??),
(b10?,e1??), (c10?,d1??)J
Figure 2.2: Example of a simple trace graph
2.2.2 Vertex labels and executions
Each vertex in Vc U VT U VF has a vertex label. Ordinarily this label is a relation:
a relation on a space of values which is unspecified but which must include the
boolean values true and false. All vertices v ? VT have label(v) = ?(true)? (a
unary relation, since the input arity is 1 and the output arity is 0) and all vertices
v ? VF have label(v) = i(false)?. Most vertices in Vc are labelled with relations
too: these are the terminal vertices. (When we introduce trace grammars we will
find another use for the labels of nonterminal vertices.) In a terminal trace graph,
all vertices are terminal, so all labels are relations of the appropriate arities.
22
In fact, these relations are usually functions: for any vertex v, for any tuple
x of input values, there is usually a unique tuple y such that x y ? label(v).
Until we introduce an abstract interpretation in Chapter 4, we will assume that
the label of a vertex in Vc is a function, and we will draw trace graphs by naming
the function in question. For example, the label of the vertex marked "-" in
Figure 2.2 is ?(x, y, z) x --H = zi, and the label of the vertex marked "equal" is
?(x,y,true) x = y? U f(x,y,false) x ?
A trace graph can be thought of as a partial program that computes a function
for a limited set of inputs.
Definition 2.4 Given a fully terminal trace graph G = (V, E), an execution of G
is a function f that assigns a value to each vertex input and output such that:
o+ For each edge (v2O?t,wjfl), f(w)?) = f(vto."t);for each edge (c,w)?), f(Wj??) =
o+ All v ? Vc U VT U VF with input arity n and output arity m satisfy
(f(vft),.,f(vn?fl),f(v10?t),.,f(vrn0?)) ? label(v)
If x is the tuple of values assigned by an execution f to the graph inputs, we will
say that G has execution f on x. The domain of a graph, dom(G), is the set of
input tuples for which G has an execution.
Figure 2.3 shows an execution of the trace graph from Figure 2.2. (This graph
has an execution on an input (n, d) if and only if n =
The important thing about executions is that they fix not only the relation of
inputs to outputs, but also the relation of inputs to all intermediate values. This
relation is, in fact, a partial function: for any input there is at most one execution.
23
22
2			2
equal
0			true
0
Figure 2.3: Execution of a trace graph on (2,2)
Theorem 2.1 If all vertex labels are functions, there is at most one execution of
a terminal trace graph on any fixed input vector.
Proof: Consider any execution f of a trace graph G on a fixed input vector x. Take
any vertex output vtO?t Since the graph is acyclic, v??t is at the root of a finite
tree of ancestors. There are only two ways a path of ancestry can stop, giving
a leaf of the tree of ancestors: it can stop at a vertex input whose edge is from
a constant, or it can stop at a vertex output in Vj. (All other vertex inputs are
sourced by an edge from a vertex output, and all other vertex outputs are from
vertices with input arity > 1.) Now f assigns constant values to all the leaves in
the tree: the appropriate constant to vertex inputs with constant sources, and the
appropriate value from x to vertex outputs in Vj. Since v??t is at the root of a tree
whose leaves are assigned fixed values and whose internal nodes are functions, f
must assign a fixed value to v??t. Now take any vertex input Wj? If it has an edge
from a constant, f must assign it that constant; if it has an edge from a vertex
output vf?t, f must assign it the same value as vtO?t, which is fixed for fixed input
24
vector x. Thus, f is determined by x over its entire range. ?
2.2.3 Thinning trace graphs
The PLC sanctions the removal of unnecessary computations from a program,
which corresponds to the removal of vertices from a trace graph. How can you
remove some vertices and still have a legal trace graph? You would have to remove
not only the vertices in question, but all the edges to and from those vertices. You
would probably also have to add some edges. Since every vertex input in a trace
graph must be the target of exactly one edge, you would have to add edges to
bypass the gap, so that every vertex that used to get its input from one of the
excised vertices would be the target of an edge from elsewhere in the graph.
Definition 2.5 Let & = (V, E) be a trace graph, W a vertex set with W c (Vc U
VT U VF). Let Ew be that subset of E with source or destination vertices in W. A
bypass for W is an edge set F such that ((V \ W), (E \ Ew u F)) is a legal trace
graph.
A bypass is simply a set of new edges; it has an edge for every edge that used to
leave W, supplying the same target vertex but from a source that is not a vertex
in W. It does so in a way that makes ((V \ W), (E \ Ew u F)) satisfy the definition
of a trace graph; in particular, the new edge set must be acyclic.
The idea of removing and bypassing some vertices is critical. I call this trans-
formation a thinning. It induces a "thinner than" relation E preserved by isomor-
phism: one graph is thinner than another if it is isomorphic after removing and
bypassing a subset of vertices.
25
Definition 2.6 Let C = (V? E) and G' = (V', E') be trace graphs. G'EG if and
only if there exists some vertex set W c (Vc U VT U VF) and some bypass set F
such that ((V\W),(E\EwUF)) is isomorphic to G'. G'LG if and only if G'EG
and G' is not isomorphic to G.
To identify the W and F involved in the thinning, we will sometimes write C?LFWG.
a
c
dM
zFW
a -			W = (b,cJ
0
F = ?(a10?,d1??),(O,d2??)?
Figure 2.4: A thinning
Figure 2.4 shows an example of a thinning. The following theorem summarizes
several properties of the E relation.
Theorem 2.2 The following are properties of the E relation on trace graphs:
1. GEG.
2. If C'EG and GEG' then C and G' are isomorphic.
3. If G11EG' and G'EG then G"EG.
?. If GEFWH and G?EFWI'H then there is a graph G" such that GllEF%UFW'H if
and only if no edge in F u F' has an end in W u w', no two edges in F u F'
have the same target, and there are no cycles in (E \ Ewuwi u F u F').
Proof: The reflexive property follows from the definition, since GE?G. For the
identitive property, note that for G = (V, E) and G' = (V', E'), G'EG implies
26
IV'I < V and GEG' implies V < JV'?, so V = V1I and G' is isomorphic to G.
To prove the transitive property, suppose GllEFWJ,GI and G'LFw?'G for some sets
W', W", F', F". Construct a set F including every edge in F", and also every edge
in F' with no end in W". The edge set of G" contains three groups: those from
F", those from F' not removed by bypassing W", and those from G not removed
by bypassing W" or W'. By construction the first two groups make up F, and the
remaining edges are those not removed by bypassing W' u w". So c??Ew'uw"G.
The fourth property gives necessary and sufficient conditions for composing two
thinnings by a simple union. We're given that G?LFWH and G?LFWI'H for some trace
graph H = (V E). Suppose there is a graph G" such that GllE?F%UFW'H. Since G"
is a trace graph its edge set is acyclic. Also, the edge set meets the two other
conditions there are no two edges to the same target, and no edge with an end
in W U W' so (F U F'), which is a subset of the edge set, meets those conditions
too.
Now assume all three of the conditions are met for F U F'; we have to show
that G" = ((V \ (W u W')), (E \ Ewuw? U F U F')) is a trace graph. In particular,
we have to show that the edge set is acyclic, restricted to the vertex set, and has
one edge to every vertex input. It is acyclic by assumption. Also by assumption,
no edges involving W U W' were added; and all old edges involving W U W' were
removed. The new edge set has at least one edge to every vertex since every vertex
that had an input edge from W or W' is the target of a bypass edge in F U F'; and
it has no more than one edge to any vertex since there are no two edges in F U F'
with the same target but different sources. So G" is a trace graph. ?
Usually, when you thin a graph, you end up with a graph that computes some
completely different partial function. That isn't going to be useful in an optimizer!
27
We're really interested in a special class of thinnings: those that preserve function-
ality in a particular way. The two graphs of Figure 2.4 already stand in this (much
more demanding) relation. Executions of the two graphs on the same value always
agree.
Definition 2.7 Suppose G = (Nr, E) and G' = (V', E') are terminal trace graphs.
G'EcG if and only 4:
1. G'EC. (This defines an injective function h from vertex inputs and outputs
of V' into corresponding vertex inputs and outputs of V.)
2. For every execution f of G there is an execution f' of G' such that f' = f oh.
3. For every execution f' of G' there is an execution f of G such that f' = f oh.
If G'EG in condition 1, 0' EeC
A conservative thinning removes vertices from a graph in a way that preserves
the values assigned to the remaining vertices by all executions.1 This is much
stronger than saying that the function is extensionally equivalent after thinning:
not only the function's outputs, but indeed all surviving intermediate values, are
preserved. Observe that by Definition 2.7, predicates are only unnecessary if they
play no role in limiting the set of inputs for which the graph has an execution. If
removing a predicate would allow additional executions, enlarging the domain of
the graph, the thinning is not conservative. Theorem 2.3 lists a variety of other
properties of ?.
Theorem 2.3 The following are properties of the ? relation on trace graphs:
1. C'EeC implies C'EC (but not the reverse).
?Johnson [Joh86] uses the term "thinning" in a different context, to refer to any restriction of
an equivalence relation that retains at least one member of each equivalence class. This is related
to my "conservative thinning" only accidentally; see the proof of Theorem 2.5, below.
28
2. CEeG
3. If G'?G and G?G' then G and G' are ?somorphic.
4. If G11EcG' and G'EcG then G11EeG.
Proof: The first three properties follow from the definition and from the correspond-
ing properties of E. The transitive property follows from the transitive property of
E: when G"?G' and G'EcG we know that GttEG and the two subgraph isomor-
phisms can be composed to give a mapping function h that satisfies the definition
for G11EcG. c
2.2.4 The PLC for trace graphs
The PLC for the domain of trace graphs has a natural formulation in terms of
conservative thinning.
Definition 2.8 (PLC Trace Graphs) For a trace graph G there should be no
G'EcG.
Compare this definition with the informal statement of the PLC in Defini-
tion 2.1. The first clause of the informal statement says that a computation is
unnecessary if the result is constant or equal to some other available result. Re-
placement by a constant or by a different available result is the idea captured by
the bypass set we've simply defined "availability" in terms of legal trace graph
structures so that, roughly speaking, a result is available as long as using it doesn't
introduce a cycle. The second clause of the informal statement says that a com-
putation is unnecessary if the result is unused. This also is captured by the formal
29
definition, since if no edges leave the set W, it is vacuously true that a bypass
exists (F = ?) and that the resulting thinning is conservative.
2.2.5 Discussion
Any optimization principle like the PLC must refer to a cost model. The PLC's
cost model is a simple one: the cost of a trace graph is the number of vertices.
This model would be inadequate for parallel computation, because one graph can
be thinner than another yet have a longer critical path.
The trace graph can be thought of as a partial evaluation of a program: a partial
evaluation with respect to a fixed control path as in the work of Perlin [Per89].
A trace graph is a partial program which is equivalent to some full program on a
subset of inputs (those for which it has an execution). By thinning it conservatively
we can optimize it for operation on such inputs.
Discovering violations of the PLC appears to be simply a matter of finding
conservative thinnings. Unfortunately, the word "simply" is something of an over-
statement: the question of whether a thinning is conservative is undecidable even
for a very restricted class of trace graphs.
Theorem 2.4 Consider the class of trace graphs containing only two-input integer
additions, multiplications and exponentiations, positive integer constants, and inte-
ger equality tests. The question of whether a thinning is conservative is undecidable
for this class.
Proof: Using the vertex types mentioned above we can construct a graph com-
ponent for any expression E that involves only integer addition, multiplication,
exponentiation, positive constants, and variables (the variables are input edges at
the leaves of the expression tree). So given any two expressions E1 and E2 we
30
can construct a component that evaluates both and compares the two results for
integer equality. Let C be a graph with such a structure for expressions E1 and
E2, in which Vj is connected to the expression inputs, and the comparison output
is connected to a false predicate v ? VF. This G has an execution on x if and
only if x is an assignment to the variables of the E1 and E2 that makes them
unequal. Vertex v is unnecessary if and only if E1 = E2 is unsatisfiable. But this
is undecidable: there is an exponential diophantine equation with one parameter
N which is unsatisfiable if and only if the program with G6del number N does not
halt on any input [JM84]. ?
This is our first glimpse of the inference problem Thinner will have to deal
with: it must perform some kind of semantic analysis on trace graphs to identify
conservative thinnings. It must also carry out any conservative thinnings it identi-
fies. Suppose we have all the semantic analysis we need: a partition of the vertex
inputs and outputs into equivalence classes (so that two objects are in the same
equivalence class if and only if they are assigned equal values in all executions),
and a constant for each constant equivalence class. Can we then thin the graph
optimally and efficiently? The answer is no: the problem of identifying an optimal
thinning is intractable.
Definition 2.9 The minimum thinning problem is this: given a trace graph G =
(V, E) and an integer k ? IVI, and given a partition of the vertew inputs and
outputs into equivalence classes, decide whether there is a graph ?? = (V', E') such
that C'EG the thinning is conservative with respect to the given partition, and
V' ?
Theorem 2.5 The minimum thinning problem is NP-complete.
32
little further significance. The practical inference problem is not the problem of
finding all violations, but the problem of finding as many violations as possible, as
quickly as possible. The practical reformulation problem involves deep difficulties
in transforming recursive structures, and we will gladly settle for a greedy heuristic
when we finally get around to thinning individual trace graphs.
But this is a digression from our immediate goal, which is to develop a formalism
for complete programs.
2.3 Sets oftrace graphs and the PLC
As a single trace graph represents a path of execution in a functional program, so
a set of trace graphs represents a collection of paths. If constructed correctly, a
set of trace graphs may represent exactly the collection of all possible paths for a
single program. The examination of these executable sets is the next step in the
development of our formalism for programs: at this level we will take conditional
execution into account, but still defer issues relating to recursive structures in the
source language.
2.3.1 Sets oftrace graphs
To represent a program, a trace graph set must meet several conditions. Of course,
the extents of Vj and V0 must match for all graphs in the set; from now on we will
make this assumption about all trace graph sets, and we will use dom(S) to denote
UCES dom(G). The set must also be deterministic, in the sense that it must not
have more than one member with an execution for a given input. But these two
properties by themselves do not guarantee that a set of trace graphs corresponds to
a program. For example, consider the set shown in Figure 2.5. It has exactly one
31
Pwof The minimum thinning problem is in NP, since we can verify in polynomial
time whether a particular G', W and F satisfy G?LFWG, and whether that thinning
is conservative with respect to the given partition. The remainder of the proof is
by transformation from minimum cover [GJ79J.
An instance of the minimum cover problem is a collection C of subsets of a finite
set S, and a positive integer k < C?. We'll construct a corresponding instance of
the minimum thinning problem as follows. Let there be one equivalence class Es
for each element s ? S. For each c ? C, let there be a i-input, cj-output vertex
v E Vc; and for each s ? c assign one vertex output vf.?t to the equivalence class
E5. Let the graph have a single input, with an edge to the input of each v ? Vc;
and let it have ?cEC Ici outputs, one for each vertex output in Vc.
Now for a given k, there is a = (V', E') such that G'EG, the thinning is
conservative with respect to the partition, and jV'I < k, if and only if there is a
C' c c such that every element of S belongs to at least one member of C' and
< k. This follows immediately since vertices can be removed in a conservative
thinning if and only if a representative of every equivalence class remains. ?
This proof involves the construction of trace graphs with unbounded degree,
which may seem unreasonable. But with a bit more work we can prove the same
result for graphs with vertices restricted to indegree 1 and outdegree ? 3. (Note
that the minimum cover problem remains NP-complete even if all c ? C have
IcI<?3.)
So even if the semantic analysis problem is licked, the problem of choosing an
optimal thinning is still intractable. These two problems identifying violations
of the PLC and repairing them--Hare treated in Chapters 4 and 5. But the un-
computability of the first and the intractability of the second turn out to have
33
graph with all execution for any input, but the difference between the two graphs
in the set represents a choice that must be made by any corresponding program,
and there is no predicate that can be evaluated in both traces in time to guide the
choice. Any program with these two traces would have to be prescient--Hit would
have to decide immediately and spontaneously whether to compare or to subtract.
T
Figure 2.5: A non-executable set of trace graphs
Imagine that you are given a deterministic set of trace graphs and an appro-
priate input vector. Your task is to perform exactly those computations called for
by the trace graph with an execution for that input, if any. You are not told which
graph in the set actually applies to that input, so the trick is to narrow the set of
possibilities by evaluating predicates and discarding those graphs that do not have
an execution; for the task to be possible, there must be a way to do this without
deviating from the computation called for by every trace graph not yet eliminated.
When this is possible for every input, the set of trace graphs is called executable.
To make this concept more precise, we first define a predicate partition of a set
of trace graphs, which splits a set in two using a predicate common to all graphs
in the set. Then we define an executable set of trace graphs inductively: a set is
34
executable if it is a singleton or empty, or if it can be predicate partitioned into
executable sets.
Definition 2.10 A predicate partition of a set 8 of trace graphs is a function p
identifying a predicate of each element of 8, such that for any G and G' in 8,
p(G) matches p(G') (except that one may be a true predicate and the other a false
predicate).
So a predicate partition selects a predicate (which is just a vertex in VT or VF)
from each graph in 8. These predicates match each other, which is to say the
subgraph of ancestors of a predicate in one graph is isomorphic to the subgraph of
ancestors of the predicate in another. The predicates thus represent true-or-false
decisions which are computed identically in every graph in 8, and the predicate
partition divides 8 into two sets: those for which the predicate is true and those
for which it is false.
Definition 2.11 A decision tree for a set 8 of trace graphs is a binary tree with
a subset of 8 at each node and a predicate partition of that subset at each internal
node. If 8 is empty or is a singleton, the decision tree for 8 is a leaf giving 8.
Otherwise, a decision tree for 8 is a node giving both 8 and a predicate partition
for 8, whose left child is a decision tree for that subset of 8 for which the predicate
is true, and whose right child is a decision tree for that subset of 8 for which the
predicate is false. No predicate can occur more than once in any partition in the
tree.
Definition 2.12 An executable set of trace graphs is one for which there is a
decision tree.
A decision tree is a tree of if-then-else refinements, zeroing in on a trace graph
with an execution for a given input. Intuitively, any program that makes all its
35
decisions based on predicates defines a decision tree and must generate an exe-
cutable set of trace graphs. The next theorem shows that an executable set can
be thought of as a kind of program itself, since it contains exactly one trace graph
with an execution for any input.
Theorem 2.6 If S is an executable set of trace graphs, and if all vertex labels are
functions, then for any input vecThr x there is at most one G c S with an execution
on x.
Proof: Suppose 5 is executable and suppose by way of contradiction that there are
two different graphs G and G' in 5 with executions f and f' on the same input
vector x. 5 has a decision tree with G and G' in different leaves: let N be that
common ancestor of G and G' furthest from the root of that decision tree. At
N there is a subset of 5 containing G and U', and a predicate partition p that
selects a true predicate for one and a false predicate for the other. Without loss
of generality, assume p(G) = v ? VT and p(G') = v' ? VF'. Then f(v?) = true
and f'(v'1??) = false. But v and v' are matching vertices, so by Theorem 2.1
= f(vI?). This is a contradiction. ?
Theorem 2.6 shows that any executable set is deterministic. As Figure 2.5
shows, the reverse is not true.
2.3.2 The thinning relations for trace graph sets
The idea of thinning developed for individual trace graphs extends naturally to
sets of trace graphs. One set of trace graphs is thinner than another when every
36
member of the thinner set is thinner than some member of the thicker set.2 That
is:
Definition 2.13 Suppose 5 and 5' are sets of trace graphs. S'?S if and only if
for every G' ? 5' there is some GE 5 for which G1EG. S'zS if and only if 5'E5
and there is no one-to-one mapping of graphs in 5' to isomorphic graphs in 5.
Theorem 2.7 summarizes some properties of the E relation on sets.
Theorem 2.7 The following are properties of E for sets of trace graphs:
1. SES.
2. If 511ES' and 5'E5 then S1,ES
3. 5' C 5 implies 5'E5.
Proof: For transitivity, 5"E5' means that for every G" E 5" there is some G' E 5'
with G"EG' and 5'E5 means that for every G' E 5' there is some & E 5 with
G'EG. It follows that for every G" E 5" there is some G E 5 with G"EG so
5"ES. The third property follows directly from the definition: if 5' c 5 then for
every G' E 5' there is some G E 5, namely G = G', with G'EG. This also implies
refiexivity since 5 C 5. El
Note that we have lost the identitive property of the E relation on individual
trace graphs: two sets can be strictly thinner than each other.
The ? relation also generalizes to trace graph sets, biit more care is required.
Obviously, a conservatively thinner set should be, structurally, a thinner set; as
with individual trace graphs, the additional constraint has to do with agreement
between executions. A trace graph set is a function just like an individual trace
2Compare this with the lower or Hoare powerdomain [Gs9o].
37
graph (except that it may be nondeterministic). For one set to be conservatively
thinner than another, the two sets must have the same domain, and every execution
of a graph in the thinner set must agree with some execution of a graph in the
thicker set.
Definition 2.14 Suppose 5 and 5' are sets of trace graphs. S'?S if and only 4:
1. S'ES.
2. For every execution f of some & = (V E) E 5 there is an execution f? of
some G' = (V', E') E 5' such that G'EG (this defines an injective function
h from vertex inputs and outputs of V' into corresponding vertex inputs and
outputs of V) and f' = f oh.
3. For every execution f' of some G' = (V', E') ? 5' there is an execution f
of some G = (V, E) ? 5 such that G'E& (this defines an injective function
h from vertex inputs and outputs of ?? into corresponding vertex inputs and
outputs of V) and f' = f o
If S'E5 in condition 1, 5'Ec5.
Note that this definition gives f&'1Ec?&1 if and only if G'E?cG, as expected. The
definition does not require that every graph in the thicker set be thicker than some
graph in the thinner set. If a graph in the thicker set has no executions, it need
not be represented in the thinner set. The fact that the EC relation allows the
removal of vacuous graphs from the set as well as conservative thinnings of graphs
within the set is important because it admits a significant class of "dead code"
style transformations.
Theorem 2.8 summarizes some other properties of the Ec relation for sets.
38
Theorem 2.8 The following are properties of Ec for sets of trace graphs:
1. S7EcS implies S'ES (but not the reverse).
2. SEC 5.
3. If 5ll?51 andS'?S thenS"EcS.
Proof: The first property follows directly from the definition. For every G ? 5
there is some H ? 5, namely H G, for which HEcG. This proves the reflexive
property. Transitivity follows because for each G" ? 5" and each execution f" of
G", there is a thicker G' E S? with an execution f' that agrees with f", and there
is a thicker G ? 5 with an execution f that agrees with f'. The two subgraph
isomorphisms can be composed to give a mapping function h from G" to G that
satisfies the definition, and the transitivity of E completes the proof. ?
If S'EcS and 5 is executable, it does not necessarily follow that 5' is executable:
each graph in 5 may be thinned in a different way, so that there are no matching
predicates for a decision tree. How can we characterize E relations that preserve
executability?
It is not hard to characterize an executable thinning if it preserves a decision
tree T of the original graph. It removes no graph from the set and no predicate
from any graph; and if it changes an ancestor of a predicate in any graph, it must
make the same change to every graph at that node of T where the predicate is
used (since all the predicates in a predicate partition must match). A similar,
more complicated characterization can be developed to cover executable thinnings
that only prune a decision tree of the original graph.
But an executable thinning need not have a decision tree that bears any rela-
tion to a decision tree of the original graph. Consider the set thinning shown in
39
Figure 2.6. By forming differences among formerly identical predicates it excludes
the original decision tree, and by erasing differences among predicates it allows a
new decision tree. So although every graph in the thinned set is a thinned version
of a graph in the original, the decision trees for the thinned set may be completely
different they need not be "thinned" versions of the original decision trees.
2.3.3 The PLC for executable sets
The principle of least computation for individual trace graphs expressed in Defini-
tion 2.8 extends naturally to executable sets of trace graphs.
Definition 2.15 (PLC Executable Sets) For an executable set S there should
be no executable S'EeS.
We started with an intuitive idea: that a program should not do anything unneces-
sary. Elaborated on an intuitive level, this meant that a program should not make
a computation with a constant, redundant or unused result. Formalized in terms of
individual trace graphs, it meant that there should be no way to remove vertices,
bypass them, and end up with a legal trace graph that makes the same computa-
tion for any input but with fewer intermediate values. Now we have reached the
level of executable sets, and at this level the principle is that there should be no
way to remove and bypass vertices and end up with a set that makes the same
computation for any input, with fewer intermediate values for at least one.
Actually, that isn't quite enough to make a violation of the PLC as expressed
in Definition 2.15. The principle is only violated when there is a conservatively
thinner executable set. There may be superfluous computation that cannot be
removed without making the set non-executable, as is the case in Figure 2.7. There
are several non-isomorphic sets conservatively thinner than the set f&, Hi shown
40
T
I:
F
F
G
Figure 2.6: Trace graph sets fO, H, I, JJ ?fG', H', I', J'J, with decision trees
41
one of them is ?&`, Hi, with G' as indicated--Hbut none is executable. In effect,
there is no way to discover that graph G is the one with an execution on a given
input until after performing computations that are only necessary in H.
2.3.4 Discussion
There is an interesting connection between sets of trace graphs that are not neces-
sarily executable and nondeterministic automata. Perhaps this would be a fruitful
area for further research.
An executable set of trace graphs is a program expressed without recursion:
every possible path is explicit. This is part of what makes this a good domain for
the formal expression of the PLC. In the next section we will see that recursive
structure makes these simple concepts much harder to manipulate.
Executable sets express a bare minimum of control: there is at least one order
of execution that selects the right trace for any given input without deviating
from that trace, but the executable set doesn't say what that order is. This may
seem unnecessarily abstract, but it too contributes to clean formalization of the
PLC. Consider an alternative formalism, the dynamic dataflow graph in the style
of Arvind [AN87], in which the two arms of a conditional have inputs directed
to them by "switch" nodes and outputs collected from them by "mux" nodes. In
the introduction of this chapter, Figure 2.1 showed an illustrative code fragment
expressed both as a datafiow graph and as an executable set.
The code fragment of Figure 2.1 exposes an opportunity for the optimization
known as code sinking: the evaluation of g is only needed on the left branch and
should be performed there. The "code movement" required by this transformation
in the datafiow case is an artifact of the representation; it is necessary because the
representation has made a commitment to the exact point at which conditional
42
T
Zc
T
Zc
T
Figure 2.7: A set fG, H? not quite ill violation of the PLC
43
parts of the computation diverge. The executable set representation makes it clear
that this optimization merely removes some intermediate values without affecting
the result: in short, it is a thinning, much like dead code removal. Figure 2.8
shows another such comparison; again, the datafiow graph representation obscures
an obvious thinning.3
2.4 Trace grammars and the PLC
Most interesting programs have infinitely many possible paths of execution, so the
explicit trace graph set is going to be inadequate as a representation. What we
need is a finite representation that implicitly identifies a potentially infinite trace
graph set. We therefore turn to trace grammars, which are finite objects that
generate sets of trace graphs.
2.4.1 Trace grammars
Trace grammars will provide a mechanism for deriving one trace graph from an-
other, and so for generating a language of trace graphs from an original "start
graph". In a trace grammar one trace graph derives another by vertex replace-
ment. The mother vertex and the daughter graph with which it is to be replaced
must have matching input and output arities. Remove the mother vertex from the
host graph and remove the input and output vertices from the daughter graph,
then stitch the daughter graph into the host graph with a "seam" of edges con-
necting the matching inputs and outputs. This seam of edges includes one edge to
each vertex input left sourceless in the daughter graph (those that used to have an
3These examples are intended to explain the choice of executable sets as a formalism for study
of the PLC, not to criticize the ideas of dynamic dataflow. Dynamic datafiow was developed for
a different purpose, namely, as an architectural approach for exposing fine-grained parallelism.
44
Dataflow graph:
g
Lisp fragment:
(defun alpha (a)
(h (g x) (if (P ::) (g x) x)))
Executable set:
h
Figure 2.8: Another comparison of three representations
45
edge from the input vertex of the daughter graph), and one to each vertex input
left sourceless in the host graph (those that used to have an edge from the mother
vertex). Formally:
Definition 2.16 Let G (V? E) and H = (W? F) be trace graphs with disjoint
vertex sets, and let v be a vertex in Vc matching H in input and output arity.
Define E0ld to be E restricted to V \ ?vi, and define Enew to be F restricted to
Wc U WT U WF. Define Eseam as follows:
o+ For each edge in F from the j th vertex output of wi? to some w???, Eseam has
an edge (vf.?t, wk??), where utO?t was the source of the edge in E to the jth
vertex input of v.
o+ For each edgefrom v, (vf?,u???) ? E, Eseam has an edge (x,o?t,u?in), where
x,o.?t was the source of the edge in F to the j th vertex input of W0.
The replacement of v in G by H yields a trace graph G' = (V', E'), where V' =
V\?4UWcUWTUWF andE'=Eoi?UEseamUEnew.
Vertex replacement is a straightforward operation because the graph inputs
and outputs match up with the vertex inputs and outputs. There's no uncertainty
about how a graph should be substituted for a vertex: just connect the matching
inputs and outputs. Figure 2.9 shows an example of a vertex replacement.
Definition 2.17 A trace grammar is a ?-tuple (N, T, P, S). N is a finite set of
nonterminal labels and T is a finite set of terminal labels. (N and T are disjoint.)
P is a finite set of productions of the form (0 H D), where 0 ? N and D is a
trace graph with the same input and output arities as 0. S is a graph called the
start graph. All of the Vc, VT and VF vertices in P and 5' have labels in N u T
46
M
A
g
F
Figure 2.9: The replacement of a vertex in G by a graph D
47
Suppose that Q = (N, T, P, S) is a trace grammar and G = (V, E) and G' =
(V', E') are trace graphs in which all of the vertices have functions in NUT. Then
G ?? G' if and only if there is a vertex v ? Vc with a nonterminal label o,
and a production (? H D) in P, such that the replacement of v in G by a graph
isomorphic to D that shares no vertices with G, yields G'. This is the mechanism
by which one graph is derived from another directly. It can be extended in the
usual way to the reflexive and transitive closure: G ??* G' if and only if there is
a sequence of zero or more single-step derivations from G to G'. The relation
can be used to define the set of trace graphs generated by a trace grammar:
Definition 2.18 Let Q = (N, T, P, S) be a trace grammar, and let A be the set
of graphs C such that S G and all vertices in G have functions in T. A set
generated by Q, denoted L(Q), is a minimal set of vertex-disjoint trace graphs with
an element isomorphic to every element of A.
Figure 2.10 shows an example of a trace grammar with part of the set generated
by it. This grammar generates an executable set, which not all grammars do. There
is a restriction on the set of productions that guarantees that the generated set
will be executable. Trace grammars that obey this restriction, like the one in
Figure 2.10, are normal trace grammars.
Definition 2.19 A normal trace grammar Q = (N, T, P, S) is one in which all
nonterminals ? in N have the property that fD (? D) ? PJ is an executable
set.
Theorem 2.9 If Q is a normal trace grammar, L(Q) is executable.
Proof: Let R be the set generated by a normal trace grammar Q = (N, T, P, 5).
Define a function f that maps each graph G E R to a breadth-first derivation
48
A trace grammar Q = (N, T, P, S):
N=fo??			T=ff,?1
f
P=t(?H
The first three elements of the generated set:
T
T
T
T
Figure 2.10: Example of a trace grammar and generated set
49
of G.4 We will define a decision tree Tf(R) recursively using the breadth-first
derivations chosen by f. Let n be the greatest integer with the property that all
f(G) for G ? R agree in the first n steps. Since f chooses a fixed-order derivation,
all the (n + 1)th steps are expansions of the same nonterminal, and since f chooses
a breadth-first derivation, that nonterminal has terminal ancestors only. So T1(R)
can start with a decision tree for the possible expansions of that nonterminal,
which must exist since Q is normal. This partitions R into subsets that agree on
the first n + 1 steps; for any such R' c R with more than one member, construct a
decision tree Tf(R') recursively. Since all f(G) for G ? R' agree on the first n + 1
steps, Tf(R') will only use predicates generated by nonterminals that were not yet
expanded when TF(R) was constructed. It follows that no predicate is used more
than once, so the resulting structure is a decision tree for R. ?
2.4.2 Correspondence to CFG's
Can any executable set be generated by some trace grammar? To answer this
question, we will develop a connection between trace grammars and traditional
context-free grammars on string alphabets. This connection will also prove useful
in answering other questions about trace grammars later on.
First we'll define a correspondence between symbols in an alphabet ? and trace
graph components. Let ?+ be ? U twi, where w is a new symbol not occurring
in ?. (w will mark the end of a string.) Let Pj denote a distinct function for
each i from 1 to [log2 J?+j] + 1. For each symbol x in ?+, define a unique truth
assignment vector vz so that v? is x's assignment to Pj and no other symbol has
4A trace graph does not necessarily have a unique breadth-first derivation. If this troubles
you, derivations can be ordered and the least breadth-first derivation selected for each graph.
50
the same pattern of truth assignments. Finally, let fX denote a distinct function
for each symbol x in ?+. The graph component corresponding to a symbol x in ?+
is as pictured in Figure 2.11. It contains a vertex for each of the Pj'5, each with an
edge to a vertex either in VT or VF as determined by the unique truth assignment
for x. It also has a vertex for the function fx, The inputs to the component are
the inputs to fX and the Pj'5. They have a common source as indicated in the
diagram. The output of the component is the output from fX
n
Figure 2.11: The trace graph component corresponding to x ?
This correspondence between symbols and graph components leads to a simple
correspondence between strings and graphs. If s is a string of length n, index the
characters of the string as .1... Sn+1, where Sn+1 refers to the string end marker
w. A graph G = (V? E) corresponds to the string s if and only if:
o+ G contains a unique component for each Sj, an input vertex, an output vertex,
other edges as specified below, and nothing else.
o+ There is a vertex in Vj with a single output connected to the input of the
component corresponding to ?i
o+ For every i in the range 1 < ? < n the output of the component corresponding
to Sj is connected to the input of the component corresponding to Si+1.
51
o+ The output of the component corresponding to 5n+1 is connected to the single
vertex input of the vertex in V0.
It should be clear there is a one-to-one correspondence between strings on the
alphabet ? and graphs as described above. Sets of such graphs correspond to
languages over ?.
Theorem 2.10 If 5 is a set of graphs that correspond to strings, and ifS has no
two elements isomorphic to each other, then 5 is executable.
Proof: We proceed by induction on the length n of the longest string corresponding
to a graph in 5. In the base case n 0: since two graphs corresponding to the
empty string would be isomorphic to each other, 5 must be a singleton and hence
has a decision tree. For n > 0, note that every graph G in 5 has a matching
collection of Pj'5 connected directly to Vj. So there is a partial decision tree that
partitions on each Pj in turn (any order will do). Since each x has a unique truth
assignment to the Pj'5, the partitions at the bottom of this partial tree are sets
S? whose graphs agree that the first component is an x for each x ? ?+. There
is a decision tree for S? since, as noted before, it can contain at most one graph.
There is also a decision tree for each S? with x ? ?, since we can omit the identical
first component from each graph, get a decision tree for that set (which must exist
by the inductive hypothesis), and use that tree for S?. Since there is a partial
decision tree for 5 and a decision tree for each leaf of that partial tree, there is a
full decision tree for 5. ?
Theorem 2.11 Suppose a string language M corresponds to a set of trace graphs
M is context free if and only if fi is generated by a trace grammar.
52
Pwof: The "only if" direction is the easier of the two: given a CFG Q with
nonterminal set N, terminal set T, productions P and start symbol S, construct a
trace grammar Q' = (N', T', P', S') as follows. Extend the correspondence between
strings and graphs to include strings of terminals and nonterminals: a nonterminal
in a string corresponds to a nonterminal function in the graph. N' contains a
nonterminal function fV for every V ? N. T' contains a terminal function fX
for every x ? T. Now for every production (V H .... . sn) E P, P' contains a
production (fV G) where G is the graph corresponding to the string .1... 8n
Finally, start graph S' is simply the graph corresponding to the string S (so it has a
nonterminal function fS followed by the component for w). By construction, if G'
is a graph corresponding to a string g, G' ??` H' if and only if H' corresponds to
a string h for which g ?? h. Since the start graph 5' corresponds to the stringS,
it follows that 5' G' if and only if 5 ??* g, where g is the string corresponding
to G'. Hence, the set generated by Q' corresponds to the language generated by
The other direction is more difficult, since there is no guarantee that a given
trace grammar is in a nice linear form. All we know is that every graph in the
generated set corresponds to a string; the intermediate forms in the derivation need
not. We will assume that the given trace grammar has no useless nonterminals
(that is, that all nonterminals can be expanded into terminal graphs). This is
without loss of generality since a trace grammar with useless nonterminals can
easily be converted into one without them, by essentially the same method that
removes useless nonterminals from context-free grammars [HU79].
So suppose Q' = (N', T', P', 5') is a trace grammar that generates a set of trace
graphs corresponding to strings, and suppose N' contains no useless nonterminals.
Define a CFG Q = (N, T, P, 5) as follows. N contains a nonterminal symbol ?`
53
for each nonterminal function ? ? N' and each input i to ? and each output j
from ?. T contains a terminal symbol x for each fX e T'. Now a path in a trace
graph corresponds to a string over T u N as follows: each terminal function fX
corresponds to an x in the string, each nonterminal function 0 entered in the path
at input i and exited at output j corresponds to an 0,' in the string. For each
production (o H ? P', and for each input i to 0 and output j from o, and for
each path q in D from input i to output j, P contains a production mapping 0,? to
the string corresponding to the path q. P also contains productions for the start
symbol 5: one for each path from the input to the output in the start graph 5'.
(If 5' does not have a single input and a single output then no generated set could
correspond to a string, so the generated set must be empty and corresponding
language context free. We can therefore assume that 5' has a single input and a
single output.)
Now 5 ?+? ? if and only if 5' ??*1 G?, where G' contains a path from input
to output corresponding to 9. There is a simple proof of this by induction on the
length of the derivation, which we will skip.
It now remains to be shown that the language of Q corresponds to the set
generated by Q'. Suppose a string 9 is in the language of Q. Then, as shown
above, Q' derives a graph G' with a path of terminal functions from input to
output corresponding to 9. We cannot assume that G' is fully terminal, but since
Qt contains no useless symbols, a terminal graph H' can be derived from &`
This H' is in the set generated by Q', so by assumption it must correspond to
a string; and since it has the same path G had from input to output, it must
in fact correspond to 8. In the other direction, suppose C' is one of the (fully
terminal) graphs generated by Q'. Since C' corresponds to a string, it has exactly
54
one path from input to output; and as shown above, Q derives the terminal string
corresponding to that path. ?
Now we can answer the question of whether every executable set is generated
by some trace grammar. Every string language L corresponds to some set of
trace grammars L' in which no two elements are isomorphic to each other. By
Theorem 2.10, L' is executable. But by Theorem 2.11, L' has a trace grammar
only if L is context free. So not every executable set has a trace grammar: if L
is not a context-free language, the corresponding set L' is executable but has no
trace grammar.
The relation between trace grammars and CFG's also gives a reduction which
can be used to show that several questions about trace grammars are undecidable.
Among these:
o+ Are the sets generated by two trace grammars disjoint?
0 Do two trace grammars generate isomorphic sets?
o+ Is the set generated by one trace grammar a subset of the set generated by
another?
o+ Is there a trace grammar for the intersection of the sets generated by two
trace grammars?
The reader should observe that the undecidability of these questions does not follow
immediately from Rice's Theorem. There is a difference between these questions,
which have to do with the recursive structure of a program, and questions involving
the actual behavior of programs on inputs. For example, it is decidable whether
the set generated by a trace grammar is empty, and if it is empty the program does
55
not halt. But it does not follow that we can decide whether the program halts,
because it may be the case that the generated set is not empty, but none of the
trace graphs in it has an execution on any input.
Consider how a trace grammar expresses nontermination. Suppose a program
allows infinitely many different paths of execution. It must be possible for such a
program to diverge (by K6nig's lemma). The corresponding trace grammar gener-
ates an infinite set of trace graphs, each of which is a finite structure representing
a terminating execution history. No infinite graph is generated by the grammar;
there is no trace graph with an execution for those inputs on which the program
diverges. Suppose a program allows only finitely many paths of execution yet still
diverges on some input. This is possible if one of those "paths of execution" is
a branchless infinite loop. The corresponding trace grammar supports an infinite
chain of derivations, but that chain never produces a fully nonterminal trace graph;
once again, no infinite graph is generated, and there is no trace graph for those in-
puts on which the program diverges. A trace grammar for a program that diverges
on an input generates an executable set containing no graph with an execution for
that input an executable set with a decision tree that is not a full binary tree.
2.4.3 Thinning and trace grammars
The E and ? relations extend to graph grammars in the obvious way: one grammar
is thinner than another when the language it generates is thinner.
Definition 2.20 The relations E, E, Ec and Ec hold for trace grammars if and
only if they hold for the sets generated by those grammars.
This definition is not very satisfying, since it refers again to those unwieldy infinite
sets. It is not an effective definition, since it cannot be used to derive thinner
56
grammars or even to test whether one grammar is thinner than another. It would
be so much more practical to have an effective characterization of E and Ec for
grammars, in terms of the grammars themselves. Is such a thing possible?
An effective characterization of Ec for grammars is clearly not possible. Theo-
rem 2.11 shows that the ? relation is undecidable even for individual trace graphs,
and with trace grammars there are, in addition, all the usual undecidabilities of
fully expressive formalisms for computation. For example, a grammar generating
the empty set is conservatively thinner than a grammar generating a set s if and
only if dom(S) = ?, so an effective characterization of ? would decide the halting
problem.
What about E? It is a relatively simple relation on graph structures and makes
no reference to executions. One potentially useful characterization would be as a
class of transformations to trace grammars: some way to transform a grammar
into any thinner grammar. Some idea of the kind of transformational power re-
quired for such a characterization can be gained by examining the consequences of
Theorems 2.7 and 2.11. For any alphabet ? the language ?? is context free, and
so the corresponding executable set has a linear trace grammar Q--Hquite a simple
one at that. Now any context-free language M over ? is a subset of ?*; and so the
executable set corresponding to M is a subset of L(Q). But any subset of a set of
trace graphs is a thinning! This means that the class of thinning "transformations"
of trace grammars has at least enough power to generate any grammar for a set of
graphs that correspond to strings from a grammar for the set that corresponds to
?? which is to say, at least enough power to generate any context-free grammar
out of practically nothing.
Is the E relation on trace grammars decidable? I do not know. However,
there is an interesting corresponding question about CFG's. Define L(C) to be the
57
language of a CFG C, and define Omit(L) to be the language of strings that can
be formed by dropping 0 or more characters in any positions from any string in
language L. Is it decidable whether L(C) c Omit(L(C'))? Using the construction
of Theorem 2.11 this question can be reduced to the question of whether the trace
grammar corresponding to C is thinner than the trace grammar corresponding to
c1, so if it were undecidable, the E relation for trace grammars would also be
undecidable. Surprisingly, it is decidable: Omit(L(C')) is a regular language.5 So
perhaps the E relation is decidable too.
2.4.4 The PLC for trace grammars
We can express the PLC for executable trace grammars in terms of the sets they
generate.
Definition 2.21 (PLC Trace Grammars) For an executable trace grammar
Q there should be no executable Q'EcQ.
A program is a trace grammar that generates a possibly infinite executable set of
trace graphs. The program violates the PLC if and only if there is another program
whose trace grammar generates a conservatively thinner executable set.
2.5 Applying the formal principle
Now we have a formal statement of the PLC. How can can it be embodied in an
automatic optimizer? We can't just search for conservatively thinner grammars!
One observation is that if a grammar violates the PLC for trace grammars, it
5A neat proof that Omit(L(C')) is regular was suggested by Juris Hartmanis. The key step
is to express L(c') as h(LD fl R) for a homomorphism h, a Dyck language LD and a regular
language R, which is possible for any CFG by a result of Chomsky [Cho62].
58
generates some graph that violates the PLC for trace graphs. (That is, if a program
violates the PLC then there is some path through the program that makes an
unnecessary computation.)
Thinner makes use of this. It first searches a space of (not necessarily terminal)
graphs generated in the grammar, looking for violations of the PLC. (Chapter 4
deals with the problem of analyzing individual trace graphs.) If it finds such
a graph, Thinner uses it as a "thinning example": it tries to reformulate the
program so that it never makes the mistake illustrated in the example. (Chapter 5
treats this aspect of the problem.) This step involves reorganizing the grammar so
that thinnable graphs occur explicitly in the right-hand sides of productions. The
actual thinning step is then performed on the set of right-hand sides for a given
nonterminal. This is done in such a way that the set remains executable (and the
grammar remains normal).
There's a theoretical impediment to all this, though: graphs in the right-hand
sides of productions in a grammar may contain nonterminals, but we only defined
the ? relation for fully terminal graphs. It could be defined with reference to
expansions of the nonterminals, but this leads back to infinite sets. To localize
the problem we must treat nonterminals more like terminals, without investigating
how they expand.
The solution is to test ? relative to a mapping from nonterminals to partial
functions. If H and G are trace graphs containing nonterminals and ? is a function
mapping each nonterminal to a function, and if that mapping gives H?G we'll
say that H?G relative to ?
This raises an important point: there are two approaches to thinning the right-
hand sides of productions. Strictly speaking, one should assume nothing about
?: any nonterminal can be any partial function. Conservatively thinning a graph
59
containing partial functions is tricky because partial functions limit the class of
inputs for which the graph has executions: remove one in a thinning and you
may be adding executions, which would be nonconservative. Since the domain of
the arbitrary function associated with a nonterminal is unknown, a conservative
thinning must retain at least one computation of each nonterminal function on
each distinct input value. Treating nonterminal sets of trace graphs in this way is
a safe approximation to full conservative thinning: it misses some cases but never
makes a false step.
The other approach is to treat each nonterminal as an arbitrary, deterministic,
total function. This is the approach Thinner takes it just doesn't make sense to
waste any computational effort on ensuring that the set of inputs on which the
program diverges doesn't shrink.
2.5.1 Weak conservative thinning
To talk about this more formally, let's define weak conservative thinning: thinning
that is almost conservative, except that it may enlarge the domain. In Defini-
tions 2.7 and 2.14 we insisted that conservatively thinner graphs, sets and gram-
mars have the same domains as their thicker counterparts. By omitting the third
condition from these definitions, we can define a relation Ecweak which allows thin-
ner graphs, sets and grammars to have larger domains.
Definition 2.22 Suppose G = (V, E) and G' = (V', E') are terminal trace graphs.
G'?_eakG if and only if:
1. G'EG. (This defines an injective function h from vertex inputs and outputs
of V' into corresponding vertex inputs and outputs of V.)
2. For every execution f of G there is an execution f' of G' such that f' = f oh.
60
Suppose 5 and 5' are sets of trace graphs. S'--H?we?? if and only if:
1. S'ES.
2. For every execution f of some G = (V E) ? 5 there is an execution f' of
some = (V', E') E 5' such that G'EG (this defines an injective function
h from vertex inputs and outputs of V' into corresponding vertex inputs and
outputs of V) and f' = f oh.
2.5.2 Strong conservative thinning of right-hand sides
Let's look at strong conservative thinning first. We want to be able to gather the
right-hand sides of a nonterminal into a set (which will be executable, since the
grammar is normal), thin that set conservatively, and conclude that we have a
conservatively thinner grammar. This seems reasonable, and it's correct; but it's
not simple to prove.
Here we go. In the following discussion, let Q = (N, T, P, 5) be a normal trace
grammar. For some nonterminal c C N let A be the set ?D (? D) ? P?, and
suppose there is some executable set A' such that A'LcA relative to every mapping
? from nonterminals to partial functions. Let Q' be a trace grammar formed from
Q by removing the old productions for ? and adding a new production (OL H
for every D' E A'. Because Q is normal each nonterminal corresponds to a partial
function, obtained as the executable set generated from that nonterminal under Q.
Define Fi to be this mapping from nonterminals to their actual functions under
Lemma 2.12 Suppose we have a graph G' and set R, possibly containing nonter-
minals, for which fG'1?R relative to F1, and suppose G' derives some terminal
graph I' in Q'. Then R derives some terminal set T in Q for which ?I'i?T
61
Proof: by induction on the number n of steps in the derivation of I'. In the base
case n = 0, so G' is terminal and I' = G'. R need not be terminal since its graphs
are thicker than G, but let 5 be the set of terminal graphs generated from R under
Q. Since ?I'1EcR relative to ?i, ?f J?EcS relative to F1; and since all the graphs
are fully terminal, ?flJ?S.
In the inductive case n > 0: let H' be the result of the first step in the derivation
of I', so that G' ??1 H' ??*, I'. If the first step is the expansion of some
nonterminal p $ ?, let 5 be the set that results from applying that same step
to every member of R. (All members of R are thicker than G, so they all have
a corresponding P to expand.) Since fG'JLcR relative to ?i, and since p has
been replaced on both sides with a subgraph whose function is the same as
(though perhaps on a restricted domain), fH'JEcS relative to ?i.
If that first step is the application of some production (? H D') for D' E A', we
know there is some set AD' c A such that fD'lEcADI relative to any mapping, and
there is a production (? D) for every D ? AD'. Apply each such production to
the corresponding ? in each member of R and collect the results in a set 5. Since
fG'??R relative to ?i and since tD'1??AD? relative to any mapping, fH')?5
relative to ?i.
In both cases we have a set 5 such that ?H'J??5 relative to ?i, and H' ??*, I'
in n --H 1 steps. So by the inductive hypothesis 5 derives some terminal set T in Q
for which fY?EcT. LJ
Lemma 2.13 Suppose we have two graphs G and C', possibly containing nonter-
minals, with G'ECweakG relative to F1; and suppose C I for some terminal
graph I with at least one execution. Then C' ??*, I? for some I' ECweak1
62
Pwof: by induction on the number n of steps in the derivation of I. In the base
case n = 0, so G is terminal and 1 = G. c' must also be terminal. Since ?????????
relative to ?i, and G' and G are fully terminal G'?weakG.
In the inductive case n > 0: let H be the result of the first step in the derivation
of some 1 with at least one execution, so that G ?? H 1. If the first step is
the expansion of some vertex not corresponding to a vertex in G' then let H' =
since H'??weak& relative to F1, H'L?cwe?H relative to ?i.
If the first step is the expansion of some vertex corresponding to a vertex in G', it
may or may not be the nonterminal ?. If it isn't, we can expand the corresponding
vertex in G' the same way, and get an H' for which H'EcweakH relative to ?i.
If the first step is an expansion of ? using some production a D, note that
since 1 has at least one execution, D must have an execution relative to some F.
It follows that D could not have been discarded in the thinning of the grammar:
there must be some production a D' in Q' for which D'EcweakD relative to any
?. We can use that production on G' to get an H' for which H'?weakH relative
to ?i.
In all cases we have a graph H' ?weakH relative to F1, and H derives some
terminal graph 1 with at least one execution, in n --H 1 steps. So, by the inductive
hypothesis, H' I' for some I' Ecweak1 El
Theorem 2.14 Q'LcQ.
Proof: For the start graph S we have ?s1E?c?s1 relative to ?i, so by Lemma 2.12
we can conclude that for any C' generated by Q' there is a set T generated by
Q for which IctiEcT. By Lemma 2.13 we can conclude that for any G generated
by Q with at least one execution, there is a C' generated by Q' with G'?weakG.
63
These two conditions imply Q'EcQ: every graph generated by Q is thinner than
or isomorphic to some graph generated by Q', and the two generated sets have
matching executions. ?
2.5.3 Weak conservative thinning of right-hand sides
For weak conservative thinning (which is Thinner's approach) we want to be able
to gather the right-hand sides of a nonterminal into a set (which will be executable,
since the grammar is normal), thin that set using weak conservative thinning, and
conclude that we have a weak-conservatively thinner grammar.
In the following discussion, let Q = (N, T, P, 5) be a normal trace grammar.
For some nonterminal ? ? N let A be the set fD (Q? H D) ? PJ, and suppose
there is some executable set A' such that A'LcA relative to every mapping F from
nonterminals to total functions. Let Q' be a trace grammar formed from Q by
removing the old productions for a and adding a new production (a D') for
every D' ? A'. Because Q is normal each nonterminal corresponds to a partial
function, obtained as the executable set generated from that nonterminal under Q.
Define ?i to be this mapping from nonterminals to their actual functions under
Lemma 2.15 Suppose we have a graph G' and set R, possibly containing nonter-
minals, for which fG'YECweakR relative to ?i, and suppose C' derives some terminal
graph I' in Q'. Then R derives some terminal set T in Q for which fI'Y?weakT.
Proof: similar to the proof of Lemma 2.12, but making use of the observation
that if we have two sets R'EcR relative to any mapping ? to total functions, then
R'?wea?R relative to any mapping ? to partial functions. E]
64
Lemma 2.16 Suppose we have two graphs G and G', possibly containing nonter-
minals, with C'ECweakC relative to F1; and suppose G I for some terminal
graph I with at least one execution. Then G' I' for some I'?weak1
Proof: This is the same as Lemma 2.13, and the same proof goes through ill this
case. ?
Theorem 2.17 Q'?weakQ
Proof: For the start graph 5 we have fS??we?tS1 relative to ?i, so by Lemma 2.15
we can conclude that for any C' generated by Q' there is a set T generated by Q for
which tG'1E?CweakT. By Lemma 2.16 we can conclude that for any G generated by
Q with at least one execution, there is a G' generated by Q' with &`ECweakG. These
two conditions imply Q'?weakQ: every graph generated by Q is thinner than or
isomorphic to some graph generated by Q', and any execution in the set generated
by Q corresponds to some execution in the set generated by Q'. ?
Chapter 3
Thinner's Language
3.1 Overview
In the previous chapter we examined the trace grammar representation for first-
order functional programs. Thinner operates on these trace grammars, finding
and repairing violations of the PLC. But where do programs represented as trace
grammars come from? I am certainly not proposing trace grammars as a program-
ming language, so Thinner will have to start with programs written in a more
customary language, compile them into trace grammars, optimize them, and then,
if necessary, decompile them back into the source language.
The source language for Thinner is a small Lisp-like language called TL (nat-
urally, Thinner's Language). TL satisfies two important restrictions: it is a first-
order, purely functional language, and no errors can occur at runtime. I developed
TL specifically for the Thinner project. It is a strongly typed language just large
enough to express interesting first-order recursive functions over several data types.
This chapter spells out the syntax and semantics of TL.
The chapter overview presents an informal description, and later sections of
65
66
the chapter present a formal syntax and a denotational semantics. It is no great
feat to give a denotational semantics for a simple, purely functional language like
TL, and the reader may wonder why this material was not omitted, or at least
relegated to an appendix. There are two points of interest:
o+ The semantics is an unusual one, since it gives a trace graph as the deno-
tation of each expression. This clarifies the connection between TL and the
trace grammars introduced in the previous chapter. In fact, the structure
of the part of Thinner that compiles TL into trace grammars reflects the
denotational semantics.
o+ The semantics is presented in a schematic form, with two semantic functions
initially undefined. One definition for these functions yields a standard se-
mantics for TL. An alternative definition developed in Chapter 4 places the
relational constraint inference method in the framework of abstract interpre-
tation.
3.1.1 Informal description of TL
In many ways TL programs are like pure first-order Lisp programs. The only avail-
able constructs are defun, let, multiple-value-bind, values, if, and function
application.
The language is strongly typed. Every expression has a type and every function
has a tuple of input types and a tuple of output types. The type names are
integer, boolean, bitvector, and the three fiat list types (listof integer),
(listof boolean) and (listof bitvector). The bitvector type is a 32-bit
word.
The boolean constants are given by the identifiers #t and #f. Neither these nor
67
any other syntactic keyword may be used as the name of a user-defined function.
Bitvector constants use the syntax #hdddddddd, where each d is a hexadecimal
digit.
A defun may start with a declaration, using the syntax (declare (t?pename
identifier) ...), where each identifler is a parameter of the function. Thinner
performs type inference while compiling, so many declarations may be omitted.
Aside from the declaration, the body of a defun must be a single expression.
Table 3.1 shows the primitive functions of TL with their types. Several primitive
functions are overloaded: for example, equal applies to any two objects of the same
type. This overloading is resolved at compile time.
There are three typed empty list constants: the empty integer list inil, the
empty boolean list bnil, and the empty bitvector list vnil. The cdr of an empty
list is that empty list. The car of an empty list is always an element of the base
type, even when the list is empty.1
3.2 Concrete syntax
Table 3.2 defines the syntax of TL using an extended BNF. In this table, (x?*
signifies a sequence of zero or more occurrences of (x'), and (?+ signifies a sequence
of one or more. TL has the same lexical structure as Lisp: tokens are delimited by
white space or by parentheses.
`Specifically, (car inil) is 0, (car bnil) is #f, and (car `mii) is #00000000.
68
Table 3.1: Primitive functions and their types
(integer x integer) boolean
(boolean x boolean) boolean
(bitvector x bitvector) H boolean
((listof integer) x (listof integer))?boolean
((listof boolean) x (listof boolean)) H boolean
((listof bitvector) x (listof bitvector)) boolean
equal
integer			integer
abs integer H integer
<, <=, >, >= (integer x integer) boolean
and, or (boolean x boolean) boolean
(bitvector x bitvector)			bitvector
not boolean			boolean
bitvector H bitvector
cons
(integer x (listof integer)) H (listof integer)
(boolean x (listof boolean)) (listof boolean)
(bitvector x (listof bitvector))
(listof bitvector)
append
((listof integer) x (listof integer))
(listof integer)
((listof boolean) x (listof boolean))
(listof boolean)
((listof bitvector) x (listof bitvector))
(listof bitvector)
cdr
(listof
(listof
(listof
integer) H (listof integer)
boolean) (listof boolean)
bitvector) (listof bitvector)
length
(listof
(listof
(listof
integer) H integer
boolean) H integer
bitvector) H integer
car
(listof
(listof
(listof
integer)			integer
boolean)			boolean
bitvector)			bitvector
69
(pwgram?			H			(defun?+
(defun?			H			(defun (identifier? ((identifier?*)
(declare ? (expression ?)
(defun (identifler') ((identifler?*) (expression ?)
(declare ?			H			(declare (declaration `)+)
(declaration ?			(( type ? (identifler?)
(type?			(basetype? (listof (basetype?)
(basetype?			H			integer boolean bitvector
(expression?			H			(identifler? (constant? (conditional?
(multiple value bind? (let? I (values?
(terminal call? (nonterminal call')
(constant') (number') I (boolean') (bitvector') inil bnil vnil
I `((number')+) I `((boolean')+) I `((bitvector')+)
(boolean')			H			?t j
(bitvector')			H			#h(hex digit')32
(conditional')			(if (expression') (expression') (expression'))
(multiple value bind')			(multiple-value-bind ((identifler')+)
(expression') (expression'))
(let')			H			(let ((binding')*) (expression'))
(binding')			(( identifler') (expression'))
(values')			(values (expression')+)
(terminal call')			((binary terminal') (expression') (expression'))
I (( unary terminal') (expression'))
(binary terminal')			H			+ - * I <= >= I
and or cdr cons append I equal
(unary terminal')			H			- abs I not length I car
(nonterminal call')			(( identifler') (expression `)*)
Table 3.2: Extended BNF grammar for TL
70
3.3 ?race graph constructors
Recall that we defined a trace grammar as a 4-tuple (N, T, P, S), where N was a
set of "nonterminal labels" and T a set of "terminal labels". Labels of both kinds
occurred in vertices, in the graphs that made up the right-hand sides of productions
in P. The nonterminal labels also occurred as the left-hand sides of productions,
and this is how nonterminal vertices were connected with their productions.
To give a semantics for TL we must be a bit more specific about the nature of
these labels. In the semantics that follows, each vertex label will be either:
o+ a relation, for terminal vertices
o+ an identifier, for vertices corresponding to nonterminal function calls; or
o+ a pair of trace graphs (giving the right-hand sides of two productions imme-
diately), for vertices corresponding to conditionals.
To build up these trace graphs denotationally we will use several trace graph con-
structor functions. These are defined below. (In these definitions, and throughout
this chapter, boldface variables denote sequences.)
Identity(n) is an fl-input, fl-output graph that passes each input directly to the
corresponding output. Figure 3.1 gives an example.
Identity(3) =
Figure 3.1: A trace graph constructed by Identity
Constant(i, v) is a graph depending on the constant value v and on I?I (The
sequence i will be a sequence of identifiers, but only the length matters.) The
71
graph has I?I inputs, which are all unused; it has no vertices other than Vj and v0;
and it has one output, whose source is the constant v. Figure 3.2 gives an example.
Constant((x, y), 3) =			3
Figure 3.2: A trace graph constructed by Constant
Select(i, j) is a graph depending on a sequence of identifiers i and an identifier j
which must occur in i. The graph has ji inputs and one output; it has no vertices
other than Vj and V0; and the source for the output is the first input m for which
2m = ? Figure 3.3 gives an example.
Select((a, b, c), c) =
Figure 3.3: A trace graph constructed by Select
Single(m, n, 1) is an m-input, fl-output graph with a single m-input, n-output
vertex v with label(v) = 1. The inputs of the vertex are the graph inputs in order,
and the outputs of the vertex are the graph outputs in order. Figure 3.4 gives an
example.
Cat(Gi,... , Gn) combines n> 1 graphs into a new graph H, whose input arity
is the sum of the input arities of the Gj's and whose output arity is the sum of
the output arities of the Gj'5. Each Ci is duplicated in H (not including the input
and output vertices), and is connected in order to the inputs and outputs of H
Figure 3.5 gives an example.
72
Single(2,1,l) =			1
Figure 3.4: A trace graph constructed by Single
Figure 3.5: A trace graph constructed by Cat
Meld(G1,... , Gn) combines n > 2 graphs into a new graph H as follows, The
first n --H 1 graphs must all have the same input arity k; this will also be the input
arity of H. Each Gj for i < n is duplicated in H (not including the input and
output vertices), with the inputs connected to the corresponding inputs of H. The
sum of the output arities of the first n --H 1 graphs is the input arity of G?; the
outputs of the first n --H 1 graphs are connected to the inputs of &? in order. The
outputs of H are the outputs of Gn. Figure 3.6 gives an example.
The six functions Identity, Constant, Select, Single, Cat and Meld are used in
the denotational semantics that follows, to construct the trace graphs which are
the denotations of expressions.
73
Figure 3.6: A trace graph constructed by Meld
3.4 Types and type inference
As noted above, TL is strongly typed. Its types are integer, boolean, bitvector,
and the three flat list types (listof integer), (listof boolean) and (listof
bitvector). All type checking is done at compile time.
We will not labor through a type semantics here. Instead, we will treat type
checking as a syntactic preprocessing step. We'll assume that a consistent type has
been determined for every expression, all overloaded operators have been disam-
biguated, and the input tuples and output tuples of all functions have been typed.
Where necessary, we'll refer to the derived type of an expression or function in the
semantics using type subscnpts.
3.5 Abstract syntax
These are the abstract syntactic categories:
C ? Con			constants (including quoted lists)
I ? Ide			identifiers
T ? Ter			terminal functions
74
E ?			Exp			expressions
D ? Def			defuns
P ?			Pro			programs
An abstract syntax for TL is as follows. (Remember that we're using bold type for
sequences.)
E H ? II' I (T E) 1(1' E)
(if E E' E")
(values E)
(let ((r1 E'1) ... (?? E1?)) E)
I (multiple-value-bind (I') E E')
D H (defun I' (I) E)
PHD
3.6 Semantic sets
It is customary to use domains for the denotations of programming constructs,
but we will use sets instead. This simplification is possible because we will not be
using recursive domain definitions or fixed points, and we will not need a bottom
element to represent non-termination.2
The denotations will be elements of these semantic sets:
v
c
values (booleans, numbers, bitvectors, and lists thereof)
relations on V
2Actually, there is a hidden recursiveness in the semantic sets: the label of a vertex in a trace
graph may be a pair of trace graphs. But since the whole structure of a trace grammar is finite,
this can easily be accomodated; and I didn't want to introduce a lot of unnecessary mathematical
machinery.
75
z
x
N
trace graphs
pairs (G,f) where f is an execution of G ? Z
nonterminals (?,C) for identifier z ? I and G ? Z
Note that the syntactic category I of identifiers persists in the denotations of non-
terminals.
The semantic functions are as follows:
Ic: Con H V
: Exp H Ide*			Z
D : Def? N
Q: Pro Exp
T: Ter H C
(Here, ?(S) denotes the set of subsets of S. We will also use ?f(S) to denote the
set of finite subsets of S.)
The semantic function K maps each constant to the corresponding value. ?
maps each expression and list of identifiers to a trace graph. The list of identifiers
will determine the number of inputs to the graph, and will guide the connections
for identifier references within the expression. D maps each function definition to
a nonterminal, which is just an identifier (the name of the function) and a trace
graph (the right-hand side of the single production for that function).
In the interactive style of Lisp-like languages, a program is a collection of func-
tion definitions that define an environment in which expressions may be evaluated.
Semantic function Q reflects this idea: it maps each program to function that
maps an expression (which is assumed to contain no free variables) to a set of
executed trace graphs. (In the standard semantics that set will either be of size
zero, if the evaluation of the expression given the definitions of the program does
76
not terminate, or of size one, if it does.)
T determines the constraint associated with each terminal function. We will
also use an auxilliary function Lang; this is the function that determines the lan-
guage of executed trace graphs generated by a grammar.
Lang: Z H ?(N)
In Section 3.8, below, we will define T and Lang in a way that yields a standard
semantics for TL. We will try alternate definitions in Chapter 4, when we place the
relational constraint method of inference in the context of abstract interpretation.
3.7 Semantic schema
The definition of the semantic function ?G mapping syntactic constants to values
is omitted.
S?C?1 = Constant(I, K(C))
= Select(I,I')
??(T??p E)?1 =Meld(??E1?1,..., ?IEIaI?I, Single(1ai, Pi, T(T)))
?[(?a??p E)?1 = Meld(C?E1?1,...,?[E1?j?I,Single(a ,Ipi,I'))
?I(i? E E'a E?11)?1=Mdd(Identity(I),?[[E?I,
Single(?I t 1, I?
(Cat(?[E'?1, Single(1, 0, (true))),
Cat(S[[E"?I, Single(1, 0, (false))))))
?[(va1ues E)??1 = Meld(?[[E1?1,.. . ,?[E&?1,Identity(jaJ))
?[(1et ((r1 E'1) ... (?? E'?)) E)?1 = Mdd(?[E'1?1,...
Identity(?Ij),
.1))
77
??(mu1tip1e-va1ue-bind (I') E E')?I = Meld(S?E?I,
Identity( III),
?IE'?(r I))
D[[(de?un I' (I) E)? =
Q[[D?E=Lang(?[[E?(),fD[Di?,... , DIDn??)
3.8 Standard semantics
Now to define the remaining semantic function T and the auxiliary function Lang.
To get a standard semantics we define T so that the meaning of each terminal func-
tion identifier is the relation normally associated with the corresponding function
on values:
4+? =?Vo, V1, V2) I V2=Vo+?V?1
T?nOt?ooiean??ooiean? = ?(true, false), (false, true)J
and so forth. We define Lang to be the function that generates the set of fully
terminal trace graphs from a trace grammar in the usual way:
Lang(G, 5) = t(H, f) H is a fully terminal trace graph derivable from
start graph G in grammar 5, and f is an
execution of Hi
3.9 Example
For example, consider the program (defun q (a b) (If (< a 0) (+ a b) a)).
The simplest expressions in the program are the identifiers a and b and the constant
78
0. These expressions have the following denotations, constructed by Select and
Constant:
=			???(a,b) =			St[O?(a,b) = 0
From these simple denotations, the meanings of the compound expressions (+
a b) and (< a 0) are constructed:
a b)?(a,b) = T(t)			??(< a 0)l(a,b) =
Now we can construct the denotation of (if (< a 0) (+ a b) a). This makes
use of a nonterminal vertex whose label is a pair of graphs.
?[(if (< a 0) (+ a b) a)?(a,b) =
(Ct
Cf =
LetCbethisfinalgraph,C=&?(if (< a 0) (+ a b) a)?(a,b). Then
D?(defun q (a b) (if (< a 0) (+ a b) a))? =
which is the pair giving C as the only production for q; and finally
Q?(defun q (a b) (if (< a 0) (+ a b) a))?=AE.Lang(??E?(),?(q,C)J)
Chapter 4
Finding Violations
4.1 Overview
How can violations of the PLC be identified? A trace grammar violates the PLC
if there is a conservatively thinner grammar, and it has already been observed
that even for individual trace graphs the question of whether a given thinning is
conservative or not is undecidable. It follows that there is no way to automate an
optimizing step that says, "Now find all violations of the PLC in the grammar."
At first glance this may seem like a major obstacle, but in fact undecidability is
the least of our worries. In early prototypes of Thinner I experimented with several
different "fast" inference techniques: a Prolog-style backward-chaining inference
engine and a term-rewriting system, among others. It quickly became clear that
the major difficulty in finding violations of the PLC is not undecidability but
intractability. Undecidability only says that no algorithm can correctly detect all
violations of the PLC. One can live with this kind of limitation in an optimizer;
but I found many seemingly obvious violations of the PLC that no method I tried
would detect reasonably quickly.
79
80
My solution to this problem was to build a semantic analyzer that combines
traditional techniques, like the computation of normal forms of expressions, with
a new method of inference called retational const?in1 [Web92bJ. This semantic
analyzer is the subject of this chapter.
Research in automated deduction has usually focused on generality and ade-
quacy rather than efficiency [Bib91]. Compared with the state of the art in auto-
mated deduction, the relational constraint method is not very powerful. Its chief
virtue is that it develops a collection of useful inferences about a trace graph of
size n in 0(n2) time. So it strikes a balance between power and efficiency--Hthe
right balance for a semantic analyzer like Thinner's.
4.1.1 The semantic analyzer
Another part of Thinner, described in Chapter 6, inspects the trace grammar being
optimized and proposes individual trace graphs for semantic analysis. It is the job
of the semantic analyzer to identify as many conservative thinnings of that graph
as it can, as quickly as it can. This means finding equivalences: two vertex outputs
that are assigned equal values in every execution, or an individual vertex output
that is assigned the same (constant) value in every execution.
To do this the semantic analyzer makes use of two complementary techniques:
the relational constraint method, and the computation and comparison of normal
forms of expressions. These two methods complement each other well. Each finds
equivalences that the other misses.
Relational constraint is described in Section 4.2. It is a fully automatic inference
technique with a forward-chaining flavor: it doesn't prove or disprove queries, it
reports observations. It starts with a fixed, finite vocabulary of binary relations;
for each data type this vocabulary forms a complete lattice. Relational constraint
81
focuses on strengthening relations in this lattice, developing the strongest correct
statement it can about the relation between any two objects in its universe.
The output of the relational constraint aigorithm is a binary relation for every
pair of "objects" of the same type. These objects are:
o+ all vertex outputs and constants from the graph; and
o+ the integer constants 0 and 1, the boolean constants #t and #f, the bitvector
constant OOOOOOOOxO, and the three typed empty lists whether they occur
in the graph or not.
The vocabulary of binary relations for a given data type is fixed. It forms a com-
plete lattice in which the greatest lower bound of a set of relations yields a relation
which is their conjunction, and the least upper bound of a set of relations yields
a relation which is their disjunction. Relations are strengthened monotonically
during inference: they start out at the top of the lattice (the relation which is true
of all pairs) and may grow stronger during inference, but are never weakened or
retracted.1 The vocabulary of binay relations is described in detail in Section 4.2.1.
The leverage for strengthening relations comes from constraints on tuples of
relations. A constraint (P, S) gives a tuple P of pairs of objects and a specification
S of allowable tuples of relations on those pairs. Constraints come in four flavors:
Functional constraints. Each operator in a program restricts the relations
among its input and output values. In some cases (like the absolute value
function) this restriction is a simple input-output relation which can be as-
serted immediately. In other cases (like subtraction) the relations among the
1Irom a certain point of view the lattice of relations appears upside-down, since it is natural
to think that information increases as you move up in a lattice. Not so in this case: information
increases downward. The reason for this will become more clear in the next section, when we
observe that these relations are built from sets in such a way that the disjunction of relations
(v), the union of the sets (u) and least upper bound in the lattice (H) coincide.
82
input and output values, and relations with certain constants, are restricted
in a more complex way. For example, if we have three objects a, b and c with
a = (--H b e), and if b = c, a functional constraint excludes the possibility
that a ? 0.
Transitivity constraints. For any three objects a, b and c of the same type, the
relations among a, b and c are restricted by transitivity. For example, given
that a < b and b < c, the transitivity constraint for a, b and c excludes the
possibility that a > c. (Thinner limits its use of transitivity constraints to
certain interesting triples.)
List property constraints. List property constraints ensure that for each pair
of list objects (a, b) the relation between a and b, the relation between car(a)
and car(b), and the relation between length(a) and length(b), are as strong
as possible taken jointly. For example, if we have two list objects a and b and
two base-type objects c and d, and if we know that c = car(a), d = car(b), and
a = b, then the list property constraint for a and b excludes the possibility
that c ?
Transparency constraints. For each pair of calls to the same function, a trans-
parency constraint insists that the relations reflect referential transparency.
For example, if the program contains both a = f(x) and b = f(y), and if
x = y, a transparency constraint excludes the possibility that a ?
The nature of the specification of allowable tuples S in a constraint (P, S) and the
method by which 5 is used to strengthen the relations on pairs in P are described
in Section 4.2.2. The full relational constraint algorithm is presented and analyzed
in Section 4.2.3.
83
The other part of Thinner's semantic analysis is the computation and compar-
ison of normal forms. A normal form for the expression computing each object
is recorded in a hash table, and when two normal forms are found to be identi-
cal, the objects must be equal. This is a standard technique and doesn't require
much explanation; for completeness, Section 4.3 outlines the normal forms used by
Thinner.
4.2 Relational constraint
4.2.1 The vocabulary of binary relations
When other parts of Thinner propose a trace graph for semantic analysis, the
analyzer's job is to find equivalences on objects in that graph. In fact, the relational
constraint method finds a binary relation for every pair of objects in the graph;
some of these may imply equivalence, but even those that do not provide leverage
for relevant inference.
We begin by defining a small set xt of prnmit?ve binary relations for each data
type 1. The primitive relations for a type are exhaustive and mutually exclusive,
so that for any constants a and b of type t, there is exactly one r ? xt for which
arb. Further, every primitive relation has a reversed version, 50 that if q ? xt,
there is some r C xt such that arb if and only if bqa. (q may be symmetric, in
which case r and q will be identical.)
The primitive relations are used to build a lattice of expressible relations, which
makes up the semantic analyzer's vocabulary. We'll denote an expressible relation
by indicating a set R of primitive relations: we define xRy as VrER x?y. So an ex-
pressible relation is just a set of primitive relations, one of which must be satisfied
84
each subset of the set of primitive relations gives a different expressible relation.
Because the primitive relations are exhaustive and mutually exclusive, this
construction yields a lattice of expressible relations in which the disjunction of
two relations Q and R is the least upper bound (Q u R) and their conjunction is
the greatest lower bound (Q A R). The set xt of all primitives for type t is the
expressible relation which is true of all pairs of objects of type t, and the empty
set is the expressible relation which is never true; we'll denote these relations T
and I, respectively. For any expressible relation Q there is a relation R with the
property that xQy if and only if ??(xRy); R is simply xt \ Q. Finally, for any
expressible relation Q there is an expressible relation R with the property that
xQy if and only if yRx; R can be constructed by simply reversing every primitive
in Q. (Again, Q may be symmetric, in which case Q and R will be identical.)
By construction, this vocabulary of expressible relations has the following prop-
erties:
o+ If two relations are expressible, their conjunction and disjunction are express-
ible.
o+ If a relation is expressible, so is its negation.
o+ If a relation is expressible, the same relation with arguments reversed is also
expressible.
There is an obvious bit-coding of the expressible relations using one bit for each
primitive relation in xt. Using this representation disjunction can be computed
using logical-or, conjunction using logical-and, and negation using logical-not.
The method of relational constraint will work for any vocabulary with the
lattice properties described above. The remainder of this section describes the
vocabulary used by Thinner.
85
Integers
Thinner's primitive relations on integers a and b describe the sign of a: a ? 0,
a = 0, or a > 0; the sign of b: b < 0, b = 0, or b > 0; and the difference of the
magnitudes Ia --H IbI: lal --H jbj < --H2, a --H Ibi = --H1, Ia --H Ibi = 0, lal --H Ibi = 1, or
lal --H IbI > 2. Not all combinations of these properties are consistent for example,
if both a and b are 0 the difference of their magnitudes can only be 0. But 29 of
the combinations are; and these 29 make up the exhaustive set Xint of mutually
exclusive relations on integers. Table 4.1 shows the notation we'll use for these
primitive relations.
Difference of magnitudes, lal --H Ibi:
?--H2			--H1			=0			=1			>2
a < 0, b ? 0			--<2+			--<1			--=0			-->1			-->2+
a < 0, b = 0			-0>1			-0>2+
a ? 0, b> 0			-+<2+			-+<1			-+=0			-+>1			-+>2+
a = 0, b ? 0			0-<2+			0-<1
a = 0, b = 0			00=0
a = 0, b> 0			0+<2+			0+<1
a> 0, b ? 0			+-<2+			+-<1			+-=0			+->1			+->2+
a> 0, b = 0			+0>1			+0>2+
a> 0, b> 0			++<2+			++<1			++=0			++>1			++>2+
Table 4.1: Notation for the primitive relations in xint
There are 229 subsets of xint, giving the same number of expressible relations on
integers. A variety of well-known relations on integers are among the expressible;
Table 4.2 gives some examples.
86
xRy, for R = Description
f--=0, 00=0, ++=0? x = y
x < y
x = succ(y)
f--=0,-+=0,00=0,+-=0,++=01 Ixi = I?I
Booleans
Table 4.2: Examples of common expressible relations on integers
There are four possible combinations of two booleans x and y. This yields a set
xbool of four primitive relations, which are denoted ff, ft, tf, and tt, with
xtftly if and only if x = false and y = true, and so on.
There are sixteen subsets of the XbooI, giving the same number of expressible
relations on booleans. These are described in Table 4.3.
xR?, for R = I Description
?J false for all x and y
x = false and y = false
titi x = false and y = true
ttfl x = true and y = false
ftt? x = true and ? = true
?ff,ftJ x=false
?ff,tfJ ?=false
lff,ttl			x=?
?ft,tf1			x=y
tft,ttl			y=true
ftf,ttj			x=true
fft,tf,ttl			?HX
fff,tf,ttj			y?x
fff,ft,ttj			XH?
fff,ft,tfl XHy
?ff,ft,tf,tt1 true for all x andy
Table 4,3: The expressible relations on booleans
87
Bitvectors
There are 15 nonempty sets of ordered pairs over the universe ?0, 1?. Each primitive
relation r on bitvectors can be described by one of these sets: if A is such a set, the
corresponding primitive relation r? has xr?y if and only if for every bit position,
the ordered pair of bits drawn from x and y at that position is in A, and every
pair in A occurs as the pair of bits drawn at some position. In other words:
XTAY = nE)o..3l](Xfl?Yfl) E A
= (xn,?n)
The set Xbitv of primitive relations on bitvectors is the set of the fifteen such
relations rA. These primitive relations are exhaustive and mutually exclusive, as
required, and a primitive relation rA can be reversed by reversing all the ordered
pairs in A. We'll use the notation shown in Table 4.4 for these primitive relations,
but remember that these relations are mutually exclusive. The relation v= is like
equality, but by definition it excludes vOO (where both are OOOOOOOOxO) and vii
(where both are FFFFFFFFxO).
There are 215 subsets of Xbitv. giving the same number of expressible relations
on bitvectors. Examples of expressible relations on bitvectors are given in Table 4.5.
Lists
There are actually three list types, but the primitive and expressible relations are
the same for all three. These relations should be considered as applying only to
lists of matching type.
Thinner's primitive relations on lists a and b describe whether the shorter is a
suffix of the longer; whether the two lists have the same car; and the difference of
the lengths a --H Ibi: a --H Ibi < --H2, a --H Ibi = --H1, a --H Ibi = 0, Jal --H Ibi = 1, or
88
Notation ar?b, for A =
v00 f(0,0)?
vOl t(0,1)i
v10 ?(1,0)J
vii f(1, 1)1
vOx ?(0, 0), (0, 1)1
vx0 f(0,0),(1,0)?
v= f(0,0),(1,1)J
v=--H t(0,1),(1,0)i
vxi f(0,1),(1,1)i
vix ?(1,0),(1,1)?
v>--H f(0, 1), (1,0), (1, 1)J
v> f(0, 0), (1,0), (1,1)1
v< i(0, 0), (0, 1), (1, 1)1
v<--H ?(0, 0), (0,1), (1, 0)J
vzx ?(0, 0), (0, 1), (1,0), (1, 1)i
Table 4.4: The set Xbitv of primitive relations on bitvectors
xRy, for R = Description
?v00J x = y = 0
fv00,vii,v=J x=y
fv0i,vi0,v=-? x=y
fvii,v=J x=y#0
?v00, vOl, vii, vOx, v=, vxi, v<? x V y = y
Table 4.5: Some common expressible relations on bitvectors
89
a --H Ibi > 2. Not all combinations of these properties are consistent: if a and b
have the same length and are suffixes of each other, they must have the same car,
But 19 of the combinations are; and these 19 make up the exhaustive set Xlist
of mutually exclusive relations 011 lists. Table 4.6 shows the notation we'll use for
these primitive relations.
IaI--HIbI<?--H2 IaI--HIbI=--H1 IaI=IbI Ia--H bI=1 iaI--HIbI>2
suffix			s<2+			s<1			s>1			s>2+
same car			c<2+			c<1			c=O			c>1			c>2+
both			sc<2+			sc<1			sc=O			sc>1			sc>2+
neither			<2+			<1			=0			>1			>2+
Table 4.6: The primitive relations in xlist
There are 219 subsets of xlist, giving the same number of expressible relations
on lists. Examples are given in Table 4.7, below.
xRy, for R = Description
?s<1, sc<1? x = cdr(y) (y not empty)
?sc=o,c=o,=o1 IxI = I?I
xlist\?sc=oJ x#y
Table 4.7: Some common expressible relations on lists
4.2.2 Finding strong relations
In the previous section we established a vocabulary of expressible relations, giving
a complete lattice of relations on each type. Of course, every pair of identically
typed objects in a trace graph satisfies the top relation for that type, but our goal
is to find the strongest relation we can, as quickly as possible.
90
The table of relations
The relational constraint algoritlim uses a table of relations giving the strongest
relation currently known for each pair of objects. This table is always relationally
symmetric: the relation in the table for (a, b) is always the reverse of the relation
for (b,a).
Most entries in the table of relations are initially the top relation for their type,
indicating that Thinner knows nothing about the relation for that pair. But there
are several exceptions:
o+ The relation of an object with itself is initially the equality relation. Note that
the equality relation need not be maximally strong: for example, the equal-
ity relation on integers is f--=O, 00=0, ++=0?, but if the semantic analyzer
discovers that the object is in fact positive, this relation can be strengthened
to t++=0y.
o+ The relation of an object with a constant can be initialized to something
stronger than the top relation. For example, the relation between a list object
a and the empty list can be initialized to ?s>2+, sc>2+, s>1, sc>1, sc=01.
o+ The relation between two constants can be computed immediately. Exactly
one primitive relation applies to any ordered pair of constants.
o+ Some pairs have an initial relation established by a "single-relation operator"
as described below.
Single-relation operators
Some of the operators in TL establish a relation on a single pair of objects. For
example, suppose a trace graph includes the computation a = (- b). This estab-
91
lishes a relation between a and b that is expressible in the language of relations,
so all we need do is strengthen the original relation between a and b, T, to the
negation relation, ?+-=O, -+=O, 00=01. Table 4.8 shows the other TL operators that
can be treated this way.
Operation Relation on (a, 6y
a = ?- b)
a = (abs b)
a = (not b), a and b boolean
a = (not b), a and b bitvector
a = (cons c
?+-=0, -+=O, 00=0?
f++=0, -+=o, 00=01
?ft, tfj
fvOl, vlO, v=-1
fsc>1, 5>11
Table 4.8: Operators that can be captured in a single relation
In addition, vertices in VT strengthen the relation between the input object and
itself from ?tt,?:`1 to fttl, and vertices in VF strengthen the relation to f??1 So
Thinner treats these vertices as single-relation operators too.
The constraint computation
Suppose there is an addition a = (+ b c) in the trace graph. The relations
among a, b and c can't be captured by directly strengthening entries in the ta-
ble of relations. But because c is the sum of a and b, not all of the 29? triples
of primitive integer relations for the pairs ((a, b), (b, c), (a, c)) are possible. For ex-
ample, if b and c are both positive, neither can be greater than a; so tuples like
(t++>2+1, ?++>2+1, f++<2+1) can be excluded. Of course, if the semantic analyzer
ever developed exactly those relations for a, b and c it would have a contradiction.
More significantly, if it ever finds a triple like (f++>2+J, t++>2+1, ?++<2+,++>2+1)
it can strengthen it to (?++>2+1, ?++>2+1, f++>2+1) just by eliminating the impos-
sible.
92
Definition 4.1 Let T be any k-tuple of types. A constraint (P, S) consists of a
k-tuple P of pairs of objects, in which the two objects in each pair Pj are of type T?;
and a constraint set S of k-tuples of expressible relations, with the property that
for each C ? S, each relation Ci is of type Tj.
For any k-tuple of types T consider the space of k-tuples of primitive relations
on the corresponding types:
T-space --H XTl .... x XTk
Any tuple Q of expressible relations describes a cube in T-space, for the appropriate
tuple of types T:
t(ri,...,r?)I A r??Q4
iE(1..kj
A constraint set S is a set of tuples of expressible relations: it describes the re-
gion of T-space which is the disjunction of the cubes described by each C ?
The inference procedure takes the current tuple of relations Q for the pairs in P
(remember, this describes a cube in T-space) and the constraint set S, and yields
the smallest cube R containing the conjunction of Q and S. Each relation in R is
at least as strong as the corresponding relation in Q; if it's stronger, we've gained
information.
The function Constrain that follows performs the basic constraint computation.
Given a current tuple Q of relations and a constraint set S, it returns a potentially
strengthened tuple of relations.
function Constrain(Q, S);
Q is a tuple of relations;
S is a set of tuples of relations
Constrain(Q, S) returns a tuple of relations;
93
begin
R			the tuple of bottom relations;
for each tuple C in S
begin
I			Q A C (the componentwise conjunction);
if I does not contain a bottom relation then
R R v I (the componentwise disiunction)
end;
return R
end;
Constraint sets
Now we have a method for constraining a tuple of relations given a constraint set.
The constraint set describes a space of legal tuples of primitives. Where does a
constraint set come from? Each constraint set used by Thinner is generated by
a program (and tested by another program); almost all the constraint sets are
defined before Thinner runs. Most have less than 10 elements, and the largest (the
list transitivity constraint set) only 300. (Compare this to the size of the tuple
spaces, which contain as many as 296 elements).
Thinner uses four types of constraints: functional constraints, transitivity con-
straints, list property constraints, and transparency constraints.
Functional constraints Some of the operators in TL establish a single binary
relation, as described previously; others determine a restriction on the relations
among the input and output objects for a vertex. For each such operator in the
language there is a fixed constraint set 5; for each such vertex in a trace graph we
94
establish a constraint (P, S) that connects the appropriate tuple P of object pairs
to the constraint set for that operator.
Integers An addition operation a = (+ b c) and the successor and pre-
decessor relations are intimately related. If we learn that jcj = 1, for exam-
ple, we want to eliminate many of the possible primitive relations between a
and b; and if we learn that a is the successor of b, we want to be able to con-
clude that c = 1. To make this kind of inference possible an addition con-
straint (P, 5) doesn't have P = ((a, b), (b, c), (a, c)), as you might expect; instead
P = ((a, b), (b, c), (a, c), (a, 1), (b, 1), (c, 1)).
The constraint set for a subtraction operation b = a c) also contains
sextuples--Hin fact, it's the same constraint set as for addition (but with the pairs
treated in a different order), since since b = a c) if and only if a = (+ b c).
The constraint set for a multiplication operation a = (* b c) uses the same trick:
sextuples of relations, constraining not only the pairs (a, b), (b, c) and (a, c) but
also the pairs (a, 1), (b, 1) and (c, 1). In this case the trick is even more pertinent
since 1 is the multiplicative identity.
Booleans The constraint set for the conjunction of booleans is obvious, as
shown in Table 4.9. Boolean disjunction is handled similarly.
(b, c) (c, a) (b, a)
Ifti			ftf?			f??1
ftfl			?ffJ			ftfl
?tt??tt?ftty
Table 4,9: The constraint set for boolean conjunction: a = (and b c)
There are comparison operators that have one boolean output and two inputs
95
of other data types: the equal operator for every data type, and <, <=, >= and
> 011 integers. There are only two tuples in these constraint sets. For example,
consider the < operator on integers: suppose a = x y). The tuple of pairs to
be constrained is ((x, y), (a, a)) and there are only two possibilities: (<,tt) and
(>, f?). Eliminate one and you have the fullest possible inference; eliminate both
and you have a contradiction.
Bitvectors The constraint set for the logical-and of bitvectors is straightfor-
ward, as shown in Table 4.10. The logical-or of bitvectors is handled similarly.
(b, c)			(c, a)			(b, a)
Ivool			?voo1			?vOOJ
?vO1J			fv1O?			(vool
?v1o1			fvOO?			Iviol
lviii			lviii			lviii
Ivoxi			lvxoi			Ivool
Iviol			Ivool			lvxoi
lv=i			lv=i			lv=i
lv=-,v<-i			lvxoi			lvxoi
lvxii			Ivixi			lv=i
Ivixi			lv=i			fvixj
lv>-,vxxi			lv>i			lv>i
lv>i			lv=i			lv>i
lv<i			lv>i			lv=i
Table 4.10: The constraint set for bitvector conjunction: a = (and b c)
Lists Two list operators, append and cdr, require constraint sets. The con-
straint set for c = (append a b) contains sextuples of relations: it constrains (a, b),
(b,?c) and (a, c) and also the relations of each of a, b and c with the empty list. The
constraint set for b = (cdr a) constrains triples, constraining the relation (a, b)
96
and also the relations of a and b with the empty list. Thinner could approximate
the cdr relation immediately, as fsc<1, s<1, sc=o1, but by constraining relations
with the empty list at them same time it gains the power to make further infer-
ences. For example, if a is the empty list Thinner will conclude that b is also the
empty list.
Transitivity constraints The relations manipulated by the semantic analyzer
have transitive properties. When one relation is strengthened, we want to take
advantage of transitivity to strengthen others, even when the objects in question
are not neighbors in the graph.
The semantic analyzer treats transitivity as a kind of triangle property: a re-
striction on the three binary relations among any three objects. There are 0(n3)
of these triangles. To keep the complexity down to 0(n2) the relational constraint
algorithm installs transitivity constraints on certain "interesting" triples only, leav-
ing most triples unconstrained. A triple (a, b, c) is deemed interesting only if there
is a functional constraint on (a, b), or if a is a special constant like 0. My exper-
iments suggest that this captures many of the useful inferences that occur when
full transitivity is used.
Even with full transitivity, it may seem at first glance that an important class
of inferences having to do with unary relations is missing. For example, suppose
we have established that x?0+<2+Jy. This fixes a binary relation between x and y,
of course, but it also fixes something about each integer independently: x must be
o and y must be positive. This means that there are restrictions on the relations
for (x, z) and (?, z), for any z of that same type.
As it happens, this kind of inference occurs as a by-product of inference about
transitivity. Given a relation xRy, a transitivity constraint will apply to the re-
97
lations in all triangles with (x, y) on one leg, including the triangle whose other
two legs are (y, x) and (x, x). So when any relation restricts x's sign, that will
be reflected in the reflexive relation for (x, x). (As mentioned above, this starts
out as f--=0, 00=0, ++=0J.) Then, when the relation for (x, x) is strengthened, a
transitivity constraint will apply to the relations in all triangles with (x, x) on one
leg. The other two legs will be (x, z) and (z, x) for all z of the same type, so the
new information about x will be reflected in all relations involving x.
List property constraints Sometimes we know that one object is the car or
length of another object. This happens with the empty lists, of course; it also
happens when the graph contains length, car or cons operators. Whenever we
have two lists a and b that both have length objects or both have car objects, we
install a list property constraint. This constraint ensures the relation between a
and b, the relation between car(a) and car(b), and the relation between length(a)
and length(b), are as strong as possible taken jointly. For example, if we have
two list objects a and b and two base-type objects c and d, and if we know that
c = car(a), d = car(b), and a = b, then the list property constraint for a and b
excludes the possibility that c ?
The relations of a and b with the empty list contain significant information
about their cars and lengths. So a list property constraint (P, S) actually applies
to 5-tuples of relations:
P = ((a, b), (car(a), car(b)), (length(a), length(b)), (a, nil), (b, nil))
(I have used "nil" here to refer to an empty list; which empty list depends on the
type of a and b.)
98
Transparency constraints Since the language is purely functional, all func-
tions are referentially transparent: given the same inputs, they produce the same
outputs. This applies even to nonterminal functions, about which the semantic
analyzer has (or, more accurately, uses) no other information. Given two nonter-
minal function calls cl = (f al b1) and C2 = (f a2 b2), and given that a1 = a2
and b1 = b2, we should be able to conclude that c1 = c2 without knowing anything
about the function f. Or, given the information that aj = a2 and ci # c2, we
should be able to conclude that b1 # b2.
To perform this kind of inference, the semantic analyzer places a constraint on
the inputs and outputs for each pair of vertices with the same function. A custom
constraint set is built to conform to the number and type of the function's inputs
and outputs. (This is the only time Thinner constructs a constraint set all others
are prefabricated.) A pair of n-input, m-output vertices with the same nonterminal
function will get a constraint on (n t m)-tuples.
This method could apply to terminal as well as nonterminal vertices: we could
construct a custom "transparency" constraint set for each operator, which in addi-
tion to transparency could reflect things like associativity (like or), bijectivity (like
cons), or determination by any two of three values (like *). The current version
of Thinner doesn't do this.
Optimizing constraint computation
Each constraint (P, 8') connects a tuple of pairs of objects to a constraint set.
The Constrain algorithm uses the constraint set to strengthen the current tuple of
relations for those pairs; and, as we'll see, it is often used to compute the same
constraint repeatedly. Since it is being called repeatedly with the same constraint
set 8' and with a tuple Q that gets stronger monotonically, it can be optimized to
99
avoid wasted recomputation. The idea is to have each constraint use its own copy
of the constraint set S, initialized as described previously; then when Constrain
finds that the intersection of Q with some tuple C ? S is empty, it discards that
tuple from 5.
procedure FastConstrain(Q, 5);
Q is a tuple of relations updated in place;
5 is a set of tuples of relations updated in place;
begin
R			the tuple of bottom relations;
T			the empty set;
for each tuple C in 5
begin
I Q A C (the componentwise conjunction);
if I does not contain a bottom relation then
begin
R R v I (the componentwise disiunction);
T TufIJ
end
end;
Q
5 T
end;
The initial constraint sets themselves can also be optimized. For example, if a
constraint set includes two tuples (?++=oJ, ?++<1J) and ((++=o1, f++>11), it can be
reduced by removing those tuples and replacing them with (?++=o?, f++<i, ++>?i)
loo
This is simple logic minimization, analogous to the optimization of (a A b) v (a A c)
to (a A (b v c)). In fact, when the constraint sets used by Thinner are generated,
they are, initially, sets of tuples of primitive relations--Hit's just easier to generate
them that way. But before Thinner sees them they are reduced automatically to
minimal sets of cubes.
One might take this further and ask: why not dispense with the general con-
straint set algorithm and instead implement a logically minimal custom constraint
algorithm for each operator? I experimented with this idea by converting a con-
straint set into logic equations, passing them to a logic minimizer (the Berkeley
tool MIS), and converting the resulting logic equations directly into code by hand.
The result, unfortunately, was a somewhat slower constraint procedure! Although
it was logically minimal it lacked the regular tight-looping structure of the general
procedure.
4.2.3 The relational constraint algorithm
The difficult part of the relational constraint method is the design of suitable vo-
cabularies of relations and the construction of appropriate constraint sets. The
relational constraint algorithm itself is quite simple, and the reader has no doubt
anticipated it. This section presents the algoritlim, gives correctness and complex-
ity results, and offers some tips on implementation.
The aIgorithm
Here's the algorithm in a nutshell. First, establish a table of relations indexed
by pairs of objects and initialize each relation in the table. Next, install the con-
straints: functional constraints, transitivity constraints, transparency constraints,
and list property constraints, as described above. Initialize a set of constraints to
101
be computed; initially, this set contains all the constraints. Now loop while there
are more constraints to be computed: choose one, compute it, and update the table
of relations; for any entry that got stronger, find the constraints that apply to that
pair and add them to the set of those to be computed; and remove the one we just
performed
We need to be able to find all the constraints that apply to a given pair in
constant time, so we'll construct a table Con indexed by pairs of objects, giving
a set of relevant constraints for each pair. The AddCon procedure installs a con-
straint in the right sets, and also adds the constraint to the set of constraints to
be computed.
procedure AddCon((P, 5), Con, Todo);
(P, 5) is a constraint
Con is a table of sets of constraints, updated in place;
Todo is a set of constraints, updated in place;
begin
for each pair (i,j) in P
Con(i,j) := Con(ij)Ut(P,S));
Todo := Todou ?(P,S))
end;
Whenever we strengthen a relation in the table of relations, we also strengthen
the reversed relation. (We'll assume that the new relation is stronger than the old
relation.) Then we find all the constraints that apply to either and include them
in the set of constraints to be computed.
102
procedure Strengthen(i, j, r, Con, Ret, Todo);
i and j are objects;
r is a relation;
Con is a table of sets of constraints;
Rel is a table of relations, updated in place;
Todo is a set of constraints, updated in place;
begin
Rel(i,j)
Rel(j, i)			r reversed;
for each constraint (P', S') in Con(i,j) or Con(j, i)
Todo Todo u f(P', s')1
end;
The installation of all the constraints has one subtlety, already noted: we don't
install all the possible transitivity constraints, only those that involve pairs that
also have functional constraints. The Init procedure takes a graph C and a set of
objects Obs and initializes the Con table, the table of relations Ret, and the set
Todo of constraints to be computed.
procedure Init(C, Obs, Con, Ret, Todo);
G is a trace graph;
Obs is the collection of objects for C;
Con is a table of sets of constraints, updated in place;
Ret is a table of relations, updated in place;
Todo is a set of constraints, updated in place;
begin
1 Todo := the empty set;
103
2 for each pair of identically typed objects (i, j) from Obs
3 begin
4 Con(i,j) the empty set;
5 Rel(i,j) the initial relation for (i,j) (Section 4.2.2)
6 end;
7 for each single-relation vertex v in C
8 update the appropriate Re1(?,j);
9 for each vertex v in C with a functional constraint
10 AddCon(fthe functional constraint for v?, Con, Todo);
11 for each pair of identically typed objects (i,j) from Obs
12 if i is a special constant or Con(i,j) is not empty then
13 for each object k from Obs of the same type as i and j
14 AddCon(fthe transitivity constraint on (i, j, k)), Con, Todo);
15 for each pair of identically typed list objects (i, j) with car or length objects
16 AddCon(?the list property constraint for (i,j)J,Con,Todo);
17 for each pair of vertices (v, w) in G with identical nonterminal functions
18 AddCon(fthe transparency constraint for (v, w)J, Con, Todo)
end;
The input to the main procedure RelationalConstraint is a trace graph C; we'll
assume that we are also given Obs, the set of objects of interest.
procedure RelationalConstraint(C, Obs);
G is a trace graph;
Obs is the collection of objects for C;
Con is a table of sets of constraints;
Ret is a table of relations;
104
Todo is a set of constraints;
begin
1			Init(G, Obs, Con, Rel, Todo);
2			while more constraints in Todo do
3			begin
4			(P, S) a constraint from TodQ
5			Q the tuple giving the relation Rel(i,j) for each (i,j) in P;
6			Q Constrain(Q, 3);
7			for each pair (i, 5) in P and corresponding relation r in Q
8			if r is stronger than Rel(i,j) (that is, if Rel(i,j) ? ?) then
9			Strengthen(i,j,r,Con,Rel,Todo);
10			remove (P, 3) from Todo
11			end
end;
Correctness
Theorem 4.1 The RelationalConsmaint algorithm terminates with a table of re-
lations Rel that cannot be strengthened by the given constraints.
Proof: An invariant for the loop at line 2 in RelationalConstraint is this: if the
computation of a constraint would strengthen any relation in Rel, that constraint is
in Todo. This invariant is established by Init, which places all constraints in Todo.
The invariant is reestablished inside the loop, since whenever any entry in Ret is
altered, all constraints involving that pair are added to Todo; and the one entry
removed from Todo is the one that was just computed (which would not result in
any further strengthening if recomputed).
105
Let c be the number of constraints installed by Init, and let k(Rel) be the
sum over all pairs (i, j) of the distance in the lattice of relations from Rel(i, j) to
the bottom relation for that type. Then consider t = (c + 1). k(Rel) + Todoj.
On any pass through the loop either some relation is strengthened or no relation
is strengthened. If some relation is strengthened then t decreases since k(Rel) is
reduced (even though I Todol may grow as large as c). If no relation is strengthened
then t decreases since one constraint is removed from Todo and none are added.
So t always decreases; and when t = 0 (if not before) we have Todo = 0 and the
loop at line 2 terminates. Since Todo is empty at this point, we can conclude from
the invariant that no constraint would strengthen any relation in Rel any further.
One might wonder whether the final table found by RelationalConstraint is
unique for a given initial table and collection of constraints. Since we haven't
specified the order in which constraints are selected for computation, is it possi-
ble that different orders yield different solutions? It turns out that the order in
which constraints are computed is irrelevant (although it makes a difference in
efficiency, as noted in the Chapter 6), and RelationalConstraint always finds the
same solution.
To prove this, note that an n-by-n table of expressible relations, like the Rel
table computed in RelationalConstraint, defines a cube in the space of n2-tuples
of primitive relations. Each individual constraint can be promoted to this super
tuple-space by reorganizing the tuples and adding T elements for those dimensions
not previously represented. So we can think of the method of constraints as starting
with some cube A (the cube defined by the initial Rel table) and restricting it in
a series of steps to reach some final cube 1?. We'll use C to denote the subcube
106
relation a cube A is a subcube of a cube B if and only if each relation in the table
for A is equal to or stronger than the corresponding relation in the table for B.
Each step in the main loop of RelationalConstraint is a constraint computation,
which is, as we've seen, the computation of the smallest cube that contains the
intersection of the current cube with the region defined by a constraint set. If s
is a sequence of constraints that takes initial cube A to final cube B, we'll write
A Hs B.
Lemma 4.2 If A HS B then BC A.
Lemma 4.3 If A Hs B and A' Hs B', and if A c A', then B C B'.
Proof: both lemmas follow immediately from the observation that each step in
S is a constraint that forms the smallest cube containing the intersection of the
previous cube with a region of the tuple space. ?
Theorem 4.4 For a 9iven initial cube and collection of constraints, Relational
Constraint finds a unique final cube.
Proof: Suppose we have two runs of RelationalConstraint with the same collection
of constraints and the same initial cube A. Let S and R be the two sequences of
constraints from these runs, so A HS B and A HR B'. Apply the sequence R to
B. By Theorem 4.1, none of the constraints will have any effect, so B HR B. By
Lemma 4.2 we know B c A and since A HR B' we can conclude from Lemma 4.3
that B C B'. Now apply the sequence S to B'. By Theorem 4.1, none of the
constraints will have any effect, so B' Hs B'. Reasoning as before we can conclude
B C B'. Since B C B' and B' C B, we have B = B': RelationalConstraint finds
a unique final cube. E)
107
This is as much as we can say about correctness: that the algorithm terminates
with a unique table of relations that cannot be further strengthened by the con-
straints. A more thorough proof of correctness would depend on the correctness
and completeness of the constraint sets.
Complexity analysis
Let n be the number of vertices in the graph. We assume that the trace graph has
bounded degree: that is, that functions have bounded input and output arity and
that the fanout is bounded. So the number of vertices in the graph, the number of
edges in the graph, and the number of objects for the graph, are within constant
factors of each other.
Theorem 4.5 RelationalConstraint runs in 0(n2) time.
Proof: First we need to determine how many constraints are installed by Init. The
number of functional constraints installed at lines 9-10 is n one for each vertex.
At lines 11-14 0(n2) transitivity constraints are installed: 0(n) constraints for
each of the 0(n) pairs constrained in the previous step. At lines 15-16 0(n2) list
property constraints are installed, at most one for each pair of list objects. Finally,
at lines 17-18 0(n2) transparency constraints are installed (normally far fewer
than n2 since the number of pairs of vertices with identical nonterminal functions
is likely to be small). The total number of constraints installed by Init is 0(n2),
and the time in initialization (line 1 in RelationalConstraint) is also 0(n2).
We now turn to the main loop in RelationalConstraint. The number oftimes the
test at line 8 can be satisfied for a given pair (i, j) is bounded by the height of the
lattice of relations, which is constant. So Strengthen is called 0(1) times for each
pair (i,j). The total time spent in Strengthen is therefore 0(?(i,j) jcon(i,j)j) =
108
0(n2). A constraint is computed at most once initially, and then at most once
every time one of its pairs is strengthened; this can happen for a given constraint
only a constant number of times, so the loop in lines 2-11 is executed 0(n2) times.
Disregarding Strengthen, whose contribution has already been counted, each pass
through the loop takes constant time. So the total time in the main loop and the
total time in RelationalConstraint--His 0(n2), ?
Full computation of transitivity, which is an option left turned off in Thinner,
can be achieved by skipping the test at line 12 in Init. With full transitivity the
complexity of the algorithm rises to 0(n?).
4.2.4 Examples
In this section we will look at two simple examples of relational constraint. The
second example is actually not so simple; it demonstrates that in spite of its low
complexity, relational constraint sometimes proves surprisingly tricky things.
A simple integer example
Consider the following program:
(defun f (a b)
(>= (+ (* a a) b) b))
Figure 4.1 shows the corresponding trace graph. There are six integer objects
for inference: the input variables a and b, the intermediate values labeled x and y,
and the constants 0 and 1. There are also three boolean objects: the output value
labeled z and the constants true and false.
The initial table of relations is shown in Table 4.11. Here r0 is the initial
relation between the constant 0 and any nonconstant integer object and ri is the
109
ab
Figure 4.1: Trace graph for the simple integer example
initial relation between the constant 1 and any nonconstant. Note that the initial
array is symmetric, with relations at least as strong as equality on the diagonal.
After relational constraint, the final table of relations is as shown in Table 4.12;
empty slots indicate unchanged relations.
We've concluded that the output of this function is a constant, the boolean
value true. How did this happen? There's no point in reconstructing the actual
order of events (since we haven't specified the order in which waiting constraints
are performed), but we can analyze the results logically.
The functional constraint for multiplication constrains the relations on pairs
(a, a), (a, x), (a, 1) and (x, 1) to make them consistent with x a a; in particular,
this constraint tightens the relation on (a, x) from T to r?. Note that this subsumes
the "law of signs" in that any x satisfying a?x must be non-negative. That's not
all ar4x tells us: the magnitude of x a?a must be either equal to the magnitude of
a (this happens when a ? 1) or greater than the magnitude of a by at least 2 (this
happens when Ia > 1). But for our purposes the important thing is that x is non-
negative, and after some transitivity constraints this fact is reflected throughout
110
0			1			a			b			x			y			true			false			z
0			00=0			0+<1			r0			ro			r0			ro
1			+0>1			++=0			ri			ri			ri			r?
a			rev(r0)			rev(r1)			=			T			T			T
b			rev(r0)			rev(r1)			T			=			T			T
x			rev(r0)			rev(r1)			T			T			=			T
y			rev(r0)			rev(r1)			T			T			T			--H
true			tt			tf			ftf,ttl
false			ft			ff			fff,ftj
z			t:'t,ttl			?tf,ff1			--H
r0			=
r1			=
fO-<i, 0-<2+, 00=0, 0+<1, 0+<2+1
f+-<i, +-<2+, +-=0, +0>1, ++=0, ++<1, ++<2+1
Table 4.11: Initial relations for the simple integer example
0			1			a			b			x			y			true			false			z
0			r2
1			r3
a			r4
b
x			rev(r2)			rev(r3)			rev(r4)			rev(r5)			r6			rev(r5)
y			>			T5
true			tt
false			ft
z			tt			tf			tt
r2			=
r3			=
r4			=
r5			=
r6 =
ro \ fO--H<1,0--H<2+J
ri \ ?+-<1,+-<2+,+-=0,++<1J
f-+<2+, -+=0, ++<2+, 00=0, ++=0?
r4 u ?-o>i, -0>2+, -+>1, -+>2+, +0>2+, +0>1,
++>2+, ++>1, -+<1, 0+<2+, 0+<1, ++<1i
fOO=0, ++=OJ
Table 4.12: Final relations for the simple integer example
111
the table, affecting x's entire row and column, This explains r2 and r5: they are
the original relations strengthened for x > 0. (r? is even stronger: note that it
does not allow x, the square of an integer, to be 2.)
Now the functional constraint for addition constrains the relations on pairs
(x, b), (x, ?), (b, y), (x, 1), (b, 1) and (y, 1) to make them consistent with ? = x +
Since x is positive, this constraint tightens the relation on (b, y) from T to ?.
This in turn triggers the computation of the functional constraint for the <= test,
which constrains the relations on (y, b) and (z, z): since y > b, z = true. Finally,
transitivity strengthens the relations between z and the constants true and false.
A tricky list example
Consider the following program:
(defun g (a b)
(let (Cc (cons 1 a))
(d (cons 0 inil)))
(= (append c d) (append b c))))
Figure 4.2 shows the corresponding trace graph. Thinner decides that the output
of this program is always the boolean value false. The reader is invited to prove
this, using any method, before proceeding.
Perhaps the reader found this inductive proof-by-contradiction. List x is 1. a 0
and list y is b 1 a. Suppose x = y. Then Ib = 1, a cannot be inil and its
last element must be 0. We also observe that if a? = 0 then a??1 = 0 and so, by
induction, all the elements of a must be 0. But if x = y the first element of a must
be 1. By contradiction, x $ y.
How does Thinner prove this without induction or contradiction? It reasons
about seven integer-list objects: the inputs a and b, the "let" variables c and d,
112
ab
c1on?s			0 inil
z
Figure 4.2: Trace graph for the tricky list example
the intermediate values x and y, and the empty integer list. There are also three
boolean objects: the output z and the boolean constants true and false. The initial
table of relations is shown in Table 4.13. llere r0 is the initial relation between
the empty list and any list, and r1 is the relation established by cons between its
input and output lists. Table 4.14 shows the final table of relations.
The cons vertices establish initially that 1 is that car of c and 0 is the car of d,
and that the length of d is the successor of the length of inil (that is, the length of
d is 1). Then because the integer relation O?O+<1?1 is present (though not shown
in the table) the list property constraint for c and d yields r?, which says (among
other things) that c and d have different cars.
The functional constraint for the append that computes x constrains the rela-
tions on (c, d), (c, x), (d, x), (c, inil), (d, inil) and (x, inil). The constraint set for
append reflects the fact that if the length of d is 1 and if c and d have different
cars, c cannot be a suffix of (append c d). (This is where that tricky proof is
hiding it's implicit in the constraint set for append.) So when this constraint is
computed, we get the relation c<1 between c and x shown in the table. It says
113
inil			a			b			c			d			x			?			true			false			z
inil			=			r0			ro			To			ri			r0			r0
a rev(r0) = T r1 T T T
b rev(r0) T =TTTT
c rev(r0) rev(ri) T = T T T
d			rev?1)			T			T			T			=			T			T
x			rev(r0)			T			T			T			T			=			T
?			rev(r0)			T			T			T			T			T			--H
true			tt			tf			tt:',ttl
false			ft			ff			tff,ftl
z			fft,ttj			?t:',ff1			=
To			=			fsc=o,sc<2+,sc<1,s<2+,s<11
Ti			=			fsc<1,s<1?
Table 4.13: Initial relations for the tricky list example
that c is one shorter that x, that they have the same car, but that c is not a suffix
ofx.
On the other hand, the append that computes y constrains its tuple of relations
to assert that c is a suffix of ? (relation r0). Since c is a suffix of ? and not a suffix
of x, we conclude by transitivity that x and y cannot be equal; and the functional
constraint on the equality test settles the value of z.
4.2.5 Comparison with other methods
Methods of greater power
One way to reason about programs is to use a more powerful inference system:
perhaps a Prolog-style backward-chaining system, or something that finds simple
inductive proofs like Boyer and Moore's early system for proving theorems about
Lisp programs [BM75J; maybe even a powerful reasoning system like Nuprl [C+86J.
My early experiments with a prototype of Thinner used a simple backward-
114
inil			a			b			c			d			x			y
inil			r2			sc<1			s<2+			r3
a			r8			r6
b			r8
c			rev(r2) rev(r1)			r4			c<1			ro
d			sc>1			rev(r8)			rev(r8)			rev(r4)			r2			r9
x			s>2+			rev(r5)			c>1			rev(r2)			r10
y			rev(r3)			rev(r6)			rev(r7)			rev(r0)			rev(r9)			rev(r10)
r2			=
r3			=
r4			=
r6			=
r7			=
r8			=
rio =
true			false			z
true
false			ff
z			ft			ff			?f
r0 --H fsc=O, sc<2+, sc<11
r0 --H ?sc=O, sc<11
f=o, s>2+, s>1, >2+, >1J
tsc<2+, s<2+, c<2+, <2+1
r0 --H tsc=Ol
r6 U fc<2+, c<1J
r4 u fsc=O, sc<1, sc>2+, sc>1, c>2+, c>1J
r? U f=o, c<1, <1, s<11
?c>1, c=O, =0, c<2+, c<1, <2+, <11
Table 4,14: Final relations for the tricky list example
115
chaining inference engine. My choice of the vocabulary of relations used by the
current Thinner originated in post-mortem examinations of the most useful proofs
discovered by that prototype. Looking over these proofs, I made several other
observations:
o+ A trace graph suggests natural paths of inference that a general purpose
inference engine ignores. For example, if you've just concluded that a = 0,
the hext natural step is to reason about uses of a in the graph.
o+ Thinner needs to examine programs and discover opportunities for thinning
and this means forward, not backward, chaining. What's more, Thinner
doesn't need proofs, it only needs conclusions.
o+ The number of objects in a trace graph is fixed and rather small, and it is
these objects about which Thinner needs to reason. There is no real need for
unification.
These observations led me away from general-purpose inference techniques which,
although powerful, are too costly (that is, slow) for the inference task Thinner
faces.
Methods of greater specificity
The literature contains quite a few techniques for dealing with the integer vari-
ables of imperative programs. Constant propagation is a time-honored technique
[WZ9i]; it treats program variables independently, without considering the rela-
tions between them. Cousot and Cousot extended this kind of analysis to constant
intervals [CC76]
Relations between integer variables have also been studied: the method of
Karr [Kar76] develops a system of affine equations on a program's variables, and
116
the linear restraint analysis of Cousot and Halbwachs [CH78j develops a system of
affine inequalities. A strong motivation in this area has been the compiler-writer's
desire to vectorize loops automatically, which requires congruence analysis. In
this vein are Granger's congruence analysis, which yields assertions of the form
x c[m] [Gra89] and, later, analysis of congruence equalities, which yields a
system of congruence equations [Gra91]
These techniques are too specialized for use in Thinner, which needs to reason
about other data types as well. Many are also too expensive: congruence analysis
is O(pn6 log2 n) where p is the program size and n the number of integer variables,
while linear restraint analysis is exponential.
4.2.6 Abstract interpretation
Many of the inference techniques discussed above appear in the literature as ab-
stract interpretations. Abstract interpretation is a framework for the formal treat-
ment of program analysis techniques, first advanced by Cousot and Cousot [CC77,
CC79]; see Abramsky and Hankin [AH87] for a modern introduction. Techniques
for the analysis of functional programs that have been expressed in this framework
include strictness analysis for the optimization of lazy languages [11ug87,Wad87]
and update-in-place analysis for the optimization of storage reuse [Hud87]. Not
wanting to appear behind the times, I have sketched out a treatment of relational
constraint in this same same style. In fact, I have sketched a treatment in two
styles: the original Cousot and Cousot treatment using adjoined functions, and a
more recent style involving semantic schemata, due to Mycroft.
117
Abstract interpretation with adjoined functions
First, the original Cousot and Cousot treatment using adjoined functions. My
terminology and notation follow theirs [CC79]
Let E(G) denote the set of executions of graph G, with respect to any mapping
? from nonterminals to functions. An assertion about a given trace graph G is
a total function mapping executions of G to ftrue, false?. Let Ac be the set of
assertions about graph G:
: E(G) (true, false?
Ac ordered by implication forms a complete boolean lattice.
Recall that in Chapter 3 we discussed a function Lang that maps any trace
graph G and trace grammar Q into the set of all executed trace graphs derivable
from G in Q. That is, Lang(G, Q) is a set of pairs (G', f') where f' is all execution
of G' and C G'. We can use this same function to identify, for each graph G
and grammar Q, the particular assertion giving a standard semantics: the assertion
CG,Q, which is true of exactly those executions f of G that match the execution f'
for some (G', f') ? Lang(G, Q). That is: if mG?GI is the function mapping each
object of G to the corresponding object of G',
CG,Q = Af. N(G', f') ? Lang(G, Q) . f = 0
This is a standard interpretation of trace graphs and grammars.
Now for the abstract interpretation: let Ac be the set of tables of expressible
relations on pairs of objects in G, as defined in Section 4.2.1. This is a complete
lattice under the partial order R E S =--H V(i,j).R(i,j) E S(i,j), using component-
wise conjunction and disjunction, the all-T table as the top element, and the all-I
table as the bottom element. Elements of Ac represent approximate assertions.
118
The function ?? : A0 H A0, defined as follows, gives meaning to each element of
A0.
= Af.[V(i,j) .
It should be clear that ?o is not one-to-one: any table R with a bottom element
specifies the empty assertion, and any table R that can be strengthened by tran-
sitivity specifies the same assertion before and after strengthening. (This is where
the transitivity and symmetry computations of relational constraint enter the pic-
ture: they find the pairwise-strongest 1?' c A0 with ?0(R') = ?0(R).) The range
of ?o is a set of assertions ?o ? A0: these are the approximate assertions.
Theorem 4.6 For all assertions p ? A0 the set ?q c A0 p ? g? has a least
element.
Proof: A0 contains the supremum of A0. It is also closed under conjunction: the
assertion given by the table R and the assertion given by the table S are both
satisfied if and only if the assertion given by R A S is satisfied remember that for
any two expressible relations their conjunction is also an expressible relation. It
follows that A0 is a Moore family and any p ? A0 has a least upper approximation
in A0. ?
We can therefore define an approximation operator p? : A0 A0 as
= vfq ? A0 p ?
To get a representation for the least upper approximation of an assertion p we need
to pick one of the representations R such that ?0(R) = po(p). We can just choose
the conjunction over all such 1?, and so define a function a0 : A0 A0 as:
ao(a) = AfR ?0(R) = po(p)1
119
Now (??, ?c') is a pair of adjoined functions in the sense of Cousot and Cousot.
Cousot and Cousot translated the operators of their language into predicate
transformers. In contrast, we have treated TL operators as predicate constraints:
each operator in a graph G yields a constraint (P, 5) giving a tuple of pairs P and
a description of the space of tuples of relations 5 that must be satisfied by any
table of relations R ? A0 consistent with the standard semantics that is, any R
for which CG?Q ?
Abstract interpretation with schematic semantics
A different approach to abstract interpretation is advanced by Mycroft [Myc87].
In this approach, a schematic denotational semantics is presented for the language.
One way of completing the semantics yields a standard interpretation, while an-
other way yields an abstract interpretation.
In Chapter 3 we developed a schematic denotational semantics for TL. Recall
that we constructed a standard semantics from this schematic semantics by defining
the semantic function T: Ter C (so that the meaning of each terminal function
identifier was the relation normally associated with the corresponding function on
values), and by defining the auxiliary function Lang: Z ?f(N) ?(X) (so
that it generated the set of fully terminal trace graphs from a trace grammar in
the usual way). We will now construct an abstract semantics by defining these two
functions in a more relaxed way. To distinguish the standard semantic functions
from the abstract ones, we will subscript the former with an 5 and the latter with
an A
First, a definition for Lang?: let Lang?(&, Q) be f(G, f) f is an execution of
Gi, again allowing executions with respect to any mapping ? from nonterminals
to functions. Note that this ignores the grammar Q: it specifies an immediate
120
semantics for trace graph &, whether or not G is fully terminal.
Next, a definition for T?: to make this definition we will use two auxiliary
semantic functions XA and YA. The first maps a terminal function and a tuple of
values to a tuple of pairs of values, and the second maps a terminal function to
a set of tuples of primitive binary relations (recall that xt is the set of primitive
relations for each type t).
: T H V* H (V x V)*
YA : T			?((xint + Xbool +
These functions are derived from the constraints (P, S) described previously.
?A ?T? is the function that organizes a tuple of values (giving the inputs and outputs
of the primitive function T) into pairs corresponding to the pairs of objects P in a
constraint (P, 5) for function T. YA[T? is the space of tuples of primitive relations
described by the constraint set 5 in a constraint (P, 5) for function T.
Now the semantic function TA can be defined as the set of all tuples of values
that satisfy the constraint defined by XA and Y?:
TAt[T? = fv ? V*I V A(XA?T?v)i,ori(X??T?v)i,iY
rEY?ITI
If the constraints described by X and Y are constructed correctly we will have
Ts iT? C TA iT? for all terminal functions T.
Because T5iT? C 7??T? for all terminal functions T, and because nonterminal
functions are unconstrained, it follows that we have an abstract semantics in the
sense that if (G, f) ? Qs?P?E then there is a corresponding (G', f') ? QAIP?E.
4.2.7 Discussion
The relational constraint method should be easy to parallelize. Since the order in
which constraints are computed is largely irrelevant, there is no reason why they
121
cannot be computed in parallel. Updates to the table of relations are also unusually
order-independent since relations grow stronger monotonically.
Relational constraint need not be the only method of inference used in a given
system. Thinner's semantic analyzer also uses the normal forms of expressions
to gain information about a program (although this was disabled for the exam-
ples discussed above). Even a more powerful inference system--Hsay, one with a
method for induction might well seed its initial collection of assertions with those
discovered efficiently by the method of relational constraint.
4.3 Normal forms
Thinner 5 semantic analyzer gains additional information by computing and com-
paring normal forms of expressions. Whenever the normal expressions for two
objects are identical, the objects must be equal. As this is a well-known technique
I will not labor it here. For completeness, however, this section outlines the normal
forms used by Thinner for each data type.
Inputs to a trace graph and outputs from nonterminal functions within it are
always among the variables that may occur in normal expressions, except that if
two such objects are known to be equal at the time the normal expressions are
computed, one variable will be used for both. Additional variables are created
for any object whose normal expression could not be computed in terms of ex-
isting variables; these new variables may occur in the normal expressions of the
descendents of that object in the graph.
The semantic analyzer enters the normal expression for each object into a hash
table. Whenever it discovers two identical normal expressions it asserts the rela-
tion of equality between the two corresponding objects. This assertion is just a
122
strengthening of an entry in the table of relations built by RelationalConstraint. (It
may trigger additional constraint computations.) Information other than equality
might also be extracted from the normal expressions; for example, a variety of re-
lations between two bitvectors could be gleaned from an inspection of their normal
expressions. But the semantic analyzer does not attempt this.
4.3.1 Integers
The normal form for integer expressions is a polynomial, represented in recursive
form. The variables of these polynomials are the integer objects for which no
normal expression is known: the integer inputs to a trace graph, the integer outputs
of abs and length vertices, and the integer outputs of nonterminal vertices. The
semantic analyzer computes normal expressions for the integer results of addition,
subtraction, multiplication and negation.
4.3.2 Bitvectors
The normal form for bitvectors is a sorted list of variables xi through X?, followed
by a list of 2? bitvector constants. Each constant determines one disjunct in an
expression for the object. Let i be an integer from 0 to 2? --H 1 and suppose the n
bits .... bn give i's binary representation. Then the i'th constant Cj in the normal
form of an object contributes a term of the form
(ci A ii A ... A in)
where ii = Xj if bj = 0, and ii = Xj if bj = 1. The disjunction of all 2? such terms
is an expression for the object. Normal expressions are simplified to eliminate
unnecessary variables: that is, no variable in a normal expression has the property
that the coefficient of each term is the same as the coefficient of that term with
123
the variable complemented. If there are no variables in the list, there is 20 = 1
coefficient: this is the representation for bitvector constants. (The mathematical
properties of this form are discussed by Lausch and No?bauer [LN73]. They prove
that every boolean function has exactly one representation in this form.)
The size of a normal expression is exponential in the number of variables, so
the semantic analyzer computes only the shorter ones. It places a limit on the
number of variables (currently this limit is 7, which limits the size of a normal
expression to about 512 bytes). It proceeds in a breadth-first fashion: when it
has canonical forms for the inputs to a bitvector primitive, it determines the total
number of distinct variables in the inputs. If that number is greater than the
limit, it establishes a new variable for the output of that primitive. Otherwise,
it computes a normal expression for the output of the primitive using the normal
expressions of the inputs, then simplifies the result to eliminate any unnecessary
variables.
Given an object a with a normal expression, it is easy to compute a normal
expression for the complement of a: just complement each constant. Given two
objects a and b with normal expressions over the same variables in the same order, it
is easy to compute a normal expression for the and or or of a and b: simply logical-
and or logical-or the corresponding constants together. When a and b have normal
expressions over different sets of variables, it is only slightly more complicated to
choose the right pair of constants to logical-and or logical-or for each constant in
the result.
4.3.3 Booleans
The normal form for booleans is the same as the normal form for bitvectors, except
that the 2? constants for each expression are boolean constants and not bitvector
124
constants. Since this can reduce the space consumed by normal forms by a factor
of 2?, it is possible to allow more variables.
Many comparison primitives for example, integer equality take nonboolean
inputs and produce a boolean output. That boolean output is treated as a variable
in the normal expressions of its descendants.
4.3.4 Lists
In considering the normal form for lists, it is important to remember that TL lists
are not the same as Lisp lists. TL only allows the three flat list types: (listof
integer), (listof boolean), and (listof bitvector).
The list normal form specifies a list type and a list of terms. Each term is
either a variable, a base-type constant, or a car or cdr operation applied to another
list normal form. A list of these terms represents the catenation of the elements
contributed by each one; this catenation may have occurred in the program as a
cons or append, but the distinction is irrelevant since lists are strongly typed in a
way that does not allow lists of lists. Each term is simply a contributor of zero or
more list elements. Thus, (append (cons b inil) n) and (cons b n) have the
same normal form: the type (listof integer) followed by a list of terms giving
a term for b followed by a term for n.
Any base-type constant or base-type variable is a "solid", known to contribute
exactly one element to the expression in which it occurs. In TL the car of a list is
always an atom of the base type for that list: the car of an empty list of integers, for
example, is zero. So car expressions are also solid. In contrast, a cdr expression or a
variable of list type may contribute any number of elements including 0. Knowing
whether a term is solid lets us simplify list expressions in several ways. When the
first term in a form is solid, the car of that expression is simply that first term
125
and the cdr of that expression is the remainder of the list of terms. For example,
(car (cons (car 1) m)) and (car 1) have the same normal forms, as do (cdr
(append (cons a inil) n)) and n.
If the first term in an expression is not solid, we may still be able to sim-
plify when taking the car of that expression: in effect, we can truncate the ex-
pression after the first solid. For example, (car (append n (cons b m))) and
(car (append n (cons b inil))) have the samenormalforms, becausethe car
must come either from n or, if n is empty, from b. This truncation can be a bit
tricky since each cdr expression discards one solid: (car (cdr (append n (cons
b rn))))mayreallyneedrn,but(car (cdr (append n (cons b (cons cm)))))
will not.
In short, the normal form for list expressions handles the append/cons iden-
tities, resolves cars and cdrs when possible, and truncates car expressions. The
semantic analyzer never resorts to creating a new variable during the construction
of normal expressions for lists. This does not mean, however, that normal list
expressions are more accurate that normal integer expressions or normal bitvector
expressions. Two list expressions which always evaluate to the same list may have
different normal expressions. For example, (cdr (cdr (append n n))) has the
normal form cdr(cdr(n n)) and (cdr (append (cdr n) n)) has the normal form
cdr(cdr(n) n), but they always evaluate to the same list for any assignment to n.
Chapter 5
Repairing Violations
5.1 Overview
The methods developed in the previous chapter can be used to find conservative
thinnings. A trace graph with a conservative thinning serves as a thinning example:
an example of wasted computation that programs ought to avoid. Next, we need
a method for "learning the lesson" of a thinning example. That is, We need a way
to modify a trace grammar with respect to a thinning example, to make sure that
no graph derivable in the grammar makes the mistake illustrated by the example.
I call this the grammar thinning problem [Web92cJ.
That there is a correspondence between trace grammars and CFG's has already
been noted. My original plan of attack for the grammar thinning problem was to
make use of that correspondence. I planned to identify the relevant CFG thinning
problem, solve it, and then find the corresponding solution to the trace grammar
thinning problem. I imagined that the CFG thinning problem would be relatively
easy to solve, and that the difficult part would be lifting any solution to the realm
of trace grammars. This turned out to be incorrect: the grammar thinning problem
126
127
for CFG's is quite a rich problem and is still open, but the approximate solution
detailed in this chapter applies to trace grammars with little modification. It seems
that much of the complexity of the trace grammar thinning problem is captured
by the CFG thinning problem.
In the CFG thinning problem, a thinning example is a string ? with one or
more symbols marked. To thin a language with respect to this ? is to edit each
word to reflect the observation that the marked symbols of 6 are unnecessary and
should be omitted. When 6 is a fully terminal string we'll define a language Ls(G)
formed by carrying out a thinning procedure on each word in L(G). For a large
class of thinning examples the class for which the language of strings correctly
marked for thinning is regular we will demonstrate that L6(G) is a context-free
language and that there is an effective procedure for generating a grammar H for
which L(H) = L6(G). This procedure is developed in Section 5.2.
But it is only a special-case solution. There are fully terminal thinning examples
that don't satisfy the condition, and, moreover, thinning examples may include
nonterminals. We'll extend our definition of L?(G) to allow nonterminals in 6,
by specifying a thinning procedure to be performed on each parse tree generated
by G, rather than on each string in L(G). This yields the general CFG thinning
problem: for any grammar G and thinning example 6, find a grammar H for which
L(H) = L6(G). It is an open problem; it is not even known whether Ls(G) is
necessarily context free.
Our ultimate motivation is the optimization of trace grammars, so we don't need
to hold out for an exact solution. An approximate one, one that eliminates some
but not necessarily all instances of the thinning example, will do. One approximate
solution follows from defining a language LA(C) formed by thinning the parse trees
of G with respect to a set of thinning-tree examples instead of a single thinning-
128
string example. We will develop an effective procedure for generating a grammar
H for which L(H) = LA(G). By choosing an appropriate thinning set ?, we can
use this method to get an approximate solution to the general problem.
Unfortunately, this approximate solution isn't strong enough for some of the
optimization problems Thinner should be able to solve. In Section 5.5 we will
examine a more powerful method of constructing an approximate solution called
the method of kernels. This method is the one actually implemented in Thinner.
Section 5.6 shows an example of how the method of kernels can be applied to trace
grammars. (The details of this application will be covered in the next chapter,
which treats the implementation of Thinner.)
5.2 Fully terminal CFG thinning
In the context-free grammar version of the thinning problem we're given a CFG,
and a thinning example which is a string ? with one or more marked characters. The
example says, in effect, "The marked symbols in this sequence are unnecessary,"
and the problem is to modify the CFG to incorporate the lesson of this example.
What does this mean precisely? When ? is a fully terminal string the problem is
fairly easy to state rigorously, and we'll deal with this case first.
Suppose the thinning example ? contains terminal symbols only. We define a
function ThinString for editing a terminal string x using a thinning example 6 as
follows.
function ThinString(x, 6);
x is a string of terminals;
6 is a string with at least one marked symbol;
ThinString(x, 6) is a string of terminals (a subsequence of x);
129
begin
while there is an instance1 of 6 in x
begin
find the lefimost instance of 6 in x;
delete from x those symbols matched to marked symbols in 6
end;
return x
end;
For example, if x = aabc and 6 = ab (where underlining indicates a marked charac-
ter), ThinString deletes symbols from x to end up with the string bc: first it finds
the instance of ab in aabc and deletes that a to get abc; then it finds the instance
of ab in abc and deletes that a to get bc. The reader may wish to try an example or
two. Try thinning the string xyxxyxxyxy using 6 = ?xyxy. (Remember to thin the
leftmost instance of 6 each time.) Or try thinning abxaabxabbxaabxabbxab using
6 = abxab. In both examples the thinning step is repeated five times.
Now let G be a context-free grammar and consider the language L6(G) obtained
by thinning each word in L(C) using the string 6. That is,
L6(G) =			w = ThinString(v, 6) for some v ?
This is like learning the lesson of 6 ex post facto: first make all the mistakes of
L(C), then go back and correct them. What we want, of course, is a new CFG H
for which L(H) = L6(G). But this is jumping ahead a bit. There are really two
questions here:
1. Is L?(G) a context-free language for any CFG G and any thinning example
6?
1By "instance" I mean substring or, as some authors have it, factor.
130
2. Can we give all algoritlim that takes a CFG G and a thinning example ? and
produces a new CFG H for which L(H) = L6(G)?
The answer to these questions is unknown. However, there is a special case
(actually, a common case) for which both questions can be answered affirmatively.
This case allows a general CFG G but restricts the thinning example 6 as described
below.
Define a language M(6) of strings "correctly marked for thinning" as follows:
a string z is in M(6) if and only if exactly those characters deleted when z is
editing by 6 are marked in z. So M(?ab) includes abc and aabc but not abc (because
a character deleted during editing by ab is not marked) and not ac (because a
marked character is not deleted). As far as I can tell, the language M(6) is context
free for all 6, but this is unproven.
Conjecture 5.1 For any 6, M(6) is a context-free language.
It often happens that M(6) is not only context free but regular. This is the
kind of 6 for which I have a solution to the CFG thinning problem.
Theorem 5.2 If M(6) is a regular language then L?(G) is context free for any
context-free grammar G.
Proof: Given a pushdown automaton P accepting L(G) by final state, and given
a finite-state automaton F accepting M(6), construct a pushdown automaton P6
as follows. P6 can augment its input string nondeterministically: whenever it is
about to scan an input symbol, it may decide to leave the real next input symbol
for later and supply any marked input symbol instead. Using this augmented input
string, P6 simulates both P (ignoring the difference between marked and unmarked
symbols) and F (respecting the difference between marked and unmarked symbols).
The accepting states of P? are those in which both P and F are in accepting states.
131
All input string w is accepted by P? if and only if there is some way to augment
w with marked symbols yielding a string w' which is in L(G) (ignoring marking) and
in M(?) (respecting marking). This is true if and only if there is some ? L(G)
that yields w when thinned by 6. So P? accepts w if and only if w ? Ls(G). ?
5.2.1 Thinning by FSM
Of course, we don't just want to know that the thinned language is context free; we
want to be able to construct a grammar H for which L(H) = L6(G). (Remember,
our motivation involves the optimization of computer programs; a proof that a
thinned program exists may not be that useful.) There is always the roundabout
method suggested by the proof: construct a pushdown automaton P for G, use it
and the finite-state automaton F for M(6) to construct P6 as described above, and
then use the classical construction [HU79] to make a grammar H that generates the
language accepted by P6. The resulting grammar would be complicated, though,
and it would bear little intuitive relation to the original one. Fortunately there is
a more direct way to construct H, as suggested by the following alternative proof
of Theorem 5.2.
Alternate Proof of Theorem 5.2: Suppose G is a CFG and 6 is a thinning
example for which M(6) is a regular language. If L is a language without any
marked symbols, define S(L) to be the same language but allowing all patterns
of marked and unmarked symbols. That is, S(L) is a substitution applied to L,
which maps each terminal symbol x to either x or x. Clearly S(L(G)) is context
free. Since M(6) is regular, the intersection of 5(L(G)) and M(6) is context free.
This is the language L(G) but with each string correctly marked for thinning; so
132
by substituting the empty string for each marked symbol, we arrive at L6(G) and
conclude that it is context free. ?
This proof suggests a method for transforming grammars--Ha method I have
implemented. The tricky part is the construction of a CFG for the intersection of
two languages, one a context-free language given by a CFG and the other a regular
language given by an FSM. Salomaa [Sal73J gives a treatment of this problem. The
general idea is to specialize each nonterminal X in the grammar into a variety of
nonterminals Xi,j, where ? and j are states of the FSM for M(?). Xi,j derives a
string w if and only if X derives w and w takes the FSM from state ? to state
j. In practice, many of the Xi,j'5 are unnecessary: they either derive no fully
terminal string or are not used in any derivation starting from Sa,b (where S is the
start symbol, s is the start state and b is an accepting state). My implementation
avoids generating these unnecessary nonterminals. It also simplifies the resulting
grammar a bit, getting rid of self-productions, duplicate nonterminals (i.e. those
with identical productions), and nonterminals with single productions. Figure 5.1
shows a grammar thinning performed by this implementation.
Since we have a solution to the fully terminal thinning problem when the thin-
ning example (5 yields a regular M((5), it makes sense to ask whether there is a more
direct characterization: for what kinds of (5 is M((5) a regular language? Figure 5.2
shows a selection of thinning examples (5, categorized according to whether M((5)
is regular. Note that M((5) is regular for many thinning examples: it's regular
whenever (5has exactly one marked symbol, and it's regular even for some fairly
tricky (5's like (5= xyxy and (5= abxab (which were used above in an exercise for
the reader).
All thinning examples of which I am aware fit a simple pattern:
133
5 Al?
A H aA a c
B H bl? b
Initial grammar G
5 --H4 So,o I So,i
?o,o A0,0B0,0
5o,i A0,1l?1,1
A0,0 aA1,0 c
A0,1 H aA1,1 a
A1,0 aA1,0 c
A1,1 H aA1,1 a
l?0,0 H bl?0,0 I b
l?1,1 H l?1,1
a
c
FSM for M(?), 6 ab. Start state
is 0 and both states accept.
5 Ax,0l?0,0 I Ax,1
Ax,0 H aAx,o I c
Ax,i H aAx,? a
l?0,0 H bl?0,0 I b
Final thinned grammar
Thinned grammar, before
simplification
Figure 5.1: Construction of a grammar for L?(G)
M(&) regular M(6) not regular
ab			__
?yxy			x?xzI)y
aaa
abxab
Figure 5.2: Some thinning examples classified
134
Conjecture 5.3 If M(6) is nonregular then thinning 6 yields some string P for
which there are nonempty strings ? and ? such that 6 = QP? (and so ????? thins
toP).
The implication doesn't work the other way, as shown by the case 6 = aaa. 1
believe that a strengthened form of this condition is both necessary and sufficient.
5.3 General CFG thinning
The trouble with the terminal solution presented above is that it won't yield a
particularly general optimization method for programs. Consider the exponential-
time Fibonacci function, or loop invariant removal, or loop jamming: these are
transformations in which the thinning example will definitely contain function calls
(that is, nonterminal vertices). We could extend the terminal solution to make it
treat some nonterminals as terminals, but this would only work if the thinned
nonterminals never participate in an instance of 6. The Fibonacci example doesn't
appear to fit this case: the thinned nonterminal is a recursive call. We need a
different approach.
Suppose 6 contains nonterminals. How can we edit a language to reflect the
lesson of this 6? It seems natural to extend the original string thinning function
to parse trees. Define an instance of 6 in a tree T to be a string of nodes of T that
matches 6 and occurs on some search frontier in T
function ThinTree(T, 6);
T is a parse tree;
6 is a string with at least one marked symbol;
ThinTree(T, 6) is a tree, a pruned version of T;
begin
135
while there is an instance of ? in T
begin
find the lexically first instance of ? in T;
delete from T those subtrees matched to marked symbols in S
end;
return T
end;
This is an extension of ThinString in the sense that if ? is fully terminal, it
makes no difference whether you thin the fringe of a parse tree using ThinString,
or thin the parse tree using ThinTree and then take the fringe: the result is the
same. Proceeding as before, we can define the language L6(G) to be the language
obtained by editing each parse tree generated by G using the string ?, then reading
off the remaining terminal string. That is,
= I w is the fringe of ThinTree(T, 6) for some T generated by U?
Again, when 6 is fully terminal, this is the same as the previous definition of
L6(G). But we now have a meaningful definition even when 6 contains nonterminal
symbols.
It may be a meaningful definition, but it seems to lead to an intractable prob-
lem! We made progress in the fully terminal case by insisting that M(6) be regular;
in the nonterminal case, since we're not editing strings, the question of whether
the string language M(6) is regular seems irrelevant.
136
5.4 The thinning-set method
To make progress on this problem, we'll define a weaker thinning procedure: one
that can be used to approximate ThinTree but is computationally less demanding.
procedure SetThinTree(T, A);
T is a parse tree which is edited in place;
A is an ordered set of trees, each with at least one marked node;
begin
while some member of A is a prefix of T
begin
Q := the first member of A which is a prefix of T;
delete from T those subtrees matched to marked nodes in Q
end
end;
function WeakThinTree(T, A);
T is a parse tree;
A is an ordered set of trees, each with at least one marked node;
WeakThinTree(T, A) is a tree, a pruned version of T;
begin
for each subtree Q of T in a topological order
SetThinTree(Q, A);
return T
end;
Using this procedure, we can define the thinned language LA(C) as before:
LA(C) = fw w is the fringe of WeakThinTree(T, A) for some tree T in G?
137
WeakThinTree is certainly weaker than ThinTree. In particular, it is no longer
equivalent to ThinString in the fully terminal case. This is because any partic-
ular thinning set A has a fixed maximum tree height; it is easy to construct a
grammar in which the distance to the common ancestor of some instance of ? is
unbounded. However, there's one good thing about weak tree thinning: we can
solve the grammar thinning problem with respect to this procedure.
Theorem 5.4 LA(G) is context free for any thinning set A and any CFG G
Proof: by construction of a CFG H for which L(H) = LA(G). We begin by
showing that the following procedure carries out weak tree thinning and derivation
simultaneously; then we'll construct a CFG that mimics the new procedure.
Define a A-deep tree as one in which any nonterminal which is not expanded
has distance exactly n from the root, where n is the maximum height of any tree
in A. (Note that a A-deep tree may have paths of length ? n as long as they end
at terminals.)
procedure ExtendThinning(R, C, A);
R is a parse tree which is edited in place;
G is a context-free grammar;
A is an ordered set of trees, each with at least one marked node;
begin
1 if R is less than A-deep then make it A-deep by extending it using grammar G;
2 if some member of A is a prefix of R
3			then
4			begin
5 Q := the first member of A which is a prefix of R;
6 delete from R those subtrees matched to marked nodes in Q;
138
7 ExtendThinning(R, G, A)
8 end
9 else
10 for each nonterminal child Q of R in order
11 ExtendThinning(Q, G, A)
end;
The idea is to apply this procedure using the start symbol of G as the tree to
extend: ExtendThinning(S, G, A). It isn't a fully specified procedure, since it
doesn't say how R should be extended using grammar G at line 1. The idea is
that ExtendThinning can generate any tree grammar G can generate, but thins
the tree as it generates it. The recursive call at line 7 thins the developing tree
iteratively, just as WeakThinTree does when it calls SetThinTree. The recursive
call at line 11 visits each subtree in a topological order, just as WeakThinTree does.
(As long as it's topological, the actual order is irrelevant, since the subtrees are
thinned independently.) The only thing different about ExtendThinning is that it
derives just enough of each subtree to decide how to thin it, leaving the rest until
the descendant subtrees are investigated. So the language of trees generated by
ExtendThinning(S, G, A) is the same as the language of trees generated from C in
the normal way and then thinned by WeakThinTree using the thinning set A.
Observe that ExtendThinning(S, G, A) never results in an ExtendThinning call
with a tree that is more than A-deep. Also, the maximum degree in each tree
is bounded by the maximum length of the right-hand side of any production in
G. It follows that even though ExtendThinning may develop an arbitrarily large
thinned parse tree, the portion of it that it examines at any stage is bounded, fixed
by C and A. Thus there are only finitely many different parameters R on which
139
ExtendThinning will call itself.
Now we construct a CFG H that mimics the behavior of ExtendThinning. The
terminal alphabet of H will be the same as the terminal alphabet of G, but the
nonterminal alphabet of H will consist of a start symbol Zs, where S is the start
symbol of G, and various other symbols denoted ZR, where R is one of the finitely
many different tree parameters on which ExtendThinning will call itself. For each
nonterminal symbol ZR in H the productions are determined as follows. Let X be
the set of A-deep expansions of R using grammar &; this will be just ?RJ if R is
already A-deep. (X corresponds to the set of possible extensions generated at line
1.) For each S ? X with a prefix in A there is a production ZR ZQ, where Q
is the result of thinning S using the first prefix of S in A. (Q corresponds to the
thinning computed at lines 5-7.) For each S ? X with no prefix in A, let Q? be the
ith son of the root of s; there is a production ZR H ?1.. P?, where Pj is either
Q? if Q? is a terminal node, or ZQj if Q? is not a terminal node. (This corresponds
to recursively visiting each nonterminal child at lines 10 and 11.)
By construction, there is a one-to-one correspondence between derivations in H
and traces of ExtendThinning(S, G, A). Since we know ExtendThinning generates
LA(G), we can conclude that L(H) = LA(G). ?
This proof suggests a method for transforming grammars--Ha method I have
implemented. Given a thinning example ? and a grammar G, I generate a thinning
set A of all the derivations in G that generate ? of height less than or equal to
some small constant d. Then I generate the grammar H using a variation of the
method above. The important thing about the method above is that it constructs
nonterminals which stand for latent parse trees, thus holding pieces of the parse
tree together until it is known how to thin them. My implementation uses this same
140
technique, but does not necessarily generate A-deep trees, as ExtendThinning does
at line 1; instead, it extends a tree only until the next thinning step is determined.
Figure 5.3 shows a grammar thinning as generated by this implementation.
5.5 The method of kernels
Later we will see that the example of Figure 5.3 is a CFG analog of the Fibonacci
reformulation for trace grammars. Since the thinning-set method solves this exam-
ple, it is tempting to conclude that it is a sufficiently powerful method for Thinner
(assuming that it can be lifted to the trace grammar thinning problem). But a
closer examination shows that this is incorrect: the thinning-set method has a
critical weakness.
5.5.1 A weakness of the thinning-set method
That weakness lies in the fixed maximum height of trees in the thinning set. Con-
sider the problem of thinning the same grammar S H SbSc a with respect to
a different thinning example, ? = bSb. Figure 5.4 shows that a parse tree in this
grammar may require thinnings that span arbitrarily tall subtrees: the distance to
the common ancestor of the nodes involved in a thinning is unbounded, Any fixed
thinning set will repair only a few instances of the thinning example, leaving all
"taller" instances uncorrected.
This example, as it happens, is a CFG analog of loop invariant removal for
trace grammars: the distance in a derivation tree between the first instance of the
loop-invariant expression and some repetition of it is unbounded. It's clear the
thinning-set method is inadequate: we need, if not an exact solution, then at least
an approximate solution that avoids the fixed-height limitation.
141
5 H SbSc a
Initial grammar G
5			5
5
b			5
c			5
5			b			5			c			5			b			5
b
5			c
5			b			5
c			_			c
Thinning set ?: all trees of depth d < 2
that derive ? = cbS
5
5			b			5			c
Tree T used in thinned grammar
5			ZT a
ZT H ZTbc I ab5c
Thinned grammar
Figure 5.3: Construction of a grammar for LA(G)
142
s
S			b			S
S H SbSc a
= bSb
c
S			b			g
c
S			b			s			c
Figure 5.4: A problem that defeats the thinning-set method
143
(It is important to point out that although this problem escapes the thinning-
set method, it is still solvable; as for all examples of which I am aware, one can
manually construct a grammar H for which L(H) = L6(G). In this case we need
only observe that when the production S SbSc is used in a parse tree, the
possible terminal fringes left below the second S are exactly the regular language
ac*. So H could have productions S SbAcIa and A H Ada.)
5.5.2 Outline of the method of kernels
The purpose of the nonterminals ZR in the thinning-set construction is to hold
together a piece of the derivation until it becomes clear whether or not that piece
yields an instance of the thinning example. The next method will use the same
general idea; but instead of creating new nonterminals corresponding to partial
parse trees, we'll create new nonterminals corresponding to strings of symbols
called kernels.
Consider the thinning example 6 = bSb: how can this arise as a substring of
some sentential form in the grammar ? H SbSc a? It can arise only from a
sentential form ?bSP (for any strings ? and P) by the expansion of the 5 using
production 5 H SbSc. The string bS is therefore a kernel of 6 in the grammar:
it's a minimal string of symbols that expands into a string containing 6. Now, how
can bS arise? It can only arise from the nonterminal 5, when that nonterminal is
expanded using the production 5 H SbSc.2
More formally, we define a kernel as follows:
Definition 5.1 A kernel of a thinning example 6 in a grammar G is a string
? which is not a single nonterminal and not 6, and which has the property that
2This is not quite true: it can also arise from itself. But since we're only interested in
enumerating the distinct kernels of 6 in & this is irrelevant.
144
?G* ?6? for some strings c and P, in such a way that every symbol of ? derives
a part of 6.
If we remove &productions from the grammar, we can guarantee that for every
kernel ?, j? < 161 (and it follows that there are linitely many kernels).
Once a kernel arises in a derivation, we're sunk: the nonterminal symbols in a
kernel expand independently, and depending on these expansions we may or may
not end up with an instance of 6. Our plan will be to attempt to prevent the
occurrence of kernels by modifying the grammar so that each kernel is replaced
by a new nonterminal. The following procedures, ThinByKernels and Kernelize,
summarize this method.
function ThinByKernels(&, 6, K);
G is a context-free grammar;
6 is a thinning example;
K is a set of kernels for 6 in G
begin
1 G' := a new, empty grammar;
2 for each production A H ? in G do
3 Kernelize(A, ?, G, G', 6, K);
4 return G'
end;
procedure Kernelize(A, ?, G, &`, 6, K);
A is a nonterminal;
? is a string of symbols;
G is a context-free grammar;
145
G' is a context-free grammar, which we augment;
? is a thinning example;
K is a set of kernels for ? in G;
begin
1 ? ThinStnng(o, 6);
2 for each kernel k ? K do
3 for each x, y such that o?[x..v] = k do
4 note that ? cannot safely be broken between x and y;
5 break Q into groups wherever it can safely be broken;
6 break further if necessary to keep group size below some constant c;
7 for each group P = ?[x..y] with Pj > 2 do
8 begin
9 replace symbols x through y of o with a new nonterminal Xp;
10 if Xp is not yet a nonterminal in G' then
11			begin
12 choose a nonterminal in P;
13 for each p' derived by expanding that nonterminal using G do
14			Kernelize(Xp, P', G, C', 6, K)
15			end
16 end;
17 add production A 0 to grammar U'
end;
The Kernelize procedure is not fully specified: line 6 does not explain how
groups are to be broken up to keep the group size below c, nor does it say what
c should be; and the procedure by which a nonterminal to expand is chosen at
146
line 12 is not specified. We will return to these points shortly, but first let's walk
through a simple example, assuming for now that c is large enough that we always
have P <c at line 6. Let G be the grammar S SbSc a, and let 6be bSb. As
noted previously, the only kernel for 6 is the string bS; so let K = tbsl. What
happens when we ThinByKernels(G, 6, K)?
There are two productions in G: S			a and 5			SbSc. They yield two
different calls to Kernelize at line 3 in ThinByKernels. Kernelize(S, a, G, G', 6, K)
simply adds the production 5			a to C'.
Kernelize(S, SbSc, G, G', 6, K) is more complex. The thinning at line 1 has no
effect, since 6 does not occur in ? = SbSc. At lines 2-4 we observe that there is
an instance of the kernel bS in ?, so the string cannot be broken between those
symbols. Thus, at line 5, we break ? = SbSc into 5 bS c; and the only group P on
which lines 7-16 are executed is ? = bS. We change a from SbSc to SXbsc at line 9.
The nonterminal Xbs is not yet present in G', so we execute the loop at lines 13-14:
there is only one nonterminal in P that can be expanded, and this expansion yields
two values for P': ba and bSbSc. Using these two values of P', we use Kernelize
recursively. Kernelize(X?s, ba, G, G', 6, K) simply adds the production Xbs H ba
to G'
Kernellze(X?s, bSbSc, G, G', 6, K) is more complex. First of all, the string a =
bSbSc does contain an instance of the thinning example, and it is thinned to
a = b5c at line 1. Now it contains an instance of the kernel bS, so it is broken
into groups as b5 c, and the b5 is replaced by Xbs. This nonterminal is already
part of G', so lines 13-14 are not executed. The procedure adds the production
Xbs H Xbsc to C', and terminates. The resulting grammar C' has productions
5 SXbgc a and Xbs H Xbsc ba. This is an exact solution: a grammar G'
such that L(G') =
147
5.5.3 Shortcomings of the method of kernels
Why isn't the grammar returned by ThinByKernels always an exact solution?
There are really two sources of inaccuracy. The first arises because we may break
up some groups at line 6 (those of size c or greater). Without this step we cannot
guarantee that Kernelize will terminate. Since kernels may overlap in lines 2-4,
there is no limit to the length of the group P in line 5; so without the upper bound,
there would be no limit to the number of different nonterminals Xp we might try
to add to G'. Line 6 skirts this problem by imposing an arbitrary limit c on the
size of groups. This results in guaranteed termination, but at the cost of potential
incompleteness: by ignoring large groups Kernelize may overlook instances of the
thinning example.
For example, suppose we have this grammar G:
H S5 a b
and suppose we want to thin it with respect to the thinning example ? ab. Note
that SS and aS are kernels for ? in G; since these can overlap, any string in aS*
will be grouped together. Using the bound c = 3, ThinByKernels might construct
the following grammar:
5 H Xss a b
Xss H SXss I Xas bS
Xas axss aa a
This is thinner than the original grammar, but it isn't an exact solution. (An
exact solution can be obtained in this case using the FSM method of Section 5?2.1.)
Without the group size bound ThinByKernels would diverge, trying to construct
an infinite grammar like this:
148
S H Xss a b
Xss H Xsss I Xas bS
Xsss H Xssss I Xass I bXss
Xssss
A second source of inaccuracy lies in the choice at line 12 in Kernelize of a non-
terminal in P to expand. Suppose there is more than one nonterminal in P; which
one should be expanded? Suppose, for example, we're faced with the following
grammar G:
S AB
A H aAb x
B H bBa y
and suppose we want to thin it with respect to the thinning example ? = AbB.
A thinned grammar is not difficult to construct: except for the tree that derives
xy, all parse trees in G are thinned exactly once. Here's a grammar that generates
L6(G):
H X? aby bBa
B H bBa y
But the method of kernels fails on this problem. Clearly AB is a kernel for ? in
C: it's a kernel in two different ways, since we get a string that contains ? either
by expanding the A to aAb or by expanding the B to bBa. But therein lies the
problem: when Kernelize encounters the group P = AB at line 12, it must choose
a nonterminal to expand. Expanding the A would yield this thinned grammar:
S H XAB
XAB H abB xB
149
B H bBa y
which although thinner than the original, does not generate L6(G). Expanding the
second would yield this thinned grammar:
5 H XAB
XAB H bBa Ay
A H aAb x
J? H bRa y
which isn't as thin as possible either.
The problem is that we can't expand either nonterminal in AR without poten-
tially passing by an instance of S: in some cases 6 arises when A is not expanded,
and in some cases 6 arises when R is not expanded. As long as there is at least one
nonterminal in P that is expanded on all derivations from P to a string containing
6, the problem can be avoided.
5.6 The method of kernels for trace grammars
The method of kernels has its limitations, but it is still the most general solution
known for the CFG thinning problem. As we will see, it carries over well to the
trace grammar thinning problem, and it is the method used by Thinner. The
problems of implementing the method for trace grammars are discussed in the
next chapter; for now, we will sketch the idea by applying the method of kernels
to analogous CFG and trace grammar problems, side by side.
We'll start with this CFG C:
5 SbSc a
150
and we'll thin it with respect to ? = cbS using the method of kernels. The set K
of kernels is just tSbS?. For the sake of exposition, we'll divide the operation of
ThinByKernels into five steps:
1. Thin and group in this step, the right-hand side of 5 H SbSc is thinned
(which has no effect) and grouped into kernels, leaving 5 SbS c. (5 H a,
the other production for 5, is carried into the new grammar unchanged.)
2. Replace now the group SbS is replaced by a new nonterminal Xsbs, which
yields 5 H Xs?sca.
3. Choose and expand next, we choose a nonterminal in ? = SbS to expand.
In this case, it will be the first 5, since the other is not expanded in the
derivation from P to ?. We expand it using the original productions, which
gives two possible right-hand sides for Xs?s: SbScbS and abS.
4. Thin and group in this step, the right-hand side of Xsbs H SbScbS is
thinned (which reduces it to SbScb) and grouped into kernels, leaving 5
SbS cb. (Xsbs abS, the other production for Xsbs, is carried into the
new grammar unchanged.)
5. Replace finally, the group SbSis replaced by the nonterminal Xsbs, leaving
Xsbs Xsbscb.
The resulting grammar is:
5			Xsbsc a
Xsbg H Xsbscb abS
which gives an exact solution for L?(G).
These same five steps are echoed in the optimization of the exponential-time
Fibonacci function. Figure 5.5 shows the appropriate trace grammar thinning
151
problem: note that, as in the CFC case, we have a single kernel; that kernel
contains two nonterminals; and when one of them (the one that computes (fib
n 1))) is expanded, the resulting graph contains the thinning example.
In step 1, we thin and group the right-hand side of the second production for
fib. (The other production is carried into the new grammar unchanged.) Thinning
the right-hand side has no effect since the thinning example doesn't occur there.
The right-hand side graph after grouping is shown in Figure 5.6.
Step 2 replaces the group shown in Figure 5.6 with a new nonterminal Xp. This
yields the productions shown in Figure 5.7
We choose a nonterminal to expand and expand it in step 3. There are two
fib nonterminals in the group P, but the one that computes (f ib (- II 1)) is the
one whose expansion yields an instance of the thinning example, and it's the one
we'll expand. There are two productions for fib in the original grammar, so this
expansion yields two possible right-hand sides for Xp, as shown in Figure 5.8.
In step 4, the graphs shown in Figure 5.8 are thinned and grouped. This
time thinning does have an effect, since the first graph contains an instance of the
thinning example. After thinning, the group ? is again found to occur in that
graph. The second graph shown in Figure 5.8 is carried into the new grammar
unchanged, and the resulting productions for Xp are shown in Figure 5.9.
Finally, in step 5, we replace the group P with the nonterminal Xp. The final
grammar is shown in Figure 5.10. It is an exact solution: a thinner grammar that
never generates a graph that contains the thinning example. This is essentially
how Thinner optimizes the exponential-time Fibonacci function, as detailed in
Chapter 6. The grammar shown in Figure 5.10 is not yet as thin as possible:
using a different thinning example, the use of fib in the second production for
Xp can be eliminated. But the grammar is significantly thinner than the original.
152
fib H
fib			fib
+
1			2
1			2
fib			fib
Figure 5.5: A trace grammar thinning problem
153
fib H
Figure 5.6: Productions for fib after step 1, showing group P
fib H
Xp
Figure 5.7: Productions for fib after step 2
Xp H
154
Figure 5.8: Productions for Xp generated in step 3
fib
F
Xp H
Figure 5.9: Productions for Xp after step 4
155
Iu particular, it corresponds to a linear-time, rather than an exponential-time,
function.3
fib H
Xp H
T
Xp
Figure 5.10: Final grammar after step 5
3Actually, calling this linear time is a bit of a stretch since the size of the output is exponential
in the size of the input. But most authors make the same stretch [Roh89j.
Chapter 6
Thinner
The lower panel had not slid aside to reveal FINANCIAL MATTERS
WILL SOON IMPROVE or OLD FRIENDS WILL VISIT or DO NOT MAKE
IMPORTANT DECISION HASTILY.
It had revealed a single black word: "THINNER."
--H Stephen King, Thinner
6.1 Overview
We now turn from neat to scruffy--Hfrom theory to experiment. Earlier chapters
have provided the following tools:
. the trace grammar representation (Chapter 2);
o+ a formal semantics giving denotations of TL programs in terms of trace gram-
mars (Chapter 3);
o+ the relational constraint inference technique (Chapter 4); and
o+ the method of kernels for thinning CFG's (Chapter 5).
This chapter describes the synthesis of these tools in Thinner, an optimizer for TL
programs.
156
157
Thinner is a Common Lisp system of about 15,000 lines, implemented in Lucid
Common Lisp on a Sun-4 workstation.1 Figure 6.1 shows the major components
of Thinner. Thinner is a source-to-source optimizer: a compiler converts the input
program into a trace grammar, and a decompiler converts the final trace grammar
back into a program in the source language. In between, Thinner explores the trace
grammar, looking for opportunities for optimization. It is an iterative process, and
Thinner takes as much time as the user gives it. The more time it has, the more
optimizations it may discover.
Transformer
Analyzer
A:
v:
Explorer
Compiler			Decompiler
Figure 6.1: Structural overview of Thinner
The grammar transformer works with a thinning example: a trace graph that
has been shown to include unnecessary computation. (This thinning example may
include nonterminals.) The transformer uses the method of kernels to reformulate
the grammar. Its goal is to alter the grammar so that the language it generates
`Lucid and Lucid Common Lisp are trademarks of Lucid, Inc. Sun-4 is a trademark of Sun
Microsystems, Inc.
158
never makes the mistake shown in the thinning example.
Where does the thinning example come from? A component called the expThrer
looks for unnecessary computations committed by the trace grammar. It looks
at the right-hand sides of productions in the grammar, and it unfolds them along
possible derivations. In effect, it examines individual partial paths through the
program, embodying each such path in a trace graph.
The dotted lines in Figure 6.1 represent paths along which individual trace
graphs are passed. The explorer calls on the analyzer to reason about individual
trace graphs, to determine whether they contain any unnecessary computation--H
that is, to determine whether they compute redundant or unused values. The
analyzer uses the relational constraint method to identify equivalences among ob-
jects in a trace graph. It annotates the trace graph with equivalence information
and returns it to the explorer. If there are suitable equivalences the graph is then
passed to the transformer as a thinning example. Thinner is a "global" optimizer
in the sense that the explorer examines interprocedural slices of the computation
looking for thinning examples, and the transformer tries to correct all instances
of a given example. But the analyzer reasons locally about each trace graph the
explorer gives it.
This chapter presents a summary of the design and implementation decisions
that went into the construction of Thinner, and discusses some of the problems
of translating the theoretical tools developed in the previous chapters into a prac-
tical system. It includes some illustrative selections of code from Thinner, and a
selection of annotated transcripts showing Thinner at work.
159
6.2 Thinner fundamentals
Here are some details that may help the reader to understand the code examples
that follow.
6.2.1 Memory management
Thinner follows a strict memory-management regimen: it explicitly frees all pieces
of memory it allocates, including cons cells. This is rather compulsive for a Lisp
system since garbage collection will find any lost objects. But I wanted to gather
statistics on Thinner's use of time and space without interference from garbage
collection. All data objects are allocated and freed using macros that manipulate
a list of free objects of the appropriate type.
6.2.2 Doubly-linked circular lists
When Thinner needs a list data structure that supports random insertions or dele-
tions, it uses a doubly linked circular list structure called a dlist. Many of the
higher-level data structures of Thinner graphs, vertices, terminal and nontermi-
nal functions, and so on start with the dlist structure and can be manipulated
in circular lists.
6.2.3 Iteration macros
Thinner uses a macro package for iteration. These macros were originally writ-
ten by Professor Alberto Segre of Cornell University; he remarks that they were
inspired by the Interlisp FOR clisp macro. I modified them only slightly for the
Thinner project. Keywords beginning with a colon specify various iteration op-
tions.
160
6.3 The compiler
The structure of the compiler follows the structure of the denotational semantics
of TL: the compiler builds up the trace graph for an expression by composing the
trace graphs for subexpressions.
The compiler converts a TL function into a nonterminal with a collection of
trace graphs for productions. A nonterminal is represented by a function structure
as shown in Figure 6.2.
;; A function structure is a terminal or nonterminal "symbol" in a
;; trace grammar. All vc vertices with that function are linked to
;; the function structure through (fref-function (vertex-fref v));
;; the frefs are collected in a dlist, so we can find all uses of a
;; given function in the grammar.
(defstruct (function
(:include dlist)
(:print-function print-function))
id			; unique id
class			; term or nonterm
name			; name of this function
tin			; vector of input types
tout			; vector of output types
init-fcon			; a function to initialize functional constraints
nform			; a function to compute normal forms for outputs
refs			; dlist of frefs to this function
graphs			; nonterms: dlist of rhs graphs for this function
transp			; nonterms: list of lists of relations
dtrees			; nonterms: list of decision trees
ni
Figure 6.2: The function structure
Terminal functions (that is, primitive functions in TL) are represented using the
same structure. For each TL primitive there is a predefined function, like the one
161
for addition shown in Figure 6.3. The init-fcon field is a function that initializes
the functional constraints established by a nonterminal vertex for addition. The
nform field is a function that computes the normal form for the output, assuming
that the normal forms for the inputs are already available.
;;*function-plus* is the terminal function for addition
(defconstant *function-plus*
(make-function
.id (incf *function-id*)
class `term
name `+
.refs (new-dlist)
:tin (vector *i* *i*) :tout (vector *i*)
.init-fcon #`(lambda (v g)
(let* ((refs (new-vector 6))
con (new-fcon #`plus-constraint refs))
(a (inec v 0))
(b (inec v 1))
(c (outec v 0))
(one (graph-one g)))
(add-pairref (find-pair a b g *i*) fcon 0
(add-pairref (find-pair b c g *i*) fcon 1
(add-pairref (find-pair a c g *i*) fcon 2
(add-pairref (find-pair a one g *i*) fcon
(add-pairref (find-pair b one g *i*) fcon
(add-pairref (find-pair c one g *i*) fcon
(insert-dlist (graph-fcons g) fcon)
(ftryst fcon g)))
:nform *`(lambda (v i)
(declare (ignore i))
(sum-poly (inform v 0 nil) (inform v 1 nil)))))
Figure 6.3: The function for addition
t g)
t g)
t g)
3 t g)
4 t g)
5 t
Each nonterminal function has a list of trace graphs which are the right-hand
sides of productions for that nonterminal in the grammar. The graph structure,
shown in part in Figure 6.4, records the parts of a trace graph much as you might
162
expect from Definition 2.2. Each vertex in the trace graph is represented by a
vertex structure. For each vertex output there is an ob structure; a graph contains,
in total, one of these ob structures for each vertex output and one for each constant
used in the graph. The ob's are the focus of inference. The analyzer tries to collect
them into equivalence classes, since objects in the same equivalence class can serve
as bypasses for each other during thinning.
TL contains a variety of overloaded operators, but the compiler must decide on
a unique type for each object in the graph. TL includes a way to declare variable
types, but Thinner performs some type inference (in the style of ML) so that type
declarations are often unnecessary. This is just a convenience for the user; when
Thinner decompiles, it always includes full type declarations.
6.4 The decompiler
At any time the user of Thinner can ask to see a decompiled version of the current
grammar. The grammar transformer guarantees that the current grammar is al-
ways normal (Definition 2.19), which means that decompilation is always possible.
Because the grammar is normal, each nonterminal can be decompiled inde-
pendently. Decompilation of a nonterminal is guided by a decision tree (Defini-
tion 2.11) for the set of right-hand sides for that nonterminal, so the first step is
really the computation of a decision tree. Thinner does this by matching, from the
inputs down, all graphs in the set as far as possible, splitting the set on a matched
predicate (there must be a matched one since the set is normal), and then pro-
ceeding recursively on the two smaller sets. There may be more than one decision
tree, and Thinner will find them all. Each decision tree would yield a slightly dif-
ferent (though grammatically and functionally equivalent) decompilation; Thinner
163
(defstruct (graph
(:include dlist)
(:print-function print-graph))
function ; function in which this graph occurs
vi ; a vector of input vertices
vo ; the output vertex
vc ; a dlist of computational vertices
vt ; a dlist of true predicates
vf ; a dlist of false predicates
(defstruct (vertex
(:include dlist)
(:print-function print-vertex))
id ; unique id for debugging
class			; one of `vi, `vo, `vc, `vt or `vf
vin ; vector of obrefs to source object
vout ; vector of output obs
fref ; fref to function of this vertex
graph ; graph in which this vertex occurs
(defstruct
type
vertex
index
name
ec
refs
(ob
(:include dlist)
(:print-function print-ob))
object type
if vertex output, the vertex defining this object
if vertex output, the index in vout of the vertex
print name: constant value, variable name or gensym
equivalence class (ec structure) of object
dlist of obrefs
Figure 6.4: Graphs, vertices and objects
164
selects the first one it finds to guide the decompilation of the nonterminal. (There
is another use of decision trees in Thinner, for which it may need them all: the
computation of thinning obligations, discussed in Section 6.7.)
Decision tree in hand, Thinner proceeds to decompile the set of flght-hand
sides for the nonterminal. All graphs in the set match, up to the predicates at
the first branch in the decision tree. Thinner decompiles this matching part (this
is just like decompiling an individual trace graph without any predicates, which
is straightforward). That first branch in the decision tree splits the set of trace
graphs in two. Thinner emits an if, whose test is the predicate input and whose
two arms are obtained by recursively decompiling the two parts of the set of trace
graphs, guided by the two subtrees of the decision tree.
6.5 The explorer
The explorer generates graphs to be analyzed as potential thinning examples. Us-
ing depth-first iterative deepening, it explores the space of graphs that can be
reached from nonterminals ill the grammar, where each "move" involves selecting
and unfolding one nonterminal in the current graph. This isn't systematic: it will
sometimes develop the same graph along more than one path. It isn't particularly
clever either.
Only one heuristic is employed to guide the explorer toward significant thinning
examples: it unfolds graphs in order, treating those with the most nonterminals
first. (That is, it examines paths starting from graphs with the most nonterminals,
and it unfolds using productions with the most nonterminals.) This helps Thinner
optimize recursive cases, which can be significant improvements, ahead of base
cases, which tend to be minor.
165
(defun dfids ()
(let ((example nil) ; thinning example
thinnings ; dlist of thinnings
kernels) list of kernel graphs
;; Sort productions by number of nonterminals. This controls
;; the order of the search, making us explore base cases last.
(order-graphs)
;; Find the first example by iterative deepening.
(for depth :from 0 :until example
do (multiple-value-setq (example thinnings kernels)
(first-example-in depth)))
;; Thin the grammar using the method of kernels
(thin-by-kernels example thinnings kernels)
;; Clean up the example, the thinnings, and all kernel graphs.
(cleanup-graph example)
(cleanup-thinnings thinnings)
(free-dlist thinnings)
(while kernels
:do (let ((g (mpop kernels)))
(unless (eq g example)
(cleanup-graph g)))))
Finally, clean up any unused parameters, returned values, and
functions, in any parts of the grammar we just fabricated.
(cleanup-interfaces)
nil)
Figure 6.5: Top-level thinning function
166
Figure 6.5 shows the top-level thinning function in Thinner. This function
performs the iterative deepening search and invokes the method of kernels when
an example is found.
6.6 The analyzer
The analyzer examines each trace graph proposed by the explorer, to discover
whether it can be thinned conservatively. If all of the values computed by some
vertex in the graph are unused, no further analysis is necessary. That vertex
can be removed without bypass edges, so the graph can be used as a thinning
example. In general, however, more analysis is necessary: a deeper, semantically
aware analysis that can detect when two vertex outputs are interchangeable (or
when one is constant) for all executions of the graph. For this purpose Thinner
uses the method of kernels and the computation and comparison of normal forms
for expressions, which were developed in Chapter 4.
The analyzer decorates a trace graph with additional information. This infor-
mation remains attached to the graph until something happens to invalidate it
(that is, until the graph is altered). The most significant part of this information
is a collection of equivalence classes of objects. Each equivalence class contains
one or more vertex outputs from the graph; and if the equivalence class is known
to include a constant, that constant is recorded. Thinner uses these equivalence
classes to decide whether there is a conservative thinning for the graph.
6.6.1 Representing relations
The vocabulary of relations for each type is represented with an unsigned 29-bit
word remember that there are 29 primitive integer relations, and fewer prim-
167
itive relations for the other types.2 Figure 6.6 shows some relevant definitions
from Thinner. Note the similarity between the list of primitive integer relations,
*primitives .1*, and the definitions in Table 4.1.
6.6.2 Constructing constraint sets
The method of constraints needs a constraint set for every primitive TL function.
These constraint sets are constructed with mechanical assistance. They are con-
structed at compile time, and when Thinner runs they are available as constants.
(Because this construction takes place at compile time, these functions do not
follow Thinner's usual memory-management regimen.)
Figure 6.7 shows the Thinner function that returns a constraint set for cdr. The
intricacy of this example is about average for constraint set constructors: some,
like addition, require more finesse; others, like the boolean operators, are trivial.
The constraint set constructors generate sets of primitive relations. As noted in
Section 4.2.2, these are reduced automatically to minimal sets of cubes before being
used by Thinner.
If errors creep into a constraint set, Thinner will make errors in inference.
Because the method of constraints yields assertions rather than proofs, such errors
are very difficult to trace. To reduce the risk of this kind of error, I built functions
to test each of the constraint sets. Figure 6.8 shows the test for the cdr constraint
set constructed by the previous example.
2This representation allows the Lucid compiler to generate in-line instructions for operations
on relations. With less care it is easy to end up with a library call for every logical-and and
logical-or.
168
;; Type rel is a relation (in any vocabulary)
(deftype rel () `(unsigned-byte 29))
;; (conj-rel a b ...) takes any two or more relations and returns
;; their conjunction. The result is garbage unless the relations
;; are from the same vocabulary.
(defmacro conj-rel (&rest rels)
`(the rel (logand ,.rels)))
;; *primitives.i* is a list of the symbols for primitive integer
;; relations.
(defconstant *primitives i*
`(--<2+ --<1 O-<1 0-<2+ -+<2+ -+<1 -+=0
+-<1 +-<2+ 0+<2+ 0+<1 ++<2+ ++<1
--=0 00=0 ++=0
-->2+ -->1 -0>1 -0>2+ +->2+ +->1 +-=0
-+>1 -+>2+ +0>2+ +0>1 ++>2+ ++>1))
;; Define each symbol for a primitive integer relation, giving each
;; one a bit of its own
(for name :in *primitives.i*
:as bitno :from (1- (length *primitives.i*)) :by -1
:do (eval `(defconstant
,name
(ash 1 ,bitno))))
Figure 6.6: Definitions for relations on integers
169
(list-cdr-base) -- This returns a list of the possible triples of
primitive list relations (a::b a::nil b::nil) given that b is the
cdr of a.
(defun list-cdr-base ()
(flet
((mklrel
(eval
(5 c mag)
(intern (concatenate
`string (if 5 us? U??) (if c "0" "")
(case mag
(--H2 "<2+") (--H1 "<1") (0 "=0")
(1 ">1") (2 ">2+")))))))
(for acn in `(t nil) ; car a = car nil?
:join
(for amagn :in (if (not acn) `(1 2) `(0 1 2))
:join
(for bmagn :in (if (<= amagn 1) `(0) `(1 2))
:join
(for bcn :in (if (zerop bmagn) `(t) `(t nil))
:join
(for acb :in (cond ; car a = car b?
((or (and acn bcn) (zerop amagn))
((not (or acn bcn)) `(t nil))
(t `(nil)))
collect
(list (mklrel
(mklrel
(mklrel
t
t
t
acb (if (zerop amagn) 0 1))
acn amagn)
bcn bmagn)))))))))
Figure 6.7: Constructorforthe cdr constraint set
170
(test-list-cdr-base) -- Test the list of tuples produced by
list-cdr-base. We generate all lists of length less than 5 over a
three-integer alphabet, and test each one by taking its cdr and
generating the appropriate tuple of primitive relations. If any
tuple so generated does not appear on the list produced by
list-cdr-base, we report it If any tuple on the list produced by
list-cdr-base is not generated by this process (or if it occurs on
the list more than once) we report that
(defun test-list-cdr-base ()
(labels ((3lists (len)
(if (zerop len)
(list nil)
(for x :in (3lists (1- len))
join (list (cons 0 x) (cons 1 x) (cons 2 x))))))
(let ((lists (for i :from 0 :to 4 :join (3lists i)))
(base (list-cdr-base))
found)
(for a :in lists
:do (let* ((b (cdr a))
(tuple (list
(constant-ilrel a b)
(constant-ilrel a nil)
(constant-ilrel b nil))))
(cond ((member tuple found :test #`equal))
((member tuple base :test *)equal)
(setq found (cons tuple found)))
(t (format t `,Tuple -s not found" tuple)
(format t " for -s = (cdr S)?7."
b a)))))
(setq base (for tuple :in base
unless (member tuple found :test #`equal)
collect tuple))
(when base
(format t "?D tuples not sought: 5"
(length base) base)))))
Figure 6.8: Testing the cdr constraint set
171
6.6.3 Relational constraint implementation
The relational constraint algoritlim presented in Section 4.2.3 is implemented in
Thinner, with several optimizations as noted here. (These optimizations do not
effect the asymptotic worst-case complexity of the method, which is still 0(n2) in
the size of the graph, but they yield significant linear improvements.)
The constraint computation described in Section 4.2.2 was a general procedure
for any size tuple of relations. In practice I found it more efficient to construct
specialized a version of it for each tuple size. Figure 6.9 shows the function that
computes constraints on triples of relations.
In the earlier presentation of the relational constraint algorithm, the collection
of constraints waiting to be computed was treated as a set. The algorithm did
not specify how the next constraint to apply was to be chosen from this set. For
asymptotic worst-case analysis it makes no difference which of the waiting con-
straints is chosen, but in practice it makes a significant difference. Obviously it's
a good idea to take big strengthening steps rather than small ones, and the choice
of the next constraint to perform should reflect this:
o+ I experimented with choosing the next constraint in LIFO, FIFO, and ran-
dom order. Treating the collection of constraints as a queue gave the best
performance, while treating it as a stack gave the worst. The reason seems
to be that the "older" a constraint is, the more likely it is that its pairs have
changed since last it was computed.
o+ I experimented with giving preference to transitivity constraints, with giving
preference to all other constraints, and with treating all constraints equally.
Leaving transitivity constraints for last gave the best performance, while
doing them first gave the worst. The reason seems to be that transitivity
172
g table) makes
pairs. pi, p2
the constraint
(tableconstraint3 pi p2 p3
computation for triples of
is the graph, and table is
the constraint
and p3 are the pairs; g
set
(defun table-constraint3 (pi p2 p3 g table)
(let ((a (pair-rel pi))
(b (pair-rel p2))
(c (pair-rel p3))
(ra 0) (rb 0) (rc 0)
ta tb tc)
(declare (type rel a b c ra rb rc ta tb tc))
(for con :in table
:do (setq ta (conj-rel a (the rel (first con))))
(unless (zerop ta)
(setq tb (conj-rel b (the rel (second con))))
(unless (zerop tb)
(setq tc (conj-rel c (the rel (third con))))
(unless (zerop tc)
(setq ra (disj-rel ra ta))
(setq rb (disj-rel rb tb))
(setq rc (disj-rel rc tc))))))
(strengthen-pair pi ra
(strengthen-pair p2 rb g)
(strengthen-pair p3 rc
Figure 6.9: Constraining triples of relations
173
constraints tend to be much weaker than, for example, functional constraints,
yielding strengthening in smaller steps.
Making the right choices here instead of the wrong ones improved the experimental
performance of the relational constraint method by about a factor of three.
The initial installation of constraints was described in Section 4.2.3 as a fauc
tion Init. Init placed all constraints from the graph in the starting collection of
constraints to be computed. This makes the proof of correctness easier but turns
out to be unnecessary. Transitivity, list property, and transparency constraints
have no effect on tuples of top relations, the initial state of most relations in the
table. Wfth a little care these unnecessary initial constraint computations can be
avoided.
Lines 9-12 in Init also install a number of redundant transitivity constraints.
If both (a, b) and (b, a) participate in functional constraints, this algorithm will
install transitivity constraints both for the triple (a, b, c) and for the triple (b, a, c),
for every c. In fact we only need one transitivity constraint per triangle: one for
each set fa, b, cl, not one for each tuple (a, b, c). This can be accomplished by
ordering the objects, then installing a transitivity constraint for (a, b, c) only if
a E b E c (and, as usual, only if (a, b) or (b, a) has a functional constraint).
For simplicity of presentation the algorithm treated the collection of strongest
known relations (Rel) as one big table indexed by pairs of objects. In fact, we only
index using pairs of objects of the same type, and it makes more sense to maintain
a separate table of relations for each type. Figure 6.10 shows the Thinner function
that prints the relation for every pair of equivalence classes in the given graph.
(It`s only used for debugging.)
A more difficult optimization Thinner uses is to reason not about objects but
about equivalence classes of objects. Whenever a relation becomes as strong as
174
(show-rels g) prints the current relation for every pair of ecs
ing. Inference structure must be present in the graph. We
print each relation in one direction only (that is, we treat the
rels tables as triangular), but we check to be sure that
rel(i,j) = rev(rel(j,i)).
(defun show-rels (g)
(for type :from *minfulltype* :to *maxfulltype*
do
(format t "?7.Type D:?70" type)
(let* ((rels (svref (graph-rels g) type))
(n (array-dimension rels 0)))
(for i :from 0 :to (1- n)
:do (for j :from i :to (1- n)
:do (let ((pair (svref (svref rels i) j)))
(when pair
(let* ((eci (pair-ed pair))
(ec2 (pair-ec2 pair))
(rel (pair-rel pair))
(rp (pair-rev pair))
(rrel (pair-rel rp)))
(format t			eci)
(print-rel rel type)
(format t S?7." ec2)
(assert (= rel (rev-rel rrel type)))))))))))
Figure 6.10: Examining the relations for a graph
175
equality, Thinner merges two equivalence classes and removes a row and column
from the table. This makes a big difference in the number of transitivity constraints
that are computed, and helps make up for using only 0(n2) of them.
6.7 The transformer
We saw in Section 5.6 that one can apply the method of kernels to trace grammars
just by generalizing each step from strings to graphs. But this generalization is far
easier to imagine than to implement.
6.7.1 Graph and subgraph matching
The first hurdle is graph matching. The method of kernels uses string matching
in several places. For example, the thinning step repeatedly finds and thins an
instance of a string 6 in a target string. The graph counterpart, naturally, is to
repeatedly find and thin an instance of a graph 6 in a target graph. But this
is subgraph matching, which is NP-complete: potentially, a far more expensive
operation than substring matching. Another example is the test Kernelize makes
to determine whether the new nonterminal Xp has already been created for a
particular string P. In the graph universe, we need some kind of dictionary we can
use to look up the graph P.
Although subgraph matching is NP-complete, Thinner's application of it is sur-
prisingly inexpensive in practice. Trace graphs have several features that contribute
to this: they are dags, they have low degree (most vertices have outdegree 1), and
the vertices are labeled with functions that must match. I do not know whether
the subgraph matching problem for trace grammars generated from TL programs
is actually NP-hard. I have not been able to construct a family of programs that
176
makes Thinner's subgraph matching routines exhibit exponential-time behavior.
These are some of the graph-matching tasks Thinner performs:
o+ Determine whether two predicates match--Hthat is, whether their trees of
predecessors are identical.
o+ Find an isomorphism, if there is one, between two graphs, treating the order
of the graph inputs and outputs as irrelevant.
0 Find each way one graph can be a subgraph of another.
o+ Find each way two graphs can overlap (this is used to find kernels of a thinning
example).
o+ Look up a graph in a hash table of graphs.
6.7.2 Thinning right-hand sides
A thinning example for a graph can often be thinned in more than one way. Fre-
quently an example will turn up in which two vertex outputs are equivalent. Which
one should be removed and which one retained? Luckily, the decision can easily
be postponed. The method of kernels works to make an example graph explicit in
the grammar; how that graph is altered once it surfaces as the subgraph of some
right-hand side of a production is unconstrained. (For this reason the method of
kernels might be useful with transformations other than thinning.)
A thinning example for Thinner is a graph, together with a non-empty list of
potential thinnings of that graph. Figure 6.11 shows the structure used in Thinner
to represent an individual thinning. (The weights are not immediately significant
for a thinning example; they are used, as described below, when the example is
matched to some graph in the grammar.)
177
A thinning records a vertex to be removed and a vector of obs to
bypass the outputs of that vertex. Some thinnings are examples
(that is, they refer to vertices in thinning example graphs) and
some are "real" (that is, they refer to vertices in graphs in the
current grammar). All real thinnings are derived from some
example, and are linked to that example through thinning-exth
(defstruct (thinning
(:print-function print-thinning)
(:include dlist))
tv ; vertex to be removed
bypass ; vector of obs for bypass
weight ; numeric weight, or nil
exth ; if example, nil; if real, link to example
obligations) ; dlist of thinning obligations, or nil
Figure 6.11: The representation for a thinning
When the thinning example matches a subgraph of some graph in the grammar,
the individual thinnings for that example are weighed in context. The weight
assigned to a thinning depends on three criteria, given in order with the most
important first:
1. How many re-uses of previously created nonterminals Xp does the thinning
yield?
2. How many nonterminal vertices does it eliminate?
3. How many terminal vertices does it eliminate?
These three criteria are combined into a numeric weight in such a way that less
significant criteria can affect the comparison of two thinnings only if they are equal
in terms of more significant criteria. The weight assigned to a thinning is heuristic.
In fact, there is no exact way to compare two thinnings, since a thinning may affect
only unreachable parts of the computation and there's no sure way to detect this.
178
Eliminating primitive computations is the ultimate goal, so the least significant
component of the weight is obviously worth checking. The second component, the
number of nonterminals eliminated, should clearly have even greater significance:
it's a good bet (though not certain) that each nonterminal eliminated saves many
primitive computations. But what about the most significant part of the weight
how can the re-use of previously created nonterminals be important?
Consider the Fibonacci example from Section 5.6. In that example, we pre-
tended that there was only one way to thin the example in Figure 5.5. In fact,
there are four, as shown in Figure 6.12 below. Thinner postpones the decision
about which one to use and remembers all four possibilities while applying the
method of kernels.
The thinning example matches a subgraph of the productions for Xp con-
structed in Figure 5.8. At this point Thinner assigns a weight to each of the four
possibilities. Without considering the question of Xp re-use, Thinner might make
the wrong choice between thinnings C and D: in this context they both eliminate
one terminal and one nonterminal. But thinning D is in fact vastly superior: thin-
ning C will ultimately yield another exponential-time expression of the Fibonacci
function (though with the exponent reduced).
In short, the re-use of a previously created nonterminal Xp is desirable because
it means the re-use of a thinned part of the grammar.
One important difference between the graph case of the grammar thinning
problem and the string case is that in the graph case we have to make sure we end up
with an executable grammar (in the sense of Definition 2.12). If we always thin the
right-hand sides of productions whenever we can, without regard to executability,
we may end up with a trace grammar that is unexecutable one that cannot be
decompiled back into a program.
179
fib
fib
fib
Figure 6.12: Four thinning possibilities
fib
180
We've already observed that Thinner maintains a normal grammar at all times.
This means that it won't thin the right-hand side of a production without first
guaranteeing that there will still be a decision tree for the set of right-hand sides
of that nonterminal. Usually this can be verified immediately: if a thinning does
not alter any predecessor of a predicate in the graph then it has no effect on the
decision trees for the set, and it can be performed with impunity. Sometimes the
guarantee takes the form of "thinning obligations": by inspection of decision trees,
Thinner determines that the thinning in question can be performed only if one or
more additional thinnings are performed at the same time. Occasionally, a thinning
must be turned down fiat: this happens when Thinner can see no way to carry out
the thinning while maintaining a normal grammar.
Figure 6.13 shows the routine Thinner uses to assign a weight to a thinning.
Note that it checks whether any predecessor of a predicate is altered by the thin-
ning, and if so it computes any thinning obligations. The thinning is performed
provisionally, weighed, and then retracted. The final weight includes the weight of
any thinning obligations.
6.7.3 Replacing a group of vertices
Another interesting complication in implementing the method of kernels for trace
grammars arises at the step in which a group ? is replaced by a new nonterminal
Xp. In the string case we're dealing with adjacent symbols in a string, and the
operation of replacing them with a single symbol is well defined. But consider
the graph case. We've identified kernels in a graph, and we want two vertices to
be in the same group if and only if they occur in the same instance of a kernel.
But the groups constructed in this way are not necessarily replaceable by a single
nonterminal vertex, at least not without introducing cycles.
181
(weigh-thinning th ex kmap) sets (thinning-weight th) for a real
thinning: one that isn't just an example but really thins some
graph in a function. We test whether the thinning alters the
predecessors of any predicates; if so, we compute the thinning
obligations -- thinnings of other graphs in the same function --
and establish a dlist of them in (thinning-obligations th). The
weight of the thinning reflects the excised vertices; whether the
thinning generates any kernels; and the weights of the thinning
obligations
(defun weigh-thinning (th ex kmap)
(setf (thinning-weight th) nil)
(when (proper-thinning? th)
(let* ((v (thinning-tv th))
(g (vertex-graph v))
(f (graph-function g))
(gpreds (get-altd-preds th)))
(when gpreds (validate-dtrees f g))
(multiple-value-bind (cutv b dead) (do-thinning th)
(when gpreds (get-obligations ex th gpreds kmap cutv dead))
(when (or (not gpreds) (thinning-obligations th))
(set-cut-weight th cutv)
(when (thinning-obligations th)
(for oth :indlist (thinning-obligations th)
:do (incf (thinning-weight th)
(thinning-weight oth))))
(add-overlap-weight th kmap))
(abort-thinning cutv b dead))
(when gpreds (free-vector gpreds))))
nil)
Figure 6.13: Weighing a thinning, with obligations
182
T
lir
6 =			*			*
*			*
*
K=f
F
Figure 6.14: A thinning problem that makes grouping difficult
183
Figure 6.14 shows all initial grammar, a thinning example, and a kernel set.
(The program requires loop invariant removal: the two multiplications are loop
invariant and should be computed only once. Thinner's performance on this ex-
ample is discussed further in Section 6.8.4.) Observe that there is an instance of
the kernel in the first production for lir. Let's match the kernel as a subgraph of
the graph and try to form the appropriate group. The first multiplication is in the
group; the second multiplication is in the group; its successor, the addition, is not
in the group; its successor, the nonterminal lir, is in the group again. So if we
try to replace this group with a new nonterminal in the obvious way, we end up
introducing a cycle as shown in Figure 6.15.
lir
F
Figure 6.15: Result of naive grouping: a cycle
Any implementation of the method of kernels for trace grammars must deal
with this problem. I experimented with two approaches. The first involves allowing
group membership to bleed from group vertices to other, neighboring vertices as
necessary to prevent cycles. (In the example this means expanding the group to
include the addition.)
The second approach involves freezing vertices, eliminating them from the
184
group. It starts by detecting any vertex in the group from which another ver-
tex in the group can be reached by a path that leaves the group. (In the case
of the example, the two multiplications pass this test.) These vertices are then
frozen: they are not included in the group, but their outputs are recorded as be-
ing available for bypasses in the new nonterminal. If they are used when the new
nonterminal is thinned, these values end up as new function inputs.
Neither approach is satisfactory in all cases. Bleeding is easier and is always
possible, but it is often the wrong thing to do. The re-use of nonterminals, the
importance of which has already been noted, is curtailed by spurious additions to
groups. Freezing is safer but is not always an option, since vertices that are going
to be thinned cannot be removed from the group.
Thinner uses freezing whenever it can. (In the loop invariant example, it freezes
the two multiplications.) Occasionally it encounters a group that can't be decircu-
larized by freezing alone, and then it resorts to bleeding. This combination suffices
for many examples, but I suspect that a more general solution to the problem may
be possible. Perhaps one could go ahead and introduce circularities, carry though
the method of kernels, and then remove any circularities in a final step.
6.7.4 Constructing a kernel set
The method of kernels requires a kernel set: a set of graphs that are minimal
precursors of the thinning example in the grammar. Where does this kernel set
come from?
It is certainly possible to compute the set of all kernels for a given grammar
and example. Starting with the example, find all ways it can overlap with the
right-hand side of any production in the grammar. For each such overlapping,
construct the "previous" graph (the graph with the overlapped region replaced by
185
the corresponding nonterminal), and add that to the set of kernels. Repeat the
process for each new kernel until no new graphs are found.
I implemented this and found it to be quite expensive. The computation of an
exhaustive kernel set of size greater than about 50 dominated the cost of grammar
thinning. And it is not difficult to construct a family of examples for which the
size of the exhaustive kernel set is exponential in the size of the example.
Luckily, I found that few of the kernels from the exhaustive kernel set were ever
used. (That is, few actually occurred as a subgraph in some graph examined by
the method of kernels.) In fact, in most cases, the number of kernel graphs actually
used was exactly the depth of derivation at which the example was found and the
kernels used were exactly the minimal precursors of the example along the path of
derivation on which it was discovered.
Thinner takes advantage of this. It does not construct the exhaustive kernel
set, but constructs a basic kernel set instead: the set of minimal precursors of the
thinning example along the path of derivation on which it was discovered. In those
cases where this basic set does not include all useful kernels from the exhaustive set,
Thinner will miss some thinnings. This occurs in one instance in the transcripts
that follow, but the thinning that is missed because of the nonexhaustive kernel
set is found again in a later thinning step. (This later step is a thinning with the
same example, found on a different path of derivation.)
6.7.5 Cleaning up the results
The method of kernels may leave the program in an unattractive state. It creates
a new function for every distinct grouped subgraph it encounters. Sometimes,
when the group recurs after unfolding and thinning, these functions are recursive,
but more often they are nonrecursive and have exactly one use. These fabricated
186
functions may also be thinned in such a way that they wind up with unused inputs
or outputs.
Thinner cleans up all fabricated functions after carrying out the method of
kernels. It unfolds and eliminates all nonrecursive fabricated functions and it
eliminates any unused inputs and outputs from fabricated functions. It does not
perform this cleanup on functions provided by the compiler, but only on those that
were fabricated by the method of kernels.
These cleanup operations look like optimizations: in-line function expansion
(or "open coding") is a traditional compiler optimization, and the elimination
of unused function inputs and outputs seems like one too. But note that from
Thinner's point of view these are neutral transformations. Our formal statement
of the PLC does not attach any cost to a nonterminal other than the cost of the
terminals into which it expands.
Actually, these operations do have an impact that is more that merely aesthetic.
They affect the future behavior of the explorer by contracting the search space. The
explorer performs fewer unfoldings when fabricated functions have been inlined.
This can change the order in which Thinner treats future thinning examples.
6.8 Thinner transcripts
These examples are in the form of transcripts of Thinner sessions. They are exact
transcripts, except that some of the variable names generated by Thinner have been
changed to improve readability. (Thinner tries to preserve the original variable
names, but when it needs new variables it uses gensym to make new names.)
187
The examples use three top-level Thinner functions:
(suite filename) Compile the file named by the string filename (from Thin-
ner'stest-suitedirectory) and usethe resultingtrace grammarasthecurrent
grammar. This also resets the cumulative timer mentioned below.
(thin) Find one violation of the PLC and repair it in the current grammar. (If
no violation can be found this function diverges.) When it returns, this
function prints the message "Accumulated optimization time:" followed by
a measure, in seconds, of the time spent in thin; this time is computed
using the Common Lisp function get-internal-run-time. The timer is
cumulative, so it will grow with each call to (thin), until reset by (suite
filename)
(thin n) The thin function takes an optional parameter, which is a count of
iterations: (thin 2) is the same as two evaluations of (thin).
(decompile-all) Decompile the current grammar and print the resulting pro-
gram.
6.8.1 Common subexpression elimination
Common subexpression elimination is just a finger exercise for Thinner. Common
subexpressions could usually be detected without using the method of constraints,
just by inspecting the normal forms of expressions, and the optimization could
usually be performed just by thinning the right-hand sides of productions in the
grammar, without employing the method of kernels. But Thinner does not take
any shortcuts. It's a pretty large hammer for little nails like this:
188
> (suite IIcseiU)
Compiling test program CSEi (NIL).
(DEFUN CSEi (A B)
(CONS (OR A B) (CONS (OR A B) (CONS (OR A B) BNIL))))
NIL
> (thin)
Accumulated optimization time: O.O5s
NIL
> (decompile-all)
(DEFUN CSEi (A B)
(DECLARE (BOOLEAN A) (BOOLEAN B))
(LET ((C (OR A B))) (CONS C (CONS C (CONS C BNIL)))))
NIL
Thinner performs ?nte?wcedura1 common subexpression elimination with equal
facility. The first thinning it finds in this example is a normal common subexpres-
sion elimination:
> (suite 1cse3U)
Compiling test program F (NIL).
(DEFUN F (X Y Z) (+ (* Y X) Z))
Compiling test program CSE3 (NIL)
(DEFUN CSE3 (A B C) (F (+ A B) C (* C (+ A B))))
NIL
> (thin)
Accumulated optimization time: O.O4s.
NIL
189
> (decompile-all)
(DEFUN CSE3 (A B C)
(DECLARE (INTEGER A) (INTEGER B) (INTEGER C))
(LET ((D (+ A B))) (F D C (* C D))))
(DEFUN F (X Y Z)
(DECLARE (INTEGER X) (INTEGER Y) (INTEGER Z))
(+ (* Y X) Z))
NIL
But ask it for another thinning, and it will find and eliminate the interproce-
dural common subexpression:
NIL
> (thin)
Accumulated optimization time: O.18s.
NIL
> (decompile-all)
(DEFUN CSE3 (A B C)
(DECLARE (INTEGER A) (INTEGER B) (INTEGER C))
(LET ((D (* C (+ A B)))) (+ D D)))
(DEFUN F (X Y Z)
(DECLARE (INTEGER X) (INTEGER Y) (INTEGER Z))
(+ (* Y X) Z))
Note that in this case Thinner has cleaned up a fabricated function. The
method of kernels yields this, before cleanup:
NIL
(DEFUN CSE3 (A B C)
(DECLARE (INTEGER A) (INTEGER B) (INTEGER C))
(G A B C))
(DEFUN F (X Y Z)
(DECLARE (INTEGER X) (INTEGER Y) (INTEGER Z))
(+ (* Y X) Z))
(DEFUN G (A B C)
(DECLARE (INTEGER A) (INTEGER B) (INTEGER C))
(LET ((D (* C (+ A B))))
(+ D D)))
190
Thinner then unfolds and eliminates the fabricated function G, as discussed
above. In the examples below I will let this cleanup pass without further comment.
6.8.2 Dead code elimination and constant folding
Dead code elimination is another easy case ofthinning. Here's an example in which
one arm of a conditional can never be executed. In this example Thinner reasons
that an integer times its successor can never be equal to 1.
> (suite "dcei")
Compiling test program DOEl (NIL).
(DEFUN DCEi (A) (IF (EqUAL (* A (+ A 1)) 1) A 0))
NIL
> (thin)
Accumulated optimization time: O.03s.
NIL
> (decompile-all)
(DEFUN DCEi (A) (DECLARE (INTEGER A)) (IF FALSE A 0))
NIL
> (thin)
Accumulated optimization time: O.04s.
NIL
> (decompile-all)
(DEFUN DCEi (A) (DECLARE (INTEGER A)) 0)
NIL
It actually takes two steps to complete the dead code elimination. The first
step notes that the value on which the program branches is constant, while the
second thins away the predicate (and one of the two graphs in the executable set
of productions for DCEi).
191
All expression can also be "dead" in the sense that it computes a value which
is never used. In the following example, the expression (= 0 D) can never be
true; the inference engine discovers this as described in Section 4.2.4. Thinner
first eliminates (EQUAL C D), replacing it with the constant false. This lets it
eliminate the dead computations of C and D. Then, in another thinning, it discards
the predicate and the graph with no executions.
> (suite "dce2")
Compiling test program DCE2 (NIL).
(DEFUN DCE2 (X Y)
(LET ((A (CONS 1 X)) (B (CONS 0 INIL)))
(LET ((C (APPEND A B)) (D (APPEND Y A)))
(IF (EQUAL C D) (LENGTH A) (LENGTH B)))))
NIL
> (thin)
Accumulated optimization time: O.06s.
NIL
> (decompile-all)
(DEFUN DCE2 (X Y)
(DECLARE ((LISTOF INTEGER) X) ((LISTOF INTEGER) Y))
(MULTIPLE-VALUE-BIND (VO Vi)
(IF FALSE (VALUES 1 X) (VALUES 0 INIL))
(LENGTH (CONS VO Vi))))
NIL
> (thin)
Accumulated optimization time: O.06s.
NIL
> (decompile-all)
(DEFUN DCE2 (X Y)
(DECLARE ((LISTOF INTEGER) X) ((LISTOF INTEGER) Y))
(LENGTH (CONS 0 INIL)))
NIL
192
The program is not yet as thin as possible, since the expression (LENGTH (CONS
0 INIL)) is constant. In one more step Thinner discovers this constant folding:
> (thin)
Accumulated optimization time: O.08s
NIL
> (decompile-all)
(DEFUN DCE2 (X Y)
(DECLARE ((LISTOF INTEGER) X) ((LISTOF INTEGER) Y))
1)
NIL
6.8.3 Code sinking
We've already noted that code sinking in trace grammars works just like dead code
elimination. The decision about exactly where to place a conditional is made when
the grammar is decompiled.
> (suite "sink")
Compiling test program SINK (NIL)
(DEFUN SINK (A B)
(LET ((X (* A A))) (IF (> A B) (* x B) (* A B))))
NIL
> (thin)
Accumulated optimization time: 0.Ols.
NIL
> (decompile-all)
(DEFUN SINK (A B)
(DECLARE (INTEGER A) (INTEGER B))
(* (IF (> A B) (* A A) A) B))
NIL
193
6.8.4 Loop invariant removal
llere is the example we used in Section 6.7.3 to explain one of the difficulties that
occurs when applying the method of kernels to trace grammars--Hthe problem of
replacing a group of vertices with a single nonterminal, without introducing cycles.
> (suite `?lir1")
Compiling test program LIRi (NIL).
(DEFUN LIRi (A B) (IF (< A 0) (LIRi (+ A (* B (* B B))) B) A))
NIL
> (thin)
Accumulated optimization time: O.27s.
NIL
> (decompile-all)
(DEFUN LIRi (A B)
(DECLARE (INTEGER A) (INTEGER B))
(IF (<A 0) (LET ((C (* B (* B B)))) (F (+ A C) C)) A))
(DEFUN F (X Y)
(DECLARE (INTEGER X) (INTEGER Y))
(IF (< X 0) (F (+ X Y) Y) X))
NIL
Note that Thinner does not know that the function it's optimizing is a "loop"
This holds for the other loop optimizations that follow. Thinner performs general
elimination of unnecessary code using the method of kernels; sometimes this ends
up looking like one of the traditional loop optimizations, but Thinner treats a loop
the same as any other recursive function.
194
6.8.5 Loop splitting
To Thinner, loop invariant removal and loop splitting are quite similar.
> (suite "split")
Compiling test program F (NIL).
(DEFUN F (A B C)
(IF (EQUAL A B)
C
(+ (IF (< B C) A (* A A))
(F (--H A 1) B C))))
NIL
Thinner observes that the test (< B C) is invariant moves it outside of the loop,
just as it does with any loop-invariant computation. Then in two more thinning
steps, it makes two specialized loop bodies.
> (thin 3)
Accumulated optimization time: i.9s.
NIL
> (decompile-all)
(DEFUN F (A B C)
(DECLARE (INTEGER A) (INTEGER B) (INTEGER C))
(IF (EQUAL A B)
C
(MULTIPLE-VALUE-BIND (VO Vi)
(IF (< B C)
(VALUES A (Fl B C (- A 1)))
(VALUES (* A A) (F2 B C (- A 1))))
(+ VO Vi))))
(DEFUN Fl (B C A)
(DECLARE (INTEGER B) (INTEGER C) (INTEGER A))
(IF (EQUAL A B) C (+ A (Fl BC (-Al)))))
(DEFUN F2 (B C A)
(DECLARE (INTEGER B) (INTEGER C) (INTEGER A))
(IF (EQUAL A B) C (+ (* A A) (F2 BC (-Al)))))
195
6.8.6 Loop jamming
Loop jamming is another traditional compiler optimization Thinner can perform.
The following function F takes integer arguments I and START and returns two
values: the sum of START and the integers from 0 to I, and the sum START and
the squares of the integers from 0 to I. These two values are computed in separate
loops, and naturally we want to jam the loops together. Thinner accomplishes this
optimization, although by the fortunes of blind search it finds several others first.
> (suite "jam")
Compiling test program SUM (NIL)
(DEFUN SUM (I SOFAR)
(IF (EQUAL I 0) SOFAR (SUM (- I 1) (+ SOFAR I))))
Compiling test program SUMSQ (NIL).
(DEFUN SUMSQ (I SOFAR)
(IF (EQUAL I 0) SOFAR (SUMSQ (- I 1) (+ SOFAR (* I I)))))
Compiling test program F (NIL)
(DEFUN F (I START) (VALUES (SUM I START) (SUMSQ I START)))
NIL
Thinner first observes that near the end of the recursion in SUMSQ, the program
will compute (* I I) when I is 1.
> (thin)
Accumulated optimization time: O.58s.
NIL
196
> (decompile-all)
NIL
(DEFUN F (I START)
(DECLARE (INTEGER I) (INTEGER START))
(VALUES (SUM I START) (SUMS? I START)))
(DEFUN SUMS? (I SOFAR)
(DECLARE (INTEGER I) (INTEGER SOFAR))
(IF (EqUAL I 0) SOFAR (G I SOFAR (- I 1))))
(DEFUN SUM (I SOFAR)
(DECLARE (INTEGER I) (INTEGER SOFAR))
(IF (EqUAL I 0) SOFAR (SUM (- I 1) (+ SOFAR I))))
(DEFUN G (I SOFAR Y)
(DECLARE (INTEGER I) (INTEGER SOFAR) (INTEGER ?))
(IF (EqUAL Y 0)
(+ SOFAR 1)
(G Y (+ SOFAR (* I I)) (- Y 1))))
Next, Thinner observes that, when both SUM and SUMS? are unfolded in the body
of F, there are two computations of (= I 0). This looks like the loop jamming
we were hoping for. It takes several thinning steps since there is dead code to be
eliminated: when the two instances of (= I 0) are made into one, Thinner realizes
that one cannot be true and the other false, and eliminates the unreachable code.
This captures the idea that SUM and SUMS? must terminate after the same number
of iterations.
> (thin 4)
Accumulated optimization time: 2.Ois.
NIL
197
> (decompile-all)
NIL
(DEFUN F (I START)
(DECLARE (INTEGER I) (INTEGER START))
(IF (EQUAL I 0)
(VALUES START START)
(LET ((X (--H I 1)))
(VALUES (SUM X (+ START I)) (G I START X)))))
(DEFUN SUMSQ (I SOFAR)
(DECLARE (INTEGER I) (INTEGER SOFAR))
(IF (EQUAL I 0) SOFAR (G I SOFAR (- I 1))))
(DEFUN SUM (I SOFAR)
(DECLARE (INTEGER I) (INTEGER SOFAR))
(IF (EQUAL I 0) SOFAR (SUM (- I 1) (+ SOFAR I))))
(DEFUN G (I SOFAR Y)
(DECLARE (INTEGER I) (INTEGER SOFAR) (INTEGER Y))
(IF (EQUAL Y 0)
(+ SOFAR 1)
(G Y (+ SOFAR (* I I)) (- Y 1))))
It seemed like that last step was going to give us loop jamming, but it didn't.
It only jammed the first iteration. What went wrong?
The problem lies ill Thinner's use of a nonexhaustive kernel set. It got its
thinning example by unfolding SUM and SUMSQ in the body of F, and this path of
derivation was used to construct the basic kernel set. But there is another way to
get the same thinning example: by unfolding SUM and the fabricated function G
together. If Thinner had constructed an exhaustive kernel set, it would have found
this and completed the loop jamming. (Or if it had found this example before
constructing the function G, the basic kernel set would have been sufficient.)
But as things stand, another instance of the same thinning example is still there
to be found. It will only be found when Thinner unfolds SUM and G in the body
of F the explorer will do this when it looks for examples at depth two, but it
now has additional examples at depth one. In particular, it identifies a redundant
198
computation of the value (+ START I). When G is unfolded in the body of F it
computes that value twice.
> (thin)
Accumulated optimization time: 2.51s
NIL
> (decompile-all)
(DEFUN F (I START)
(DECLARE (INTEGER I) (INTEGER START))
(IF (EQUAL I 0)
(VALUES START START)
(LET ((M (--H I 1)))
(MULTIPLE-VALUE-BIND (VO Vi)
(IF (EQUAL M 0)
(LET ((N (+ START I))) (VALUES N N))
(VALUES (G M (+ START (* I I)) (- M 1))
(+ START I)))
(VALUES (SUM M Vi) VO)))))
(DEFUN SUMSQ (I SOFAR)
(DECLARE (INTEGER I) (INTEGER SOFAR))
(IF (EQUAL I 0) SOFAR (G I SOFAR (- I 1))))
(DEFUN SUM (I SOFAR)
(DECLARE (INTEGER I) (INTEGER SOFAR))
(IF (EQUAL I 0) SOFAR (SUM (- I 1) (+ SOFAR I))))
(DEFUN G (I SOFAR Y)
(DECLARE (INTEGER I) (INTEGER SOFAR) (INTEGER Y))
(IF (EQUAL Y 0)
(+ SOFAR 1)
(G Y (+ SOFAR (* I I)) (- Y 1))))
NIL
Next, still at depth one, there is a redundant computation of (EQUAL M 0).
When SUM is unfolded in the body of F it computes (EQUAL M 0) twice.
> (thin 2)
Accumulated optimization time: 3.26s.
NIL
199
> (decompile-all)
(DEFUN F (I START)
(DECLARE (INTEGER I) (INTEGER START))
(IF (EQUAL I 0)
(VALUES START START)
(LET ((M (--H I 1)))
(IF (EQUAL M 0)
(LET ((N (+ START I))) (VALUES N N))
(VALUES (SUM (- M 1) (+ (+ START I) M))
(G M (+ START (* I I)) (--H M 1)))))))
(DEFUN SUMSQ (I SOFAR)
(DECLARE (INTEGER I) (INTEGER SOFAR))
(IF (EQUAL I 0) SOFAR (G I SOFAR (- I 1))))
(DEFUN SUM (I SOFAR)
(DECLARE (INTEGER I) (INTEGER SOFAR))
(IF (EQUAL I 0) SOFAR (SUM (- I 1) (+ SOFAR I))))
(DEFUN G (I SOFAR Y)
(DECLARE (INTEGER I) (INTEGER SOFAR) (INTEGER Y))
(IF (EQUAL Y 0)
(+ SOFAR 1)
(G Y (+ SOFAR (* I I)) (- Y 1))))
NIL
Now there is an optimization at depth zero: aredundant computation of (- M
1) inthebodyofF. (Notethatthecomputationof(+ START I) is notredundant.)
> (thin)
Accumulated optimization time: 3.45s.
NIL
> (decompile-all)
(DEFUN F (I START)
(DECLARE (INTEGER I) (INTEGER START))
(IF (EQUAL I 0)
(VALUES START START)
(LET ((M (--H I 1)))
(IF (EQUAL M 0)
(LET ((N (+ START I))) (VALUES N N))
(LET ((P (- M 1)))
(VALUES (SUM P (+ (+ START I) M))
(G M (+ START (* I I)) P)))))))
200
(DEFUN SUMSQ (I SOFAR)
(DECLARE (INTEGER I) (INTEGER SOFAR))
(IF (EQUAL I 0) SOFAR (G I SOFAR (- I 1))))
(DEFUN SUM (I SOFAR)
(DECLARE (INTEGER I) (INTEGER SOFAR))
(IF (EQUAL I 0) SOFAR (SUM (- I 1) (+ SOFAR I))))
(DEFUN G (I SOFAR Y)
(DECLARE (INTEGER I) (INTEGER SOFAR) (INTEGER Y))
(IF (EQUAL Y 0)
(+ SOFAR 1)
(G Y (+ SOFAR (* I I)) (- Y 1))))
NIL
Finally, there are no more thinning examples to be found before depth two.
Thinner now finds the other way to generate the thinning example for loop jam-
ming, the one we missed before by using a nonexhaustive kernel set. Unfolding
both SUM and G in the body of F yields a graph that computes (EQUAL P 0) twice.
Subsequent thinnings clean up the dead code that results, and this completes the
optimization.
> (thin 4)
Accumulated optimization time: 8.is.
NIL
> (decompile-all)
(DEFUN F (I START)
(DECLARE (INTEGER I) (INTEGER START))
(IF (EQUAL I 0)
(VALUES START START)
(LET ((M (--H I 1)))
(IF (EQUAL M 0)
(LET ((N (+
(H (+ START
(DEFUN SUMSQ (I SOFAR)
(DECLARE (INTEGER I) (INTEGER SOFAR))
(IF (EQUAL I 0) SOFAR (G I SOFAR (- I 1))))
(DEFUN SUM (I SOFAR)
(DECLARE (INTEGER I) (INTEGER SOFAR))
(IF (EQUAL I 0) SOFAR (SUM (- I 1) (+ SOFAR I))))
START I))) (VALUES N N))
(* I I)) (+ (+ START I) M) M (--H M 1))))))
201
(DEFUN G (I SOFAR Y)
(DECLARE (INTEGER I) (INTEGER SOFAR) (INTEGER Y))
(IF (EqUAL Y 0)
(+ SOFAR 1)
(G Y (+ SOFAR (* I I)) (- Y 1))))
(DEFUN H (A B C D)
(DECLARE (INTEGER A) (INTEGER B) (INTEGER C) (INTEGER D))
(IF (EqUAL D 0)
(VALUES B (+ A 1))
(H (+ A (* CC)) (+ B D) D (-Dl))))
NIL
6.8.7 Fibonacci reformulation
This is the Fibonacci example we walked through in Section 5.6. The first thinning
step does most of the work, making the running time linear.
> (suite "fib?')
Compiling test program FIB (NIL)
(DEFUN FIB (A) (IF (<A 2) A (+ (FIB (-Al)) (FIB (-A 2)))))
NIL
> (thin)
Accumulated optimization time: O.28s.
NIL
> (decompile-all)
(DEFUN FIB (A)
(DECLARE (INTEGER A))
(IF (< A 2)
A
(MULTIPLE-VALUE-BIND (X Y) (F A (- A 1)) (+ Y X))))
202
(DEFUN F (A B)
(DECLARE (INTEGER A) (INTEGER B))
(IF (< B 2)
(VALUES (FIB (- A 2)) B)
(MULTIPLE-VALUE-BIND (X Y)
(F B (- B 1))
(LET ((T4069 (- A 2))) (VALUES Y (+ Y X))))))
NIL
This is good, but not yet as thin as possible. There is an unnecessary compu-
tation of (- A 2). A second effort removes it:
> (thin)
Accumulated optimization time: O.33s.
NIL
> (decompile-all)
(DEFUN FIB (A)
(DECLARE (INTEGER A))
(IF (< A 2)
A
(MULTIPLE-VALUE-BIND (X Y) (F A (- A 1)) (+ Y X))))
(DEFUN F (A B)
(DECLARE (INTEGER A) (INTEGER B))
(IF (< B 2)
(VALUES (FIB (- A 2)) B)
(MULTIPLE-VALUE-BIND (X Y)
(F B (- B 1))
(VALUES Y (+ Y X)))))
NIL
We're still not done. Thinner notes that in F, if (< B 2), then A must be 2.
> (thin)
Accumulated optimization time: O.68s.
NIL
203
> (decompile-all)
(DEFUN FIB (A)
(DECLARE (INTEGER A))
(IF (< A 2)
A
(MULTIPLE-VALUE-BIND
(DEFUN F (B)
(DECLARE (INTEGER B))
(IF (< B 2)
(VALUES B (FIB 0))
(MULTIPLE-VALUE-BIND (X Y)
(F (- B 1))
(VALUES (+ X Y) X))))
NIL
(X Y) (F (- A 1)) (+ x y))))
Now (FIB 0) is a constant, which Thinner eliminates in two more steps:
> (thin 2)
Accumulated optimization time: 1.21s
NIL
> (decompile-all)
(DEFUN FIB (A)
(DECLARE (INTEGER A))
(IF (< A 2)
A
(MULTIPLE-VALUE-BIND (X Y) (F (- A 1)) (+ x y))))
(DEFUN F (B)
(DECLARE (INTEGER B))
(IF (< B 2)
(VALUES B 0)
(MULTIPLE-VALUE-BIND (X Y)
(F (- B 1))
(VALUES (+ X Y) X))))
NIL
We're still not done. For (FIB 2) the program will compute (+ X Y), where Y
is zero. Thinner repairs this in the next step:
204
> (thin)
Accumulated optimization time: 1.5s.
NIL
> (decompile-all)
(DEFUN FIB (A)
(DECLARE (INTEGER A))
(IF (< A 2) A (MULTIPLE-VALUE-BIND
(DEFUN H (B)
(DECLARE (INTEGER B))
(IF (< B 2)
(VALUES B B)
(MULTIPLE-VALUE-BIND (X Y)
(H (- B 1))
(VALUES (+ X Y) X))))
NIL
(X Y) (H (- A 1)) X)))
In fact, Thinner never stops finding optimizations for this function. For each
subsequent thinning, it extracts the next base case of the computation and renders
it as a constant. If we thin, say, five more times we'll end up with this:
> (thin 5)
Accumulated optimization time: 5.21s.
NIL
> (decompile-all)
(DEFUN FIB
(DECLARE
(IF (< A
A
(LET
(IF
(A)
(INTEGER A))
2)
((B (--H A 1)))
(< B 2)
B
(LET ((C
(IF (<
2
(LET
(IF
(--H B 1)))
C 2)
((D (--H C 1)))
(< D 2)
3
(LET ((E (- D 1)))
205
(IF
(< E 2)
5
(LET
(IF
(DEFUN H (B)
(DECLARE (INTEGER B))
(IF (< B 2)
(VALUES 13 8)
(MULTIPLE-VALUE-BIND (X Y)
(H (- B 1))
(VALUES (+ X Y) X))))
NIL
6.9 What Thinner can't do
((F (--H E 1)))
(< F 2)
8
(MULTIPLE-VALUE-BIND
(H (- F 1))
(X Y)
First of all, Thinner can't find optimizations that are not thinnings! Low-level opti-
mizations like register allocation, peephole optimization and instruction scheduling
are obviously out of the picture, and some high-level optimizations are not thin-
nings either.
Some traditional optimizations like code hoisting and strength reduction make
use of a more complicated cost model than the formal PLC. Code hoisting reduces
the size of a program without reducing the length of any path through it; this is
not thinning since the PLC says nothing about the size of the grammar. Strength
reduction is useful because some primitive operations are more efficient than others;
this is not justified by the PLC, which treats all terminals as having unit cost.
Strength reduction fails to be a thinning for a more obvious reason: Thinner has
no way to substitute one primitive for another. When we thin a program we remove
206
redundant computations, but we never add, alter or reorder the computations that
are preserved. For this reason, not all algebraic optimizations are thinnings. For
example, we might want to optimize (+ (* a b) (* a c)) to (* a (+ b c)),
but this is clearly not a thinning, for the simple reason that the intermediate value
(+ b c) did not occur in the original.
Because it uses the relational constraint method of inference, Thinner will some-
times miss thinning examples that seem reasonably clear to a human programmer.
For example, consider this function:
(defun 11r2 (a b c)
(declare ((listof integer) c))
(if (or (= a inil) (> (length c) (length a)))
b
(lir2 (cdr a) (cons (car a) b) (cdr c))))
The expression (> (length c) (length a)) is loop-invariant, since lists a and
c shrink in unison; but Thinner does not detect this.
Thinner uses the method of kernels, which is only an approximate solution to
the trace grammar thinning problem, and it does not compute an exhaustive kernel
set. So there are cases in which Thinner will identify a thinning example but not
be able to remove all violations of that example from the grammar.
One final limitation of Thinner involves the way it searches for thinning exam-
ples: depth-first iterative deepening, stopping at the first violation of the PLC it
sees. As the Fibonacci example shows, Thinner may start materializing the con-
stant bases cases of a recursion. Once such a function appears in the grammar,
Thinner will never seek any deeper. It will always find the next base case at the
same depth of derivation. This can mask other, more useful optimizations that
might be found if Thinner searched more deeply.
Chapter 7
Conclusion
7.1 Summary
In the first chapter 1 began by noting that in terms of power and flexibility, people
make better optimizers than machines. Observing that people make use of high-
level principles of optimization, I set out to develop an automatic optimizer with
this same ability: the ability to derive useful optimizations as necessary from a
high-level, intuitive principle.
The remaining chapters fought for this goal. The battle for an ambitious,
underspecified goal like this is always a rear-guard action: the trick is to retreat
into a territory that is implementable while yielding as little ground as possible.
But some ground is inevitably lost. Let's review the results:
o+ I decided to stake out a single principle (the Principle of Least Computation)
and a single domain (the domain of TL programs). Human programmers
don't operate with such a restriction, so this certainly represents a significant
reduction in territory. For a single dissertation such a restriction is inevitable.
207
208
o+ In Chapter 2 I chose a formalization of the principle. This formal statement
of the PLC (Definition 2.21) in terms of the thinner-than relation on trace
grammars may agree with the informal statement, but there can be no proof
of this. I acknowledge that I may have given up more ground here.
o+ The question of whether a thinning is conservative or not is undecidable
(Theorem 2.4). It follows that ground must be lost here too: we cannot iden-
tify all opportunities for thinning. But this undecidability is less important
in practice than the intractability of general inference.1 To deal with this
intractability I developed in Chapter 4 the method of relational constraint,
which completes in polynomial time but, naturally, gives up more territory.
o+ In Chapter 5 I chose a method for repairing violations of the principle. I found
an approximate solution to the grammar-thinning problem for context-free
grammars. This solution, the method of kernels, applies reasonably well
to trace grammars. But the method is not perfect. One can construct a
grammar and a thinning example for which the method yields an optimized
grammar that still makes the mistake illustrated in the example.
o+ A final loss of ground occurred in the implementation of Thinner. I chose to
use a non-exhaustive kernel set and a simple method of searching for thinning
examples.
Thinner itself delineates the territory that remains. It discovers a large, uniform
space of optimizations, and demonstrates that the ideas developed in this thesis
1In my earliest sketch of a design for Thinner I drew a box labeled "Inference Engine". I
have learned to see this as a serious mistake. One might as well draw a box labeled "Lots of
Computation" or "A Miracle Occurs". Thinner would not have been successful without efficient
inference, and general-purpose efficient inference is not available as an off-the-shelf component.
209
are practical. It is possible to derive optimizations as needed from an intuitive,
high-level principle.
7.2 Open problems
We have seen that the path from the idea of principled optimization to its realiza-
tion in Thinner involved many compromises. Each of these compromises represents
an open problem: can it be improved?
o+ There are certainly other principles of optimization and other interesting
domains of computational mechanisms. I think the domain of sequential
digital circuits would be an interesting one to investigate. The PLC applies
here, but less absolutely: sometimes a little redundancy yields a better layout
and hence a better circuit. As in real-world planning problems, there are
resource constraints to consider.
The digital circuit domain is an example of a domain for which there is no
well-established canon of rote optimizations. Applying compiler optimiza-
tions to the source code used in high-level circuit synthesis sometimes makes
the circuit worse; this is noted by McFarland, Parker and Camposano as one
of the important open problems in circuit synthesis [MPC88]. Principled
optimization may provide some leverage on this problem. (It is interesting
to note in this context that the high-level circuit synthesis community has
recently shown increased interest in functional languages [JWB89] [BT89]
[Joh83].)
o+ The formalization of the PLC could be expanded. The most significant weak-
ness of the current formalization is that it requires substitution with identity,
210
where substitution with adequacy might be more natural. For example, if
a program computes (length a), and if there is a list b available which,
though not equal to a, has the same length as a, then we might want to be
able to substitute b for a and eliminate the computations that generated a.
This transformation is not justified by my formalization of the PLC since it
is not a conservative thinning. But it may well be justified by the intuitive
principle, so perhaps the formalization should be strengthened.
I imagine defining another class of thinnings, the liberal thinnings, that allow
substitution with adequacy. But the notion of adequacy is tricky. Should we
insist that a function remain adequate with respect to some specification?
This is the approach taken by Hall [Hal9O], but I am uncomfortable with
the idea of depending on external specifications since, in practice, functions
that come with formal specifications are rare. Should we insist on functional
equivalence and allow substitution with adequacy only for intermediate val-
ues?
o+ Greater inference power could be developed. Trace grammars often suggest
simple inductions: a nonterminal will often have obvious "base cases" (fully
terminal productions) and "inductive cases" (recursive productions). Per-
haps a method for induction could be added to Thinner's semantic analyzer.
o+ The CFG thinning problem is open. Any solution to the CFG thinning
problem may yield an optimization technique for trace grammars, as the
method of kern?ds did. This potential application makes the CFG thinning
problem especially interesting.
One CFG thinning solution is the FSM method (this is the one that works on
fully terminal thinning examples ? with M(?) regular, from Section 5.2.1).
211
I have not yet examined this to see whether it might yield an optimization
technique for trace grammars. If it does, the restriction that it only applies
when ? is fully terminal might be relaxed: in the CFG case one can sometimes
treat nonterminals in ? as if they were terminal (provided such nonterminals
do not contribute to instances of ? after expansion).
o+ The implementation limitations of Thinner might yield to further research.
In this area two advances would be especially useful: a way to construct a
larger collection of useful kernels at reasonable expense, and a more intelligent
way to search the space of possible thinning examples.
7.3 The benefits of principled optimization
The problems addressed in this thesis have not yet received wide attention, so I
will conclude by explaining why I think they are worth studying.
7.3.1 The science of optimization
Flip through any standard text on compiler construction and you will find a collec-
tion of recommended compiler optimizations. Decades of research and development
and thousands of person-years of field work have produced this treasure, the mod-
ern canon of compiler optimizations. Let any who would alter this canon beware:
with so much research behind it, how can it be far from optimal?
While this canon is of unquestionable value to compiler engineers, it leaves
many questions unanswered. From compiler texts one gets the impression that
the canon is everything: optimization is just a list of special cases, a grab-bag of
hacker's tricks. But where's the science in that?
212
Thinner demonstrates that a variety of compiler optimizations that were for-
merly thought of as independent tricks are actually instances of a single principle
at work. It unifies and illuminates these well-known optimizations by showing that
they can be derived automatically from the PLC. There is a science of optimization,
and it is found in the study of the principles that justify optimization.
Looking to the future: if it took decades of research and thousands of person-
years to come up with this canon for Von Neumann architectures, must we repeat
that effort to develop satisfactory compilers for each new architecture? Perhaps,
with advances in the science of optimization, we will be able to develop optimiza-
tions by some more refined method than trial and error.
7.3.2 The tools of optimization
The development of Thinner has produced two tools which may be useful outside
of principled optimization: the relational constraint method of inference, and the
method of kernels for program transformation,
The relational constraint method is a fast method of inference which should
be applicable to other areas. Whenever part of an inference task can be treated
as constraining k-ary relations for fixed k, and whenever part of that vocabulary
of relations can be captured in a finite lattice, the relational constraint method
may be of assistance. We have already observed that the method would be easy
to parallelize.
The method of kernels is not restricted to thinning, but could apply to a vari-
ety of different optimizations. It serves to expose instances of a particular example
graph in the right-hand sides of productions in the grammar. From there, what
you do to transform the graph is relatively unconstrained. Nor is the method of
kernels the last word on grammar thinning. The more important, fundamental con-
213
tribution is the observation that one can exploit the connection between program
optimization and grammar transformation.
7.3.3 Practical applications
Thinner is a proof of concept, not an application; there are no TL programs outside
the Thinner project. But I believe that principled optimization will ultimately find
a place in practical optimizing tools.
Let me make it clear that I am not advocating the construction of a practical
optimizer that painfully rediscovers loop invariant removal over and over. The
world does not need another extremely slow compiler. The canon of rote compiler
optimizations has its place: these are precisely the optimizations that experience
shows are easy to find, and occur frequently enough to be worth looking for. The
canon is both necessary and, in most cases, sufficient. Studies suggest that state-
of-the-art optimizing compilers find most of the optimization there is to be found
in most programs, so that further optimization by hand rarely yields as much as a
factor of 2 in speed [Pot9lJ.
Expert human programmers rarely write programs that contain egregious vio-
lations of the PLC. It is, after all, an intuitive principle of optimization with which
we are extremely familiar. I believe that principled optimization has little practical
potential for programs written by human experts.
But the future of software engineering holds more and more mechanical assis-
tance for programmers: more and more programs written by machines, or written
by novices with mechanical assistance. I do not refer to full-blown automatic
programming only, but to the trend toward higher-level tools and higher levels
of abstraction in general: object libraries, very high level languages, specification
languages, proofs as programs, etc. We already see that many compiler optimiza-
214
tions are useful largely because they repair inefficiencies introduced by high-level
language compilation: most loop invariants, for example, are array indexing in-
structions not explicit in the source text.
The well-known "argument from re-use" draws on this idea. This is the ar-
gument that high-level optimization techniques will promote the re-use of code
from procedure and object libraries, by allowing general routines to be specialized
automatically [Pot91] [llal9O] [Tur86]. In effect, this argument says that while
the original, human-coded functions may not benefit from advanced optimization,
machine-generated specializations of them will.
I believe that as we move to high-level tools and higher levels of abstraction,
we must be prepared to build in greater optimizing sophistication. The expert
human programmer's actions are informed by principles of optimization, and if
lower-level actions are to be carried out by tools, these tools must have access to
similar principles. This is the practical importance of the study of the principles
of optimization.
Epilogue
Burn This Thesis
Seized with a prolific, vernal mood
It sings again the old arboreal hymn:
A verse, a year of life, a layer of wood,
An ever tighter trap of trunk and limb.
I weave, my hands are knotted on the loom,
I think, my mind divided into thought,
I feel a joyful urge to build a tomb,
I act, the walls solidify, I'm caught.
And God is everywhere. What must it mean
To make the world and with the world to join!
How must He brood within each dark machine
As He receives Himself in each dark coin!
Come, devour, plane, exalt, make low,
Release us in a gentle even heat.
215
Bibliography
[A1187]
[AN87]
[Bac85]
Samson Abramsky and Chris Hankin. An introduction to abstract in-
terpretation. In Samson Abramsky and Chris Hankin, editors, Abstract
Interpretation of Declarative Languages, chapter 1. Ellis Horwood Lim-
ited, 1987.
Arvind and R. Nikhil. Executing a program on the MIT tagged-token
datafiow architecture. In PARLE: Parallel Architectures and Languages
Europe, volume 2. Springer-Verlag, June 1987.
John Backus. From function level semantics to program transforma-
tion and optimization. In H. Ehrig et al., editors, Proceedings of the
International Joint Conference on Theory and Practice of Software De-
velopment, pages 60--H91. Springer-Verlag, March 1985. Lecture Notes in
Computer Science 185.
[BD77] R. Burstall and J. Darlington. A transformation system for developing
recursive programs. Journal of the Association for Computing Machin-
ery, 24(1), January 1977.
[Bll90]
[Bib91]
[BM75]
H. Bunke and B. Haller. A parser for context free plex grammars. In
M. Nagi, editor, WG `89 Craph-Theoretic Concepts in Computer Sci-
ence: Proceedings of the 15th International Workshop WC `89. Springer-
Verlag, 1990. Lecture Notes in Computer Science 411.
W. Bibel. Perspectives on automated deduction. In Robert 5. Boyer,
editor, Automated Reasoning, chapter 4. Kluwer Academic Publishers,
1991.
Robert 5. Boyer and J. Strother Moore. Proving theorems about
LISP functions. Journal of the Association for Computing Machinery,
22(1):129--H144, January 1975.
[BT89] Alexandre Bronstein and Carolyn L. Talcott. Formal verification of
pipelines based on string-functional semantics. In Luc J. M. Claesen,
216
217
editor, International Workshop on Applied Formal Methods for Correct
VLSI Design, pages 347--H364. Elsevier, November 1989.
[C+86j Robert L. Constable et al. Implementing mathematics with the Nuprl
proof development system. Prentice-Hall, 1986.
[CC76]
[CC77j
[CC79]
[CD93]
[CH78]
[Cho62]
[Coh83]
[CS7O]
Patrick Cousot and Radhia Cousot. Static determination of dynamic
properties of programs. In Proceedings of the 2nd International Sympo-
sium on Programming, 1976.
Patrick Cousot and Radhia Cousot. Abstract interpretation: a unified
lattice model for static analysis of programs by construction or approx-
imation of fixpoints. In Conference Record of the 4th Annual ACM
Symposium on Principles of Programming Languages, pages 238--H252,
1977.
Patrick Cousot and Radhia Cousot. Systematic design of program anal-
ysis frameworks. In Conference Record of the 6th ACM Symposium on
Principles of Programming Languages, 1979.
Charles Consel and Olivier Danvy. Thtorial notes on partial evalua-
tion. In Conference Record of the Twentieth Annual ACM SICPLAN-
SICA CT Symposium on Principles of Programming Languages, pages
493--H501. ACM, January 1993.
Patrick Cousot and N. Halbwachs. Automatic discovery of linear re-
straints among variables of a program. In Conference Record of the 5th
ACM Symposium on Principles of Programming Languages, 1978.
Noam Chomsky. Context-free grammars and puslidown storage. In
Quarterly Progress Report No. 65, pages 187--H194. MIT Res. Lab. Elect.,
1962.
Norman H. Cohen. Eliminating redundant recursive calls. ACM Trans-
actions on Programming Languages and Systems, 5(3):265--H299, July
1983.
J. Cocke and J. T. Schwartz. Programming Languages and Their Com-
pilers; Preliminary Notes. Courant Institute of Mathematical Sciences,
New York University, 1970.
Peter J. Downey, Ravi Sethi, and Robert Endre Tarjan. Variations
on the common subexpression problem. Journal of the Association for
Computing Machinery, 27(4):758--H771, October 1980.
[DST80]
218
[Ers78] A. P. Ershov. On the essence of compilation. In E. J. Neuhold, editor,
Formal Description of Programming Concepts. North-Holland, 1978.
[Ers82] A. P. Ershov. Mixed computation: potential applications and problems
for study. Theoretical Computer Science, 18,1982.
[Fed71] Jerome Feder. Plex languages. Information Sciences, 3:225--H241, 1971.
[FH88] Anthony J. Field and Peter G. Harrison. Functional Programming.
Addison-Wesley, 1988.
[FLS64] Richard P. Feynman, Robert B. Leighton, and Matthew Sands. The
Feynman Lectures on Physics, volume 2. Addison-Wesley, 1964.
[Fut71] Y. Futamura. Partial evaluation of computation process an approach
to a compiler-compiler. Systems, Computers, Controls, 2(5), 1971.
[GJ79j Michael R. Garey and David 5. Johnson. Computers and Intractability.
W. H. Freeman and Company, 1979.
[Gra89] Philippe Granger. Static analysis of arithmetical congruences. Interna-
tional Journal of Computer Mathematics, 30,1989.
[Gra9 1]
[G590]
[Hal9O]
Philippe Granger. Static analysis of linear congruence equalities among
variables of a program. In TAPSOFT `91 Proceedings of the Interna-
tional Joint Conference on Theory and Practice of Software Develop-
ment, pages 169--H192. Springer-Verlag, April 1991. Lecture Notes in
Computer Science 493.
C. A. Gunter and D. 5. Scott. Semantic domains. In Jan van Leeuwen,
editor, Handbook of Theoretical Computer Science, volume B, chap-
ter 12. The MIT Press/Elsevier, 1990.
Robert Joseph Hall. Program improvement by automatic redistribution
of intermediate results. Technical Report AI-TR 1251, MIT Al Lab,
December 1990.
[HU79] John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata
Theory, Languages, and Computation. Addison-Wesley, 1979.
[Hud87]
Paul Hudak. A semantic model of reference counting and its abstraction.
In Samson Abramsky and Chris Hankin, editors, Abstract Interpretation
of Declarative Languages, chapter 3. Ellis Horwood Limited, 1987.
John Hughes. Analysing strictness by abstract interpretation of con-
tinuations. In Samson Abramsky and Chris Hankin, editors, Abstract
Interpretation of Declarative Languages, chapter 4. Ellis Horwood Lim-
ited, 1987.
[Hug87]
219
[IK87]
[JGB+90]
Klaus Indermark and Herbert Klaeren. Compiling fibonacci-like recur-
sion. SIGPLAN Notices, 22(6):101--H107, June 1987
Neil D. Jones, Carsten K. Gomard, Anders Bondorf, Olivier Danvy, and
Torben Mogensen. A self-applicable partial evaluator for the lambda
calculus. In IEEE International Conference on Computer Languages,
pages 49--H58, 1990.
[JM84] J.P. Jones and Y.V. Matijasevic. Diophantine representation of the set
of prime numbers. Journal of Symbolic Logic, 49,1984.
[Joh83]
Steven Dexter Johnson. Synthesis of Digital Designs from Recursion
Equations. Ph.E. dissertation, Indiana University, May 1983. MIT
Press, 1984.
[Joh86] J. Howard Johnson. Rational equivalence relations. Theoretical Com-
puter Science, 47:39--H60,1986.
[JWB89]
Steven Dexter Johnson, Robert M. Wehrmeister, and Bhaskar Bose. On
the interplay of synthesis and verification. In Luc J. M. Claesen, editor,
International Workshop on Applied Formal Methods for Correct VLSI
Design. Elsevier, November 1989.
[Kar76] M. Karr. Affine relationships among variables of a program. Acta In-
formatica, 6,1976.
[KS81]
R. B. Kieburtz and J. Shultis. Transformations of FP program schemes.
In Proceedings of the Conference on Functional Programming Languages
and Computer Architecture. ACM, October 1981.
[LN73] Hans Lausch and Wilfried N5bauer. Algebra of Polynomials. North-
Holland, 1973.
[MPC88]
Michael C. McFarland, Alice C. Parker, and Raul Camposano. Thtorial
on high-level synthesis. In 25th Design Automation Conference, pages
330--H336. ACM/IEEE, 1988.
[MR79] E. Morel and C. Renvoise. Global optimization by supression of partial
redundancies. Communications of the ACM, 22(2), 1979.
[Myc87] Alan Mycroft. A study on abstract interpretation and `validating mi-
crocode algebraically'. In Samson Abramsky and Chris Hankin, editors,
Abstract Interpretation of Declarative Languages, chapter 9, pages 199--H
218. Ellis Horwood Limited, 1987.
220
[Per89]
Mark Perlin. Call-graph caching: Transforming programs into networks.
In Proceedings of the Eleventh International Joint Conference on Arti-
ficial Intelligence, pages 122--H128, 1989.
[Pot91] I. V. Pottosin. Analysis of program optimization possibilities and fur-
ther development. Theoretical Computer Science, 90:17--H36,1991.
[Ric86]
Charles Rich. A formal representation for plans in the Programmer`s
Apprentice. In Charles Rich and Richard C. Waters, editors, Readings
in Artificial Intelligence and Software Engineering. Morgan Kaufn?ann,
1986.
[Roh89] J. 5. Rohl. Recursive and iterative functions for generating Fiboaacci
numbers. Technical Report TR 89-1017, Cornell University, June 1989.
[Roz87] Grzegorz Rozenberg. An introduction to the NLC way of rewriting
graphs. In Hartmut Ehrig, Manfred Nagl, Grzegorz Rozenberg, and
Azriel Rosenfeld, editors, Graph Crammars and Their Application to
Computer Science. Springer-Verlag, 1987. Lecture Notes ill Computer
Science 291.
[RW88] Charles Rich and Richard C. Waters. A research overview. Computer,
21(11):11--H25, November 1988.
[RWZ88]
Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Clobal
value numbers and redundant computation. In Proceedings of the Fif-
teenth Annual ACM Symposium on Principles of Programming Lan-
guages, pages 12--H27, January 1988.
[Sa173] Arto Salomaa. Formal Languages. Academic Press, 1973.
[SG87j
Devika Subramanian and Michael R. Genesereth. The relevance of ir-
relevance. In Proceedings of the Tenth International Joint Conference
on Artificial Intelligence, 1987.
[Ste9O] Guy Steele. Common Lisp: the Language, 2nd Edition. Digital j?ress,
1990.
[5ub89] Devika Subramanian. A Theory of Justified Reformulations. I)h.D.
dissertation, Stanford University, March 1989.
Valentin F. Thrchin, Robert M. Nirenberg, and Dimitri V. Thrchin.
Experiments with a supercompiler. In Conference Record of the 1982
ACM Symposium on Lisp and Functional Programming, pages 47--H55,
August 1982.
[TNT82]
221
[Tur80]
Valentin F. Turchin. The use of metasystem transition in theorem prov-
ing and program optimization. In Automata, Languages and Program-
ming, the 7th Colloquium, pages 645--H657. Springer-Verlag, 1980. Lec-
ture Notes in Computer Science 85.
[Thr86] Valentin F. Thrchin. The concept of a supercompiler. ACM Transac-
tions on Programming Languages and Systems, 8(3):292--H325, July 1986.
[Wad87]
[Web92a]
Phil Wadler. Strictness analysis on non-flat domains (by abstract inter-
pretation over finite domains). In Samson Abramsky and Chris Hankin,
editors, Abstract Interpretation of Declarative Languages, chapter 12.
Ellis Horwood Limited, 1987.
Adam Brooks Webber. A formal definition of unnecessary computation
in functional programs. Technical Report TR 92-1260, Cornell Univer-
sity, January 1992.
[Web92b] Adam Brooks Webber. Relational constraint: a fast semantic analysis
technique. Technical Report TR 92-1319, Cornell University, December
1992.
[Web92c] Adam Brooks Webber. Thinning context-free languages. Technical Re-
port TR 92-1318, Cornell University, December 1992.
[WZ91j Mark N. Wegman and F. Kenneth Zadeck. Constant propagation with
conditional branches. ACM Transactions on Programming Languages
and Systems, 13(2):181--H210, April 1991.
