BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR93-1361
ENTRY:: 1993-10-14
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: Alex - A Paradigm for Expressing and Compiling Matrix Functions
AUTHOR:: Ressler, Gene K.
DATE:: June 1993
PAGES:: 201
COPYRIGHT:: Eugene Kenneth Ressler - All Rights Reserved
ABSTRACT::
This work presents formal and practical tools to support the Alex paradigm 
for expressing and compiling matrix functions. Alex programs are recursive 
definitions over matrices with the same flavor as the elegant FP languages of 
Backus. Many useful matrix algorithms can be expressed in both FP and Alex 
without any explicit index arithmetic. While FP is very difficult to compile 
to fast numerical codes, we show that Alex functions admit compile-time 
analyses and code generation techniques with efficient results. In 
particular, we give a type system that expresses matrix sizes and shapes, a 
sound and complete inference algorithm for these types, and a code generation
algorithm that is space-efficient--storage is allocated only for user function 
parameters and results. Finally, we give an algorithm that determines when it 
is safe to replace call by value array parameters in the compiled code with 
call by reference. A compiler using these techniques generates C code from 
Alex programs which is at least as fast as code one would write by hand for 
some small problems. We conclude with discussion of how the fundamental 
techniques given here relate to previous work and how they should be extended 
to a general functional language or incorporated as a feature of an existing 
one.
END:: CORNELLCS//TR93-1361
BODY::
Alex - A Paradigm for Expressing and
Compiling Matrix Functions
Eugene Kenneth Ressler
Ph?D Thesis
93-1361
June1993
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
ALEX A PARADIGM FOR EXPRESSING AND
COMPILING MATRIX FUNCTIONS
A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
by
Eugene Kenneth Ressler
May 1993
Qc Eugene Kennetli Ressler 1993
ALL RIGHTS RESERVED
ALEX --H A PARADIGM FOR EXPRESSING AND
COMPILING MATRIX FUNCTIONS
Eugene Kenneth Ressler, Ph.D.
Cornell University 1993
This work presents formal and practical tools to support the Alex paradigm for
expressing and compiling matrix functions. Alex programs are recursive definitions
over matrices with the same flavor as the elegant FP language of Backus. Many useful
matrix algorithms can be expressed in both FP and Alex without any explicit index
arithmetic. While FP is very difficult to compile to fast numerical codes, we show
that Alex functions admit compile-time analyses and code generation techniques
with efficient results. In particular, we give a type system that expresses matrix
sizes and shapes, a sound and complete inference algorithm for these types, and a
code generation algorithm that is space-efficient--Hstorage is allocated only for user
function parameters and results. Finally, we give an algorithm that determines when
it is safe to replace call by value array parameters in the compiled code with call by
reference. A compiler using these techniques generates C code from Alex programs
which is at least as fast as code one would write by hand for some small problems.
We conclude with discussion of how the fundamental techniques given here relate to
previous work and how they should be extended to a general functional language or
incorporated as a feature of an existing one.
Biographical Sketch
Gene Ressler was born in Allentown, Pennsylvania on July 20, 1956. He graduated
from Northampton High School, Northampton, Pennsylvania in 1974, and accepted
an appointment to the United States Military Academy at West Point, New York. On
June 7, 1978, he graduated with a Bachelor of Science degree and was commissioned
a Second Lieutenant of the Army Engineers. He served as a platoon leader, execu-
tive officer, intelligence officer, and commander of Company B of the 79th Engineer
Battalion, a unit whose peacetime mission was military construction in Germany.
He was married to Friederika Campbell on August 2,1980 in Karisruhe, and their
first child Daniel was born on May 11, 1981 in Heidelberg. They left Germany in
December, and at their next station, Fort Belvoir, Virginia, daughter Julia was born
on July 23, 1982. Gene received a Masters Degree in Computer Science from the
University of California at Berkeley in July 1985, and taught and did research in
computer-assisted topographic mapping at West Point until July, 1988. For the fol-
lowing year, stationed at Seoul Korea, he managed the facilities maintenance budget
and design staff for Eighth Army. From July, 1989 to the present date, he has worked
toward the PhD in Computer Science at Cornell University, Ithaca, New York.
iii
To Freddie, Daniel, and Julia,
and especially to Mom and Dad.
iv
Acknowledgements
I am grateful to the Army for the incredible opportunity of my time at Cornell.
I deeply appreciate the help and support of my advisors Jack Muckstadt, Keshav
Pingalli, and, most especially, Dexter Kozen throughout this research. Finally, I
thank my family, who cheerfully endured me as less of a husband, father, son, and
brother than I really should have been during this time.
v
Table of Contents
2
3
Introduction
1,1 The Functional Programming Style
1.2 Some llistory . .
1.3 Problems.
1.4 Progress .
1.5 Contribution of this Thesis
1.6 A Preview of Structured Values
1.7 A Larger Example
1.8			Alex . . .			. .			.
1.9			Generality . . .			.
1.10 Denotational Semantics
1.11 Outline . . . .
Matrices and Structured Values
2.1			Domains			.
2.2			Matrices . .			.
2.3			A and Other Tools			. .
2.3.1 A as matrix constructor
2.3.2 Function patches .
2.3.3 Conditional values
2.3.4 Promotion
2.3.5 Matrix join
2.4 5-vals
Semantics of Svexp
3.1 Abstract Syntax.
3.2 Semantic Groundwork
3.3 Unchecked Semantics
3.3.1 Typ and Proj
3.3.2 Extr
3.3.3 Join . . .
3.3.4 Arithmetic .
3.3.5 Conditionals
3.3.6 Expressions
vi
2
3
5
8
10
12
16
17
19
20
20
22
23
24
29
29
30
31
32
33
35
38
38
41
42
43
45
45
46
47
48
3.4
3.3.7			Declarations .			.
Full Semantics . . .			. .			.
3.4.1 A primitive wrapper
3.4.2 Typ and axis assertions for static typing
3.4.3 The other unary primitives
3.4.4 5-val closure property for unary primitives
3.4.5 Binary primitives . .
3.4.6 The conditional
3.4.7 Function application and construction
3.4.8 5-val closure for svexp
Types and Type Inference
4.1			Overview
4.1.1			History . . . . .
4.1.2			The approach . . . .
4.2			The Axis Type Language
4.3			Axis Set Inference Rules . .
4.3.1			A look at Typ . . . .
4.3.2			Primitive typing rules of unqualified soundness
4.3.3			Primitive axis typing rules of qualified soundness
4.3.4			Other program structures . .
4.3.5			Automatic axis type derivations
4.4			Axis Type Um'fication.
4.5			Generating Axis Type Equations
4.5.1			Syntactic soundness and completeness
4.5.2			Soundness and completeness cases
4.6			Dimension Inference			. . .			.
4.6.1			Dimensions .			.			. . .
4.6.2			Inference rules .
4.6.3			Type unification			.
4.6.4			Most general			function types
4.6.5			Generating Type			Equations
4.6.6 Simplifying dimension equality constraints
4
5
Code Generation and Improvement
5.1			The Standard Compilation Model			.			.
5.2			The Target Language . . .
5.2.1			Semantics of proc			. .			. .
5.2.2			Subarrays . . . .
5.3			Compiling proc to C			.			.			. . .
5.3.1			Dope vectors . .			. .
5.3.2			Element addresses and transposition
5.3.3			Subarrays.			.
5.3.4			Strength-reduced stepping			.			.
5.4			A Correctness Criterion
vii
49
51
51
52
53
54
55
58
61
62
64
64
65
66
67
71
71
73
74
75
79
81
91
92
95
97
98
99
101
103
105
107
112
113
116
118
121
122
122
123
124
125
127
5.4.1 Active sites
5.4.2 Active Assignments . .
5.4.3 Active Expressions
The Rewrite lunction
5.5.1 Rewriting active assignments
5.5.2 Rewriting active expressions
5.5.3 Loop nest optimizations. .
Perspective
130
130
134
134
134
146
148
150
151
151
153
154
155
158
166
166
169
173
174
175
176
177
177
178
179
182
182
183
187
187
187
188
189
191
191
194
196
5.5
5.6
6
Evaluation In Place
6.1			Overview. . .			. .
6.1.1			The remaining problem			. .
6.1.2			A solution
6.2			proc2			sharing syntax .			.			. .
6.3			Assignment ordering . . .
6.4			Extensional Equivalence . .			.			.			.
6.5			The Sharing Condition .
6.6			A Syntactic Sharing Condition			. .			.
6.7			Strengthening ?
6.8			Interprocedural Sharing Analysis			.
6.8.1			The nonrecursive case .			.
6.9			The Recursive Case . . .
6.10			A Sharing Optimization			Algorithm
6.11			Deducing prospects . . .			.			.			. . .
6.11.1			Memory allo???ifion .
6.11.2			Prospects due to =
6.11.3			Prospects due to non?recursive calls
6.11.4			Prospects due to recursive			calls
6.11.5			Efficiency			. .			.
Related Work and Conclusions
7.1			Related work . .			. .
7.1.1 Array semantics .
7.1.2 Type inference
7.1.3 Compilation .
Alex
7.2.1 svexp subterms as pictures
Generalizations			.
Summary . .			.
7
7.2
7.3
7.4
Bibliography
198
viii
List of Tables
3.1 Axes and dimensions for unary primitives applied to vA,Q; C,?
4.1 Primitive typing rules of unqualified soundness.
4.2 Axis typing rules of qualified soundness
4.3 Type rules (axioms) for constants and variables.
4.4 Axis type rules for general program structures.
4.5			Primitive typing rules. . .			. .
4.6 Constant and variable typing rules. .
5.1			Dope vector al as parameters.			.
ix
55
73
74
75
77
loo
102
125
List of Figures
1.1 Pascal code for LU factorization .
1.2 svexp code for LU factorization
3.1 Abstract syntax of svexp. . .
3.2 An svexp program.
4.1
4.2
4.3
4.4
4.5
4.6
4.7
Deriving an axis type. . . . . . .
The axis type unification algoritlim
Axis type equation generation algoritlim.
Function body action rules for axis typing.
Type equation generation algoritlim. .
Function body action rules for typing (part
Function body action rules for typing (part
5.1 The target language proc
5.2 Example of proc translation to C. .
6.1
6.2
6.3
6.4
6.5
6.6
6.7
6.8
6.9
6.10
7.1
7.2
7.3
17
18
39
40
1).
2).
Defs for proc2 subterms						. .
Uses for proc subterms
Intra-procedural use and def ordering for proc2.
Algorithm for sharing correctness. 			.
Algorithm for s(ak, be, f).
Algorithm returning all optimizations of f given P
In--Hplace prospect [t[3:5]/sJ. . .
Algorithm P to find in--Hplace prospects.
Faster algorithm returning all optimizations of f given P.
Final algorithm returning all optimizations of f given P.
Typical element operations in Alex.
Extract operations in Alex
Multiple Typical element operations.
x
80
83
92
93
105
106
107
117
126
161
162
164
170
172
178
181
183
184
186
192
193
193
Chapter 1
Introduction
This thesis formalizes and gives implementation techniques for the functional pro-
gramming paradigm of Alex [KTC+87], a graphical programming environment for
matrix functions. It also shows how the paradigm extends to functional languages
in general. The Alex paradigm is based on a new denotational definition of matrix
values, called s-vals, that supports a small but expressive collection of primitives for
mapping functions over matrix elements. For illustration, we give a small, Alex-
representative textual language, svexp, which uses these primitives for all compu-
tation. We show svexp sufficient to express many common numerical algorithms
using no declarations or indexing expressions. This is achieved through a type sys-
tem that includes a notion of the sizes and shapes of s-vals. An associated inference
algorithm eliminates the need for run time representations of these types. We derive
a code generator for svexp expressions, finding that for non-recursive definitions,
the well-known functional aggregate update problem of superfluous memory alloca-
tion and copying is easily avoided. The recursive case is solved by using abstract
interpretation techniques to determine when function results can be evaluated into
the storage provided for arguments. Policy decisions that trade space for speed and
available parallelism are explicit black boxes in the code generator that may be filled
2
with resource-dependent heuristics. Finally, we show how the work on svexp extends
to Alex and to other, more general, languages, including those with function values.
In a global sense, this thesis is a step in the ascendance of functional programming
techniques from research topic to useful tool. Matrix calculations are certainly vital
to scientific and business applications, where the advantages of functional techniques
are relevant. Yet matrices have been addressed in only a limited way by functional
language research and implementations, perhaps because only recently have the per-
formance problems regarding simpler data types appeared surmountable. Our goal
is to describe the key additional problems that matrices introduce, give a framework
in which the problems can be solved, and give solutions for what we judge to be some
of the more important ones, as we now explain.
1.1 The Functional Programming Style
A program in a functional language is a series of function definitions in the form
of equations, followed by an expression applying these definitions. The value of the
expression is the answer returned by the program. Function application (of user-
defined functions as well as primitives like +) is the sole operation. Variable names
denote values, e.g. numbers or sequences which, once associated with the variable
in question, do not change. The process of value-variable association occurs only
during function calls, when argument values are bound to formal function parameters.
Control statements also return values:
if B then E1 else E2
returns the value of E1 if B has value true and E2 for false.
In this way functional languages are distinguished from imperative ones, wherein
variables stand for cells or blocks of cells in a global store. Cells hold values, which
are changed by assignment operations. An assignment such as i = i + 1, a dubious
3
mathematical assertion, has an imperative meaning "replace the value in cell i
with itself plus 1." In this setting, control statements do not return values. Instead
they determine which assignments are executed in order to influence changes in the
state of the store. Some part of the store finally becomes the answer returned by the
program.
The notion that functional variables denote values rather than cells in storage is
so essential that it is given the special name refevential transpa?ncy.
1.2 Some History
John Backus inspired much research in functional languages with his seminal Thnng
Award paper [Bac78j. In it, he asserts that the imperative global store is problematic
in several ways, which we examine as they apply to this work:
o+ The global store is humanly incomprehensible for programs of any size. This
leads to programming errors.
o+ The store is difficult to handle in mathematical proofs of program and compiler
correctness.
o+ The store introduces arbitrary constraints on the structure of data flow in
programs that, ill the end, limit performance.
Appropriately, he demonstrates with a code for vector inner product:
cO; for i =0 ton--Hi do c := cta[i] *b[i]
In correspondence with the the points above, we note:
o+ The programmer is responsible for ensuring that arrays a and b are big enough,
that n matches the sizes of the vectors stored therein, and that neither c nor i
hold some later-required value when the code executes.
4
o+ It is cumbersome to prove this program computes the dot product. Indeed, one
starts by assuming a and b are n-vectors, and this must be remembered when
proving surrounding code correct.
o+ The fact that the products could be computed by many processors working
independently and the sum accumulated through a tree-like network is hidden
in the sequential assignments to c, where each successive value depends on the
previous one.
In addition, behavior is unspecified when a and b are not n-vectors. A compiler for
the program may assure that unassigned storage is set to zero and that so-called
index range errors are detected at run time, but these do not help get the right
answer unless we are very lucky.
Functional languages address these points as follows:
o+ Functional programs are essentially declarations of the problems they solve; if
the problem specification is correct, a correct program is not far away.
o+ Functional primitives are easily axiomatized equationally, and many correctness
and equivalence proofs can be carried out in a straightforward fashion using
equational logic.
o+ Since a functional language cannot express storage locations, it cannot con-
strain data flow by storage dependence. Moreover, good functional languages
are rich in operators (and ways to define new ones) that work simultaneously
on many data elements rather than on one at a time, the so-called mopping
primitives.
5
To demonstrate these points, here is Backus's functional definition of inner product:1
InnerProduct x (Reducet)(ApplyToAll*)Transpose x
(a)
(1.1)
A proper argument to this function is a sequence of two equally long sequences of
numbers. Transposition (c) changes the pair of lists into a list of pairs. The multi-
plication of all pairs (b) reduces each to a number. Reduce+ (a) sums the resulting
vector. It is easy to write formal axioms for Transpose, Reduce, and ApplyToAll that
make a proof that this program computes dot product a simple exercise. Further, we
can make error handling precise by e.g. having Transpose return a special value erwr
if its inputs have unequal length, and by making all primitives strict. That is, on
inputs of erwr, they return erwr, so the behavior of all functions is completely de-
fined. No preconditions to the application of InnerProductare required in correctness
proofs.
1.3 Problems
We conclude that functional techniques are extremely worthwhile. Why then, after
fourteen years since Backus's incitement to study, have they not flourished, except in
the halls of computer science research? The answer is three words --H performance,
performance, and performance. A naive compiler of functional programs generates
code that typically runs one to three orders of magnitude slower than equivalent,
naively compiled imperative code. Why?
Functional language implementations tend to be slow exactly because the func-
tional programmer has no control over storage, as this places the entire burden of
storage management with the compiler and runtime system; and these tasks turn out
to be formidable. For instance, even our tiny inner product example uses sequences
ThActually, Backus's language FP is a purely combinatorial functional language
with no variables at all. We have added x for consistency with our own syntax. This
is largely a matter of taste.
6
of arbitrary length. To represent these in a computer generally requires extra space
for link pointers and a heap and garbage collector like Lisp. All entail significant
overhead. Moreover, the pure function compiler has a harder job than one for (non-
pure) Lisp. When a Lisp programmer wishes to concatenate sequences A and B, she
is free to assign B to the tail of A to avoid building a fresh list for the result. This
destroys the old value of A, a fact the programmer must remember and consider in
the rest of the program. The functional program, however, must have the value of A
(without B appended) available if it is ever needed after the concatenation is formed.
Thus A (at least) must be copied as the price of general referential transparency.
The function call stack2 is another problem. In functional programs, recursive
functions are used to express looping computations. E.g., to sum the integers 1 to
n, one might write
sum n if n = 1 then n else n t sum
(1.2)
Naively compiled code for this program makes n recursive calls to sum, requiring an
0(n) stack.
Many other performance problems arise when e.g. functions can be bound to
variables and passed as arguments, when evaluation is lazy, and for other language
features. These are mostly independent of the discussion at hand, however, and we
refer the interested reader to [FH88] or [PJ87j.
Indeed, referential transparency can be extremely expensive if arrays are offered
as a primitive data type. This has been explored in [HB85J and [Blo89], with the
introduction of three operators for making, accessing, and updating vectors.
Definition 1.1 (Hudak's operators) For positive integer n and function f over
integers, mkv n f returns v such that v[i] = f i for 0 < i <n. We call v an n-vector.
2Or equivalently, continuation memory when this supplants a stack. See for in-
stance [AT89].
7
Furthermore, for n-vector v and integer i, 0 < i < n, the expression updt v i x,
returns a new vector v' such that for 0 < j <n, v'[j] = v[j], except v'[i] = x. ?
Updt is conceptually like assignment to array elements in imperative programs. One
can increment elements 1 to n of a vector with this function:
inc v n if n = 0 then v else inc (updt v n (v[n]+1)) (n--H1)
(1.3)
This example tempts us to compile updt into an assignment to the storage allocated
for the current value of v. As with list concatenation, however, assignment can
compromise referential transparency. If v is a vector with v[Oj = 1, and the left
operand of + is evaluated first, then
(updt v 0 O)[O] + v[O]
has value 1 if references to v are transparent and 0 if updt assigns to the storage
of v. We conclude that the compiled code for updt v... may have to copy all
of v into fresh storage. If a compiler naively generates such copying code for all
updates, (1.3) becomes a monster requiring 0(n2) run time and storage. In general,
if a program makes m transparent updates to an fl-vector, the total cost is O(mn)
rather than 0(m) for imperative array assignments. This nasty behavior is known
as the aggregate update problem.
Another important source of overhead comes from checking arguments so that
error can be returned when they are not of proper form. For example, sequences
must be distinguishable from atoms if Transpose is to check that its argument is
a sequence of sequences. We may have to augment each value with explicit type
information for this purpose. Type information consumes space and type checks
consume processor cycles at run time.
8
1.4 Progress
Thus it appears at first glance that the advantages of convenience and mathematical
clarity of functional programming over comparable imperative idioms are offset by
asymptotically worse performance. Remarkably, in an environment where impera-
tive compiler writers are striving to shave a few per cent from run times with elabo-
rate optimizations, functional language proponents continue to attack the formidable
problems we have described and many others with enthusiasm, and with steadily in-
creasing success. We next consider some successful techniques that are related to
the examples given above and to the techniques developed in the remainder of this
thesis.
First, though there seems no way to avoid garbage collection completely, the
best algorithms are quite efficient, requiring execution of just a few instructions per
allocated object on average [App87,Ung84]. Most recently, work such as [JM89] gives
analysis techniques that tell a compiler when it is safe to "compile in" [sicj the reuse
of argument storage for function results in the case of cons structures (general binary
trees). This can eliminate garbage collection in the functions where the analysis
succeeds.
Unconstrained stack growth of functions that emulate loops is often easily pre-
vented by a syntactic analysis known as tail recursion removaL Any function with
form
function f(x)
return( f(a)
end
can be transformed to machine code of the form
9
function f'(x)
start:
x a; goto start;
end
which has equivalent meaning,3 but no recursive call or stack growth. [FH88] shows
how this is done in a stack-based code generator. Another approach is to compile
with a continuation-passing intermediate form [AT89J, which makes tail recursion
almost trivial to find and eliminate. Unfortunately, Example (1.2) is not in tail
recursive form! Here the program can be transforme?d using algebraic identities to
the following:
SUmr n r =--H if n = 1 then (rtl) else Sumr (n--H1) (rtn)
sum n =--H sumr n 0
This version is tail recursive.
The aggregate update problem is tackled in the same work of Hudak and Bloss
that introduced the vector operations mkv, [.j, and updt. Their approach is to
analyze the uses of updt in a program to determine when it is safe to compile them
as assignments. Briefly, this is whenever the value of the vector argument to updt will
never be needed after the update is performed. They note that imperative matrix
programs translate in a line-for-line fashion to functional form as follows:
o+ mkv is used to simulate array declarations. Two and higher dimensional arrays
are ostensibly represented as vectors of vectors.
o+ [.] is used in the same way as imperative array access.
o+ Tail-recursive functions simulate for loops.
3We mean equivalent in the extensional sense: For all x, f'(x) returns r (or recurs
forever) if and only if f(x) returns r, (or loops forever)
10
Such programs submit completely to update analysis precisely because it is impossible
to use the un-updated value of an imperative array after the array is updated by
assignment.
Finally, [Mil78j and much subsequent work have given us practical static type
systems and compile time type inference algorithms for functional languages. The
latter compute with perfect certainty the types of definitions and subexpressions of a
program, returning a type-annotated program or fail when no annotation exists. The
only additional input is the types of primitives; e.g., + has type int int int.
Annotations may include polymorphic types, which at once represent many types
with type variables. For instance, in
fx=--Hx
f will be annotated with V?.c ?, signifying that the type of the result is the
same as the type of its input, whatever that may be. Such quantified types are type
schemes. However,
gx=--Hx+1
has type int H int owing to the application of +.
Type annotations often allow a compiler to avoid run time type checking com-
pletely, without compromising the safety of the compiled program. In the example
above, any correctly typed application of ? must be to an int, so the compilation of
x t 1 may safely add without a check of the type of x. This saves both the run time
of a type check and the space for type information,
1.5 Contribution of this Thesis
We are now in a position to consider a fresh approach to matrices in functional
languages based on the qualities of existing approaches. Here are our observations:
11
o+ Backus showed the power of mapping arithmetic primitives over collections of
numbers. However, his prototypical language can only represent matrices with
sequences of arbitrary length. In implementations, these consume space for
link pointers and require garbage collection. Moreover, determining the exact
lengths of sequences in languages with general sequence-building primitives is
an undecidable problem, and useful approximations are hard to compute. Thus,
many checks for proper arguments, e.g. that Transpose is presented with a list
of lists of equal length, must be performed at run time.
o+ Hudak proposed vector operators that presume underlying implementation as
arrays in contiguous storage. They admit translation of an imperative pro-
gram P to a functional form that compiles to efficient code; indeed P may be
returned as output. We believe that though this is an important step forward
as an analysis technique, it is a step backward for programming style simply
because mkv, [.j, and updt allow the programmer to bind a whole vector of
values to a variable, then access and update these values randomly. This effec-
tively simulates a piece of storage, realizing many drawbacks of the imperative
global store in the functional model. As one instance, the correctness of (1.3)
has the precondition that v is an m-vector with m > n. For another, when
access to an undefined location of a vector occurs, we must resolve either to
let it go, the equivalent of imperative programs ignoring references to cells on
which no assignment has been performed, or ensure that error is returned.
The former is clearly unacceptable, and the latter generally requires a run time
check of every vector access. Finally, with individual element updates instead
of function mapping as the fundamental operation available to the programmer,
the parallelism inherent in many looping calculations is likely to be hidden in
syntax.
12
This brings us to the contribution of this thesis, which is a way to use matrices
themselves, rather than sequences or vectors that represent matrices, as a data type
in functional languages. The approach has inherent advantages, which we intend to
exploit:
o+ Matrices admit implementation as arrays, avoiding the pointer space and ele-
mentwise garbage collecting overhead of sequences.
o+ We can design flexible ways to map functions over matrices, making it easy
for programmers to avoid elementwise access and update with their attendant
difficulties. Moreover, mappings of common scalar and reduction primitives
and nonrecursive user functions can be compiled so that copying is simply not
required. With some work, this property extends to a useful class of recursive
functions.
o+ The number and sizes of dimensions of a matrix can be represented in its type.
A type inference procedure then annotates programs with the sizes of matrices
produced by each subexpression, or returns fazi when it can determine that
dimension or size constraints inherent in the program could never be met at
run time. Efficiencies in the compiled matrix code accrue as for other type-
annotated programs; only a relatively few dimension and size constraints that
are not resolved at compile time appear as checks in the compiled code.
o+ This style of program, where functions are mapped over matrices, submits to
analyses that almost trivially disclose some optimizations which are difficult to
perform on imperative programs.
1.6 A Preview of Structured Values
It is clear from the points of Section 1.5 that mappings of functions over matrix
elements are essential to our paradigm. A simple way to provide such mappings
13
would be with a matrix version of ApplyToAll in Example (1.1). The problem is
that we would like to express many different kinds of mappings succinctly:
(a) Over the rows or columns of a rectangular matrix.
(b) Over the proper tri-diagonal band of a rectangular matrix.
(c) Over all contiguous 3 x 3 submatrices of a rectangular matrix.
(d) Over all m x n elements of a p x q x m x n four-dimensional matrix.
Operation (a) is a quintessential matrix algorithm building block, (b) is a typical
sparse matrix operation, (c) arises in image processing problems. Finally, (d) is an
unconventional formulation of a so-called blockmatr?operation, where a (p m)x
n) two-dimensional matrix is considered as px q blocks of size m x n (see [GVL83J).
It is not hard to invent many other examples of useful mappings. Thus we might
need many variants of ApplyToAll, each with its own definition.
Our approach is to consider the way we think about matrix calculations. One
way to visualize vector inner product is as follows:
Multiply all pairs formed from the the i-th element of a and the z-th
element of b. Put the i-th product at position i of a vector and reduce
this vector by addition to a scalar answer.
To amplify this idea, consider matrix multiplication:
Form an (i,j)-th scalar result by applying inner product to the i-th row
of A and the j-th column of B. Form the answer matrix such that the
(z, j)-th scalar result is in row ? and column j.
Conceptually, we are decomposing the input matrices into spaces of matrix compo-
nents. In this case they are rows and columns, but other possibilities, even with more
dimensions, are obvious. We call these typical components. Further, we say spaces
14
rather than collections or sets because the order of components (in the matrix they
came from) is significant, and is carried along in the decomposition. This allows a
space of typical components to be recombined into a matrix later on, an operation
called projectzon. Thus inner product uses two spaces of i-th scalars operated upon
in i-th pairs by multiplication. Matrix multiply has a space of i-th and a space of
j-th vectors where inner product is applied to all (i,J)-th pairs. In both, the ap-
plication of a function (resp. multiplication and inner product) is simultaneous over
many typical components. Inner product projects the space of multiplication results
to form a vector. Matrix multiply projects the space of inner product results to form
the result matrix.
Clearly, the components, being pieces of matrices, are matrices themselves. What
may not be obvious is that the spacesare also best thought of as matrices. We refer
to such a two level matrix --H a matrix of matrices of values from some base set
as a structured vatue or s-val. In s-val calculations, the notion of a conventional
matrix A (including a zero dimensional matrix, a conventional scalar) is replaced
by an s-val containing exactly one component, the matrix A. Suppose A has m
rows and n columns; then the s-val that represents A as a space of ith typical rows
is an m-vector of typical components that are n-vectors. Moreover, the m-vector
extends along a dimension we choose to label i. The n-vectors each extend along the
dimension we choose to represent columns.
We will study five primitive operations on s-vals that, while not expressive enough
to encode arbitrary random matrix access and update (as do Hudak's operators), are
sufficient to express a large variety of mappings over matrices, and also computations
that involve recursively splitting a matrix into smaller pieces. In very informal terms,
the primitives are:
Typ Form typical components. Build an n+l-dimensional s-val of m--H1-dimensional
components from an argument with n dimensions of m-dimensional compo-
nents.
15
Proj Project typical components. Project an n-dimensional s-val of m-dimensional
components into an n--H1-dimensional s-val of m+1-dimensional components.
Proj is roughly the inverse of Typ.
Ext00 (resp. Ext01) Extract first (resp. last) components. Return the argument s-val
with each component replaced by its own first (resp. last) component along
some dimension. For example, extract the first row of each two-dimensional
component of the argument.
Ext? (resp. Ext11) Extract all but first (resp. all but last) components. Return the
argument s-val, with each component replaced by itself, except its own first
(resp. last) component is missing along a given dimension.
Join Adjoin pairs of components from two s-vals, V1 and V2. We describe this by
example. Suppose V1 has scalar components arrayed along its i and j axes and
V2 also has scalar components, but along its j and k axes. Then (Join Rows V1
V2) has three dimensions ii, kits (i,j, k)-th component is a vertical vector
with two elements, the (i,J)-th element of V1 and the (j, k)-th element of V2.
It is convenient to define arithmetic in vector-reducing form, e.g. Redt as well as
the normal infix binary operators. Reduction is performed on all components of the
single argument s-val simultaneously. Infix + behaves according to the following
program fragment:
a + b Red + Rows (Join Rows a
That is, we join the arguments into a 2-vector, then reduce by addition.4
4There is actually a slight difference between a+b and Red+ Rows (Join Rows a
in the final semantics of svexp. The former signals a type error unless both argu-
ments have typical components are that are scalar. The latter produces a result for
scalars and typical column vectors.
16
Extending the inner product example to s-vals, we have
InnerProduct a b
Red+ Rows
Proj ? Rows
(Typ Col i a) * (Typ Row i b)
Assume for the moment that the arguments a and b each contain only one typi-
cal component. Beginning with the last line, the i-th typical scalars of each input
component are formed and pairs of these are multiplied to form i-th products. The
projection of i onto Rows turns this space of scalars into a column vector, which is
reduced by + to form the answer.
Of course, the inputs need not be so simple. Consider matrix multiply:
MatMultA B
Proj i Rows Proj j Cols
InnerProduct (Typ Row i A) (Typ Col j B)
Again we form i-th typical rows of A and j-th columns of B. The semantics of
InnerProduct, as hinted above, are such that the function acts simultaneously on all
(i, j)-th pairs of vectors simultaneously.
1.7 A Larger Example
Many matrix algorithms use problem subdivision strategies that are conventionally
hidden in explicit iteration. Consider LU factorization; one possible imperative cod-
ing in Pascal is given in Figure 1.1: Patient readers will glean that loop (a) with
index j selects subproblems A\j. .n, j..n] of ever decreasing size. Loop (b) gets multi-
pliers into the bottom n --H j elements of the j-th column. These form the L triangle
of the factorization. Loops (c) and (d) do a rank one update of A\j + 1..n,j + 1..n],
which are the source of data for the next subproblem selected by loop (a). Careful
inspection will show that while the iterations of loops (b), (c), and (d) are free to
17
procedure lu (var A: matrix; n: integer);
var i, j: integer;
begin
forj :1 ton--Hi do begin
for i := jtl ton do
Ak,j] := A[i,j]/A[j,j];
for jj := jtl ton do
for i := j+1 ton do
A[i,jj] := A[i,jj] --H A[i,j] * A[j,jj]
end
end
Figure 1.1: Pascal code for LU factorization
(a)
proceed simultaneously, each iteration of loop (a) depends on the values produced
by the previous one, except the first, which depends on the input.
With s-vals, we express the dependency between the value of A in the j-th it-
eration and the jt 1st with recursion. The other computations are typical element
arithmetic. A code is given in Figure 1.2. Here Numof A axis n returns true if and
only if A has size n along axis; it tests for the base case. If the test fails, then several
extract operations cut A into four parts --H the upper left element (piv), the rest of
the top row and left column (top?rt and boLift), and the rest of A, with all but the
first row and column (boLrt). The two typical element computations are completed
(mults and updt), then the recursive call factors updt. The final step is to join the
multipliers and top row back onto the call result.
1.8 Alex
The syntax above is simple and verbose, as we do not wish to give formal semantics
for a lot of syntactic sugar. Pattern matching (see e.g. [Rea89]), for instance, can
18
def lu A =
if (Numof A Rows 1) or (Numof A Cols 1) then A
else let
top = Extr Fst Row A, bot= Extr Allbut Fst Row A,
p?v = Extr Fst Col top, top??= Extr Allbut Fst Col top,
boLift = Extr Fst Col bot bot_rt = Extr Allbut Fst Col bot,
mults = Proj i Rows (Typ Row ? boLiftI piv),
updt = Proj i Rows Proj j Cols
(Typ Row i Typ Col j boLrt) --H
(Typ Row i mults) * (Typ Col j Thp?Tt)
in Join Rows top Join Cols mults (lu updt)
Figure 1.2: svexp code for LU factorization
make s-val expressions more succinct and elegant, as it does in other functional
languages. However, s-vals were actually conceived for a very sweet syntax indeed, the
alexical syntax of the Alex programming environment [KTC+87J. In this system for
matrix programming, one forms a function by manipulating a picture in a workstation
window much as one might in a so-called paint program. Textual notes can be added
for docmentation, but very little other typing is required. Values are denoted by
boxes. S-val primitives correspond to manipulations of boxes. For instance, one
forms `Extr Fst Col V' and `Extr AllBut Fst Col V' at once by clicking an Extract
tool on the box that represents V. These are represented by a new, partitioned
box that overlays V, but is offset a little down and to the right. Function calls
are boxes labeled with the name of the called function, with small knobs to denote
inputs and outputs. There is a library facility for user functions. When two different
boxes represent the same value, they are given the same color. Conditionals are
subwindows of the function window. Editing constraints on boxes in conditional
windows prevent scope violations. One clicks on a button to see the true and false
19
values of the conditional in alternation.
It turns out that our type inference algorithm can be adapted to work on in-
complete Alex functions, returning a type if and only if a term may be extended to
a complete Alex function body. The algorithm runs fast enough that it can be run
after every user interaction as a function is created. Hence a user's request for an edit
operation is honored only if it could result in a type-correct function. By induction,
complete Alex function definitions are type-correct.
1.9 Generality
Clearly, though we have picked our primitives to exploit the kind of structure often
found in matrix programs, not all programs will be expressible purely in this form.
Some will need to compute indices and update matrix elements or components. For
these instances, we fall back on s-val versions of Hudak's primitives, finding the
extensions natural. For example, a function utilizing them behaves as expected
when applied to a space of typical components.
A separate issue is expanding the set of primitives. Though our small set neatly
expresses plain LU factorization, for instance, they cannot express partial pivots,
which involve computing the row index of the maximum element of a column, and
then swapping two rows at each iteration. Though we could use Hudak's primitives,
explicit permutation primitives seem a preferable approach. We exclude such matters
of taste from this thesis. Our intent is not to design a language, but to show how
languages can be designed around s-vals with an expectation of good performance.
We also restrict most of our discussion to first order constructs; wherein functions
cannot be passed as values. We also assume call-by-value argument passing semantics
all arguments are evaluated before control is passed to a called function. In principle,
the s-vals presented here extend naturally to higher order languages and also to
languages with lazy evaluation, but a treatment of the practical issues involved would
20
easily make another thesis. Some directions to take are discussed in Chapter 7.
1.10 Denotational Semantics
Though s-vals and the primitives appear simple in the examples shown thus far,
questions of how they should behave in general settings are somewhat involved and
have deep interrelationships. As such, we choose a standard tool for unraveling pro-
gramming languages denotational semantics.5 We find that a clean denotational
definition provides natural answers to questions that are otherwise unclear. For in-
stance, what should happen if we add a space of i-th pairs with one element drawn
from each of two vectors as in the inner product example, but the input vectors have
unequal lengths?
1.11 Outline
In Chapter 2, we introduce denotational semantics for matrices that support the for-
mal definition of s-vals. We establish some basic consequences of this definition that
will be of later use in designing a type system and inference algorithm. In Chapter
3, we give the denotational semantics of a small functional language, svexp, that
uses s-vals for all of its calculations. We add theorems about primitives that are
again helpful with types. Chapter 4 describes a type system and type inference al-
gorithm that infers the numbers and sizes of dimensions of subexpressions in svexp
programs, or reports fail when none exist. We are prove our types semantically
sound; a program with a type yields an s-val result. We proceed to show the in-
ference algorithm sound and complete (the latter in a restricted sense). Chapter 5
gives a code generator that allocates memory only for user function arguments and
results. The code generation strategy is highly architecture-independent. It gener-
5The uninitiated reader should refer at least to [Rea89] on functional language
semantics; [Sto77] is more comprehensive.
21
ates a single-assignment intermediate code that is portable to many architectures.
Examples in the C programming language are given. Chapter 6 tackles the elimina-
tion 0 unecessary copying by the single assignment output of the svexp compiler.
Finally, Chapter 7 covers related work and describes Alex in more detail. We draw
final conclusions and give pointers toward work to be done.
Chapter 2
Matrices and Structured Values
In Section 1.6, we described an s-val as a space of typical components, giving a certain
semi-formal, technical meaning to these terms through several examples. We then
pointed out that a space of typical components is more precisely characterized as a
matrix of matrices of base values. Indeed, we now replace the evocative terms space
and typical component with more precise ones, outer matrix and inner matrices.
The outer matrix is the s-val itself, an indexed collection of inner matrices that are
ultimately acted upon "in parallel" by s-val arithmetic primitives. The inner matrices
are closer to the standard notion of arrays indexed collections of elements from
some base domain. In Svexp, the language described in Chapter 3, programs accept
and return s-vals with exactly one inner matrix. Within a program, however, the Typ
primitive splits single inner matrices into many for parallel computations, after which
Proj assembles the results. It is curious that the definition of matrices that expresses
this intuitive behavior, as described in Chapter 1, is itself rather nonintuitive. None
the less, the definition and supporting notation and theorems allow us to define s-vals
in the last section of this chapter. Finally, we give s-val semantics for classical matrix
bracket notation. Not coincidentally, bracketed constants have values that are just
(a subset of) the allowable svexp inputs and outputs mentioned above.
22
23
2.1 Domains
Matrices are built from Scott domains ([S871,Sto77J) to ensure that recursive func-
tions 011 s-vals have meaning.1 We begin with representative flat base value domains
from which s-val elements are taken:
Bool = t?U?? ?f?1S?			booleans
-1 0 1
Int			--H			integers
B			=			Bool + Int + B2 + B3 + ... other base types
For convenience, we refer to an arbitrary summand of B as Bj, where B0 and B1 are
synonymous with Bool and Int respectively.
We next select the fundamental representation for matrices. Conventionally, one
picks a representation for vectors,2 identifies zero-dimensional vectors with base
values, then denotes an n-dimensional matrix recursively as a vector of (n --H 1)-
dimensional vectors. Unfortunately, a recursive definition does not cleanly support
our most fundamental operation --H splitting a matrix into pieces along an arbitrary
axis. We adopt an alternative choice for matrix representation, a function of tuples
over the flat domain of natural numbers:
0123...
N =			I			fiat natural numbers
Intuitively, n-tuples would be appropriate indices for an n-dimensional matrix. In-
stead, for reasons elaborated later, we choose infinite tuples of natural numbers. We
define an infinite tuple as a function from positions, denoted by natural numbers,
to values that are also natural numbers. Among index tuples, the ones that return
I for any input are all considered equally divergent. Hence, rather than the usual
infinitely tall domain used for general functions, we arrange index tuples in a fiat
1Scott originally used complete lattices, but we use complete partial orders
(CPOs). The distinction is unimportant here.
2[Gor79] uses a list of index-value pairs.
24
CPO, I. All index tuples containing I (that is, returning I on any argument other
than I itself) are equal to I. All other tuples are greater than I and otherwise
incomparable.
Example 2.1 Even though index tuples have infinitely many elements, we will be
interested in only finitely many of them. A bracketed list representation is sufficient
to represent these, but we must append and elipsis to stand for the infinitely many
values not denoted:
i= (3,5,1,...?
Formally, i is a total function with i 0 = 3, i 1 = 5, and i 2 = 1, and the rest of the
map unspecified. E]
Thus, if i is an index tuple, it is natural to speak of the value (i p) as the value of i
at position p. The notation above displays positions 0,1, and 2. To avoid confusion
with the finite tuples expressible in many languages, we henceforth call the elements
of 1 indices.
We can now give the domains of that contain matrices as a proper subset:
Wr			=			t?angerr1			range error value
MR			=			I (R t Wr)			contains matrices over domain R
=			?R MR			contains all matrices
The special value rangerr is returned by a matrix when it is applied to an index that
is out of range in a sense to be defined. Matrix values, R, may in principle be any
domain, but we will be considering only certain cases elaborated in the rest of this
chapter.
2.2 Matrices
Clearly, matrix domains as given above are overstocked with values that we would
not consider matrices. A matrix should be a map on a finite collection of short index
25
tuples. Yet here we have it defined as a map on an infinite set of infinite tuples. We
need to restrict our semantics to a useful subset of MR.
Definition 2.2 (Matrix) M ? M* is a matrix whenever it has prope?ies 1, 2, and
3 below. ?
Our first property says that indexing computations always succeed and that a matrix
cannot magically make sense out of a divergent index expression.
Property 1 (Strong strictness) For matrir M,
i=I if and only JMi=I
In particular, this leads easily to the following:
Theorem 2.3 The matrices over any base domain plus I form a flat (CPO) sub-
domain of M*.
Proof. It is sufficient to show
M1 E M2 if and only if (M1 = M2 or M1 = I)
The if direction is trivially true, and so is the only if except for the case, call it (*),
I E M1 E M2. llere we need to establish
Vi1,i2 ? I.ii L i2 implies (M2i1) L (M1i2)
For then M2 E M1, and finally M1 = M2 as desired. By Strong Strictness, the
implication is obvious when i1 = I or i2 = I. Conversely, if I E i1 E i2, then
(M2 i1), (Mi i2) ? I, also by Strong Strictness. Taken with (M2 ii) Z (Mi i2), a
consequence of (*), we have (M2 i1) = (Mi i2) because all these applications range
over a fiat base domain. E]
26
Next we associate with each matrix a finite axis set A c N and a dimension
function o? : A N. We write MA,Q to assert this association; it may be read as
"M is a matrix with axis set A and dimension function ?." Conceptually, A and ?
give the shape and size of the matrix. To formalize this, we define the following set.
Definition 2.4 The set (inrange MA?a) is the set of indices i ? I satiss"ying:
Va ? A.O <ia < ?a
Example 2.5 Suppose MA,a, where A = ?OJ and ?O = 3. Then inrangeM is all
indices i that have values 0 to 2 at position 0 i e_ 0 < (i 0) < 2. For clarity, we can
state this in an informal notation:
inrangeM = f(k,*,*,.. .?0 ? k < 3J
where the angle brackets construct indices in the obvious way and * denotes any
natural number, and the elipsis extends them infinitely to the right. Similarly if
NA1,a', A' = flJ and & 1 = 2, then
inrangeN = f(*,k,*,.. .?0 <k <
We can now insist that matrices be rectangular in a formal sense.
Property 2 (Rectangularity) For matrix M and index i,
i ? inrange M if and only if M i ? ran?err
Example 2.6 For this and all subsequent examples, we identify 0 ? N with the
notion of rows and 1 with columns. Then given M and N as above, M is a column
3-vector; it returns non-rangerr values if and only if it is presented with an index
(k,*,*,.. .?, where 0 < k <3. Likewise, N is a row 2-vector. ?
27
Hence we say MA,Q has size n on axis a if (? a) = n, and M is d-dirnensional if
Al = d. The final property insists that if two indices agree at all elements of A, they
must be mapped by MA,Q to the same value in R.
Property 3 (Finiteness) If MA,oL, then for ii, i2 ? I,
E]
(Va?A.iia=i2a)			Mi1=Mi2
Example 2.7 Given M as above, M maps all indices (k,*,*,...) to some m? ?
It follows that the range of M has at most three elements (other than 1 and rangerr).
Thus, a matrix MA,Q essentially ignores all the index positions not in A. Consid-
ering only the positions that are in A, M behaves just like a conventional matrix of
mathematics.
Example 2.8 Suppose M and N as in the previous example, with values given by
= 2k + 1, 0 < k <3 and N(*,k,*,...) = 3k + 2, 0 < k < 2
These correspond to the classical notation:
3
-5-
M = and N = 2 5
Intuitively, this example shows that our matrices have natural implementations as
arrays in computer store, whereas most of the elements of M* do not. Thus we will
be careful to stick with language semantics that are closed for niatrnces. That is if
an expression has a semantic value in some matrix domain (other than I), then it
must satisfy Strong Strictness, Rectangularity, and Finiteness.
28
A question remains --H why have we bothered using infinite structures for indices,
just to define matrices that ignore all but a finite set of positions? The detailed answer
is apparent only in the semantics of svexp given later, but the essence we can see
right now: for any collection of matrices Mj, 1 < i < n the set fltffi--Hi inrange Mj is the
inrange set of another matrix. This property lets us naturally specify the behavior
of primitive and user functions of more than one argument when applied to s-vals
with arbitrary outer matrices; i.e?, the outer matrix of the result is non-rangerr for
exactly the set of indices that is the ?ntersection of the znrange sets of the arguments.
Example 2.9 Suppose MjA???? = (1,2), A1 = fol, ?i0 = 3, A2 = flJ, ?2 1 = 2.
Then
inran?eM1 n inrangeM2 = f(k,...? 10< k <31 n f(*,k,.. .? 10< k <21
= f(k,l,. . .? 10< k <3, 0<1<21
Now if we construct A = f0,11, ?0 = 3, ?1 = 2, then any M such that MA,a has
inrange M equal to the above set. E]
This example generalizes:
Theorem 2.10 1. C?ven n matr?es MjA??Q? , 1 < i < n, let I = fl?ffi1(inrange Mi)
and A = Utffi-i Aj. Then:
I=filVaEA.(?a)<(?a)1where(ca)=minf?a Ai?Q1<?z<?n1 (2.1)
2. For any M ? M* such that MA,Q, we have I = znrange M.
Proof. By the definition of inrange,
fl?inran?e??=tilv?<?<mae??.?ja?<???a??
which is equal, by tedious set-builder manipulations, to (2.1). This proves part 1.
Part 2 follows immediately from Rectangularity. ?
29
2.3 A and Other Tools
We now define some helpful notations and functions for dealing with matrices, indices,
and s-vals.
2.3.1 A as matrix constructor
Since we have used functions to represent matrices, a A notation provides a natural
way of describing them.
Example 2.11 Consider matrix MA where A = ?. By Property 3, M contains a
single value, say 3. Then in mathematics M is conventionally described with
M=fwhereforalli?I, fi=3
But it is far easier to write M = Ai. 3. El
This use of A is just a notational convenience for writing down semantic values in
M*. Syntactic A terms appearing in programs are entirely different entities.
Definition 2.12 For any expression E, the following texts have equivalent meaning,
provided the name f is not used elsewhere:
1. Functionf, where for alli?I, fi=E.
2. Ai.E
Furthermore, the meaning of 1 is taken as the usual one of function construction. El
Of course A provides no guarantee that the constructed function is a matrix; e.g.
Ai.(iO).
30
2.3.2 Function patches
We will in many contexts need to describe a new function that is the same as an old
one except at one value. For this we use the "patching" operator.
f? y#x
f [a/x] = f' where f' y =
a			y=x
An important example is patches to indices. If i ? I, then i [nia] is the same as z
except it has value n at position a.3
A and index patches together let us patch matrices.
Example 2.13 Suppose Mf01??, i.e., M is a vertical vector. Then
= Ai.Mi[(il)/O]			(2.2)
is M transposed to form a horizontal vector. El
To see why this is true, consider any index (*, k,.. .?. Then
o+ .N) = (Ai.AJi[(i 1)/oJ) (*,k, . .
The last step follows from M?0Y?? and Finiteness. It is the convenience of expression
exemplified here that makes the seemingly involved definition of matrices worth the
trouble.
This example clarifies another feature of our matrices that first appeared in Ex-
ample 2.8: they are strictly more expressive than the usual recursive definition of
3Of course, the function patch is itself a function. In fact, it stands for a family
of functions, one for each domain of objects patched; indices, environments, etc.
Conditionals are similar.
31
arrays (see Section 2.1). Vertical and horizontal n-vectors are distinct entities here,
whereas there is no such distinction in a recursive definition. More generally, all
index positions are first class citizens to our matrices. In recursive definitions, there
is a total order on index positions implied by the left-to-right order in which they
appear in programs. At best, subarrays of an n-dimensional array can be selected
only by giving k < n indices, which are assumed to index the first k dimensions in
order:
Example 2.14 Here is a recursive array declaration in Pascal:
A : array[0..9J of array[0..9] of Integer;
The (p, q)-th element of A is expressed with A[p][q], and the p-th row with A[p], but
there is no way to express the q-th column as a whole without explicitly copying it
to a fresh 10-vector.
Conversely, given Mfo,l1,a, (Q 0) = (? 1) = 10, which is equivalent to the defini-
tion to A above, the p-th row of M is R = Ai. M (i [p/0]) and the q-tli column is just
C = Ai. M (i [q/1]). The reader can easily verify R?11,? and c?oy,?. ?
This idea is generalized as follows.
Definition 2.15 The p-th component of matrix M on axis a is Ai. M (i [p/a]). El
Of course this is finally the formal notion of typical components presented in Chap-
ter 1.
2.3.3 Conditional values
To select one of two matrix values based on a truth value, we introduce the following:
32
Definition 2.16 Operator' o+, : Bool x M* x M* H M* has the following value:
T, B = true
B?%F=			F, B=false
I B=I
We will freely use the same notation when T and F are drawn from some other
domain. For instance, we make the following handy abbreviation:
2.3.4 Promotion
B1 A ... A B? =--H B1
B2
false,
false
true,
false,
Given MA and an axis a ? A, we will sometimes need to coerce M to a new matrix
Mt which is the same as M, except it has size one (rather than no size at all) on axis
a:
Definition 2.17 The promotion operator ? : M* x N M* is described as follows.
If MA,a, then
El
Mta =			M			if a? A
Ai. (ia) = 0 H (M i), rangerr			otherwise
Example 2.18 Suppose Mf01?Q and ? 0 = 3, i.e. M is a vertical 3-vector. Then
MIO is M, M?1 is a matrix of three rows and one column, i.e. it has exactly one
33
non-rangerr element along axis 1. Further, M?1I2 has three rows, one column, and
also one non-rangerr element along axis 2, one plane. El
We can generalize this example:
Theorem 2.19 If MA,Q, then (MIa)A'Q' where A' = A u ?a) and
o,			a ? A
o[1/aj, a?A
Proof Trivial if a E A. If a ? A, then we must verify (1) M' e M*, (2) Strong
Strictness, (3) Rectangularity with inrange M' given by A' and ?`, and (4) Finiteness.
Indeed, (1) follows directly from the definitions of A and M, (2) by case analysis and
the definition of the conditional, and (4) from the finiteness of M. For (3), we observe
from the given A' and ?` that inrange M' is just all indices i ? inrange M with value
o at position a. For such i, M i $ rangerr by the rectangularity of M. Therefore
M' i $ rangerr from the definition of promotion. This establishes the if part of
Rectangularity for M'. Conversely, all other indices it are either not in inrange M or
have (i'a) > 0. Thus M' it = rangerr, establishing the only if part of Rectangularity.
El
2.3.5 Matrix join
As a precursor of the definition of the svexp Join primitive, we now give auxiliary
function iM C that, when applied to two matrices, say M1 and M2, concatenates them
along axis c. Before a formal definition, let's see how it behaves in selected cases.
Example 2.20 Suppose M1 and M2 are both vertical 3-vectors as in the previous
example, and j = iM 0 M1 M2. Then j behaves just like M1 on indices of inrange M1.
J(k,*,*,.. .? = Ali (k,*,*,...), 0 < k < 3
34
Furthermore, the values of M2 are "stacked above" those of M1 on axis c:
and finally
J(k,*,*,...? =M2(k--H3,*,*,...?, 3< k <6
J (k, *, .... .? = rangerr, k > 6
We conclude J is a 6-vector. ?
Furthermore, iM c is overloaded in the sense that it works whether or not c is in the
axis sets of its argument matrices:
Example 2.21 Suppose M1 and M2 of the previous example and J = iM 1 M1 M2.
Then
=			M1(k,*,*,...?			ifo<k<3
=			M2(k,*,*,...?			ifo<k<3 and
J(k,l,*,.. .? = rangerr			otherwise
This behavior is achieved by having iM promote its arguments on axis c.
Definition 2.22 Given matrices M1, M2 ? Mj for some i (?.e. both matrices are
over the same base domain), let M11 = M1?c and M2, = M2?c. By Theorem 2.19,
these are matrices with known axis sets and dimension functions:4 M1, ?`1 ? and
M2?C21,%. Then
iMeMiM2 =
E)
Ai.(ic) < (?1,c)			M11i, M21iKic)--H(?1,c)/c1,			M1,M2 ? I
I,			otherwise
4We have used C and ? here rather than A and OL here because the former will
soon be adopted as the denotation for axis sets and dimension functions of inner
matrices of s-vals, and it is to these that iM is eventually applied.
35
The case for bottom-valued arguments preserves Strong Strictness, and Finiteness of
the result given that the arguments have these properties. However, join results are
usually not Rectangular. To get matrix closure, we will have to check the arguments
for conformability. We forestall this until it is clear what to do with non-conformable
arguments.5
2.4 S-vals
We can now formalize s-vals.
Definition 2.23 A structured value or s-val is a matrix of matrices of elements
from some flat base domain Bk. Moreover, simple s-vals satisfy Property 4 given
below. Ei
Thus, the domains of s-vals (one for each base domain) must be
Sk=MMBk =IH(Wr+MB?)=I?(WrtIH(WrtB?))			(2.3)
Recall that the Bj are our domains of base values, one per type.
We see an s-val is a curried function of two index arguments. Thus it is natural
to speak of an s-val as an outer matrix of inner matrices. Respectively, two indices
presented in order to an s-val are called outer and inner indices. We have expanded
the domain equation above as a reminder that, for a s-val V, V ij = rangerr holds
if and only if i ? inrange V or else j ? inrange (V i). That is, we can get a range
error if either the outer or inner index is not an element of inrange for the respective
matrix.
The property required of simple s-vals in Definition 2.23 is just this: all inner
matrices are of the same shape and size.
Property 4 For any simple s-val V, there exists an axis set C and dimension func-
tion ? such that for all i ? inrangeV, (Vi)C?. E)
51.e. signal a type error.
36
We consider only simple s-vals. Triangular arrays and computations may be expressed
in an extension of svexp allowing inner dimensions that vary systematically. This
is a topic for future research. Itfollows from Property 4 that the size and shape
of the outer and all inner matrices of an s-val V can be characterized by two axis
set/dimension function pairs, one for the outer matrix, which we denote with A, ?,
and one for all inner matrices, C, ?. We assert that the outer matrix and inner
matrices of V ? Sk satisfy Rectangularity with respect to A, o and C, ? by writing
VA,a;C,?. Portions of this unwieldy superscript are omitted whenever possible.
Example 2.24 An s-val with exactly one inner matrix of m rows and n columns
VA;Cq has A = ? (thus the outer dimension function is the empty map), C = ?O, 1?
and?suchthat?O=m,?1=n. L]
The use of A to express matrices extends naturally to s-vals.
Example 2.25 Suppose M = Aj .3. That is, M is a matrix of empty axis set that
maps all indices to 3. Further suppose V is an s-val of empty outer axis set, which
maps all indices toM. Then V = Ai.M = Ai.Aj.3 = Aij.3. ?
The last step above is a frequently used abbreviation for nested lambda terms.
Example 2.8 asserted the equivalence of certain s-vals having exactly one inner
matrix and matrices modeled with classical bracket notation. Indeed our tools are
now strong enough to give s-val semantics for matrix brackets:
Definition 2.26 The bracket constructor for certain s-vals is given by:
- xi = Aij.x
xo
Aij.(jO)?m H X(j0), rangerr
Xm?1 -
Xo			X??1			= Aij.(jl) <n H X(j1), rangerr
XO,O			Xo,n?1
Aij.(jO)<m A (jl)<n			X(j0),(j1), rangerr
- Xm?1,o			Xm?1,n?1
37
L]
It is not hard to prove that the constructed values are in fact s-vals with exactly one
inner matrix.
We hasten to add that there are interesting uses for s-vals with more than two
inner axes, and with inner axes other than 0 and 1. We have refrained from using
them in examples thus far merely because they are harder to write down.
Chapter 3
Semantics of Svexp
This chapter gives the denotational semantics of a small language, Svexp, that uses
s-vals to express matrix programs with no index calculations, i.e., without Hudak's
operators. We use svexp to show at once how the tools of denotational semantics
apply to languages including s-vals and to explore the kinds of programs expressible
using only the style of function mapping supported by s-vals.
An important step in developing svexp and any other language with s-vals is
a proof that the semantics is closed for s-vals. That is, the semantic value of any
expression is an s-val, a special value for "type error," or the so-called bottom value,
which means that the expression loops forever. Such a proof ensures that values flow-
ing through a program can be properly represented as blocks of contiguous storage, a
vital consideration for efficient code generation, which is one of our main objectives.
We prove the closure of svexp here as well as the foundational theorems for the type
system developed in Chapter 4.
3.1 Abstract Syntax
We will work with the domain named svexp of terms with abstract syntax given
in Figure 3.1. Some notes on the syntax: In general, n is used to represent arity.
38
39
(svexp?			def (def?+ in (expr?
(def? ::= (fvar-n? (bvar)?			(expr)
(expr? (const? (bvar? (fvar-n? (expr)? (prnm-n? (epr??
(prnm4? ::= TypcaiPr?acjExtr'?cY,j=o,i)iR?ed+ c
(prnm-2) ::--H--H Joinc +
(prnm-3?			Cond
Figure 3.1: Abstract syntax of svexp.
As a superscript over syntax, it means the syntax is repeated n times. The non-
terminal (const? ranges over bracket terms as in the left hand sides in Definition 2.26.
Binary + is given in prefix form for convenience. The nonterminals (frar-n? and
(bvar) represent the sets of n-ary function varnables and bound variables respectively.1
Such a separation is natural for first order languages and simplifies the semantics of
function application:
Example 3.1 In the term foo x y, the grammar assures that foo is a function variable
of arity two and that x and y are bound variables. ?
An example of a program is given in Figure 3.2. For clarity in this work, we use
numbers 7, 8, and 9 for outer axes to distinguish them from inner ones by inspection.
The reader may find it helpful to consider these as synonyms for "i-th, j-th, and
k-th" in the sense described in Section 1.6. For inner axes, 0 continues to stand for
rows, 1 for columns, and 2 for planes. Thus Typ 0 7 x can be read "typical row i of
1This terminology conficts with the notion of bound variables in e.g. lambda
calculus and logic, but is conventional in the literature for first order functional
languages.
40
def
scan-plusx
mat-add a b
in
maw x y
Proj 7 0
+ (Typ 0 7
Proj 8 1 Proj
Red + 2
Typ 0 7 Typ 1 8
Join 2 a b
Proj 7 0
Cond
x)(Typ 0 7 Extr01 0 x)
70
(>(Typ 0 7 x) (Typ 0 7 y))
(Typ 0 7 x)
(Typ 0 7 y)
Proj 7 1
scan-plus Typ 1 7
mat-add 1 2			5 6
34			78
Figure 3.2: An svexp program.
41
x." In this program, scan-plus returns a vector one element shorter than its input,
where the i-th element is the sum of the i-th and (i+1)-st elements of the input.
Function mat-add adds rectangular matrices a and b by joining them like the bread
slices in a sandwich along axis 2, taking a typical 2-vector plug from the sandwich
(by two Typ operations), reducing this by addition, and projecting the result (twice)
to again obtain a rectangular matrix. Finally, max scans its two vector inputs and
puts the maximum of the i-th element of vector x and the i-th element of vector y
into the i-th position of the result. The + in the abstract syntax is representative of
all the usual scalar operations, including the test ? in this code. The program body
adds two matrices and applies scan-plus to each column of the result.
3.2 Semantic Groundwork
There are three kinds of expressible values in svexp semantics: s-vals, primitive func-
tions, and user functions. Two special values are reserved for type errors. Roughly
speaking, the first is for axis set and base type mismatches, and the second is for
dimension mismatches. Hence the following domains include everything except user
functions:
typerr2			typerr1
Wt =			type error
= ?k>O Sk + Wt			s-vals and errors
P = S			S			primitive functions
All domain sums coalesce bottom elements. Here Sk is as in Equation (2.3). It
seems natural that P would also serve as the domain of user functions, but technical
considerations lead us to a different approach. We will have user functions accept
and return not s-vals, but simply matrices over the base domains. Allowing also for
type errors in inputs and output, we have:
M = ?k>O MBk + Wt			matrices over base types and type errors
F = M			M			user functions
42
The input matrices come from the inner matrices of s-val arguments to which the
function is applied. Similarly, the output matrices become inner matrices of the
s-val result of the function application. It follows that the semantics of user function
application must in concept "disassemble" argument s-vals, feed the pieces to the
user function, and "reassemble" the answers into an s-val result.
Interpretation of variables in svexp programs is provided by environments, which
are functions from variables to values. There is a set of these for bound variables
and one for function variables.
bve = bvar H			bound variable environments
fve = frar-n			F			function variable environments
As a convention, ? will range over bve and ? over fve. The binary function ?
catenates environments. That is, if p = P1?P2 then (Px) = (Pi x) if x E domPi
and P2 x otherwise, and respectively for ?i and ?2 Note that ? is not commutative.
We compose the meanings of programs from the following semantic functions:
1: svexp H P
: svexp fve H bve S
D: svexp fve H fve
: svexp H
It remains to elaborate their definitions.
3.3 Unchecked Semantics
primitives
expressions
declarations
programs
It is appropriate to consider unchecked semantics for the Svexp primitives first,
making assumptions as needed about the sizes and shapes of primitive inputs. We
then complete the semantics by discharging these assumptions with so-called type
checks that map assumption-violating inputs to a type error value.
43
3.3.1 Typ and Proj
The semantic function I interprets the s-val primitive operators as functions from
s-vals to s-vals. It captures formally the behavior of the primitive operators described
in Chapter 1. We begin with the Typ primitive. Recall (in the original, rough terms)
that its purpose is to split the inner matrices of an s-val into components along some
axis labeled, say, rows, and redistribute these along some outer matrix axis, labeled,
say i. llence we called the components "i-th typical rows" (see Section 1.6 and
Definition 2.15). Using the foundation laid in Chapter 2, we can restate this notion
precisely. Let c and a be natural numbers denoting an inner axis and an outer axis,
respectively. Informally, the expression Typ c a will be interpreted as the operator
that takes an s-val V and produces a new s?val V' in which the inner matrices of V
have been split along axis c and the components spread out along axis a. In other
words, assuming that a is not already in the axis set of V, the k-th component of
V' along axis a is an s-val of the same shape as V, but whose inner matrices are the
k?th components of the corresponding inner matrices of V along axis c.
If a is already in the axis set of V, then the k-th component of V' along axis a
is the same as the k-th component of V along axis a, except that the corresponding
inner matrices of V along axis c are replaced by their k-th components in V'.
Using the definition of k-th component (Definition 2.15), if i is an index with
(i a) = k, then we want (V' i) to be the k-th component of (V?) on c, which is
Thus we have
Aj.Vi(j [k/cj)
V'?j = V?(j [(i
Index patches, e.g. (j [(?a)/c]), are parenthesized here for clarity, but the parentheses
are not written below; patches associate with the index to the left. This leads us to
44
the formal definition
I[Typ calV = Aij . Vi) [(i a)/c]			(3.1)
Example 3.2 Suppose V is an s-val of exactly one inner matrix MC,?, c = fo, ij,
? 0 = m, ? 1 = n; i.e. V represents a conventional m x n matrix. Then I?Typ 0 7]V
is a new s-val that is an m-vector (along axis 7) of inner matrices, each an n-vector
(along axis 1), a typical row of M. El
Proj effectively inverts this operation:
I?Pwjca?V = Aij.Vi[(jc)/a]j
Example 3.3 Suppose VA, where 7 ? A, then
El
IiProj 7 Oj (I?Typ 0 7j V) = IiPr? 7 0?(Aij . Vi) [(i 7)/O])
= Apq.(Ai).Vi)[(i7)/0])p[(q0)/7]q
= Apq. Vp [(q O)/7] q [((p [(q O)/7j) 7)/O]
= Apq.V p[(q0)/7Jq[(q0)/0]
= Apq.Vp[(qO)/7]q
= Apq.Vpq=V
(3.2)
The second to last step follows from 7 ? A and Finiteness. This generalizes easily to
the following.
Theorem 3.4 Let V be an s-val with outer axis set A and a ? A be an outer axis
number. Then IIPwj a cl (I?Typ c al V) = V. El
The astute reader will now ask, when is the condition a ? A not true, and what
happens then?
Example 3.5 It is not hard to verify the following:
45
I?Proj 7 O?
I?yp17?
I[TypO7?
123			1
456			=			5
7 8 9			9
In general, composing the meanings of Typ Cj a for several inner axes Cj and finally a
projection from a onto c yields a function that gets the "diagonal" along the axes Cj
of each inner matrix of the operand.
3.3.2 Extr
Extract selects one or more adjacent components along a given axis c of each inner
matrix of its argument. The simplest incarnation is Extr00 c, which merely selects
each zeroth component. The selected components become the inner matrices of the
result. In the following equation, the A term modifies its inner index argument so V
always sees a zero in position c.
i?Extr00 c? V = Aij . Vij [O/c]			(3.3)
The complementary operator selects all but the first component. Here a A term
increments the inner index at position c:
i[Extr01c?V = Aij.Vij[(jc)+1/cj			(3.4)
Thus each inner matrix of the result is the same as the corresponding inner matrix of
the argument, except all components along axis c are "shifted down" by one position;
the component at zero is lost. The superscript one versions similarly select the last
and all but the last element, and we omit them.
3.3.3 Join
Join concatenates pairs of inner matrices from its arguments to form the inner ma-
trices of its result as follows. Consider all i E I. If V1 and V2 are the Join arguments
46
and V' the result, then (V' i) is just (V1 i) concatenated with (V2 i), wherever both
of these are non-rangerr. Concatenation is by matrix join given in Section 2.3.5.
I?oinc?VjV2=Ai.i? ?nrangeV?,V2 H jMc(V1i)(V2i), rungerr
(3.5)
Of course we have still taken no precautions to ensure iM is rectangular. This is
another spot to mend with type checks.
3.3.4 Arithmetic
Reduction by the representative arithmetic operator + occurs simultaneously on all
inner matrices, which are assumed to be vectors along axis c. To establish the upper
limit for the reducing summation, part of the definition names the inner dimension
function, which gives the length of the inner vectors:
I?Red+ c?V=Aij. ?
k=o
Vij [k/c] whenever V?			(3.6)
Notice that j is used here merely because it is handy. Any index patched with k at
c would work as well.
We pointed out in Chapter 1 that binary arithmetic can be expressed with Join
and vector-reducing arithmetic. This suggests the following derivation, where the
inner axis sets of V1 and V2 are assumed empty:
II+i(V?,V2? = I?Red$0?I?oin0jV1V2 (a)
= Aij. ?k%0o??1 (Az.i ? ?nrange V1, V2
iMc(V1?)(V2?), range?r)ij[k/cj
= A?j.z??nrangeV1,V2 H
?k2?o1 k<1 (Viij[O/cJ), (V2ij[O/c]), rangerr
= Aij.iEinrangeV1,V2 H (V1zj)+(V2ij), rangerr
(3.7)
This applies (3.5) and (3.6) above as well as Definition 2.22 and Theorem 2.19 in a
straightforward manner. Axis 0 was used here for the join and reduction, but any
47
other axis would serve as well, as is shown by the absence of zeros in (d). Note ? in
(b) is the inner dimension of the 5M result. We presume a little by asserting ? 0 = 2
in (c), though it makes sense that joining two matrices with empty axis sets yields a
2-vector. We prove a generalization of this shortly.
3.3.5 Conditionals
Equations (3.7) ultimately offer an important insight. If we have an 11-ary primitive
Op that operates 011 base domains, the unchecked semantics for the corresponding
"s-val version," Op, which acts on all inner matrices simultaneously, has a general
form:
1?Op? V0			-			= Aij . i E inrange ..... .
(3.8)
Op(V0ij),. . . (Vn?iij), rangerr
We apply this reasoning to the base domain conditional operator to obtain:
i?Condj VB VT VF = Aij . i ? inrangeV?, VT, VF (3.9)
((VBij)H(VTij), (VFij)), rangerr
Here we assume the inner matrices of VB map all indices to a single element of Bool,
and that VT, VF E Sk for some k.
Example 3.6 Recall Definition 2.26. Suppose we have
true			1			3
VB =			VT =			,			and VF =
false			`			2			4
Then form typical ?-th components of all these and present them as conditional
arguments, projecting the result:
i?Proj 7 Oj iiCond? (i[Typ 0 ?1 VB) (i?Typ 0 7? VT) (i?Typ 0 7? VF) =
4
This behavior is much like the vector merge instructions of high performance vec-
tor processors. On the other hand, let us now form i-th components of VB, J-th
48
components of VT, and k-th components of V?:
I?Pwj 9 0? I?Pwj 8 1? IIPrQ 7 2? I[Condi (3.10)
(I?Typ 0 7? VB) (I?Typ 081 VT) (I?Typ 091 Vp)
What is the value of this expression? Recall that, conceptually, Cond will return an
(i, j, k)-th (i.e. (7, 8, 9)-th) typical component for each triple formed of an i-th, j-th,
and k-th component of the inputs. The triples then consist of a truth value b from
VB, an integer t from VT, and another integer f from VF. The (i,j, k)-th answer
component for the triple is just t if b is true, otherwise f. The j, j, and k axes
are finally projected onto planes, columns, and rows of a three-dimensional result
respectively. Thus the 0th plane corresponds to all the triples with b = true. Below
we write the plane with these conceptual triples alongside the true answer, obtained
by picking tin each triple.
(true, 1,3)			(true,2,3)			___			12
(true, 1,4)			(true, 2,4)			12
Similarly for the false plane:
(false,1,3)			(false,2,3)			33
-			(false, 1,4)			(false,2,4)			4 4
It is easy to verify this intuitive result formally. ?
We have troubled with this long example because there was difficulty in reasoning
ad hoc about how Cond should behave in such cases. Then Definition 2.2 gave us
(3.8) and (3.9), and the difficulty went away.
The inner axis sets and dimensions of VF and VT match in this example. We
must yet consider the case when they do not.
3.3.6 Expressions
The semantic function S gives the denotational meaning of expressions, composing
the meanings of subexpressions in an appropriate way. For instance, we want (3.10)
49
to be the meaning of any svexp expression
Proj9O Proj81 Proj72 Cond
(TypO7 EB) (TypO8 ET) (TypO9 EF)
where EB, ET, and EF are terms with meanings VB, VT, and VF respectively. Thus
for an n-ary primitive, we apply the meaning of the primitive to the meanings of its
arguments. Recall from Section 3.2 that ? and ? are function and bound variable
environments respectively. Then we have:
S?(prnm-n) ..... . E?? ?p = II(prim-ni (??Eii ..... . (?[En? ?P)			(3.11)
Applying a user function is similar, but recall that user functions act on matrices
over base domains rather than s-vals. The semantics of their application must encode
application to each inner matrix explicitly. Again (3.8) is the appropriate idea. We
need only "lift" it to the case where Op acts on matrices of base elements rather than
base elements themselves:
Ei...En??P = Ai.i? inrangee1,...e? H f(eii)...(e?i), rangerr
wbere e? = ??k1 ?P
and f=????p
(3.12)
The recursion in these rules is grounded at constants and variables. For a constant
bracket term C, ??C? ?P is given in definition (2.26). Variable values come from the
appropriate environment.
3.3.7 Declarations
= (Px)			(3.13)
= (?f)			(3.14)
The semantic function D assembles the meanings of function definitions to form the
function variable environment. Each definition yields a singleton map for the defined
50
function variable:
DU Xi...Xn=?E]?= [f/f], where (3.15)
f = YAgy1... y? . M ?[E? (? [gif]) (I [(Syi)/xij... [(Syn)/xn])
Here (Yf) takes f to its least fixed point, providing the so-called natural meaning
of recursive terms. See [Sto77]. In fact the only uncommon feature of this equation
is the interjection of auxiliary functions S and M. The former matches the input
values xk, which are matrices over base types, to the range of the bound variable
environment, the s-val domain S. Thus we want
Sx = Ai.x			(3.16)
which changes a matrix over base types to an s-val of one inner matrix. On the other
hand, if v is the s-val result of evaluating the function body, then (M V), must be a
matrix of base domain elements to provide the proper signature for the meaning of
f. For this, we need to apply it to some index, but which one? Indeed, we achieve
reasonable semantics for user functions only if the choice does not matter, i.e. if v is
an s-val of empty axis set and has exactly one inner matrix. Thus user functions like
f x = Typ07x
whose bodies produce a space of components from inputs with only one component
are not allowed. For the time being, we adopt this as an assumption. The type
discipline of svexp will enforce it. This leads us to the following simple definition:
MV = VO			(3.17)
where 0 is the (arbitrarily chosen) index that is zero at all positions.
D also accumulates successive function definitions:
D[D1,...			= ?4tD[Dnj?
where ? = .......
(3.18)
51
Finally, ? gives the meaning of a complete program:
??ef D1,. . D? in E? = S[E?(D[D1,. . D?? A) I
(3.19)
The equation accumulates function definitions, then evaluates the body expression
in the function variable environment they produce.
3.4 Pull Semantics
We now set out to remove the simplifying assumptions made above that primitive
and user function inputs are well-behaved, and that user function bodies produce
values with only one inner matrix.
3.4.1 A primitive wrapper
The unary s-val primitives become well-defined on all inputs if we wrap the unchecked
semantics in a filter that handles the cases we earlier assumed away. The filter
functions like this:
o+ Propagate I, typerr1, and typerr2.
0 Assert the argument is an s-val with certain axis sets and dimension functions.
o+ Test the axis sets and dimension functions for some property. If the test fails,
return typer?1. Otherwise, return the value given by the unchecked semantics.
We capture this behavior in an auxiliary function:2
Definition 3.7
when VA,a;c,? has P it's V' =
E]
(V = I) I,
(V = typerr1) H typerr1,
(V = typerr2) type??2,
(VA?Q;c?? A P) H V', typerr1
2This form of auxiliary semantic function is borrowed from [Blo9D]. We find it
much more readable than others more common.
52
As a bonus, this maps all non-s-val inputs to typerr1. However, we are out to prove
that this never happens.
The interesting part of adding wrappers is in picking appropriate predicates P.
Generally, the weakest acceptable P is dictated by our desire to obtain s-val closure
for svexp. Recall that this involves showing that every program produces one of
I, typerr1, typerr2, or an s?val. It follows that we must show all primitives closed
for s-vals. The wrapper handles the details of propagating divergence and type
errors. However, we still need to ensure that each primitive returns s-vals on s-val
arguments. In doing this, we discover that the weakest P achieving closure can
often be strengthened in a way that does not compromise expressiveness, but which
prepares the ground for static type checking in the next chapter.
3.4.2 Typ and axis assertions for static typing
Wrapping the unchecked semantics for Typ with Definition 3.7 where P =--H c ? C
yields
i?Typ c alv =--H when vA,Q;c,? has c E C
(3.20)
it's Aij.Vij[(ia)/c]
The choice of P is important. Indeed, if vA,a;c,? and it is not true that c ? C (there
are no components to make typical), then Aij . Vzj [(i a)/c] = V. Thus we could
have chosen P =--H true and (3.20) would still be closed. However, requiring c ? C
opens the door for static type inference --H a term (Typ c a E) may be read as an
assertion that the meaning of E has c in its inner axis set, whereas with P = true,
no such reading is possible.
This is comparable to the treatment of the cons primitive in dynamically typed
Lisp vs. statically typed ML [MTH9OJ. In the former, lists may include any number
of differently typed elements, so the code (cons a b) says nothing about the types of
a or b. Conversely, ML lists are constrained to have elements of identical type, and
53
the call (cons a b) adds a constraint to the overall type of the program: "b's type is
`list of T', where T is the type of a". Our constraint c ? C sacrifices far less than the
ML list constraint, however. There are many applications of dynamically typed lists,
but no important ones for a new way to write an expression that is already written.
3.4.3 The other unary primitives
The wrapped version of Proj is:
I?roj cajV = when vA,a;c,? has a E A A c ? C
(3.21)
it's Aij.Vi[(jc)/a]j
If a ? A, the unchecked semantics again returns V: Aij . Vi [(j c)/a] j = V. Further-
more, if a E A and also c ? C, then the unchecked semantics gives an interpretation
without apparent use--Hmore than one outer axis is projected onto the same inner
one. llence an application of Proj asserts a type constraint: "V has outer axis a and
no inner axis c," and again no useful expressive power is sacrificed.
Similarly for Extr we have:
1iExtr0 cj V = when VA,a;c,? has c C C
it's Aij.Vij[O/cJ
i?Extr1 c? V = when VA,Q;C,? has c ? C
it's Aij.Vij[(jc)+1/d\
(3.22)
(3.23)
Vector-reducing arithmetic requires checks that the operand's inner matrices are
appropriate vectors of integers:
1iRed + c? V = when VA,a;C,? has V ? Si A C =
it's Aij. zkQCo)?l Vij[k/c]
Recall Si is the domain containing s-vals over integers.
(3.24)
54
3.4.4 S-val closure property for unary primitives
Let a b = 0 if a < b and a --H b otherwise, We begin showing svexp is closed for
s-vals by showing that the unary primitives return s?vals on s-val inputs:
Theorem 3.8 Let U be a unary s-val primitive. If VA,?;C,? and V' = I?U?, then
VlA',Q'; C1,7 with A', &, C', and ?` as given in Table 3.1.
Proof. The proof is by elementary consideration of primitive behavior on all possible
s-val inputs. Consider V = Typ c a V. Thus
V' = when VA,a;c,? has c C C
it's Aij . Vij [(ia)/c]
V is an s-val. Hence it satisfies strong strictness and finiteness in both outer and
inner matrices. Equation (3.20) preserves these properties by observation. Thus V'
satisfies the properties as well.
It remains to show V' is rectangular in both inner and outer matrices with A',
C', ?`, and ?` as given ill the first line of Table 3.1. For this, we have
i ? inrange V'
i ? inrange V and then j [(i a)/c] ? inrange (V i)
Vb ? A.0 ? (ib) < (ab) and then 0 <(ia) < (?c)
? VbEAU?a?.0<?b)<(o'b)
where ?` =Q[min?(aa),(?c)i/a]
Thus A' = A u fai and a' = ? [min f(a a), (? e)lIa]. Similarly, for all i E inrange V'
we have
j E inrange (V' i)			j [(i a)/e] ? inrange V
?			VbE C--Hfcj.(jb) ? inrangeV
Thus C' = C --H fej and ? suffices for ?` as desired. Proofs for the other primitives
are similar. ?
55
Table 3.1: Axes and dimensions for unary primitives applied to vA,a;C,?.
U			A'			C'			?`
Primitive Outer axes Inner axes			Outer dimension			Inner dimension
Typca			Aufal			C--Hf4			a[rnin?(?a),(?c)J/a]
Projac			A--Hfal			cu?4			? K? a)/c]
Extr0c			A			C--Hfcj
Extr1c			A			c
Red + c			A
3.4.5 Binary primitives
It is not hard to see that the unchecked semantics for Join (3.5) produces a matrix.
However, the contents of the matix are not, in general, matrices. They may fail to
satisfy Rectangularity. To rule out this case, we define a predicate on the inner axis
sets and dimensions of the s-vals to be joined that is true if the result is rectangular.
The definition is given in a compositional form: conformable1 can be thought of as
the "axis set part" of full conformability, which is tested by conformable2.
Definition 3.9
(conformable1 c C1 C2) =
true if Ci --H fcj = C2 --H
false			otherwise
true			if (conformable1 c C1 C2) and
(conformable2 c C1 ?i C2 ?2) =			v c ? C1 --H Ici (?i c) = (?2 c)
false			otherwise
A pair of matrices with axis sets and dimensions that satisfy conformable2 is called
a conformable pair. E]
Then a conformable pair joins to form another matrix:
Theorem 3.10 If M1CI,? and M2C2,?, then (conformable2 C C1 ?i C2 ?2) = true
implies iM c M1 M2 is a matrix.
56
Pwof. It is easy to see that Definition 2.22 on page 34 for matrix join preserves
strong strictness and finiteness. It remains to show the join is rectangular if and only
if its arguments are conformable. Let M = iM c M1 M2 and also let M1,, M2,, ? be
as in Definition 2.22. Consider the set of indices
s = ? inrangeM1, or i[(ic)--H(?1,c)/c] E inrangeM21i
Straightforward case analysis yields
(Mi)#rangerr ? i?S
Thus the proof finally devolves to showing that S is a valid inrange set when
Indeed, when this is true, we have
(conformable2 C C1 ?i C2 ?)
3 =			Vb? C1 u fcj.O <			< (?b)J
where ? = ?i [(?i' c)+(?2, c)/cJ. El
This proof also implies the following.
Corollary 3.11 With M1 and M2 as in Theorern 3.10 and a conformable pair, let
M = iM cM1 M2, where (by Theorern 3.10) MC,?. Then
(a) C = C1 u ?4 = c2 u Ici
(b) Vc? C--H tci.(?c) = (?ic) =
(c)			(? c) =			(?i c) + (?2 c) if c E Ci, cc C2;			(?i c) + 1 if cc C1, c ? C2
(?2 c) + 1			if c ? C1, cc C2;			2			if c ? C1, c ? C2
El
57
With this, the full semantics of Join is:
IiJoin c? V1 V2
= when A1,a1;C1,? has true it's
V1
when V2A2 ,a2;C2,? has (samebase V1 V2) A (conformable1 Ci C2)
it's (conformable2 c C1 ?1 C2 ?2)
Ai.i ? inrange V1, V2 jMc(Vli)(V2i), rangerr,
typerr2
(3.25)
Here samebase : S S H Bool returns true if its two s-val arguments have the
same underlying base type. This is clearly necessary for s-val closure, as the Join of,
say, an s-val of integers and another of booleans would otherwise be some function
not even in S. We then take pains to differentiate between type errors resulting
from mismatched axis sets and those from mismatched dimensio??. The distinction
is helpful later for proving type inference sound.
Theorem 3.12 The full semantics of the Join prnmitive is closed for s-vals. That
is, if ?1Ai,ai;Ci,?, ?2A2Q2;C?,?, and V = I?oin c? V1 V2, then VA,oL;C,?, with
(a) A = A1 u A2,
(b) Va?A.?a=minf(?1a), (??a)J
(c) C and ? as in Corollary 3.11.
Proof. Strong strictness and finiteness are again preserved by the semantic equation
(3.25) for both inner and outer matrices. It remains to verify rectangularity. The
inner matrices are join results on conformable pairs; hence they are rectangular
by Theorem 3.10 with size given by Corollary 3.11. These inner matrices, which
are non-rangerr values, are returned when and only when the outer index is in
inrange V1 n inrange V2. This implies VA,a as desired by Theorem 2.10. Li
58
The full semantics for binary arithmetic follows by derivation from those of Join
and vector-reducing arithmetic as in the unchecked case, (3.7) with the additional
refinement of a check for empty inner axis sets for both operands.
I?+?V?V2 = whenvik?al;Cl?? has V1 ES1AC1 =?ft's
when v2A2,a2;C2,? has V2			? Si A C2 = $ it's (3.26)
Aij.i?inrangeV1,V2			(Viij)+(V2ij), ranger?
Theorem 3.13 If V1Al,al;?? E S1 and V2A2,a2;?? ? Si, then (11 + Vi V2)A?Q;??
where
A=A1uA2 and Va?A.(?a)=mi4(A1a),(A2a)J
Proof. Similar to Theorem 3.12. E]
This is not surprising. When we add two spaces of typical scalars, we obtain another
(appropriately shaped) space of typical scalars.
3.4.6 The conditional
The conditional requires care. We certainly have to ensure that the predicate is a
boolean inner-scalar, but more is required to get s-val closure. Recall Example 3.6
and suppose
VF = 35
4 6
instead of the value given. This presents a problem, as the final (unchecked) seinantic
value is, with an abuse of bracket notation:
4 61
More precisely, all inner indices (0,*,.. .? are mapped to a single value, 1, whereas
the same cannot be said for (1, *,...?, which map to one of 4, 6, and rangerr. Thus
the unchecked semantic value violates inner Rectangularity and is not an s-val. One
59
recourse is to insist that the inner axis sets and dimension functions of the true and
false branches agree:
IiCond? VB VT VF = when v?AB?aB; CB,? has VB ? S0 A CB = ? it's
when vAT,aT;CT,?T has true it's
T
when vAF,aF;CF?F has Q it's			(3.27)
F
Aij.i ? inrangeV?,V?,V?
((VBij)			(VTij), (VFij)), rangerr
where
Q = (samebase VT VF) A CT = C?AVc? C?.(?c) =
(3.28)
The following is a nearly immediate consequence:
Theorem 3.14 The full semantics of Cond given in Equations (3.27) and (3.28) is
closed for s-vals. That is, if
E S0, v?AT?aT;C?? E Sk, and vTAF?QF;C???s?
for some k, then (IiCond? VB VT v?)Aa;C? ? Sk, where
A = ABUATUAF and Va? A.(?a) =min t(o?a), (??a), (o?a))
Proof. Similar to Theorem 3.12. ?
Let's look at where the search for closure has led us. First, Equations (3.27)
and (3.28) have good properties for type inference and code generation. Type in-
ference relies on computing representations of axis sets and dimension functions at
compilation time. This cannot be done when the shape and size of conditional results
depend on the value of the predicate. For code generation, if a compiler knows the
exact size (by examining type information) of a conditional result, then it can allo-
cate storage a priori, rather than, say, generating separate calls to a heap allocator
for each branch.
60
These good features are offset by the definition's rather heavy-handed restric-
tion of the svexp programmer. For instance, lists and vectors are both useful for
representing sequences. Since s-vals generalize vectors, we would expect them to
represent sequences as well. However, the conditional as given above allows us to
write only programs where the true and false branches of conditionals produce (s-val
representations of) sequences with lengths provably equal in any execution context.
To this there are two replies. First, we can adopt a narrow view. If some program
condition determines the size of the output, then we might immediately question
whether an s-val is the right data structure for the application. In other words, we
can admit that s-vals are not a good way to represent general sequences.
The second reply is more flexible. There is an important generalization to Equa-
tion (3.27) that still admits s-val closure, though it makes svexp harder to compile.
Observe that Rectangularity failed in the modified Example 3.6 because, intuitively,
components from VT and VF were "shuffled" according to the inner truth values of
the predicate to obtain the result. Since the components were not of the same size
and shape, the shuffle result was not rectangular. Equation 3.27 fixed this by forcing
all the components to be similar. An alternative course is to separate the case where
no shuffle occurs, i.e. when the outer axis set of the predicate s-val is empty. Since
there is only one (inner) truth value in this case, either VT or VF becomes the exact
value of the conditional. It follows immediately that the result is an s-val, provided
all the inputs are s-vals. We obtain such a semantics for cond by replacing (3.28)
with
(3.29)
In general, we may take Q as the disjunction of (3.28) and (3.29). This issue is
somewhat of a Pandora's box that we examine again in Chapter 7. For the time
being, we use the following semantic equation, which splits possible type errors into
61
axis and dimension mismatches as we did for Join:
= A?,??;C?,? has VB ? S0 A CB = ? it's
I?Cond?V?VrV? whenV?
when A?,a?;C?,?? has true it's
VT
when v?AF?aF;CF?? has Qi it's
Q2			Aij.iEinrangeV?,V?,V?
((VBij)			(VTij), (VFij)), rangerr,
(3.30)
t?perr2
where Qi = (samebase v1 v2)A CT = CF and Q2 = V c C CT . (?T c) = (?F c). Since
this merely specializes (3.27), we have
Corollary 3.15 The full semantics of Cond given in Equations (W.3()) is closed for
s-vals. That is, Theorem 3. 1? applies to these semantics also.
3.4.7 Function application and construction
We give svexp strict function call semantics by propagating bottom for any input
to the result. We also check the environment value for f to propagate "undefined
function variable" errors.
?VE1???En?4)P = whene1???e?hastrueft's
f=I HI,
Ai.i ? inrangee1,...,e? H f(eii)... (e?i), rangerr
where e? = ??ki ?P, 1 < k <n
andf=(?f)
(3.31)
From this rule, we obtain the following closure theorem.
Theorem 3.16 Suppose for given Ck, ?, and base domains Bbk, 1 ? k < n there
exist C, ?, and Bb, such that Ck,? ? MBbk implies ((? ....... Mn) C,? ? MBb.
Mk
Let v?Ak??k;CkJk ? Sbk and V = ?? .... Enj ?P. Then VA,a;C,? ? Sb where
A=A1U???UA? and VaEA.(?a)=minf(o?a)I 1?k<nJ
62
Proof. Similar to Theorem 3.12. The requirements for dimensions of inputs of f
replace the constraints of conformable2. E]
For function construction, Equation (3.15) needs only the addition of a type check
in the auxiliary function M. Recall that M coerces the s-val of a function body to a
simple matrix by applying it to the (arbitrarily chosen) zero index. We remarked in
Section 3.3.7 that this is reasonable, i.e. that outer components are not lost, only if
any other index would serve as well. In turn, this occurs only if the outer axis set of
the body is empty. From this we have:
MV = when VA has A = ? it's (VO)
3.4.8 S-val closure for svexp
(3.32)
The remaining unchecked equations for primitive application, variable reference, ac-
cumulation of definitions, and program values need no changes in the full semantics.
The first big goal of our study is at hand:
Theorem 3.17 The full semantics of svexp is closed for s-vals. That is, the value
of an svexp program is one of I (denoting divergence, i.e. an undefined variable
error or infinite looping), typerr1 or typerr2, (denoting type errors), or else an s-val.
Proof. As we have shown the primitives closed by Theorems 3.8, 3.12, 3.13 and 3.14,
closure of svexp follows by elementary structural induction to prove the following
assertions:
(a) Environments bind variables only to one of I, typerr1, typerr2, or an s-val.
(b) Function variables are bound in environments only to functions that are closed
for matrices.
(c) svexp terms have values that are one of I, typerr1, typerr2, or an s-val.
63
Since a program is a term, this is sufficient. The only interesting case arises for user
function definitions, where we must prove (b). Recall
DUxi...xn1?= [f/f], where
f = YAgy1 . .yn.M??EJ (?[g/f]) ( [(Syi)/xi]... [(Syn)/xn])
From the structure of our domains, we know
where fo ....... Xn.I and
oo
f= Lifi
i=O
fi = ..... . y? . M ?[E? (? [fi--Hi/f]) (I [(Syi)/xij... [(S yn/xn])
Moreover, Theorem 2.3 assures us that if all the f? are closed, then so is f. Certainly
fo is closed. Then assuming it for fi--Hi, we have (c) for the A body by the overall
structural inductive hypothesis (E is a smaller term than the definition wherein it
lies), and thus (b) holds for f?. ?
Chapter 4
Types and Type Inference
The object of the foregoing chapters has been to define s-vals and svexp, and to
show in a fundamental way that the definitions make sense. The final step was The-
orem 3.17, which showed svexp programs do one of three things: diverge, generate a
type error, or return a structured value. This is useful because s-vals, although they
inhabit a general function domain, are neatly characterized by three finite features:
0 The base element domain (name).
o+ Outer and inner axis sets.
o+ Outer and inner dimension functions.
We refer to these features collectively as the extent of an s-val. While proving s-val
closure, we have given theorems for the syntactic constructs of svexp that allow us
to precisely deduce the extent of the construct's semantic value from the extents of
the values of its parts. An extent thus derived is a valid extent.
4.1 Overview
In this chapter, we develop an algorithm that deduces valid extents automatically.
It manipulates expressions in a type lan?uage, which is a symbolic representation of
64
65
extents. We want the algoritlim to assign each subterm of a svexp program with
a type whose meaning is a valid extent (or perhaps a set of them). This is akin to
what is interchangeably called type inference and type reconstruction for conventional
languages. However, s-vals introduce subtleties that do not otherwise occur.
4.1.1 History
In the pursuit of an extent inference algorithm, we follow a long line of distinguished
research. Even early FORTRAN compilers deduced that the result of 1.2 + 3 is a
floating point number rather than an integer. There are good pragmatic reasons for
this:
o+ Convenience. The compiler takes care of required details. Here the program-
mer is spared writing types for all subexpressions, which are needed for code
generation.
o+ Abstraction of hardware. Types put a gloss on the details of physical imple-
mentation. Here the hardware has different internal models of numbers with
and without fractional parts, which the type system casts in familiar terms of
"real" and "integer" respectively.
Much subsequent work on type disciplines concentrates on convenience. At the
extreme, the goal of modern type theory is to encode entire program specifications in
types that are logical formulas provided by the programmer. A well-typed program
is then certain to be correct.
On the other hand, our work is inspired by [Mil78] (and later work of many
others), which shows how the types of all subterms in ML programs can be inferred
by the compiler with no explicit programmer guidance. The compiler succeeds with
type inference only if it is impossible for the program to suffer a type error later when
the compiled program is run. Thus, ML compilers always have enough information to
66
generate code without run time checks for the types of operands. This is somewhat
surprising considering the power of ML. Functions can be polymorphic, e.g. a single
sort function can sort lists of any type. Functions are also first class objects they
can be passed about as values much like integers. This contrasts with imperative
languages like Pascal where monomorphic type expressions are a mandatory part
of the syntax, and with dynamically typed functional languages like Lisp where no
declarations are required, but type errors are discovered at run time.1
With this background, we can put our goals for extent inference in perspective.
We seek the extension of ML's convenience benefits to matrix calculations. This
includes removing, as much as possible, the programmer's burden of writing tedious
and error-prone index arithmetic, loop bounds, and array declarations and offering
her a matrix version of polymorphism. We also want to turn the full power of type
inference back toward hardware abstraction, putting a gloss on the linear address
space of memory in most modern processors.
4.1.2 The approach
The extent inference algorithm makes two passes over the program. The first dis-
covers all axis set assignments and s-val element base domains consistent with the
semantics, or stops and signals failure when no assignment exists. A language of
axis types represents axis sets and element domains. Typing rules describe the axis
types of svexp constructs in terms of the types of their parts. A proof tree re-
sults when typing rules are employed in sequence to determine the type of an entire
svexp program. The inference algorithm constructs proof trees automatically, using
a unification algorithm to apply typing rules.
The second pass is similar to the first, except that full-fledged types replace axis
types. Types encode dimension functions in addition to axis sets and element do-
1Avoiding dynamic type errors is thus equivalent to the halting problem.
67
mains. A small language of arithmetic expressions encodes dimensions. Construction
of proof trees relies on axis types computed in pass one to choose typing rules and
"guess" the types of recursive functions before their types are fully known. Type
unification generates a set of dimension equality constraints as a side effect. A pro-
gram for which type inference succeeds and wherein equality constraints are satisfied
is certain to produce an s-val result or diverge. Moreover, the algorithm produces a
unique "most general" typing. Roughly speaking, this means that the types inferred
by the algorithm encode all possible typings that can be achieved using the typing
rules.
4.2 The Axis Type Language
We begin with a suitable syntactic representation of axis sets called an awis map:
(map? ::= [(adim?/(awis number)J(map? (map variable?(map?
(adim) ::= (adim varnable? o
(4.1)
An axis map (or adim) with no variables is called a gwund map (or ground adim).
The name "adim is a contraction of abstract dimension. In the second pass, we
will replace adim with dim, the language of dimension arithmetic mentioned in Sec-
tion 4.1.2. Here we are concerned only with ascertaining whether or not a dimension
will be needed at all.
To motivate the syntax of axis maps, we state a key property in advance of the
formalities. The interpretation of a map M as an axis set A satisfies the following.
(a) [./a] occurs anywhere in M only if a ? A.
(b) [o/a] occurs anywhere in M only if a ? A.
Thus, the association of o with axis a in the map pair [./.] represents certainty that
a ? A. Similarly, [o/aJ ill the map represents certainty that axis a ? A. It follows
68
that an axis map with both [./a] and [o/a] has no interpretation as an axis set. This
is an invalid map. All others are valid. Since the pairs containing ? and 0 denote
axis set membership or the lack of it, their order is not important, and many maps
may have the same interpretation.
The notion of maps with the same interpretation is important, and carries forward
into axis types. As we detail the interpretations of axis types, we will simultaneously
describe the equivalence relation =--H, consisting of all pairs of maps and axis types
with the same interpretation.
The formal interpretation of a valid ground map M is simply
fa [./a] occurs in MJ.
This is sufficient to characterize for valid ground maps by set equality. For example,
[./a] [o/a?]?			[o/aii [./a]?
Non-ground maps are interpreted in the context where they are employed; this is
described below.
We next need a language of base types to stand for the base domains. This consists
of a set of base type symbols, e.g. Int and Bool, and also base type variables.
We arrange two maps and a base type in a lexical triple, called an extent type, to
represent an entire extent. This is written
((MA, Mc, 4
where MA is the outer axis set map, Mc is the inner, and T is the base type. We
interpret valid ground extent types, those with two valid ground maps, as the set of
s-vals they describe plus bottom.
Finally, a general axis type is either an extent type, or a function type of the form:
((Mci, ri?,..., ((Mcn, Tn)			((M, 4
69
Since user functions operate on matrices over base domains rather than s-vals, the
extents here have no outer map. We interpret valid ground function types, those
containing only valid ground extent types, as the obvious function space including
bottom. A ground axis type has only ground maps and base types.
A substitution is a function taking map, adim, and base type variables to maps,
adims, and base types respectively. We will also apply substitutions to complete
maps and axis types, where we assume they return the map or type with variables
substituted, but otherwise unchanged. Thus the trivial substitution is the identity.
We interpret an extent type T with variables as the union of all interpreta-
tions induced by all substitutions taking T to a valid ground type. Likewise, if
T? T is a function type with variables, its interpretation is the space of all
continuous functions f (including I) such that for all substitutions 3, if Ek is in the
interpretation of (STk) for 1 < k < n, then (f Ei ... En) is in the interpretation of
(ST). We write V : T to assert that value V is in the interpretation ofT.
The relation extends to types and substitutions as well. For types T1 and T2,
we have T1 T2 if and only if for any substitution 5 taking these types to ground
we have (ST1) (ST2). It is easy to show by induction on the structure of axis
types that this extension is the same as one we might have given inductively on the
structure of axis types themselves:
Theorem 4.1 For axis types T1 and T2, T1 T2 if and only if one of the following
hold:
(a) T1 = ((MAl, Mci, Ti) and T2 = ((MA2, M02, T2), where MAl = MA2 and
Mci = Mc2 and r1 = T2,
(b) T1 and T2 are function types of the same arity, wherein corresponding extent
types are equivalent with respect to =--H. [3
Intuitively, this is good because we can focus on the equivalence of maps, presuming
70
that full axis types will take care of themselves in proofs of soundness and complete-
ness. Finally, two substitutions S and S' are equivalent with respect to if they
have the same domain, and for any element x in the domain, (S x) (S' x).
The point of developing the relation is to extend the normal concept of syntactic
substitution as follows:
Definition 4.2 Let S be a substitution. Then T' is a substitution instance of T
under S if (5 T) =--HT'. [rn]
Thus the substitution instances of any axis type are in general a superset of the
axis types we can get by simple syntactic substitution. For instance, if T is just
a variable x and 5 takes s to [./a1]?, then [./a1J? is a substitution instance of x
under 5, but so are all equivalent types with respect to =--H, e.g. [./ai] [o/a?J?.
This completes the discussion of the syntax and interpretation of axis types, and
we are ready to state the property of axis types that justifies all the effort:
Values from Wt (the type error values) are not in the interpretation of
any valid axis type.
We exploit this by seeking an algorithm to associate a valid axis type with each svexp
subterm in a program such that all possible values the subterm can take are in the
interpretation of its type. Then, whenever the algorithm succeeds, the property
above assures us that the program cannot produce a type error. This is exactly
the semantic soundness property of conventional typing rules. Unfortunately, there
exists no such algorithm in our case, because the language of axis maps is insufficient
to code the dimension constraints imposed by the semantic equations for Join and
Cond. However, we can achieve soundness in a weaker sense, wherein all possible
values a subterm can take are either in the interpretation of its type or are typerr2.
We take care of typerr2 values in the second inference pass, where general maps
replace axis maps.
71
4.3 Axis Set Inference Rules
Our interpretation of types taken together with the theorems of Chapter 3 give rise
to succinct axis set inference rules that lead to the desired algorithm.
4.3.1 A look at Typ
Consider the primitive "Typ c a". Equation (3.1) shows that the operand must have
inner axis c; it may or may not have outer axis a. From the definitions above, one
type that encodes all such s-vals (as well as I) is
T1 = (([d/a]m, [./c]n, b?
(4.2)
Here we adopt the convention that letters m,n,... are map variables, d, ..... are
adim variables, and b is a base type variable.
Now consider s-vals that can result from a Typ application. Theorem 3.8 shows
that inner axis c surely does not occur in the result and that outer axis a surely does.
Furthermore, the axis type of inner matrix elements in the result is the same as in
the input. Hence, in the context of (4.2), we can code all possible result extents with
T2 = (([./ajm, [o/c]n, b?
(4.3)
With this derivation, we have shown the following:
Theorem 4.3 For any substitution 5, if V : (ST1), then I?Typ c aj V: (ST2). ?
The reasoning behind (4.2) and (4.3), hence Theorem 4.3, includes semantic val-
ues. However, for compilation we need purely syntactic rules that assign a particular
axis type to each svexp subterm. Since subterms may include free variables, any
such rule must be given with respect to a type environment, denoted here with F.
This is just a map from svexp variables (including function variables) to axis types.2
2Do not confuse type environments with substitutions. The former "store" the
72
A term E that can be assigned axis type T by applying the rules with respect to
environment F is said to have axis type T in F, written F H E : T.3 Here is the rule
for Typ:
l'?E: (([d/a]m, [./cjn, b)
F F Typ caF: (([o/a]m, [o/c]n, b? TYP (4.4)
The top of the rule is its antecedent; the bottom is its consequent. We read it as
follows:
For all substitutions 5, if E has an axis type that is a substitution instance
of (( [d/a]m, [./c]n, b? under Sin F then Typ ca E has (in F) all types T
such that
T (5 (([./a]m, [o/c]n, b))
An essential quality of this rule is that if the semantic value of E is in the inter-
pretation of the axis type (5 (([d/a]m, [./c]n, b?) for some substitution 5, then the
semantic value of Typ Ca E is in the interpretation of (5 (([./a]m, [o/c]n, b)). This
property is of course semantic soundness for just this rule. Hence we should write it
carefully:
Theorem 4.4 (Semantic soundness ofTyp) Let ? and P be function and bound
variable environments and let F be a type environment wherein E has axis type T
and Typ c a E has type T' by rule TYP. Then
?[E??P:T implies ??ypcaE???:T'
types of svexp variables, allowing the typing rules given shortly to insist that all
occurences of the same svexp variable have the same type. On the other hand,
substitutions store bindings for type variables.
3We are invoking the slick (and occasionally abused) convention of the literature
of overloading the colon. To recap our usage, H E : T, where E is a svexp subterm,
means that E may be proven with type inference rules to have an axis type of the
form T. Conversely, V : T where V is a semantic value asserts that V is in the
interpretation of T
73
Table 4.1: Primitive typing rules of unqualified soundness.
FHE:(([d/a]m, [./c]n, b?
F ? Typ Ca E : (([./a]m, [o/c]n, b?
F H E : (([./a]m, [o/c]n, b?
F F Proj a cF : (([o/ajm, [./c]n, b?
F H E : ((m, [./c]n, b?
F ? Extr0 cF : ((m, [o/cjn, b?
F F E : ((m, [./cjn, b)
F F Extr1 cF : ((m, [./c]n, b?
F F F : ((m, [./c]?, Int?
FHRed+ cE:((m, $, Int?
FHE1:((mj,?,Int?,E2:((m?,?,Int?
F? + E1E2:((mjm2, Q), Int?
TYP
PROJ
EXTRO
EXTRi
RED+
BIN+
Pwoj Immediate from Theorem 4.3 and Equation 3.11, and the facts that I has all
types and typerr1 has none. EJ
4.3.2 Primitive typing rules of unqualified soundness
The procedure that led to a sound inference rule for Typ works as well for most of
the other svexp primitives. These are given in Table 4.1, which is titled unq?ta1ified
because the rules require no other considerations for semantic soundness. PROJ?
EXTRO, EXTRi, are consequences of (3.21), (3.22), and (3.23) respectively, as well
as Theorem 3.8. RFD+ is due to (3.24). It forces the input to be an inner vector of
integers along axis c with a map that is a single pair and Int for the argument base
type. The inner map for the output is empty, as the reduced output is inner-scalar.
74
Table 4.2: Axis typing rules of qualified soundness.
FHE1:((mi,[di/c]n,b),E2:((rn2,[d2/c]n,b?
l'HJoin cE1 E2 : ((mirn2, [./c]n, b?
F?B:((m?,?,Bool),T:((m?,n,b),F:((m?,n,b?
FFCondBTF:(m?m?m?, n, b)
JOIN
COND
4.3.3 Primitive axis typing rules of qualified soundness
The remaining primitives have semantics that restrict the inner dimensions of their
arguments. For "Join c", the inner dimensions of the arguments must match except
along axis c to satisfy conformable2 (Definition 3.9 and Equation (3.25)). For "Cond"
the true and false branch values must match exactly in their inner dimensions to
satisfy Q2 in Equation (3.30). We cannot code these constraints in the language
of axis types because they involve equality of dimensions that are not expressed at
all by axis types. As mentioned above, we solve this problem by deferring it to the
second pass. Otherwise, the axis typing rules for Join and Cond given in Table 4.2 are
derived in the same manner as TYP. The only important difference is that typerr2
is a possible result value. JOIN encodes the axis set constraints of conformabTh
given in Equation (3.25) and Theorem 3.10 by insisting that the inner axis sets of
both arguments and the result match except for axis c. The reader should recall that
axis c itself may properly be present or absent as the inputs are promoted. This is
reflected by pairing c with different adim variables in each input. On the other hand,
axis c is certainly present in the output, hence c is paired with 0 in the consequent.
The union of the inputs' outer axis sets in the result is coded by concatenating the
inputs' outer maps in the rule. Similar reasoning gives us COND from (3.30).
75
F H			: ((?, [./O] [o/1J$,Bk))			CONST
xo,o			xo,q?1
- Xp,o ...
4.3.4 Other program structures
F F v : (F v) (bound or function variable expression)
VAR
It is not hard to write rules for the rest of svexp that make sense in terms of our
understanding of user function application, environments, etc, There is, however, no
background like Theorem 3.8 to make soundness clear for these constructs. Happily,
they succumb to structural induction of the same sort used to prove soundness of
more conventional languages. Thus, the hard rules are done, but we are obliged to
be careful enough with the rest to achieve this next milestone:
Theorem 4.5 (Qualified semantic soundness of axis typing) If P is a svexp
program, then ? H P : T implies that either P?P? : T, or else ??P? = typerr2.
The remaining type rules are the axioms that form the leaves of derivation trees, given
in Table 4.3, and the rules for typing function variables and applications, Table 4.4.
The former are self-explanatory; the latter are straightforward, but we comment
briefly. ABS has a conjunction of requisites:
(a) The rule first requires that the function variables defined before fk in syntactic
order have (function) types F1 through ?k--H1 respectively.
Table 4.3: Type rules (axioms) for constants and variables.
(b) It next augments the type environment. It binds fk to Fk (with a generic tag
explained below) for 0 < k < i, and also fk to a "guessed" function type, and
76
finally fk'5 arguments to extents that match the function argument types, but
with empty outer axis sets. In this environment, E is required to have some
extent with an empty outer axis set.
The Fk in (a) must be proved with earlier applications of ABS. The empty outer axis
sets derive from Equation 3.15, where they correspond to the results of 5. Similarly,
the empty outer axis set of E derives from the check in Equation 3.32. The genernc
tag is a bit of additional type syntax that distinguishes calls to previously defined
functions from recursive calls. In the former case, we have already proved the final
type of the called function and so can freely instantiate its type by substitution
(rule INSI). In the latter, we cannot allow general instantiation without introducing
infinite types, and so have made a common restriction, allowing no instantiations at
all.
Formally, the type genernc(F) has the same interpretation as F. However, all its
variables are immune from substitution; i.e. for all substitutions 5, the interpretation
of (5 genernc(F)) is the same as that of genernc(P) and F.
The last necessity is a systematic treatment of environments. Values given by ?
and D depend on bound and function variable environments, and typing rules depend
on type environments. Intuitively, we are interested only in cases where variable and
type environments "match." Formally, we say P vespects F if for all variables x
bound in ?, x is also bound in F and (Px) : (Fx). With this, we prove the qualified
soundness of S and D.
Lemma 4.6 Let V' = ?[E'??p and suppose ? and,3 respect F. Then F F E' :
?mpl?es V': T' or else V' = typerr2.
Proof. By induction on the structure of E'. Possible cases are as follows.
E' is Typ c aE. Suppose 1' ? : T'. Since this can only be proven with typing rule
TYP, there must exist T such that F F E : T. From the hypothesis we then
77
Table 4.4: Axis type rules for general program structures.
? H fi : F1, .. ? H fk--Hi : Fk?i,
[genernc(F1 )/fi] ... [genernc(F??1 ) /fk--Hii
((nn, b?? H ((n, b?/fki
[((?,ni,bi)/xi] .. [((?,nn,bn)/xnJ H E :
?Hfk : ((fli,bi),...,((nn,bn)			(n,b? (for fk illfkXi...Xn =--HE)
FFf:generic(F)
F H f: F'
where F' is a substitution instance of F
F?f: ((n1,b1), .,((nn,bn? H
E1 : ((mi,ni,bi?,. ..,E? :
FFfE1...E?:((m1...m?,n,b)
?Ff1:F1,...,f?:F?,
[gener?c(Fi)/fi] .,, [generic(F?)/f?] FE:
Q)Fdeff?...fn...inE:((?mb)
ABS
INST
APP
PROG
78
have ?IE??P : T, and finally V': T' by Theorem 4.3.
E' is Proj acE, Extr? cE, Red+ cE, or + E1 E2. Follows from unqualified sound-
ness of these primitives and typing rules as in the former case. The equations
corresponding to (3.20) in the proof of Theorem 4.3 are (3.21), (3.22), (3.23),
(3.24), and (3.26) respectively. The typing rules corresponding to TYP are
PROJ? EXTRO and EXTRi, RED+, and BIN+ respectively.
E' is Join c E1 E2 or Cond EB ET EF. Proofs are similar to the former case, except
the semantic equation may also produce tuperr2, whereupon we have proven
the or else clause of the lemma. See equations (3.25) and (3.27) and typing
rules JOIN and COND.
E' is variable v. Then V': T', by one of three cases. If v is a bound variable, we have
= (Fv) from axiom VAR. Then V': T' by (3.13) and the fact p respects F.
Otherwise v is a function variable. Assume without loss of generality that T'
does not have a generic tag. Then T' was derived with VAR (the second case),
or else with VAR followed by INST (the third). In the second case, V' : r1?? from
(3.14) and because ? respects F. In the third case, (Fv) = generic(F) and T'
is a substitution instance of F. Here V' : T' as above with this additional fact
concerning types: If f : F, then f : (S F) for all substitutions s.
E' is f .... E?. From rule APP, we know there exists substitution S such that
T'=(S((mi...m?,mb?) and
F ? Ek : (S((m?,n?,b?)), 1< k <n and
F F f: (S((fli,bi?,...,((nn,bn)? ((n,b))
Thus ?[Eki?P : (S((m?,n?,b??) and ?[f??? : S(((fli,bi?,...,((nn,bn))
((n, b))) by hypothesis. From semantic equation (3.12) and Theorem 3.16 we
conclude V' : T' as desired.
79
This completes the cases. ?
Lemma 4.7 Suppose? H f? :T1,...,T? HT. Let [g?/f?] =D?jX1???x? =--H E?? as
in Equation (3.15), where ? respects F = [genernc(F1)/f1]... [genernc(F?)/f?] with
?Hf?:F?, 1<k<i. Then?Hfj:T1,...,TnHTimpliesgj:T1,...,T??T,or
else g? returns typerr2 on some input.
Proof. Let Mk, 1 < k ? n be matrices. By the interpretation of function types,
we must show that for all substitutions 5 and input matrices Mk such that Mk has
type (STk) for 1 ? k ? n we haveg?M1 ??M?: (ST). From the typing rule for f?,
we know
F[T1/xij... [Tn/xn][T1,...,Tn HT)/fj] FE :T
A simple induction proves that if F F E : T, then (SF) F E: (ST). Hence,
(SF[T1/x1]... [Tn/xn] [Ti,...,Tn T/fj]) FE: (ST)
Since F contains only generic types, it is not affected by substitution, and we have
F[(ST1)/x1]... [(STn)/xn] [(ST1,... ,T? H T)/fjj) FE: (ST)
Let V = StE]?[(ST1,...,Tn H T)/f?]I[Ai.M1/x1]... [Ai.Mn/xn]. We have by
Lemma 4.6 that for all substitutions 5, if Mk : (5 Mk), 1 <k <n, then (V 0): (ST)
or else V = typerr2. This yields the desired result by a conventional continuity
argument and the fact that if V : T and V' El V, then V': T. El
Proof of Theorem 4.5 for soundness of entire programs follows similarly from these
two lemmas and typing rule PROC. El
4.3.5 Automatic axis type derivations
We now have a set of rules that allow us to derive axis types for a program by
chaining together rule applications for each program substructure. Since some of
80
F= [(([./0J?, Int)/xj [(([./1]?, Int)/ [(([./0J?, Int), (( [./1]?, Int)H(( [./2]?, Int)/fJ
1' H x : (([o/9j?, [./0]?, Int)			F H : (([o/9]$, [./1j?, Int)
TYP			TYP
F ? Typ 09w : (([e/9]?, [o/0j?, Int) F F Typ 1 9y: (([./9j?, K/1]$, Int)
I?JAt?
F H t(Typ 0 9 w)(Typ 1 9 y) : ([?/91?, K/2]?, Int)
PRo?T
F F Proj 9 0( +(Typ 0 9 w)(Typ 1 9 ?)) : (([o/9]?, [./2]?, Int)
AB5
$ F f: (([./0J$,Int), (([./1]$,Int)			(([./2J$,Int)
Figure 4.1: Deriving an axis type.
the rules have conjunctive antecedents, chaining forms a dernvation tree. Consider a
derivation of the type of fin following definition:
fx ? Proj 92 (t (Typ 09w) (Typ 19
We derive $ F f: (([./0j$, Int), (( [./1]$, Int) H ([./2]$, Int) as shown in Figure 4.1.
The tree reads top to bottom as a proof of the type of f from axioms, constructed
by hypothesizing types for x and y and chaining forward to prove the type of f.
Alternatively, we can write the term to be typed at the bottom of the page and
apply rules backward by fitting a rule consequent to the syntax for the subterm to
be typed next.
The important question is this: How can a machine employ the rules to find an
axis type for any program P, deriving $ F P : T? Related tasks are very hard
a large community studies automatic theorem proving, where the space of possible
proofs is usually exponential or worse in the size of the theorem. In one sense, our
task is even harder. We would like not only to get some derivation, but to derive,
81
at every subterm, a most genenil or principal axis type, a type of which every other
valid type is a substitution instance. This makes the algorithm complet?there are
no type derivations under the rules that it will miss. Of course, we will also be
concerned for the syntactic soundness of the algorithm. When it returns a type, it
had better correspond to a derivation tree. Fortunately, our rules have a special
structure that makes derivations easy to come by. At most one rule applies to any
node in a proof tree; the derivation is precisely directed by svexp syntax. General
theorem provers do not have this luxury. The following sections are devoted to
constructing an efficient algorithm that exploits this structure to achieve sound and
complete type inference.
4.4 Axis Type Unification
In constructing the derivation tree by hand above, there is never any doubt about
the rule to apply next. The svexp syntax directs the derivation. The only ingenuity
required is, for each rule, to deduce a substitution that makes the rule antecedent
equivalent with respect to to a previously derived type. As long as we can find
such a substitution in each case, the derivation succeeds.
This section concerns a subroutine that replaces our ingenuity, finding the re-
quired substitution whenever one exists. There are two phases. In the first, we
construct the entire proof tree from syntax without regard to substitutions, noting
for each rule application where must hold for the derivation to succeed. These
notes form a set of axis type equations. Type inference is thus reduced to solving
such a set of equations, which is the purpose of the second phase. This division
of labor simplifies the soundness and completeness proofs. It is adapted from the
technique of fWanS7J.
Specifically, we say Q is a set of type equations if it is a set composed of elements
of the form Lj Rj, where Lj and Rj are axis types. We say S ? Q iff S is a
82
substitution, Q is a set of type equations, and for all equations Lj Rj in Q, we
have (5 Li) (5 Ri). In other words, 5 takes both sides of each equation to types
with the same interpretation. This is a generalization of the classical unification
of [Rob65], which is adequate for inferring types in ML.
Definition 4.8 Given substitutions 5 and 5', 5 is more general than 5', written
5 ? 5' if' there e?sts a third substitution T such that 5' T o 5. A substitution 5
is a unifier for the set of type equations Q if' 5 ? Q. It is a most general unifier if'
for all other substitutions 5' such that 5' ? Q, we have 5 ? 5'. ?
The usual purely syntactic definition of ? is inadequate here because many axis types
have the same interpretation. For example, consider the solution of Q fm =? ??
One most general unifier is the singleton map [?/mJ. However, [[o/O]?/mJ serves
just as well because the interpretation of m after one or the other unifier is applied is
the same. The essence of a most general unifier, then, is that it provides the largest
possible interpretation of all the axis types appearing in the equation set. This
interpretation must be unique; otherwise there is no most general unifier. However,
as in the case just shown, the unifler that induces it is usually not unique.
The skeleton of the unification algorithm is given in Figure 4.2. It processes
equations by cases based on their structure as described below. Processing builds up
the substitution 5, which eventually is returned as the answer. An equation that is
eligible for processing is called active. Equations can have several histories; they can
be:
o+ Processed once and discarded from Q.
o+ Processed once and replaced in Q with fresh, active equations.
o+ Modified and placed (inactive) back in Q, with a chance that some future
processing may cause reactivation.
83
Algorithm: Axis type unification
Input: A set Qo of axis type equations.
Outputs: Either the special value fail (no unifier exists) or a most general unifier for
Method: Let S be the identity substitution;
Let Q = Qo;
While there is an active equation e = (Li ffi Ri) in Q
finvarnanti
Remove e from Q;
Process e as described below;
(possibly adding to Q, extending S, or returning fail)
f Progress criterionJ
End while;
finvarnantj
Return S.
Figure 4.2: The axis type unification algorithm.
84
The following help to show correctness and termination:
Invariant. (a) For any substitution 5' such that 5' ? Qo, we have 5 ? 5'. In
particular, we can exhibit a unifier U of Q such that U o 5 5'. (b) For any
unifier U of Q, we have U o 5 ? Qo.
Progress criterion. Make a lexicographic count of three quantities in the following
priority order (highest first):
o+ Occurrences of map variables in equations active since the beginning, the
aiwaus-active equations.
o+ The number of distinct dimension variables in Q.
o+ The number of other symbols in active equations.
The criterion itself is that the count decreases at each iteration.
The invariant states (a) that 5 is more general than any unifier of Qo--Hincluding
the most general one and (b) that 5 monotonically approaches a unifier of Qo as Q
gets less active. The progress criterion ensures Q must finally be entirely inactive.
At this point, we shall see, any substitution unifies Q, including the identity. Thus
(b) becomes 5 # Qo; i.e., 5 is a unifier, and (a) makes it most general as desired.
Moreover, the contrapositive reading of (a) ensures that if there is no unifier for Q,
then there is no unifier for Qo, and fail is the correct answer.
It remains to give the processing cases for equations e and show that the invariant
and progress criterion are satisfied at each iteration. Initially, U = 5' satisfies in-
variant clause (a), and (b) holds trivially. To ease presentation, the cases are written
presuming that earlier ones do not apply.
85
Function types. If either Lj or Rj is a function type and the other is not, there
is clearly no unifier for Q, so return fail.4 Otherwise1 if Lj = ..... ., E? E and
Rj = ..... . Ent E' (i.e., both are function types), then add to Q the equations
E1 =? E1t,. ., E? =? Ent, E A
We now argue that invariant (a) still holds. For any 5', let U be the unifier
that satisfied the invariant previous to this processing step. Then U ? Q even after
processing by Theorem 4.1. Thus U o 5 = St because we haven't changed 5.
For invariant (b), suppose U unifies the new equations. Then it also unified the
old ones by Theorem 4.1. It follows that since we previously had Uo 5 # Qo, and 5
has not changed, we still have it.
This argument is simple, but representative of other cases to follow where the
interpretation of types guides us directly to a unifying step that does not modify 5.
As such, we refer to it as the constant 5 argument.
Progress is assured because a ??H1, is eliminated from the symbols in the active
equations of Q, and the other parts of the lexicographic count are unchanged.
Extent types. If either Lj or Rj is an extent type and the other is not, there
is clearly no unifier for Q, so return faiL Otherwise, if Lj = ((MA, Mc, T) and
Rj = ((MA', Mc', T?? (i.e., both are extent types), then add to Q equations MA =? MA',
Mc = Mc', and ?? ? Similarly for the short extents due to function types; just
= T.
omit MA and MA'.
The invariant is preserved by the constant 5 argument. Progress is assured be-
cause symbols "(( )" are eliminated from the active equations of Q.
Base types. If both Lj and Rj are the same ground type, we are done. If they
are different ground types return fail. Otherwise one is a variable b and the other is
4Recall function arity and argument counts are matched by the syntax of svexp,
so this mode of failure cannot occur, but we could easily have enforced it here instead.
86
some base type T (ground or variable), Replace 5 with 5 [T/b], and replace b with T
throughout Q.
If we have matching ground base types, then the constant 5 argument holds.
Otherwise, a variable b was encountered:
To see invariant clause (a), suppose Qiast is Q and 5last is 5 prior to this processing
step. Then, by assumption, for any 5' unifying E0 there is a unifier Utast for Qiast,
where Utast 05last = 5'. Since b appears in Qlast, and base type variables are replaced
in Q as soon as they are added to 5, we know (5last b) = b. Hence, we must have
(Ujast b) = T			(4.5)
or otherwise, Utast could not unify Etast. We conclude Utast ? 5last = Utast 0 5 Since
Utast also unifies E, it has all the desired properties of U in the invariant.
For invariant clause (b), suppose U ? Q. Then we must have
U [T/b] ? Qiast			(4.6)
Then, by assumption, U [T/b] ? 5last # Qo. This implies ? ? 5last [T/b] # Qo, but
5last [T/b] is 5.
This is the prototypical augmented 5 avgumenL To employ it elsewhere, we need
only give replacements for Equations 4.5 and 4.6 based on the interpretation of the
types Lj and Rj at hand.
Progress occurs here because an active equation is deleted from Q, and substitu-
tion for base type variables does not change the number of symbols.
If none of the cases thus far apply, then both Lj and Rj are maps.
Adim unification. Unifying maps often involves getting two adims to have the same
interpretation. For this, we define a simple subroutine for unification of adims D
and D' as follows: If either D or D`is a variable d and the other is some adim A
(variable or ground). Replace 5 with 5 [A/d] and replace d with A throughout Q.
87
If both D and D' are ground and match, we are done. If they do not match, return
fail for the top level unification of Qo.
This procedure maintains the invariant provided that (5 D) (5 D') is neces-
sary to unify the maps in which D and D' lie. Specifically, if no adim variables are
involved, then the invariant is trivially true because nothing has changed. If a sub-
stitution is performed, then the augmented 5 argument applies with Equations 4.5
and 4.6 replaced respectively by (Uiast d) A and U [A/d] # Qiast.
Redundant maps. If either Lj or Rj has two pairs [D/a] and [D'/a] (i.e., two
pairs in one map have the same axis number), delete the second pair from e and
return it to Q. Unify D and D'.
Clearly, the type with the redundant pairs has an interpretation as an axis set only
if we can achieve (5 D) (5 D'). This is established correctly by adim unification as
shown above. Moreover, we are safe in deleting one of the redundant pairs because
it does not change the interpretation of the affected type.
To check progress, note that if no variables are substituted, then the elimination
of the map pair is sufficient. If an adim variable d was substituted, however, the
count of symbols in always-active equations of Q may have increased. This is where
the lexicographic order of the progress criterion is important, for since d is now gone
from Q, growth in the number of symbols is overbalanced.
No extensible maps. If neither Lj nor Rj has map variables, then consider all the
axis numbers that appear therein, and partition them into three sets L, R, and B,
namely, those that occur only in Lj, those that occur only in Rj, and those that occur
in both. Unify the adims of all pairs with axis numbers in L with 0. Do the same
for all pairs with axis numbers in R. Then consider each axis number a ? B and the
associated pairs [D/aJ from Lj and [D'/aJ from Rj. Unify D and D'.
88
When this case is considered, there are no redundant map pairs in Lj or Rj, 50
the applications of adim unification are weWdefined. Then by the interpretation of
maps, we have that (S Li) =--H (S Ri) if and only if their adims are =--H for axes that
appear in pairs of both, and the adims of other pairs are 0. These adim relations are
established correctly by adim unification,
Two extensible maps, two map variables. If Lj and Rj each have one map
variable, say m and n # n) respectively, then form sets L, R, and B as above,
and unify adim pairs for axes in B just as in the previous case. Next, let p be a fresh
map variable, and let T = [Di/a1]... [Dm/am] be the pairs associated with the axes
in L from Lj and R = [D'1/a'1]... [D?'/a'?] be the pairs associated with the axes in
R from Rj. Then replace S with
5 [Rp/m] [Tp/n]
and make associated replacements for m and n in Q as well as e. Finally, put e back
in Q.
When this case is considered, there are again no redundant map pairs in Lj or Rj,
so adim unification on pairs with axis numbers in B is well-defined and is necessary
to achieve (5 Li) =--H (5 Ri). Thus the invariant is preserved to the point where adim
unification is complete. Let L'i and R? be Lj and Rj after the replacements are done
in adim unification, and let B be the pairs associated with the axes in B from, say,
Lt' (those in R? are identical except possibly for their order).
It remains to show that the chosen substitutions for m and n also preserve the
invariant. For this, we give names to the interpretations of various sets of pairs:
L? is the set of axes, a, that definitely are in the interpretation of L1i due to the
occurrence of [./a] in L. Conversely, L0 is the set of axis numbers definitely not in
the interpretation of L'i due to the occurrence of [o/aJ. R? and R0 are similar sets
for R and R'i Finally, R0 and B0 are the sets of axis numbers sure to be present and
89
absent respectively in both L'j and R'j due to ? and 0 pairs in B. Adim unification
ensures that B? and B0 are well-defined. We can now write the interpretations of L'j
and R'j explicitly as
L= ?A I A=L?uB'uA' where A' c (N--HL0 --HB0)? and
R= fA A=R?uB?uA' where A' c (N-R0-B0)?
i.e., the set of all sets of natural numbers that include at least the axes that must
be present (due to o+-s in the types being unified) and exclude the ones that must
be absent (due to 0-5 in the types being unified). Reicuing to the augmented s
argument, we see that for any 5", the interpretations of Utast Lj and Ujast Rj are
necessarily both the same subset of
L n =			I = R? u L0 U B? U I' where I' c (N --H L0 --H R0 --H B0)J			(4.7)
Otherwise, Utast could not unify Ejast. Then observe that the substitutions for m
and n in the processing step yield a permutation of
BLRp
for both sides of the transformed e. The interpretation of this type is Equation 4.7
as desired, with one caveat. If a later substitution for p adds some pair with an
axis in L, R, or B, then the corresponding adims must be unified; otherwise these
types are invalid. Fortunately, such substitutions merely introduce new redundant
pairs. The desired interpretation is assured by putting the transformed e back in Q,
where it may become active once more due to substitution for p. Thus part (a) of
the augmented 5' argument is proven. Similar reasoning yields
U [?p/m] [Tp/n] # Qiast
from which part (b) follows.
Regarding progress, it is easy to show that e must always have been active prior
to this processing step and that the equation placed in Q by the step is not active.
90
Thus the occurrences of m and n have been removed from the count. Moreover, new
occurrences of p in always-active equations each correspond to a disappearance of ni
or 71, so they do not affect the count.
Two extensible maps, one map variable. If both Lj and Ri have the same
map variable m (and no other) and at least one pair in Lj is different from every pair
in Rj or vice versa, then form sets L, R, and B as above. Also, let p be a fresh map
variable and T and R as above. Then replace S with
S [LRp/m]
and make associated replacements for m in Q. Make the same replacement in e, but
note this duplicates the pairs in T on the left and R on the right. Thus, delete one
of each duplication, and put the result back in Q.
The invariant is maintained in this step by a slight modification of the argument
above.
To see progress, note that the equation replaced in Q is inactive after the step.
Since e was always-active before the step, an occurrence of m in an always-active
equation is lost, and the count decreases.
One extensible map. If either Lj or Rj has a single map variable m and the other
has none, then assume without loss of generality that Lj has m (otherwise we can
swap Lj and Ri). Form sets L, R, and B as above, and, unify adim pairs for axes
in B just as ill the case with no extensible map. Say [Di/a1J... [Dm/am] are the
pairs associated with the axes in L from Lj and [D1,Ia'1]... [D?'/a'?] are the pairs
associated with the axes in R from Rj. Then unify each of ..... . D? with a and
replace S with
s [ [D1,/a'1] ... [Dnt/an?]Im]
and make associated replacements for m in Q.
91
The invariant is maintained by a conjunction of the arguments for ground maps
and two extensible map types. We shall not repeat them. Moreover, a transformed
e need not be placed in Q, because it contains no map variables that might be
substituted with redundant pairs in the future.
Progress occurs because e was active and its occurrence of m is now gone from
This completes the cases. There are two forms of equations not covered and that
therefore might be left inactive when the algorithm terminates:
(a) Both sides contain the same non-redundant pairs and the same, single map
variable.
(b) At least one side has more than one map variable.
Recall that for the algorithm to return a most general unifier as desired, we need
U ? Q for all substitutions U, where Q is the remaining set of inactive equations.
This is certainly true of type (a) equations, but not for type (b). Fortunately, we can
show that type (b) equations never remain in the particular sets we will generate.
4.5 Generating Axis Type Equations
The first phase of axis typing constructs a set of type equations that has a unifier if
and only if program ? has a type. A principal type of P itself and all its subterms
are computed as a side-effect.
The algorithm manipulates yet one more kind of syntactic object typings. A
typing is a triple: (F, E, T) where F is a type environment, E is a svexp subterm,
and T is an ax:is type. It represents an assertion that E has some type that is a
substitution instance of T in F, if it has any type at all. Figure 4.3 is the algorithm
to construct equations for a single svexp function. It maintains a set Y of typings
initialized with a single one that asserts the type of the entire function. In the
92
Algorithm: Axis type equation generation.
Inputs: A svexp function definition D = (fk ..... =--H
Outputs: A set Qo of axis type equations.
Method: Set Qo =
Set Y =
where F = [generz'c(F1)/f1] . [genernc(F??1)/f??1]
ni, bi?/x1]... [((?? fl?, bn?/xn]
[((ni,bi),.. ,(nn,bn) H ((n,b)/fj;
While Y ?
fInvar?ant.i
Remove typing a (F, E, T) from Y;
Execute the unique rule corresponding to the form of E
End while;
Return Qo
Figure 4.3: Axis type equation generation algoritlim.
example of Figure 4.1, this is like writing the root of the tree at the bottom of the
page. The algorithm then removes a typing from Y and uses the form of its subterm
part to pick an actzon rule, which implements the typing rule for the subterm. This
has two effects; it adds an equation to Qo, and it adds zero or more typings to Y.
The equations are the "notes" we referred to in Section 4.4 regarding =--H relationships
that must hold to derive a type. Typings in Y represent the current fringe of the
derivation tree. Eventually, Y consists entirely of typings that select the action rules
for type axioms. The axiom rules add no new typings to Y, so the algorithm soon
terminates. The typing rules are given in Figure 4.4.
4.5.1 Syntactic soundness and completeness
The goals of our careful development of axis type inference, namely soundness and
completeness, can now be written as conditions on the equation set output of the
algorithm:
93
(F, Typ c aE, T) ? T =? (([./a]m, [o/c]n, b) (F, E, (( [d/a]m, [./c]n, b?)
(F, Proj acE, T) ? T (([o/a]rn, [./cjn, b? (F, E, (( [./a]m, [o/c]n, b?)
(F, Extr0 cE, T) ? T A ((m, K/c]n,b? (F,E, (m, [./c]n,b?)
(F, Extr1 cE, T) ? T A ((m, [./c]n,b? (F,E,((m, [./c]n,b))
(F, Red t cE, T) ? T = ((m,?,Int? (F,E, ((rn, [./c]?,Int))
(F, +E1E2, T) ?
(F, JoincE1E2, T) ?
T = ((mi m2, ?, Int)
(F, E1, ((mi, ?, Int))
T =` ((m1m2, [./c]n,b)
(F, E1, ((mi, [di/cin, b))
(F, E2, (m?, ?, Int))
(F, E2, ((m?, [d2/c]n, b))
(F, CondBETEF, T) ? T A ((m?m?m?,n,b)
(F, B, ((m?, ?, Bool))
(F, ET, ((m?, n, b)) (F, EF, (m?, n, b))
(F,x,T) ? TA(Vx)
- XO,O Xo,q?1 -
Xp?1,o Xp?i,q?1
T A (??, [.10] [./1],Int)
(F, fE1???En, T) ? TA((mi...m?,n,b)
(V,E1,((m1,m1,b1)) ..
(F,j;((n1,b1),... ,((nn, bn) H ((n, b))
(F, f, T) ? if (F? = genernc(F) then
T ? ?t here F' is F with all variables
replaced by fresh ones
else
TA(Ff)
Figure 4.4: Function body action rules for axis typing.
94
Theorem 4.9 When the algorithm of Figure 4.3 terminates with inputs as given,
the following hold:
Soundness. For any substitution U, if U ? Qo, then
Ff: (U(m1,b1?... ((m?,b??
Completeness. For any axis type T such that $ K f: T, there exists a substitution U
such that U ? Qo and (U(m1,b1?... ((m?, b?? H ((m, b?) T
That is, every unifier of Qo yields a type that f actually has, and every type that
f has may be obtained from some unifier for Qo. These are sufficient because, as
already shown, pass 2 will compute a single unifier more general than all others, or
fail if no unifier exists.
Proof. The approach is straightforward and similar to the previous section's analysis
of unification we weaken the conditions above to achieve a loop invariant that may
be proven true both initially and after each iteration. Intuitively, the set of typings
represents work remaining do be done, and so is used in the invariant to account for
equations not yet generated. Hence we say of substitution U that U ? Y iff for each
typing (F, E, T)e Y, we have (UoF) H E: (U T). I.e. U is a witness that each typing
represents at least one axis type derivation for its subterm part. Then we have:
Invariant. The following both hold:
Soundness. For any substitution U, if U ? Qo and U ? Y then
H f:(U((m1,b1?... ((m?, b?? ((m, b?).
Completeness. For any axis type T such that ? F f: T, there exists a substi-
tution U such that U ? Qo and U ? Y and (U((m1,b1)?... ((m?,b??
T.
95
Progress criterion. Each action rule adds to Y only typings with smaller subterm
parts than the typing that causes it to be selected.
The progress criterion, easily verified by checking the action rules, ensures that the
typing set Y becomes empty after finitely many iterations. At this point the invariant
is a proof of Theorem 4.9 as desired.
The invariant holds trivially on entry of the algorithm loop. It is preserved by
iterations as shown in the next section. This completes the proof. E]
4.5.2 Soundness and completeness cases
There is one case for each action rule in Figure 4.4. Assume for each that the invariant
was true in the last iteration, where Yiast and Qotast were the typings and equation
set respectively and that the current iteration executes the action rule in boldface,
yielding Y and Qo.
Typ. To show soundness still holds, let U be a substitution such that U ? Qo and
U ? Y. We wish to show U ? Qoiast and U ? Yjast, for the desired result then
follows by hypothesis. U ? Qotast because Qo includes all the equations of Qotast It
remains to show U ? ?iast The only typing in Yiast and not in Y is that on the left
side of the rule. Thus we must prove (U oF) H Typ caF: (UT). For this, we have
0 F) F E : (U (( [d/ajm, [./c]n, b))
(4.8)
due to the typing just added to Y and our hypothesis U ? Y. With (4.8) as
antecedent, typing rule TYP yields
Finally, observe
0 F) F Typ c aE : (U (( [./a]m, [o/c]n, b)))
(4.9)
(UT) (U(([o/a]m, [o/cjn,b))
96
due to the equation just added to Qo, which with (4.9) yields the desired result.
Completeness is also preserved as follows. The inductive assumption implies that
for any T such that ? F f : T, there must be Ujast such that Ujast ? Qoiast and
Ujast ? Yiast We must exhibit a U such that such that U ? Qo and U ? Y and
(U((mi,b1)?... ((m?, bn))			((m, b))) =--H T
For this, note (F,TypcaE,T)E Y?ast, 50 (UiastF) H TypcaE : (UiastT). This
can only follow from typing with rule TYP involving a substitution S such that
(S (( [d/a]m, [./c]n, b)) =--H (Ujast T) and also (Ujast F) H E : (S (( [d/a]m, [./c]n, b))).
Let U = S 0 Uiast; then it is easy to show U has the desired properties. E]
Proj, Extr, Redt, +, Join, Cond, and User Function Application. Proofs
are similar to Typ using the respective typing rules.
Bound variables and constants. For soundness, we again must show U ? Qojast
and U ? Yiast. The former holds because Qo includes all the equations of Qotast, and
U ? Qo by hypothesis. It remains to show (U 0 F) H x : (UT), because the only
typing in ?ast that is not in Y is (F,x,T). We know (UT) =--H U(Fx) by the new
equation in Qo and the assumption U ? Qo. Then from typing rule VAR we have
UoFFx: (UoF)x ? UoFFx: U(Fx) ? Uol' Fx : (UT) (4.10)
For completeness, we assume the existence of Utast and must exhibit U as for Typ
above. Observe that Qojast differs from Qo only in the added equation T =? (F x).
Also, (F, x, T)? Yjast, so (Uiast F) H x : (U?t T). Since x can only be typed by
applying rule VAR, there exists substitution S such that
It suffices to let U = s o Ujast.
Constants are similar.
S(UtastT) = S(Uja5tF)x
97
Function variables. If (F f) has no generic tag, then the proof is as for bound
variables. Arguments for the case (F f) = generic(F) follow.
Soundness is the same as for bound variables, with the reasoning in (4.10) replaced
by the following, which uses typing rules VAR and INST and the definition of generic
on page 76.
UoF H f: (UoF)f ? UoF H f: (Ugeneric(F))
? U 0 F ? f: generic(F)
? UoFHf:F'
where F1is any substitution instance of F. We choose F' to be F with all variables
replaced by fresh ones. In particular, these variables are not in the domain of U.
Hence
U 0FF f:
We have (U F') = (U T) from the new equation in Qo and the hypothesis, so
U oFF f : (UT)
and U ? ??ast
For completeness, we assume the existence of ?ast and must again exhibit U.
Observe that Qotast differs from Qo only in the added equation T 4 (F f). Note also
that (F, f, T)? ?iast, 5? (Uiast F) F f: (Ujast T). Since f can by typed only by rules
VAR and JNST, there exists substitution S such that
S(UiastT) =--H SF'
where F' is F with all variables replaced by fresh ones. Let U = 5 0 Uiast.
4.6 Dimension Inference
The second type inference pass refines axis types to a new form that fully represents
valid extents. Axis types already encode axis sets and the names of base domains. It
98
remains to encode dimension functions. Consider these in the set-theoretic sense as
finite sets of pairs, one pair per axis set element with each pair containing an axis
number and a size. There is an immediate correspondence between these pairs and
axis type pairs containing o. A pair with 0 encodes a mapping from the axis number
it contains to some size in N, though we do not yet know what that size is. The job
of the second pass is thus to replace ? with a representation of size.
4.6.1 Dimensions
We require a language to encode sizes. Since a single svexp function will often be
applied to different-sized operands, numerals alone will not do. We need to encode
arithmetic constraints on dimension function range values, much as maps with adim
and map variables encode constraints on axis sets. This leads us to the simple
arithmetic expressions with which the closure theorems 3.8 and 3.12 are written.
The following syntax is adequately expressive:
(dim) ::= 0 112...
I (dim) + (dim) I (dim) --H (dim) min 1(dim)+) (4.11)
(dim varnabTh) (dependent-dim)( (dim)* ) 0
Thus (dim) includes numerals and arithmetic with the conventional interpretation
over N. In particular, a b denotes 0 if a < b and a --H b otherwise as in Theorem 3.8.
Dimension variables are written d or dk or djk by convention. A dependent dimension
denotes application of an unspecified function Dk : N... x N H N to dimensions.
Finally, there is the familiar 0, which denotes absent axes as for axis types. Dimension
variables cannot take this value.
Replacing (adim) in Figure 4.1 with (dim) yields full-fledged t?pes. The interpre-
tation of types specializes the axis type interpretation given in Section 4.2. A ground
map M now denotes an axis set--Hdimension function pair. The axis set is
A = fa [D/a] occurs in M and D $ o?.
99
The dimension function 0 is given for all a ? A by
= D if [D/a] occurs in M
The characterization of =--H for full types includes equality of dimensions in the obvious
sense; e.g.
[1 + 2/a] [o/a?i? = [o/ai] [3/a]?
A dimension is a gwund dirnension if it contains no dimension variables or dependent
dimensions. Interpretation of ground and non-ground types is then as for axis types.
4.6.2 Inference rules
The reader can easily verify that soundness of the type inference rules for svexp
primitives in Table 4.5 follow from Theorems 3.8 and 3.12 and the semantic equations
as the axis typing rules did, except soundness is now unqualified. That is, when
operands to Join and Cond are proper antecedents to one of the rules dJOINw or of
dCOND, then the result cannot be typerr2 (or typerr1).
Table 4.5 includes two rules for Typ and four for Join. These treat separately the
"diagonal" case of Typ (Example 3.5) and the possible promotions of Join arguments
(Section 2.3.5). At first glance, this presents a problem for the second inference pass,
which builds proofs just as the first pass (see Figure 4.1). On reaching a Typ or Join,
with which rule will the proof succeed? The inference algorithm consults the axis
types of arguments for the answer.
Constants and variable typing rules are given in Table 4.6. Rule CONST for
constant svals of any base type Bk is straightforward, but the separate rules for
bound and function variables merit discussion. Indeed, we could use B VAR for all
variables and achieve soundness, but many useful programs would then be untypable.
Example 4.10 Consider the following code to reverse components along axis 0 of
the input.
loo
Table 4.5: Primitive typing rules.
FFE : (([o/a]m, [d/c]n, b?
FHTypcaE:(([d/a]m, [o/c]n, b)
F F E : (([di/a]m, [d2/c]n, b?
F ?TypcaE: (([min fdi,d2J/ajm, [o/c]n, b?
F H E : (([d/a]m, [o/c]m, b?
FFProj acE: (([o/a]m, [d/c]n, b?
F ? E: ((m, [d/c]n, b?
F F Extr0 c E : ((m, [o/cjn, b?
F ? E: ((m, [d/c]n, b)
F F Extr1 c E: ((m, [d--Hl/c]n, b)
V F E: ((m, [d/c]?, Int?
FFRed+ cE:((m, ?, Int?
VFE1:((mi,?,Int?,E2:((rn2,?,Int)
FH t EiE2:((m1m2, ?, Int)
VHE1:((mi,[di/c]n,b),E2:(m?,[d2/c]n,b)
V F Join c E1 E2 : ((m1 m2, [di+d2/c]n, b)
VHE1:((mi,[di/c]n,b),E2:((m?,[o/c]n,b)
V F Join c E1 E2 : ((m1 m2, [ditl/cjn, b)
V?E1:(mi,[o/c]n,b),E2:(m?,[d2/c]n,b)
V H Join c E1 E2 : (mi m2, [ltd2/c]n, b)
VHE1:(mi,[o/c]n,b),E2:(m?,[o/cjn,b)
V?JoincEiE2:(niim2, [2/c]n, b)
dTYPa
dTYPb
dPROJ
dEXTRO
dEXTRi
dRED+
dBIN+
dJOINa
dJOINb
dJOINc
dJOINd
101
rev x = if (NumOf x 0 1) then x
else Join 0 (rev Extr01 0 x) (Extr00 0 x)
We "guess" the assumption that rev acts on d-vectors, i.e.
F = [(([d/0]?, b)/x] [(([d/0]?, b? ([d/0]?, b?/rev]
so that we can prove rev takes d-vectors to d-vectors, i.e.
rev : ([d/0J?, b?			(([d/0]?, b? as defined above.
But the proof succeeds only if we can establish, as an intermediate step, that the
recursive binding of rev has type
H
E
Thus we must be free to instantiate dimensions of the types of even non-generzc
function variables. Let S be any substitution whose domain includes no free map or
base type variables of T. Then (S T) is a dimension substitution instance of T. Rule
FVAR allows such instantiation of function types. The remainder of typing rules
are as for a?s types.
Theorem 4.11 (Semantic soundness of typing) Let P be an svexp program.
Then ? H P : T implies ??P? : T.
Proof. Almost identical to that for Theorem 4.5. The only significant difference is
that typerr2 is no longer a possible value for a term with a type because the rules
dJOINx and dCOND are sound without qualification. El
4.6.3 Type unification
Axis type unification (Figure 4.2) needs only a small modification to unify general
types. In addition to the substitution S, the modification constructs a set Z of
102
FH
Table 4.6: Constant and variable typing rules.
Xo,o			Xo,q?1
- Xp,o o+.
((?, [PlO] [q/1]?,B??
F H x: (F x) (bound variable expression)
F H f : F' where F' is a dimension substitution instance of (F f)
dCONST
dBVAR
dFVAR
dimension equality constraints, which are just assertions of equality over N, writ-
ten D1 = D2. We replace the adim unification "subroutine" on page 86 with the
following:
Dimension unification. Unify dimensions D and D' as follows: If D = D' = 0, we
are done. If either D or D' is a and the other is not, return fail for the top level
unification of Qo. Otherwise add D D' t Z
What of soundness and completeness of the modified algorithm? We are con-
cerned only about a weak form of these. The algorithm is sound in the sense that
if T is any substitution for dimension variables such that T ? Z, then T a 5 ? Qo.
It is complete in the sense that, for any 5' binding only map variables and T binding
only dimension variables, if 5' a T ? Qo, then there exists another substitition U
such that 5 a U =--H 5'. Intuitively, we are merely assuming the existence of solutions
for Z. This is reasonable because the assertions in Z can be verified by "checking
code" in the compiled program.5 Of course, we could say the same about Qo; the
whole point of type inference is to eliminate checking code. Alas, though much can
5The exception is dependent dimensions, which are discussed below.
103
be done to reduce Z so that checking code is minimal, some checks will be required.6
We can easily verify the weak forms of soundness and completeness by proving the
following for each iteration as in Section 4.4.
Invariant. Let T be a substitution with domain restricted to the free (dimension)
variables of Z such that T ?
Soundness. For any substitution 5' such that S' ? Qo, we have 5 ? S'. In particular,
we can exhibit a unifier U of Q such that U 0 5 0 T 5'.
Completeness. For any substitution U, if U ? Q, then U 0 5 0 T ? Qo.
4.6.4 Most general function types
We next consider how to implement the liberal dimension instantiation allowed by
rule FVAR for a function variable reference f. In the first type inference pass dis-
cussed in Sections 4.2 through 4.5, we instantiate generic function axis types by
replacing variables with fresh ones. This is still fine for generic types in the second
pass, those at non-recursive function calls. We merely replace dimension variables as
well as map and base type variables.
Things are more difficult if f is a recursive binding. In the first pass, we merely
assume (i.e. add to F)
f :((ni,b1)?,... , ((nn, bn)) H ((n, b))
(4.12)
by rule ABS in Table 4.4, where the important point is that the type is non-generic.
Thus, when inference is complete, all arguments of recursive calls to f have axis
types equivalent with respect to to the corresponding formal parameter type of f.
6
svexp as given actually requires no checking code, Because its only "data" are
constants, all dimensions can be inferred exactly if we make dimension constraints
part of function types and apply constraints symbolically to arguments at each func-
tion call. This is a desirable extension of the algorithm presented here. llowever, the
addition of a simple matrix input primitive to svexp makes checking code necessary.
104
Whlle this is fine for axis types, it is too strong for the current purpose. That is,
if f's formal parameter is an inner column vector of length d, the axis typing rules
for the first typing pass properly insist that arguments of all recursive calls to f are
also column vectors, but it would be over-zealous for the second pass to insist that
arguments of all recursive calls to f have length d. For instance, the recursive call in
(4.12) would be a type error in this case.
A remedy analogous to the way substitution of fresh variables is used to instan-
tiate generic types is to replace only dimension variables in the assumed type of f
(the guess in Example 4.10) with fresh ones at each recursive call. However, this
requires a prescient guess for the assumption. The type in (4.12), which made a fine
assumption for the type of f in pass one is no help; it has no dimension variables to
replace! Indeed, it seems we need to replace dimension variables in the very type we
are trying to infer.
This nasty circularity is another motive for two pass type inference, for a principal
axis type of f is sufficient to unravel it.
Definition 4.12 (Type assumption for f) Let ((M1,T1),.. ., (Mn, Tn) ((M, T)
be a principal axis type of f. For each axis map Mk, let M?dim be a dimension map
the same as Mk, except replace each pair [./cj with [dkc/C]. Let D be the dimension
variables thus introduced arranged in a canonical order and let Mdep be M with each
pair [./c] replaced by a pair with a dependent dimension [Dc(D)/c]. Then the type
(M??,Ti),...,(Mndim,Tn) H (MdeP,?)
is a type assumption for f. El
Recall that Dc denotes an unspecified function over N. Hence, this type is at least
as general as any type of f. Without affecting the power of the type system, we can
105
Algorithm: Type equation generation.
Inputs: A svexp function definition D = (fk .... . =--H
Outputs: A set Qo of type equations.
Method: Set Qo =
Set Y = f(F, E, ((?, Mdep, b?) ?
where F = [generic(F1)/f1].. [genernc(F??1)/f??1]
[((?, M1dim, T1?/X1] . [((?, M$?, m)/xn]
[((M1dim, T1?,..., ((M?d?? Tn)) ((MdeP, ?)/f];
(see Definition 4.12)
While Y #
? Invariant. ?
Remove typing a (F, E, T) from Y;
Execute the unique rule corresponding to the form of E
End while;
? Invariant. ?
Return Qo.
Figure 4.5: Type equation generation algorithm.
replace rule ABS with
F fi : F1,... ? F fk--Hi : Fk?1,
[generic(F1 )/f1] .. [genernc(F??1 )/fk--Hi]
[((M1dim, T1?,... , ((Mnd??, Tn)) ((MdeP, T))/fk]
[((?,M1dim,?1)/x1] . [(?,Mndim,Tn))/xn] F E :
H f?:(M1dim,Ti),..., (Mndim, m)) H ((MdeP, T)) (fk in fkxl... , Xn = ) dABS
4.6.5 Generating Type Equations
The algorithm for generating type equations for a svexp function fk is given in
Figure 4.5. It is the same as Figure 4.3 for axis types, except the initial assumption
of the function type reflects dABS. The corresponding action rules are in Figure 4.6
and Figure 4.7. These are derived from the typing rules of Tables 4.5 and 4.6. Note
106
(F, Typca(E:((M?,Mc,??), T)?
if [o/a] ? MA then
T (([d/a]m, [o/c]n, b?
else
(F, E, ((K/alrn, [d/c]n, b))
T (([min fdi, d21/aim, [o/c]n, b) (F, E, (([di/a]m, [d2/c]n, b')'))
(I?, Proj acE, T) ? T =? (([o/ajm, [d/c]n, b? (F, E, (([d/a]m, [o/c]n, b))
(F, Extr0 cF, T) ? T =? ((m, [o/c]n, b? (F, E, ((m, [d/c]n, b))
(F, Extr1 cF, T) ? T =? ((m, [d --H l/c]n, b? (F, E, ((rn, [d/c]n, b))
(I', Red + cE, T) ? T =? ((m, ?, Int')? (F, E, ((rn, [d/c]?, Int))
(F, +FlF2, T) ? T=? ((mim2,QInt?
(F, E1, ((m1, $, Int?) (F, E2, ((m?, $, Int))
(F, Joinc(E1 ((MAl,Mcl,Tl))(F2 ((MA2,Mc2,T2)), T) ?
if [./c] E Mci and [./c] ? Mc? then
T =? (mi m2, [d1 + d2/c]n, b) (F, E1, (mi, [di/c]n, b)) (F, E2, ((m2, [d2/c]n, b))
else if [./c] E Mci and [o/c] ? Mc2 then
T (mi m2, [d1 + 1/c]n, b) (F, E1, (mi, [di/c]n, b)) (F, E2, ((m2, [o/c]n, b))
else if [o/c] E Mci and [./c] ? Mc2 then
T (mi m2, [1 + d2/c]n, b) (F, E1, (mi, [o/c]n, b)) (F, E2, ((m2, [d2/c]n, b))
else
T ((m1 m2, [2/c]n, b) (F, F1, (mi, [o/c]n, b)) (F, E2, (rn?, [o/c]n, b))
Figure 4.6: Function body action rules for typing (part 1).
107
(F, CondBETEF, T) ?
T =? (m?m?mp,n,b)
(1', 1?, ((m?, ?, Bool))
(F, ET, (m?, n, b)) (F, EF, (m?, n, b))
(F,x,T)			?			TA'(Fx)
- xo,o xo,q?1
xp--H1,o			xp?i,q?i ,			? T =? ($, [p/O] [q/lj, Int)
(F,fEi,,En,T)			?			T=?((ml...mn,n,b)
(F, E1, ((mini, b1))			(F, E?, ((m?, fl?, bn))
(F,j;((n1,b1),... ,((nn,bn) H
f,			T)			?
if (Fl) = genernc(F) then
T A' F' where F' is F with all variables
replaced by fresh ones
else
T A F' where F' is F with dimension
variables replaced by fresh ones
Figure 4.7: Function body action rules for typing (part 2).
the rules for Typ and Join consult the axis types of arguments to select a case
corresponding one of the associated typing rules for these constructs, The proof
of soundness and completeness for this algorithm is almost identical to that for
Theorem 4.9, and we omit it.
4.6.6 Simplifying dimension equality constraints
After type equations have been generated and solved by unification, we are left with
the set Z of dimension equality constraints. It is not generally possible to solve these
in the complete way unification solves type equations, where a substitution exactly
encodes all solutions. However, we can simpify Z symbolically in several ways.
108
(a) Apply identities over N. In particular, fold constants and replace e.g. (d1 +
d2) d2 with di.
(b) Eliminate variables. If there exists an equation of the form d = D in Z
wherein d does not occur in D, then replace d with D in all equations and
types. Practically speaking, we want to eliminate all variables except those
introduced ill the type assumption (Definition 4.12 and Figure 4.5). Otherwise,
we may prove e.g. that
rather than the more useful
rev : (([d+ 1/O],b?			(([d+ 1/oJ,b?
rev : (([d01/0],b? H (([d01/O],b?
(c) Throw out tautologies. Equations of the form D1 = D2 where D1 = D2 (e.g.
3 = 3 and d1 + d2 = d2 + d1) are superfluous and may be removed from Z.
(d) Report fail for contradictions. If an equation such as 2 = 1 arises, the function
may return typerr2 and must be discarded as erroneous.
(e) Eliminate dependent dimensions. In general, equations in Z will contain subex-
pressions of the form Dc(E), where E is a sequence of (dim? expressions.
Such terms are the dimensions of recursive function call results. They cannot
be translated to checking code or evaluated symbolically unless the function
that Dc denotes is known.
Item (e) provides the last task for the type checker. It must derive Dc for each
axis c in the result type of f. The following theorem at once shows that this is always
possible (if f terminates) and suggests how to do it.
Theorem 4.13 Let f and its type assumption be as in Definition 4.12. Let Z be
the dimension equality constraints from typing f. Further suppose variables not in D
109
have been eliminated wherever possible in Z. Suppose f terminates for at least one
input (i. e. its semantic value is not I). Then for each Dc in Mdep. there exists in Z
an equation of the form Dc(D) = Ec, where Ec is a dimension expression with no
dependent dimensions and whose variables are all in D.
Equations Dc(D) = Ec are important because they are closed definitions for depen-
dent dimension functions. We can use them as checking code or, more fruitfully,
to eliminate dependent dimensions from Z altogether. That is, for all subexpres-
sions Dc(E) appearing in Z, we can substitute Ec wherein all variables in D have
been replaced by the respective expressions in E. It remains to prove Theorem 4.13.
Proof. A svexp term has the property useful if it satisfies the following inductive
definition:
o+ Bound variable references and constants are useful.
o+ Non-recursive user function applications and primitive applications excepting
Cond are useful if their arguments are useful.
o+ A Cond application is useful if its predicate expression is useful and at least
one of its branches is useful.
In particular, code including recursive function calls is not useful, unless the call is
embedded in a Cond branch, and the other branch is useful. It is straightforward to
show using the full semantics of svexp that any function f whose semantic value is
not 1 has a useful body. Operationally, this is equivalent to saying that f is free of
technical errors such as undefined variable references and that it terminates for some
input.
A typing is useful if its svexp subterm part is useful.
For convenience, we assume that the unification algorithm is applied to each new
equation as it is generated by Figure 4.5. Thus we can reason about the equations
introduced in Z by application of each action rule.
110
We show that the following invariant is true for each iteration in Figure 4.5.
Invarnant. One of the following holds regarding each dependent dimension func-
tion D?:
(a) A pair [Dc(D)/a] appears in a useful typing in Y
(b) Z (with variables eliminated) contains an equation Dc(D) = E, wherein E has
no dependent dimensions and each variable d is either in D or appears in a
pair [d/aJ in a useful typing.
Since no useful typings remain when the algorithm terminates successfully, this proves
the theorem as desired.
Before the first iteration, Z is empty and Y contains only the useful typing
(F,E,((?,MdeP,b))
which satisfies (a).
Iterations that act on non-useful typings do not affect the invariant.
Suppose the iteration selects a useful typing for Proj a c. Then the new typing
introduced to Y is certain to be useful. Consider
T = ([o/a]m, [d/c]n, b? and (F, E, (( [d/a]m, [o/c]n, b)?)
the new equation and typing introduced by the action rule. Unification of variables m
and n ensures all pairs in T except one for axis c also occur in the new typing after
unification. Thus the invariant is preserved for all these pairs. Regarding the pair
for axis c, suppose it is of the form [Dc(D)/c], i.e. it supported invariant (a) in
the last iteration. Call this an (a)-pair. Then unification introduces the equation
Dc(D) = d into Z, but d occurs in a pair [d/aj in the new, useful typing in Y, so
the invariant is preserved. Similarly, suppose the pair for c in T is of the form [d'/ c],
where d' appears in the right hand side of an equation Dc(D) E in Z. I.e. this
111
pair supported invariant (b) in the last iteration. Call it a (b)-pair. Then unification
introduces the new equation d = d'. By variable substitution, we replace d' by d
in E, and the invariant is again preserved.
Similar reasoning establishes the invariant for all primitives and non-recursive
user function calls. For Cond, we need the additional fact, due to the definition of
useful, that at least one of the branch typings introduced in Y is useful, and this
typing preserves the invariant.
At a bound variable reference, each pair in T unifies with a pair whose dimension
is a variable in D. Thus if a pair in T is an (a)-pair as above, unification introduces
an equation in Z, where d c D; thus the invariant stands. Similarly, if a pair in T
is a (b)-pair, unification introduces an equation d' = d, where d ? D. Variable
elimination replaces d' with d in E, and the invariant holds again. Constants are
similar, introducing equations in Z of the form Dc(D) = k, k a numeral, for (a)-pairs
and d' = k for (b)-pairs. E]
For Example 4.10, axis typing yields rev : (([o/Ojn, b? H (([./O]n, b?. Hence the
type assumption for rev is (([dio/O]n, b) H (([D0(d10)/O]n, b?. After type checking
and variable elimination, we have Z = ?Dd(d10) d10, D0(d10) `1 + D0(d10 1)J.
Using the first equation twice, we obtain Z = Idio d10, d10 =1+ (d10
The first equation may be immediately eliminated as a tautology. The second is true
whenever d?0 > 0. We compile this as a run time check.
Chapter 5
Code Generation and
Improvement
In this chapter, we set out to compile svexp programs to an intermediate form
that translates easily to machine code for a variety of single and multiple processor
architectures. As an example, we give details for translation to the C language
[KR78], which serves well as an abstract assembler code.
Compilation of svexp, as with most functional languages, is easy if efficiency is
unimportant and difficult otherwise. Chapter 1 pointed out that storage manage-
ment and avoiding needless copies form a large part of the efficiency issue. This
is exacerbated for svexp by the domain of problems where s-vals are handy; i.e.,
many interesting matrix problems are notoriously large; a single array may comprise
nearly all of a program's storage requirements. Such problems are often deliberately
scaled to squeeze into available memory resources.1 To be taken seriously in this en-
vironment, a compiler cannot generate code that gratuitously makes copies of arrays,
TThe author was reminded of this recently on overhearing a conversation of some
scientific programmers. They were lamenting that a certain implementation of FOR-
TRAN disallows array declarations larger than 512 megabytes. We look forward to
the day when the cost of copying such a data structure is not important!
112
113
though this would be the propensity of a naive one.
5.1 The Standard Compilation Model
The usual approach to compilation of a language L is this:
o+ Define the target language T along with its semantics.
o+ Give a compiler as an inductively defined function C[ ? that accepts a term
from L (in the brackets ? ?) along with n following arguments, which describe
the form of compilation desired, and returns the compilation into T
o+ Use the semantics of L and T to show in some desirable sense that for any
program P and parameters Pi,?? , p? for the desired form of compilation, the
meaning of C[Pjp1 . p? in the semantics of T is equivalent to the meaning
of L. This establishes the correctness of C. There are several choices for the
semantics, but a common one is denotational semantics for L and operational
semantics for T
o+ If the result from C is not efficient, we may define an additional function that
transforms C[P?p1... p? to another term in T that is more efficient. This is
optimization.
Example 5.1 Let x range over names for integer storage locations and
T ::= loadx storex addx 1 T;T
with the usual semantics of an accumulator-based architecture. Suppose L has a
binary t operator. Then we might try the following for part of the compiler:
C[E1+E2?=C[E1?; storetmp; C[E2?; addtmp
and prove it correct by showing that if the code returned by C[E1? and C[E2? leave
the correct values in the accumulator, then so will C[E1 + 2? In the process we will
114
learn that the value of tmp cannot be changed in whatever code is returned by C?E1?
or cIE2?, e.g. we should use a fresh temporary at every application of the rule. ?
We will follow part of this compiler-building regimen, but depart from it in im-
portant ways. The first is the nature of C. The code returned by C as described
above can depend only on what has been "passed down" in the parameters pi ... pn
(we have n = 0 in the example) and on the top level structure of the term to be
compiled (+ in the example). The difficulty with this approach in our case is that
the best compilation for a svexp subterm may depend on deep internal structure.
Example 5.2 Consider the term Red t 0 E. If E consists of several (typical) col-
umn vectors Join-ed together end to end, then we want to compile the reduction
into several reducing loops whose results are added. The alternative is just one loop,
which needs a tree of conditionals (hence run time for comparisons) in its body to
accumulate a unique element from among the column vectors for each iteration. Yet
writing a general compilation rule to express several loops is difficult because the
structure of E is not apparent. ?
There are several possible approaches to this problem, which appears in myriad
forms in our compiler. We choose to define code accumulations. The language of
code accumulations is just the target language T augmented with active sites. An
active site has the form
?E,p1,... ,pnl			(5.1)
where E is a subterm of L and .1... p? are parameters that tell how to compile it,
the same ones that would be passed as arguments in a traditional compiler.
Thus a code accumulation with active sites comprising some of its leaves is a
partially compiled program. Specifically, the rewrite function 1? accepts a code ac-
cumulation, picks (arbitrarily) one of its active sites, and returns a rewritten code
115
accumulation that is in some sense "more compiled" than the original.2 Rewnting
depends only on the top level structure of the chosen active site, but may entail the
entire accumulation, in particular the path in the syntax tree from the active site to
the root. The compiler C(( ? just applies ? repeatedly until no active sites remain.
We use a new kind of brackets (() to remind the reader that the argument is a code
accumulation, rather than an element of L. Let (inactiveE) return true when code
accumulation E has no active sites, false otherwise. Then we have
C ((P? = inactive P H ((P), C(9? ((P?)
(5.2)
Any traditional compiler can be expressed with code accumulations. Every com-
pilation rule gives rise to a rewrite transformation which merely replaces an active
site and does not change the rest of the accumulation. For instance, the rule in
Example 5.1 yields
1Z((??E1 t E2?? = (??E1?; store tmp; ?E2?; add tmp)?
(5.3)
Here ? denotes the context (ancestor nodes in the syntax tree) of the active site
?i +E2j.
Returning to Example 5.2, the compilation of Red+ 0 E will initially be a reducing
loop whose body is an active site containing E;3
n--H1
i=O
where n is the inner length of the vector value of E obtained from its type. Thereafter,
if E turns out to consist of Join'ed vectors, i.e. E = Join 0 E1 E2, then ? splits the
2The definition of "more compiled" involves the sizes of L-subterms in active
sites. A simple count of active sites is not useful here because rewriting e.g. a binary
operator application transforms one active to two.
3The actual target language uses a less standard but more convenient notation
for summation ranges than that used here.
116
reducing loop into two loops:
ni--H1			n--H1
i1=O			i2=nl
where ?i is the inner length of the value of E1 taken from its type.
5.2 The Target Language
The target language for our compiler is called proc because its highest level construct
is the pwcedure. It is given in Figure 5.1. Not surprisingly, the values in proc
programs are arrays (not matrices). An s-val VA;C is represented in proc as an
n-dimensional array with n < A + Cj. The inequality occurs when some axis is
represented as the iterations of a loop. Consider for instance
fxy=--HPr?70Red+ 1Typ07(Join1x?
which, given x and y are column vectors, produces their vector sum. The value of
(Join 1 x y) has Al + ICI = 0 + 2 = 2. Yet only i-dimensional arrays, for x, y, and
the result value, appear in the compilation; the iterations of a summation effectively
represent the missing dimension. We compile each k-ary svexp function into a
procedure of k + 1 arguments; the extra one is a reference to a result array. Hence,
the svexp program body compiles to a procedure of one argument, the array for the
program's answer. For convenience, we adopt the convention that a svexp function
and its compilation as a procedure have the same name.
The proc procedure named f, obtained by compiling the svexp user function
definition for f, has two useful interpretations. The first is an zmperative procedure
that accepts arrays a1,... ,a? as arguments and stores a result into array r, as in (5.4).
In this case, we interpret the = in (5.5) as an assignment operator; thus the term may
be executed by a computer. This is simply the compilation we are after. The second
interpretation is a declaration of the value of r in terms of the array values a?. Here
117
(proc) ::= (proc-name)(a1,... , a?; r) ? (assign)
(assign) ::= (subarray) = (epr)
(proc-name) ( (subarray)*; (subarray)) I
V (assign)
kE(rng)
(assign) A (assign)
if (expr) then (assign) else (assign)
let (array-id) ? (assign) in (assign)
(ewpr)			(subarray)
(epr) I
kE(rng)
(expr) + (expr) I
if (expr) then (ewpr) else (erpr)
let (array-id) ? (assign) in (expr)
(rng) ::= to:size
(subarray) : : = [(perm)J (array-id) [(indexer) ?]
(indexer) ::= (integer) (rng)
(perm) ::= (integer)*
Figure 5.1: The target language proc.
(5.4)
(5.5)
(5.6)
(5.7)
(5.8)
(5.9)
(5.10)
(5.11)
(5.12)
(5.13)
(5.14)
(5.15)
(5.16)
(5.17)
(5.18)
(5.19)
118
we consider the = in (5.5) to be a declaration of equality. For this interpretation to be
valid, the compilation must define exactly one value for each element of array r (and
any intermediate results defined by let); otherwise the declaration is contradictory
or incomplete. The corresponding effect in the imperative interpretation is that each
array element in r (and in let-introduced temporaries) is assigned exactly one value
during the procedure's execution.
With this single assignment property, it is not hard to see that the interpretations
are related in an important way: If we can show that the declarative interpretation
matches the semantic value of f under some one-to-one relation between arrays and
matrices, then we know that the imperative interpretation always produces the cor-
rect answer; the svexp and proc programs are just different ways of writing the
same function. On the other hand, the "if" above is a rather big one. It means we
must develop a way to reason about the equivalence of semantic values of subterms
of a given svexp function and the declarative interpretation if its compilation.
5.2.1 Semantics of proc
Let us describe proc programs in more detail. Rule (5.4) declares the procedure
formal parameters and result. Valid procedures use parameters ak only on the right
hand side of =, and r only on the left hand side. In the imperative interpretation,
then, parameters ak may be passed by reference (i.e. as pointers), and the storage
for the result may be allocated by the caller and also passed by reference.
Array variables in proc are bound to arrays with a recursive structure, i.e. an
ntl-dimensional array is an array of n-dimensional ones.
We consider an n-dimensional array to have n axes, just as matrices. The essen-
tial difference is that the axes of an n-dimensional array are always numbered zero
through n --H 1. We write n(a) for the number of axes of the array represented by
a and d?(a) with k > 0 for the bound on the k-th dimension of a. We also write
119
a d0 x ... x dn(a)?1 to assert the structure of a. To reference an array element, we
use brackets, i.e. ako,... , ?n(a)--H1i? where the Zk range over natural numbers. Also
consistent with our matrix terminology, we speak of index element ?k interchangeably
as occurring in the k-th index position or the k-th axis of a. An array a with n(a) = 0
contains only one value. We adopt the common practice of identifying 0-dimensional
arrays with the values they contain. Thus a + a makes sense; we are adding scalar
values to get a new scalar.
Rule (5.5) asserts the equality (resp. assigns the value of) the expression on the
right hand side of = and the array reference. The (subarray) is restricted to be a
0-dimensional one; only one array element's value is defined (resp. assigned) at a
time.
Rule (5.6) describes an assertion of the form
f(A1,... ,A?;R)
where Aj and R are subarrays. It declares that R has the value assigned by the body
of the declaration of procedure f as in (5.4) when its parameters ..... . , a? have
values A1,... , A? respectively.
The imperative interpretation is a procedure call, where subarrays are passed
both as arguments and for the return value. In contrast with the simple assignment
of (5.5), a procedure call can effectively assign many scalar values with no V, as
in (5.7). Call-by-reference semantics are assumed for all array passing. Hence the
compiler must allocate (with a let) fresh memory for arguments and return values
whenever these are not subarrays of the parameters or result of the procedure in
which the call occurs.
Rule (5.7) asserts that the included equality holds (resp. iterates the included
assignment) over a range of subscripts.
Rule (5.8) asserts the equalities on both sides of the A (resp. executes both as-
120
signments in parallel). The left and right hand sides may both use any given value,
but, due to the single assignment property, they cannot both de?ne (resp. assign)
the same array element.
Rule (5.9) conditionally asserts the equality (resp. evaluates the assignment) of
either its then or else branch based on the value of its boolean predicate.
Rule (5.10) defines an intermediate value (resp. declares a temporary array) a
with its first (assign) term, then asserts (resp. evaluates) the second (assign), which
may refer to a on the right hand side of =. Since no proc code outside the let can
properly refer to a, its storage may be freed when execution of the second (assign) is
complete.
Rule (5.11) states that a subarray is a valid expression, as is a vector reduction
(5.12) and scalar binary t (5.13).
Rule (5.14) conditionally returns the value of either its then or else branch
based on the value of its boolean predicate. The compiler generates only i? with
0-dimensional values.
Rule (5.15) is the same as (5.10), except it returns the value of the expression in
its in part. As a practical matter, the compiler restricts the value to be the result
of ?, +, or i? which are all 0-dimensional. Hence a stack suffices for allocating the
storage for let-defined arrays in proc programs.
Rule (5.16) gives a uniform syntax for index ranges in proc. The range lo:size
contains the natural numbers from lo to lo + size --H 1 inclusive. Thus k E lo:size is
equivalent to to < k ? lo+size. The names lo and size represent natural number
expressions, which have the same syntax as (dim) in Figure 4.11.
Rule (5.17) describes subarrays, which are the subject of the next section.
121
5.2.2 Subarrays
We consider first the [(indexe??] part of the subarray syntax, then the [(perm)] part.
A subarray of a must have n(a) indexers as described in rule (5.18). A simple
indexer e is just an expression over natural numbers; it selects a single component
of a. A range indexer selects a range of components. The following is a direct
consequence: Let a be an array and b be a subarray of a with s simple indexers and
r range indexers. Then we have n(b) = r = n(a) --H 8.
Example 5.3 Suppose a : 3 X 4 X 5 and let b = a[1:2,3,3:1]. Then n(b) = 2; and
we have b : 2 x 1. Let c = b[1,0:1]. Then n(c) = 1, and c : 1 (i.e. c is a vector of
one element). Furthermore c = a[2, 3,0:1]. Finally, d = c[0] is 0-dimensional, and
In general, a[0:d0(1),...,0 : dn(a)?i(a)] = a. Thus if a is 0-dimensional, we have
= a. This is consistent with 0-dimensional arrays as values as discussed on
page 119.
We now consider the effect of the [(perm?] part of a subarray of array a, which is
a permutation of the range 0:n(a). If .0.... .Pn(a)--H1] is such a permutation, then
is also an array related to a by
a' = [po.... .Pn(a)?i]a			(5.20)
a'[i0, . . . .?n(a)--H1] = a[i?0, . . . `?Pn(a)--HI]
(5.21)
I.e. the permutation specifies a generalized transposition. If Pk = k for 0 < k <
n(a)--H1, then no transposition occurs. For convenience, we can omit the [(perm?J
part of the subarray notation when this is the case. Thus the array element reference
notation a[.J we have been using is just a subarray with all indexers simple and no
transposition.
122
Example 5.4 Consider once more the dot product. Assume that a1 and a2 are
i-dimensional arrays with do(ai) d0(a2) = d. Then we have
dot(a1a2; r) ? r[] = ? (ai[k] >c a2[k])
kEO:d
and for the transpose of matrix multiply (ai a2)T, we have:
mmt(a1a2; r) ? V			V dot(a1[i,O:dij, a2[O:di,jJ; [1,Djr[i,j])			(5.22)
iEO:d0jEO:d2
with the understanding that al : d0 x d1 and a2 : d1 x d2. Note how the transposition
[1, 0] is applied at the dot product result. L]
5.3 Compiling proc to C
The reader may properly raise an eyebrow at the definition of proc, in that some of its
features subarrays and transposition, for instance--Happarently require temporary
space and copying to implement. Hence before undertaking the task of translating
svexp to proc, we consider the simpler job of showing that proc can indeed be
translated to machine code that is quite efficient.
5.3.1 Dope vectors
Dope vector is a classical term for a run-time structure that encodes referencing
information for an array, typically the sizes of each dimension. Some form of dope
vector is required to represent an array (in addition to a block of storage for its
elements) whenever the array size is not known at compile time.
In most language implementations the dope vector for n-dimensional array a is
really a record. It contains a pointer a.p that points to the start of the element
storage block and a vector a.d of n integers. The vector elements store the size of
each array axis. These may be used for "range checking" index values as well as to
reference array elements. For the latter, a typical convention to find the address of
123
element a[ko, k1,... , kd?1] is:
a.p+k0+
1 (ki ?k?1oadj)
This effectively interprets indices as a lexicographic order, with elements farther to
the right receiving higher weight, a so-called column-major order. This is a suitable
practice for languages where transposition is not a primitive. However, it is not
sufficient for proc, where it would require copying to effect the axis permutation of
(5.17).
Hence we generalize dope vectors to achieve the following features:
o+ An array may be transposed by manipulating only its dope vector.
o+ A subarray may be formed by creating only a new dope vector whose pointer
points into the existing block.
o+ Given the dope vector and the address of any array element a, the address of
any element adjacent to a (along any axis) may be computed with one addition.
Fortunately, there is little overhead for this generality. The dope vector for an
n-dimensional array a in our case has this structure:
o+ Pointer a.p. This points to the first location of the element storage block as
before.
o+ Stride vector a.s with n elements; a.sk gives the spacing between elements along
axis k.
o+ Size vector a.d with n elements. Element a.d? gives the size of the k-th axis.
5.3.2 Element addresses and transposition
Rectangularity constrains the values of a.s. The constraints are founded in an implicit
permutation of the axis numbers 0 ? k ? n(a). Let Pk be the k-th value of this
124
permutation. Then if a is an array in contiguous storage (i.e. it is not a general
subarray as described below), we have
b1 a.d?? for 0 ? k ?
j=O
From this it is easy to see that our generalized dope vector encodes a lexicographic
order on index elements just as the conventional one, though here the order is fixed
only at run time. The element address formula is
d--H1
ap + ?			a.sk
k=o
5.3.3 Subarrays
Consider the proc subarray expression
.Pn(a??iia[Io,... In(a)?1]
(5.23)
as in (5.20) with a represented as a dope vector. Then the value above is just
another dope vector a'. Let R be the subsequence of ....... , 1n(a)--H1] consisting of
range indexers 1k = lok:sizek; the rest are simple indexers. Extend lo so that 10k = 1k
if 1k is simple. Let ....... , r IRI--H1J be the subsequence of [po...., Pn(a)--Hi] such that
1Pk ? R. Then
n(a)--H1
ap = apt Thp??a.s?
a 8k = ?.Sr? for 0 < k ? RI (5.24)
= a. S?Z?r? for 0 < k ? RI
Simple algebra shows that (5.23) applied to the subarray thus constructed addresses
elements as described for rule (5.17) above.
Most importantly, the construction of a' involves no copying of the storage block
of a into fresh storage. It requires only O(n(a)) time and space.
125
Table 5.1: Dope vector al as parameters.
Dope vector			translation
n
al.p			al
al.s0			alsO
al.s1			a1s1
al.d0			d4
al.d1			d3
5.3.4 Strength-reduced stepping
We have shown that copying can be avoided in translating proc to machine code, but
there are still some nagging expenses. Consider the result subarray r' = [1, O]r[i, ji
passed to dot in (5.22). This involves the computation
r .p = ?pti r.s1 +j r.s0
for each call to dot. This is expensive on most machines due to the multiplications. If
in practice the V loops are executed sequentially, efficient machine code will compute
the address of [1, O]r[i,j] initially for i = j = 0, then update it by addition for
successive iterations. In particular, incrementing i is equivalent to adding r.s1 to
the previous address; incrementing j is the same as r.s0. This transformation is
just a composition of two well-known optimizations induction variable elimination
and strength-reduction [ASU86J. Experience such as [Mey88] shows that it speeds
array computations considerably. Yet it is non-trivial to apply in general imperative
languages because the structure of iterations must be gleaned by the compiler; proc
presents no such problems. For illustration, Figure 5.2 shows a compilation to C
of dot and mmt. Dope vectors are represented in parameter lists of C functions.
For example, the dope vector for al, the first parameter to rnmt is represented as in
Table 5.1. The names d4 and d3 were created by the dimension inference algorithm
of the compiler implementation.
126
void dot(al, alsi, dl, a2, a2s0, d2, r)
FLOAT?PTR al; int alsi, dl;
FLOATPTR a2; int a2s0, d2;
FLOAT?TR r;
f
FLOATPTR a2p0, alpi;
FLOAT to;
int i0;
tO = 0;
for (a2pO = a2, alpi = al, i0 = 0;
jO < dl;
a2pO t= a2sO, alpi t= alsi, +tio)
tO t= (*alpl) * (*a2pO);
*r = to;
void mmt(al, also, d4, alsi, d3, a2, a2s0, d5, a2sl, d6, r, rs0, rsl)
FLOAT?PTR? al; int als0, d4, alsi, d3;
FLOAT2TR a2; int a2s0, d5, a2sl, d6;
FLOAT??TR r; int rs0, rsl;
f
FLOATPTR? rp0, rpl, a2p2, alp3;
int i0, ii;
for (rp0 = r, alp3 = al, i0 = 0;
i0 <d4;
rp0 += rsl, alp3 t= als0, ttio)
for (rpl = rp0, a2p2 = a2, ii = 0;
ii <d6;
rpl += rs0, a2p2 += a2sl, +til)
dot(alp3, alsi, d3, a2p2, a2s0, d5, rpl);
Figure 5,2: Example of proc translation to C.
127
5.4 A Correctness Criterion
We now undertake the task of compiling svexp user function definitions to proc.4
The first step is to form the initial code accumulation to pass to the compiler C as
in (5.2). For a function definition
fxi ...			E : C,?			(5.25)
where E: C, ? denotes that type inference yields an inner map that denotes axis set C
and dimension function ? for the function body, the corresponding accumulation is
,a?; r) ? ?E, =,r,O00,Oc,Ac.((qcc),(?c)?,?) (5.26)
This is a just a procedure whose body is an active site containing E, the function
body from (5.25). The extra objects in the active site are parameters to direct the
rewrite function as in (5.1). We discuss them in detail as we give 7Z below.
A correct compilation of a svexp function named f to a proc procedure named f
can now be characterized intuitively in terms of a one-to-one "translation" between
matrices and arrays as follows.
o+ Consider the semantic value f bound to the name f in Equation (3.15).
o+ Consider matrices M1, ... ,M? of suitable type to be arguments off. Translate
them to arrays ..... . ,a? and consider these to be the values of the parameters
of procedure f.
o+ Determine the array value r asserted by the fully compiled body of (5.26).
o+ Procedure f is correct if the translation of the matrix
fM1 ...
to an array value is equal to r for all .1.... ,M? of suitable type.
4The "def" body of a svexp program is just a nullary function application, so we
will not discuss it as a separate case.
128
Formalization of this idea begins with the notion of "translation."
Definition 5.5 Suppose MA,Q and let n = lAl. Also let q : A (O:n) be a 1-1
axis map (thus q?' exists). The translation of M in proc, denoted (T M q), is the
n-dimensional array a: ? (q?1 0) x ... x ? (q?1 (n --H 1)) such that
,?n--HiJ = MO ko/(q?1 0)] . . [i??i/(q?1 (n--H1))] (5.27)
Similarly, if we are given array a and inverse axis map q?1 then (5.27) also
describes the translation of a in svexp, denoted (T--H1 aq?1).
Finally, if q is constrained to be monotonic with respect to the order of natural
numbers, then it is unique for a given axis set A. We call this the canonical axis map
for A and denote it q?. The translations it induces are canonical translations. The
canonical translation of MAin proc is thus (T M q?), which we shorten to (T M)
for convenience. Inversely, the canonical translation of a in svexp is (T--H1 a
which we can shorten to (T--H1 a A), since A determines ??1 uniquely. El
Example 5.6 Suppose MA,a where A = f2, 5?, ? 2 = 3 and ? 5 = 4. The canonical
map q? has q?2 = 3 and q?S = 4. Thus (TM) :3 x 4. El
With this, we are ready to give a precise notion of what it means for a procedure to
be correct.
Definition 5.7 Let ? be a svexp function variable environment wherein f is bound
to an n-ary function. Let procedure f be a proc procedure that results from compiling
the definition (5.25). Recall that a procedure call f(Ei,... , E?; E) is an assertion of
the value of E in terms of the values ?.,... ,
We say procedure f is correct with respect to ? if for all matrices ?1... ,
the following holds:
M = (?f) M1 ...			? Wt			(for svexp function f) implies			(5.28)
f((TM1),... , (TM?); (TM)) (for proc procedure f)
El
129
That is, all calls to procedure f on the translations of M1,..., M? assert result values
that match the return value of svexp function f `in ? on ..... . , M?, whenever
that return value is not I or a type error.
Rule (5.4) states that the form of procedure f is
f(a1,. . . ,a?; r) ? B			(5.29)
it is not hard to show that a sufficient condition for procedure f's correctuess with
respect to ? is the following property of its body.
Property 5 For any matrices M1,..., M?, let
and in B of (5.29) let
p = i [Ai.Mi/x1J.. [Ai.Mn/xn]
al = (TM1),. . .,a? = (TMn)
If V = ?IE1?P ?Wt, then B asserts thatr has array value T(VO). ?
Thus our approach to compiling correct procedures is this:
o+ Extend T to active sites, where it returns a fragment of proc.5
o+ Extend T again to entire code accumulations. A translated accumulation is
the same as the original except that all active sites are replaced with their
translations.
o+ Show that the translation of (5.26) is correct by verifying Property 5.
o+ Derive 1? so that if ? is a code accumulation with an active site and (T ?) is
correct, then T (? ?) is correct.
5in fact it is notationally simpler to use hybrid fragments that are part proc and
part svexp semantic expression, and this is the course we take.
130
It is important to stress that T and the value it returns are ephemeral tools for
establishing the correctness of compilations in progress. Though T returns proc
code, T is not part of the compiler, and its return value is useful only for its declar-
ative interpretation, which stands for the semantic value of an uncompiled fragment
of svexp.
5.4.1 Active sites
The active sites in Example 5.3 are simple, containing only a fragment of source
code to compile. As shown in (5.26), however, svexp is more complex. Active sites
must contain enough information to tell the rewrite function ? how to go about
rewriting. In particular, this information much be rich enough so that 7? can make
good decisions 011 where to emit two key structures:
o+ The lets which allocate storage resources at run time.
o+ Loops (V, ? and II) that, in a manner of speaking, allocate computation re-
sources and generally determine the asymptotic running time.
In this case, procrastination is a virtue. The best course is always to delay emitting
lets and loops as long as possible, until it is certain they are needed, and then to emit
them in the best possible place. For instance, we generally want lets as far outward
ill the nest of loops as possible, so that values are not recalculated unnecessarily in
the imperative interpretation of the result.
There are two classes of active sites those that are rewritten into proc (assign?
subterms and those that are rewritten into (expr)s. We consider these in order.
5.4.2 Active Assignments
The general form of an active assignment site is
=, tgt, tgt-base, val-base, size, i-vars?
(5.30)
131
To understand the components of this structure, it is helpful first to extend the
notion of range indexers of subarrays backward into semantic domains to matrices.
We did this in a specialized way by extracting all but the first row in the semantic
equation for Extr (3.4). The generalization is straightforward; the submatrix of M
from lo to lo + size --H 1 along axis a is just
Ai.(ia) ? size H Mi[(ia) + lo/a],rangerr
This notion extends naturally to s-vals; we can extract subranges along outer axes
as well as inner ones to obtain sub-s-vals.
Another term: we say that a loop is ancestral to an active site if the active site
is within the loop body.
Then the components of (5.30) have commonsense meanings as follows.
E The svexp subterm to be compiled.
= A flag denoting that this is an active assignment. Active expression sites have?
instead.
tgt The array, called the target, into which assignment is to be performed. In fact,
the assignment is to a subarray of tgt is in (5.5). The rest of the subarray is
encoded in other fields.
tgt-base Encodes the values 10k as in (5.24) of the target subarray.
val-base Encodes lo values for a sub-s-val of the semantic value of E. This gives
the values assigned to tgt and is called the source.
size Encodes size values for both the target subarray and the source sub-s-val, but
includes only axes for which ancestral loops do not already exist.
i-vars Encodes loops ancestral to the active site.
132
It remains to note the form of all these "encodings." They are simply maps that take
axis numbers to other values tgt-base takes target array axes to symbolic arithmetic
expressions derived from types, and va1-base takes s-val axes to similar expressions.
The map size takes s-val axes to pairs of the form (array axis, arithmetic expression?.
Finally, i-vars takes s-val axes to the names of loop variables, which conveniently
serve as identifiers for the loops themselves. While the inputs and outputs of user
functions are matrices, intermediate values in the function body are s-vals. Thus the
domains of val-base, size, and i-vars may include both inner and outer axis numbers.
For this reason, we assume that axis numbers are "tagged" so that same-numbered
inner and outer axes are distinct.
Such maps are easily implemented as a sequence of pairs; e.g. patching and
application modify and search the sequence respectively. The shorthand (size1a)
denotes the array axis part of the pair mapped from a by size. Similarly (size? a)
gives the expression part. The domain of a map m is dom m, and ran m is its range.
Other operations are introduced as required.
We are now in a position to describe the contents of the active assignment in
(5.26):
r The target array, which is the result value.
00:ICI (tgt-base) A map taking the axes of the target array to zero. These are
the lo values as in (5.24) of the target subarray.
Oc (val-base) A map taking axes of the function body s-val to zero. These are the
lo values as in (5.24) of the source sub-s-sval.
Ae.((qcc), (? c)? A map taking every s-val axis of the function body to a pair
consisting of the corresponding canonical array axis and the function result
size obtained by type inference.
133
The empty map of loops already emitted.
The translation of an active assignment
T ?E, =, tgt, tgt-base, val-base, size, i-varsl
(5.31)
is based first on the idea that size represents V loops necessary to fill in the target.
Thus we construct a loop nest with one loop for each element of the range of size.
Correct source values come from the obvious place the semantic value of E. Let ?
and P be environments as assumed in Property 5, page 129. Recall that ran m is the
range of map m. Suppose ran size = t(bi, ....... , (bm, em)Y and vA;c =
where A = ....... , apl and C = ....... , Cq1. Finally let the target be n-dimensional.
Then the desired translation (5.31) is
v
?6m VEo:em tgt[i0 + (tgt-baseo),... ,???1 + (tgt-base(n --H 1))] =
v 0 [(Mai)/ai]... [(M a?)/a?] 0 [(Mci)/ci].'. [(M cq)/cq]
where with a an s-val axis (inner or outer)
Ma =
ib + (val-base a)			if (sizei a) = b
(i-varsa) + (val-base a)			if a ? (dom i-vars)
(va1-base a)			otherwise
(5.32)
(5.33)
We have adopted the convention of using ib for the V variable to iterate over axis b of
the target, whether that variable is code already emitted or in a translation (5.32).
Loop variables ib on the left side of the equality that are not ?bk for any k (i.e.
b ? (ran size1)) must be in (ran i-vars). That is, they are variables of ancestral loops.
This relationship must be preserved by ?. Similarly, the cases in (5.33) must be
disjoint and well-defined.
We can now readily verify that the translation of the initial code accumulation
in (5.26) is correct. llere we have (dom size) = ....... , c?? = C, where C is the axis
set of the function result. Assume without loss of generality that ci < C2 < ... < C?.
134
Then the translation is
v			V			r[i0,. ,			= V 0 0 [io/cij'.. [in--H1/cn]
which satisfies Property 5 (page 129) by inspection.
5.4.3 Active Expressions
The general form of an active expression site is this:
(5.34)
lE,?, vat-base, i-va?j
The ? is just a flag to denote that this is an active expression; vat-base and i-vars
serve the same purpose as in active assignments. An active expression is, roughly
speaking, the source value of an active assignment. Thus the absence of a tgt field
is expected. In addition, there is no size field because the translation of an active
expression is always a 0-dimensional value. Thus the translation is a subarray with
simple indexers and a trivial permutation part. Let V be as above; then
T lE,?, vat-base, i-varsl =
(5.35)
v 0 [(Ma1)/ai].' [(M a?)/ap] 0 [(Mci)/c1]... [(M cq)/cqj
where
Ma= (?-varsa) t (vat-base a) if a ? (dom i-va?)
(vat-base a)			otherwise
5.5 The Rewrite Function
The machinery is now in place to derive ? so that it maintains correctness. Most of
the remaining work is to keep track of details as derivation proceeds by straightfor-
ward algebraic reasoning.
5.5.1 Rewriting active assignments
Proj. Consider the code accumulation
((??Proj acE, =, t9t, tgt-base, vat-base, size, i-varsj))
(5.36)
135
Let VAC --H ??Pr?acE??P = Aij.V'i[(jc)/a]j, where ? and P are as in Prop-
erty 5, page 129, and v?A';c' = S[E? ?P. The translation of the active site is then
v			o+.			V			tgt[io + (tgt-baseo),. ,???1 + (tgt-base(n --H 1))]
?b1 EO:ei
= (Aij.V'i[(jv)/a])O[(Ma1)/a1].. [(M ?)/?J O[(Mci)/ci].. [(M cq)/cq]
--H V' 0 [(M ai)/ai] . [(M ?)/?] [(M c)/a] 0 [(M ci)/ci] [(M cq)/cq]
(5.37)
with all the relevant auxiliary definitions from (5.32). Our goal is to show that this is
the translation of some active assignment that is an easily computed transformation
of (5.36). Chapter 3 showed Proj to be merely a renaming of outer axis a of the
operand to make it inner axis c in the result. Thinking about this relationship in
reverse operand axis a as renamed result axis ?leads us to try
lE, =, tgt, tgt-base, (remap vat-base Ca), (remap size Ca), (remap i-vars c a)? (5.38)
where f' = (remap f x g) is the same as f except f' y = f x and f' x is undefined.
We know from Theorem 3.8 that
C?C', a?A, C= C'ufcj, andA'=Aufal			(5.39)
With these it is a substitution exercise to show that the translation of (5.38) is equal
to (5.37). Thus the desired rewrite equation is
7?((?iProi acE, =, tgt, tgt-base, vat-base, size, i-vars??
= ((??E, =, tgt, tgt-base, (remap vat-base Ca), (remap size Ca), (remap i-va?s c
Typ. It is tempting to conclude immediately that
C a E, =, tgt, tgt-base, vat-base, size, i-varsj ?
= ((?[E, =, tgt, tgt-base, (remap vat-base a c), (rernap size a c), (rernap i-vars a c)j)
(5.40)
because Proj often acts as the inverse of Typ; but, it is not exactly an inverse. If we
construct a translation similar to (5.37) for Typ c a E, then (5.40) is correct (again
136
by simple substitution) only if
c ? C, a ? A', C' = C u fcj, and A = A' u ?a?
(5.41)
This is not always the case, however. The following function computes the determi-
nant of a triangular matrix xt.
tdet xt = Red*0 Proj7O TypO7 Typ17 xt
At some point in compiling tdet, the code accumulation is of the form
((??TypO7Typ17xt,...j?
Hence, v?A' --H S?Typ 1 7 xt? ?p and V = ??yp 0 7 Typ 1 7 xt? ?P. From the typing
rules for Typ, we have 7 ? A'. This is a counterexample showing a ? A' in (5.41) is
not necessarily true. Fortunately, a correct rewrite results if we replace r?map with
a slightly different transformer in this case. The full rewrite is thus
?((??Typ ca(E : A'),=, tgt, tgt-base, vat-base, size, i-vars??
= a ?
=, tgt, tgt-base, (augmt vat-base a c), (augmt size a c), (augmt i-vars a
=, tgt, tgt-base, (remap vat-base a c), (remap size a c), (remap i-vars a
where (augrnt f x ?) = f [(f x)/y]--Haugmt is like remap except it does not remove
any element from dom f.
Extr. Similar derivations yield rewrites for Extr.
7?((?iExtr00 cE, =, tgt, tgt-base, vat-base, size, i-vars??
= ((??E, =, tgt, tgt-base, vat-base[O/cj, size, i-vars??
= ((?[Extr01 cE, =, tgt, tgt-base, vat-base, size, i-vars?)
=, tgt, tgt-base, vat-base [( vat-base c)+1/c], size, i-vars1?
137
Formal parameter references. Consider a code accumulation where the active
assignment is for a reference to the ?th formal function parameter.
(?Ixe, =, tgt, tgt-base, vat-base, size, i-va?s?))
(5.42)
Suppose for the moment that size = $. The translation of the active site is then just
tgt?0 + (tgt-baseo),. ,?m--Hl + (tgtbase(m--H1))] =			(5.43)
(Pxt) 0 0 [(Mc1)/c1]... [(Mcq)/eq]
where NI is specialized to
Mc --H			(ivars c) + (vat-base c)			if c ? dom i-vars (544)
vat-base c)			otherwise
By the assumptions of Property 5 on page 129, we can rewrite the right hand side of
the equality (resp. assignment) in (5.43).
(Aj . Mj)O 0 [(M Q)/ci].? [(M cq)/cq] = Mj 0 [(M ci)/ci]... [(M cq)/cq]
If we assume without loss of generality that ci < C2 < ... ? Cq, then this is equal to
ae[(M ci),... , (M cq)]
by Definition 5.5. We conclude that the proper rewrite is
((? tgt[io + (tgt-base 0),... , ?q?1 + (tgt-base(q --H 1))] = ae [(Al ci),... , (Al cq)])
in the case where size =
When size $ ?, we can make progress by picking one element (bk, e?) in ran size
and emitting a V loop for the target axis bk. Here we need to consider the translation
of the entire accumulation, not just the active site of interest. In particular, the full
translation of (5.42) is if the form
v			. v			v			v			.			v			tgt[?=Vp
i?1EO:ei ib??iEO:ek?1 ib?EO:ek ib?+iEO:ek+1
138
where p, and ? are fragments of the translation. This can be rewritten with the ?bk
loop outermost.
ib?EO:ek (i?1Y0:ei			v			v			...			V			tgt[?] = Vp)
? v ib??iEO:ek?1 ib?+iEO:ek+1
At this point we merely stop thinking of the ?bk loop as part of the translation and
emit it as code. The resulting rewrite is called strnpping because it strips this V from
the ephemeral translation and makes it compiler output. It is
((? V ?Xj, =, t9t, tgt-base, va1?base, (rem bk size), (add ?bk size i?vars)?)
?bk EO:e?
where f = (rem x size) is the same as size except (f a) is undefined if (sizei a) =
and g = (add ?x size i-vars) is the same as i-vars except for all s-val axes a such that
(sizei a) = x, we have (f a) = ix. Intuitively, rem removes mappings in size that
take s-val axes to pairs containing x; add puts corresponding mappings to ?x in i-vars
to record the newly emitted loop. This rewrite arises so often that it is worth stating
as an auxiliary function. Assuming (bk, ek? C ran size, we have
strnp(b?, (?[E, =, tgt, tgt-base, val-base, size, i-varsJ?) =
((? V [E, =, tgt, tgt-base, val-base, (rem bk size), (add ?bk size i-vars)?)
?bk EO:ek
Putting all this together, we have the complete rewrite for formal parameters. Let X
be given by (5.42). Then
= (size =
((? tgt[io t (tgt-base 0),... , ?q?1 + (tgt-base(q --H 1))] = a[(M ....... , (A' cq)])
(strip bk X)
where (bk, e?) ? ran size and A' is as in (5.44). The choice of bk is arbitrary.6
6As a practical matter, one would typically choose bk to provide e.g. closely spaced
memory accesses. Though such second-order performance matters are important and
interesting, they are beyond the scope of present discussion.
139
Reductions. The accumulation
((??Red+c(E : ?),=,tgt,tgt-base,val-base,size,i-vars?) (5.45)
submits to an analysis similar to that for formal parameters. If size = ?, then the
translation of the active site is
tgt[i0 + (tgt-baseo),. ,?m?l + (tgt-base(m --H 1))]
--H z Vii[klc])O[(Mai)/ai]... [(Ma,)/?]o
k=o
(Aii
=			VO[(Mai)Iai].. [(M(Lp)/%]O[k/c]
At this point it is clear that the rewrite can only be an assignment with a ? loop on
the right side; but what about the loop body? Equation (5.35) shows that an active
expression is appropriate. The desired rewrite is
(? tgt[i0 +(tgt-baseo),...,Zm?l +(tgt-base(m --H 1))] =
? ?E,?,val-base[O/c],i-vars[k/c]??
k=O:(?c)
When size is non-empty, we merely strip. Let X be the accumulation in (5.45). Then
the full rewrite equation is
= (size =
((? tgt[io + (tgt-baseo),			,?m?l +(tgt-base(m --H 1))j =
? [E,?,val-base[O/c],i-vars[k/c]J),
k=O:(?c)
(strip bk X)
where bk E (ran sizei).
Conditionals. Consider next the accumulation
(?[if (B : A) then T else F, =, tgt, tgt-base,vat-base,size,i-varsj)) (5.46)
Note A is the outer axis set of the predicate value. This set is significant because,
intuitively speaking, loops whose variables index the array value of the predicate
140
must also enclose the code for both branches--Hit would not be sensible for a loop to
churn through elements of the predicate array value without evaluating the branches.
Since the inner axis set of the predicate is empty, the only such loops are those
corresponding to outer axes. Thus the strategy for rewriting is to strip target axes
in the image of A n (dom size) under size1, because we have no choice. Then, as
usual, we look for an easily computed transformation of (5.46) for the base case,
when A n (dom size) is empty.
Beginning with this base case, suppose A n (dom size) = ? and let VB, VT, and
VF be the semantic values of B, T, and F in (5.46) with respect to ? and P as in
Property 5, page 129. Then the translation of the active site is
i?mV?o:em t9tko+ (tgt-baseO),.. ,in?1 + (tgt-base(n --H 1))]
i?1EO:ei
--H (Aij.(VBij) (VTij), (VFij))
0[(Mai)/a1].. [(Ma?)/?J 0 [(Mci)/ciJ... [(Mcq)/cq]
--H VB0 [(val-baseai) +(i-va?a1)/a1j.. [(va1-basea?) + (i-varsa?)/a?] 0
VT0 [(M ai)/ai]... [(M ap)/%] O[(Mc1)/ci]'.. [(Mcq)/cq],
VF 0 [(Mai)/aij... [(Map)/%] 0 [(Mci)/c1J... [(Mcq)/cq]
The intuitive reasoning above is thus verified. The value of the predicate of the
conditional does not depend on any V loop of the interpretation. We can move
these loops into the branches of the conditional. In the process, the functional
becomes a proc conditional assignment, but the meaning is unchanged.
ifVB0[(va1?basea1) + (i-va?a1)/a11 [(va1-basea?) + (i-varsap)/?] 0 then
v			..			V			tgt[i0 + (tgt-base0),... , in?1 + (tgt-base(n --H 1))] =
?b1 EO:e1			?6m
V?O[(Mai)/aij . [(Map)/??] 0[(Mc1)/c1] [(Mcq)/cq]
Vi61?o:ei Vi?m?o:em tgt[io + (tgt-baseO),. ,i??? +(tgt-base(n --H 1))] =
VF0[(Mai)/ai]... [(Map)/a,1 0[(Mc1)/c1].. [(Mcq)/cq]
else
141
The final step is to employ (5.35) on the predicate value and (5.31) on each of the
branches to obtain the rewrite we might have predicted.
if ?B,?, vat-base, i-vars?
then [T, =, tgt, tgt-base, va1-base, size, i-varsj
else [F, =, tgt, tgt-base, vat-base, size, i-vars?
When A A (dom size) # ?, we strip to make progress. Let X be the accumulation
in (5.46). Then the complete rewrite is
= (An (dom size) =
((? if ?B,?, vat-base, i-va?s?
then ?T, =, tgt, tgt-base, vat-base, size, i-vars?
else ?F, =, tgt, tgt-base, vat-base, size, i-va?s??,
strip(b, X)
where b = (sizei a) for some a ? A A (dom size).
Join. We next turn to Join accumulations:
((??Join c(E1 : C,?) 2,=, t9t, tgt-base, vat-base, size, i-vars??
(5.47)
Here no assumptions are necessary on axis sets; no stripping is performed by the
rewrite function. However, rewrite cases arise in a different fashion. Consider the
interpretation of the active site above:
v			.			V			tgt[io + (tgt-baseo),. ,???1 + (tgt-base(n --H 1))]
?b1 EO:ei			?6m
= (Aij . (j c) < (?i' c) (V1 ij), (V2 ij [(j c) --H (7 c)Ic]))
0 [(Ma1)/a1]... [(Ma?)/a??] 0 [(Mct)/c1]... [(Mcq)/cq]
= (M c) ? (7 c)
V1 0 [(Ma1)/ai]			KMap)/a,] 0 [(Mc1)/c1]... [(Mc)/c]... [(Mcq)/cq],
V2 0 [(Mai)/ai]... [(M ap)/%j 0 [(Mc1)/c1]... [(Mc)/c]... [(M cq)/cq]
(5.48)
where
?= ?[1/c] o$Wrw)se
(5.49)
142
to account for promotion of the value of E1. The appearance of (M c) in the condi-
tional indicates that the nature of the rewrite depends on which of the three cases
of the definition of Al in (5.33) apply to c. We repeat the definition here with the
cases lettered for reference.
ib + (vat-base a)			if (sizei a) = b			(a)
Ala =			(i-varsa) + (vat-basea)			if a ? (domi-vars)			(b)
(vat-base a)			otherwise
(5.50)
When c is in dom sizel, then it satisfies (a) and (M c) = ib + (vat-base c) where
b = (sizei c). This implies that the ib loop is in the translation of the active site
rather than in ?, the code already emitted, and the condition (Al c) < (?` c) in (5.48)
is satisfied for the loop iterations where ib + (vat-base c) < (?` c) or equivalently
(recalling from Chapter 4 that a b = a --H b if a> b and zero otherwise) when ib <
(?` c) (vat-base c).The condition is false in the other iterations. This allows a rewrite
by "splitting" the ib loop of the translation into two V loops connected with A and
having disjoint ranges covering the range of the original. Let 5 = (?` c) (vat-base c).
Then the desired subranges are 0:min [(size?c), s? and s:(size2c) 5.
Suppose X = ....... ,Xnl and define f [?/xJ = f [y/xi]?? [`j/xn]. Then the
desired rewrite of (5.48) is
=, tgt, t?t-base, vat-base, size [(b, min [s, (size2 a)i)/a], i-vars?A
aEA
1E2, =, tgt, tgt-base[(tgt-baseb) + s/b], vat-base [(vat-base a) s/aJ,
aEA
size [(b, (size2 a) s)/a], i-varsj))
aEA
(5.51)
whereA= fx (sizeix) =b1.
When c e (dom i-vars), then case (b) is satisfied; an ib loop already exists in ?.
?bEa:e
where A is V or ? and ib = (i-vars c). Referring again to (5.48), we see the condition
(Al c) < (?` c) is satisfied for just the ib loop iterations where i?+(vat-base c) < (?` c),
143
that is when ib < (?` c) (val-basec). Here we exploit the code accumulation to split
the ib loop, rewriting the context ? as well as the accumulation. One way to do this
is as follows. Let s = (?`c) (val-basec). Then we have
=,tgt, tgt-base,val-base,size,i-varsl II			(5.52)
=,tgt,tgt-base,va1-base,size,i-vars?)?
where is A if A is V and + if it is ?. Another approach is to patch tgt-base, and
val-base on the right of and start the corresponding V range at zero. But then we
must also modify all references to ib in the code of ?.
This leaves case (c), where e is in neither (dom sizei) nor (dom i-vars). Thus
occurs for instance in compiling
Extr00 0 (Join 0 v1v2)
Here the condition in (5.48) reduces to (val-basec) < (?`c). Since it contains no
references to loop variables at all, we can change the conditional in the interpreta-
tion into a proc conditional that picks either the (val-basec)-th component of E1 if
(val-basec) < (?`c), or else the (val-basec) --H (?`c)-th component of E2. Expressing
this idea as a rewrite, we obtain
(? if (val-basec) < (?`c)
then lEi,=, tgt,tgt-base,val-base,size,i-varsl (5.53)
else 1E2,=, tgt, tgt-base,val-base[(val-basec) --H (?`c)/c],size,i-vars?)
Combining the three cases yields the full rewrite rule for Join assignments. Let X
be the accumulation in (5.47). Then we have
= (size1 c= b)			(5.51),
? dom i-vars) H (5.52), (5.53)
User function calls. Perhaps the most interesting rewrite problem is the code
accumulation for a function call.
(??(f(E : C')) : A,?; C,?, =,tgt, tgt-base,val-base,size,i-vars?)
(5.54)
144
The translation of the active site is
tgt[i0+ (tgt-baseo),.. ,???1 +(tgt-base(n--H1))j
?b1EO:ei			?bmEO:em
=(Ai.(?f)(?IE??Pi)) 0[(Mai)/ai]. [(Map)/apj 0[(Mci)/ci]...[(Mcq)/cq]
=(?f)(?IEj?p 0 [(Mai)/ai]... [(Map)/ap]) 0[(Mci)/ci]''.[(Mcq)/cq]
(5.55)
where a1,...,a? ? (dom size). Assuming without loss of generality that cl ? c2 <
... ? C1C1, we can rewrite this as
v			.			V			tgt[i0 +(tgt-baseo),...,???1 +(tgt-base(n--H1))] =
?b1EO:ei			?bmEO:em			s[(A?ci),...,(?cici)]			(5.56)
where the value of 5 is established by a procedure call f(t;s). If a correct (by
Definition 5.7) compilation of f is available, then we have that (5.56) assigns correct
values as long as
t = T??E? ?p 0[(Mai)/ai]...
This equality is the translation of the code accumulation
=`t,00:IC'I,O,C,Ac.((qcic),(? c)?, ?			(5.57)
as long as (dom size) n A = ?. The derivation is similar to (5.34) for the function
body accumulation (5.26). Combining these observations in a single rewrite yields
((?let t ? lE,=,1,00:IC'I,OC',Ac.((qc?c),(?c)),$?
in let 5 ? f(t;s)
in			V			V			tgtko+(tgtbaseo),...,i??1 +(tgt-base(n--H1))] = (5.58)
??1 EO:ei			?bm EO:em
s[(Mc1),.. .
If (dom size) n A $ ?, we strip to make progress. Let X be the accumulation
in (5.54). Then the complete rewrite rule is
7?X = (domsizen A = ?)			(5.58), (stripbkX)			(5.59)
where (size1a) = bk for any a ? (dom size) n A. This discussion generalizes easily
to function calls with multiple arguments.
145
Optimizing procedure calls. While (5.59) is correct, it contains important op-
portunities for improvement. For a simple example, consider the svexp function
fx = gx
where x is known to be a vector. For this we obtain
f(x;r) ? let t ? iYO:d t[i] x[i]
in lets ?g(t;s)
in iYO:d r[ij = s[i]
when an obviously correct and more efficient alternative is
f(x; r) ? g(x; r)
The remainder of this section is concerned with transformations on proc code that
make procedure calls more efficient.
SimpTh copy elimination. proc code of the form
let t ?			V			...			V
i0EO:d0(t)			in(t)?iEO:dn(t)?i
in B
t[i0, . . . `			= X[ip0, . . . `
with Po, ?Pn(t)--H1 a permutation of .... . , n(t) --H 1 is called a simple copy. It may
be replaced with the let body B, wherein all occurrences of t are replaced with an
appropriately transposed x:
,Pn(t)?1]X
This is immediate from (5.20) and (5.21).
Another form of simple copy occurs when the target of a procedure call is at least
as big as the procedure call result. In this case there is no need for a let to allocate
fresh storage for the result. Consider lets of the following form, which can result
from (5.59):
146
lets ? f(t;s)
in			V			...			V
i60 EO:d0(s)			?bn(8)?1 EO:dn(8)??
tgt[Io, . . . , 1n(tgt)--H1] = skb0, . . . ` ?bn(s)?l] (5.60)
where 1k for 0 < k ? n(tgt) are indexers and the order of V loops may be arbitrarily
permuted. Let ....... 1?n(s)-1 be the subsequence of 16,.. , 1n(t9t)--H1 where 1q, is of
the form ?bk + Bq,, that is, those indexers that include one of the loop variables ?bk,
o < k ? n(s) and an arithmetic expression arising from tgt-base during compilation.
The other indexers contain variables of ancestral loops or have no loop variables at
all. As an optimization, we can replace (5.60) with
f(t; [po, . . . ,Pn(tyt)?l]t9t[Io',			` 1n'(tgt)--Hi])
where
and
Bj:d?(s) if I? = Bj + ?bk
Ii			otherwise
q? if j = qe for some ? and I? = Bj + ?bk
Pi =
j otherwise
That is, rather than allocate a fresh array for the procedure call result, we arrange
for the call to deposit its result in the correct portion of the target.
5.5.2 Rewriting active expressions
The principles for deriving active expression rewrites is the same as for assignments,
and the algebra is somewhat simpler. We present the results without elaboration.
Proj.
?(?iPr? acE,?, vat-base, i-vars??
= ((??E, ?, (remap vat-base Ca), (remap z-vars c a)])
?((??Typ ca(E : A'),?, vat-base, i-vars])
= a e A' (??E, ?, (augmt vat-baseac), (augmt z-varsac)]),
((?IE,?, (remap vat-base a c), (remap ?-vars a
Typ.
147
Extr.
1?((?[Extr00 cE,?, val-base, i-varsj? = ((??E,?, val-base [Ole], i-vars?)
?((?[Extr01 cE,?, val-base, i-vars?)) = ((?[E,?, val-base [(val-basee)+1/c], ?-va??)
Formal parameter references. Let M be as in (5.44). Then we have
1? ((?I(xe : C),?, val-base, i-vars?) = ((? a'[(M ....... ,(M cq)]?
where without loss of generality we assume C = [c1 ? c2 ? ? Cq1.
Reductions.
((?[Red+ c(E : ?),?, val-base, i-varsj? = (? ?[E,?, val-base[O/c], i-va?[k/c]j)
k=O:(? c)
Conditionals.
((??f B then T else F,?, val-base, i-varsj) = ((? if ?B,?, val-base, ?-vars?
then ?T,?, vat-base, i-va??
else ?F,?, vat-base, ?-vars?)
Join.
1?(??Join c(Ei : C, ?) E2,?, vat-base, ?-varsj? = c ? (dom i-vars) (5.61), (5.62)
The two cases (5.61) and (5.62) are as follows. First, we have c ? (dom i-va?); thus
t6=q:c
Let 5 = (?`c) (val-basec) with ?` as in (5.49). Then the rewrite is
A ?[E1,?, vat-base, varsl II A ??E2,?, vat-base, i-vars?)) (5.61)
i?Eq:s--Hq			i?Es:e--H(s--Hq)
For the other case, vat-base selects a value from E1 or E2.
((?ff (vat-basec) < (?`c) then ?E1,?, vat-base, i-vars?			(562)
else [E2,?, vat-base[(vat-basec) --H (?`c)/c], ?-vavs])
148
User function calls.
:			: C,?, ?,val-base,i-vars?) =
((?et t ? ?E, =, t, 0O:IC'I? O,c, Ac.((qci c), (? c)?, ?
in let s ? I(t; s)
in s[(Mc1),.. ,(Mcc)J
where again C = ?c1 <C2 .... <cjc??.
5.5.3 Loop nest optimizations.
Though the compiler as given emits code only when necessary, some of the loop nests
it constructs are inefficient. One example is the following.
Theorem 5.8 The size of C (?? as given is o(2?I)
Proof. For convenience, assume
JoincE1?..E? = JoincE1(JoincE2(...(JoincE??1E?)...))
Then consider the function
fviwi			= Proj7l Red+ 0 Typi 7
Join 0 (Join 1 vi wi)
(Join 1 V? Wn)
It is straightforward to show that this function contains 2? loops. E)
This behavior is not worrisome in that it appears only for pathological inputs. More-
over, it is straightforward to derive replacements for the loop splitting rewrites that
cause the problem. We leave this to the reader; the essence of the revised rules is
in (5.62) and also [Bud88]. Practically speaking, the rewritten rules generate some-
what slower code than splitting, so a compiler implementation would do well to split
loops unless the resulting code would be too big.
A more important problem is that invariant calculations may be repeated need-
lessly within loops. For this, we apply the principle of the classical optimization loop
invariant removaL
149
Example 5.9 Consider the following function
which compiles to7
fooxy = Proj7O
Red+ O(JoinoRed+ Ox
TypO7y)
foo(x, y; r) ?			V			r[i0] = y[i0] + ? x[k]
ioEO:do(r)			kEO:d0(x)
We need compute the summation only once:
foo(x,y;r) ? let t ?			= ? x[k]
kEO:d0(x)
in			V			r[i0] =y[io]+t
i0EO:d0(r)
E]
In general, if we have a proc (ass?gn? term of the form
Aj P where p contains ? E
k
If E does not contain references to i, then we can write this as
?lett ?t[j=Einp'
where p' is p with instances of ?k E replaced by t.
We need not defer this optimization until after compilation. We can modify
the rewrite rule for Red+ to inspect and transform its input code accumulation as
desired.
Generalizing once more, we can modify all rewrite rules that potentially generate
code for expensive operations that can be moved farther out in the loop nest. Another
example with clear profit is to transform procedure call code
??Plett ?E1 inlet s ?f(t;s)inE2
7We have applied some algebraic transformations, i.e. eliminating loops that ex-
ecute exactly once.
150
If i does not appear in E1, and E2 is an assignment we can rewrite this as
?plet t ? E1 in lets ? f(t;s) in VkE2
Other examples include choosing evaluation order of procedure call arguments to
obtain efficient nestings and moving the evaluation of conditional predicate code
outward. The latter may involve changing a conditional expression to a conditional
assignment.
5.6 Perspective
Let us take stock of the discussion thus far. Our compiler translates the high level
manipulations of typical component spaces in svexp to a single assignment language,
proc. The techniques given here code accumulations and algebraic compiler devel-
opment using translations--Hare fundamental for s-vals in the same sense that e.g.
closures and various virtual machine models are fundamental to compiling general
functions. Thus the result of this chapter is not the compiler itself, but the method-
ology that led to it, which should be useful for any language based on s-vals.
Chapter 6
Evaluation In Place
Previous chapters described the functional matrix language svexp and gave a corn-
piler that accepts type-annotated svexp programs and returns compilations to a
single assignment array language called proc. In turn, proc programs are easy to
transform to machine codes for a variety of architectures. They are memory efficient
in certain respects. However, many compiled proc programs still take asymptot-
ically more storage (and copying time) than required for the problems they solve.
This inefficiency and an optimization algorithm to eliminate it are the topics of this
chapter.
6.1 Overview
We first consider an instance where the compiler of Chapter 5 yields code as efficient
as an imperative language compiler. This success stems from two desirable properties
that careful design of the compiler has provided:
o+ It allocates storage only for a single inner matrix of each user function call
argument and result.
151
152
o+ It detects opportunities to eliminate these allocations when the storage serves
as an unnecessary copy of a portion of the caller's argument or of the result
value.
For example, these provide good results with the following function to reverse a
vector:
rev x = if (NumOf x 0 1) then x
else Join 0 (rev Extr01 0 x) (Extr00 0 x)
This compiles to the following proc code. Assume x :
rev(x; ?) ? if n = 1 then
jEYn ?k]=xk]
else rev(x[1: n?1]; r[O: n--H1]) A r[n--HlJ = x[O]
Recall that we can pass arguments by reference (that is, as pointers) in proc due to
its single assignment property. Further, the recursive side of A is tail recursive if we
evaluate it last. When tail recursion is removed and n = 1 propagated through the
true branch of the if, what remains is an efficient and concise code for reversing x
into r. Using a C representation for arrays as in Figure 5.2 and Table 5.1, we obtain
void rev(x, xs, di, r, rs)
? start:
if (d1 == 1)
r[Oj =
else f
r[rs * (d1 - 1)j = x[Oj;
x = x t xs; dl = dl - 1;
goto start;
11
This example is noteworthy because comparable code to reverse lists is well-
studied, owing to its notorious inefficiency. For instance, this Common Lisp code
requires 0(n2) time and space.
153
(defun rev
(if (null (cdr x))
x
(append (rev (cdr x))
(list (car x)))))
Replacing append with the destructive equivalent nconc gets the space requirement
down to 0(n), but time is still 0(n2). Moreover, the programmer must then remem-
ber that the original (unreversed) value of x is destroyed by the call (rev x).
6.1.1 The remaining problem
Unfortunately, the compiler as it stands is still not good enough. Many svexp
functions compile to code that is asymptotically less efficient in space and run time
than natural imperative code for the same purpose. The problem is rooted in the
single assignment property of proc that until now has made things simpler.
Example 6.1 Consider the following function.
gx --Hif NumOfxOO then x
else Join 0 (Extr00 0 x)
Proj7O(tl
TypO7(?Pr?7O(s1
Typ 0 7(Extr01 Ox)))
This accepts an inner fl-vector x and laboriously adds 2k to element k for 0 ? k < n.
With the compiler and optimizations of Chapter 5 we obtain
[zi
g(x;r) ?ffn=0then iNn r[? =xkl
elser[0] =x[0]Alett ? iEoVm?l tkj = 1+x[i+1]
in let s ? ?(t; s)
in V r[? + 1] = 1 + ?kJ
iEO
154
What has gone wrong? There are no simple copies in this code. The compilation
executes n --H 1 recursive calls, each with a computation into 0(n) fresh storage. Thus
the total cost of storage is 0(n2). The simple copy elimination of Chapter 5 was a
step in the right direction, but we must replace it with something more powerful.
6.1.2 A solution
Example 6.1 shows that the single assignment property of proc is an obstacle to
efficiency that must be overcome. The crux of the problem is that the asymptotic
storage performance of a proc program with no reductions and where simple copy
elimination does not apply cannot be better than the time performance of the un-
derlying algorithm. For instance, a 0(n2) sort algorithm coded in proc needs 0(n2)
storage. The example above shares this malady.
On the other hand, if multiple assignments to the same array element were al-
lowed, we could rewrite Example 6.1, hence many other procedures, to compute
the same function as before rewriting, but using less storage, 0(n) in the example.
This observation is the basis for the approach presented here for improving storage
efficiency of proc programs. We proceed in outline as follows:
o+ We lift the single assignment restriction of proc by extending proc to a very
similar language, proc2, which allows two or more subarrays to share the same
storage. An explicit ass?nment order is necessary to make proc2 useful.
o+ llence the goal is to transform a proc procedure f to a proc2 procedure f' that
is extensiona11? equivalent to f (we say f =--H f'). That is, the call f(a1,... , a?; s)
terminates if and only if f'(a1,. .. , a?; s') terminates, and s=s',
o+ The assignment order for proc2 admits a succinct sharing condition that de-
scribes when two arrays in f can share storage without changing the result
computed by f.
155
o+ A test of the body syntax of f conservatively indicates if the sharing condition
will certainly be met for a given array variable a and subarray b. llence we can
decide in many useful cases at compile time that a and b can share storage.
o+ The sharing condition also leads to an algorithm to deduce a small set of pairs
as above that are likely pwspects for storage sharing. On finding a pair
that passes the syntactic test for the sharing condition, f in proc or proc2
can be transformed to f' E proc2, f' = f, where f' has storage performance
generally better than f.
6.2 proc2 sharing syntax
We define proc2 procedures to be transformations from proc procedures:
(sharek f)
proc2 = L)
k>O
fEproc
where, roughly speaking, (share f) is the set of procedures where two variables in f
share storage.
More precisely, consider all pairs [bIa] where a is an array variable in f and b is
a subarray in f of the same dimension as a. Then share(f) contains all programs f'
constructed from f by replacing a with b and f with f' throughout f. Further, if a
happens to be a let--Hdefined variable1 then replace
let a ?
with simply			let ?
This shows that the let no longer allocates a fresh array, but redefines elements of b
instead. We call this a sharing transformation and write it
= f [b/a]
Example 6.1 provides a useful illustration of sharing transformations. First, let g' =
9 [x/rj to obtain the proc2 program
1We assume without loss of generality that let-defined variables are distinct.
156
V x[i]=x[iJ
g'(x;x) ?ffn=0then ?EOm
else x[Oj = x[Oj A let t ? iE(k--H1 t[i] = 1 + x[i + 1]
in let s ? g'(t; s)
in V x[i + lJ = 1 + s[i]
iEO:n--H1
(6.1)
When a procedure's return array shares storage with a formal parameter like this,
we say the procedure operates in place. When e.g. g' operates in place, a procedure
call g'(t; s) where s ? t writes its result in array t and implicitly copies it to s. This
behavior is also consistent with sharing of the return array with a proper subarray of
an parameter, e.g. if procedure p has a parameter x : m x n and return array ? :
sharing such as p [x[O, O:nJ/rj is appropriate, and the call p(t; s) computes the result
into t[o, 0:n], then copies this subarray to s.2
For code generated by the svexp compiler, calls to in--Hplace procedures are always
opportunities to share the call's result array with the respective argument subarray
by transformations. In this case, let g" = g' [t/s] to obtain
g"(x;x) ?ffn=0then jEVon x[iJ=x[i]
else x[O] = x[O] A let t ?
V
iEO:n--H1
in let ?11(t;t)
in iEoVm?l x[i+1]=1+t[i]
t[i] =1+x[i+1j
(6.2)
For calls of the hypothetical p above, p(t; s), the transformation is [t[0, O:n]/s].
With these, the implicit copies described above are avoided.3 Finally, let g"' =
g" [x[1:n --H 1]/t] so we have
2However, consider the inverse situation: procedure q has an n-vector parameter
and an m x n result. Then it is desirable to share a parameter with a portion of the
return array, i.e. q' = q [r[O, O:nJ/xJ. A call q'(t; s) implicitly copies the argument t
to s[O, O:nJ, then computes the result in 8.
3However, consider the similar transformation [s[O, 0:n]/t] applied to any pro-
cedure calling q' as in the previous footnote. Since such calls are invariably of the
form let t ? ? let 8 ? q'(t;.)?...., we find the sharing transformations defined
if n = 0 then
157
V x[iJ=xki
iEO:n
elsex[0J =x[0]Alet iE($n--H1 x[i+1] =1+x[i+1]
in let 9m(x[1:n --H 1];x[1:n --H lj)
in V xk+1]=1+xk+1]
iEO:n--H1
(6.3)
The underlined expressions are simplifications of x[1:n --H 1J[i]. Simplifying further by
the elimination of "dead code", we obtain
g"'(x; x) ? if n = 0 then ?do nothzngl
else let iEoV:n?l x[i + 1] = 1 + x[i + 1]
in let g"'(x[1 : n --H 1]; x[1 : n --H 1])
in iEoV:n?l xk+1J=1+x[?+1J
(6.4)
What do these transformations achieve? In the interpretation of proc2 given
below, this procedure computes the same result as the original, but the answer ap-
pears in the storage originally provided for the argument. Rather than the 0(n2)
storage needed by the original procedure, this one uses only 0(n). We contnved this
example to show that the strong link between run time and storage requirement for
proc discussed in Section 6.1.2 has been broken. Indeed, the transformed program
still has the 0(n2) run time of the original. However, there exist many other codes
whose run times are improved by sharing transformations because sharing reduces
overhead due to copying.
here to be inadequate. In the example case we obtain let ? [s[0, O:nj/t] in let s ?
q'(s[O, O:n];.)?..... Thus the outer let assigns to array .9 outside the let that de-
fines s. For a remedy we could extend sharing transformations to shift the allocation
of s to the outer let as necessary, i.e. let s ? ? [s[O, O:n]/t] in let q'(s[O, O:nj; .......
This kind of sharing is important and useful in practice. For simplicity, however, we
refrain from carrying the extended transformation through succeeding discussions.
158
6.3 Assignment ordering
The interpretation of proc2 programs is identical to proc in both declarative and
imperative senses, except for one serious complication. The single assignment prop-
erty of proc leaves no doubt as to the value of each array element. We need only
identify the single assignment that defines the element's value an "=?? construct,
procedure call, or the implicit definition of formal parameters. This imposes a partial
evaluation order on proc terms--Han array element must be assigned before its value
is used; a conditional predicate's value must be known before the value established by
the branches. Such implicit orderings are conventionally called data flow constraints.
Conversely, as e.g. (6.4) shows, proc2 allows multiple assignments to the same array
element. We must determine which assigned value is "seen" by each reference.
Example 6.2 Consider the proc procedure
h(x;r) ? let t ?			V			tk] = x[i] +1
iEO:d0 (x)
in			V rki = t[iJ
iEO:d0(x)
An evaluation order for array elements satisfying data flow constraints, (after the
implicit initial assignment of values to x when h is called) is
t[O] = x[O] t 1, r[OJ = t[o], t[1] = x[1] t 1, r[1] =
simply because the value of each right hand side of = is defined. Any data flow
order leads h to return the same answer because every use of an array element value
corresponds to a single assignment.
llowever, consider h' = h [t/x], i.e.
h'(x;r) ?let V x[i] =xfij+1
iEO:d0 (x)
in			V r[i] = x[iJ
iEO:d0(x)
159
Here the assignments corresponding to the above are
x[OJ = x[Oj + 1, r[OJ = x[O], x[1] = x[1J + 1, r[1J =
and this order yields the same result as the original procedure. However, datafiow
alone permits other orders that do not yield the same result. For instance the alter-
native order
r[O] = x[OJ, x[O] = x[O] + 1, r[1J = x[1], x[1] = x[lJ + 1, ...
has all right hand sides defined, but yields a different result array r. L]
Thus data flow is not sufficient to order assignments in proc2 programs as it is for
proc.
We eschew a deeply formal treatment of this difficulty, which involves complete
operational semantics for proc2, e.g. rewrite rules for terms or a machine model.
From these we could ascertain which assignment determines the value of each array
element as evaluation proceeds. Instead, we specify assignment order directly in
terms of proc2 syntax as a fait accompli. It is a tedious but straightforward exercise
to show this order is representative of many underlying operational models.
A few terms, definitions, and conventions are required. We write a[? for a scalar
element of a; ? stands for an appropriate sequence of natural numbers, and 1(a) is
the set of all such sequences. Thus ?a[?] I ? ? 1(a)) is the set of all scalar values
in a. Further, ?, p, ? continue to denote fragments of proc (and now proc2) so that
[?a[p] denotes some subarray of a. For a generic subarray we write [.]a[.]. Recall the
leading brackets are for a permutation part, which is omitted when unnecessary. A
target reference to array variable a is a subarray [.]a[.], on the left hand side of =
or serving as the return value array of a procedure call or as a formal parameter
in the header of a procedure definition. We assume target references have distinct
labels and write a superscript over the array variable, e.g. ak, to distinguish one from
160
another. References maintain their labels through sharing transformations. E.g.
akki transformed by [[?t[?]/a] becomes [?lk[?][zj. Formal parameters all have the
label 0 by convention. Target references assign array element values. For convenience,
formal parameters in the procedure header stand for the assignments to arguments
performed by the procedure caller. Each element assignment is considered to be an
entity called a definition or def, written ak[o?] to denote the def of a[o?] at target
reference with label k. Symmetrically, all subarrays that are not target references
are source references, including the result array in the procedure header, which has
the label ? by convention. Source references use element values defined by target
references. The result array in the procedure header stands for all uses of the result
by the caller. Each use is considered to be an entity, written a [?] to denote the use
of the value a[?] at the source reference labeled ?.
We can now characterize the sets of defs and uses that can possibly arise from a
proc subterm.
Example 6.3 Consider the proc (or proc2) subterm
The defs arising here are
The uses are
E]
iEV3m tk[i] =a?[i]+am[i?3]
a?[n +2], am[O],. . . , am[n --H 1]
Figures 6.1 and 6.2 more formally describe defs and uses of general proc2 subterms.
The functions defs and uses each transform a subterm to an expression whose con-
ventional interpretation is the desired set. We use (defs ?) interchangeably for the
expression and (when ? has no free variables) the set it denotes. Moreover, the
161
deft(f(a1,...,ar) ?? = defs(?u a47?j??I(a?),1<z'?<n
defs(iY6:e? = defs(?)
defs(let ? in ?) = defs(?) u defs(?)
defs(if ? then ? else ??) = defs(?) u defs(?) u defs(?3)
defs(+??) = defs(?)udefs(?)
defs(A??) = defs(?)udefs(?)
defs(?? = U defs(?)
iEb:e			iEb:e
defs([.]a[.]) =
defs([?ak[p] = ?) =
defs(g(t1,...,tn;sk)) =
Figure 6.1: Defs for proc2 subterms.
(scalar source reference)
udefs(?)
162
uses(f(a1,.. ,a?;r) ? ?) = uses(?) u fr7i I a ?
uses			= U uses(?
?Eb:e			iEb:e
uses(let ?1 ?fl ?2) = uses(?) u uses(?2)
uses(if ?i then ?2 else ??) = uses(?1) u uses(?2) u uses(?3)
uses(+?1?2) = uses(?1) u uses(?2)
uses(A??) = uses(?)Uuses(?)
uses ? = U uses(?)
iEb:e			iEb:e
uses([?ae[p1) =
uses([?a[pJ = ?) = uses(?)
uses (g??,,..,1??fl;s)) =			t?[a] a ? iye.i) 1< i < n
Figure 6.2: Uses for proc subterms.
(scalar source reference)
163
sets are presumed closed for subarray equivalence, e.g. if ak[2] is in (defs?), then
so are a[1:d0(a) --H 1][1J and ak[2:do(a) --H 2][o]. If bk[2, 1] is in (defs?), then so is
[1, o]bk[1, 2]. The equations take the union of defs and uses for the branches of con-
ditionals, so defs and uses represent a superset of the array elements actually defined
and used when the program runs. Similarly, uses includes all elements of procedure
call arguments when, in fact, the called procedure may ignore some or all.
Defs and uses also express the actions of the procedure caller. The first equation in
Figure 6.1 gives the implicit defs of formal parameters by the caller. The first equation
in Figure 6.2 gives the implicit uses of the procedure result. For succinctness, we
often write e.g. (uses!) instead of uses(f(a1,...,a?;r) ? ?). When we discuss inter-
procedural sharing in Section 6.8, we will write e.g. usesf(t1,...,t?;s) to denote
the set of uses (and respectively for defs) that arise from f when the parameters
..... . , a?,r are bound to arguments ..... . , t?,5. It is important to realize, however,
that def and use sets do not depend on the values of these arguments, but only on
their dimensions.
To specify which of multiple defs is "seen" by a given use, we next impose a
partial order ?? (omitting the subscript when it isn't important) on the defs and uses
of f. Figure 6.3 gives a function 13 that transforms a function definition f without
procedure calls to a set expression for the <?f relation. It uses the abbreviation
(ud?) for (defs?) u (uses?). Further, if x and Y are sets containing defs and uses,
then X ? Y stands for fx y x ? X,y ?
Intuitively, ? characterizes precedence of uses and defs during evaluation. Equa-
tion (6.5) closes the relation for the body and puts implicit defs of formal parameters
before uses and defs in the procedure body, which in turn are before implicit uses
of the procedure result. Equations (6.6) and (6.10) gather precedence pairs due to
loop iterations. They impose no order between different iterations. Equation (6.7)
164
B(f(ai,...,a?;r) ? = (B(? u (faffi7? ? ? I(a?),1 <i < nJ ?
u (ud(?)  fr->?[? ? ? I(r)1))* (6.5)
u B(?)			(6.6)
iEb:e
 ud(?)) U L?(?) u B(?)			(6.7)
?(let ? in ?) =
B(if ? then ? else ?) = (ud(?) <? ztd(?, ?))
U B(?) U 13(?) u 13(?)
= B(?)uB(?)			(*is+orA)
?			=			u ????
iEO:e			iEO:e
= ?) =
(scalar source reference)
?
Figure 6.3: Intra-procedural use and def ordering for proc2.
(6.8)
(6.9)
(6.10)
(6.11)
(6.12)
165
puts all uses and defs in the head of a let before those in the body. Equation (6.8)
puts all uses and defs in the condition before those in the branches. Equation (6.9)
accumulates pairs from its arguments. It imposes no order on argument evaluation.
Equation (6.11) shows that source references impose no order. Equation (6.12) puts
all uses on the right hand side of assignments before the def on the left.
Similarly, if v1, v2 ? (ud f), we say vi ?f v2 if v1 and v2 arise in opposite branches
of the same conditional.
We now state a condition sufficient to give useful semantics to a proc2 proce-
dure f.
Definition 6.4 The def a4-k[c] defines the value of the use a%\ in f if (and onlg if)
both the following hold:
(a) Definition occurs before use, i. e. ak [ot] ?
(b) Other defs of a[ot] are not interposed between the def and use of interest, i.e.
for aUa?m[ot] ? (defsf), m $ k, we have
4-			4-
am[ot]ak[ot] or am[ot]*ak[ot] or a?[o]?am[ot] or aC[ot]*am[ot]
When this is true, we write
fflj?T?t&j
Thus for Example 6.2, (6.7) implies that each assignment of form x[i] = x[i] + 1
defines the value of the use in r[i] = x[i]. From (6.5), the implicit def of formal
parameters defines the use of x[i] = x[i] + 1 but not r[i] = x[i], so the problematic
evaluation order
r[O] = x[O], x[O] = x[OJ + 1, r[1] = x[1], x[1] = x[1J ......
no longer has right hand sides defined.
166
6.4 Extensional Equivalence
The next step is to distinguish correct sharing transformations from incorrect ones.
Definition 6.5 Let f be a proc2 procedure f(a1,... , a?; r) ? ?, and let ft = f [b/a].
We say f' is extensionally equivalent to f (i. e. f' = f) iffor all arguments ..... . , tfl
of suitable dimensions, the procedure call f(t1,... , t?; s) terminates if and only if
t?; s') does, and on termination 5 = 5'. Furthermore, if f' =--H f, then
f [b/a] is correct.
Note f' may change the value of an argument array that f does not, as in (6.4).
Unsurprisingly, a sharing transformation is correct if it preserves the definitions
seen by all uses.
Theorem 6.6 (Sharing correctness) Let f' = f [b/a]. Then f' = f if the defs
of a and b elements that define values of uses in f define the values of the same uses
in f', i.e.
VQk,?
VQk,?.b4\?ffi?
and
implies 41L' %j (6.13)
implies b<m[? % ffi%J
Proof. The proof is by elementary structural induction over the code of f and f'. El
6.5 The Sharing Condition
We can now write a sufficient condition for correct sharing transformations.
Theorem 6.7 We have f = f [b/a] if all the defs and uses of each element of a are
complete before any defs of the corresponding element of b; i. e.
E (udf),b<m[? E (defsf) . (a%\ <?f b%\ or a%\ *j b<-?[?])			(6.14)
where a%\ is any use or def from (udf). Alternatively, f = f [bla] if all defs and
uses of each element of b are complete before any defs of the corresponding element
167
of a:
vb4\?(u?n,a%\?(?efs?).(?4\1a<m?ajor?<Th>toi*ja+Utai?			(6.15)
Proof. We will show only that (6.14) implies f = f [bla]; (6.15) is symmetric. It is
sufficient to show (6.14) implies (6.13), where f' = f [b/a].
Consider the first clause of the conjunction in (6.13). Part (a) of Definition 6.4
requires that
<? fflaj when a% ?
This follows from an easily verified property of the definition of :
?(f [b/a]) = ?(f) [b/a]			(6.16)
Part (b) of Definition 6.4 requires that if for all a4-m[a] c (defs f), m $ k,
4-			4-
am [a] ?? ak[a] or am[a] *? ak[a] or a' [a] <?f am [a] or aw] *j am [a]
then it is also true that for all b4-m[a] ? (defs ft), m $ k,
4-			4-			4-			4-
bm[a] ?i bk[a] or bm[a] ?f, bk[a] or b'[a] ??, bm[a] or b'[a] *?` bm[a]
(6.17)
Consider that b4-m[a] c (defs f') fall naturally into two classes:
1. Those where a4-m[a] ? (defs f), i.e. those defs of b's elements in f' corresponding
to defs of a's elements in f. Here, (6.17) follows by (6.16).
2. Those where b4-m[a] ? (defs f), which are defs of b's elements in f' that are also
in f. Here (6.14) implies
Vb%\ ? (udf),b%j ? (defsf') . (b%\ ?? b%\ or b4\ *ji b4\) (6.18)
again by (6.16), which yields the desired result because it implies
&a]?bTha]
168
The second clause of the conjunction is similar to the first. Part (a) of Defini-
tion 6.4 requires
b?[?] when
which is immediate.
Part (b) of Definition 6.4 requires that if for all b? ? (defs f), m $ k,
#1 ? bffi? or 4] *f ffiaj or bffl] f yl or 4? *1yi
then it is also true that for all b?7j ? (defs f'), m $ k,
bm[Q] ?f1 bk[? or b??] *fl bk[c] or b'[? fl b?[? or b'[? *?1 b?[?
The same two classes of defs b? E (defs f') are important here. For 1. above,
(6.18) implies be[?] *?? b?; for 2. the result follows from (6.16). c
We need a stronger version of this theorem for the general case of a proc2 pro-
cedure f(fl) obtained by n sharing transformations of a proc procedure f(O). If x is
some variable occurring in f(fl) (and therefore f(O)), let ancestors(x) be the set of
variables in j(O) sharing storage with x in f(n) (including x). Further let labeTh(y, f)
be the set of reference labels associated with variable y in f.
Corollary 6.8 Let f(O) be a proc procedure and for any n > 0, let f(i+l) =
f(i) [bj/a?] for all 0 ? i < n, where each of these sharing transformations is cor-
rect by recursive application of this corollary. Finally let f(n+l) f(fl) [b/a]. Then
f(n+l) = f(n) if for all variable pairs (x, y) such that x ? ancestors(a) and y ?
ancestors(b) we have
Vk ? labels(x,f(0)), ? labels(y,f(O))
va%\??u??),?4\e??e?s??. (a%\?b%\ora<m[o?]*b4\)
or else
Vk ? labels(x, f(O)), ? ? labeTh(y, f(0))
Vb? ? (udf),a%\ ? (defsf) . (4? ? a+V[? or 4Qj *
(6.19)
(6.20)
169
Proof. The proof is similar to Theorem 6.7 with the additional observation that if
#4 #4
a[?] a [a], then k and C must label the same variable in f(O). Moreover, if
b<m'[a] ??(n) b [a] then k' and ? again label the same variable in f(O), but that variable
is different from the one labeled k and ?. L)
This corollary removes constraints on the order of successive sharing transformations
imposed by Theorem 6.7. For instance, (6.3) is correct by Corollary 6.8, but not by
Theorem 6.7. The problem in the latter case is that x is defined by both the first and
last operations (with respect to ?) of the procedure body. Hence Theorem 6.7 is
inherently powerless to prove correctness of sharing involving x. On the other hand,
Corollary 6.8 exploits that fact the all uses and defs of t are "between" (again with
respect to ?) all the uses and defs arising from x in 9 and all those arising from r
in 9. Thus defs of t cannot interfere with any values seen by uses of x and vice versa.
6.6 A Syntactic Sharing Condition
Corollary 6.8 leads directly to an algorithm. Let S be a predicate on a pair of
syntactic references with labels k and ? such that
s(ak,b$f) implies va%\ ? (udf),b%\ ? (defsf) . (a4\ ?			or a$\ ? b4\)
That is, S employs the syntax of a pair of references to establish, insofar as those
two references are concerned, that sharing storage for a and b is correct. To fully
establish correctness, the algorithm of Figure 6.4 considers all pairs of such references.
It returns correct if the sharing transformation f [bIa] is correct, and fail otherwise.
It remains to develop a useful S. Since sharing is advantageous, it should be as
strong (true on as many arguments) as possible.
Theorem 6.9 (Disjoint ranges) Consider the sets
Ik?taIa#4[a]?ud(f)y and			uses(f)J
170
Algorithm: Correctness of sharing transformation f [bIa]
Inputs: proc2 procedure f, array variable a, and subarray b in f.
Output: One of icorrect, fail 1
Method: If for all variable pairs x, y such that
x ? ancestors(a) and y ? ancestors(b) we have
V k ? labels(x),  ? labels(y) . S(ak, be, f) or
V k E labels(x), ? ? labels(y) . S(be, ak, f),
then return correct.
Else return faiL
Figure 6.4: Algorithm for sharing correctness.
Jf1kfl1e??, then S(ak, be, f).
Proof. In this case the quantifier in the definition of S is vacuously true. E]
We can determine when this theorem holds by examining the syntax of references k
and C and the iteration ranges of loops whose variables occur in the references. This
often involves submatrix algebra. For instance, suppose
and f contains references
be --H [1,0Jte[1 : n,0 : nJ
t?[i0,l] and ak[i1,i0]
where e' is a target reference. We can rewrite ? as
([1, 0]t?[1:n, 0:n])[0, i0]
so we have S(ak, be, f) as long as the iteration range of i1 does not include zero. The
simplest cases of the theorem are summarized in the following.
Corollary 6.10 (Irrelevant references) Suppose k ande' are reference labels in f,
and a and b are subarrays of array variables in f. Then if reference k is to an array
171
variable other than that of a or if reference  is to an array variable other than that
of b, or if ? labels a source reference, then S(ak, be, f).
The other cases arise from the definition of Figure 6.3. In particular, we seek
theorems of the following flavor,
Theorem 6.11 (Let) Suppose the body off is of form
?let?inp
If k is a reference label in ? and  a target reference label in p and ? contains no
loops with variables occurring in these references, then S(ak, be, f).
Proof. From (6.7), if the references involve no variables of surrounding loops, then
all uses and defs of the let header are related by <? to all defs of the body. ?
Likewise, from (6.12) we obtain the following due to the order imposed by
Theorem 6.12 (Assignment) Suppose the body of f is of form
where ? = ?. Then S(ak, be, f).
Proof. Follows from (6.12). ?
Applying this theorem may again involve elementary submatrix algebra. For in-
stance, suppose be --H [1 oJte[l : n, 0 n] and f is of the form
..t?[io,ii + 1] = ak[ii,ioj
This is the same as
([1,o]t?[1:n,o:n])[ii,io] = akki,ioJ
so we have S(ak, be, f).
The definition of * leads us similarly to the following.
172
Inputs: proc2 procedure f, subarrays a and b, a reference labeled k, target reference
labeled ?
Output: A value suitable for use as S(ak, be, f).
to determine if the conditions of Theorem 6.9,
are met. If so, return true. Otherwise, return
Method: Test the body syntax of f
6.11, 6.12, 6.13, 6.14, or 6.15
false.
Figure 6.5: Algoritlim for S(ak, be, f).
Theorem 6.13 (Conditional) Suppose the body off is of form
?ff?thenp?elsep?
If k and  label a reference and target reference respectively, where (a) k occurs in PT
and ? in PF or vice versa, or else (b) k occurs in ? and C occurs in either PT or PF,
and neither reference involves variables of loops in ?, then S(ak, be, f).
Proof. If references k and e, do not depend on surrounding loops, then all pairs of
uses and defs arising from them, one from each, are related by * in case (a) and <?
in case (b) by (6.8). E]
Another fundamental case is this.
Theorem 6.14 (Parameter--Hresult) For all reference labels k and subarrays a
and b, we have S(a0, bk) and S(ak, b0).
Proof. Follows from (6.5). ?
The algorithm of Figure 6.5 employs Theorems 6.9, 6.11, 6.12, 6.13, and 6.14 (also
6.15 given later) to compute a suitable 5.
173
6.7 Strengthening ?
The definition of  given in Figure 6.3 is consistent with parallel evaluation of e.g.
the left and right hand side of A. In general, we obtain more opportunities for sharing
if the proc2 machine operates sequentially. For instance, the svexp fragment
Join 0 (Extr00 0 Extr01 Ox)
(Extr01 Ox)
might compile to (with x : n and r :
r[Oj = x[1] A ViEl:n--H1 r[i] = x[i]
(6.21)
Sharing of r and x is not correct by the definition of  in Figure 6.3 and resulting
definition of S. The right hand side may define element 1 of the shared vector before
the left side uses it. However, if we strengthen ? so that uses and defs of the left
hand side of A are before (with respect to ?) the right hand side, then sharing is
correct. Similar opportunities for sharing are exposed by ordering the iterations of +,
V, and ?.
There is in general no a priori reason to prefer one order over another in these
cases. For instance, a symmetric variant of (6.21) admits sharing only if A evaluates
its right operand first. In outline, then, a useful extension of the algorithm presented
here determines evaluation order as a side-effect of applying S. That is, we modify s
to return both its predicate value and a caveat that imposes an order of operand
evaluation on an occurrence of +, A, V, or ?. Caveats are incorporated in the final
compiled code, which is necessarily an extension of proc2 with sequential versions
of these operations.
Of course, mutually contradictory caveats can arise. Consider an alternative
vector reverse code that swaps the end-most elements and calls itself recursively to
swap the rest.
174
This compiles to
rev x = if (Num0f 0 x) < 1 then x
else Join 0 (Extr01 Ox)
Join 0 (rev Extr01 0 Extr11 Ox)
(Extr00 0 x)
rev(x;r) ? if n ? 1 then jEYn r[i] = x[ij
else r[O] = x[n --H 1] A rev(x[1:n --H 2]; r[1:n --H 2]) A r[n --H 1] = x[O]
where x : n. There is no order of A operand evaluation that will satisfy the caveats
resulting from r[O] = x[n --H 1] and r[n --H 1] = x[O] to allow sharing of x and r.
However, observe that sharing of these vectors is possible at the cost of only a scalar
copy. That is, the compiler can eliminate the use of e.g. x[O] in the last A operand
by copying x[O] to a "temporary:"
rev(x;r) ? ifu < 1 then jY0n r[i] = x[iJ
else let t ? t = x[O]
inr[O] =x[n--H1]Arev(x[1:n--H2];r[1:n--H2])Ar[n--H1]=t
Now a caveat that r[O] = x[n --H 1] be evaluated before r[n --H 1] = t is sufficient for
correctness of the following sharing transformation:
rev(x;x) ? if n < 1 then ?do nothzngj
else let t ? t = x[Oj
in x[O] = x[n--H 1] A rev(x[1:n--H 2];x[1:n --H 2]) Ax[n--H 1] = t
If the compiler notes that the recursive call to rev can be executed last to make the
procedure tail-recursive, then this code is as good as any we can write.
6.8 Interprocedural Sharing Analysis
Recall that in Figure 6.3 we omitted the definition of ? for uses and defs arising
respectively from the arguments and result of a procedure call. Thus the discussion
175
above is restricted defacto to intra-proceduralsharing: if a and b appear anywhere
in f as argument and result of the same procedure call, we are not yet equipped to
decide if f[b/a] is correct.
Fortunately, there is nothing in the sharing condition and associated algorithms
that impairs them for inter-procedural sharing. We need only specify ? for procedure
calls. Trivially, we could supplement Figure 6.3 with
13(g(tkil,...,tknn;sC)) =			(6.22)
but this says only that the sharing transformations involving any of the tj and s
are incorrect. As in Section 6.7, we want to make ? as strong as possible to allow
sharing as `often as possible.
6.8.1 The nonrecursive case
What canbe said of the order of uses and defs arising from a procedure call? Begin
with the the non-recursive case. Since proc2 arguments are passed by reference, the
arguments tj and result s of a call as in (6.22) are identified with parameters a? and
return array r respectively in the body of g. Thus we can usefully say (and given
operational semantics for proc2 we would prove) that
-4
tkii[at] ??
whenever all uses of a?[ot] are before all defs of r[c?] in g, i.e.
Va?[ot] ? usesg(ti,...,tn;s),??[os] ? defsg(ti,. .
a?? [ot] ?g T?[os] or a?? [0t] *g retos]
The appropriate supplement to Figure 6.3 is thus
13(9(1k1i,.
,tnkn;s)) =
f1? [0ti <?j
Ms]
176
vaTh?[?t] ? usesg(ti,...			? defsg(ti,... ,t?;s)
,tn;s),r?[Qs]
atk.i[?t] 9 re[os] or atk.i[oLt] *9 rC[Qs]1			(6.23)
In essence this says that an argument array of a procedure call in f is sharable with
the corresponding return value array in the same manner that the corresponding
parameter in 9 is sharable with 9's result array. This provides a useful extension
for 5.
Theorem 6.15 (Argument--Hreturn) Suppose the body off is of the form
?g(... ,ak[?a],. ..
with g defined as g(... ..... . r) ? ., where parameter x corresponds to argument a.
Further suppose there exists a subarray functional [?] such that
a[?] [? = a[?]			(6.24)
Then S(ak, be, f) if Vk,?. S(xk[?],r',9). Similarly, if there exists a subarray func-
tional [p] such that b[?][p] = b[?a], then S(ak,b?,f) ifVk,?. S(xk,r'[p],g).
Proof. If Vk,C. S(xk[?],r',9), then by (6.23), S(ak[?][?J,b?[?],f) and from (6.24),
S(ak[?], b?Kb], f). Since this relates all use--Hdef pairs a%\, b%\ in ud(f), we also
have S(ak, be, f) as desired by the definition of 5. The other case is symmetric. El
6.9 The Recursive Case
It is fortuitous that equation (6.23) and Theorem 6.15, which we obtained by con-
sidering non-recursive calls, remain useful in the recursive case, where 9 = f. Here
we have procedure f defined by f(a1,... a?; r) ? . containing calls of the form
f(t1,... ,t?; s). Examining Equation (6.23) and Theorem 6.15, we find they remain
meaningful except when for some i, tj is a subarray of a? and also 5 is a subarray
of r, which makes them circular.
177
The limited nature of this exception is important because the proc compiler,
sans the simple copy elimination optimization of Section 5.5.1, never returns code
where this occurs. That is, every procedure call is of the form
let t1 ? ? A ... A tTL ?
in let s ? f(t1,...
in ...
where the tjand sdo not appear elsewhere in the procedure. Thus, the final algorithm
orders transformations so that sharing of let-defined argument and result arrays like
the tj and s is performed before sharing of these arrays with parameters and return
arrays. In particular, if f can be transformed to operate in place as in (6.1), we can
immediately transform all recursive calls to f so that the respective argument and
result arrays share storage as in (6.2).
6.10 A Sharing Optimization Algoritlim
Where do we stand? The algorithm of Figure 6.4 conservatively indicates when the
sharing transformation f[b/a] is correct. Furthermore, we have an algorithm to com-
pute a usefully strong S, the required subroutine. A complete sharing optimization
algorithm need only compose correct sharing transformations. Let a sharingprospect
for f be a pair [bla] where a is an array variable and b is a subarray of the same
dimensions involving an array variable in f. Suppose we have an algorithm P that
constructs a set of sharing prospects Pf from the syntax of f. Then the function
optimizein Figure 6.6 is an algorithm that returns the set of all correct compositions
of sharing transformations induced by P.
6.11 Deducing prospects
It remains to give an algorithm P for deducing a set Pj of useful sharing prospects
from the syntax of a proc2 procedure f. Here "useful" means "likely to be correct."
178
Inputs: proc2 procedure f and sharing prospect algorithm P
Output: Set of all sharing optimizations of f possible with P.
Method: Return opt?mize(f, P) defined as follows:
optim?ze(f, P) =			= ?)			ff1,			U optim?ze(f [b/a])
[b/ajEC
where C = f [b/a] E Pf Fig. 6.4 returns correct on f, a, bi
Figure 6.6: Algorithm returning all optimizations of f given P.
We restrict the search to in--Hplace prospects, those that could result in sharing of
a source array and the target array of some = or some procedure call in f. More
precisely, we seek prospects of the form [[.]t[]/s] or [[.js[J/tJ such that s and t appear
somewhere at least once on the right and left side, respectively, of the same =, or
in the argument and result, respectively, of the same procedure call. While it is not
clear that this is the only way to proceed, there do not appear to be useful classes of
prospects that are missed by this approach.
6.11.1 Memory allocation
In deciding whether a prospect is useful, we must first decide if it is feasible. The
footnote on page 156 discusses one case where a prospect that is otherwise desirable
may be ruled out by concerns other than the ordering of uses and defs. Another
example is the where the transformation f [b/aJ is correct by use and def order, but
there is no a priori relationship between the sizes of the arrays involved in a and b.
For example, consider
sum(x,y) Join 0 [0]
Proj 7 0 (( Typ 0 7 x) + (Typ 0 7 y))
179
For x : m and y: 11, the inner vectors of the result are of length min fm, ?1 t 1, the
vector sum over the common elements of the inner vectors of the arguments, each
with a zero prepended. The compiler returns
f(x,y;r) ?r[O]=OA			r[it1]=x?]ty[ij
(6.25)
Here, as an implementation choice, we can share either x or y with a subarray of r
as in the footnote 011 page 156, or we can arrange for one of the parameters to be
allocated with enough space to receive the result (i.e. one extra element), or we can
give up on sharing altogether. We abstract such details of the way allocation affects
sharing: the predicate feasible(a, b, f), is true only if a can possibly share storage
with b given the memory allocation scheme in effect.
6.11.2 Prospects due to --H
Suppose f is of the form
tk[p] --H			(6.26)
Then subarrays [?]t[?] such that s(se, [?]tk[?]) by Theorem 6.12 (and symmetri-
cally [?3]$[?4] such that s([?]se[?4], tk)) are likely to yield the only correct sharing
transformations involving variables s and 1. Intuitively, this is because the theorem
applies only if each assignment to a target element uses only the source element at
the same index position to compute the assigned value. If the array and subarray
elements do not match in this manner, S is false and the sharing algorithm returns
fail. The sole exception is when (disregarding the symmetric case until we give the
final algorithm) the set
I sY\ c uses(f)J fl fc			? defs(f)J
is provably empty, whereupon Theorem 6.9 applies. This says that the assignments
to [?JtK2] miss the uses of s completely due to disjoint iteration ranges.
180
We are left with the problem of deducing the desired prospects from particular
assignments. From the discussion thus far, we see that the Disjoint Iteration Theorem
(Theorem 6.9) provides no help. However, Theorem 6.12 allows us to deduce, by
simple submatrix algebra, exactly the prospects [[?Jt[?]/sJ such that 5(8C1 [?]tk[?])
by the theorem. Consider the case where s and tin (6.26) are vectors, so p and ?
are each a simple indexer. If one contains a loop variable, then the other must have
the same for the theorem to apply. In particular, suppose we have
?k[T+i] = $t[stiJ
where T and S are the constant parts of the indexers. Then it is easy to see that
we want the prospect [t[T --H S:d0(s)]/s] if T > S and if feasibTh(s,t[T --H S:do(s)])
because
t[T --H S:d0(s)][S + i] = t[O:d0(s)][T + i]
Note that feasible will have to check that t is long enough, i.e. d0(t) > T--HS+d0(s), or
else arrange fort to be allocated with length at least T--H S+d0(s). An example with
T = 4, S = 1, and d0(s) = 5 yielding [t[3:5]/s] is shown in Figure 6.7. We also want
the symmetric prospect [s[? --H T:d0(t)]/tj if 5' > T and if feasible(t, s[S --H T:do(t)])
We can generalize this thinking to 8 and t with arbitrary numbers of dimensions.
Consider a source and target reference of the form
s[Io,. .,In] and t[J0,.. ,Jm]
where the 1k and J?? are arbitrary simple indexers. Intuitively, to find in--Hplace
prospects of the form [[?iJt[?]/s], we must find permutations ? and correspond-
ing indexer lists ? that make 1k compatible with ?k in the same sense as the vector
indexers above for 0 < k ? n. For this we construct the binary relation oc o+,
where p oc q only if
o+ Ip = Sp + i and Jq = Tq + i where Sp and Tq are constants such that Sp < Tq
and i is a loop variable or else
181
proc2 code:
V t[4t?] =s[lti]
iEO:3
t
t[10J
s
S = 1			s[OJ
T=4			3=4-1
t[o]
Figure 6.7: In--Hplace prospect [t[3:5J/s].
o = Sp and Jq = Tq where Sp and Tq are as above.
This makes precise the notion of "compatibility" above. Let z be a 1--H1 map on 0 S
p < n such that p oc z(p). Thus z is a bipartite matching in oc that covers all the
axes of s, and the range of z is a (possibly improper) subset of the axes of t. Let
Let qo, .1.... q? be the range of z as a sequence in ascending natural order. Then
elements of the desired permutation ?1 = .0.... .Pm are given by
z(? if for some ? k =
Pk --H
k otherwise
and the indexers ? = ..... ,Km are given by
Kk = Tz(e) --H Se:de(s) if for some ? k =
otherwise
This are easily verified by submatrix algebra.
The number of bipartite matchings z is liable to be exponential in the number
of matrix axes. However, since real programs rarely deal with matrices of more
than four or five dimensions, this is not an important problem. Moreover, practical
assignments tend to have sparse oc relationships, hence few matchings.
182
6.11.3 Prospects due to non-recursive calls
At a non-recursive call ..... , be,. .. ; ak), where a and b may be subarrays due to
earlier transformations, we wish to find opportunities to share storage for the ar-
rays named in a and b. That is, want to establish which, if any, ?I, ? satisfy
s([?]be[?], a?, f).4 From Theorem 6.15, a sufficient condition for this is that g
operate in place. That is, for any transformation in optimize(g, P) of the form
o+, where x is the formal parameter corresponding to b
above, we can be sure s([?]be[?], ak, f) at a call of 9(i), Hence [[?]b[?2]/a] is an
in--Hplace prospect if a is an array variable. When a is a proper subarray, we solve for
the desired prospect by submatrix algebra much as for = in the previous section.
6.11.4 Prospects due to recursive calls
In applying the reasoning of the previous section to recursive calls it should come as no
surprise that we encounter a circular relationship: we must know that f transforms
to operate in place in order to find prospects so that Figure 6.6 can eventually
transform f to operate in place.
Once again, there is a way to unravel the circularity. The key is to note that if f
terminates, then along one or more conditional paths through the program, the return
array is defined in terms of the parameters by only = and non-recursive procedure
calls (and not recursive calls) assigning values. If the non-recursive path (or paths)
can be transformed to operate in place, i.e. to correctly evaluate the return value
into (a subarray of) one of the parameters, then Theorem 6.15 tells us that at each
recursive call, the return array can share storage with the corresponding argument
(subarray) .5
To exploit this observation, we apply opt?mize in Figure 6.6 to f using only
4If the sharing described in the footnote of page 156 is permissible, then we are
also interested in s(be, K3]ak[?41, f).
5The same non-recursive path(s) provided Theorem 4.13.
183
Inputs: proc2 procedure f.
Outputs: Set Pj of in--Hplace prospects for f.
Method: Let Pf = EUNUR where E is the set of in--Hplace prospects due to =, (Sec-
tion 6.11.2), N is the set of prospects due to non-recursive calls (Section 6.11.3),
and R is the set of prospects due to recursive calls to f (Section 6.11.4).
Figure 6.8: Algorithm P to find in--Hplace prospects.
prospects that have nothing to do with recursive calls. I.e. let P be the algorithm
that returns for procedure f the set of in--Hplace prospects not involving argument or
result arrays of recursive calls. These we already know how to find from occurrences
of = and non-recursive procedure calls as in Section 6.11.2 and 6.11.3 respectively.
Then for any transformation in optzmize(f, P--H), of the form
we can be sure s([?jb'[?2], a?, f) for any recursive call f(i)(... .... . ; ak).
6.11.5 Efficiency
To simplify exposition, the algorithm of Figure 6.6 is written as a brute force enu-
meration; it generates a set of size generally exponential in the size of f. This section
considers how optimize can be implemented with more reasonable efficiency.
First, there is no need to represent the set optimize(f, P) explicitly. It can be
enumerated one element at a time.
Next, it is not necessary to execute P "from scratch" for each new transformation.
Post hoc analysis of algorithm P in Figure 6.8 reveals that P1 [bla] is the same as
wherein all occurrences of a are replaced by b and all resulting pairs with the variable
of b occurring on both sides discarded. Denote this transformation Pf [b/a] (i.e. it is
a theorem that ?f[b/a] = P1 [b/a]).
184
Inputs: proc2 procedure f and sharing prospect algorithm P
Output: Set of all sharing optimizations of f possible with P.
Method: For all permutations U of P7
let f' = f
while U $ ?
remove the first prospect [b/a] from U
if Figure 6.4 returns correct on f', a, b
replace U by U [bla]
replace f' by f' [b/a]
if f' operates in place, i.e. f'(.. ``a.... , [?]a[?]) ? then
for all respective argument--Hresult pairs t, s of recursive calls to f
replace f' by f' [[?]t[?]/s].
for all permutations V of the set of remaining prospects [b/a] in P1
while V $ ?
remove the first prospect [b/a] from V
if Figure 6.4 returns correct on f', a, b
replace V by V [b/a]
replace f' by f' [b/a]
output f'
Figure 6.9: Faster algorithm returning all optimizations of f given P.
Finally, as already noted, it is desirable to consider prospects in a sequence that
"unravels the recursive knot." From the previous discussion, such an order is
o+ Prospects not involving argument or result arrays of recursive procedure calls.
o+ Prospects involving argument--Hresult pairs for the same recursive procedure
call which are certainly correct because the procedure has been transformed to
operate in place.
o+ Other prospects.
Then a less brutish variation of Figure 6.6 is given in Figure 6.9. These are not
formally identical, but we conjecture that the latter misses no useful optimizations
returned by the former.
185
In a practical setting, it remains only to pick the "most efficient" optimization
from those generated by Figure 6.9, but several difficulties remain. For one, choosing
the one with best absolute run time is an undecidable problem. Stated briefly, if we
could decide whether some arbitrarily expensive copy operation is actually evaluated
when a compiled svexp program runs, then we could also decide whether an arbitrary
Turing machine halts. Moreover, static efficiency measures are not helpful: sharing is
a generalization of register allocation problems that are known to be NP-hard [GJ79].
That is, even our improved algorithm may output exponentially many (in the length
of f) optimizations of f, and we will generally have to check exponentially many of
them to find the one with the best static efficiency measure (unless P = NP).
How can we tame this poor worst case behavior? A promising course is to pre-
process Pf, removing all but one prospect due to each = and function call in f. For
example, in (6.25), we choose to share r with either x or y, but not both. This
treatment is in fact an implicit assumption of previous work in this area [01189]
and [B1o89]. While it still admits an exponential number of optimizations in the
general case, we have empirical evidence that for the special case of procedures like
that of Example 6.1, where all let-defined variables can be shared with parameter or
return arrays using the prospects of P1 preprocessed as above, the order in which
prospects are considered is irrelevant.6
Figure 6.10 exploits this observation by considering only a single permutation of
prospects. Moreover, for procedures where total sharing as in Example 6.1 is not
possible, we obtain good results with a greedy heuristic: label each prospect with a
static estimate of payoff for the sharing it represents. Consider prospects (in sets U
and V of Figure 6.10) in descending order of payoff.
With reasonable assumptions, this algorithm runs in polynomial time. If the
6A proof would be extremely unwieldy with the current notations and algorithms.
Better expository and proof methods are a topic of future research.
186
Inputs: proc2 procedure f and sharing prospect algorithm P.
Output: Set of all sharing optimizations of f possible with P.
Method: Let U = P7 and f' = f
while U $ ?
remove the first prospect [b/a] from U
if Figure 6.4 returns correct on f', a, b
replace U by U [b/a]
replace f' by f' [b/a]
if f' operates in place, i.e. ???... , a,... , [?]a[?]) ? then
for all respective argument--Hresult pairs t, s of recursive calls to f
replace f' by f' [[?]t[?j/s].
let V be the set of remaining prospects [bla] in Pf
while V $ ?
remove the first prospect [b/a] from V
if Figure 6.4 returns correct on f', a, b
replace V by V [b/a]
replace f' by f' [b/a]
output f'
Figure 6.10: Final algorithm returning all optimizations of f given P
length of f is n, then it considers 0(n) prospects. The algorithm of Figure 6.4
checks all pairs of references to variables in the prospect, hence its run time is 0(n2).
Assume that types are 0(n) in length. Let C(?) be the cost of symbolic comparison
of two dimension expressions of total length ? to establish S by the Disjoint Itera-
tion Theorem (Theorem 6.9). Useful comparisons are possible in 0(n) time for the
average case and O(n log n) time in the worst case. However, optimal comparison is
NP complete: min and t can simulate boolean and and or in a reduction from SAT.
Thus the cost of comparing two references is dominated by the C(n) comparison cost
in any reasonable implementation. We conclude the total cost of the algorithm is
0(n3). C(n). This time bound is discussed further in Chapter 7.
Chapter 7
Related Work and Conclusions
This chapter describes where the work presented thus far fits amid the work of others.
It draws conclusions related to practical applications. Finally, it gives directions for
future study.
7.1 Related work
The Alex paradigm is related to a broad range of computer language research.
7.1.1 Array semantics
The denotational semantics of arrays receives sparse treatment in the literature. Stoy
[Sto77] mentions them in passing. Gordon [Gor79J gives a prototypical definition ori-
ented toward recursively defined arrays. Though our notion of infinite tuples is com-
mon in logic, there apparently was no need for them in semantic domains until s-vals.
Mou and Hudak [M1188] propose algebraic vector operators for expressing parallel
divide and conquer algorithms, but these are defined only for the one-dimensional
case. Compilation is not discussed.
187
188
7.1.2 Type inference
Our extension of [Mil78] to s-vals with map unification for axis types is strongly
related to work in type systems for so-called object-oriented A-calculi. In partic-
ular, [Wan87] uses record types called rows, which are maps from labels to types.
These superficially resemble our maps from axis numbers to dimensions. Wand also
generalizes unification along the same lines as our map unification. Two essential
differences are that the order of pairs in rows is significant, and he has no "erasing"
value for a field in a record like our a. None the less, Wand's exposition of complete
type inference by reduction to most-general unification is the model for our presen-
tation of axis inference. Unfortunately, Wand makes an error in proving his unifiers
most general [Wan88]; the system is incomplete.
Concurrently, [R89] adopts a similar approach to record typing, including a
a-equivalent value. However, this system omits row variables (our map variables).
Rows consist of a fixed number of pairs, one for each of the set of labels that occurs in
the entire program. The set must be gathered in a pass over the program preceding
type inference.
Finally, [Wan89J expands on Remy's work in two ways. First, he uses it to fix the
error of [Wan87] and achieve a non-standard completeness result: He proves there
are no principal types in his system, but that most-general types can be coded as a
disjunction of type schema. Unfortunately, this disjunction can be of size exponential
in the size ofthe program, even for rather innocent-looking terms, a problem our types
do not share. Second, he reintroduces row variables under the name of extension
variables. This leads to rows that closely resemble maps. He outlines an associated
unification algorithm, different and more complicated than ours, whose correctness
and efficiency are not examined. We conjecture that our technique for map unification
is adaptable to this case.
189
None of this work involves extents or dimension inference in a second pass. Instead
of dimensions, record fields have types of their own, which are inferred in the same
pass with labels.
7.1.3 Compilation
MIMD vector operations. The work of [CBF91J deals with generation of efficient
vector codes for MIMD processors using, as input, functional data flow graphs with
vector tokens and vector operators as nodes. Part of the optimizer is an algorithm
for deducing vector lengths from what is known about primitives. It is aimed at
determining when the loops internal to primitives have matching bounds and non-
interfering access patterns so they can be fused by the compiler into a single loop for
efficiency.
APL. There is a large, old literature on compiling the array language APL that
confronts some of the same issues as we do in compiling svexp. The state of the
art is characterized in [Bud88]. In particular, the compiler discussed there uses a
generalization of dope vectors similar to ours, and the overall goals of code generation
are similar, despite the fact that APL has assignment and is dynamically typed.
Budd points out that APL is a prime candidate for generating good vector code,
and [CX87] bears this out. This bodes well for our eventual ability to generate good
parallel code from svexp.
Conversely, code accumulations could be put to good use in compiling APL prim-
itives. For instance, Budd encounters much the same problem we introduced in Ex-
ample 5.2 while compiling the APL catenation primitive, but adopts a less efficient
solution because his compiler cannot rewrite code it has already generated.
Single assignment languages. Single assignment languages similar in princi-
ple to proc have been proposed as source languages. The most widely known is
190
SISAL [McG85]. The (mostly) functional language Id [Nik88] has similar qualities,
though it was designed specifically for research in datafiow machine architectures.
Compile-time garbage collection. The general problem of copy elimination in
functional languages subsumes the array copying in proc and has been studied since
at least 1977 [Bar77]. Of modern treatments, [JM89] is representative in its use of
abstract interpretation [CC77J for interprocedural analysis. In [JM89] it is used to
predict when list cells can be safely reused as a program runs in order to avoid calling
an allocator and associated garbage collector.
Abstract interpretation is a broadly applicable method based on evaluating the
source program with respect to so-called nonstandard semantics over a finite, par-
tially ordered domain. The domain abstracts values in the standard denotational
semantics. Its properties ensure that the least fixed point of a recursive function
abstraction can be computed exactly and in finite time by iterative approximation.
Thus the "recursive knot" we referred to in several contexts is unraveled in a very
general way. Proofs of correctness for algorithms based on abstract interpretation
are so painless that they are seldom given in the literature.
Abstract interpretation's downfall is run time potentially exponential in the num-
ber of arguments of functions. Moreover, poor run time behavior tends to arise in
practice. The problem of deciding whether a function parameter is certainly not used
in the course of call-by-name evaluation (i.e. strnctness analysis) is inherently simpler
than compile-time garbage collection. Yet it was found too costly to solve with an
abstract interpreter for real programs in [You88]. To be fair, this analysis included
limited treatment of higher order functions; hence, more experiments are needed to
understand average performance for purely first--Horder languages like svexp, but the
outlook is grim.
191
Aggregate updates. We gave a brief history of the aggregate update problem in
Section 1.3 and cited [HB85] and [Blo89]. They concentrate on the comparatively
narrow problem of deciding when Hudak's updt operator can be compiled as a scalar
assignment, which is conceptually similar to compile-time garbage collection, though
the payoff of a successful algorithm is much greater. In contrast, [GH89] treats the
more general problem of in--Hplace function evaluation. Since all these are based on
abstract interpretation, however, they share its potential performance problems. Our
final algorithm of Figure 6.10 appears to subsume the capabilities of [GH89] (when
generalized to handle sets of mutually recursive functions), which makes the otherwise
discouraging O(n3)C(n) performance described in Chapter 6 a step forward.
7.2 Alex
It is appropriate here to describe in broad terms the Alex pictures that are the basis
for svexp.
7.2.1 svexp subterms as pictures
An Alex window on the graphic screen represents a function definition. Boxes in
Alex windows are s-vals. A fresh value is created by "picking up" an s-val box from
a palette of tools with the mouse (i.e. clicking the mouse over the tool) and dropping
it (i.e. releasing the mouse button) in a function window. Dropping a value close
to the top or bottom of the window causes it to "stick" to the edge, making it
a parameter or return value respectively. The user makes two values identical by
clicking in the box of one and releasing in the other; on release, both boxes assume
the same randomly chosen (but editable) color to show the connection. For instance,
the identity function results from creating a parameter and return at the top and
bottom of a fresh Alex window, then identifying them in this manner.
Primitives are applied by selecting other tools from the palette. For instance, the
192
s-valx			Drop Extr tool here
Befove tool application
Extr00 0 x
Extr01 0 x
Afte? tool application
Figure 7.1: Typical element operations in Alex.
user applies the Extr tool to an s-val by picking it up and dropping it along a matrix
edge to be "cut" by the operation as shown in Figure 7.1. The Typ operation shown
in Figure 7.2 is similar, except it adds a colored lobe that stands for the outer axis of
the typical components. Another Typ operation can be applied with respect to the
same outer axis by selecting the lobe and dropping it along the appropriate edge of
the argument s-val. Thus Typ 0 7 Typ 1 7 x looks like Figure 7.3.
User function calls are represented by boxes containing the name of the called
function with rows of small "knobs" along the top and bottom to denote argument
and return values respectively. One creates a new function call by clicking on the
window of the definition (including the window where the call is wanted if it is to
be recursive) and dropping it in the desired place. Alternately, there is a library
menu for functions previously defined and saved; the user can pick up one of these
as well. Function call argument values are defined by identifying the respective knob
with a parameter or the result value of some previous computation as describe above.
Outputs are put to use similarly.
193
s-valx			Drop Typ tool here
Before tool application
Lobe color represents axis 7
After tool application
Figure 7.2: Extract operations in Alex.
Lobe color represents axis 7
After second tool application
Figure 7.3: Multiple Typical element operations.
Conditionals are subwindows in the function window. A subwindow depicts either
the true or false branch of the conditional at any one time; the user can toggle between
them by clicking on the predicate knob at the top of the subwindow. Tliis knob
must also be identified with the desired boolean conditional value. There are output
knobs on the subwindow's bottom edge to ensure that both branches define the same
values. Subwindows may be nested to arbitrary depth to denote nested conditionals.
Drawing constraints (e.g. keeping the cursor inside the conditional subwindow while
copying a value defined therein) enforce scopes of conditional branches.
194
Type inference is an integral part of interactions. The data structure underlying
the screen graphics is essentially a forest of abstract syntax tree fragments. When the
function definition is complete, it forms a single tree. The type inference algorithm
assigns a fresh type variables for the type of each value box not yet identified with a
value of known type, then infers the type of each tree fragment. As long as unification
succeeds, the graphics represent a potentially type-correct program. Interactions that
cause inference to fail are refused. The vertical and horizontal screen axes represent
inner s-val axes 0 and 1 respectively so that the convention of column and row
vectors used in this text carry through to the appearance of values on the screen.
The hyperboxes of [AC91j are useful for s-vals with more inner axes. The inference
algorithm automatically yields the correct shape for the value of a function output
once all its inputs are defined. Until then, it gets a rounded shape.
7.3 Generalizations
We invented svexp to explore s-vals and typical component operations and the paral-
lelism they so naturally express. We have built a prototypical compiler that generates
sequential C code from svexp to test these ideas. The early results are encouraging,
but of course svexp in its current form is not a generally useful language. Thus we
enumerate some logical directions to make svexp (and Alex) useful for other than
data--Hparallel problems.
Computed partitions. Whatever the advantages of s-vals for expressing data--H
parallelism, a general purpose matrix language needs a conventional indexing
operator and Hudak's updt primitive discussed in Chapter 1 for many practi-
cal problems; [ANP89j discusses the computation of histograms. Happily, we
195
can incorporate both these into svexp with simple additions. I.e. define
?iPart1 c? V1 V2 =--H when V?Al?Ql;cl,? has c E Ci
it's when V2k,a2;C2,? has C2 =
it's Aij.Viij[(jc)t(V2iO)/c]
Thus Part1 accepts two parameters a general s-val and an inner-scalar that
serves, roughly speaking, as an index. Part1 replaces each inner matrix of its
argument V1 with a submatrix that is the same except the first n components
along axis c are missing, where n is the respective inner scalar value of V2. We
obtain an s-val version of matrix indexing by composing Part1 with Extr00:
Extr00 c Part1 c x i
where x and z are s-vals. In fact, this operator offers more than simple array
indexing. If p is an inner permutation vector, then the following computes the
respective permutation of each inner column vector of x:
Proj 70 Extr00 cPart1 cx (Typ O7p)
If we also define the symmetric variant Part0 of Part1:
riParto clV1 V2
when Vft1Q1;c1?? has c ? C1
it's when VA2,oL2;c2,? has C2 =
it's Aij.(jc) < (V2iO) (V1ij), ran?e?r
then we can also fabricate an s-val version of updt. For e.g. column vectors:
updt v i x a=q Join 0 (Part0 0 v
Join 0 x
(Extr01 0 Part1 0 v
The compiler and optimizer accommodate these new operations as natural
extensions; no new principles are needed.
Computed dimensions. We showed in Section 3.4.6 with (3.29) how to achieve
s-val closure even if conditionals are allowed to produce different-sized outputs
196
on their branches. This policy makes types as we have defined them undecid-
able. This has an important consequence: we cannot in general predict the size
of a user function call result based on the size of inputs at compile time. Hence
the procedure caller cannot allocate space for the result array. It remains to be
seen how well the top--Hdown compilation scheme using code accumulations of
Chapter 5 and the optimizer of Chapter 6 can be adapted to this environment.
Higher order functions. Introducing s-vals into e.g. a general typed lambda cal-
culus with function values opens many problems. For instance, the semantics
of s-vals whose inner matrices contain functions (from e.g. s-vals to s-vals) re-
quire careful consideration. S-vals may of course be bound in the environments
of function values with indefinite lifetimes during program execution. Thus
the stack allocation of svexp is inadequate; a heap and garbage collector are
required. This expense makes in--Hplace evaluation all the more desirable, but
similar analyses of higher order functions using abstract interpretation have
proven intractable in theory as well as practice. Clearly, much work remains
before higher order s-vals can be useful.
7.4 Summary
We have given some formal underpinnings for the Alex paradigm for representing
matrix functions and have shown that Alex functions can be compiled to efficient
imperative codes.
We began with an unconventional definition of matrices, upon which we built
denotational semantics for the Alex-representative calculus svexp. We showed that
svexp semantics are closed for s-vals, the matrix values that svexp primitives trans-
form.
We described extents, the shapes and sizes of s-vals, and gave syntactic types to
197
represent them. We constructed a sound and complete inference algorithm for extent
types. For each subterm of a svexp program, the algoritlim determines an extent
type that describes all possible shapes and sizes the subterm can take at run time.
Size constraints may be undecidable at compile time, but can be enforced by efficient
run time checks in the compiled code.
We showed that the standard method for developing code generators is not ideal
for svexp and proposed code accumulations as a way of letting the code generator
transform code it has already emitted to avoid unnecessary storage allocation and
copying in the compiled program. The result is compiled code that allocates storage
only for only one inner matrix of each user function call argument and result.
We found that the last obstacle to performance lay in deciding when a compiled
function could be safely called with a subarray of one of the parameters serving
also to hold the return array or vice versa. We used pairwise syntactic analysis of
references to determine when this was permissible, and we developed an algorithm
to find a small set of good prospects for such sharing. Overall run time is O(n3)C(n)
under reasonable assumptions, where n is the input length and C(?) is the cost of
comparing two symbolic dimension arithmetic expressions of total length ?.
Bibliography
[AC91]
[ANP89]
Bowen Alpern and Larry Carter. The hyperbox. Unpublished manu-
script, April 1991.
Arvind, R. Nikhil, and K. Pingali. I-structures: Data structures for par-
allel computing. ACM Transactions on Programming Languages and Sys-
tems, 11(4), October 1989.
[App87] Andrew W. Appel. Garbage collection can be faster than stack allocation.
Information Processing Letters, 25(5):275--H279,1987.
[ASU86j A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Tech-
niques, and Tools. Addison-Wesley, Reading, MA, 1986.
[AT89]
[Bac78]
Andrew W. Appel and Jim Trevor. Continuation-passing, closure-passing
style. In Proceedings of the 16th ACM Symposium on Principles of Pro-
gramming Languages, pages 293--H302, January 1989.
John Backus. Can programming be liberated from the von Neumann
style? A functional style and its algebra of programs. Communications
of the ACM, 21(8):613--H641, August 1978.
[Bar77] J. M. Barth. Shifting garbage collection overhead to compile time. Com-
munications of the ACM? 20(7), July 1977.
[Blo89]
Adrienne Bloss. Update analysis and the efficient implementation of func-
tional aggregates. In Proceedings of the 4th International Conference on
Functional Programming Languages and Computer Architecture, pages
26--H38, September 1989.
[Blo90j Bard Bloom. Course notes, CS611, Advanced programming languages.
Department of Computer Science, Cornell University, November 1990.
[Bud88] Timothy Budd. An APL Compiler. Springer-Verlag, New York, 1988.
[CBF91]
Siddhartha Chatterjee, Guy E. Blelloch, and Allan L. Fisher. Size and ac-
cess inference for data-parallel programs. In Proceedings of the SICPLAN
`91 Conference on Programming Language Design and Implementation,
pages 130--H144, June 1991.
198
199
[CC77]
[CX87j
[F1188]
[GH89]
[GJ79]
P. Cousout and R. Cousout. Abstract Interpretation: A unified lattice
model for static analysis of programs by construction or approximations
of fixpoints. In Proceedings of the 4rd ACM Symposium on Principles of
Programming Languages, pages 238--H252, January 1977.
Wai-Mee Ching and Andrew Xu. A vector code back end of the APL37O
compiler on IBM 3090. ACM APL Quote Quad, 18(2):69--H76, December
1987.
Anthony J. Field and Peter G. Harrison. Functional Programming.
Addison-Wesley, Reading, MA, 1988.
K. Gopinath and John L. Hennessy. Copy elimination in functional lan-
guages. In Proceedings of the 16th ACM Symposium on Principles of
Programming Languages, pages 303--H314, January 1989.
Michael R. Garey and David. 5. Johnson. Computers and Intractability:
A Guide to the Theory of NP-completeness. W. II. Freeman & Company,
New York, NY, 1979.
[Gop88] K. Gopinath. Copy Elimination in Single Assignment Languages. Ph.D.
dissertation, Stanford University, March 1988.
[Gor79] Michael J. C. Gordon. The Denotational Description of Programming
Languages. Springer-Verlag, 1979.
[GVL83] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The
Johns Hopkins University Press, Baltimore, MD, first edition, 1983.
[HB85]
[JM89]
Paul Hudak and Adrienne Bloss. The aggregate update problem in func-
tional programming systems. In Proceedings of the 12th ACM Symposium
on Principles of Programming Languages, pages 300--H314, January 1985.
Simon B. Jones and Daniel Le Metayer. Compile-time garbage collection
by sharing analysis. In Proceedings of the 4th International Conference
on Functional Programming Languages and Computer Architecture, pages
54--H74. Addison-Wesley, September 1989.
[KR78] Brian W. Kernighan and Dennis M. Ritchie. The C Programming Lan-
guage. Prentice-Hall, Englewood Cliffs, NJ, 1978.
Dexter Kozen, Tim Teitelbaum, W. Chen, J. Field, W. Pugh, and B. Van-
der Zanden. Alex An alexical Programming Language. In 1987 Work-
shop on Visual Languages, pages 315--H329, Linkoping, Sweden, August
1987. Also in Visual Languages and Applications, ed. Ichikawa, Jungert,
and Korfhage, Plenum Press, 1990,147--H158.
[KTC+87]
200
[McG85]
J. McGraw, et. al. SISAL: Streams and iteration in a single-assignment
language, language reference manual. Technical Report M-146, Rev. 1,
Unversity of California, Lawrence Livermore National Laboratory, March
1985.
[Mey88] Alan Meyer. An efficient implementation of LU decomposition in C.
Advances in Engineering Software, 1O(3):123--H130, 1988.
[MH88] Zhijing G. Mou and Paul Hudak. An algebraic model for divide-and-
conquer and its parallelism. Journal of Supercomputing, 2:257--H278,1988.
[Mil78] Robin Milner. A theory of type polymorphism in programming. Journal
of Computer and System Sciences, 17:348--H375,1978.
[MTH9O] Robin Milner, Mads Tofte, and Robert Harper. The Definition of Stan-
dard ML. The MIT Press, Cambridge, MA, 1990.
[Nik88] Rishiyur 5. Nikhil. Id (version 88.0) reference manual. Technical Report
CSG Memo 284, M.I.T. Laboratory for Computer Science, March 1988.
[PJ87] Simon L. Peyton-Jones. The Implementation of Functional Programming
Languages. Prentice Hall, 1987.
[R89]
Didier Remy. Typechecking records and variants in a natural extension
of ML. In Proceedings of the 16th ACM Symposium on Principles of
Programming Languages, pages 77--H87, January 1989.
[Rea89] Chris Reade. Elements of Functional Programming. Addison-Wesley,
Reading, MA, 1989.
[Rob65] J. A. Robinson. A machine-oriented logic based on the resolution princi-
ple. Journal of the ACM, 12:23--H41,1965.
[5571] D. Scott and C. Strachey. Towards a mathematical semantics for com-
puter languages. In Symposium on Computers and Automata, volume 21.
Polytechnical Institute of Brooklyn, 1971. Microwave Research Institute
Symposia Series.
[5to77]
[Ung84]
J. E. Stoy. Denotational Semantics: The Scott-Strachey Approach to Pro-
gramming Language Theory, volume 1 of MIT Press Series in Computer
Science. The MIT Press, Cambridge, MA, 1977.
David Ungar. Generation scavenging: A non-disruptive high performance
storage reclamation algorithm. A CM SICPLAN Notices, 19(5): 157?1677
1984.
Mitchell Wand. Complete type inference for simple objects. In Pro-
ceedings of the Second Annual IEEE Symposium on Logic in Computer
Science, pages 37--H44, June 1987.
[Wan87]
201
[Wan88]
[Wan89]
Mitchell Wand. Corrigendum: Complete type inference for simple ob-
jects. In Proceedings of the Third Annual JEFE S?mposium on Logic in
Computer Science, June 1988.
Mitchell Wand. Type inference for record concatenation and multiple
inheritance. In Proceedings of the Fourth Annual IEEE Symposium on
Logic in Computer Science, pages 92--H97, June 1989.
Jonathan Young. Theory and Practice of Semantics-Directed Compiling
for Functional Programming Languages. Ph.D. dissertation, Yale Univer-
sity, 1988.
[You88]
