BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR92-1295
ENTRY:: 1993-10-14
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: Periodic Updates in Processor Networks
AUTHOR:: Sundaram, Sridhar  
DATE:: July 1992
PAGES:: 19
ABSTRACT::
In implementing parallel scientific applications, the crux is often efficiency 
in communication. So, it becomes important to abstract out and study 
frequently occurring patterns of communication. We describe one such 
abstraction, periodic update, which arises in such diverse areas as 
computational molecular dynamics, distributed systems and n-body simulations.

Consider a synchronous network of processors. Every processor needs to know 
the status of every other processor. The status keeps changing and it is more 
important to know the correct status of nearer processors. The work done in 
maintenance of status information should be as little as possible. The  
periodic update problem is that of sending (and receiving) periodic status 
updates from every processor to every other processor in the network in order 
to maintain status information. The time interval between successive updates 
from one processor to another is given by some increasing function (the 
periodicity function) of the distance between them. Given a network and a 
periodicity function, we wish to find efficient protocols for the 
periodic update problem which have minimum delay (time elapsed between the 
sending of an update and its receipt) and minimum peak bandwidth (maximum 
number of updates sent across any edge in one time-step).

We present a general design paradigm and construct periodic update 
protocols for the one and two dimensional mesh networks for both polynomial 
and exponential periodicity functions. Given a periodicity function, we 
demonstrate a trade-off between delay and peak bandwidth. Then,  
using two general techniques, we transform minimum delay protocols into 
families of highly efficient protocols with performances spanning the entire 
spectrum from minimum delay to minimum peak bandwidth.
END:: CORNELLCS//TR92-1295
BODY::
Periodic Updates in Processor Networks
Sridhar Sundaram
TR 92-1295
July1992
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
Periodic Updates in Processor Networks
Sridhar Sundaram
July 21,1992
Abstract
In implementing parallel scientific applications, the crux is often effi-
ciency in communication. So, it becomes important to abstract out
and study frequently occurring patterns of communication. We de-
scribe one such abstraction, periodic update, which arises in such
diverse areas as computational molecular dynamics, distributed sys-
tems and n-body simulations.
Consider a synchronous network of processors. Every processor
needs to know the status of every other processor. The status keeps
changing and it is more important to know the correct status of nearer
processors. The work done in maintenance of status information should
be as little as possible. The periodic update problem is that of send-
ing (and receiving) periodic status updates from every processor to ev-
ery other processor in the network in order to maintain status informa-
tion. The time interval between successive updates from one processor
to another is given by some increasing function (the periodicity func-
tion) of the distance between them. Given a network and a periodicity
function, we wish to find efficient protocols for the periodic update
problem which have minimum delay (time elapsed between the sending
of an update and its receipt) and minimum peak bandw?dth (maximum
number of updates sent across any edge in one time-step).
We present a general design paradigm and construct periodic up
date protocols for the one and two dimensional mesh networks for
both polynomial and exponential periodicity functions. Given a peri-
odicity function, we demonstrate a trade-off between delay and peak
bandwidth. Then using two general techniques, we transform minimum
delay protocols into families of highly efficient protocols with perfor-
mances spanning the entire spectrum from minimum delayto minimum
peak bandwidth.
1 Introduction
Consider a group of processors connected in a simple topology. Each pro-
cessor needs to know the status of every other processor in the network.
This status keeps changing and the status information of nearer neighbors
is more important. The maintenance of status information should interfere
as little as possible with the work being done by the processors.
To maintain status information, each processor has to send periodic up-
dates to (and receive periodic updates from) every other processor. Updates
have to be sent more frequently to processors which are nearer. The problem
is to schedule these updates so that only a small amount of work is done
in exchanging updates in each time-step. We refer to this as the periodic
update problem.
One situation where periodic updates are required is a network of ma-
licious processors. Each processor needs to know which of the others are
alive. The liveness status of nearer neighbors is more important for routing
purposes. So the processors periodically exchange authenticated (to pre-
vent malicious forgery) "I am `? messages. The communication pattern
required is captured by the periodic update problem.
Another example comes from molecular dynamics where the movement
of molecules has to be simulated over a large number of time-steps. The new
position of any molecule at each time-step depends on all other molecules
but nearer molecules affect it more strongly. Suppose the spatial placement
of molecules corresponds approximately to the placement of processors in
the network. Then we can identify one processor with one molecule and
use an incremental approach to calculate a good approximation to the new
positions of the molecules. For this, changes in positions of nearer molecules
have to be communicated more frequently. Agaln periodic update provides
an abstraction of the communication pattern required.
The periodic update abstraction also arises in the following problems:
gateways in an internet which need to exchange routing information period-
ically, some clock-synchronization protocols and n-body simulations.
"k-k routing", discussed by Kunde and Tensi [KT89J, is related to pe-
riodic update. Other authors [Lei90, LMT89, Kun88J discuss algorithms
for routing permutations ("1-1" routing). In the "k-k routing" problem
each processor contalns k messages initially and finally and the messages
have to be routed from intial to final processor locations as fast as possible.
Thus each processor sends (and receives) a small number of messages to
(from) arbitrary destinations. In contrast, in the periodic update problem,
2
every processor is periodically and continuously sending messages, cycling
through all possible destinations. Nassimi and Sahni [NS8I] describe Ran-
dom Access Read and Random Access Write, while Saad and Schultz [SS89j
describe Gather, Scatter and Broadcast as communication primitives. Peri-
odic update is a more structured communication primitive useful only for a
certain class of problems. We feel that it is difficult to implement parallel
applications with low-level primitives like Broadcast.
The basic issue in designing protocols for the periodic update problem
is this. How do we stagger the updates to different distances and schedule
them so that a small, uniform amount of work is done in every time-step
by each processor? In other words, rather than lumping all communication
actions together, we wish to stagger them over a period of time to achieve
greater efficiency. We would also like the delay between sending and receipt
of an update to be predictable.
Studying periodic update as a communication abstraction in isolation
clarifies the concepts involved. It has helped us to discover new algorithms.
These can be used to structure communications in scientific computations.
This was the primary motivation for studying the periodic update prob-
lem. We describe the problem more rigorously in the next section.
1.1 Notation and Terminology
We model the synchronous network by an undirected graph whose vertices
correspond to processors and whose edges correspond to bidirectional com-
munication links between processors. Every processor in the network sends
messages or updates to every other processor in the network. An update
might have to be forwarded by intermediate processors to its destination.
The distance between two processors is the minimum number of edges tra-
versed in the associated graph to reach one from the other.
The periodicity function, Per?od(d), is defined as the number of time-
steps between successive updates, sent from one processor, to another at
distance d from itself. Period(d) is a monotonic non-decreasing function of
the distance d. We define Delay(d), as the number of time-steps elapsed
between the sending of an update, and its receipt by a processor at distance
d. Since each intermediate processor takes at least one time-step to forward
an update (possibly more if the update is stored for a few time-steps before
being forwarded), Deia?(d) is at least as large as d. We want the amount
of work done per time-step by each processor in processing updates to be
small. This motivates the definition of the peak bandwidth as the maximum
3
begin PROTOCOL
Counter? Counter+ 1
for each distance d such that Counter e SS(d) do
send update towards processor at distance d;
od
for each waiting update U that has travelled distance d do
forward U towards its destination if Counter E fS(d);
od
Receive and read updates;
Discard all updates for which the destination is self,
end PROTOCOL
Figure 1: A Basic Paradigm for Periodic Update Protocols.
Action at each processor at each time-step
number of updates any processor has to forward and/or send in a single time
step.
Now we state our problem more precisely.
The Periodic Update Problem: Given a network and a periodicity func-
tion Period, we wish to find an efficient protocol with low peak bandwidth
and low Delay, in which, every processor in the network sends updates to
(and receives updates from) any processor at a distance d, periodically at
intervals of Period(d) time-steps.
1.2 A Basic Paradigm for the Protocols
Send refers to the creation and dispatch of an update by a processor. For-
ward refers to the forwarding of an update by an intermediate processor on
the update's route to its final destination.
Now we describe a framework for periodic update protocols. Each pro-
cessor is equipped with its own Counter to keep track of time. All counters
are initialized to the same value in the protocols (unless otherwise spec-
ified.) For a periodicity function, Period, and a network with maximum
distance n between any two processors, the counter needs only to count
modulo Period(n). It is natural to assume that sending/forwarding of an
update occurs when the counter has a particular value.
4
Figure 1 describes a basic paradigm for periodic update protocols. For
simplicity, we consider only one of any two symmetrical directions of com-
munication. SS(d), the Sending Set, is a set of counter values when an
update should be sent to a processor at distance d. Similarly, ?S(d), the
Forwarding Set, is a set of counter values when an update which has traveled
a distance d should be forwarded.
The paradigm of Figure 1 is extremely general. Under some assumptions
on the periodicity function and the protocol, this basic paradigm can be
made more specific. First we consider the periodicity function.
Definition 1 A periodicity function, Pernod, is regular if for any two dis-
tances d1 and d2 with d1 less than d2, Pernod(d2) is an integral multiple of
Pernod(d1). i.e. Pernod is regular iff
Vd1 ? d2a positive integer r : Period(d2) = r * Period(d1). ?
Regular periodicity functions allow greater efficiency in forwarding updates.
An update sent to a distance d can be used by all intermediate processors
which forward that update. For periodicity functions which are not regu-
lar, intermediate processors have to forward updates which they themselves
cannot use.
By conceptually speeding up the clock, we can normalize any regular
periodicity function, Period(d), so that Period(1) is 1. Here onwards, we
will assume normalized periodicity functions.
Fact 2 A protocol has minimum delay iff every update is forwarded in
the time-step immediately after it is received.
For a minimum delay protocol, Delay(d) equals d. Forwarding of updates
occurs as soon as possible and need not be scheduled. Only the sending
of updates has to be scheduled. Protocols in which the same number of
updates are sent in each time-step are desirable because they distribute the
work of communicating updates uniformly over all time-steps.
Definition 3 A protocol is a single-send protocol if exactly one update is
sent by each processor in each time-step. C
Any protocol in which the same constant number of updates is sent in each
time-step can be thought of as a single-send protocol with an appropriately
speeded up clock. Single-send minimum delay protocols are very easy to
design as the isomorphism theorem below shows. Later we will describe
techniques to transform such protocols into single-send protocols with any
desired bandwidth.
5
begin PROTOCOL
Counter? Counter+ 1;
comment: Send an update to the maximum distance permitted;
d max?c Period(c) divides Counter?;
send an update towards processor at distance d;
comment: Forward all updates that satisfg forwardin9 condition;
c Countermod Period(d+ 1);
for each waiting update U that has travelled distance x do
forward U towards its destination if Forwd(x)
od
Receive and read updates;
Discard all updates for which the destination is self,
end PROTOCOL
Figure 2: A Paradigm for Regular Single-Send PrntocoTh.
Action at each processor at each time-step
Theorem 4 (Isomorphism) For an? regular periodicity function Period
let W be a single-send minimum delag protocoL
Consider protocol Y which is also single-send minimum delay with
SSy(d) = fr * Period(d) r is a positive integer).
Then at least as many updates are sent and forwarded in ? as in Y.
All our protocols follow the paradigm for single-send protocols for regular
periodicity functions shown in Figure 2.
The sending method is just a convenient implementation of the Sending
Set SSy(d) described in the isomorphism theorem. One update is sent in
every time-step to the maximum distance allowed by the counter and the
periodicity function. Each Update is stamped at the time of sending with
the distance it has to travel and the sender's identification.
An Update message is forwarded after the processor has read it. ff
the distance the message has travelled equals the distance to which it was
sent, then it is discarded. Otherwise it is forwarded towards its destination
processor at the time-step when the distance it has travelled satisfies the
forwarding condition on the counter. This forwarding condition can be ob-
6
Name Pernod(d)			Forwd(d)			Peak bandwidth Delay(d)
2			next step			log* n			d
1			2 og			next step			logn			d
?1			2 og			next step			loglogn			d
Table 1: Two Basic iD-Protocols and their Performance
tained from the Forwarding Set using the isomorphism theorem and some
symmetry considerations. The function Forwd is defined by the protocol.
In the rest of the paper we consider versions of the periodic update
problem with polynomial and exponential periodicities on one, two and
higher dimensional mesh topologies. In the next section (Section 2) the
one-dimensional version of the problem is explored. Minimum delay pro-
tocols are presented for polynomial and exponential periodicity functions.
We demonstrate a trade-off between delay and the peak bandwidth, and
address lower-bounds and efficiency issues. Two general protocol trans-
formation techniques to transform minimum delay protocols into protocols
with any desired bandwidth are described. Thus we get a family of pro-
tocols spanning the entire performance spectrum from minimum delay to
minimum bandwidth. In Section 3 these results are extended to two and
three dimensional meshes. Finally, we conclude with a summary and the
scope for future work in Section 4. Proofs for theorems and lemmas are
given in the appendix.
2 Periodic Update in One Dimension
The one-dimensional mesh is simply a linear connection of n processors.
First we present two minimum delay protocols. Then we discuss transfor-
mations that make them more efficient with respect to peak bandwidth.
2.1 Two Basic Protocols
We summarize two basic protocols in Table 1. Column 1 gives a name to the
protocol. Columns 2 and 3 specify the periodicity function Period(d) and the
forwarding condition Forwd(d). The last two columns give the performance
of the protocol, its peak bandwidth and its delay function Deiay(d).
7
It is easy to see that the protocols function correctly. In protocol &1, a
message is sent to distance d whenever the (d --H 1) lower bits of the counter
have a particular value (all 0's). Thus a message is sent to a processor at
distance d once in every 2d--H1 time-steps as required.
In protocol ?1, (and similarly in 1) the period is 2[2 log d1 which is a reg-
ular periodicity function approvimating the periodicity of d2 (which is not
regular).
Theorem 5 Protocols 1, ?l and ?1 function correctl?.
All three protocols have minimum delay and Delay(d) is d for them since
updates are forwarded as soon as possible by intermediate processors. The
peak bandwidth entries are not so obvious.
Theorem 6 (Peak bandwidth) The peak bandwidth for protocol Sl in a
network of n processors is log* n1. The peak bandwidth for ?l is log log n
and for ?1 it is log n.
Any protocol with an Exponential periodicity (Period(d) = r?, r > 1)
would be similar to ?1 and any protocol with a Polynomial periodicity
(Period(d) = d?, r > 1) would be similar to Pi.
2.2 Lower-bounds, Trade-offs and Efficiency
We derive a lower-bound for the peak bandwidth of any protocol for a given
periodicity function. Then we describe transformations ? and ? of single-
send minimum delay protocols which have higher delays but work with any
desired peak bandwidth.
2.2.1 A Lower Bound for the Peak bandwidth
Let N(d) = the number of processors at distance d from a given processor.
M(d) = Period(n) the number of updates sent to distance d in Period(n)
Pertod(d)'
time-steps.
A message to distance d is implicitly used by all intermediate (d --H 1) pro-
cessors. So we associate only a cost of 1 with it.
The total number of messages sent by a processor is M(1).
1In this paper, we use the following conventions regarding the logarithm.
All logarithms are to the base 2.
n --H loglog(??l) n and log(O) n --H n
log n = the minimum i such that log(?) n < 1
8
Let the mean distance travelled by an update in protocol W be D(x). Then,
d--Hn			d--H--Hn
=			M(d)*N(d) _			Period( 1)*N(d)
___			M(1)			--H			Per?od(d)
___			di
Lemma 7 (Lower Bound) For any s?ngTh-send protocol ? the peak band-
w?dth is at least D(x)
For ?1, Pernod(d) = 2d-1, so 1 <?D(&1) < 2 ? peak bandwidth > 2
For ?1, Period(d) d2, so 1 <?D(?1) ? 2 ? peak bandwidth > 2
For ?1, Pernod(d) d, so D(?1) = logn ? peak bandwidth > logn
This lower bound on the peak bandwidth can be achieved but usually re-
quires higher delay.
2.2.2 Simulating Higher Bandwidth: Transformation ?
Suppose the available bandwidth is w but the peak bandwidth required by a
protocol is bw(n). (Here n is the number of processors.) How can we adapt
the protocol to work with bandwidth w?
[Transformation?:Forwardwupdatespertime-stepfor[bWw(fl))time-stepsl
It is easy to see that this direct simulation of a higher bandwidth works.
However, the delay, which is the number of time-steps between dispatch and
receipt, gets multiplied by [??n)] Protocols ?1 and ?1 can be transformed
using ? to run with a peak-bandwidth of w (w > 2) and delay of d% and
d respectively. Of course, 1 as described has already attained the
lower bound on its peak bandwidth.
Theorem 8 (? Transformation) A protocol ? with peak bandwidth bw(n)
and delay Delay(d) can be converted by transformation ? to a protocol X?
with peak bandwidth w (w > D(Y)) and delay Delay(d) [?w???1
Thus, there is a trade-off between delay and bandwidth. Transformation ?
modifies protocols to fit the available bandwidth at the expense of higher
delays.
2.2.3 Efficiency: Transformation p
Transformation ? of a protocol with peak bandwidth bw(n), to one with
bandwidth w, is inefficient. When the counter counts through from 1 to
9
?amePernod(d)Forwd(d)PeakbandwidthDelay(d)
2d--H1			2d--H1			2			2d--H1 --H
?1P			2 og			*see text			2			*see text
Table 2: Transformation,3 (for w = 2) of the Basic iD-Protocols
Period(n), each processor does Pernod(n) ?w??? work. Our lower bounds
suggest the possibility of protocols in which each processor does only
Pernod(n) work, by staggering the peak forwarding work-load. Transfor-
mation ,3, which achieves this, is motivated by the bandwidth recurrence
relation discussed in the proof of the Peak Bandwidth theorem.
The idea is to stagger the forwarding of the updates so that the same
bandwidth w is used in every time-step. We divide the updates waiting
to be forwarded into forwarding classes based on the distances they have
travelled. As an update travels farther, it is promoted to higher forward-
ing classes and gets forwarded after longer delays. Updates in a particular
class get forwarded periodically whenever the value of the counter lies in
a certain set. We map (w --H 1) forwarding classes onto one counter value.
The single-forward theorem below shows that there is atmost one update
in each forwarding class in each processor. Therefore, one update is sent
and at most (w --H 1) updates are forwarded in each time-step leading to a
peak-bandwidth of w.
Formally, the ith forwarding class Fj consists of updates which have trav-
elled distances in the range [f?, ft+i --H 1] i.e. Ft = [f?, f?+i --H 1] where
fi = 1
De1ay(f?+i + 1) = Pernod(f2- + 1) + Deiay(f? + 1)
We describe the mappings of forwarding classes onto counter values now.
Let G(i) = F(??i)(??i)+i U F(i?l)(w?l)+2 U ... U Fi(w?i).
Let Forwd(G(i)) = v, denote that for any update which has travelled a dis-
tance d lying in G(i), Forwd(d) = v.
Perio d+1
Let Pd = erio d) Then
d--H1			d--H1
Forwd(G(?(P1 --H 1) + k)) = k fi p?,			1 < k <Pd
5--Hi			j=i
Note that De1a?(f?) depends on the forwarding condition that classes F1 to
Fj are mapped onto. Specifically, if Forwd(f?) = v then
10
fi+1 = fi + Pernod(f,- + 1)/Period(v+) where v+ = min(d Period(d) > v?
Example 1 Consider Period(d) --H 2d-I Let us enumerate the forwarding
classes when the available bandwidth w is 2.
fi = 1
= i			(easy to show by inductioni
So Fj = [fi,fi+i --H ii = [i,iJ and G(i) =
Forwd(G(i)) = Period(i).
We can calculate the delay as follows:
Delay(d) = Period(d --H 1) + Delay(d --H 1) = ??d1--H1 Period(i) = 2d--H1 --H 1
From here onwards, we identify an update with the distance it has already
travelled. Thus update d refers to an update that has travelled a distance d.
We say that an update d is forwarded normally when (d --H 1) lies in the same
forwarding class as d does. Otherwise, we say that update d is forwarded
on promotion. This difference becomes significant because of the following
lemma.
Lemma 9 Let the delay to forward an update normally in forwarding class
Fj be A,. Then the delay to forward on promotion i.e. the delay to forward
update fi+i is A,.
Theorern 10 (Single-Forward) Consider two updates c and d at the same
processor at the same time-step with c being the highest update which is less
than d. Then if d is in the ith forwarding class, c must be in the (i-1)th
forwarding class. i.e.
V updates c,d: (? update a with c ? a < d) A (d E F,)) ? c E F\--Hi
By the above theorem, transformation,3 seems to be close to optimal. Thus,
we now have a family of protocols derived from the basic protocols ?1, fli
and P1 by applying transformations ? or,3. Of course, it is possible to apply
both in conjunction. We try to extend these ideas to higher dimensions in
the next section.
3 Higher Dimensional Periodic Update
First we try to extend protocols ?1 and T'1 to two dimensions and then
consider extensions to higher-dimensional meshes.
11
Name Period(d) Forwd(d) Peak bandw?dth Delay(d)
2			next step			log*(2?n)			d
2 og			next step			2?			d
Q2			2 2 og			next step			2log(2?n)			d
P2			2 og			next step			loglog(2yW)			d
Table 3: 2D Versions of the Basic 1D Protocols
3.1 2D-Meshes
We consider the ?x ,/=nmesh with interconnections to 4 nearest neighbors.
Distance is now the sum of the x-distance and the y-distance. We only
describe the UP and LEFT directions for updates. This corresponds to
the upper left quadrant. The other quadrants are symmetrical. Two basic
protocols are shown in Table 3. (The maximum distance between any two
processors on a x ?mesh is 2?n.) In the following sections we describe
these protocols.
3.1.1 Protocol ?2: 2D
First define a unique path to each processor by always moving left as much
as necessary before moving up. This path can be encoded as an address
(dL, du) where dL is the number of steps moved left and du is the number
of steps moved upwards. Now for Period(d) --H 2d we think of the counter
as being a binary counter, with a bit-value of 0 as indicating left and 1 as
indicating up. The lowest bits encode the address of the destination of the
update in unary. For example if Counter . .011100 then (dL, du) = (2,3).
Formally, the address is (dL, du) where
dL = number of consecutive 0's leftwards from least significant (rightmost)
bit of counter.
du = number of consecutive l's leftwards from (dL + 1)th bit of counter.
One update is sent in each time-step by each processor to the destination
(dL, du) derived from the current counter value. The forwarding policy is
simple. Keep forwarding an update along its unique path to the destination
processor.
The arguments for the performance values are almost the same as in the
one dimensional case, for protocol &1. Transformations 0 and P discussed
12
diagonal
anti-diagonal
update wavefront from processor y
update wavefront from processor x
o processor sending update
processor at end-point of diagonal update
o wavefront from processor x
o clash of update wavefronts at processor
Figure 3; Movement of Updates on 2D Mesh
Each edge has length 2. A clash of update-wavefronts t? shown.
for protocois ?1 are also applicable to &2.
3.1.2 Protocol ?2, ?2 and Q2: 2D
For ?2, the arguments and analysis are almost the same as in Protocol ?1.
Note that the periodicity function here is Pernod(d) d3. (Since there are
d neighbors at distance d from a processor, the lower-bound on the peak
bandwidth would become log n if we use Period(d) d2 as in Pi.) Since
all the processors on a diagonal are at the same distance from the sender,
updates propagate as diagonal wavefronts. No two processors on the same
diagonal within distance d of each other should send to distance d or greater
in the same time-step. Otherwise, the diagonal update wave-fronts will clash
as shown in Figure 3. This increases the peak bandwidth and is undesirable.
13
Formally, for any two counters ci and c2 on the same diagonal, we require,
Vd > distance(cl,c2) : (ci = Pernod(d)) exclusive-or (c2 = Pernod(d))
Let i be the index upwards and j the index leftwards for the 2D array of
processors. We initialize the counter at processor (?,j) to + i.
Now no diagonal sends a message which can clash with another message
sent from the same diagonal as can be readily verified, The peak bandwidth
analysis is similar to the one-dimensional analysis except that we look at the
number of diagonals traversed by an update. Since there are d processors
at distance d from a processor, updates have to be replicated at some point.
This is an 0(1) operation done at one endpoint of the diagonal wavefront
of updates emanating from a sender. (See Figure 3.) Transformation a is
applicable to these protocols, but it is not clear whether ? is applicable.
3.2 Higher dimensional Meshes
Extending &2 to r-dimensional meshes is straightforward if we choose Pern'od(d)
to be 2d--H2+r, We just think of the lowest bits of the counter as encoding in
unary the destination address along a unique path. `Transformations a and
P remain applicable. The details are left to the interested reader.
Extending P2 to higher dimensional meshes is not possible using the
counter initialization technique alone. Even in 3 dimensions, it can be shown
that na1 updates from the same diagonal surface will clash for any counter
initialization. The lower bound seems to suggest, however, that extending
?2 to 3 or higher dimensions is possible.
4 Conclusions and Future Work
We have presented solutions for the periodic update problem for various
periodicities on 1 and 2-dimensional meshes in terms of a basic paradigm.
We outlined two basic families of protocols ? and T' and showed that there is
a delay-bandwidth trade-off. We presented two generic techniques for mov-
ing to any desired point of the trade-off: a which simulates higher bandwidth
and p which staggers the forwarding load by putting updates into forwarding
classes.
We were unable to extend the polynomial periodicity protocols to 3 di-
mensions. We list a few other interesting questions - Can we do better than
protocols ?1 and Pi when we embed the one-dimensional mesh on a hyper-
cube by taking advantage of the higher connectivity? Does randomization
14
help? Is there a way to extend periodic update to irregularly connected
networks? Are the protocols the best possible?
In a more global vein, we have shown a communication abstraction aris-
ing in parallel implementations of scientific problems. The hope is that it
would be possible to make parallel code for such computations more effi-
cient. Certainly, this abstraction does not capture all the different types of
communication patterns but it is a beginning. A good research topic would
be to find more such abstractions.
Acknowledgements
Thanks mainly to John Hopcroft for suggesting the problem and for giving
periodic updates on content and style. And to Suresh Chari, A.J.Ganesh
and others for helpful comments.
References
[KT89]
[Kun88]
[Lei9Oj
[LMT89]
Manfred Kunde and Thomas Tensi. Multi-packet routing on mesh-
connected arrays. Symposium on Parallel Architectures and Algo-
rithms, 1:336--H343, 1989.
Manfred Kunde. Routing and Sorting on Mesh-Connected Arrays.
Lecture Notes in Computer Science 319: VESI Algorithms and
Architectures, 319:423--H433, 1988.
Tom Leighton. Average case analysis of greedy routing algorithms
on arrays. Symposium on Paralfrl Architectures and Algorithms,
2:2--H10, 1990.
Tom Leighton, Fillia Makedon, and Ioannis G. Tollis. 1-1 packet
routing problem (determinisitic) on nxn mesh. Symposium on
Parallel Architectures and Algorithms, 1:328--H335,1989.
[NS81] D. Nassimi and 5. Salini. Data broadcasting in simd computers.
IFEE Transactions on Computers, C-30:101--H106, February 1981.
[5589] Youcef Saad and Martin H. Schultz. Data communication in par-
allel architectures. Parallel Computing, 11:131--H150, 1989.
15
Appendix
Theorem 4 (Isomorphism) For any regularperiodicityfunctionperiod(d),
let X be a single-send minimum delay protocol.
Consider protocol Y which is also single-send minimum delay with
8Sy(d) = fr * Period(d) r is a positive integer?. Then at least as many
updates are sent and forwarded in X as in &.
Proof. Because of regularity of Period, d1 ? d2 ? SSy(d2) C S8y(di).
Therefore any message sent to distance d2 is always used at distance d1. Now
let us look at SS?. By periodicity, r E SS?(d) ? (r+Period(d)) ? SS?(d).
Therefore, we must have: SS?(d) = SSy(d) + f(d) where f is a function
from natural numbers to natural numbers.
Now consider d1 <d2 but f(d1) # f(d2).
ff no such d1 and d2 exist then f(d) = some constant. Thus A' is the same
as Y except for a time-shift. Therefore, their performances are necessarily
the same.
If such d1 and d2 exist then A' sends and forwards more messages than &
does. This is because updates sent to distance d2 cannot be ?ised at distance
d1 but have to be forwarded by some processor at distance d1 anyway. E]
Theorem 5 Protocols 1, ?1 and ?1 function correctly.
Proof. Simple and omitted. E]
Theorem 6 (Peak bandwidth) The peak bandwidth for protocol &1 in a
network of n processors is log* n. The peak bandwidth for Pi is log log n
and for ?1 it islogn.
Proof We will prove a more general lemma from which the theorem
follows as a corollary.
Lemma: The peak bandwidth for a single-send minimum delay protocol
based on the paradigm of Figure 2, in a one-dimensional network of n pro-
cessors, is given by the recurrence relation:
peak bandwidth = BW(n) < BW(Perio?1(n)) + 1.
Proof, Let us consider a message travelling a distance n and count the
maximum number of messages, BW(n), forwarded along with it at any time-
step. When it commences its journey the counter reads Period(n). Since all
16
time
Tn
Th
Delay(n)
Period(c)			Delay(c)
Tc
Figure 4: The Bandwidth Recurrence Relation
Update n sent at Tn and update c sent at Tc
messages are forwarded as soon as possible, messages cannot overtake each
other. So the only messages that this update can clash with are those that
are sent in the same time-step in which it is forwarded and subsequently
get forwarded along with it. It finishes its journey after n time-steps. The
highest distance c, to which a message can be sent in this time, is given
by Delay(n) = Period(c) + Delay(c) (See Figure 4.) For a minimum delay
protocol Delay(d) = d. Therefore, Period(c) + c = n or c < Period1(n).
Thus we get the recurrence: BW(n) ? BW(Period1(n)) + 1. ?
For &1, the recurrence becomes: BW(n) < BW(log n + 1) + 1. From this
we find the peak bandwidth to be log* n.
For Pi, the recurrence becomes: BW(n) < Bw(#)+1. From this we find
the peak bandwidth to be log log n.
A similar (though not identical) argument shows that the peak bandwidth
for 1 is logn. ?
Lemma 7 (Lower Bound) For any single-send protocol X the peak band-
width is at least D(?).
Proof. in protocol W, the amount of communication work required per
message is at least D(x). Since one message is sent in each time-step, the
amount of communication work that should be done per time-step is also at
least D(Y). This implies that the peak bandwidth for protocol X is lower
bounded by D(x). E]
Theorem 8 (? Transformation) A protocol X with peak bandwidth bw(n)
and delay Delay(d) can be converted by transformation ? to a protocol X?
with peak bandwidth w and delay Delay(d) [????1
17
Proof. Simple and omitted. ?
Lemma 9 Let the delay to forward an update normally in forwarding class
F? be At. Then the delay to forward on promotion i.e. the delay to forward
update fi+i is Aj.
Proof Follows directly from the mapping of forwarding classes to counter
values. El
Theorern 10 (Single-Forward) Consider two updates c and d at the same
processor at the same time-step with c being the highest update which is less
than d. Then if d is in the ith forwarding class, c must be in the (i-1)th
forwarding class. i. e.
V updates c,d: (? update a with c ? a < d) A (d E Fi)) ? c Ei Fj?1.
Proof. We want to see if two updates can be in the same forwarding
class i?. For this, the delay between the sending of the first and the time
when it goes out of Fj must be sufficient for the second one to be sent and
to come into Fj. We will show that our definition of the forwarding class
precludes this and also forces the second update to be in Ft?1.
Let d E Fj = [f1,f?+i --H 1]. Then we will prove c E [fi--Hi,fi --H 1]
(proof that fi--Hi <c)
For d and c to be in the same processor at the same time-step waiting to be
forwarded
Delay(d) ? Delay(c) + Period(x), x > c + 1
Delay(f?) < Delay(c) + Period(x) fsince d E F1 ? d > ftY
Delay(f? + 1) --H Aj?1 < Delay(c) + Period(x) ? Here Ai?i is the waiting time
for (promoted) forwarding of update f?.?
Pernod(f??1 + 1) + Delay(f?'?1 + 1) < Delay(c) + At--Hi + Period(x)
Let us see if c can be f?--Hi If so, x must be at least f?--H? + 1.
Period(f??1 + 1) + Delay(f??1 + 1) < Delay(f??1) + Aj?1 + Pernod(L?i + 1)
Delay(f??1 + 1) < Delay(f??1 + 1) --H A1?2 + Aj?1
A2?2 < Aj?1
which is true and therefore fi--Hi < c
(proof that fi > c)
For d and c to be in the same processor at the same time-step waiting to be
forwarded,
Delay(d + 1) > Delay(c) + Period(x), x > c + 1 fsince c arrives before d is
forwarded),
18
De1ay(f?+1) > Delay(c) + Pernod(x) tsince d E Fj ? d < Ji+i --H
DeIa?(f?+1 + 1) > Delay(c + 1) + Per?od(x) fsince c <
DeIay(f? + 1) + Pernod(f? + 1) > Delay(c+ 1) + Per?od(x) ? Using definition
of fj+i?
fi > c
19
