BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR92-1280
ENTRY:: 1993-10-14
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: Stable Numerical Algorithms for Equilibrium Systems
AUTHOR:: Vavasis, Stephen A.  
DATE:: April 1992
PAGES:: 34
ABSTRACT::
An equilibrium system (also known as a KKT system, a saddlepoint system or a 
sparse tableau) is a square linear system with a certain structure. G. Strang 
has observed that equilibrium systems arise in optimization, finite elements,
structural analysis and electrical networks. Recently, G. W. Stewart 
established a norm bound for a type of equilibrium system in the case that 
the ``stiffness'' portion of the system is very ill-conditioned. In this 
paper, we investigate the algorithmic implications of Stewart's result. We 
show that all standard textbook algorithms for equilibrium systems are 
unstable. Then we show that a certain hybrid method has the right stability 
property.
END:: CORNELLCS//TR92-1280
BODY::
Stable Numerical Algorithms
for Equilibrium Systems*
Stephen A. Vavasis**
TR 92-1280
April 1992
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
*This work supported by an NSF Presidential Young Investigator grant with matching
funds received from Xerox Corp.
**Department of Computer Science, Upson Hall, Cornell University,
Ithaca, NY 14853.
Stable numerical algoritlims for
equilibrium systems*
Stephen A. Vavasist
April 29, 1992
Abstract
All equilibrium system (also known as a KKT system, a saddle-
point system, or a sparse tableau) is a square linear system with a
certain structure. G. Strang has observed that equilibrium systems
arise in optimization, finite elements, structural analysis, and electrical
networks. Recently, G. W. Stewart established a norm bound for a
type of equilibrium system in the case that the "stiffness" portion
of the system is very ili-conditioned. In this paper we investigate
the algorithmic implications of Stewart's result. We show that ali
standard textbook algorithms for equilibrium systems are unstable.
Then we show that a certain hybrid method has the right stability
property.
1 Equilibrium systems
Recently, Strang [1986] has observed that the problem of solving the struc-
tured linear system
D -A			(1)
*This work supported by an NSF Presidential Young investigator grant, with matching
funds received from Xerox Corp.
tDepartment of Computer Science, Upson Hall, Cornell University, Ithaca, NY 14853.
for x and y arises in many physical applications. Here, D is an m x m matrix,
A is an m x n matrix, x and b are m-vectors, and y and c are n-vectors. We
call this system an equilibrium system.
The focus of this paper is on the stability of algorithms for this linear
system. Here is a table summarizing the applications, and the interpretations
of D, A, x, y, b, c in these applications. In the case of finite elements, we have
indicated the interpretations in the context of a heat equilibrium problem
(Poisson's equation).
Optimi-			Electrical			Finite			Structures
zation			networks			elements
D			Objective			Resistances			Thermal			Element
function			resistance			flexibility
Hessian
A			Constraints			Node-wire			Node-			Node-
adjacency			element			element
geometry			geometry
x			Primal			Currents			Element			Element
variables			heat flow			forces
y			Lagrange			Voltages			Nodal			Nodal
multipliers			temperatures			displacement
b			Objective			Voltage			Applied			Element
sources			heat sources			nodal forces
constraints
The reader will observe that there are some similarities among the various
interpretations. For example, one similarity is that, for the three physical
applications, x and c are measured in the same physical units (e.g., amps),
as are y and b (e.g., volts). This is also the case in certain optimization
problems, such as flow problems. This observation has some importance for
numerical algorithms.
In all of these applications, the following two assumptions are common-
place, and they are made throughout the paper.
Al. Matrix D is symmetric and positive definite.
2
A2. Matrix A has rank n, i.e., full column rank.
These assumptions imply that (1) is a nonsingular linear system with a
unique solution.
The main focus of this paper is what happens when D is severely ill-
conditioned. In the case that D is well-conditioned, the numerical problems
associated with solving (1) are generally not as troublesome, and most stan-
dard methods will give good answers. Thus, we make the following assump-
tion:
A3. Matrix D is very ill-conditioned.
The most natural framework for this assumption is an optimization algo-
rithm involving a barrier function. The primary example of a barrier function
in optimization is the class of interior point methods for linear programming.
In an interior point method, matrix D becomes very ill-conditioned when
the iterate approaches the boundary of the feasible region. (See Karmarkar
[1984] for the first interior-point method proposed for linear programming.
See Wright [1992] for a description of barrier methods, linear programming,
and their relationship.) For linear programming, since the solution is always
on the boundary of the region, ill-conditioning in D always occurs during the
algorithm.
Ill-conditioning of D can also occur in the other three applications listed
above. In electrical networks, ill-conditioning occurs, for example, when one
wishes to model the leakage currents to ground through insulators in the
electrical network. In this case, some of the resistances will be many orders
of magnitude larger than others, leading to a very ill-conditioned D. The
same ill-conditioning occurs if the linear equilibrium system is a model of a
time-varying system in which certain circuit elements have switched "off."
In structural analysis, ill-conditioning in D occurs when the elements of
the structure have widely different rigidity properties. In the finite elements
for a heat application, ill-conditioning in D would occur if part of the domain
under analysis were a thermal insulator and another part were a thermal
conductor.
In the case that D is severely ill-conditioned, it is not at all apparent
that an accurate numerical solution to (1) is possible, since ill-conditioning
in D will presumably mean extreme sensitivity to roundoff errors. A recent
theorem due to Stewart [1989] shows that, in principle, an accurate solution
3
can be computed under further assumptions. In Section 2 we review Stewart's
result. Further theory is developed in Section 3. None of the standard
algorithms are stable, however, when compared with the theoretical bound,
as we demonstrate in Section 4. Our main contribution is to argue in Section
5 and Section 6 that a certain algorithm can in fact accurately solve (1).
Most of the analysis of this paper is valid only under certain further
assumptions, which we now state.
A4. We are more interested in recovering y in (1) rather than x.
A5. In (1), the vector c is zero.
A6. Matrix D in (1) is a diagonal matrix.
In Section 8 we explain how A4, AS, and A6 could be relaxed.
A4 is similar to obtaining error bounds on individual components of the
solution of a linear system. See, for example, Chandrasekaran and Ipsen
[1991]. In our work, the objective is to obtain a bound for a block of com-
ponents of the solution.
In the context of optimization, AS means that the current point is fea-
sible. In the context of electrical networks, this assumption means that no
external current sources are applied-only voltage sources (e.g., batteries or
generators).
A6 holds in the context of interior point methods for linear programming.
It also holds for electrical networks composed of batteries and resistors. In
structure analysis and finite elements, this assumption holds in the case that
the domain is a one-dimensional object (e.g., heat flow in a thin bar). For
two- and three-dimensional objects, one usually obtains a matrix D that is
block-diagonal.
A fifth application of the equilibrium system, not listed in the above table,
is a discretization of Stokes flow. Stokes flow is a linear approximation to the
Navier-Stokes equations for incompressible fluid flow, under the assumption
of a very low Reynolds number. A6 is never satisfied for Stokes flow (even
in one dimension), which is why we do not discuss it further.
4
2 Stewart's norm-bound result
Under the assumptions made in the last section, we can apply Stewart's
theorem. Because of A4, it is useful to write an explicit formula for y in
terms of the other variables. This formula is obtained by eliminating x from
(1), and also substituting A5:
ATD?lAy = ATD?lb.			(2)
Because of Al and A2, the matrix ATD-lA is invertible. Thus, this linear
system uniquely determines y:
y = (ATD?lA)-lATD-lb.			(3)
We are now in a position to state Stewart's theorem. For the sake of
completeness, we replicate his proof of the theorem, because we refer to
some of the proof steps in our algorithm construction.
Theorem 1 (Stewart, 1989). Let V denote the set of all positive definite
m x m real diagonal matrices. Let A be an m X n real matrix of rank n. Then
there exist constants XA and XA such that for any D ? V,
(a) II(ATD?lA)?lATD?lII < XA, and
(b) IIA(ATD?lA)?lATD?lII < XA
In this theorem, we have assumed the use of some matrix norm that
is induced by a vector norm (also denoted II II)
A closely related result was obtained independently by Todd [1990j as
part of an analysis of Karmarkar's algorithm. In particular, Todd shows that
II(ATD?lA)?lATD?lbII < c(A,b).
This bound can be extended to obtain Theorem 1 with a compactuess argu-
ment involving b. Todd's proof is geometric and completely different from
Stewart's. We follow Stewart's proof, because it contains algebraic informa-
tion useful for further development.
Before beginning the proof, we discuss the interpretation of the theorem.
The matrix in (a) is (except for sign) the same matrix that is applied to b to
5
obtain y according to (3). Thus, (a) says that y cannot be much larger than
no matter how D is chosen. In the context of electrical networks, this has
a very natural physical interpretation: For a fixed electrical network with no
current sources, no matter how badly the resistors are out of scale, there can
never be a voltage level in the circuit that is much higher than the applied
battery voltages.
Statement (b) also has an algorithmic interpretation. Specifically, if we
know y and are trying to obtain x, we observe from (1) that
Dx - Ay = b.
The second term on the left-hand side is precisely
A(ATD-l A)-lATD-l b.
Thus, the matrix in (b) can occur in the process of recovering x.
Proof. (Stewart) First, we observe that statements (a) and (b) imply each
other. For example, if we could prove (a), then we would observe that
llA(ATD?lA)?lATD?l II < IlAll ll(ATD?lA)?lATD?l II,
and thus we could take XA to be II All XA. Similarly, the following inequality
allows us to derive (a) from (b):
ll(ATD?lA)?lATD?l II < ll(ATA)?lATll. llA(ATD?lA)?lATD?l II
Thus, we will prove (b) only. We first need a preliminary lemma, which
is also due to Stewart. This lemma is cited later in the paper.
Lemma 1 (Stewart). Define two subsets X and Y of IR? as follows:
X			: z = Aw for some w, and lizil =
and
Y = : ATD?lz = 0 for some D e DJ.
Then X A Y =
Here, Y denotes the closure of Y in the topological sense.
6
Proof. (Stewart) Suppose there were a z ? X n Y, so that z = Aw
for some w and lizil = 1. The statement that z is in Y means that there
exists an infinite sequence of vectors z1, z2,... converging to z, and an infinite
sequence of matrices D1,D2,... such that ATDk?lzk = 0 for all k. Taking the
inner product of this equation with w, and substituting wTAT = zT, yields
zTDk?lzk = 0 for all k. But since Zk converges to z, there is some k large
enough such that for all nonzero coordinate entries of z, the corresponding
entry of Zk has the same sign. This means that zTDk?lzk > 0, contradicting
the previous equation. I
Now we use the lemma to conchide the proof of Theorem 1. Since X is
compact and Y is closed, and since they are disjoint, there is a positive lower
bound, say p, on the distance between these spaces measured in the
norm.
Next, choose some D E D. Choose an arbitrary vector x. Let
w = (ATD?iA)?iATD?ix.
Let y = Aw. The goal is to get an upper bound on y in terms of x. A
one-line calculation shows that ATD?i(x --H y) is zero. Thus, let v = x --H
so that ATD?iv = 0. Set t = 1/??y?I. Then we have
tv + ty = tx.
Note that tv is in Y and --Hty is in X. Thus, the norm of the left-hand side
of this equation is at least p. Thus, IItxII > p, i.e.
?II?II < lxii.
This gives us the upper bound of l/p on XA, and concludes the proof of the
theorem. I
Stewart gives an estimate for XA which was later proved by O'Leary [1990]
to be an exact formula. The formula seems to requires an exponential number
of steps to compute XA Moreover, this formula is not completely constructive
in finite precision arithmetic because of the observation due to Stewart that
the parameters XA and XA in Theorem 1 do not depend continuously on A. in
particular, Stewart shows that XA for A = [0, 1]T is 1, but XA for A = [?, 1]T
tends to infinity as ? tends to zero.
7
It would be very interesting if there were an algorithm to compute or
approximate XA, XA overcoming either of these difficulties (i.e., the algorithm
gives the right answer in finite-precision arithmetic, or the algorithm requires
only a polynomial number of steps, or both). In Section 7, we give a bound
for XA, kA for a special class of matrices.
One would like to apply Theorem 1 to give algorithm stability results for
finite-precision arithmetic. Here is an example of a straightforward result in
this direction. For this theorem, is restricted to being a ?norm. (In a
?norm we know the norm of a diagonal matrix explicitly.)
Theorem 2. Let y be the second component of the exact solution to (1)
(when c 0). Let ? be a computed solution to this system, such that y is
the exact solution to the perturbed system
+E -A			(4)
where E satisfies the bound El < e ID and hell < elibli, where e <			1/3.
Then
II? --H YII < xAe Ilbil (kA + 8/3).			(5)
The strength of this theorem is that the error bound in (5) is independent
of the condition number of D.
Note that we have used the matrix absolute value notation from Golub
and Van Loan [1989]. Thus, the assumption lEl < e ID implies that E is
also diagonal.
Proof. By elimination as described above, we have the following formulas
for y and ?:
ATD?iAy = ATD?ib
and
ATD?lAy + ATFAY = ATD-i(b + e) - ATF(b + e).
Here, F denotes (D + E)-i --H D-1, i.e., ED-i(D + E)-i. Subtracting these
equations yields
ATD-iA(y - y) + ATFAY = ATD?ie - ATF(b + e).
8
We can add and subtract ATFAy on the left to obtain
ATD?lA(y - y) + ATFA(y - y) + ATFAY = ATD?le - ATF(b + e),
which can be written
AT(D-l + F)A(Y - = ATFAY - ATD?le - ATF(b + e).
Define G = D-1 + F = (D + E)-1, another diagonal matrix. Introduce new
vectors ci = G?1FAy, e2 = G?1D?1e, and e3 = G-1F(b+e). Then we can
rewrite the preceding equation as
ATGA(y --H = ATG(ei + e2 + e3).
Now we can conclude from Theorem 1 that
IJY --H ?II < x?(IIei II + IIe?II + IIe?II).
Thus, the theorem is proved once we analyze the three terms on the right-
hand side. First, we have
Ileill =
Next, we have
II(D + E)FAyII
II(D + E)?1EAyII
lIED--H1Il IlAyll
?IIAyll
e.x??.IIbII.
lIe?Il = IlG?'D?1eIl
= II(D+E)D?1eII
<--H II(D+E)D-1II hell
<--H U+?.?.llbll.
Note that 1 + ? < 4/3. Finally,
lle?II = IIG-1F(b + e)II
<?.U+?.llbll.
As above, 1 + ? < 4/3. This proves the theorem. I
9
For the rest of the paper, we regard (5) as the "ideal" bound that could be
satisfied by a finite-precision algorithm. The stability result of this theorem is
somewhat different from well-known perturbation results. Usually, a bound
on the relative error in y is obtained in terms of e and the other parameters.
A relative error bound on y is not possible for the equation at hand. Such a
bound would mean that it is not possible for II? --H ??J to be large when II?II is
small. However, one can select a nonzero b to make y arbitrarily small zero
in fact because b in general has more dimensions than y. Thus, the bound
in the theorem seems to be the best form we could hope for.
We will say that an algorithm for (1) is stable if, in the presence of finite-
precision arithmetic, an error bound of the same form as (5) is satisfied,
where e is on the order of the machine roundoff. In particular, the error
bound should have the form
Jy --H YIj < e f(A) Ilbil,
where f(A) is some function of A not depending on
If we could show that a finite-precision algorithm satisfied the hypothesis
of the theorem (i.e., the computed solution satisfied (4)), then it would au-
tomatically be stable. Unfortunately, we do not know of any finite-precision
algorithm to solve (1) that achieves the conditions of this theorem. In partic-
ular, it seems that the error bound for any obvious algorithm would involve
a backwards error term affecting not only D but also A. Since XA is not
continuous with respect to changes in A, the theorem cannot be extended to
the case when A is also perturbed. In Section 5 we derive other conditions
to yield an algorithm that is stable.
3 Further development of the theory
In this section, we develop the theory further before turning to algorithms
in the next section. First, we make a few remarks about the difference
between XA and XA Then we look at the relationship between XA, XA and
the corresponding variables for the dual problem.
The two parameters XA, XA have different properties with respect to
changes in A. If A is multiplied by a constant, then XA scales in a reciprocal
manner. Parameter XA is unchanged when A is scaled. It should be apparent
from (5) that the ideal situation is when both constants are small. Thus, we
10
will think of A as "well-conditioned" if XA is on the order of unity, and XA
is on the order of i/?? All. Note that XA is at least 1 in any norm: this is
seen by taking D to be the identity matrix in Theorem 1(b), in which case
A(ATA)-lAT is an orthogonal projection matrix with spectral radius of 1.
In the last paragraph we observed that XA is unchanged by scaling. More
generally, one sees by inspection that XA remains unchanged if A is replaced
by Al? where R is any nonsingular n x n matrix. This is the same as saying
that XA depends only on the span of the columns of A, and not the particular
basis selected for that span. The parameter XA, on the other hand, depends
also on the degree of linear independence of the columns of A.
In the proof in the preceding section, we bounded XA in terms of l/p.
We now extend that result to show that this bound is an equation: the
supremum of llA(ATD?iA)?iATD?ill taken over D ? D is exactly I/p. We
already proved that this supremum is at most l/p. For the other direction,
we observe that if x ? X and y ? Y, where X,Y are as in Lemma 1, then
the following argument puts a lower bound on x --H y. Let D be chosen so
that ATD?ly = 0, and let w be chosen so that x = Aw. Then
so
This yields
so
ATD?l(x - y) = ATD?lx = ATD?lAw
w = (ATD?lA)?iATD?i(x --H
x = Aw = A(ATD?lA)?lATD?i(x -
lixil < XA lix --H yll
But lixil = 1 by the assumption that x C X, so we see that 1/X?A is a lower
bound on llx--Hyll.
We now look into the relationship between (1) and its dual problem. Let
Z be a ntttlspace basis for AT, that is, an rn x (m --H n) matrix Z of full column
rank such that ATZ --H 0. We want to know the relationship between XA and
Xz Note that A is a nullspace basis for ZT, so any relationship between
XA, Xz will also hold if A, Z are interchanged. Note also that ?z depends
only on the span of the columns of Z: in other words, it does not matter
which nullspace basis Z is selected.
11
In order to obtain a bound for Xz, we consider two algorithms for solving
the following version of (1), where x0 is some arbitrary m-vector.
ADT );4			(6)
First, observe that the second equation of (6) is ATx = ATxo, i.e., AT(x --H
x0) = 0. This is the same as saying that x --H x0 is in the nullspace of A,
i.e., there is an m --H n-vector q such that x = x0 + Zq. Substituting this
into the first equation yields D(x0 + Zq) --H Ay = 0. Multiply by ZT, using
the fact that zTA --H 0 to obtain zTDzq = zTDxo. Solving for q yields
q = (zTDz)?lzTDxo. Substituting this formula for q into x = x0 + Zq
yields,
x = z(zTDz)?lzTDxo + xo.
Since x0 is arbitrary, we see that:
lix --H xoll
IIz(zTDz)?lzTDI; = sup			lixoll			: x0 $ O;x solves (6)
This means that
Xz = sup lx--Hxoll : D ? D;x0 $ O;x solves (6)
lixoll
By applying the triangle inequality, we obtain the bound
lxii
Xz < sup 1 + lixoil : D ED; x0 $ Ox solves (6)
(7)
Now, we compute this supremum by solving (6) in a different way. The
first equation of (6) is Dx --H Ay = 0. Multiplying by ATD-l yields ATx --H
ATD?lAy = 0. Substituting the second equation yields ATxo?ATD?lAy =
0. Solving for y yields y --H (ATD?lA)?lATxo. Use the fact that x = D?1Ay
to obtain x = D?lA(ATD?lA)?lATxo. Thus,
lixil
lixoil < llD?lA(ATD?lA)?lATii. (8)
Observe that the matrix whose norm we are taking on the right-hand side of
(8) is precisely the transpose of the matrix used to define XA in Theorem 1(b).
12
Thus, we see that the norm on the right-hand side is bounded by ?mXA,
where ?m is the multiplicative factor that makes the following inequality a
true statement for all matrices B:
lIBTIl < ?mtIBII.
For example, ?m = 1 if we use the matrix 2-norm, and ?m = m if we use the
i-norm or co-norm. Substitute (8) into (7) to obtain:
Xz < 1+?X?A
This gives a relationship between Xz and XA A formula for xz in terms of
XA is not possible without making further assumptions (for example, that
ZTZ is well-conditioned).
4 The standard algorithms
In this section we describe three standard algorithms for (1). All three of
these algorithms are shown to unstable in the sense of stability given in
Section 2. By "standard" we mean that these algorithms are described in
optimization textbooks such as Coleman [1984], Gill, Murray and Wright
[1981], and Fletcher [1987], electrical engineering textbooks such as Chua,
Desoer, and Kuh [1987], or civil engineering textbooks such as Timoshenko
and Young [1965].
4.1 Symmetric indefinite factorization
Observe in (1) that if we replace y by --Hy, the coefficient matrix becomes
the symmetric matrix
Therefore, we could think about solving this equation with a standard "sta-
ble" algorithm for symmetric indefinite linear systems. (See Golub and Van
Loan [1989] for some algorithms for this purpose.) Symmetric factorization
has the disadvantage that it does not respect the block of zeros, and therefore
requires more operations than necessary. In the electrical engineering con-
text, solving (1) directly is known as sparse tableau analysis. In optimization,
this algorithm is called the augmented system method.
13
This algorithm should be rejected for the following numerical reason:
The algoritlim "mixes" b, c on the right and x, y on the left during the
forward substitution and back-substitution process. This does not make sense
physically, because b, c and x, y have different physical units and could be
on vastly different scales numerically.
To put it another way, this algorithm is sensitive to scaling matrix D
by a constant multiple. On the other hand, y is mathematically unchanged
by such a scaling. Indeed, for a simple electrical circuit depicted in Fig. 1,
we scaled D by varying amounts and applied the Parlett-R?eid [1970] sym-
metric factorization. Substantially different values of y (the voltages) were
recovered, some accurate to less than one decimal place.
These tests, as well as the others in this paper, were conducted in MAT-
LAB. MATLAB, a software package for interactive numerical computation,
is a trademark of The Mathworks, Inc. All the computations were done in
IEEE double-precision arithmetic, with about 15 decimal digits of accuracy.
Standard stability results for algorithms like the Parlett-Reid algorithm
do not apply, because of our stated interest to recover y accurately, which is
only a part of the solution vector. Moreover, the accuracy of these algorithms
depends on the condition number of the equilibrium system (1), which is
linked to the condition number of D.
For the problem in this example, XA, XA are both small because A is
a reduced node-arc incidence matrix (see Section 7). Thus, the failure of
this algorithm to satisfy something like (5) indicates that the algorithm is
unstable, not that the problem is poorly posed.
Bj5rck [1992] has analyzed symmetric factorization of (1); his work at-
tempts to identify the proper selection of the scaling factor to be applied to
D in order to obtain X or y accurately. Even with Bj5rck's choice of scaling
factor, however, we do not see any evidence that the resulting algorithm will
be stable in the sense proposed in Section 2.
4.2 The range-space method
This method obtains y by solving (2) explicitly. Since ATD-lA is posi-
tive definite, (2) can be solved with Cholesky factorization. Note that this
method, as well as the nullspace method described below, is insensitive to
scaling of D by a constant multiple.
14
`7
(a)
10?15
A			,			1
--H1
0
010			1
001			1			0
100			0
1 0			10--H'?			0
0 --H1			1			0
1 --H1			1			0
Figure 1: The circuit depicted in (a) yields an equilibrium system of the
form (1) with D, A, b, as in (b). Here, all the resistors have size 1, and wires
without an indicated resistor have resistance 10?i5. The correct values of the
voltages of the three nodes are (1.00,1.00,0.67) accurate to two digits. The
Parlett-Reid algorithm, however, yields (1.31,1.31,0.88) when D is replaced
by l020D, and yields (0.86,0.98,0.58) when D is replaced by lO25D.
15
(a)
--H1
0
010			1			0
001			1			0
100			1
1 0			10--H15			0
0 --H1			10-15			0
1 --H1			1			0
Figure 2: The circuit depicted in (a) yields an equilibrium system of the
form (1) with D, A, b, as in (b). llere, all the resistors have size 1, and wires
without an indicated resistor have resistance 10?15. The correct values of the
voltages of the three nodes are (0.33,0.33,0.33) accurate to two digits. The
range-space algorithm, however, yields (0.28,0.28,0.28).
This method is called the range-space method in optimization. In struc-
tural analysis it is known as the displacement method. In electrical engi-
neering it is called the nodal analysis method. In finite elements, the matrix
ATD-lA is known as the assembled stiffness matrix. This matrix is known
as the Schur complement in optimization.
It should be apparent that if D is severely ill-conditioned, then ATD-l A
can also be severely ill-conditioned. This means that inaccurate answers
answers not respecting (5)) may be obtained. An example of a circuit solution
obtained with the range-space method is indicated in Fig. 2. Observe that
not even one significant decimal place is obtained in the answer.
16
In our testing of the range-space method, we observed that the choice of
numbering for edges and nodes affected the resulting answer, and some num-
berings for the example in Fig. 2 gave better answers than the numbering
depicted. It is known in the optimization community (Y. Li, private commu-
nication) that a good ordering in optimization problems is the one in which
the elements of D are sorted with the largest first. This is related to orderings
for least-squares problems (see Van Loan [1985], and Powell and Reid's [1969]
analysis of Businger and Golub's [1965] algorithm). Nonetheless, even with
a different ordering, we do not see any evidence that the resulting algorithm
will be stable in the sense of Section 2, because ATD-l A is ill-conditioned
regardless of the ordering.
4.3 The nulispace method
This method obtains x first, and then y. The essential ingredients of the
nullspace method were already described in Section 3. Following Fletcher,
we can describe this method as follows. First, a pair of matrices Y, Z is
obtained, where Y is an m x n matrix such that ATY = 1, the identity, and
Z is a nullspace basis, that is, an m x (rn --H n) of full column rank such that
ATz = 0. Matrix Y is called a right-inverse of AT.
The most stable way to obtain Y, Z is via a QR factorization of A, say
A = QR. Then Z is set to the last m --H n columns of Q, and Y is set to the
product Q1R1-T, where Qi is the first n columns of Q, and 1?1 is the first n
rows of R.
Once Y, Z are obtained, we observe that the equation ATx = 0 means
x = Zq for some (m--Hn)-vector q. This is because Z spans the nullspace of A.
Substituting x = Zq into the equation Dx --H Ay = b yields DZq --H Ay =
Now multiply by ZT, using the fact that ZTA --H 0, to obtain zTDZq =
zTb. The matrix zTDz is known as the reduced Hessian in the optimization
literature.
The linear system ?T??q = zTb is symmetric and positive definite,
and hence may be solved with Cholesky factorization to obtain q. Once q is
obtained, x can be obtained with the formula x = Zq. Finally, y is obtained
from the equation Dx --H Ay = b by multiplying through by yT, yielding
y = YT(Dx --H
This method is known as the nulispace method in optimization. It is called
the force method in the civil engineering, and loop analysis in the electrical
17
10-16
(a)
10-16			-b =
010
001
100
--H1 1 0
10
01
10-16
--H1			10--H16
--H1
0
0
0
0
0
Figure 3: The circuit depicted in (a) yields an equilibrium system of the
form (1) with D, A,b, as in (&). Here, all the resistors have size 1, and wires
without an indicated resistor have resistance 10-16. The correct values of the
voltages of the three nodes are (0.67,0.33, 0.67) accurate to two digits. The
nullspace algorithm, however, yields (0.89,0.52, 1.16).
engineering.
It should be apparent that this method suffers from the same flaw as the
range-space method. The matrix zTDz could be arbitrarily ill-conditioned,
and hence solving a linear system with this matrix could give a highly in-
accurate answer. Indeed, for the example in Fig. 3, the nullspace method
returned an answer for y without any digits of accuracy.
18
5 A stable hybrid algorithm
We now describe a stable algorithm for (1). We call this algorithm "hybrid"
because it works with both the range-space and nullspace of A. The scaling
matrix D is applied to the nulispace. Thus, the particular method described
in this section is referred to as the "nulispace-scaled hybrid" (NSll) method,
to distinguish it from other hybrid methods that we introduce later.
The first step of the NSH method is to compute a nulispace basis V
for ATD?l. Thus, V is an m x (m --H n) matrix of rank m --H n such that
ATD-lv = 0. The matrix V should be multiplied by a constant so that
IlAll = II VII
Then the linear system
[A, V]
(9)
is solved for y and q. We claim that y is indeed a solution for the original
problem. This linear system can be rewritten as Ay + Vq = --Hb. Multi-
plying by ATD-l causes the second term on the left to drop out, yielding
ATD?lAy = ATD-lb. Thus, this method solves (2) without explicitly
forming ATD?lA. Notice that D does not appear at all in the linear system
(9).
The NSll method does not appear in standard textbooks, but it has
appeared in the literature. For example, Coleman and Li [1989] suggest a
similar approach (called the "full-space" method) for optimization, but with
the scaling done in a different manner.
We now investigate the numerical stability of this method. We start with
a preliminary lemma. For the rest of this section is some ?norm.
Lemma 2. Assume the null basis V computed above is normalized so that
IlVil = IlAll. Suppose also that V is well-conditioned, in the sense that there
zs a constantO >0 such that IIVxII > (IIVII.IIxII)/Oforallx. LetM = [A, V].
Then
< 2XA lIAlI + 20 (1 + xA)
We denote the right-hand side of this inequality by ?A
Proof. We use the notation ?(M) to denote the condition number of M, that
is, IlMil IM--H1I1. The basic idea of this proof is that the condition number of
19
M = [A, V] depends on three things: the condition of A, the condition of V
(i.e., the value of ()), and the angle between the span of A and the span of V.
But this angle cannot be too small because of Stewart's lemma in Section 2.
Suppose that Mz = c for an arbitrary c. Split z into two components, say
y and q. Then Ay + Vq = c. Multiply by ATD-l to obtain ATD?lAy =
ATD?lc. Thus, II?II < xAIIcII and IlAyll < x?AIIcII by Stewart's theorem.
Next, Vq = c --H Ay, so
Finally,
Thus, we have
< lid + x?AIIcJJ.
liqil ?
0
--H IlAll (1 + xA) lid
lizil < II?II + Ilcill
0
? Ilcil XA + IlAll
(1+xA)
Since liM--H111 is the supremum of lizil in the case that lid = 1, the parenthe-
sized factor on the left hand side is an upper bound on lIM-1 II.
Now finally, observe that IlMil < 211A11. Thus, multiplying through by
211A11 gives the result. I
Note that the condition that IlAll = II VII does not have to be satisfied
exactly; it suffices in the above argument to have A and V on the same order.
The scalar ?A in the preceding lemma has a complicated form. One sees,
however, that it is independent of D, and is invariant if A is multiplied by
a constant. We now can state the main theorem about the stability of the
NSH method.
Theorem 3. Assume V satisfies the conditions of the previous lemma. Sup-
pose that the algorithm to solve the linear system Ay + Vq = --Hb in floating
point arithmetic returns approximate solution (Y, q) such that
(A + E)y + (V + F)4 = -b + e,
20
(10)
where Ii[E,F]ii < eli [A, V] ii and hell < elibil. Assume that e <
Then we have the following error bound:
iiy?yii<3e?A2
-			liAli
Ilbil			(11)
where y is the true solution.
Remarks.
1. Note that (11) indicates that the NSll algorithm is stable, because D
does not enter the bound.
2. The theorem seemingly requires that an exact nulispace basis V of
ATD-l be computed. We remark further on this in the next section.
3. The condition that the floating-point algorithm satisfies an error bound
like (10) is very natural, and any well-known stable algorithm, say
Gaussian elimination with partial pivoting, satisfies such a bound.
Proof. Let z = (yT, qT)T and z = (yT, 4T)T. Following the standard anal-
ysis of Golub and Van Loan [1989], Theorem 2.7.2 and using the previous
lemma, we obtain the bound:
But
Thus,
liz --H zil < 3e?A			lizil
lizil ? iiAI?? ii			ilbil
`CA
ii			All			libli.
llz--Hzll <?3e iikii			libli.
Finally, ll? --H ??? is at most liz --H zil. I
21
6 Obtaining V for Theorem 3
As mentioned in the previous section, Theorem 3 seemingly requires V to
be computed exactl?. In fact, it is implicit in the theorem that it suffices to
compute a matrix V such that V + F' is a nullspace?basis for ATD-l, where
F' is a small error matrix. This is because using V instead of V in (9) is
equivalent to compounding the error F in (10) with the further small error
so the same bound holds with a larger constant.
The standard general-purpose algorithms for nullspace bases do not guar-
antee that the computed nullspace basis is only a small distance from a true
nullspace basis (i.e., a forward-error bound). Instead, these algorithms guar-
antee that the computed nullspace basis is an exact nullspace basis for the
perturbed matrix ATD-l + E (i.e., a backward-error bound). This is not
useful in the present context, because adding E to ATD--Hl spoils the special
structure that allows Stewart's theorem to be applied.
We now describe a technique to obtain a V with a forward error bound.
First, we obtain a nullspace basis Z for AT. Then we obtain a nullspace
basis V for ATD?1 with the formula V = DZR, where 1? is another diagonal
matrix.
To compute Z, ?nd a subset of rows of n rows of A that form a basis,
that is, the n x n submatrix of A determined by those rows is nonsingular.
Now, for each of the remaining m --H n ?? rows, solve for the nonbasic
row in terms of the basis rows. The coefficients of this dependence form a
column vector with m entries, of which at most n + 1 are nonzero. This
yields a fundamental nullspace basis Z, that is, a nullspace basis for AT with
an embedded (m --H n) x (m --H n) identity matrix.
If we assume that XA is reasonably small, then it turns out that this
nullspace basis has a special property, namely, all of the nonzero entries have
roughly the same magnitude. If we think of a vector x as being one column of
we can state this as a lemma. In this lemma, we assume that the oc-norm
is used throughout. The constant xz arises in this lemma; recall that Xz is
bounded in terms of kA, as demonstrated in Section 3.
Lemma 3. Let x be a nonzero vector such that ATx = 0. Assume that x
has nonzero entries in positions fii,...,ip), where ?i1,...,ip) C I1,...,mJ.
Suppose also that for every subset I of size p --H 1 of fii,. .. , ip1, the rows of
A indexed by I are linearly independent. Then the magnitude of the smallest
22
nonzero entry in x is no smaller than iixiioo/xz.
Proof. Without loss of generality, we can take lixil = 1. Let ? be the magni-
tude of the entry in x with the smallest nonzero magnitude, and suppose this
entry is in position ?p. Since rows Zi,..., Zp?l of A are linearly independent,
there is a vector w such that Aw attains arbitrarily specified entries in these
p --H 1 positions. In particular, there is a vector w such that Aw agrees with
x in positions zi,..., Zp?l.
Now, let C be a diagonal matrix with entry `1' in positions zi,. . . ,Zp?l,
and very small entries in the other diagonal positions. Observe that CAw
agrees with x in positions ..... , ?p?i, and has nearly zeros in all the other
positions, including ?p. Let y = CAw. Then we observe that if we define
X,Y as in Lemma 1 for Z, i.e.,
and
X = fz : z = Zw for some w, and liz = 1)
Y = fz: zTC?lz = 0 for some C E VJ.
then x ? X and y E Y. This means that lix --H ?ii is at least l/xz But
lix --H ?ii is arbitrarily close to ? (depending on how close to zero the diagonal
entries of C other than ..... , ?p?l are). Thus ? > l/xz This proves the
lemma, since we assumed lixil = 1. I
This lemma has two important consequences. First, the lemma implies
that the nullspace basis Z is well conditioned. In particular, with suitable
ordering of the rows and columns of Z, we can write it in the form
(12)
where W is a matrix whose entries are at most k? in magnitude.
Second, the lemma has the following more subtle consequence: if the
columns of Z are calculated with a small forward error, this means in fact
that these entries have a small componentwise forward error. We can express
this formally as follows. Let Z be the nullspace basis of A computed with
the above algorithm using exact arithmetic, and let Z be the nullspace basis
computed in the presence of floating point errors. If we assume that the
columns of Z are computed with a small forward error, that is,
lix --H xii ? dlxii,
23
where x is a column of Z and x is a column of Z, then we immediately obtain
the result:
Ix --H xl < ?xzlxl.
The componentwise error bound is important for the following reason: to
obtain V, recall that we form the product DZR, where D is the diagonal
matrix from (1), and R is a diagonal matrix not yet specified. Because D, R
are both diagonal and Z has a small componentwise error, then DZR also
will have a small componentwise forward error (and therefore, a small norm
forward error) no matter how D, R are chosen.
The argument in the last two paragraphs presupposed that the columns
of Z would have a small normwise forward error. A small normwise forward
error is obtained if the basis rows of A are well-conditioned, because the
columns of Z are solutions to a linear systems involving the basis rows. So,
now we ask whether the basis rows of A are well-conditioned. In fact, there
is a bound on the condition number of the basis rows. For simplicity of
notation, assume the basis rows of A are the last n rows (this corresponds to
the ordering in (12)), and let B be the n x n submatrix of basis rows. Let b
be an arbitrary n-vector, and let b be its extension to IR? obtained by filling
in the first m --H n entries with zeros.
Let C be the diagonal matrix with l's in the last n diagonal positions and
very small entries in other positions. Then ATCA NN BTB and ATcb NN BTb,
where the approximations can be arbitrarily close as we make diagonal entries
of C close to zero. Thus, if vector x is the solution to ATCAx --H ATcb then
BTBx BTb. Since B is square and invertible, this last equation means
that Bx N b. But we also know that lIxIl < XAllbll, so lIxIl < XAllbll. Recall
that for A "well-conditioned," we should have XA = ?/llAll, where ? is a
small constant. Finally, IlAll > IlBil 50 XA < ?/llBll. Thus, we have the
bound
lixil			? ll')ll			??ll'?ll
Since Bx N b and b is arbitrary, then we can fix Ilbil = 1 and take the
supremum of lix II over all such b's to obtain the bound:
lIB--H1lI ? (kIlIBlI.
Multiplying through by IlBIl shows that B has a bounded condition number.
24
Thus, we conclude that we can compute Z using the above algorithm such
that Z is simultaneously well conditioned and has a small forward compo-
nentwise error. Now, we apply D on the left. As we see from (12), the upper
--H			x			--H n) block of DZ is some diagonal matrix. Finally, choose a
diagonal matrix 1? so that the upper			--H			x			--H n) block of V = DZR
is once again the identity matrix. (This uniquely determines R.)
This algorithm gives a nullspace basis V for ATD-l with a small forward
error. Moreover, V has the form
V
(13)
There is no reason to believe, however, that V is well conditioned. In fact,
can have arbitrarily large entries if the basic rows of A have large corre-
sponding entries in D.
We now show how to maintain control over the size of W' by correctly
choosing the basis rows in A. (Until now, we have not made any assumptions
about the basis rows except that they are independent.) The algorithm for
obtaining B is the following "greedy" approach. Assign weights to the rows,
where the weight of the ith row is the (i, i) entry of D. Now, select the
row with the lowest weight and insert it in B. Continue appending the
lowest-weighted remaining row of A to B. Before appending a row to B, one
checks that the row is linearly independent from the rows already in B. If a
dependence is found, then the row is discarded. Continue this process until
B has n rows.
Since A is sparse for many applications, this process of testing rows for
independence from previous rows should involve both combinatorial and nu-
merical tests for dependence, such as the inner loop of Gilbert and Heath's
[1987j algorithm for nullspace bases. Indeed, for RNAI matrices (described
in the next section), the process of selecting B is purely combinatorial and
very efficient.
We now claim that if B is selected with this greedy approach, then W' in
(13) has a small bound. Consider the (i,j) entry of V. Assume i> m--Hn, so
that the entry in question falls in the W' portion in (13). Assume that this
entry is nonzero. This means that the jth row of A is expressed in terms of
rows m --H n ...... ,m with a nonzero coefficient for the ith row. We claim
that this implies that dii < djj. Suppose not; suppose that djj <dii. Observe
25
that, by the positions of the rows, i is basic and j is nonbasic. This means
that j was passed over by the greedy algorithm for forming the basis, since
row z was added to B even though it has a higher weight than row j. This in
turn means that row j must have been linearly dependent on rows already in
B when it was encountered by the greedy algorithm. But this is impossible,
because we already know that row j can be expressed in terms of the basis
rows with a nonzero coefficient for row i, so row j cannot be dependent on B
until after row i is added to B. Thus, this contradiction shows that dii <
Finally, when we scale Z on the left by D and on the right by R, it is easy
to see that the (i,j) entry of V is equal to the (i,j) entry of Z multiplied by
dii/djj, which we now have shown to be a positive scalar no larger than one.
This shows that the entries of W' in (13) are no larger than the corresponding
entries of W in (12), and we already have argued that W has a small bound.
If V has the form (13) with a bound on W', then it is well-conditioned in
the sense of Theorem 3. Let x be an arbitrary vector, and consider y = Vx.
Observe that y agrees with x in its first m --H n positions. Thus, II?II > lixil.
On the other hand, jjV? ? 1 + jW'I?. Thus, if we take 0 = 1 + ?W'II, then
we have the bound
IIVxII > (IlVil IIxII)/O
needed for Theorem 3.
The only property remaining that V must have is the equation II VII =
IlAll. This is easily obtained by scaling V uniformly.
7 RNAI matrices
A special class of matrices A arising in many applications are reduced node-arc
incidence (RNAI) matrices. Let G be an directed graph, weakly connected,
with no parallel edges and no self-loops. (The assumptions about connec-
tivity and parallel edges could be easily removed at the expense of a more
complicated exposition.) Assume G has m arcs and n + 1 nodes.
The node-arc incidence matrix for G is a matrix A0 with one row for each
edge of G and one column for each node. Each row contains two nonzero
entries, a `1' and a `--H1.' The `--H1' entry occurs in the column corresponding
to the tail of the arc, and the `1' entry occurs in the column corresponding
to its head.
26
The reduced node-arc incidence (RNAI) matrix A for G is obtained by
deleting a column of A0 corresponding to an arbitrary node. The deleted
node is called "ground" in an electrical engineering context. In an RNAI
matrix, each row has either a `1'/'--H1' pair of entries, or a lone `1' or lone
`--H1' in rows corresponding to arcs with one end grounded.
Electrical networks are the main application in which A of (1) turns out
to be an RNAI matrix. Indeed, the matrix A occurring in Fig. 1 to Fig. 3
is an RNAI matrix. Optimization problems with flow constraints can give
rise to RNAI or related matrices. Structural analysis in civil engineering can
give rise to matrices A that have a block-RNAI structure.
It is a well-known fact of algebraic graph theory (see, for example, Welsh
[1976]) that an RNAI matrix A has full column rank. The process of finding
a basis among the rows of A corresponds to identifying a spanning tree of
G0, where G0 is the undirected graph that results when the arcs of G are
stripped of their orientation. A spanning tree is a subgraph T of G0 that
is incident upon every vertex of G0, is connected, and has no cycles. The
resulting nullspace basis Z has the form (12), with the additional property
that every entry in W is either 0, 1, or --H1. Thus, there is a very good upper
bound on
The greedy algorithm described in the previous section corresponds to
finding a minimum-wetght spanning tree. The minimum-weight spanning
tree can be computed very efficiently; see, for example, Papadimitriou and
Steiglitz [1982]. From this spanning tree we construct Z as described above,
and then V by multiplying DZR. The process of obtaining Z and the basis
is completely combinatorial for an RNAI matrix.
We have tried this version of the NSH algorithm on all the problems in
Section 4 in MATLAB. We used an O(m log n) algorithm for constructing
minimum-weight spanning trees. We obtained answers with about 15 digits
of accuracy in all cases.
These results show that we can easily construct a V with the right proper-
ties for Theorem 3 for RNAI matrices. A natural followup question is whether
the constants in the theorem, namely XA and XA, have reasonable bounds for
RNAI matrices. Such bounds are also important because we want to argue
that the incorrect answers obtained with the standard methods in Section 4
are due to numerical problems with the algorithms, and not ill-conditioning
in A.
In fact, both constants XA, XA have small bounds for RNAI matrices, as
27
stated in the following theorem.
Theorem 4. Let A be an RNAI matrix of size m x n. Then XA < n2 and
XA ? n if the co-norm is used.
Proof. This proof will directly put a lower bound on the distance between a
vector ? ? X and z E Y, where X, Y are as in Lemma 1. This gives a lower
bound for p in the proof of Theorem 1, and therefore an upper bound for XA
We define a cut of G to be a partition of its nodes into two disjoint
nonempty sets, 5 and T. The cut-arcs are those arcs with one endpoint in 5
and the other in T. The forward cut-arcs are those with tail in 5 and head
in T; the reverse cut-arcs are the opposite.
Let z be a vector such that ATz --H 0 where A is the RNAI matrix of G.
Note that z has m components, and each component of z is associated with
an edge of G. Let (5, T) be an arbitrary cut of G. We claim that the sum of
the Zj `s assodated with forward cut-arcs is exactly equal to the sum of Zj'5
associated reverse cut-arcs. To see this, assume the ground node is in T, and
focus on 5 (if the ground node were in 5, we would focus instead on T). Let
q be the n-vector "indicator" for 5 that has a 1 in the ith position if the ith
node is in 5, else a zero. Since ATz = 0, we have qTATz --H 0. Letting v
denote Aq, we have vTz = 0.
But there is a combinatorial interpretation for the entries in v. In par-
ticular, for an arc with either both endpoints or neither endpoint in 5, the
corresponding entry of v is zero. For a cut-arc, the entry of v is +1, with
the sign depending on which endpoint is in 5. Thus, the statement vTz = 0
means precisely that the sum of the Zt"5 on forward cut-arcs is equal to the
sum of Zj?S on reverse cut-arcs.
An implication of the result in the last paragraph is as follows. For any
cut (5, T), and for any z such that ATz --H 0, there has to be either a forward
cut-arc with a nonnegative value of Zj, or a reverse cut-arc with a nonpositive
value of Zj. This implication follows by contradiction: if all the forward cut-
arcs have negative Zj`5 and all the reverse cut-arcs have positive Zj, then their
sums could not be equal.
Now, more generally, let z be a vector such that ATD?lz = 0, where D
is an arbitrary positive definite diagonal matrix. Then, since z has the same
pattern of signs as Dz, and Dz is a null vector of AT, we conclude that for
any cut (5, T), z has the property described in the previous paragraph.
28
Next, let ? be a vector such that ????? = 1, and ? = Aw for some w.
The statement ? = Aw has a combinatorial interpretation: each entry of w
may be thought of as "potential" associated with the corresponding node of
G (with the "potential" of the ground node being zero). Then if ? = Aw, we
can think of the component of ? associated with each arc to be the "potential
drop," that is, the difference in potential between the head of the arc and
the tail.
Since ??1?? = 1, there is at least one potential drop of magnitude 1.
Let us write down the potentials of all the nodes, tv???'??.; `u'n1+i' including
ground, in decreasing order. In other words, the list ti'11, w2,.. . ?tvn1+i has the
same entries as the vector (w, 0), but in decreasing sorted order.
Then the difference between the tv'? and tv'n+i is at least 1. This means
that, in the list of sorted potentials, there must be a gap somewhere in the list
of size at least 1/n. In other words, there is a j such that tv,' --H > 1/n.
Now, let (5, T) be the cut such that 5 contains the nodes associated with
and T contains the nodes associated with ?;??,. ?,ti'n'+i Then
every node in 5 has potential at least t/n higher than every node in T.
This means that the component of ? associated with any forward cut-arc is
negative with magnitude at least 1/n, and the component of ? associated
with any reverse cut-arc is positive with magnitude at least 1/n.
But finally, this means that if z is a vector such that ATD?lz = 0, then
for the cut (5, T) constructed in the last paragraph, there is at least one
forward cut-arc where the components of ? and z differ by i/n, or at least
one reverse cut-arc where the components differ by 1/n.
This shows that if A is an RNAI matrix, then the minimum distance
between the two sets X and Y described in Lemma 1 is at least 1/n in the
co-norm. Thus, XA < fl,
Now we bound XA The argument so far has shown that if y is a solution
to ATD?iAy = ATD-ib with IIbII? = 1, then IAy??? < n. The remaining
task is to get a bound on IIyII? Recall the combinatorial interpretation of
y and Ay: if y represents potentials at the nodes, then Ay represents the
potential drops. If all potential drops are at most n, the no node could have
a potential higher than n2, since there is a path of at most n arcs between
any node and ground. Thus, we see that II?II? < n2 I
29
8 Generalizing the problem
In this section we comment on the possibility of lifting some of the assump-
tions made in Section 1. One assumption made throughout the paper is that
we are interested in recovering y in (1) rather than x. Assuming we have y,
x can be recovered from the equation Dx --H Ay = b. It is easy to see that
we can compute Dx stably (i.e., with only small normwise error). Then x
is obtained by scaling the computed Dx with D-1. The recovered x would
have a relative error that could be on the order of the condition number of D.
This is inherent in the problem equation (1) is ill-conditioned with respect
to x when D is badly scaled, and there is no theorem like Theorem 1 that
holds for x. Thus, we see that recovering Dx accurately is the best that
could be hoped for, in the case that b $ 0, c = 0.
This instability is not merely a numerical artifact there is a correspond-
ing physical instability. For example, in the context of electrical networks
with batteries, it is possible to construct a circuit as follows. One arc of
the circuit has very low resistance, but also very small current because the
battery voltages are balanced so that the potential across the arc is small.
Then a small perturbation to the battery voltages (i.e., b) could cause a great
multiplication of the current (i.e., x). On the other hand, Dx, the voltage
drops across the resistors, would suffer only a small perturbation.
Another assumption was that c = 0. Can we drop this assumption? One
special case, which is dual to the case considered for most of this paper, is
when c $ 0, b = 0, and we are interested in recovering x instead of y. In this
case, let Z be a nulispace basis for AT, and let x0 be an arbitrary solution to
the underdetermined system ATx = c. Then the formula for x is obtained
by solving: zTDzq = zTDxo, and then setting x = Zq + x0. The equation
zTDzq --H zTDxo is of the exact format of (2). Thus, we can apply the
machinery developed in this paper, except with Z in place of A, x0 in place
of b, and D in place of D-1. Notice that the nullspace of ZT is the same
as the the range of A. Thus, the version of the hybrid algorithm for this
problem would be the range-space scaled hybrid algorithm, R?SH. Note also
that we have derived relationships between XA and kz in Section 3.
In the most general case that b $ 0, c $ 0, it seems unlikely that we could
obtain an algorithm that computes either x or y accurately when D is ill-
conditioned, because both variables become ill-conditioned. A compromise
algorithm might be as follows. Let Z be a basis for the nullspace of AT.
30
Then solve the linear system:
D?112Ay + D112Zq = --HD-112b + D112x0
where x0 is as in the previous paragraph. Multiplying through by ATD-1/2
shows that this solves the general case of (1) for y. This method has the
advantage that it "splits" the ill-conditioning between the nullspace and the
range-space. We could call this the bi-scaled hybrid algorithm, or BSH.
The last restrictive assumption made in Section 1 is that D is diagonal.
As mentioned earlier, in finite element and structural problems in two and
three dimensions, we can in general assume only that D is block diagonal.
Stewart's theorem does not generalize to nondiagonal matrices, as shown in
his paper. In the applications of finite elements and structures, however, A
has a certain sparsity pattern that is correlated to the blocks of D. Perhaps
a result could be established for special structured matrices A.
9 Open questions
Perhaps the number of open questions raised by this work exceeds the number
of results we have obtained. Here are some examples of open questions.
Is there a good algorithm to obtain XA, x?A? Here "good" means either
that the algorithm is polynomial time, or that it gives an approximate
answer in finite precision arithmetic, or both.
2. What are useful bounds for XA, kA in the case that A has the special
structure arising in applications more general than RNAI matrices?
3. Can Stewart's theorem be generalized to a block-diagonal D in the case
that the structure of A is correlated to the structure of D?
4.
As mentioned in Section 4, there are techniques known in the literature
that can be applied to the standard algorithms (like Bj5rck's choice
of the scaling factor); these techniques seem to stabilize the standard
algorithms. Could it be proved that one of the standard algorithms is
stable if one of these techniques is used? The standard algorithms are
much simpler than the NSll method.
31
5.
This paper has not dealt at all with issues of algorithmic efficiency, but
there are many. For instance, in the NSll method, do we explicitly have
to solve the m x rn system (9), or can we approach it implicitly? The
algorithm for computing Z described in Section 6 was geared solely
towards numerical stability. Recently, Stern and Vavasis [1992] have
proposed a way to compute Z for an RNAI matrix based on separators
of G0. The Stern-Vavasis algorithm aims for sparsity of Z rather than
stability. Is there some combination of the algorithm here with the
ideas from the other paper that attains both numerical stability and a
sparse nullspace basis Z?
6. For the general case of recovering both x and y, is the BSll method
described in Section 8 any better than the range-space or nullspace
methods? We tried the BSH algorithm using a nullspace basis Z gen-
erated by QR factorization described in Section 4 (in particular, not
any special spanning-tree basis). We obtained very accurate answers
for all the problems in which the standard methods failed. Is there an
explanation for the success of the BSH method?
7. Are there good iterative methods for (1)? For instance, is there an
iterative method whose convergence rate depends only on A, and gives
a good solution for y in the sense of a bound like (5)?
Acknowledgments
The author benefitted from discussion of these ideas with Tom Coleman,
Yuying Li, Danny Ralph, Mike Todd, Nick Trefethen and Charlie Van Loan
of Cornell, Alan Edelman of Berkeley, Nick Higham of Manchester, and Ilse
Ipsen of Yale.
References
A. Bj5rck [1992], Pivoting and stability in the augmented system method,
Numerical Analysis 1991: Proceedings of the 1?th Dundee Conference,
Longman Scientific and Technical, Harlow, Essex, England.
P. Businger and G. II. Golub [1965], Linear least squares solutions by House-
holder matrices, Numer. Math. 7:269--H276.
32
5. Chandrasekaran and I. Ipsen [1991], Perturbation theory for the solu-
tion of systems of linear equations, RR-866, Department of Computer
Science, Yale University, New Haven, CT.
L. O. Chua, C. A. Desoer, and E. 5. Kuh [1987], Linear and Nonlinear
Circuits, McGraw--HHill, New York.
T. F. Coleman [1984], Large Sparse Numerical Optimization, Lecture Notes
in Computer Science vol. 165, Springer Verlag, Berlin.
T. F. Coleman and Y. Li [1989], A globally and quadratically convergent
affine scaling method for linear li problems, Technical Report 89--H1026,
Department of Computer Science, Cornell University, Ithaca, NY. Also,
Math. Progr., to appear.
R. Fletcher [1987], Practical Methods of Optimization, 2nd Edition, J. Wiley
and Sons, Chichester.
J. Gilbert and M. Heath [1987], Computing a sparse basis for the null space,
SIAM J. Alg. Disc. Meth. 8:446--H459.
P. E. Gill, W. Murray, and M. II. Wright [1981], Practical Optimization,
Academic Press, London.
G. H. Golub and C. F. Van Loan [1989], Matrix Computations, 2nd Edition,
Johns Hopkins Univ. Press, Baltimore.
N. Karmarkar [1984], A new polynomial-time algoritlim for linear program-
ming, Combinatorica 4:373--H395.
D. P. O'Leary [1990], On bounds for scaled projects and pseudoinverses,
Linear Algebra and its Applications 132:115--H117.
C. H. Papadimitriou and K. Steiglitz [1982], Combinatorial Optimization:
Algorithms and Complexity, Prentic?Hall, Englewood Cliffs, N.J.
B. N. Parlett and J. K. Reid [1980], On the solution of a system of linear
equations whose matrix is symmetric but not definite, BIT 10:386--H97.
M. J. D. Powell and J. K. Reid [1969], On applying Houscliolder's method
to linear least squares problems, Proc. IFIP Congress, 1968.
33
J. M. Stern and 5. A. Vavasis [1992], Nested dissection for sparse nulispace
bases, SlAM J. Matrix Anal. Appl., to appear.
G. W. Stewart [1989], On scaled projections and pseudoinverses, Linear
Algebra and its Applications 112:189--H193.
G. Strang [1986], Introduction to Applied Mathematics, Wellesley-Cambridge
Press, Wellesley, MA.
5. P. Timoshenko and D. H. Young [1965], Theory of Structures, 2nd Edi-
tion, McGraw--HHill, New York.
M. J. Todd [1990], A Dantzig-Wolfe-like variant of Karmarkar's interior-
point linear programming algorithm, Operations Research 38:1006--H1018.
C. F. Van Loan [1985], On the method of weighting for equality-constrained
least-squares problems, SlAM J. Numer. Anal. 22:851--H864.
D. Welsh [1976], Matroid Theory, London Mathematical Society Monograph
8, Academic Press, London.
M. II. Wright [1992], Interior methods for constrained optimization, in Acta
Numerica 1992, Cambridge University Press, Cambridge.
34
