BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR93-1364
ENTRY:: 1993-10-14
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: Stable Finite Elements for Problems With Wild Coefficients
AUTHOR:: Vavasis, Stephen A. 
DATE:: June 1993
PAGES:: 43
ABSTRACT::
We consider solving an elliptic boundary value problem in the case that the 
coefficients vary by many orders of magnitude over the domain. A linear 
finite element method is used. It is shown that the standard method for 
solving the resulting linear equations in finite-precision arithmetic can 
give an arbitrarily inaccurate answer because of ill-conditioning in the 
stiffness matrix. A new method for solving the linear equations is proposed. 
This method is based on a ``mixed formulation'' and gives a numerically 
accurate answer independent of the variation in the coefficients. The 
numerical error in the solution of the linear system for the new method is 
shown to depend on the aspect ratio of the triangulation.
END:: CORNELLCS//TR93-1364
BODY::
Stable Finite Elements for
Problems With Wild Coefficients*
Stephen A. Vavasis**
TR 93-1364
June 1993
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
*This?work supported by an NSF Presidential Young Investigator grant, with matching
funds received from AT&T and Xerox Corp. Part of this work was done while the
author was visiting Sandia National Laboratories, supported by the U.S. Department
of Energy under contract DE-ACO4-76DP00789.
**Department of Computer Science, Upson Hall, Cornell University, Ithaca, NY
14853. vavasis?cs.cornell.edu
Stable finite elements for problems with
wild coeflicients*
Stephen A. Vavasist
June 27,1993
Abstract
We consider solving an elliptic boundary value problem in the case
that the coefficients vary by many orders of magnitude over the do-
main. A linear finite element method is used. It is shown that the
standard method for solving the resulting linear equations in finite-
precision arithmetic can give an arbitrarily inaccurate answer because
of ill-conditioning in the stiffness matrix. A new method for solving
the linear equations is proposed. This method is based on a "mixed
formulation" and gives a numerically accurate answer independent of
the variation in the coefficients. The numerical error in the solution
of the linear system for the new method is shown to depend on the
aspect ratio of the triangulation.
1 A boundary value problem
The problem we consider is
V (cVu) = 0 on
u=g			ona?.
(1)
*This work supported by an NSF Presidential Young Investigator grant, with matching
funds received from AT&T and Xerox Corp. Part of this work was done while the author
was visiting Sandia National Laboratories, supported by the U.S. Department of Energy
under contract DEACO4-76DP0O789.
tDepartment of Computer Science, Upson Hall, Cornell University, Ithaca, NY 14853.
Here, ? is a polygonal subset (i.e., a finite union of simplices) of Rd that is
topologically equivalent to a ball, but not necessarily convex. The dimension
d of the problem is commonly 2 or 3 in applications, but our results hold in
arbitrary dimensions. Scalar coefficient field c is given, and is a function from
fl to IR. This field must be positive at every point of the domain. Function
u is the unknown scalar field on ? that is to be computed. Finally, the
Dirichlet boundary data g is also given; this is a scalar field defined on the
boundary.
Solving this equation is a classical problem in mathematical physics and
arises in many applications. For instance, u in (1). can describe the equilib-
rium temperature distribution a body fl whose boundary is held at a certain
constant temperature distribution g. In this thermodynamic application, c
represents the thermal conductivity of the body, and cVu represents the heat
flow at any point.
In this paper we focus on the case that c varies by many orders of mag-
nitude over the domain, and the values of c do not follow any particular
pattern. We say that such a c is "wild." This problem arises in physical
applications in which the underlying domain is composed of materials with
significantly varying properties, and the arrangement of the materials has no
special geometry.
Regardless of how much c varies, (1) satisfies a "maximum principle."
This principle states that the maximum and minimum value of u must occur
on the boundary. See Protter and Weinberger [16] for more information
about maximum principles.
Since (1) is a linear problem, the maximum principle applies also to per-
turbations. In particular, suppose g is perturbed by small amount to yield
new boundary data g. Let u be the solution to (1) for the perturbed data.
Then, by linearity, u --H u is the solution to (1) for boundary data g --H g, and
thus
IIu --H uIIL?(?) < II? --H
The observation in the last paragraph tells us that (1) is a well-conditioned
problem mathematically, in the sense that the solution is insensitive to per-
turbations in the data, regardless of c. Nevertheless, the standard numerical
method for obtaining u is ill-conditioned. Specifically, the condition number
of the assembled stiffness matrix tends to Co as the variation in c increases,
as we argue in Section 2. This situation of a mathematically well-conditioned
2
continuous problem whose numerical method is ill-conditioned is anomalous,
and suggests that the wrong numerical method is being used. In Section 3
to 5 we propose a new method, the NSllI method, for solving the system of
linear equations that has a numerical stability bound independent of c. This
new method does not form the stiffness matrix, but instead approaches it
implicitly via a mixed formulation.
It is important to stress that, in exact arithmetic, our proposed NSHI
method is equivalent to the standard finite element method. Both compute
the same piecewise linear approximation to u. The difference between the
methods is evident only when finite-precision floating point arithmetic is in-
troduced, in which case the ill-conditioning of the standard method magnifies
roundoff errors to give the wrong answer.
The stability bounds for the new method depend on parameters XA, XA
defined by Stewart [18], and applied to a certain "reduced geometry matrix"
A that we introduce in Section 3. In Section 6 we derive bounds on these
parameters in terms of the geometry of the problem; specifically, we show
that the bounds depend on the aspect ratio of the simplices in the finite
element triangulation. Finally, in Section 8 we describe our computational
experiments for the case d = 2 comparing the standard method to the NSHI
method.
2 The finite element method
In this paper we use a piecewise linear finite element method, which we briefly
describe in this section. See Strang and Fix [21] for an overview of the finite
element method. First, one forms a triangulation of ?, that is, a partitioning
of ? into simplices. Let no denote the number of nodes in the triangulation,
of which n are interior and n0 --H n lie on the boundary. Let m denote the
number of simplices. Let V..... , V?0 denote the nodes of the triangulation,
with the first n entries being the interior nodes. Let ..... . , Tm denote the
simplices.
The next step is to make the assumption that the solution u to (1) is
piecewise linear and continuous, with the pieces of linearity being the sim-
plices. This assumption means that once the values of u at the n interior
nodes are known, then all of u is known because the values at other points
of ? may be found with linear interpolation across simplices. (Note that at
3
the n0 --H n boundary nodes, u is already known from the boundary data.)
Let y be the n-vector ....... , y?)T encoding the nodal values of u; it is this
n-vector that we must compute.
To obtain a system of equations for y, we introduce basis functions
?i,..., ??, which are functions from ? to IR. These functions, commonly
known as "hat functions," are piecewise linear and continuous, and have the
property that ?i(vi) = 1 and ?i(vj) = 0 for j $ i. This uniquely defines each
on all of ? by linear interpolation. These functions depend only on the
triangulation, not on u or g. It is straightforward to prove that ?i,..., ??
form a basis for all functions that are piecewise linear on ? with respect to
triangulation ..... . , Tm and that have boundary value identically zero.
Next, let ?o be a piecewise linear function that is zero at all interior
nodes and agrees with the boundary values at the boundary nodes; again
this uniquely defines ?. The consequence of these definitions is that u may
be written as
u = ? + SYi?
(2)
where the ?j'5 are the unknown interior nodal values of u.
Next we pass to the so-called "weak form" of (I); this form is obtained by
multiplying both sides of the differential equation by ?j for some?, integrating
over ?, and applying Green's theorem. The end result is that one obtains
the equations
c(V?, Vu) = 0			(3)
for all i = 1,... , n. llere, (.,.`) is the standard inner product in Rd
Up to now, what we have described is common to both the standard
method and the NSllI method. The standard method proceeds from (3) as
follows. Substituting (2) into (3) yields n equations
Yif?c(V?,VQ? =
j=1
(4)
for? = 1,...
These equations are linear in ..... . , yn; the coefficients on the left-hand
side depend on c and on the triangulation. The quantity on the right-hand
side depends on c, on the triangulation, and on the boundary data g.
4
Based on (4), we form an n x n linear system
Ky =
The (i,j) entry of K is taken from (4):
Ki5 =
(5)
(6)
This matrix K, known as the assembled stiffness matrix, is clearly seen to
be symmetric. It is also easily seen to be sparse, since the supports of bi
and bi will be disjoint or have a measure-zero intersection unless i and j are
neighboring nodes in the triangulation. Finally, it can be proved that K is
positive definite (this will follow from the discussion below).
Once the linear system is assembled, it is solved, for instance, with sparse
Cholesky factorization as described in by George and Liu [11]. We will call
this method CFSM ("Cholesky factorization of the stiffness matrix") and
regard it as the standard approach.
The difficulty with the CFSM method in the case of wildly varying co-
efficients is that K may be arbitrarily ill-conditioned. The ratio between
diagonal entries Kjj and Kjj, for instance, depends on the ratio of c in the
support of bi versus c in the support of bi. The condition number of a sym-
metric positive definite matrix is bounded below by the maximum ratio of
any two diagonal entries. Thus, it follows that for a fixed triangulation, the
condition number of K will grow like max(c)/ min(c).
The effect of this ill-conditioning is that if Ky = --Hf is solved using
floating point arithmetic, one can expect a relative error on the order of
e max(c)/ min(c), where e denotes machine epsilon. Thus, in IEEE double-
precision arithmetic, we would expect the CFSM method to collapse when
max(c)/ min(c) is on the order of 1016. Indeed, this is what we have observed
in our experiments described in Section 8.
3 A mixed formulation
The first step of the NSHI method is to introduce new variables Pk,?, for
k = 1,..., m and ? = 1,. .. , d. The vector Pk,. stands for the gradient of u
on simplex Tk, multiplied by the constant factor
[fTkC] /5k
5
(7)
Here, sk denotes the longest side length of simplex Tk. Since u is assumed to
be piecewise linear, the gradient of u on Tk is constant and therefore can be
represented by a d-vector. Let us denote the nid vector of scaled gradients
as
Notation: For the remainder of this paper, we adopt the convention that
scalars are denoted with plain italic type, vectors of size d or d + 1 such as
V1,... , V?0 are denoted with bold italic type, and "long" vectors such as y
and p are denoted with bold Roman type.
The side-length factor sk that we have introduced is largely irrelevant for
the developments in this section; it plays an important role in Section 6.
Note that the gradients of u can also be determined from the nodal values
by subtraction and interpolation, since u is linear on each triangle. This
observation implies that there is a system of equations relating the gradients
p and the nodal values y. Let us write this system as
Dp =Ay+b,			(8)
where we now describe the entries of D, A and b in detail.
On both sides of this equation we want to have the md-vector of gradients
of u, multiplied by the scalar factors sk. First, we explain D. Since we scaled
the gradients by constant factors to obtain p, we have to unscale them with
D. Thus, D is a diagonal scaling matrix with positive diagonal entries. It
has m blocks of d consecutive equal diagonal entries, and the diagonal entry
occurring in the kth block is equal to
(9)
Now we explain why Ay + b represents the scaled gradients on the right-
hand side. Let .1...., Mm be matrices each of size d x (d+ 1) such that Mk
performs the following transformation. Given values ..... . ,Xd+1 of a linear
function at the d + 1 vertices of Tk, Mk? should return the gradient of that
linear function. This uniquely defines M?; note that it has a i-dimensional
nulispace spanned by the vector [1,1.... ,1]T.
An actual recipe for Mk may be described as follows. Pick a vertex of Tk
arbitrarily; let this vertex be called the "origin," and let it be numbered last.
6
Let M0 be the d x (d + 1) matrix
M0			1			(10)
1			--H1
--H1
1			--H1
Then M0? maps x, a (d + 1)-vector of nodal values, to the vector of nodal
values minus the origin value.
Notation: For the remainder of the paper we use the superscript `0'
to indicate a combinatorial object, that is, a matrix or vector all of whose
entries are 0, 1, or --H1.
Next, let Jk be the the matrix that transforms coordinates from the "stan-
dard" d-simplex
((xl,...,xd)TE IRd:xi>Ofori= 1,...,dandx1+...+x?<?1J
to T?; that is,
(v1?vd+1)T
Jk =
(vd --H vd+1)T
where vl,... , vd+l are the vertices of Tk with the origin numbered last. Then
one can see that
Mk = j?1M0
The choice of origin affects Jk, but Mk itself is independent of this choice
(except for the ordering of columns).
Next we form a matrix A0 of size md x n0 as follows. For k = 1,... ,
we construct the block of rows numbered (k --H 1)d+ 1 to kd based on simplex
k. The entries of this block of rows are the entries of skMk, with the columns
scattered according to the global numbering of the nodes.
Now, finally, we obtain A by deleting from A0 the columns corresponding
to boundary nodes. We call A0 the extended ?eometry matrix, and we call A
the reduced geometry matrix.
Next, we explain where b in (8) comes from. This comes from the bound-
ary data; let Yo be the nodal values of the boundary data. Then b is equal to
the columns of A0 corresponding to boundary nodes (i.e., the n0 --H n columns
7
that were deleted to form A) multiplied by Yo Thus, Ay + b is the md-vector
of gradients scaled by 8k
We conclude that (8) is a system of equations that must hold relating the
gradients and nodal values. Note that A depends only on the geometry; the
dependence on c has been relegated to D. Vector b depends on the boundary
data and on the geometry (but not c).
Note also that A is rank n. This follows because, assuming b = 0 (zero
boundary values), if any interior nodal value is nonzero, then the gradient
vector on some simplex must be nonzero. This is the same as saying that
whenever y is nonzero, so is Ay.
Now we formulate a second group of linear equations to enforce (3). We
can rewrite (3) as:
k? fTk c(V?, Vu? = 0.
Because p is defined as a scaled version of Vu, we can write this as:
41sk [fTk c] ??. fTk c(V?, Pk,.') = 0
Note that (V?,Pk,.? is a constant on Tk so it may be pulled out of the
integral (write this the constant value of V? as as V?i(Tk)), cancelling the
integration of c and leaving:
m
Zsk(V?(Tk),pk,.? = 0.
k=1
We now try to come up with an expression for V?i(Tk). Observe that V?i(Tk)
is zero if i is not a vertex of Tk, and otherwise it is M?ei, where e? is a vector
with a `1' in the position corresponding to node i and zeros elsewhere. This
product M?ei is equal to a column of Mk.
Thus, (V?(Tk),Pk,.? is the inner product of a column of Mk with Pk,.
Summing over k, keeping track of the node indices, and recalling that A has
the factor sk built in, we see that in fact
Sk(V?i(Tk),Pk,.)
k=1
8
is the same as the inner product of the ith column of A with p. Thus, (3) is
equivalent to the linear system
ATp = 0			(11)
Combining this with (8) yields
w
(12)
We call this system an "equilibrium system" following Strang [20j. Strang
has observed that equilibrium systems occur in many contexts, including
electrical networks, optimization, and structural analysis.
Since D is positive definite and A has full column rank, this system is
invertible and hence uniquely determines p and y.
This formulation is sometimes called a "mixed" formulation in the litera-
ture; see, for example, Roberts and Thomas [17]. Indeed, mixed formulations
have been considered specifically for problems with ill-behaved coefficients by
Babuska and Osborne [2] and by Falk and Osborne [10]. In fact, our approach
is not a true mixed formulation because, as we will see in the next section,
we do not solve (12); instead we reformulate the problem again so that p is
not recovered by our method. The purpose of writing (12) is to separate A
from D.
We now
method can
formally state the fact that the standard linear finite element
be written in the form (12):
Theorem 1. In exact arithmetic, the two linear systems (5) and (12) yield
same value of y.
One final remark is that (12) is commonly called a "weighted least squares"
problem in the linear algebra community. The case when D is ill-conditioned
in this context is known as the "stiff" case. See Bj5rck [6].
4 The NSll method
From (12) we can obtain a linear system for y alone. Multiply (8) by D-1
to obtain p = D?1Ay + D-1b, and substitute this in (11) to obtain
?TD?l?y = ATD-lb.			(13)
9
and thus
y = ?(ATD-lA)-lATD-lb.
(14)
It can be argued that (13) is identical to (5); indeed, K = ATD-lA and
f = ATD-lb. Accordingly, we do not want to try to solve (13) directly
with Cholesky factorization. But (14) has a remarkable property that was
established by Stewart [18]. For this theorem we imagine that m' = md.
Theorem 2 ([18]). Let A be an arbitrary m' x n matrix of rank n. Let
be a matrix norm that is induced by a vector norm. Let D(m') denote the set
of all positive definite m' >c m' diagonal matrices. Define
and
XA = supfII(ATD?lA)?lATD?lII : D E D(m')J
XA = sup(IJA(ATD?lA)?lATD?lII : D e D(m')J.
Then XA, XA are both finite.
Further theory concerning these parameters was developed by O'Leary
[14]. As a consequence of the results of Stewart and O'Leary, there exists an
algorithm for computing XA and XA, although its running time is unfortu-
nately exponential.
If we combine Theorem 1 with Theorem 2 we immediately obtain the
following consequence.
Theorem 3. Let u be the piecewise-linear finite element approximate solu-
tion to (1). Let be some norm for piecewise linear functions on
Then
lull < Cllgll???an?
where C is a constant that depends on m, n, d, on the norm, and on the
triangulation, but not on 9 nor on the conductivity c.
This theorem is an example of a "discrete maximum principle." Ciarlet
[8] has written about discrete maximum principles, although this particular
theorem does not seem to appear in [8]. Wahlbin [23] has since strengthened
Theorem 3 in the d = 2 case. If the oo-norm is used, and all the triangles are
nonobtuse, then Wahlbin has shown that we can take C = 1 in the preceding
10
theorem. In Section 6 we will determine an estimate for C for the general
case.
The existence of this discrete maximum principle suggests that the dis-
crete problem of solving (13) should have a stable algorithm because the
solution is insensitive to perturbations.
Indeed, Vavasis [22] has proposed a stable floating-point numerical algo-
rithm for (13). We now describe this algorithm, which is called the NSH
method. NSH stands for "nullspace scaled hybrid" method.
1.
Find a nullspace basis V for ATD-l, that is, an mdx (md--Hn) matrix of
rank md--Hn such that ATD-l V is zero. The exact procedure for finding
V is described below. Matrix V must have the following properties:
(a)
(b)
It is well-conditioned in the sense that fdr any x, IIVxII is bounded
above and below by constant multiples of I?xII.
If a numerical approximation V is computed (for instance, using
floating point arithmetic), then V must be close to a true nullspace
basis V of ATD-l in the sense that ?V --H Vil/Ilvil is small (i.e.,
V must have a small "relative normwise forward error").
(c) Matrix V is scaled so that ?? V? NN IlAll.
2. Solve the square linear system
Ay+Vq= -b
(15)
using any standard stable algorithm, e.g., Gaussian elimination with
partial pivoting. In exact arithmetic, this system yields the same solu-
tion y as (13); this is seen by multiplying (15) by ATD-l. Vector y is
the solution, and vector q is discarded.
The procedure for finding V is as follows.
1. Sort the diagonal entries of D into ascending order.
2. Form an n x n nonsingular matrix B, a submatrix of A, as follows. Loop
through rows of A following the sorted order defined in the previous
step. For each row, append it to B if it is linearly independent from
the rows previously inserted in B. If it is dependent, then discard it.
Proceed until B has n rows.
11
3.
Solve for all the nonbasic rows of A in terms of the basis B. For each
nonbasic row i, we obtain a vector in the nullspace of AT. Specifically,
this vector has a `--H1' in position i, and the coefficients from solving
for row i in the positions corresponding to B. If we concatenate these
nullspace vectors together, we obtain a matrix Z that is a nullspace
basis for AT. Further, Z contains an embedded copy of --HI, the negative
identity matrix, in nonbasic rows.
4. Obtain V from the computation
v = IlAll DZR,
where D is the original diagonal matrix from (12) and R is an --H
x --H k) diagonal matrix uniquely determined by the requirement
that we want DZR to also have an embedded copy of the negative
identity matrix in nonbasic rows. In other words, R should "unscale"
the embedded identity matrix that was scaled by D.
The theory supporting the NSll method appears in [22]. The main point
of the method is the following theorem.
Theorem 4. In floating point arithmetic, the NSH method returns a solu-
tion ? to (13) such that, when compared to the true solution y, we have the
following bound:
II? --H YII < f(A) e Ilbil,
where e is machine-epsilon, and f(A) depends on XA, XA, and II All but not
on
5 Refining the theory: the NSHI method
We run into a difficulty when we try to apply the algorithm and theory de-
scribed in the last section to (1). The difficulty is that the two parameters
XA, XA depend on the geometry of the triangulation in an undesirable way:
these parameters depend on alignments of the simplex faces with the coor-
dinate axes. In this section we describe a variant of the algorithm and a
refinement of the theory to overcome this problem.
12
First, we must explain why the dependence on alignment arises. It should
be observed that (1) is isotropic, that is, the PDE is invariant under a d x d
orthogonal changes of coordinates. In contrast, our method for forming A
depends on the coordinate system because the components of the gradient
in Cartesian coordinates are encoded in the formation of A.
Another way to understand this dependence is to consider the two-dimensional
anisotropic PDE
a
a? a(x,Y)?ax			b(x?Y)?ay
(16)
Here, a(x, y) and b(x, y) are positive scalar fields. (This PDE generalizes in
an obvious way to d dimensions.) If a(x, y) is significantly larger or smaller
than b(x, y), then the solution to the PDE is sensitive to the orientation of
the triangles.
One observes that (1) and (16) have exactly the same geometry matrix A,
and hence the same values of XA, XA Therefore, we should not be surprised
that the orientation of the triangulation affects these parameters.
Even though A is the same for (1) and (16), D has special structure for
(1). Specifically, for (16), D may be an arbitrary matnx in D(md), but for
(1), D has blocks of d consecutive equal entries. Let us write the set of such
D formally as follows:
D(md, d) = (D E D(md) : D can be written as D' 0 1,
where D' e D(rn) and 1 is the d x d identity matrix.1
We can now define two new parameters for A:
XA,d = sup?II(ATD?lA)?lATD?lII : D E D(md,d)J
and
XA,d = suptIIA(ATD?lA)?lATD?lII : D E D(md,d)).
Obviously Stewart's theorem implies that these parameters are finite and are
at most XA, XA respectively.
These parameters are truly isotropic as the following argument shows.
Let W be an md x md rotation matrix defined to be block diagonal: W =
diag(Q, .... . , Q), where Q is an arbitrary d x d orthogonal matrix. It is
13
easy to show that, in the 2-norm, XA,d and XA,d are equal to XwA,d and
XwA,d respectively.
Most of the theory of stable algorithms described in the last section goes
through when we restrict D to D(md, d) and when we substitute XA,d and
XA,d for XA and XA,d The only step of the algorithm that must be modified
is the computation of V in order to satisfy the three conditions enumerated
in the last section.
It is possible to propose an algorithm for obtaining V satisfying bounds
depending on XA, kA,d for a general m' x n matrix A, but it seems that this
algorithm would be very complicated and would probably not have many
applications beyond (1).
Therefore, we instead explain how to form V only for the special case
that A is a reduced geometry matrix arising in (1). The NSHI method is
completely defined once we have explained this step, since all that remains
is solving (15). NSllI stands for "NSllAsotropic."
First, observe that A can be written as a product:
A = SJ-1A0
where S is a diagonal matrix with the sk scaling factors, J is a block-diagonal
matrix where each block is the Jacobian Jk defined in Section 3, and A0 is
an md x n matrix consisting of m copies of M0 (defined by (10)) with the
entries scattered into appropriate columns, and some columns deleted. To
simplify the discussion in this section, we want the "origins" of the simplices
to be selected as nonboundary nodes. (Recall from Section 3 that the origins
may be selected arbitrarily without changing A, although j and A0 will be
different). The consequence of this choice of origin is that every row of A0
has at least one nonzero entry.
It is not hard to see that A0 is actually a reduced node-arc incidence
(RNAI) matrix of the following directed graph G: G has n0 nodes, one for
each node of the triangulation. It has an arc from node i to i' whenever
(vi, vii) is an edge of some simplex Tk and i is the origin of Tk. Note that
G may have multiple copies of edges; we always keep these multiple copies
present.
The node-arc incidence matrix for 0 is a matrix A00 with one row for each
edge of 0 and one column for each node. Each row contains two nonzero
entries, a `1' and a `--Hi.' The `--H1' entry occurs in the column corresponding
14
to the tail of the arc, and the `1' entry occurs in the column corresponding
to its head.
The reduced node-arc incidence (RNAI) matrix A0 for G is obtained by
deleting all the columns of AoO corresponding to boundary nodes. The deleted
nodes are said to be "grounded" in analogy with electrical engineering. In
an RNAI matrix, each row has either a `1'/'--H1' pair of entries, or a lone `1'
or lone `--H1' in rows corresponding to arcs with one end grounded. (No arc
has both ends grounded by choice of origins.)
The first step toward obtaining V is to compute a nullspace basis Z0 of
AoT. The process of finding Z0 is described in [22] and is purely combinato-
rial. We compute a minimum-weight spanning tree of the graph G1, where
weights are assigned to each arc. The graph G1 is obtained by coalescing all
the grounded nodes in G to a single node, and then removing the orientation
from the arcs. Arcs without orientation are commonly called edges, but we
will not change terminology. The weight to use on arc 1 is the value of the
diagonal entry of D corresponding to arc 1, that is, the quantity (9) where k
is the simplex that gave rise to arc 1. Each column of Z0 then corresponds
to the induced cycle for one nontree arc of G.
Basis Z0 has the following properties. First, each entry is either 0, 1 or
--H1. (This of course implies that Z0 may be computed exactly, without any
floating point error.) Second, Z0 has an embedded (m' --H n) x (m' --H n) copy of
the identity matrix; the m' --H n rows holding the identity matrix correspond
to the m' --H n nontree arcs.
Now we form a nulispace basis V for A by multiplying the product:
v = Ds?lJTzoR.			(17)
In this formula, R is an (m' --H n) x (m' --H n) diagonal matrix that we specify
below, and all the other factors have already been introduced. Since AoTzo =
0, it is apparent that V with this definition is indeed a nullspace basis for
ATD-l.
In order to define R, we need some notation to manage the subscripts.
Let 1 be the index of a column of Z0, that is, an integer between 1 and m'--Hn.
Then 1 corresponds to a nontree arc of &. Arcs in 0 are identified by two
indices, (k(t), ?(l)), where k(l) lies between 1 and m and specifies the simplex
giving rise the arc, and i(l) lies between 1 and d. We can think of (k(t), i(l))
as a row index for the RNAI matrix A0; it is the row index of the nontree arc
15
and corresponds to row number (k(1) --H 1)d + i(1) according to our standard
numbering of the md rows of A0.
These indices (k(1), ?(1)) also form a row index for Z0 (because the rows
of Z0 are in correspondence with the rows of A0). This particular row of
Z0 has a single `1' entry (in column 1), with all the other entries being zero,
because Z0 has an embedded identity matrix.
The diagonal matrix R is defined as follows; the ith diagonal entry is
defined to be the reciprocal of (9) with k taken to be k(1). Let us call this
quantity r(k(t)). Thus, R has been defined to cancel out a certain entry of
We now would like show that V has the three desired properties stated in
the last section. First, we consider condition (a). Let us examine the product
Vx for an arbitrary (m' --H n)-vector x. The product Rx scales the ith entry
by r(k(1)). Next, the product Z01?x, an md-vector, has as its (k(t),i(1))
entry exactly r(k(1))x(1).
For this k = k(1), let us consider all the entries in block k of Z0Rx.
This block consists of d entries that all arise from the same simplex of the
triangulation. For the entries corresponding to nontree arcs, as argued in
the preceding paragraph, these entries have the form r(k)x(t) for some 1. For
entries corresponding to tree arcs, these entries are sums of terms of the form
Ir(k(t'))x(1'), where 1' is a nontree arc whose induced cycle contains a tree
arc in block k. This is because a tree row of Z0 contains entries that are +1
or zero, with the +i entries corresponding to columns indexing nontree arcs
that contain the tree arc in their induced cycles.
It is a property of minimum-weight spanning trees that the any nontree
arc must have a higher weight than all of the arcs on the cycle that it induces.
In particular, this means that r(k((')) ? r(k), because 1/r(k(l')) is the weight
on arc 1', and 1/r(k) is the weight on the tree arc in block k.
Thus, we see that the kth block of Z0Rx contain two kinds of entries: ei-
ther it has entries of the form r(k)x(1), or entries of the form +r(k(11))x(!i)+
... + r(k(18))x(t?) where each r(k(4)) is at most r(k). Let y be this product
Z0Rx.
Next we consider the product s?1JTz0Rx, i.e., 5?1JTy Since S-lJT is
block diagonal, it acts on each block of y independently. It will be shown
in the next section that the norm of each diagonal block JkT/sk in this block
diagonal matrix is bounded above by a constant Cd depending only on the
dimension d. Furthermore, the norm of the inverse of J??TIsk is also bounded
16
above by a constant a depending on d and on the "aspect ratio" of the
triangulation. Therefore, the norm of each block of S-1 JTy is within a
constant multiple of the norm of the corresponding block of y.
Finally, when we form the product z = Ds?lJTy, this has the effect of
scaling the kth block of s?lJTy by 1/r(k).
We claim that the end result is a vector z within a constant factor of the
norm of x. Define t = IIxII?, and let A be chosen so that IxAl = t. First,
we track the chain of computations for block k = k(A). In this block, one
entry of y is r(k)x?. Therefore, the norm of this block of y is at least r(k)t.
Following the arguments in the preceding paragraphs, this means that the
norm of the k block of z is at least r(k)t/(c?r(k)ot), that is, at least a constant
multiple t. This shows that lizil is at least a constant multiple of lix II.
For the other direction, let k be the index of an arbitrary block of y. The
nontree entries of the block are r(k)x(l), i.e., absolute value at most r(k)t.
The tree entries have absolute value at most r(k)tm', using the arguments
above. Now, applying s-lJT multiplies this by at. most cd, and applying D
cancels the r(k) factor. This shows that lizil is at most cdm' times lxii. Let
us state this as a lemma.
Lemma 1. For an arbitrary x, iivxll? is at least iixli?/(c?a) and at most
c?m'iixiioo.
Next we consider condition (b), which states that in floating point arith-
metic, the computed V must be close to V. The computation of Z0 is exact.
Let us next consider the computation of s-lJTZo. This multiplication acts
independently on columns of Z0, and furthermore, it acts independently on
each block of each column. Since each diagonal block of 5-1 jT is well-
conditioned, this implies that each block of each column of s?lJTzo can be
computed with a small relative normwise error in floating-point arithmetic.
Next, we compute Ds-iJTzo. Since, on each block, D has identical
entries, the property that each block has a small relative forward normwise
error is preserved. (Note that if D were arbitrary, this would not be true:
for example, [10--H6, 1]T is a close relative normwise approximation to [0, 1]T
but this property is not preserved under multiplication by diag(1012, 1).)
Thus, each block still has a small relative normwise error. Finally, we
multiply on the right by R; this does not change the property since R scales
columns and hence scales each block by the same factor.
17
Thus, we see that V can be computed accurately in floating point arith-
metic.
The last property is that V? II All. This is easily achieved with one
final scaling. It turns out however, that this scaling is not necessary; a
consequence of the arguments of this section is that both ?? V?? and II All are
close to unity they are off from 1 by small constant factors such as cd,
and a.
A conclusion of the arguments in this section, coupled with the earlier
theory of the NSH method, is the following theorem.
Theorem 5. With floating point arithmetic, the NSHI method returns a so-
lution ? to (13) such that, when compared to the true solution y, we have a
bound of the following form:
II? --H YII < f(xA.d, XA4, a, m') e IlbIl,
where e is machine-epsilon.
6 Analysis of the geometric factors
In this section we try to analyze the numerical stability bound Theorem 5
to understand how it depends on the geometry. When no norm is indicated,
we will generally mean the co-norm in this section. The first observation is
that the error bound depends on Ilbil; we would prefer to have it depend on
IIyoII, the boundary data.
Recall that b is obtained from Yo by multiplication by a subset of columns
of A0, that is, by columns of matrices of the form SkJk?1M0. Thus, we can
assert that
llbli < ???? llSkJk?1M0II IIyoII
Note that lIM0lI = 2 so we can ignore that factor. Thus, we must be con-
cerned with IISkJk?1II.
Let us define the aspect ratio of a simplex to be its maximum side length
divided by its minimum altitude. It can be shown that, for d = 2, the aspect
ratio is bounded above and below by constant multiples of the reciprocal
of the smallest angle of the triangle. We have the following two lemmas
connecting aspect ratios to s?J??1:
18
Lemma 2. Let Tk be a simplex with aspect ratio a. Then
ll5kJk?1ll < cda
where cd is a constant depending on
Notation: In this section cd will denote a factor depending on d whose
value may change from equation to equation.
Note that Jk is not uniquely defined until the origin of Tk has been se-
lected. In order to preserve generality, the following proof, and also the proofs
of the upcoming lemmas, will not depend on which node is chosen as origin.
Proof. For this proof we will switch back and forth between several norms;
changing norms is an additional constant-factor contribution to Cd. Let
vl,..., vd+l be the vertices of Tk. To simplify notation, assume vd+l = 0
so that Jk = ....... , vd]T. Let it be an arbitrary d-vector whose co-norm is
1. Without loss of generality, assume the maximum entry is actually 1 (not
--H1). Let ? be the position of the maximum entry. Let w = J%??it. Then
w =			u?v
?=1
Observe that the sutrimation term in the second line lies in the plane spanned
by 0 and v1..., vd, excluding v'. Thus, this point lies in the plane that
contains one of the facets of Tk (the facet going through v1,... , vd excluding
v?). Hence w is the vector from a vertex of Tk, namely v?, to a point on the
plane containing the opposite facet. Thus, 11w 112 > t, where t is the length
of the smallest altitude of Tk.
Now we have a norm computation.
= llJk?Tll2
= llJk?Twll2/llwll2
--H< Ilitli2lt
<
II			112
19
The second line is based on the fact that it and hence w was arbitrary,
and therefore may be chosen to be the d-vector that determines 1j??T112
The third line follows because it = j??Tw. The fourth line follows because
II'LIIoo = 1.
Thus, IISkJk?1II2 < cdsk/t. Finally, skIt is a by definition. I
This shows that b is bounded in terms of Yo, where the bound depends
on the aspect ratio. From now on, we let OL denote the maximum aspect ratio
occurring in the triangulation.
The previous theorem establishes a bound on II8kJ??1 ?. We can also es-
tablish a bound on the inverse matrix, which is independent of a.
Lemma 3.
115k1JkII < 1.
Proof. This follows because each row of Jk/Sk has 2-norm at most 1, because
5k is the longest side-length. Therefore, the largest matrix entry is 1 in
absolute value, so the infinity norm is at most 1. I
These two lemmas have a converse that is also true: the aspect ratio of Tk
is bounded above by cdIIJkII IIj??1II. Again, assume for notational simplicity
that vd+1 --H 0. Let a be the length of the smallest altitude. There are two
subcases: either the smallest altitude is the altitude to one of v1,... , Vd, or
it is the altitude to Vd+1 = 0. in the first case, suppose e.g. that the smallest
altitude connects v1 to a point in the plane spanned by v2,... , vd, 0; say that
point is t2v2 + ... + tdvd. Let t be the vector (1,--Ht2,... , td)T. Then the
altitude in question is precisely JkTt.
The other subcase is that the smallest altitude is from the origin to a
point t1v1 + ... + tdvd, where t1 + ... + td = 1. Let t denote ....... , td)T.
Then the segment again may be written j??Tt.
in both subcases, IIJkTtII2 = a and I?t??1 > 1 hence
IJk?1II2 >
Next, let b be the length of the longest edge; it is seen that b may be
written as IIJkTitII2 where it is a vector with either a single `1' entry, or two
`+1' entries. Thus, lull2 < 1.5, so 1.5IIJk1I2 > b. Combining this with the
inequality in the preceding paragraph yields:
llJkll2 II ?k1 112 > ccib/a.
20
Now we turn to XA,d and XA,d Theorem 2 shows that these are finite for
an arbitrary A, and therefore for our specific A. Stewart's proof, however, is
nonconstructive (it involves a compactness argument), and we would like to
get specific bounds on these parameters in terms of the geometry. Therefore,
we need to prove new bounds on YA,d and XA,d for a geometry matrix A.
This is the content of Theorem 6 and Theorem 7. Theorem 7 below achieves
a much better bound for a restricted class of triangulations.
Theorem 6. Let ot be the maximum aspect ratio among all simplices of the
triangulation. Then
XA,d < lO(c?a)?0
and
XA,d < 1Omd(c?a)?0+1
Proof. Let us assume that b is an arbitrary vector, and that D is an arbitrary
symmetric positive definite matrix. Let y be the solution to ATD-l Ay =
ATD-ib. In order to bound XA, we need a bound on Ilyll. Note that the
solution y does not necessarily correspond to a solution of (1); the reason is
that vector b arising in a solution of (1) has a special sparsity pattern (its
nonzero entries occur only for simplices adjacent to the boundary). In order
to obtain the needed bound, we cannot assume any special structure in b.
Instead of working with (13), we work with (12). Specifically, assume we
have vectors p and w such that ATp = 0, Dp = w + b, and w = Ay.
Let A0 be the extended geometry matrix defined in Section 3, that is, an
extension of A to n0 columns. Then, since w = Ay, we also have w = A0y,
where ? is an extension of y with zeros in the boundary positions. Let us
call the value of y at each node v its label and denote it as
Now we define a new set of labels that we denote z(v), which are supposed
to approximate the y labels in a certain sense. The z labels, unlike the y
labels, will have a discreteness property: there will be a lower bound on the
difference between any two distinct z labels. Let P > 1 be a constant that
depends on the maximum aspect ratio; the exact value for P is provided
below. The procedure for determine the z's is follows. We write the y labels
in ascending order; let the sorted list be written Yi,?.?, y?0. Let t denote the
span of this list:
t = y%o --H Yi
21
We define a set I of intervals that gets repeatedly updated. Initially, I
contains n0intervalsl= ?[y1,y1+ej,[y2,y2+?J,...,[y%0,y%0+?JJ, where we
define E = t(P+ 2)-no.
Let [a, bj and [c, cl] be two intervals in I such that a <c. We will say that
these intervals are "crowded" if
c--Hb< Pmax(b--Ha,d--Hc).
(In particular, if the intervals intersect, i.e_ c _ b then they are certainly
crowded.) We repeatedly perform the following "merge" operation on I:
if there are two intervals [a, b], [c, cl] that are crowded, we delete them and
replace with the new interval [min(a, c), max(b, d)]. This operation can clearly
be performed at most no times since each time the operation is performed,
the number of intervals decreases by 1. Note that the union of the intervals
in I never decreases because [a, bj u [c, cl] c [min(a, c), max(b, d)J.
When we are done, the union of the intervals in I contains all of the
original y(v) values because of the nondecreasing property. We assign a new
label z(v) to each node v of the triangulation defined to be the left endpoint
of the interval of the terminal I that contains y(v).
This construction of z has several important properties. First, not all the
z`s are equal. In particular, Yi and y?0 cannot end up in the same interval.
The proof of this claim is as follows. Initially, each interval has length e. After
merging two intervals, say of length p, q, it follows from the construction that
the length of the new interval is less than p + q + flmax(p, q), i.e., less than
(P + 2) max(p, q). Thus, after every merge operation, the longest interval in
I grows by less than p + 2. By choice of t, the longest interval can never
reach length t.
Another property of the z's is that any two distinct z's differ by at least
?. This follows because all the intervals in the final I have length at least ?,
and no pair of intervals is crowded.
A third property is as follows. Let v1, v2, v3 be three nodes such that
z(v1) = z(v2) $ z(v3). Then Iy(v?)--Hy(vi)I >--H PIy(v2)--Hy(m)I, and, moreover,
y(v3) --H y?(v1) is approximated by z(v3) --H z(v1):
--Hy(vi))--H(z(v3)--Hz(vi))
z(v3) --H z(vi)			--H
These claims follow from the noncrowded property.
(18)
22
Collect all of the values of z(v) into an n0-vector named z. Let r = A0z.
Since all the y-labels are equal to each other at all boundary nodes (they are
all zero), the z labels at boundary nodes are also equal. Let e be the vector of
all l's; note from (10) that e is in the nulispace of A0. Thus, r = A0(z + Oe)
for any 0. In this last expression we can choose 0 so that z + Oe has zeros for
all boundary nodes. The conclusion is that vector r lies in the image of A,
as well as the image of A0.
Since r is in the image of A and ATp = 0, then rTp = 0. Since r itself is
nonzero (because not all the entries of z are equal to each other), this means
there exists a block of d consecutive entries, say rk, corresponding to simplex
Tk, such that ?k is not all zeros and such that ?Tkpk <0
Let us also split w into m blocks of size d denoted ..... . , Wm. We
now show that wTk?k > O.111rk1122. Let Yk denote the d + 1 vector of y
labels on simplex Tk, and zk the z labels. Then Wk = skj?1M0Yk and
= ski??1M0zk. We claim that M0zk is a close approximation to M0Yk in
the sense of relative normwise error. The argument is as follows. First, recall
that rk is nonzero, implying that not all the z labels of Tk are equal. There
are two kinds of entries in M0z?: zeros, corresponding to vertices with the
same z label as the origin, and nonzeros. Then relation (18) implies that,
componentwise, the entries of M0Zk that are nonzero are within a factor of
(1 i 1/P) of the corresponding entries of M0yk. Where M0zk is zero, the
corresponding entry of M0yk is a factor ? smallef than any of the nonzero
entries of M0Zk.
This implies that M0yk = M0zk + f, where f is a vector whose 2-norm
is at most (cd/P)IIM0zkII2. Now we compute the inner product wtrk.
wTk?k = YTkMOTj??Tsk2Jk?lMOzk
= zTkM0Tj??Tsk2j??1M0zk + fTj??Tsk2j??1M0zk
> II?k1122 --H (cd/P)IIM0zkII2 . IIJk?TskII2 II?k112
> II?k1122 --H (c2?c/P)tIM0z?II2 IIr?II?
= II?k1122 --H (c?2?/P)II5?1J?r?II2 IIrkII2
> IIrkII?2 --H (c3?Q/P)IIr?II2 IIr?II?.
Thus, we can assert that wTk?k > O.lII?k1122 provided that P > l.2cC3d. This
is how P is selected.
Now we continue the argument, recalling the equation Dp = w + b. For
the kth block this may be written dkpk = wk+bk, where dk is a scalar positive
23
constant (it is a diagonal entry of D). Take the inner product with ?k to
obtain 4pTkrk = wTkrk + bkTrk. Since r?Tp? < 0, wTr? < ?bTk?k. Next, we
plug in the inequality from the last paragraph to obtain O.lllrkll2 < bkTrk.
Clearly ?bkT?k < IIbkII2. llrkll2, so
O.lll?kll22 < IIbkII2 II?kII2.
This shows that IIrkII2 < lOIIbkII2. Since llr?ll? > lIM0zkII2/cd, we have
llM0zkll2 < lOcdIIbkII2.
Next, we have IIM0zkII2 > e (this is because Zk has differing entries,
and the minimum difference among distinct z-labels is e), so t(P + 2)-no <
lOcdIIbhII2, i.e.,
t < 1Ocd(P+2)?0IIbII,
Recall that t was the maximum difference between y labels. Since the bound-
ary nodes are labeled zero, the maximum absolute y label is at most t. Thus,
ll?ll? < lOcd(P + 2)?0llblF
This gives the bound on XA,d Recall that XA,d is at most IIAIIxA,d, and II All
is bounded by cdam'. I
The exponential dependence on the number of nodes is rather alarming.
We do not know whether the bound established by this theorem is as tight
as possible. In our computational experiments, we have not encountered
any exponentially large values of ll(ATD?lA)?lATD?lll. It seems that if the
worst-case bound is indeed exponential in n0, it probably takes a very special
choice of D to obtain this large norm.
There is a special case for which we know how to establish a much better
bound.
Theorem 7. Suppose that, in addition to having aspect ratio bound a, all
the simplices Tk are nonobtuse in the sense that that angle between any pair
of (d --H 1)-dimensional facets is at most r/2. Then
XA,d < noacd
and
< n0ma2cd.
24
Proof. Under the conditions of the the theorem, it turns out that the sym-
metric positive definite matrix N = Jk?TJk?l is a Stielties matrix. Recall
that a matrix N is said to be Stielties if it is symmetric, has positive di-
agonal entries and nonpositive off-diagonal entries, and if its inverse has all
nonnegative entries.
The theory of Stieltjes matrices is well-known, and presented, e.g., by
Ortega [15j. For our purposes, we need some variants of the standard results
that we now derive.
Let the vertices of Tk be [v1,.. .,vd,OJ. Let e1,. . . ,ed be the standard
basis, and let e be the vector of all l's. Let Jk be the matrix [v1,... , vdjT.
Consider the inner product between the normals of two facets, say IL and
w. Since the angle between the faces is at most r/2, the angle between IL
and w is at least 7r/2 but less than ?. Thus ILTw <0.
In terms of Jk, the normal of the facet containing all nodes except the
ith is, up to scaling, J??1e?, for i = 1,... , d. The normal of the facet not
containing the origin is --Hi??1e. Thus, the conditions that the facets make
angles at most 7r/2 is
eTJ??TJ??lej <0
for 1 < i,j < d and i # j, and
eTji?Tj??1ei > 0
(19)
(20)
for 1 <i <d.
Notice that (19) immediately implies that the off-diagonal elements of N
are nonpositive, because the expression on the left is precisely the formula
for the (i,j) entry of N.
Next, we show that N is diagonally dominant. Because of the pattern of
signs of the entries of N, we need only obtain lower bounds on each entry of
Ne in order to prove diagonal dominance the ith entry of Ne is precisely
diagonal entry of N minus the absolute values of the off-diagonal elements
in column i. But the ith entry of Ne is precisely the left-hand side of (20).
This proves that N is diagonally dominant.
Now, we return to the problem of bounding XA,d and XA,d We follow
the proof of Theorem 6, except we choose the z labels in a different manner.
Since there are n0 nodes, there must be two consecutive labels in the sorted
list Y1,..., y??, say Y%ut, Y%ut+1 such that Y%ut+i --H Y%ut > t/n0. For nodes
whose y label is Ycut or less, we assign a z label of 0. For the remaining
25
nodes we assign a z label of 1. We denote this entire vector by z0 since it is
combinatorial.
Form the vector r = A0z0 as in the preceding proof. Note that not all
the entries of r are zero. As in the preceding proof, select a simplex Tk such
that ?Tkpk < 0 and such that rk is not all zeros.
We now try to put a positive lower bound on wT?r?. Renumber the
vertices of Tk so that v1,.. .,v? have z-label of 1, and v?+1,. . .,vd+l have
z-label 0, and such that the origin vd+l is the node with the maximum y-
label among the nodes with z-label of zero. (Note that since ?k is nonzero,
both types of z labels must appear on Tk.) Let Z0k denote the z labels of the
vertices, and let Yk denote the y labels.
Thus M0Z0k is a d-vector whose first j entries are l's, and whose last d --H
entries are 0's. Vector M0yk has as its first ? entries numbers of size t/n0 or
more. The last d --H ? entries of M0yk are nonpositive.
Observe that, with these definitions, wk = SkJ??1M0Yk and ?k = skJ??1M0z%,
so wTk?k --H s2(MOyk)TN(MOz%). Let u denote the d-vector s2kN(M0Zk0). Be-
cause N is diagonally dominant, the first tL entries of it are all nonnegative.
Similarly, because the off-diagonal entries of N are all nonpositive, the last
d --H ? entries of it are all nonpositive.
Thus, it and M0yk have the same sign pattern. We now want to argue
that, of the first ? entries of it, at least one of them has a positive lower bound.
Consider the inner product q = (MOz%)Tsk2N(MOzkO), which is the same as
itT(MOzkO). This inner product is thus equal to the sum of the first ? entries
of it. On the other hand, q > sk2/llN?1ll2, because the minimum possible
value of ?Tp? when IxIl2 > 1 is l/liP?1 112 for a symmetric positive definite
matrix P. As argued earlier, sk2/llN?1ll2, which is equal to (sk/IIJkII2)2, is at
least Cd.
This shows that at least one of the first ? entries of it is of size Cd. Finally,
since wTk?k = (MOyk)Tit, we have that wTk?k > Cdt/no.
Now we continue as in the proof of Theorem 6. To summarize, since
dkpk = Wk + 6k, we have wTkrk < bkTrk. Substituting the inequality from
the preceding paragraph yields
Cdt/no < llbkll2 II?kII2.
Note that II?k112 < C?a, hence
Cdt/no < CdoLIIbkII2,
26
thus
t < cdnoOIIbkII2.
This concludes the argument. I
The difference between Theorem 6 and Theorem 7 is very striking; Theo-
rem 7 has a polynomial dependence. If we were to assume that both bounds
are in fact tight, then there is a significant difference between obtuse and
nonobtuse triangulations. This difference may be explained as follows. Re-
call that the parameters XA,d, XA,d correspond to constants occurring in a
certain "discrete maximum" principle. It is well-known that strong discrete
maximum principles are more easily established in the presence of angles
bounded by 900; see, for example, [21, p. 78] or [8, Theorem 20.2]. Compare
this also to the result of [23] mentioned in Section 4.
7 Triangulation algorithms
A consequence of the results in the last section is that, for (1) with wild
coefficients, one would like a triangulation of ? with bounded aspect ratio
and with all angles bounded by r/2.
There has been considerable research on algorithms for triangulations
satisfying angle bounds. An excellent source of information is the survey by
Bern and Eppstein [4]. An algorithm for obtaining nonobtuse triangulations
of polygonal regions in the case d = 2, and with bounded aspect ratio, was
proposed by Baker et al. [3]. Their algorithm does not handle varying levels
of refinement (i.e., great variation in the size of the simplices in different
parts of the domain.) Bern, Eppstein and Gilbert [5] proposed a class of
triangulation algorithms with bounded aspect ratio that can vary the level
of refinement. Using these ideas, Melissaratos and Souvaine [12] have an
algorithm for nonobtuse triangulation with bounded aspect ratio that can
vary the level of refinement.
In the case d > 3 we know of no work on nonobtuse triangulation of
polygonal regions. There is only one algorithm we know of that attains
bounded aspect ratio in the case d = 3, an octree-based algorithm due to
Mitchell and Vavasis [13]. This algorithm handles varying levels of refine-
ment.
27
Note that for all of these algorithms, if the boundary of ? has a sharp an-
gle, then the aspect ratio bound must go up, because the triangle/tetrahedron
that fits inside the sharp angle must necessarily have a bad aspect ratio.
For the case d> 3, we know of no work on triangulations with bounded
aspect ratio or with a nonobtuseness condition for polygonal regions.
8 Computational experiments
In this section we describe computational experiments to compare the NSHI
method to the CFSM method for solving (1). All these experiments are for
d = 2. The integration of the c fields over triangles has been carried out
using the midpoint rule. All the tests have been conducted and all figures
produced using MATLABTM. MATLAB, an interactive software package for
numerical analysis, is a trademark of The Mathworks, Inc. All tests are in
double-precision, with machine epsilon approximately 2 10-16.
For the standard method, we have used sparse Cholesky factorization,
as implemented by the chol instruction in sparse MATLAB, on the stiffness
matrix. Naturally, there are many other possible methods for solving (5),
such as iterative methods. We believe, however, that any method based on
(5) will be inaccurate because the ill-conditioning K will affect any numerical
algorithm.
We have implemented the NSHI method exactly as described in this pa-
per. We have used Kruskal's algorithm for the minimum weight spanning
tree computation (see Aho, Hopcroft, and Ullman [1, p. 237]), which requires
O(mdlog(md)) operations.
For our first group of test problems we used the mesh ?i illustrated in
Fig. 1. This mesh was obtained by placing an evenly spaced mesh of points
inside a polygon, displacing points that fell close to the boundary, and then
triangulating with the constrained Delauney triangulation (see [4]). This
method generally gives good aspect ratio bounds? though some angles are
obtuse.
As our first test we took c = 1 everywhere and boundary conditions de-
pending linearly on planar coordinates x1,x2. This problem has an exact
analytic solution, namely, the extension of the linear function into the do-
main. Furthermore, in exact arithmetic, the finite element method should
give the exact analytic solution (i.e., no truncation error is present) because
28
800
600
400
200			400			600			800
Figure 1: Mesh used for several test cases.
29
the finite element method is piecewise linear. Thus, the only errors are round-
off errors. We found that both the CFSM method and the NSHI method gave
computed answers y such that II? --H was bounded by about lO?15IIyoII
(Recall that Yo represents boundary values.) The spanning tree computed
by the NSHI method is depicted in Fig. 2.
For our second test, we again took c = 1 and boundary conditions so
that the exact analytic solution is the harmonic function x21 --H x22. This
function cannot be exactly represented with linear elements, so we would
expect truncation as well as roundoff errors. The relative norm error for
both methods with respect to the boundary data for this test was 3.6 1o-?
for both methods. The relative difference between the two methods was
about 10-15.
For the next test, we constructed a problem with wild coefficients where
know an approximate analytic solution. The coefficients are set up so that c
is a very large constant value c' in a disk-shaped region U strictly interior to
the domain, and c is 1 everywhere else. If the conductivity were essentially
infinite in U, then u would have to constant in U, and in particular, it would
be constant on the boundary of U. If c = 1 outside, then u is harmonic
outside U. An example of a function that is harmonic on the exterior of the
unit disk, and constant on the boundary of the disk is
x2
u(xi,x2) = x2 --H ?21 +x22			(21)
Furthermore, this function has the necessary compatibility property that its
flux a?/a? averaged around the boundary of the'disk is zero. Thus, if we
choose boundary conditions to agree with (21) and we choose c' very large,
then the exact analytic solution of (1) should be very close to constant inside
U and close to (21) outside.
We took as the boundary of the domain the same polygon used in the
previous cases, with a disk inside. The boundary conditions used were defined
by (21), properly scaled and recentered for the inner disk, and with a constant
added.
We took five different values for the c field. In all cases, c is 1 outside
the disk, and is a large value c' on the disk. The results are summarized
in Table 1. Note that for small values of c' both methods return acceptable
solutions. For wilder coefficients, the stiffness equation returns the wrong an-
swer, and finally breaks down, unable to complete the Cholesky factorization.
30
Figure 2: An example spanning tree generated by the NSllI method.
31
Table 1: This table summarizes the test where the coefficient in the disk
is c'. The second column is the difference between the CFSM solution and
the analytic solution. The third column is for the NSHI method. Omitted
entries in the second column mean that the Cholesky factorization failed.
CI			CFSM error			NSHI error
10?			3.14%			3.14%
1010			3.14%			3.14%
1016			103.12%			3.14%
1017			3.14%
1040			--H			3.14%
The specific solutions in the case of cl = 1016 are illustrated in Fig. 3.
As mentioned in the last paragraph, we added a small constant to the
boundary conditions. In principle, adding a constant to the boundary condi-
tions should affect (1) only by translating the solution by the same constant.
We found, however, that when we did not add th5s constant, the stiffness-
equation method generally returned the right answer for large CI (until the
point where the factorization failed to exist). Apparently the boundary con-
ditions described in the last paragraph do not excite the unstable mode of
ATD-l A unless a constant is added.
We remark also that the relative differences in this example compared
to the analytic solution (truncation errors) were about 0.0314, much larger
than the truncation error observed with the quadratic boundary conditions
described above. We believe that this is because the triangulation is not
able to accurately represent the inner disk, so the c field is only zero-order
accurate.
For the next test, we generated field c with two distinct large values, 10150
and 10??. The results are illustrated in Fig. 4. To generate this mesh, we
have used a two-dimension version of the algorithm in [13] implemented by
S. Mitchell. This mesh generator guarantees a bound on the aspect ratio, but
the triangles may be obtuse. Unlike the Delauney method described above,
this method has the ability to generate triangulations with varying degrees
of refinement.
The CFSM method failed for this problem. We do not know the ana-
32
800
600
400
20C
800
600
400
20(
0
800
600
4O0?
200?
200 400 600 800
200 400 600 800
800
600
400F
200 400 600 800
200?
200 400 600 800
Figure 3: The upper teft figure shows the field c; the blue region is where
c 1016. The upper right figure shows the analytic solution. The lower
left figure shows the solution computed by the N?HI method, and the lower
right figure shows the solution computed by the (iFSM method. Note that
the CESNI solution appears to have a local niaximum namely the large
red region in the center. The color scale of the last three figures has been
equalized.
I.
33
800
600
4(J
200 400 600 800
800
600
4b
200[
200 400 600 800
Figure 4: A problem with coefficients of size to1.?o and 10300. The top figure
shows the coefficient field: blue indicates 10300, green indicates 10150, and red
indicates 1. The bottom figure shows the solution computed by the N?ilI
method.
:34
lytic solution, so the NSHI solution presented in the figure can be judged
only qualitatively. For instance, we would expect that in the region of con-
ductivity io??, the temperature should be nearly constant since this region
is isolated from the boundary, and the computed solution does indeed have
this property. The boundary conditions are a linear function of the (x1, x2)
coordinates.
For our last experiment on wild c's, we used the same domain, but we let
c vary from 1030 up to 10186 smoothly. In particular, c was defined to be an
exponential function of the x1 coordinate. Again, linear boundary conditions
were used.
Surprisingly, the CFSM method ran to completion and returned an an-
swer that had a relative error of only 5.6 1o?16 compared to the NSHI
method. MATLAB warned that the Cholesky factors were probably singu-
lar. The CFSM method can sometimes be robust in the face of extreme
ill-conditioning; we do not know why this is so.
As our final battery of tests, we compared the running time of the NSHI
method to that of the standard method. It is not completely obvious how to
make this comparison since the NSHI method requires combinatorial as well
as numerical computation, and MATLAB is not efficient for combinatorial
computation. Accordingly, we compared the two methods based only on the
number of flops, as measured by MATLAB. There is a further issue that
clouds the comparison. It is well-known that the number of flops for sparse
factorization methods depends heavily on the ordering of the equations and
the unknowns. There are very efficient orderings known for CFSM method;
see [11j. For the NSHI method, on the other hand, we have not studied
the ordering issue. We opted to simply use the native ordering in both
algorithms, that is, the ordering produced by the Delauney mesh generator.
It turns out that our implementation produces a "band ordering" for the the
CFSM method, which is favorable for this method (though not optimal). We
do not know whether this is a favorable ordering for the NSHI method. The
sparsity pattern of the stiffness matrix K is depided in Fig. 5. The sparsity
pattern of the matrix [A, V] is depicted in Fig. 6.
We generated three problems based on the same domain but with different
mesh spacings. The running time figures for three problems are given in
Table 2. Notice that the NSHI method took significantly longer times. The
primary reason for this gap is that the system K is n x n, whereas [A, V]
is md x md, in general, much larger. We observe also from the table that
35
P
200
400
600
800
1000
0
200			400			600			800			1000
nz=7833
Figure 5: Nonzero structure of the stiffness matrix, n = 1165.
36
n
1000
1500
2000
2500
3000
3500
4000
4500
500
?.% .???.?.?.. !. o+..o+.. !. o+.. ?, la--H--H
?-?\ o+-?
- \\:-:--
-			-
\ \`?`?.?..
:- .;-.:
0
1000			2000			3060			4000
nz=87518
Figure 6: Nonzero structure of the matrix [A, V], md = 4984.
37
the gap between the two methods appears to be \Nidening as the problems
get larger. We suspect that this is because the band ordering asymptotically
improves the Cholesky running time but apparently does not help the NSHI
method as much.
Table 2: Comparison of the number of flops required by the two methods.
n			md CFSM flops NSHI flops
73			373			18,880			952,6(;9
292			1328			153,438			14,272,869
1165			4984			1,761,780			476,627,626
9 Conclusions and future directions
We have shown that boundary value problem (1) with wild coefficients may
be solved stably by the NSHI method, even when the condition number of the
stiffness matrix is well beyond the reciprocal of machine-epsilon. It would be
interesting to generalize this work to other boundary value problems. Here
are some examples of possible generalizations.
1.
2.
A straightforward generalization would be to use finite element basis
functions of order higher than linear. We do not anticipate any dif-
ficulties in the theory, but we do not know how XA,d and XA,d might
change.
A more general problem is (1) with a nonzero right-hand side, i.e.,
V (cVu) = --Hf. In this case, we believe that the NSllI method and
the theory developed in this paper will break down. There will be no
maximum principle in this case; there is no bound on u in terms of f
and g independently of c.
There is one special case of a nonzero right-hand side that could be
partially handled, namely, the case that g = 0:
V (cvu) = --Hf on
u = 0			on
38
3.
4.
In this case, we believe that the RSH method (dual to the NSH method;
see [22]) can accurately recover the vector p, that is, the vector repre-
senting cVu (but not y). There appears to be a maximum principle
bounding cVu in terms of f independently of c when g is zero.
Another more general problem is the anisotropic version of (1), that is,
V (GVu) = 0, where G is a function ? IRdx(i that is a symmetric
positive definite matrix at each point.
Let Amax, A?n denote the maximum and minimum eigenvalues of U.
Thus, Amax, Arnin are also functions of the spatial coordinates. There
are two different ways in which G could be wild. The first case is
that Ama_ < kA__ where k is a small constant at every point on the
domain, but Arn?n and Arnax could vary significantly as a function of the
coordinates. This case might be called the "weakly" anisotropic case.
We believe that the NSllI method, as described in this paper, will be
adequate for this case.
The other case, the "strongly" anisotropic case, is when Amax may be
far larger than Arnin at some points, and both may vary by orders of
magnitude over the domain. In this case, it seems that the plain NSll
method should be used (not the NSHI method), because the isotropic
versions of XA, XA are irrelevant. The plain versions of XA, XA for a
strongly anisotropic problem will depend significantly on the orienta-
tion of the faces of the triangulation with respect to the eigenvectors
of G(x1,... , xd). We do not know the nature of this dependence, but
it will probably be necessary for the triangulation to have very special
properties in order to get reasonable bounds on XA, XA
Nonlinear boundary value problems could also be considered. The most
straightforward nonlinear generalization of (1) is the case that c de-
pends on u.
5. Stokes flow could be considered. Stokes flow in a fluid with highly
varying viscosity can be written:
--HV (G(x, y, z)Vv) + Vp =!			(22)
V.v=O
39
Strang [19] observes that this can be written in the form of (12). The
difficulty is that D will be in general nondiagonal, so none of Stewart's
results underlying the NSH method could be applied.
By factoring this D, say as QTD?Q where Q is orthogonal and D' is
diagonal, we may rewrite (22) with a diagonal D' by changing variables.
This new diagonal matrix D' presumably would encode the variation in
viscosity. The change of variables would replace the original A in (12)
with QA. Then the theory can be applied. Of course, the interesting
question is how to obtain bounds on XA, XA in this case; indeed, it is
not clear even what QA represents (mathematically or physically).
By writing Stokes flow in the form (12), Wathen et. at. [24] have de-
rived a method for obtaining optimal preconditioners for iterative linear
solvers. This method as presented does not apply to the case when G
is highly varying.
There are other future directions for this research besides widening the
scope of applications. Here are some of the more interesting research ques-
tions.
Perhaps it is possible to simplify the NSHI method, which currently
involves a variety of numerical and combinatorial tasks, or to derive a
completely different algorithm satisfying a bound like Theorem 5.
Indeed, the CFSM method showed surprising robustness on many test
cases as mentioned in Section 8, and appears to be far more efficient.
Perhaps there is some simple modification of the CFSM method that
guarantees stability.
If, on the other hand, no stable method simpler than the NSHI method
is found, then the question of sparse orderings and efficiency in the
NSHI method needs attention.
A very interesting question is whether iterative methods can be applied
to problems with wild coefficients. Many efficient iterative methods
for boundary value problems with nonconstant coefficients have been
proposed; see, e.g., Concus and Golub [9]. We do not know whether
these methods are numerically stable when c is wild. Another iterative
approach to problems with nonconstant coefficients is based in domain
40
decomposition: see Bramble, Pasciak and Schatz [7]. These methods
have a convergence rate bounded independently of the variation c, but
c must follow a certain pattern (it must be constant or nearly constant
on substructures). Furthermore, the connection between convergence
rate results (which assume exact arithmetic) and numerical stability is
unclear.
o+ Do XA,d, XA,d have better bounds in the case of obtuse simplices?
XA,d, XA,d can indeed be exponentially large for obtuse triangulations in
cases of interest, then it seems that nonobtuse triangulation algorithms
must be used. Can the algorithms described in Section 7 be extended
to three dimensions or higher?
10 Acknowledgements
If
The author benefitted greatly from discussions with James Bramble, Mark
Hanisch, Alfred Schatz, and Lars Wahlbin of Cornell, Gilbert Strang of MIT,
Scott Mitchell of Sandia National Laboratories, and Marshall Bern of Xerox
PARC.
References
[1] A. V. Aho, J. E. Hopcroft, and J. D Ullman. Data Structures and
Algorithms. Addison--HWesley, 1983.
[2] I. Babuska and J. E. Osborne. Generalized finite element methods:
their performance and their relation to mixed methods. SlAM Journal
on Numerical Analysis, 20:510--H536, 1983.
[3] B. 5. Baker, E. Grosse, and C. 5. Rafferty. Nonobtuse triangulation of
polygons. Discrete and Computational Geometry, 3:147--H168, 1988.
[4]
M. Bern and D. Eppstein. Mesh generation and optimal triangulation.
In D. Z. Du and F. Hwang, editors, Computing in Fuclidean Geometry.
World Scientific, 1992. Also Xerox Palo Alto Research Center Technical
report CSL-92-1
41
[5]
M. Bern, D. Eppstein, and J. Gilbert. Provably good mesh generation.
In Proc. 31st Symp. Foundations of Computer Science, pages 231--H241,
1990.
[6J A. Bj5rck. Least squares methods. In P. G. Ciarlet and J. L. Lions, ed-
itors, Handbook of Numerical Analysis, volume I. North--HHolland, Ams-
terdam, 1990.
[7j J. H. Bramble, J. E. Pasciak, and A. II. Schatz. An iterative method for
elliptic problems on regions partitioned into substructures. Mathematics
of Computation, 46:361--H369,1986.
[8]
P. G. Ciarlet. Basic error estimates for elliptic problems. In P. G. Ciarlet
and J. L. Lions, editors, Handbook of Numerical Analysis, volume II.
North-Holland, Amsterdam, 1991.
[9] P. Concus and G. Golub. Use of fast direct methods for the efficient
numerical solution of nonseparable elliptic equations. SlAM J. of Nu-
merical Analysis, 10:1103--H1120, 1973.
[10] R. S. Falk and J. E. Osborn. Remarks on m?xed finite element methods
for problems with rough coefficients (preprint). 1992.
[11] A. George and J. W. II. Liu. Computer Solution of Large Sparse Positive
Definite Systems. Prentice--HHall, Englewood Cliffs, N.J., 1981.
[12] E. Melissaratos and D. Souvaine. Coping with inconsistencies: A new
approach to produce quality triangulations of polygonal domains with
holes. In Proc. 8th Symp. on Computational Geometry, pages 202--H211,
1992.
[13]
5. A. Mitchell and 5. A. Vavasis. Quality mesh generation in three
dimensions. In Proc. 8th Symp. on Computational Geometry, pages
212--H221. ACM Press, 1992.
[14] D. P. O'Leary. On bounds for scaled projections and pseudoinverses.
Linear Algebra and its Applications, 132:115--H117,1990.
[15] J. Ortega. Numerical Analysis: A Second Course. Academic Press,
1972.
42
[16j M. II Protter and II. F. Weinberger. Maximum principles in differential
equations. Springer Verlag, New York, 1984.
[17] J. E. Roberts and J. M. Thomas. Mixed and hybrid methods. In P. G.
Ciarlet and J. L. Lions, editors, Handbook of Numerical Analysis, vol-
ume II. North-Holland, Amsterdam, 1991.
[18] G. W. Stewart. On scaled projections and pseudoinverses. Linear Alge-
bra and its Applications, 112:189--H193, 1989.
[19] G. Strang. Introduction to Applied Mathematics. Wellesley--HCambridge
Press, 1986.
[20] G. Strang. A framework for equilibrium equations. SlAM Review,
30:283--H297,1988.
[21] G. Strang and G. J. Fix. An analysis of the finite element method.
Prentice Hall, Englewood Cliffs, N.J., 1973.
[22] 5. A. Vavasis. Stable numerical algorithms for equilibrium systems.
Technical Report 92-1280, Department of Computer Science, Cornell
University, Ithaca, NY, 1992. To appear in SlAM J. Matrix Analysis.
[23] L. Wahlbin. Private communication. 1993.
[24] A. J. Wathen, B. Fischer, and D. J. Silvester. The convergence rate of
the minimum residual method for the Stokes problem (in preparation).
1993.
43
