BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR93-1391
ENTRY:: 1994-03-17
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: An Accelerated Interior Point Method Whose Running Time Depends 
        Only on $A$
AUTHOR:: Vavasis, Stephen A. 
AUTHOR:: Ye, Yinyu
DATE:: October 1993
PAGES:: 68
ABSTRACT::
We propose a ``layered-step'' interior point (LIP) algorithm for linear 
programming. This algorithm follows the central path, either with short steps 
or with a new type of step called a ``layered least squares'' (LLS) step. The 
algorithm returns the exact global minimum after a finite number of 
steps-in particular, after $O(n^{3.5}c(A))$ iterations, where $c(A)$ is a 
function of the coefficient matrix. The LLS steps can be thought of as 
accelerating a path-following interior point method whenever 
near-degeneracies occur. One consequence of the new method is a new 
characterization of the central path: we show that it composed of at most 
$n^2$ alternating straight and curved segments. If the LIP algorithm is 
applied to integer data, we get as another corollary a new proof of a 
well-known theorem by Tardos that linear programming can be solved in 
strongly polynomial time provided that $A$ contains small-integer entries.
END:: CORNELLCS//TR93-1391
BODY::
An Accelerated Interior Point Method
Whose Running Time Depends Only on A
Stephen A. Vavasis*
Yinyu Ye**
TR 93-1391
October 1993
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
* This work is supported in part by the National Science Foundation, the Air Force
Office of Scientific Research, and the Office of Naval Research, through NSF grant
DMS-8920550. Also supported in part by an NSF Presidential Young Investigator
award with matching funds received from AT&T and xerox Corp. Part of this work was
done while the author was visiting Sandia National Laboratories, supported by the
U.S. Department of Energy under contract DE-AC04-76DP00789.
** This author is supported in part by NSF Grant DDM-9207347. Part of this work was
done while the author was on a sabbatical leave from the University of Iowa and
visiting the Cornell Theory Center, Cornell University, Ithaca, NY 14853, supported in
part by the Cornell Center for Applied Mathematics and by the Advanced Computing
Research Institute, a unit of the Cornell Theory Center, which receives major funding
from the National Science Foundation and IBM Corporation, with additional support
from New York State and members of its Corporate Research Institute.
An Accelerated Interior Point Method
Whose Running Time Depends Only on A
Stephen A. Vavasis*			Yinyu Ye
0ctober 14,1993
Abstract
We propose a "layered-step" interior point (LIP) algorithm for
linear programming. This algoritlim follows the central path, either
with short steps or with a new type of step called a "layered least
squares" (LLS) step. The algorithm returns the exact global mini-
mum after a finite number of steps--Hin particular, after O(n3?5c(A))
iterations, where c(A) is a function of the coefficient matrix. The
LLS steps can be thought of as accelerating a path-following interior
point method whenever near-degeneracies occur. One consequence of
the new method is a new characterization of the central path: we
show that it composed of at most n2 alternating straight and curved
*Department of Computer Science, Upson Hall, Cornell University, Ithaca, NY 14853.
Email: vavasis?cs.cornell.edu. This work is supported in part by the National Science
Foundation, the Air Force Office of Scientific Research, and the Office of Naval Research,
through NSF grant DMS-8920550. Also supported in part by an NSF Presidential Young
Investigator award with matching funds received from AT&T and Xerox Corp. Part of
this work was done while the author was visiting Sandia National Laboratories, supported
by the U.S. Department of Energy under contract DE-AC04-76DP00789.
tDepartment of Management Sciences, The University of Iowa, Iowa City, IA 52242.
E?mail: bbuyin?vaxa.weeg.uiowa.edu. This author is supported in part by NSF Grant
DDM-9207347. Part of this work was done while the author was on a sabbatical leave from
the University of Iowa and visiting the Cornell Theory Center, Cornell University, Ithaca,
NY 14853, supported in part by the Cornell Center for Applied Mathematics and by
the Advanced Computing Research Institute, a unit of the Cornell Theory Center, which
receives major funding from the National Science Foundation and IBM Corporation, with
additional support from New York State and members of its Corporate Research Institute.
segments. If the LIP algorithm is applied to integer data, we get
as another corollary a new proof of a well-known theorem by Tardos
that linear programming can be solved in strongly polynomial time
provided that A contains small-integer entries.
1 Interior point methods
In this report we consider solving a linear programming problem (LP) in the
following form:
minimize bTy
subjed to ATy > c (1)
Here, A is an m x n matrix assumed to have rank m, b e R"' and c Ei R?
are given vectors, and y e 1Rm is the unknown vector. This format of LP is
commonly known as "dual format."
We propose a path-following interior point method for this problem. In
the taxonomy of interior point methods, our method would be called a "dual"
path following method, but it has some features of a "primal-dual" method.
In particular, our algorithm produces both a primal and dual optimal solu-
tion.
Interior point methods were originally introduced by Karmarkar [8], and
the first path-following method is due to Renegar [19]. Traditional path-
following methods take small steps along the central path until they are
"sufficiently" close to the optimum. Once sufficiently close, a "rounding"
procedure such as Khachiyan's [9] or least-squares computation such as Ye's
[31] is used to obtain a global minimum.
In our new method, we interleave small steps with longer layered least
squares (LLS) steps to follow the central path. Thus, our method is always
at least as efficient as existing path-following interior point methods. The last
LLS step moves directly to the global optimum. Thus, our algorithm, which
we call "layered-step interior point" (LIP), terminates in a finite number of
steps. Furthermore, the total number of iterations depends only on A: the
running time is O(n3?5c(A)) iterations, where c(A) is defined by (63) below.
This is in contrast to all previous interior methods, whose complexity depends
on b and c as well as A. This is important because there are many classes
of problems (see, e.g., Section 10) in which A is "well-behaved" but b and c
are arbitrary vectors.
2
In order to provide intuition on how the LIP algorithm works, and why
our complexity bounds are stronger, we consider the linear programming
problem presented in Figure 1. The problem solved in this figure is:
minimize
subject to
2Yi + 5Y2
o ? Yi < 1,
o < Y2 < 1,
Yi + 2Y2 > e,
where e = 0.1. The dashed line indicates the boundaries of the feasible re-
gion defined by these constraints. The solid line is the central path, which
is defined in Section 2. It connects the point A, known as the analytic cen-
ter, to the point D, the optimum solution. Note that the optimum in this
example is at a vertex of the feasible region it is always the case that the
optimum is achieved at a vertex, and for a nondegenerate problem (such
as this one) the optimum is unique. The asterisks show the iterates of a
conventional path-following interior point method. Such a method generates
iterates approximately on the central path with spacing between the iterates
decreasing geometrically. Once the iterates are sufficiently close to the opti-
mum D, where "sufficiently close" means that the current iterate is closer to
D than to any other vertex of the feasible region, then the exact optimum
may be computed.
Now, consider what happens as we let e get smaller in this problem. This
creates a "near degeneracy" near (0, 0), and the diagonal segment in Figure 1
gets shorter and shorter. This means that the conventional method must
take more and more iterates in order to distinguish the optimal vertex (e, 0)
from the nearby vertex (0, e/2). In fact, it can be proved that a standard
method applied to this problem would require 0(1 log el) iterates. Note that
e is a component of right-hand side vector c; this explains why the existing
methods have complexity depending on c. The running time of all existing
methods is written as O(?L) (or sometimes O(nL)) iterations, where L is
the number of total bits required to write A, b, c. Thus, for this example, L
depends logarithmically on e.
In contrast, our method would jump from point B directly to point C
without intervening iterations. This accomplished by solving a weighted
least-squares problem involving the three constraints Yi > 0, Y2 > 0, Yi +
2Y2 > f, which are "near active" at optimum. In a more complex example,
there would several "layers" involved in the least-squares computation. When
3
0.8
0.6
0.4
0.2
0
r
*
0			0.2			0.4			0.6			0.8			1
Figure 1: Example of an interior point method.
4
point C is reached, the LIP method takes small interior point steps, the
number of which depends only on the constraint matrix
AT =
10
--H1			0
o			1
o			--H1
12
After these small steps, the LIP method jumps again directly to optimal
point D.
An observation concerning this figure is that the central path appears to
consist of a curved segment from A to B, an approximately straight segment
from B to C, a curved segment near C, and another approximately straight
segment from near C to D. Indeed, a consequence of the LIP method is a new
characterization of the central path as being composed of at most n2 curved
segments alternating with approximately straight segments. For a problem
with no near-degeneracies, there is only one curved segment followed by an
approximately straight segment leading to the optimum.
Ours is not the first complexity bound for linear programming that de-
pends only on A; Tardos [23] earlier proposed such a method. Tardos'
method, however, "probably should be considered a purely theoretical con-
tribution" [23, p. 251] because it requires the solution of n complete LP's.
In contrast, our method is a fairly standard kind of interior point method
accompanied by an acceleration step; accordingly, we believe that it is quite
practical.
Another difference is that we regard our method as primarily a real-
number method. For example, our method is the first polynomial-time linear
programming algorithm that also has a complexity bound depending only on
A for finding the global minimum in the real-number model of computation.
In contrast, Tardos uses the assumption of integer data in a fairly central
way because an important tool in [23] is the operation of rounding down to
the nearest integer.
The remainder of this paper is organized as follows. In Section 2 we define
the central path and characterize points that are "approximately" centered.
In Section 3 we describe quantities XA, XA that were discovered independently
by Stewart [22] and Todd [24], and state some of their properties. Our
complexity bounds depend on these parameters. In Section 4 we describe
5
the layered least squares step and the LIP method. In Section 5 we prove
the first main theorem concerning the LIP method, namely, that every point
on the line segment defined by the LLS step is approximately centered. In
Section 6 we define crossover events, a combinatorial event concerning the
central path that forms the basis of our complexity analysis. We prove the
second main theorem that each LLS step forces at least one crossover event.
In Section 7 we assemble the pieces in order to obtain a complexity result for
solving LP problems. In Section 8 we explain a method for initializing the
LIP method, and analyze its complexity. This complexity is no worse than
the complexity of the LIP method itself. In Section 9 we specialize to integer
data and provide a new proof of Tardos' theorem. Finally, in Section 10
we apply our bounds to the special case of minimum-cost flow to explore
how our interior-point method compares to other strongly polynomial time
algorithms for this problem.
2 The central path
It is common to rewrite (1) by introducing slack variables as follows:
minimize
subject to
bTy
ATy --H s =
S >0.
(2)
We will assume this format for the remainder of the paper. Assume that a
strictly interior point, that is, a point with s > 0, exists. Let us introduce
a log-barrier term into the objective function, yielding the new optimization
problem:
minimize bTy --H ? Zt%i in 5j
subject to ATy --H s =
> 0.
(3)
Here, ? is a positive parameter. This problem is strictly convex. This means
that if we assume the set of feasible points for (2) is bounded and nonempty,
there is a unique minimizer for (3). This point (s(ji), y(?)), the minimizer, is
called the central path point for parameter'i, for problem (2). Note that as
H 00, the objective function bTy no longer matters and (s(?), ?(i)) tends
to the analytic center of the feasible region. As ? 0, (s(?), y(?)) tends to
6
an optimum since (3) tends to (2) in this case. The definition of the central
path is credited to Bayer and Lagarias [2], Megiddo [13], and Sonnevend [21],
and it was partially analyzed by McLinden [12] earlier. For the relationship
between the central path and the classical log-barrier method, see Wright
[30]. For interesting properties of the central path, see the review article by
Gonzaga [7].
If we introduce Lagrange multipliers into (3), we find that (s(i),y(i))
exactly satisfy the equations:
?AS?1e =
ATY --H S =
liere and for the rest of the paper, 5 denotes diag(s) and e denotes the vector
of all l's. These nonlinear equations generally have no closed form solution
nor any finite algorithm. Accordingly, an interior point method generates
points that are approximately centered.
In order to further develop the theory, we consider the primal problem
associated with (2):
maximize cTx
subject to Ax = b, (4)
x> 0.
This problem is written in "standard form." It can be shown that the central
path for (4) is essentially identical to the central path for (2) if we make the
identification x = ?S?1e.
When one is interested in approximate centering, however, several criteria
for measuring proximity to the central path are used, depending on whether
primal or dual information is available. For the remainder of the paper, all
norms are 2-norms.
The first is the dual measure: Given y that is strictly feasible for the
dual, define
= IIs?iAT(b/[t --H AS?1e)II.
Note that ? depends on s also; given a point y; we let s = ATy --H c to apply
this formula. This is the measure that we will use; it requires only dual
information. Observe that for an exactly centered point, ?(y,;) = 0.
The second is the primal-dual measure: Given an x satisfying Ax =
and given y that is strictly feasible for the dual, define
= IISx --H eli.
7
Note that ?x is a primal point with A(?x) =
The third is: Given y that is strictly feasible for the dual, define
= lvii, where v satisfies b/? --H AS?1e = AS?iv.
In the following lemma, which is proved by Gonzaga [7] and also in Goffin et
al. [5], it is shown these three measures are essentially equivalent. Moreover,
starting from an approximate central path point, Newton's method can be
applied and converges to the central path point at a quadratic rate.
Lemma 1 Given y that is strictly feasible for the dual and ? > 0, we have
1. For all x satisfying Ax b/j, and for all v satisfying blti --H AS?ie =
AS?1v,
n(x,y,jt) and ?(y,?) < ?(y, v, it)
2. There exists an X such that Ax = b/jt and
<--H qy,it).
3. There exist a v such that AS?iv = b/it --H AS?1e and
lvii < S(y,it).
?. Suppose ?(y?it) < 1, and let
r = (AS?2AT)?i(b/it --H AS?1e)
and y = y --H r (the Newton step). Then y is strictly feasible for the
dual and
?(Y?it) ? ?(y,it)2
The main point of the proof is that solving the least squares problem
minimize llSx --H eli, subject to Ax = b/it
whose solution is
x(y) = S?2AT(AS?2AT)-i(b/1) + S?1P?s-ie,
8
(5)
(6)
where PAD is the nulispace projection matrix
PAD = I - DAT(AD2 AT)-l AD
simultaneously minimizes all the quantities in the lemma for a fixed y.
Suppose for some y that 6(y,jt) = ? < 1, and let the slacks be s. It
follows from quadratic convergence that for all i,
H?(1 --H ?(2?))&i ? si(i) < fI?(1 + 8(2P))5j
p=o			p=o
(7)
Here, s(?) is the slack of the central path point. Multiplying out the right-
hand product gives exactly the geometric series
1 + ? + ?2 +
The left-hand product is bounded below by the geometric series
1 ???2??
although this bound is not exact. Thus, we have the following corollary which
is proved in various places, including [7].
Corollary 1 Let y be an approximately centered point with ?(y, ?) = ? and
? < 1. Let s be the slacks for y, and let (y([t), s(?)) be the central path point
for?. Then for alli=1,...,n,
where e = ?/(1 --H
(1 --H e)sj ? si(ii) < (1 + e)si
3 The parameters XA and XA
Let A be an m x n matrix of rank m. Let us define D to be the set of all
positive definite n x n diagonal matrices. Let us define
XA = suptIIAT(ADAT)?lADII : D E D)
9
Stewart [22] and Todd [24] independently proved that this quantity is always
finite. Equivalently, one can show that
XA = supf II(ADAT)?lADII : D e Di.
is also finite. These bounds apply for any induced matrix norm, but for this
paper, we confine ourselves to 2-norms.
An equivalent way to define these parameters, which is perhaps more
relevant for this paper, is in terms of weighted least-squares:
XA = sup
and
y minimizes IID(ATy --H c)II for some c C R?, D ? D
(8)
XA = sup IlATYIl			.
cl : y minimizes IID(ATy?c)II for somec E R?,D ED
(9)
Our complexity bounds will depend on these two parameters. Note that XA is
unchanged when A is scaled by a constant, whereas XA scales inve?rsely. Since
our complexity bounds are independent of constant scaling, we naturally can
expect the bounds to depend on XA and on the product xAIIAII.
A further result concerning these parameters is due to Stewart [22] and
O'Leary [16]. Let
Let
Define
S = E ??: Ilsil = 1 and s = ATw for some w E ?m1.
X = E ? : ADx = 0 for some D E DJ.
Po = inf[IIx--HsII : x E X,s E SJ.
Their result is that 11XA = Po Thus, we could use the reciprocal of this
infimum as an alternative definition of XA Indeed, it is more general because
it applies to matrices A that are rank deficient.
Further theory concerning these parameters has been developed, for ex-
ample, by Vavasis [27, 28]. The first paper shows that XA, XA act like condi-
tion numbers for numerically stable solution of weighted least-squares prob-
lems involving A. The second paper estimates variants of these parameters
10
for finite-element problems in terms of the geometry of the finite-element
triangulation.
Here is an example of a useful result stated explicitly in [27] but also
implicit in [16] and [22].
Lemma 2 Let B be an m x m submatrix of A that is nonsingular. Then
liB-111 < XA.
PROOF. Let y be arbitrary, and consider solving BTx y. If we could
bound lixil in terms of Ilyll, then we would have a bound on IIB?TII. Let De
be an n x n diagonal matrix with l's in the positions corresponding to columns
selected for B, and a small quantity e > 0 in other positions. Then we have
BTx = so BBTx --H By, so x --H (BBT)?lBy. Let y be extended to an
n-vector y by filling in zeros in positions not corresponding to B; then we
have BBTx --H AD?y. Now, let x(e) be the solution to AD?ATx(e) = ADcY,
i.e., x(e) = (ADfAT)?lAD?Y, so IIx(e)II ? XAIIYII, i.e.,
IIx(e)II < xAIIYII
Since ADfAT tends to BBT as e H 0 and the limit matrix BBTis invertible,
then (ADcAT)?l tends to (BBT)-l, i.e., x(e) tends to x. This shows that
lxii ? xAIIYiI. This gives an upper bound on iiB?Tii; since the matrix 2-norm
is preserved under transpose, we get the same bound for lIB-1 II. 1
This paper provides some additional results concerning these parameters
such as Lemma 3. One major gap in our understanding of these parameters
is the question of computing them; we do not know of efficient algorithms
for computing either parameter. See Section 11 for more comments on this
matter.
A final note is that in some earlier work such as [22] and [27] it was
assumed that A was an m x n matrix of rank n, and the definitions of XA, XA
in those papers involved the transpose matrix rather than the definition used
above. Thus, the notation is slightly different.
l1
4 Layered least squares
In this section we describe one main-loop iteration of the LIP algorithm.
Assume at the beginning of the iteration that we have a current approximate
central path point (y, it) with ?(y, it) < 6. Recall that 6(.,.) was defined as
proximity to the central path in Section 2. We assume throughout that 6 =
1/4. Partition the constraints into p layers ..... . , Jp by using s --H ATy --H c
as follows. Find a permutation ? that sorts the slacks into ascending order:
Sir(i) < S42) <			<
Let g > 1 be a "gap size" defined in terms of A by equation (59) below.
Find the leftmost ratio-gap of size greater than 9 in the sorted slacks, i.e.,
find the smallest? such that S?(i+1)IS?(i) > g. Then let J1 = f7r(1),... ,
Now, put ir(i + 1), 7r(i + 2),... in J2, until another ratio-gap greater than y
is encountered, and so on. Thus, the constraints indexed by Jk for any k are
within a factor of 9? of each other, and are separated by more than a factor
of g from constraints in Jk+i.
To formalize this, define
and
0k = maxs?
iEJk
= m1ns?.
jEik
The construction above ensures that for each k,
0k <?+i/g.
and that
0k < gfl?
We continue using ?k, 0k with these definitions throughout the paper.
Define x as the solution to weighted least-squares problem (5). Note
that if y were exactly centered, then x would be identically S?1e. Let X =
diag(x), and let
A --H X-'12S112
Again, note that if y were exactly centered, we would have A =
(10)
12
Consider the matrix xs; if y were exactly centered, then XS = I. Since
y is approximately centered, we find from Lemma 1 each diagonal entry of
XS is between 1 --H ? and 1 + ?. Thus, we have the two inequalities
and
lixl/2s1/211 ?
jix--H1125--H1121 < 1/VMS,
which will be used frequently during the upcoming analysis.
We introduce a new kind of step, the LLS step, to generate a direction
r for the interior point method. Let A1,... , Ap be the partitioning of the
columns of A according to ...... , Jp. Similarly, let A1,... , Ap be diagonal
matrices that are the submatrices of A indexed by ..... . , Jp. Various other
vectors and matrices are partitioned in the same way.
The layered least squares (LLS) step from point y is as follows. Define
L0 = R?. Then for k = 1,... , p compute affine subspace
Lk := fmimmizers r of IIA?l(A?Tr --H sk)II subject to r ? Lk?11,
(11)
so that L0 D L1 D D Lp. Finally, let r be the unique element in Lp
(unique because A is assumed to be full rank). Note that, for example, if Af
spans R?, then L1 has a unique element, and thus L1 = = Lp = frj.
Thus, the data items necessary for specifying an LLS step are (a) a parti-
tioning ..... . , Jp, (b) positive definite diagonal weight matrix A, (c) coeffi-
cient matrix A, and (d) right-hand side vector s. The solution is a linear func-
tion of s. Thus, by linearity, we can restate the computation of r as solving an
LLS problem for y where on the kth step we minimize IIAkl(AkTy?ck)II, and
then taking r = y --H Y. This alternate viewpoint is useful for understanding
some of the examples below.
The layered least squares step may be thought of as weighted least-squares
with the weights in J1 "infinitely higher" than the weights of J2, and so on.
Indeed, this idea is formalized by Lemma 3 below.
The vector r can be compared to the Newton step associated with advanc-
ing an ordinary path-following method. For instance, if all the constraints
were in J1, then r would be precisely the direction of the Newton-step for
a primal-dual interior point method (e.g., Kojima et al. [10], and Monteiro
and Adler [15]).
13
We now update y = y --H (1 --H a)r, and ? = lia. Here, a is a scalar
parameter between 0 and 1. The idea is to choose a as close to zero as
possible (or perhaps equal to zero, in the case of the final LLS step) as long
as the point y remains approximately centered. In Section 5 we give a precise
formula for such an a, which involves solving some additional least-squares
problems. An inexact line-search is also possible; see (61) below.
After the LLS step we improve the centering of y with three Newton
iterations, holding the central path parameter fixed at ?a. Following the
recentering, we perform ordinary path-following steps to decrease the central
path parameter from ?a to jta/C(A) where C(A) is defined by (60) below.
We now state the LIP method.
Algorithm LIP
Input: A, b, c and (y, jt) satisfying ?(y, j) < ?.
Output: Optimal points y* for (2) and x* for the primal (4).
Repeat:
1. Compute LLS solution r as above.
2. Compute a e [0,11, either by the method of Section 5
or by line-search.
3. Update ? := y --H (1 --H a)r, ? :=
4. If a = 0, then set y* := y, compute x*, and HALT.
5. Recenter y with three Newton steps, yielding y'.
6. From (y', jta), compute approximately centered point
(y", ia/C(A)), using existing path-following techniques.
7. Update y := y", ? := ?a/C(A), and continue.
It is not a priori clear that this algorithm has finite termination, let alone
polynomial running time. Most of the remainder of the paper will be devoted
to establishing various properties of Algorithm LIP. Note that the algorithm
assumes that an approximately centered point is part of the input. This
in turn implies that the feasible region for (2) is nonempty and contains a
strictly interior point. In Section 8 we explain an initialization procedure
that makes these assumptions valid. The output y* of Algorithm LIP is a
dual optimal solution, but below we explain how to use information from the
final LLS step to also compute a primal optimal solution x*.
The above algorithm uses existing path-following techniques applied to
the dual problem (2) to reduce the central path parameter from jta to
14
?a/C(A). For the analysis of such a technique, see, for example, [19] or
[7]. Usual path-following techniques decrease the value of central path pa-
rameter by a factor of 1 --H 1/(const#) on each iteration using a Newton
step. Thus, O(n112 log C(A)) traditional interior point steps are used for
step 6 above.
We now show that the LLS step is in fact the limiting case of an ordinary
weighted least-squares problem.
Lemma 3 Consider the general LL5 problem with input data given by a
partition ..... . , Jp, positive definite diagonal weights A, rn x n coefficient
matrix A of rank m, and right-hand side c. LetD(e) denote the nxn diagonal
matrix depending on e > 0 whose entries are as follows. For k --H 1 . . ,
for indices i ? Jk, the (i, i) entry is ek. Let y* be the LLS solution, and let
y(e) be the solution to
Then
minimize ?D(e)A?1(ATy --H c)tI.
J?w+ y(e) =
PROOF. Let us assume e < 1/2. If we write out the Lagrange multiplier
conditions defining the LLS step, they are:
A1A1?2Afy* --H A1A1?2c1 = 0,
A2A22A2Ty* --H A2A22c2 = A1 A2,1,
ApA?2A?Ty* --H A?A?2c? = A1Ap,1 + ... t Ap?1Ap,p?1.
(12)
We claim these conditions uniquely define y*, assuming that A is full rank.
To see this, suppose y, y* are two different solutions. The first equation
implies that y --H y* lies in the nullspace of Af. The second equation implies
that that A2Ai2AT2(y --H y*) lies in the image of A1. Taking the inner product
with (y--Hy*) shows that in fact y--Hy* lies in the nulispace of A2T. Proceeding
in this manner by induction, we conclude that y --H y* lies in the nullspace of
ATk for all k, i.e., y --H y* = 0.
15
Therefore, there must exist pseudoinverse matrix P that maps the data
vector AA?2c to the unique solution y* of (12). Note that the Lagrange
multipliers are not unique.
Write y in place of y(E). Note that y, being the solution to a weighted
least-squares problem, satisfies
Ii?II < xAIIcII
independently of ? by (8). Now, consider the optimality condition defining
E2AlAf2AfY+???+?2PApAp2ApTY??2Al?1?2Cl??????2PApAp2Cp= 0. (13)
Let C be the maximum of IIAkAk2AkTII + IlAkAT211 taken over k = 1,...
If we separate the terms involving A1, we obtain:
II?(AiA1?2Afy --H AiA1?2ci)II < (e4 + ? + ... + c2?)(CxAIIcII)
< ?4c'
where C' denotes 2CxAIIcII. Thus,
IIA1Af2Afy --H AiA1?2ciII < ?C'
Similarly, from (13) we can obtain the inequality
IIA2A2?2A2Ty --H A2A2?2c2 + AiA1?2Afy/t2 --H AiA1?2ci/?2II < t2C'
which we can write as
IIA2A2?2AT2y --H A2A2?2c2 --H A1ull < ?C'.
Proceeding in this manner, we see that y, accompanied by appropriate
choices for the Lagrange multipliers, is an ?2C'-approximate solution for (12).
Since these conditions have a pseudoinverse P, we conclude that
II? --H y*II < IIP?I?2C'
This proves the lemma. I
Note that as a corollary, we can conclude from (8) and (9) that the
solution y to the LLS problem is bounded in norm by
II?II < x?IIcII			(14)
and
IlATYlI < kAilcil
independently of the choice of weights and partitions.
(15)
16
5 On the centering ofthe LLS step
In this section we propose a specific rule for choosing a E [0,1] in the LIP
algorithm, and we prove the first main theorem concerning the method. This
theorem states that for all choices of a between a and 1, the point y =
y --H (1 --H a)r is approximately centered for central path parameter ia.
First, we define a. Define an auxiliary sequence of vector r(k), k =
1,... , p as follows. Vector r(k) is also defined as the solution to a LLS
problem with the same coefficients used for r but different right-hand sides.
Specifically, the LLS data for r(k) is the same partitions ..... . ,Jp, the
same weight matrix A, the same coefficient matrix A, but right-hand side
vector ....... ,sTk?l,O,... , 0)T, i.e., zeros filled in for positions indexed by
..... U Jp. For example, note that r?1? --H 0. Intuitively, r(k) "agrees" with
in affine spaces up to Lk?l in (11), but from k on, we choose r(k) so as to cause
minimal weighted change on constraints ATky?ck, AkT+ly?ck+l,... , ApTY?Cp
Define a sequence of nonorthogonal projection operators Pi... . , Pp as
follows. For matrix A, find a subset of columns of A1 that span the range
of A1 but are linearly independent. Next, find a subset of columns B2 of A2
such that [B1, B2] are linearly independent and span the range of [A1, A2].
Repeat this process until we obtain an n x n nonsingular matrix ....... ,Bp].
Denote this matrix as B. Now, for k = 1,... ,p, define Pk = [0,Bk,0]B?1,
where the sizes of the zero blocks are chosen so that Bk in [0, Bk, 0] lies in
the same position as Bk in B. (If Bk has no columns, take Pk = 0.)
These projection operators have the following properties:
1. They are truly projections because PkPk = Pk.
2. They are complementary in the sense that PkPI = 0 if k $ 1.
3. The image of Pk lies in the span of Ak.
4. The sum Pi + ... + Pp is the identity operator.
The other important property of these projections is stated in Lemma 5
below.
Define Sk, fork = 1,... ,p, as follows:
= IIA??iA?T(r --H r(k))II.			(16)
17
Define
= IIA??l(AT?r --H sk)II.			(17)
Intuitively, ? measures how much improvement we can make decreasing
AT?y --H Ck on the kth LLS step, and Ek is the residual from the kth LLS step
defining r.
We define 0tk, for k ...... ,p, as follows. If
?			1			(18)
then we take
= min(1, l5Ek?).			(19)
Else we take
= min 1 30fl1?50kXAIIPkbI			(20)
[tI
Let us say that an index k is of Type 1 if (18) holds and Ok is defined by
(19), else k is of Type II.
Now we define scalar Pk,I for all choices of (k, 1) such that k is Type I, 1
is Type II, and k> 1. The scalar Pk,t is defined to be
Pk,t = min(l,3Ox?Ojn2/??).			(21)
Now we choose d to be the maximum among all the 0Lk'5 and the Pk,1'5
(which is always between 0 and 1). We assume for now that the parameter
value a used in Algorithm LIP is precisely this a.
The upcoming analysis of the properties of a is rather involved, so we
start by providing four examples of the LLS step to help the reader to gain
intuition about this parameter. We discuss further properties these examples
in Section 6.
Examples.
1. The first example is the case when there is no near-degeneracy in the
problem. Consider
minimize Yi + 2Y2
subjed to 0 < Yi ? 1,
0 < Y2 < 1.
18
Suppose we are at an approximately centered point such as y = (271,71)
where 71 > 0 is fairly small. Then J1 = fyi > 0,Y2 > O? and J2 =
fyi < 1,Y2 < 1J. In this case, J1 is spanning, so the LLS step is
completely determined by Ji. Furthermore, the constraints in J1 are
linearly independent, so the least-squares computation associated with
J1 is solved as an equation. In particular, the value r computed is
identically equal to y. Note that ?i = 0 because the least-squares
problem has no residual. Also, a2 = 0 since P2 = 0. Finally, there
is no Pk,I defined since there is no k of Type I, ( of Type I, such that
k > 1. Thus, a = 0. Note that y --H r is the exact optimum solution of
the LP.
2.
The next example is the problem illustrated in Figure 1. Suppose c> 0
in the figure is very small, and suppose y is a point like point B. Then
the near-active constraints will be J1 = fyi > 0,Y2 > 0,Yi + Y2 >
The second layer will be J2 = fyi ? 1, Y2 < 11 Note that A1 spanning,
so the step r is determined entirely by a weighted least-squares problem
involving J1. Thus, the vector --Hr, an offset from y, connects y to a
point that is O(?) away from (0,0) (possibly infeasible). Furthermore,
J1 is Type I and J2 is Type II. The parameters are a? = c ? for some
constant (the least-squares residual) and a2 = 0 (note that P2 = 0 in
this case). There is no Pk,t for this example. Thus, a = O(?). Thus,
the LLS step takes us to a point that is O(?) away from the origin, that
is, a point like C in the figure.
3. For the third example, consider the problem
minimize
subject to
Yi + ?y2
o < Yi ? 1,
o < Y2 < 1.
Assume ? is very close to zero, but may have either sign. Suppose
we are at an approximately centered point like (71, 0.5) where 71 > 0
is small but much larger than El. In this case, J1 = fyi > 0) and
J2 = fy2 > 0,Y2 < 1,yi < 1). Here, J1 is Type land J2 is Type II.
The least-squares computation associated with J1 has a exact solution,
so that r1 = Yi = 71, i.e., y --H r exactly solves the constraint in Ji.
The second component of r is left undetermined by Ji. The least-
squares computation associated with J2 causes a small perturbation,
19
so r = (7?,0), where 0 Is on the order of e. Note that the first least-
squares computation was exact, so ci = 0. For J2, which is Type
II, we observe that a2 = o(IIP2btI). For this simple example, P2 is
projection onto the second coordinate, so we see that a2 = O(IeI).
Thus, a = O(IeI), and we follow r so that the LLS step carries us to a
point approximately (O(Jei), 0.5). Again, we prove in this section that
such a point is approximately centered.
4. For the last example, consider the problem
minimize
subject to
Yi
o < Y2 < e,
Yi > Y2
Here, e > 0 is very small. Suppose we are at an approximately centered
point such as (100,e/2). In this case, J1 = ?Y2 > 0,Y2 < eJ, and
J2 = ?Yi > Y2) Observe that J1 is of Type II, and the least squares-
computation associated with J1 forces r2 to be a number TI very close to
zero much smaller than e. The least-squares computation associated
with J2 determines r1 and has no residual. Thus, r is something like
(--H100 + TI --H e/2, ij), so that y --H r satisfies Yi > Y2 as an equation. Note
that a? = 0 because b is in the nulispace of P1. Note also that a2 = 0
because the least-squares computation has no residual. However, in
this case P2,i is proportional to eli00, so a is proportional to e/10O.
After updating y, we are at a point close to (0(e), e/2).
Before stating the main theorem, we have a few preliminary lemmas.
Lemma 4 Suppose v = Ax for two arbitrary vectors v and x. Let B be a
subset of m linearly independent columns of A. Let u be the (unique) solution
toBu=v. Then
lull < x?llxll.
PROOF. Since Bu = v --H Ax
u = B?1Ax = BT(BBT)?iAx
20
By filling in zeros, BT(BBT)?lAx can be extended as
DAT(ADAT)?1Ax
where D is a diagonal matrix with d,j = 1 if column j is in B and djj = e > 0
otherwise. Thus, we have
lull = J?w+ llDAT(ADAT)?lAxll < x?llxll.
Lemma 5 Suppose that q = P1 A'x for vectors q, x, where A' is a subset of
columns of A. Then we can also find a vector w such that q = A1w and
liwli < x??llxll.
PROOF. It is sufficient to prove the case that A' = A, since we can always
pad x with zeros to make A' = A. By definition of P1, q = [0,B1,0]B-1Ax.
This is the same as writing q = B1y, where y is a subvector of ? = B?1Ax.
That is, B? = Ax. Recall that B is a subset of columns of A that is a basis.
Thus, by Lemma4, we can see that IlYll ? x'?llxll. Then ll?ll < IlYll
Thus, we have shown that if q = P1Ax, then q = Biy with ll?ll < x?llxll.
We can also claim that q = A1w with the same bound on w, since B1 is a
subset of the columns of A1. I
Lemma 6 For each k,
llA??lAT?r(k)ll `? #m? 7nx??Ig.
PROOF. Since r(k) is an LLS solution, then by (15),
llATr(k)ll ? x?Allull,
where u is a vector that agrees with s in positions indexed by J1 ... U Jk-l
and has zeros in the remaining positions, By definition of u, we can see that
hull < VW0k-1, so
llATr(k)ll < Ynx??O?-i.
21
Thus,
iiA??lAT?r(k) ii
ilAk1il			iiAT?r(?ii
= iiXk1?2Sk?112il . iiAT?r(k)ii
? iiXk1?2Sk112ii			iiSk?1ii			iiAT?r(k)ii
? Am (1/bk)?x?A0k-1
? Am
Now for the first main result of the paper.
Theorem 1 Let a E [a?, 1] be chosen arbitrarily. In the case that a = 0,
assume also that a > 0. Then y, defined to be y --H (1 --H a)r, is a strictly
feasible point. Furthermore,
?(y, ?a) < 2? + (1 + ?/5.
PROOF. Note that we are assuming ? ? 0.25, that is, ?(y,ij) <0.25. Thus,
the bound on the right-hand side is at most 0.75. We denote the slack vector
ATY --H c by s. Note that
= AT(y --H (1 --H a)r) --H c = s --H (1 --H a)ATr. (22)
Observe that if ak = 1 in (19) or (20), or Pk,I = 1 in (21), then a = 1, so
= y and the claim is trivial. Thus assume that a < 1 for this proof, and
that none of the minima in (19), (20) or (21) is achieved at 1
Before proving the theorem, we develop several inequalities. Let I and II
denote the partition of .1.... ,p) into Type 1 and Type 11 layers. For k E I,
since iiA?i(AT?r --H 5k)ii = ek, we have
iiS??lAT?r --H eli
This, with (22), means that
iiSk?'5k --H aeii
= ii(xksk)?1I2??w1(AT?r --H 5k)ii
< il(XkSk)?112ll			llA?1(A?Tr --H sk)ii
< ek/Am.
= lie --H (1 --H a)&lA?Tr --H aell
= (1--H a)iie --H S??1AT?rii
<?(1--Ha)e?/Am
<ek/Am.
22
This implies
(23)
ii			____
Sk? 5k --H e < Ek
a
Note that we used the fact that (??/a) < 1/(15?), which follows from the
fact that a> a? = lSYnEk, taken from (19). Therefore, for i E Jk and k E I
--H Cj			-
a?y			Sj
a(aTy --H ci)			as?
<1+
<2			(24)
where we require
<1.
Obviously, ? = 1/4 meets the requirement.
For k E II, observe from Lemma 6 and (16) that
II???iATkrII <
IIA??iAT?r --H A?iAT?r(k)II + II???iAT?r(k)II
? +.
(25)
Since k is a Type-Il layer, we have an upper bound on ? given by the opposite
of (18). Similarly, we have the same upper bound on ffTh Yn(x?A/g) if
we assume that
9>3O#?l+?.nxA.			(26)
When we define 9 in (59) below, we will take (26) and other similar inequal-
ities into account. Assuming (26) and the opposite of (18), we conclude
that
IIA?lAT?rII <			1
15?			(27)
IISk?isk --H eli = lie --H (1 --H a)S??1ATr --H eli
This means
23
= 11(1 --H c)s?lAT?rII
= 11(1 --H Q)(x?sk)?l/2???lAT?rII
? (1 --H Q)II(Xk5k)?1?2II . IIA?lA?TrIi
15#nVV7?
Thus, for i E Jk and k E II
a?y--HCj			= Sj
aTy --H Cj			--H--H1
Sj
- 15#Th
? 1.
_____ _____			(28)
(29)
Again, we used requirement (25) to derive the last line. Note that (23) and
(29) show that y is strictly feasible because they imply s > 0.
Observe that we have proved the following statement: for constraints in
Type-I layers, along the segment connecting y to y --H (1 --H a)r, the slacks
decrease roughly proportionate to c. For Type-Il layers, the slacks stay
roughly constant. These observations are crucial for most of the remainder
of the paper.
As we discussed earlier, to prove the theorem, it is sufficient to prove that
the central path residual
q = b/? --H
can be written as aAS?1v where lvii ? P, where we can take P = 3/4. This
establishes the approximate centering property.
Note that b/jt = Ax, where x is given by (6), so we can write q as
p
--H oA?S??1e)
k=1
--H cA?%1e)
kEl
--H aA?S??1e)
kEl
--H qi + qii + q'?,
+ ?%i(Akxk --H ?A?%1e)
+ ? (QAkxk --H ?A?%'e) + ? (1 --H
kEll			kEll
24
where
and
qI = ?(Akxk --H oA?S??'e),
kEl
qii = ? (oAkxk --H ?A?S??1e),
kEll
q'H= Z(l--H?)Akxk.
kEll
We will prove that
qi = 0AS?1v1, qil = ?AS?1vu, and q'? = aAS 1v111
(30)
where
Ilvil < ? + (1 + ?)/15, IIviiII < ? + (1 + ?)/15, and IIV'HII ? (1 + ?)/15.
This proves the theorem by taking v = vi + v11 + v'11, with
lvii ? 26 + (1 + 6)/5.
If 6 = 1/4, then lvii < 3/4.
First consider qi. Concatenate the matrices Ak for k E I into a large
submatrix A1, and index other variables similarly. Then, from (22)
qI =
Note from (5) that
A1x1 --H ?A141e
A1S71 (Xisi --H ae)
AiS1?1(Xi(si --H (1 --H ?)Afr) --H oe)
AiSy1(Xisi --H (1 --H ?)X1Afr --H ae)
AiS1?1[?(Xisi --H e) + (1 --H o)(Xisi --H XiAfr)]
oA1S1?1[(Xisi --H e) +			?(xisi --H XiAfr)].
1--H?
lixisi --H eli ? 6
1--Ho'(Xrsj --H XiAT?r) < ll(Xisi--HXiAfr)ll
a			a
and
25
II(X1S1)"2?i1(s1--HAfr)II
Q
II(Arisi)l/211lAy1 (Sr--HAfr)II
OL
#m?.IIATl(Sr--HAfr)II
a
#m?ZkErIIAk1(Sk--HAT?r)II2
a
a
#7Ynmax??r Ek
a
rnaXkEI Ek
fflaXkEI ak
The last inequality due to the choice of 0Lk in (19). Thus, we can take
1--Ha
v1 = (Xis1 --H e) + ? (X1s1 --H XiAfr)
and fill zeros up to dimension n, and have
Ilvill ? ? + (#7)115.
This concludes the first term of (30).
Second, consider qrr Define A11 as above. Then, from (22)
qil = aAii(xii --H SWe)
= aAiiSW(Xiisii --H
= aAiiS1-;(X11(s11 --H (1 --H a)Af1r) --H
= aAiiST' [(X11s11 --H --H (1 --H a)XiiAf1r].
llXiisii --H eli < ?
Note again that
26
and from relation (27)
11(1 --H ?)xiiAfirII = 11(1 --H a)(X?S?)112?7?A?rII
<--H II(x11s11)1?2II II?i?AfirII
Thus, we can take
= II(xiisii)11211 k?I IIA??lAT?rII2
15#n
?
vii = (??HsII --H e) --H (1 --H a)X?Af1r
and fill zeros up to dimension n, and have
IIviiII < ? +
This concludes the second term of (30).
Third, consider
q'H= Z(l--HQ)Akxk
kEll
p
= ?Pt[z(1 --H?Akxk]
1=1			kEll
p
=?Pt[ Z
1=1			k>I,kEII
(1 --H ?)Akxk].
The last relation holds because PLAk = 0 when k < 1, i.e., Ak lies in the span
of Bl,...,Bk. Let
qi=i)[?(l--Ha)A?x?]=Pt[ z (l--HQ)Akxk]
kEll			k>1,kEII
If we can prove that for 1 < 1 < p
= ?AiSF1vi and IIvtII ? (1 t ?)/(15?), (31)
27
then, we define
v'ii = (vT, ...... , vT)T, and q'? = aAS?1v1,1,
and establish
IIvi'iII ? 11v1112 ? (1 + ?)/15,
which concludes (30).
Consider the following two cases for qi.
CASE 1, 1 is of Type I. In this case, we must have
Note we have
qi =Pt[Z(1--H?Akxk]
kEll
=P1[ ?
k>I,kEII
11(1 --H a)xkII
(1 --H
--H 11(1 --H o)X?S?(S?)?1eII
< IIXkSkII hell liSk111
<(1 + ?)#/?
Now we apply Lemma 5 to conclude that
= Z Aiwk = A1( ? wk), IIwkII < x?A(1 + ?)Vmn/?
k>I,kEII			k>I,kEII
Next, observe that
= ?ATSF1( ? Uk) where Uk = SIWk/oL
k>t,kEII
so
liSt/all llwkll
=max?Si llwkll
iEJt a
< max2s? llwkll
iEJ1
<2OL.x?A(1+?Ai/4)k
<2xA(1 + ?)7n/g.
ilukil
28
Note that to derive the third line we used (24) since 1 is of Type I. The last
line was derived because ?0t < ?k since k > 1.
If we require that
9 > 30XA?2			(32)
then we can claim that Ilukil < (1 + ?)/(15n1?5). Then, we let
v1 =			? Uk),
k>1,kEII
and have
qi = aATSF1vt, lviii < (1 + ?)/(15#n),
which gives (31) and concludes CASE 1.
CASE 2, 1 is of Type II. In this case, we have
qi= Pi[Z(1--H?Akxki
kEll
= P1[(1 --H ?)b/? --H S(1 --H oL)Akxk]
kEl
s
k>i,kEI
In the expression (33), the first term is
q'1=(1--H?)Ptb/?
(33)
(34)
where 1 E II. We write (1 --H oL)Pib in the form ?cA1SF1u1. In general, given
the vector P1b, we want to write it as A1y with a bound on Ilyll Observe
that P1b = [0, B1, O]B-1b = B1y, where we define y to be a subvector of
y = B-1b. Thus, we need a bound on y in terms of IlPibil, that is, in terms
of ilBiyil. Let y' be the extension of y to an m-vector with zeros filled in.
Observe that
ii? II = IIy'II
? iiB?1ii
? XA IlBy1ll
--H ?A IlBiyll
--H XA IlPibli.
(35)
29
Note that the third line was obtained from Lemma 2. Thus, we can write
(1 --H a)P1b in the form jt?AtS?'ui such that
IlitaSF1uill < XA IIP1bII.
This means that
Iluill < 1iS1 IXAPibli			(36)
[La
< OtxAliPibil
_			(37)
where Oj denotes the largest entry of S1. it follows from (29) that Oj < 2 01.
Proceeding from (37),
Iluill < 2OlXA?aIIP1bII
Since ?t < a,
hut < 2OixAIIPtbII
[Lat
By choice of aj in (20), this implies that
Ilutli <
Finally, consider the second term in (33)
q7=Pt[--H ?
k>I,kEI
Note we have
11(1 --H a)xkII = 11(1 --H a)(X?S?)S??1eII
< (1 --H a)IIX?S?tI lieu 115k' II
<--H (1 + ?7n/?.
Now we apply Lemma 5 to conclude that
q7 = ? Alwk = A1( ? wk), IIwkII < xA(1 + 6)VW/?.
k>i,kEI			k>l,kEI
30
(38)
Next, observe that
q7 = ?ATSF1( Z Uk) where Uk = $twk/Q
k>t,kEI
so
Ilukil
= liSt/Qil IIwkII
Sj
max--H IIwkII
iEJt CL
? i?i?ajx?%Si `IIwkII
< 20t?kA(1+?)#/(kk
CL
< (1 + ?)/(15n1?5).
Note that to derive the third line we used (29) since 1 is Type II, and to
derive the last line we used CL > Pk,t in (21), since k > 1, and 1 is of Type II
and k is Type I.
Thus, we have
q7= ? CLAISF1Uk
k>i,kEI
where Ilukil < (1 + ?)/(15n1?5), which together with (38) implies that
qi = q1, + q7 = CLAISF1Uk, Ilukil S (1 + ?)/(15n1?5)
kEIi)u?k>(,kEI)
or
= CLA1AF1v1, Ilvill =			Uk < (1 + ?)/(15?).
kE?1?u?k>(,kEIJ
Thus, this proves (31) for 1 = 1,... ,p, and establishes the third term of (30),
and, therefore, proves the theorem. I
We can now comment further on Algorithm LIP. We have proved that
after the LLS step, 6(y, iCL) < 0.75. This means that after three Newton
steps in Step 5 of the algorithm, by Lemma 1, we have a point y' such that
?(yl, [LCL) ? 0.758 <0.25, so y' is well-centered for continuing the algorithm.
31
This theorem has a second consequence as far as Algorithm LIP is con-
cerned. In Algorithm LIP, if a = 0 then we terminate. The following corol-
lary shows that this termination is at an optimum point for (2), and an
optimum point for the primal may be recovered by solving a least squares
problem.
Not only is it an optimum, but, in the case of degeneracy (multiple op-
tima), the point computed by Algorithm LIP is an approximate analytic
center of the face of optimal solutions. Proving that Algorithm LIP com-
putes an approximate analytic center in the case of degeneracy is important
for our initialization procedure defined in Section 8.
Recall that the analytic center of a polytope [y : ATy > cJ is defined to
be a point y* such that AS?'e = 0, where s = ATy* --H c and 5 = diag(s).
If the polytope is not full dimensional (which is always the case for this
corollary) and it is given as : ATy --H c0, ATy > cJ, then its analytic
center y* satisfies (see Mizuno et al. [14])
A0x0 + AS?1e = 0, for some x0, (39)
where s = ATy* --H c. We say that y is an approximate analytic center of a
polytope (y : A?y = Co, ATy > C] if
A0x0 + AS-1f = 0, for some x0, (40)
where s = ATy --H c, 5 = diag(s), and II? --H eli < 0.75.
Corollary 2 Suppose a = 0, that is, ek = 0 for Type I layers, Pkb = 0 for
Type II layers, and Pk,1 = 0 for all k, 1 for which it is defined. Then y* = y --H r
zs an optimum solution for (2), and it is an approximate analytic center for
the (dual) optimal face. Moreover, by solving a single least squares problem
we generate an optimum solution x* for the primal.
PROOF. Define y(a) = y --H (1 --H a)r. From Theorem 1, we know that y(a)
is strictly feasible and approximately centered for all a E (0,1]. In fact, we
have the following equation:
q = b/? --H aAS(a)?'e = aAS(a)?'v(a), (41)
where lIv(a)ll ? 3/4. Consider taking the limit of this equation as a 0+.
Since y(a) is strictly feasible, the slacks satisfy s(a) > 0 for all a ? [0,1].
32
Partition ?1,..., ?1 into B and N, where B is the set of indices such that
= 0 and N is the set of indices such that sj(O) > 0. Note that s*, the
slacks of y*, are equal to s(O). As CL 0+, the matrix CLSN(CL) tends to zero.
Thus, in the limit, the left-hand side of (41) tends to biji --H
But notice that CLSB(CL)-1 is a constant matrix independent of CL (because
sB(CL) is linear in CL), hence, if we denote CLSB(CL)?1 as 5B1? we have that the
limit of the left-hand side is
--H A?S?1e.
Note that from the final LLS step we can easily construct B, N and 5B
because we have a closed-form expression for S(CL).
Now consider taking limits on the right-hand side of (41). In general, we
cannot claim that v(CL) converges to anything. On the other hand, because
v(CL) stays bounded, a compactness argument shows that there must exist
an accumulation point v* as CL H 0 Then we have the equation:
--H A?S??1e = ABSB1VB* and IIVB*II < 3/4.
(42)
Although we cannot construct this particular VB* (its existence was proved by
a compactness argument), we can certain construct some other VE* satisfying
these relations by solving the least-squares problem of minimizing IIvB II sub-
ject to b/? --H A?5??1e = A?S?1v?; clearly the minimizer has norm bounded
by 3/4. This least squares problem is similar to the termination discussed in
[31].
Next, define x* Ei R? as follows. Let XB* = iS?1(e + VB*), a strictly
positive vector, and let XN* = 0. Observe that x* > 0, and that Ax* =
Thus, x* is feasible for the primal problem (4). Furthermore, (x*)Ts* = 0
because xN* = 0 and SB* = 0. Thus, (x*, y*) are feasible points for the pri-
mal and dual respectively satisfying the complementary slackness condition.
This means that both are optimal solutions. Furthermore, they are strictly
complementary, so this implies that the optimal face for the dual is precisely
(y: ATBY = cB,AT?y > cN?.			(43)
Now we show that y* is an approximate analytic center of the optimal
face. If we subtract (42) from (41) and divide by CL we obtain
ANSN(CL)?1(e + vN(CL)) E Range(A?).
33
By compactness, vN(a) must have a limit point vN* as a 0+ whose norm
is bounded by 3/4. Also, SN(a)?1 is well-defined when a 0 by choice of
N. Thus, for some x0,
A?S?(O)?1(e t v?*) = ABxo.
This is precisely the condition that y* is an approximate analytic center of
(43). 1
6 Crossover events
The theorem and corollary in the last section proves that Algorithm LIP is
valid in the sense that it accurately tracks the central path, and terminates
only when an optimum is reached. In this section, we prove that the number
of main-loop iterations required by Algorithm LIP is finite, and, in particular,
is bounded by n2. The key concept for developing a complexity theory is a
crossover event.
Definition. Given an instance of (2) with a strictly feasible interior point,
we say that the 4-tuple (it, it', i, 5) defines a crossover event, for it > it' > 0,
and for i,j E (1,.. .,n), if
and for all it" E (0, it']?
si(it) <3g?			(44)
sj(it) --H
_____ > 5gfl			(45)
Here si(it) denotes the ith slack for a point on the central path of (2).
Note that one must have i $ 5 for (45) to hold, since g> 1.
Definition. We say that two crossover events (it, it', i,j) and (iti,it'1,ii,ji)
are disjoint if (it', it) A (it'i,iti) =
Lenima 7 Let (iti,it'1,ii,ji),... , (itt, it't? it, it) be a sequence oft disjoint crossover
events for a particular instance of (2). Then t ? n(n --H
34
PROOF. We claim that a particular pair of indices (?,j) can occur at most
once in this list. For example, consider the first two events, (ii, !`i? i1,j1)
and (?2,!'2,?2,j2). Since these are disjoint, then either ?i < [t'2 or <
assume the former. Then I'i <Iti < <jt2. But it is not possible for both
zi = i2 and ji = ?2 to hold because (45) for the second event, for ?ll set to
?i would contradict (44) for the first event.
Similarly, we could not have i1 = ?2 and ji = i2 either, because then
(45) would be contradictory for ?? chosen less than min(?'i, ?`2) (i.e., (45)
would have to hold, and the same relation would have to hold with i and j
swapped.) 1
The main theorem of this section is that every pass through the main
loop of Algorithm LIP causes at least one crossover event to occur, until
a = 0. This sequence of crossover events is clearly a disjoint sequence since
`t decreases monotonically throughout Algorithm LIP. Therefore, there can
be at most n(n --H 1)/2 iterations of Algorithm LIP before termination. We
start by revisiting the examples given on p. 18.
1. In the first example, a = 0 so Theorem 2 doesn't apply.
2.
3.
In the second example, recall that the LLS step takes us to a point
away from the origin. Now observe that after a certain fixed
number of ordinary interior point steps (independent of ?) one of the
three constraints Iyi > 0, Y2 > 0, Yi + Y2 > E? must leave J1 because
at least one of these constraints will always have residual at least 0(e),
whereas the residuals of the other two will tend to zero. Hence there
will be a crossover event involving two of the constraints in J1
For the third example, recall that after the LLS step we end up at a
point like (O(IeI), 0.5). Now, further steps on the central path will force
one of the two constraints fy2 > 0, Y2 < 1? to become active, because
the central path equation
(1, e)T = iAs?ie
must be approximately satisfied, and those are the only two constraints
with a nonzero component in the Y2 direction. The constraint that
becomes active depends on the sign of e. Either way, there will be a
crossover event involving the two constraints (Y2 > 0, Y2 < lJ that are
in J2.
35
4.
For the last example, recall that the LLS step ends up at a point like
(0(e), e/2). Now, after a fixed number of interior point steps, the con-
straint Yi > Y2 will become active. On the other hand, of the two
constraints 0 < Y2 < e they cannot both stay near-active once ? gets
below 0(e). Thus, a crossover event takes place between constraint
Yi > Y2 and constraint Y2 < e
Before proving this theorem, we require some preliminary lemmas.
Lemma 8 Let (y, jt) and (y', ?`) be two points on the central path such that
o < ?` < ?. Let the slacks be 5, s' respectively. Then for any i,
ns
PROOF. Assume n > 1 since the n = I case may be trivially analyzed. Let
x = ?S?'e, and x' =
(This is slightly different from the use of x in the last section.) Then, we
have
Ax = b, and Ax' =
Then,
--H x?)T(s --H s') = 0,
since x --H x'is in the null space of A, and 5 --H s1is in the range of AT. Thus,
which implies that
Z(Xt'?Si + 5t'Xi) = xTs + (xI)Ts? = n(? + tj'),
i=1
t ;i?'+$;? =n(pt+jt'),
i+f
36
Note that for each j = 1,... , n, using the arithmetic-geometric mean in-
equality, we derive
Thus, for any i,
si 1'			?
?+A >277ii.
Jt			5j
5j ?`			It'
--HI--H + --H = n			1+--H
Sj [ 5j			It
A
j#i 5j It + 5j
< n(l+?V?) --H2(n--Hl)?
= n(l --H )7It)2 +
Since 0< $7It <land n>l, we have
n(l --H7F?)2+27m'/?<n.
Thus,
and in particular,
which proves the lemma. I
5jIt' + 5'j
51i It			5j
5.
A<n
5j
Lemma 9 Consider an LLS problem with partitions Ji, . , Jp, weights A,
coefficient matrix A, and right-hand side s. Let r1 be the LLS solution. Let D
be the n x n diagonal matrix with 1 `s in positions corresponding to J1,... Jk
and 0's elsewhere. Let IluliD denote the seminorm (uTDu)112, and let r* be
the minimizer (not necessarily unique) of IIATr --H SlID. Then
IlATri --H SlID < (xA + 1). jjATr* --H SlID
PROOF. Let si be the residual for the unweighted least-squares problem, so
that s ATr* = ?i Let D(e) be chosen as in Lemma3, and let D1 = A?1D(e).
Let r(e) be the minimizer of jjDi(ATr --H s)II. Then r(e) tends to r1, the LLS
solution, as e tends to zero.
37
Observe that
Di(ATr --H			Di(AT(r --H			--H si)
by definition of ?i Thus, we could equivalently define r(e) to minimize
IlDi(AT(r --H --H si)II. Thus, by (9),
llAT(r(e) --H r*)II < kA Ilsi II
Since r(e) r1 as e 0,
Thus,
llAT(ri --H r*)ll < kAlIsill
IlATri --H sjj = llAT(ri --H			--H sill
< llAT(ri --H r*)Il + Ilsill
? (kA + 1)llsill.
Substituting the definition of ?i proves the lemma. I
Lemma 10 Given a point (y, ?) in Algorithm LIP, let ek be defined by (17),
that is, ek is the residual of the kth layer in the LLS step. Then, there is a
constraint i ? Ji ??? U Jk such that for all?" ? (0, ii]?
>--H ymts.E(%k+1)ni,5			(46)
PROOF. Let D denote the n x n diagonal matrix with l's in positions
corresponding to J1 U U Jk and zeros elsewhere. Let lIUlID denote the
seminorm (uTDu)i/2.
Let y' be any feasible vector for (2), and let s' be the slacks for y', and
let r' = y --H y'. Let s be the slack variables for y, and let r be the LLS step.
Then we have
lls'llD = ?ATyI --H CIlD
= llATr? --H 5lID
> mintllATrll --H 5lID :			C R?1
> llATr --H sllD/(xA +1).
38
To obtain the last line we applied the preceding lemma.
Thus, for any feasible point, and in particular, for s(j"), we have
IIs(?')IID > IIATr --H sJJD/(x?A + 1).
Let us first address the right-hand side of this inequality. First, note that
Now we have:
IlAk111 = II(XkSk)1?2Sk?1 <
IIATr --H SlID > IIAT?r --H Ski
> IIA??l(A?Tr --H Sk)II/II?k'II
--H ?k/IIAk1II
Combining, we have shown that
?k?k
IIS(?')IID ?
This means that for at least one? c J1?.. U Jk,
--H>
(47)
For the argument in the last paragraph, the choice of? apparently depends
on p11. The claim in the lemma is that we can pick i independently of ijll To
prove this stronger claim, observe that, if, for some ?` E (0, i], the inequality
?k?k
(x?A+1)Ym+?'n15
(48)
holds, then for all ?" Ei (0, it'], (47) could never hold because of Lemma 8.
Therefore, there has to be at least one fixed ? such that (48) never holds for
any it' e (0, it], i.e., (46) always holds for all it" ? (0?it] for this ?. I
39
It should noted that this lemma has a curious nonconstructive property;
the existence of a special i E J1 U ... U Jk is demonstrated, but there is
apparently no method for determining the value of i other than running the
interior point method to completion to find the whole central path. Indeed,
if were possible to identify i in the lemma from the current iterate, then
we could presumably delete the ith constraint from the problem because we
know it is not active at the solution.
We now come to the main theorem for this section:
Theorem 2 Consider one pass through the main loop of the LIP algorithm.
Assume a > 0 in the LLS step. Then at least one event takes place during
this pass as the central path parameter drops from ? to ?a/C(A), where C(A)
is defined by (60).
PROOF. Recall that a is defined to be the maximum of the 0Lk'5 and Pk,t'5
There are therefore three cases to this proof: either a = ak where k is a
Type I layer, a = ak where k is a Type II layer, or a = Pk,i The three cases
correspond to examples 2, 3, and 4 listed above, in that order.
CASE 1, a = ak and ak iS defined by (19).
First, we develop an inequality that applies to any layer k of Type I, not
necessarily the particular k that determines a. This is because we will reuse
the same argument for Case 3 below.
For some layer k of Type I, let v --H r(k+1) --H r(k). Recall that r(k), r(k+l)
are defined by LLS problems. Therefore, v is also defined by a LLS problem
with the same partition, weights, and coefficients but with right-hand side
vector all zeros except for Jk, in which the right-hand side vector is Sk. This
means that v lies in the nullspace of ..... . , ATk?l, and the following Lagrange
multiplier condition is satisfied by v for the kth LLS step:
Note that
Ak??w2AT?v --H Akxk = A1A1 + ... + Ak?iAk?i.
(49)
= IIA?iAT?(r --H r(k))II
= IIA?TiA?TvII
because v --H (r r(k)) = r(k+l)--H r lies in the nullspace of AkT. The assumption
for this case is that ? is bounded below by (18).
40
Consider the product vTb. Since v is in the nulispace of ..... . , ATk?l
and Ax =
p
vTb = vT ? jAixi
(=1
p
= `ivT?Atxi.
i=k
(50)
We now obtain a lower bound on this sum, first by estimating the t = k term,
and then by placing an upper bound on the remaining terms.
Let us focus on the kth term of this sum, which we denote as q:
q = [`vTAkxk.
Substituting the expression for Akxh from (49) into (51) yields
q = [vT(AkAk?2ATkv --H A1A1 --H . --H Ak?1Ak?1)
= !vTAk?k?2AkTv
= 1IIAklATkvII2
(51)
The second line follows because v is in the nulispace of ..... , ATk?l. Ev-
erything else follows from definitions.
Next, let us evaluate the remaining terms in vTb. Observe that v, as a
solution to a LLS problem, has the following norm bound:
IIATvII < X?A0k#
This follows because the right-hand side for the LLS problem defining v has
norm equal to liskil, which is bounded by Ok7n. Let AN = (Ak+1,... , Ap).
Then, analyzing the remaining terms in (50),
it s? vTAtxt = itvTANxNII
1=k+1
< itIIAT?vII IIxNII
< itIIAT?vII . Ii(X?S?)S?1eII
< itIIA?TvII . II(XNSN)II IISN?1II hell
41
< [txA0k#n(1 + ?)IISN1II
? IxA0k(1 + ?)n/??+i
< ix?A(1 + ?)n/g.
If we require itxA(1 + ?)n/g < q/2, i.e., ?xA(1 + ?)n/g < 0.5?? then we
can be guaranteed that vTb > 0?5[t6k2 Taking into account (iS), this places
the following restriction on
y > 2 302(1 + ?)n2xA			(52)
Now, we continue with the analysis of this case, using the inequality
vTb > O.5??. Let us consider a ? and y' on the central path for ?`.
We have:
< vTb
1' vTAi(stl)-le
t=1
= [L v1?e
I=k
= IitvTAN(s??)?le
< KIIANTvII II(SN')?1II lieu
? ?X?A0kfl II(SN')?1II
where AN = (Ak, ..., Ap) and SN' = diag(S?',			, S??). Thus,
II(SN')?1II > ?$$k2Pkn
This means that there is an j ? JkU?? U Jp such that
2?'X?A0k?
2302?'xAOkn2
1
(53)
42
Now we focus on the particular k of Type I that satisfies the hypothesis
of this case, that is, that a = Ok The preceding inequalities held for an
arbitrary it' E (0, it]; select in particular
it?
= 10 3O2(xA + l)xAn45g2?ffTh+?			(54)
and fix a particular j satisfying (53). Then for all ?11 < ?I? combining (53)
with Lemma 8,
5j(it11) < 2302it'X?A0kfl3
it
it?k			2.302XA0k?3
10 3O2(xA + l)xAn4?5g2?Ym+? it
?k0k
S(xA + 1)n"5g2?#m?
Now, consider the index i picked out by Lemma 10; divide (46) by this
inequality to obtain that for all it11< it',
5i(it") >			ek?k			S(xA+1)nl.5g2n??
sj(it11) --H (x?A + 1)n1?5Vm+?			?0k
5??g2fl
Ok
> 5gfl
Thus, we have identified two constraints, i and j, such that (45) holds in the
definition of a crossover event.
Now recall that j ? Jk U . U Jp and i EiJlU???UJk. Ifeither??Ji?or
j ? Jk, then Sj <s?/g. If both are in Jk, then Sj < 9?Sj. Applying Corollary 1
for ? = 0.25 shows that in either case, si(it) < 3gnsj(?) Thus, (44) holds for
z and j. This shows that a crossover event takes place during the main loop
of Algorithm LIP, provided that at the end of the step, ita/C(A) is at least
as small as it' in (54). Recalling that a = u? and 0k = min(1, lS?k?), i.e.,
< 15?k??, we find that ita/C(A) < it' provided
C(A) > 150 302(?A +
CASE 2, a = 0k and 0k is defined by (20).
43
First let us consider i as in (46) for an arbitrary k of Type II (not nec-
essarily the k determining a?). We need a lower bound on ?k for layers of
Type II. Recalling (27), we observe that
IIA??lAT?r --H (X?5?)112eII
II(???S?)112eII --H IIA??lA?TrII
(1 --H ?) --H 1/(15Ai)
1/2
Thus, from (46) there is an			E Ji .... U Jh such that for			all?' < ii,
--H 2(xA + 1)n1?5Vm+?			(55)
Now, consider an arbitrary jt' < jt. We have the following computation
using Lemma 5:
Pkb = Pk ?`A(S')?1e
--H JIAT&v
where AN = (Ak,... ,Ap), 5N' = diag(S?',...,Sp'), and lvii ? x?ii(S?')?1eii.
Thus,
IIPkbII < !`xAiiAkii ii(S?')?1eii.
This means that there exists an j ? Jk ??? U Jp such that
< ?`#kAiiAii
--H IIPkbII
Now, fix the specific k of Type II such that a = ak, and select
1' =			?IIPkbII
lOn3(x? + 1)??gn??A??#7?			(56)
Then for all ?ll < ?1? combining the inequality in the last paragraph with
Lemma 8, there is a j such that
s,(?11) <
--H 10n1?5(xA + 1)gn?S
44
(57)
Dividing (55) by this inequality, yielding the result that for all ?? <
sj(?11) > 5gfl
sj(?ll) --H
This shows that (45) holds for a crossover event.
On the other hand, i E Ji ??? U Jk and j ? Jk ??? U Jp, so initially
Sj < Sjyfl Arguing as in Case I, si(?) < 3gflsj(?) Thus, an event has
occurred.
Now, let us compare j' in (56) to ??/C(A). Observe from (20) that
< 30fl1?50kXA IIPkbII
so
?.IIPkbII
1 = lOn3(xA+l)xAg?IIAII#m?
--H 3OOn45(xA + l)xAxAIIAIIg?Ok#m?
300n45(xA + 1)2xAIIAII92?#m?
Thus, we must pick
C(A) > 300n45(xA + 1)2??j?A?jg2fl?
to ensure that ??/C(A) ?
CASE 3, OL = Pk,1, where k is Type I, 1 is Type II, and k> 1.
From (53) in the Case 1 analysis, we conclude that for any It' ? It, there
is a j ? Jk ??? U Jp such that
Sj(It') < 2302It'XA0kfl2
It
On the other hand, from (55) from Case 2 applied to 1, that there is an
i E Ji U . UJj such that for all It' < It?
5i(It') --H 2(xA + 1)n1?5#m?
45
Thus, if we pick
1' =
20 3020k(kA + 1)???n4.5gn??			(58)
and apply Lemma 8, then for all ?" < `t' we obtain
sj(?11) > 5gfl
sj(jt11)
Ontheotherhand s ?s5/gbecausei?jjU...UJ1and5EJi?U...UJ?,
and k> t. This means also that the much weaker relation sj(j) < 3gfl3j(jt)
holds. Thus, an event has taken place.
Again, we compare ?` from (58) to ?d/C(A). Note that a ? 30X?A0Ifl2/?
by (21), so we have the following bound:
1,
20 3O20k(xA + 1)??n4.5gn??
--H 20 3O20k(xA + l)??n4.Sgn?? 30XA0?fl2
--H 20 3O30k01(xA + 1)XA2fl6?5Y?#m?
--H 20 3O3(X?A + l)??2n6Sg3fl4m+?
Thus, we must pick
C(A)>20.303(x?+1)x/?2n6?5g3? yTTh?
for Case 3.
We now explain how to pick the two parameters g and C(A) taking into
account all constraints. First, g must satisfy (26), (32), and (52). Of these,
(52) dominates, hence we may pick
g = 2 302(1 + ?)n2x?A			(59)
or anything larger. Second C(A) has three lower bounds given by the three
cases of the previous proof; the third dominates except for the xAIIAII factor
in the second. Note that xAIIAII > XA, hence we pick
C(A) = 20 3O3(XA + 1)2x?IIAIIn65:g3?YfT? (60)
to satisfy all three cases. I
46
This theorem shows that Algorithm LIP has finite termination; in partic-
ular, it must terminate after n(n --H 1)/2 main-loop iterations, since each
main loop iteration forces at least one crossover event to occur. These
crossover events are clearly disjoint because the central path parameter de-
creases monotonically during the algorithm.
We now make some remarks on the choice of CL in Algorithm LIP. up to
now, we have assumed that CL is chosen to be equal to CL defined in Section 5.
As an alternative, suppose we picked CL as follows. First, identify CL0 50 that
y --H (1 --H ?)r is infeasible for any CL < CL0 but feasible for CL ? [CLo, 1]. This CLo
can be identified with a simple ratio test. Let CL1 = max(?o, 0). Now find an
CL E [CLi, 1] such that
but
--H (1 --H ?)r, ijCL) ? 0.75
(61)
--H (1 --H ?/2)r,i?/2) > 0.75 or CL/2 < CLi. (62)
From this CL we clearly maintain proximity to the central path by (61).
On the other hand, (62) implies that CL/2 < CL by Theorem 1. Thus, if
we insert an additional factor of `2' in the formula for C(A), then we can
guarantee that Theorem 2 will also go through. Thus, all of the theorems
proved so far still hold if we determine CL in Algorithm LIP with some kind
of a line search for a solution in [CL1, 1] to conditions (61), (62) (and treating
CL = 0 as a special case). Notice that this eliminates the need for computing
....... , P?band r(i),... , r??? that were used to determine CL, presumably a
substantial savings in practice.
The points t satisfying ?(y --H (1 --H t)r,it) = 0.75 are roots of a certain
polynomial in 1, so presumably we could apply polynomial root-finding tech-
niques to solve (61), (62). We have not worked out the complexity of this
procedure.
As mentioned in the introduction, a consequence of the validity of Algo-
rithm LIP is a new characterization of the central path as being composed
of 0(n2) alternating curved and straight segments.
Corollary 3 Consider an instance of (1) that is bounded and has a strictly
interior feasible point. Let AJ > 0 be arbitrary. Then there exist a sequence
of k+1 breakpoints
o =			? Iti ? !2<?.. <			= M
47
such that k <n(n --H 1)/2 and such that:
1. For all i c 10,..., k --H 1) such that i is even, there exists vectors Uj, Vj
such that ?(ui + jivj,?) < 0.75 for all't c (ii,iti+i) In other words,
the central path is approximately a straight line segment over this range
of parameters.
2. For all i E 10,..., k --H 1? such that i is odd, jt?+i/jt? is bounded above
by a constant C(A) depending only on A.
from Theorem 1 and Theorem 2. Intro-
and end of each LLS step of Algorithm
PROOF. This follows immediately
duce breakpoints at the beginning
LIP, and the corollary follows. I
Since the statement of this corollary makes no reference to any specific
algorithm, it seems that the corollary ought to have a proof that is algorithm-
independent. However, we do not know of a direct proof.
A natural question to ask is: if the central path is approximately straight
over intervals [!i, ?i+i] for i even, then shouldn't a traditional path-following
interior point method be able to take larger steps and achieve the same com-
plexity as the LIP method? The problem is that the tangent to the central
path at y(ii+i) (which is the direction followed by traditional algorithms)
does not necessarily "point" in the right direction. This is most clearly seen
on the last LLS step of Algorithm LIP, when the global minimum is found.
The tangent to the central path at y(?) is never precisely aligned with the
displacement vector between the y(?) and the global minimum y* for any
positive't. In contrast, the LLS step r is precisely this displacement.
,T Complexity of Algorithm LIP
We can now estimate the complexity of the algorithm. There are a total of
at most n(n --H l)/2 LLS iterations. There are O(?log C(A)) small steps
per LLS step. The small steps dominate the work, and the total number of
small steps is O(n2?5 log C(A)). If we use (60), then
logC(A) = 2logx??+log(x?jA)+logn+3nlogg+const.
48
The dominant terms are 3nlogg and log(x??iA??). Note from (59) that
logg = 2logn +logX?A + const.
Keeping the dominant terms, we see that the overall number of small steps
is
O(n35(logn + log(x? + 1)) + n25log(x?jAi))
Let us define
c(A) = logn + log(x? + 1) + log(x??jAJJ)/n. (63)
Then we can write the complexity as O(n3?5c(A)) iterations, where in this
context ?? means either a small step or an LLS step. Each itera-
tion requires solution of a system of linear equations, which takes O(mn2)
arithmetic operations.
It should be noted that this complexity bound is independent of the size
of jt that is given as input data to Algorithm LIP. This is an important point
for the initialization procedure in the next section.
8 Initialization
We now examine the question of initializing Algorithm LIP, that is, given
A, b, c, find an approximately centered y, ji. The procedure used is as follows:
we start with a basic subset of constraints whose analytic center is known in
closed form. We then add the constraints one at a time, each time running
Algorithm LIP to find the analytic center of a new polytope. After computing
analytic centers of a sequence of n --H m polytopes, the last polytope is the
feasible region of the problem (plus some extra, "dummy" constraints). From
this point we then run Algorithm LIP. At first glance, it seems that the
running time of initialization will be a factor n --H m larger than the running
time of Algorithm LIP. With a more careful analysis, however, we are able to
show that initialization has the same complexity (up to a constant factor) as
Algorithm LIP. We call this algorithm "cutting-plane initialization," which
is similar to an interior-point column generation algorithm (for example, see
Goffin et al. [5]).
We first comment on a more standard approach to initializing interior
point methods, namely, adding a dummy variable. For example, a common
49
technique is to replace the inequality constraint in (1) with ATy + tq > c
where t> 0 is a new variable and q is a vector with all positive entries. Note
that this problem always has a strictly feasible point; simply take y = 0 and
t sufficiently large. The difficulty with the approach, as far as our analysis is
concerned, is that we do not know how to analyze XA' or XA' in terms of A,
where
A' =
We now describe cutting plane initialization. We generate a sequence
of polytopes Fm,Fm+1,... , Fn. Polytope Fk in this sequence has k + m
constraints, of which k are from the original problem and m are
We have a containment relationship Fm ? Fm+i ??? 0
We start by describing Fm. Select a set of m linearly independent columns
of A arbitrarily, which we will assume are numbered ..... , m. Now define
Fm = ?y : ?T?y > ??,?T?y < NeJ.
Here Am is notation for (a1,... , am), that is, the first m columns of A, and
N is a large positive number determined below.
Recall that the analytic center and approximate analytic center were de-
fined by (39) and (40). For polytope Fm, because of its simple structure, we
can compute the analytic center in closed form. It is precisely the solution
to the square linear system
= ?21(cm + Ne).
(Hence, Fm is full dimensional.)
The sequence of polytopes we work with is ..... . , Fn, where
Fk = fy: ?T?y > c?,?T?y <Ne).
Let Ak be the constraint matrix for this problem, that is, Ak = [--HAm, Ak]
and let ?? be the right-hand side, that is Ck = (?NeT, cTk)T. Thus, Fk =
ATky >
Now we describe an incremental procedure: given an approximate ana-
lytic center for Fk, the procedure returns an approximate analytic center for
Fk+1. The main idea of the procedure is that the analytic center of Fk+1
50
always lies on the central path for Fk if we use the objective function in (64)
below.
More specifically, the procedure is as follows. Let Yo be an approximate
center of Fk, let s = AkYo --H ??, let S = diag(s), and let f be chosen so that
Ak1S?1f --H 0 If --H ejj < 0.75. Solve the nonsingular square linear system
AhS?2ATkw = --Hak+i.
This system is nonsingular because Ak contains a basis Am. Let w' =
S?lATkw so that AkS?iwI = --Hak+i, and let ? = looliw1il. Then we have
--H IoA?s?ie = It0AkS?1(f --H e + w'/?0)
and
Ilf --H e + w'/?oII < If --H eli + IIw'/itoII
< 0.76.
Thus, Yo is approximately centered for LP problem
T
minimize			ak+iY
subject to 4kY > ??.
(64)
with central path parameter ?o. The centering distance can be improved
from 0.76 to 0.25 with three Newton steps. Now we apply Algorithm LIP to
problem (64)--Hnote that we have an initial approximately centered starting
point.
We run Algorithm
find an approximately
LIP, not necessarily to completion, but until we can
centered point y with parameter ? such that
aT?+iy --H Ck+1 = It.
(65)
have
We search for such a y as follows. If, initially, when LIP begins, we
aT?+iyo --H Ck+1 > Ito, then we do not run LIP at all; we merely keep the same
value of Yo and increase Ito 50 that (65) is satisfied. The computations above
show that Yo remains approximately centered for this larger parameter value.
The other case is that aT?+iyo --H c?+? <Ito. Then we run Algorithm LIP
until we detect two approximately centered points (either before and after
an LLS steps or before and after a small steps) such that
T
ak+iYi --H Ck+i <Iti
51
(66)
and
T			--H Ck+1 > It2			(67)
ak+iY2
with Iji > ?2 and Yi, Y2 approximately centered for tLi, !2 respectively. Now
we solve the following linear interpolation problem for p c [0,1]:
a?T+i(pyi + (1 --H P)y2) --H Ck+i = ?iP + !2(1 --H
Wetake? = ?iP+it2(1--HP) andy=y1P+y2(1--HP)for (65). Thefactthat
this choice of y is approximately centered follows from Theorem 1 in the case
that ?i, !2 are the parameter values before and after an LLS step. In this
case, the centering distance is 0.75. If the transition from ?i to ii2 occurs
during one of the three Newton steps for recentering the LLS solution, then
again the point is approximately centered with distance 0.75. Finally, if the
transition from (66) to (67) occurs in a small step, then conventional interior
point theory tells us that interpolating between the steps (which amounts
to truncating a certain Newton step) preserves approximate centering with
distance 0.25.
Once we have a solution for (65), then we have an approximate analytic
center y for Fk+i. This is because the approximate centering condition of y
for (64) means that
with Ilvil < 0.75, i.e.,
S-1			0
0			?-`
--H 4?S?1e = AkS?1v
e=(A?,a?+i) ?O1 i0?i
Since ?--H? is precisely the reciprocal of the slack a?T+iy --H Ck+i by (65), this
latter equation means that y is an approximate analytic center for Fk+1.
We call this procedure one "phase" of the cutting plane algorithm. Thus,
there are n --H m phases total. We have not addressed the case that (67) is
never satisfied during Algorithm LIP, that is, even at the optimum point y
for (64),
T			--H Ck+1 <0.
ak+1Y			_
If this inequality is strict, then clearly Fk+i is infeasible. We claim the original
LP (1) is also infeasible, provided that N is chosen sufficiently large. Instead,
52
we prove the contrapositive: suppose the original problem (1) is feasible. By
assumption of full rank, there is a basic feasible solution to (1), that is, an
index set for a basis B and a point y such that ATBY = CE and ATy> c
Observe that ATmy --H ATmABTcB, hence
IIAmTyII = IIATmABTCII
? IIATmAB?TII
= IIA??1AmII
< xAIICII
lid
lid
The last line follows from Lemma 4. Thus, we choose N > xAIIciI, and we
are guaranteed that ATmy <Ne, hence y ? Fk+1.
Another possibility is that at the optimum point y for (64),
T
ak+1Y --H Ck+1 = 0.
This means that this equation must hold for every feasible point of original
feasible region. Moreover, if this equation holds, and other constraints are
satisfied as equalities at the optimum of (64), then these inequalities are also
satisfied as equations at every feasible point of (1). These equations may be
eliminated by eliminating some variables. Then the cutting-plane algorithm
continues on a lower dimensional problem. Note that y is an approximate
analytic center for the lower-dimensional problem by Corollary 2.
An alternative to eliminating variables is as follows. Suppose we run
Algorithm LIP for phase k to completion, discovering that
T			* --H Ck+1 = 1 = 0
ak+iY			(68)
From Corollary 2 we know that y* is approximately centered in the optimal
face. Let Aeq be the set of constraints that
A?Tqy* = eeq.
Note that LIP also generates a primal optimizer x* with
Aeqx?q = --Hak+i, Xe*q> 0
Then, we must have that for every possible feasible point y of the original LP
problem, this set of constraints, plus ak+i, must be satisfied as equalities. Let
53
A0 = (Aeq, a?+i) and let the rest of Ak be Ak+l, and let the corresponding e
be partitioned as e0 and Ck+1, respectively. Then we have a degenerate Fk+1
given by
Fk+1 = fy: Aty = c0, ATk+ly >
Let xt = (xeq, 1); then x0* >0 and A0xt =0. One can verify that y* is
still an approximate analytic center in Fk+1. In fact, y* satisfies
A?+i(S*)?1(e + v) E Range(Ao) for some v
and II?II < 0.75 and s* = ATk+ly* --H ??+i. This is the centering condition for
degenerate polyhedra. Now we continue to phase k + 1 of initialization as
we did before, starting from y* Ei Fk+1. During this process, we may detect
infeasibility of the original LP and stop.
Since Algorithm LIP assumes nondegenerate polyhedra, we must use a
generalization of Algorithm LIP that involves one additional layer, J0. The
constraints indexed by J0 have all their slacks identically zero throughout
the algorithm. Thus, this layer has highest "priority" in the LLS step; the
diagonal weight matrix for this layer in the LLS step can be arbitrary since
the least-squares problem has an exact (zero-residual) solution. This layer is
always of Type I and always has oo = = 0. It can be proved that all of
the theory developed so far for Algorithm LIP works for this generalization.
Indeed, degenerate polyhedra are the limits of full-dimension polyhedra as c
is changed, but the bounds proved for Algorithm LIP are independent of the
choice of c.
After phase n of initialization, the feasible set Fn of the original problem,
if it is not empty, is represented by
Fn = fy: Aty = e0, ATny >
where A0 is the set of constraints active at every feasible point. Note again,
we have a y*, which is an approximate analytic center for Fn, and an k0* such
that
Ao::t=0, k0*>O.			(69)
If Fn is a singleton, we may stop and y* is the unique optimizer. Otherwise,
we move to the optimization phase solving
minimize
subject to			(70)
bTy
Aty = co,
A?Ty > C?.
54
Applying?Algorithm LIP, we generate y* on the optimal face. Suppose in
addition to Aty* = c0 we have AT?y* = ??, then we have a primal optimizer
(x0*, XB*) with A0x0* + ABXB* = b and XB* > 0. Since xt may not be positive,
(x0*, xB*) may not be a feasible solution for the primal. However, we can use
(69) and redefine
x0* x0* + Pk0*> 0
for ? large enough. Then we have (xt, XB*) > 0 and maintain
AOxO*+ABxB*=b.
Thus, we generate a feasible and strictly complementary solution for the
primal (4).
In general, when the initialization procedure terminates, we are left with
an approximate analytic center for Fn, which is a subset of the feasible region
for the original problem. We claim again that if N is at least xAIIcII, then
the optimal value of minimizing bTy over F? is the same as the optimal
point of bTy in (1), unless the original problem is unbounded. Here, the
reasoning is the same as earlier--Hif the original problem has a lower bound
on the objective function, then there is an optimal basic solution.
In practice, there is an technique that makes the cutting-plane algorithm
much more efficient. We make the observation that if the new constraint
T evaluated at the			center of			has a
?k+iY --H Ck+1,			approximate analytic			Fk,
sufficiently large and positive slack, then it is not necessary to use Algorithm
LIP to find the approximate analytic of Fk+1. Instead, it suffices to use a
constant number of Newton iterations starting from the old center. This
technique does not seem to reduce the worst-case complexity, however.
Now we analyze the complexity of the cutting plane algorithm. First,
observe that we do not work with the original A in this algorithm. We
instead work with Ak for some k. In this regard, we have the following
lemma.
Lemma 11 For all k = 1,... ,n, x(Ak) < 2XA and x(A&) < 4XA
PROOF. First, we address x(Ak). Let b be arbitrary, D ? D arbitrary, and
consider a solution to
A?DAT?y = AkDc.
55
The goal is to obtain a bound on Ilyl in terms of Ilcil. This equation may
be written in block form. Recall that Ak = [?Am, Ak]. Partition D =
diag(D1, D2) and c = (cf, c2T)T conformally. Then
AmDiAmTy + AkThAkTy = -A?D1c1 + AkD2c2.
Let D1 be an n x n matrix that agrees with D1 in positions indexed by Am,
and has a small c> 0 in other diagonal positions. Similarly, let D2 agree with
D2 in positions indexed by Ak, and let it have ? in other diagonal entries.
Finally, extend c1, c2 to vectors c'1, c'2 with n entries by filling in zeros. Then
we have
AD1ATy + ??2AT?y N -AD1C1 + AD2C2
where the approximation may be made arbitrarily close as ? gets small. Let
D = D1 + D2; then we have
-AD[D?1(-D1c'1 + D2c'2)].
Now let c = D-1(--HD1c'1+D2c'2); note that this vector depends on ?. Observe
that because D- is the sum of Di and D2, both positive, the scaling matrices
D-1D1 and D-1D2 both have positive diagonal entries bounded by 1. Thus,
Ilcil < Ilcill + 11c211, so Ilcil < 211c11. Since y solves
ADATy ADc
and the matrix on the left is nonsingular as ? 0 (because D has a strictly
positive entry bounded independently of e in positions corresponding to Am,
a basis),?then by definition II?II < xAIIcII < 2xAIIcII. Thus, we have proved
that x(Ak) < 2XA
Now for the second claim in the lemma, we follow the same argument
to conclude that IlATYIl < 2xAIIcII. Since IIAkTyII = jj?A?Ty + AT?yj <
2IIATyII, we conclude that
j?A?Tyj ? 4x??IIcII.
This proves the second claim in the lemma. I
As noted above, during the cutting plane algorithm, it may become nec-
essary to either eliminate variables or work with a degenerate polyhedron.
56
Eliminating variables is a further change to XA, XA that also can be bounded.
We do not provide those bounds, however, because the other alternative
(working with a degenerate polyhedron) does not require changes to A.
This lemma shows that the total complexity of the n --H?n phases in the
cutting-plane algorithm is no worse than n --H m times the complexity of the
LLS method. We can do better than this, however; we can prove that the
factor of n --H m may be omitted and that the total work of the cutting plane
method is bounded by a constant multiple of the work of the LIP method.
The reason for this is as follows. Suppose a crossover event takes place
during some phase of the cutting-plane method. We claim that this event
is "permanent" in the sense that the same event can never recur during any
other phases of the cutting-plane method, nor during the final LIP solution
of (1).
Examining the proof of Theorem 2, we see that in all three cases, when a
crossover takes place between two indices i and j, the slack 5j that remains
large is determined by Lemma 10 and the constraint that is made small must
remain small by Lemma 8. Lemma 10 holds even in the case of the cutting
plane algorithm because it relies on the residual of a layer of constraints Jk
in the current phase. As additional constraints are added by the cutting
plane algorithm, these constraints in Jk remain in the problem, so additional
constraints do not change the main idea of the lemma.
On the other hand, Lemma 8 apparently may not hold for the cutting-
plane algorithm. llowever, we can prove a new version of the lemma. Let
n = m + n. Note that n denotes the number of constraints in Fn.
Lemma 12 Let (y(?), ?) be a point on the central path encountered in phase
k of the cutting-plane initialization procedure. Let (y', ?`) be a central-path
point either in the same phase, in which case we assume ?, or in a later
phase, in which case ?` is arbitrary. Let the slacks be s(it), s' respectively.
Then for anyi ?rn+k,
5. ns..
PROOF. In the case that the two points occur in the same phase of cutting
plane algorithm, the result is obvious by Lemma Sin this case the bound
holds withm+k ? 8On?.
57
In the other case, during phase k, we do not have y(?) but rather an
approximate central path point ? such that
< 0.75.
In this case, a variant of Corollary I that estimates the left-hand product of
(7) more carefully shows that for all t ...... ,`n + k we have
0.055i < sj(?) ? 4s?			(71)
where s is the slack at ?.
Note that y satisfies the approximate central path equation
AkS?1f =
with
If --H eli < 0.75.
This relation can be written as
AkS?1f + a?+i/? = 0.
Thus, ? is also an approximate analytic center for the system
= fy: A?Ty > ??, aT?+iy > aT?+iy --H
with the centering distance bounded by 0.75. Let yF be the analytic center
for F(ji) with slack sF Ei ?m+k+1 Then, for all i = 1,... , m + k we again
have
<sF. <
0.055i --H			--H 4si.			(72)
Note also that (sF)?le is in the null space of (Ak, a?+i).
Observe that aT?+i y --H < Ck+1, because we are assuming that phase k
has not terminated yet. Thus,
Fk+1 = fy ATy > ??, a?T+iy > ck+i? C F(?).
Now, let y' be any point in F(?) (which includes any central path point in
Fk+1,. . . ,Fn), and let its slack be s' > 0. Then we have
II(sF)?1s1II < eT(sF)?lsI = eT(sF)?1(s??sF+sF) = eT(sF)?lsF = m+k+1,
58
which implies that
st' ? (m + k + l)stF, for i = 1,.. .,m + k+ 1.
Thus, from (72) and (71),
< (m + k + 1)USffi) ? (m + k+ l)(8Osj(jt)), for i = 1,.. .,m + k.
Note that m + k + 1 < n = ni + n. This concludes the proof. I
Thus, we can reprove Lemma 10 and Theorem 2 using Lemma 12 instead
of LemmaS to get the same global bound of n(n--H1)/2 on the total number of
crossover events. Because of the weaker constant in Lemma 12, the bound in
(46) changes by a constant factor, and the definition of C(A) in (60) changes
by a constant factor. This means that c(A) changes by a constant additive
factor, and n is replace by n, which is bounded by 2n. Thus, the running
time is changed by a constant factor.
9 Integer data and Tardos' theorem
In this section we specialize the LIP algorithm to the case that all of the
entries of A are integers. Suppose that the total number of bits required to
write A is LA. We have the following result (similar results can be found in
Tuncel [25] and Vavasis and Ye [29]):
Lemma 13 The parameters XA and XA are both bounded by 20(LA)
PROOF. We follow Todd's proof of the finiteness of XA, XA Specifically, con-
sider solving a weighted LS problem of finding y to minimize IID(ATy --H c)It.
The hyperplanes afy = c1,..., aT?y = c? partition R? into an arrangement
of convex polyhedral cells, some of which are bounded and some of which are
infinite. The solution y to the weighted LS problem must always lie in one of
the bounded cells. This means that it is a convex combination of vertices of
the bounded cell. Therefore, the point y is bounded in norm by the norm of
a vertex in this arrangement. A vertex v satisfies ATBv = cB for some basic
set of constraints. Thus,
XA < supf?jA?T? : B indexes a basisi.
59
This in turn is bounded by 20(LA); see, for example, Lemma 3.1 in [26].
Similarly, XA is bounded by IlAlIXA, so we obtain a bound of the same order.
This leads to a new proof of the following corollary, which is a well-known
result due to Tardos.
Corollary 4 [23] Consider solving an LP of the form (1) on a Turing ma-
chine. Suppose that A, b, c have integer entries, and suppose that LA is the
number of bits to write m x n matrix A. Then there is an algorithm to solve
this problem in polynomial time, and furthermore, the number of arithmetic
operations is polynomial in m, n, LA.
PROOF. We apply Algorithm LIP to the LP. Each LLS step and each small
step is carried out in exact rational arithmetic, but after the step, we truncate
y to 0(L) bits, where L is the total number of bits in A, b, c. The step
involves solving linear equations, which can be done in polynomial time [4,
26].
It can be proved that this truncation preserves approximate centering;
see, for example, Chapter 3 of [26]. The running time is O(n3?5c(A)) interior
point steps, and from (63) and the lemma we see that c(A) = O(LA t log m).
10 Application to flow problems
We can apply Algorithm LIP to flow problems such as max-flow or min-
cost flow. Let the number of arcs and nodes in the network be m and n
respectively. (For this section only, this represents a different use of "m"
and "n".) For flow problems, Algorithm LIP is strongly polynomial-time
because the result in the last section implies XA, XA < 20(m) and hence c(A)
in (63) is polynomial in m, n. In fact, an even better bound is possible for
XA, XA in the case of minimum-cost flow. Recall that the minimum cost flow
problem is to find the lowest cost flow through a network satisfying flow-
balance constraints at every node of the flow, and satisfying upper and lower
bound constraints on the flow on each arc. The arc cost is a linear function
of the flow through the that arc, and the total cost is the sum of the arc
costs.
60
Let us write the flow problem in primal form. Let A be the node-arc
incidence matrix, let 1 denote the arc lower bounds and u the upper bounds.
If we let x be the values of the flows minus the lower bounds, then we have
the following problem to determine x:
minimize cTx
subject to A(x + 1) = 0,
x + z = u --H I,
x>O z>O
The vector z here represents the slack, that is, the residual capacities in the
arcs.
Note that for every basis B of A, B-1 is a matrix whose every entry is
either 1, --H1 or 0 because A is a node-arc incidence matrix [18, sec. 13.2].
Now consider the min-cost flow constraint matrix:
A
Note that A is an (m + n) x 2m matrix. We show that
< O(rnn).
From the proof of Lemma 9 in the previous section, we see that it suffices to
analyze basic submatrices of A.
Consider any basis B of A. Since B is an (m + n) x (m + n) matrix, B
must contain n + k, k > 0, columns from?the first columns of A, and m --H k
columns among the last m columns of A. Thus, B can be written as
BN			0
o			1k			0
DB			0			1m-k
where DB is an			--H			x n such that each row has exact one 1 and the rest
are 0's, 1k is a k x k identity matrix, and B is a basis for A. Consider solving
the system of linear equations with
BTy =
61
or
cl
c3
To solve this system we first obtain
and then we solve
and finally we have
Note that
and
y3 = c3,
BTYi = e1 --H DBTy3 = e1 --H DBTc3,
y2=?2??Ty1
11Y311 < Ilcil
Ilyill < O(rn)IIcII
11Y211 < O(m$?n)IIcII.
Thus, we must have
II?II = J(y?,y2T,y3T)T?? < O(rn#n) lie < O(m#n) Itch.
Therefore, x(A) < O(m#n) and
<--H sup????Ty?J/JJcJ?J < O(rnn).
Therefore, the overall complexity for Algorithm LIP applied to the min-
cost flow problem is O(m3?5 log rn) iterations, where each iteration requires
0(m3) arithmetic operations. According to Ahuja, Magnanti and Orlin [lj,
the best currently-known strongly polynomial algorithm for this problem is
O(m2logn + rnn(logn)2) and is due to Orlin [17]. Thus, the LIP method
needs considerable improvement in order to reach this bound. On the other
hand, in practice, interior point methods usually perform much better than
their worst-case bounds, so we suspect that Algorithm LIP could be com-
petitive in practice, especially if a good method for solving the least-squares
problems is used.
62
11 Conclusions
We have proposed the first known interior point method whose running time
depends only on A. This is attained by including a step called the "LLS"
step that accelerates the convergence. There are many open questions raised
by this work. Here are some of the more interesting questions.
1.
The biggest barrier preventing Algorithm LIP from being fully general-
purpose is the fact that we don't know how to compute or obtain good
upper bounds on XA or XA There are several places in our algorithm
where explicit knowledge of these parameters is used the most crucial
use of this knowledge is in the formula for g given by (59), which
determines the spacing between the layers. Note that we do not need
to know these parameters exactly--Hgood upper bounds will suffice.
The only known algorithm for computing these parameters is implicit
in the results of Stewart [22] and O'Leary [16] and requires exponential-
time. We suspect that computing them, or even getting a good upper
bound, may be a hard problem.
In the absence of an algorithm to compute them, one possibility would
be an LP algorithm that makes a low guess at g, and then increases the
guess as more knowledge of the parameters is gained from the running
of the algorithm itself. We have not developed such an algorithm.
A very straightforward "guessing" algorithm was suggested by J. Rene-
gar. Simply guess XA = 100/11 All and XA = 100 and run the entire
LIP algorithm. If it fails in any way, then guess XA = 1002/liAll and
XA = 1002 and try again. Repeatedly square the guesses, until Algo-
rithm LIP works. (Note that, since Algorithm LIP produces comple-
mentary primal and dual solutions, we have a certificate concerning
whether it terminates accurately.) This multiplies the running time by
a factor of loglog(x???A??).
Another possibility would be to obtain bounds on these parameters
for special classes of linear programs that occur in practice. For ex-
ample, we carried out this process for min-cost flow problems in Sec-
tion 10. Other good candidates for special-case analysis would be lin-
ear programs arising in scheduling, optimization problems involving
63
2.
3.
4.
5.
finite-element subproblems (see example 3 of Coleman [3]), and LP
relaxations of combinatorial optimization problems.
For floating-point number computations where one usually is inter-
ested in computing an approximate solution rather than the global
optimum perhaps there is a different strategy for picking g. For in-
stance, perhaps g should be picked in a manner depending on machine-
epsilon. This issue needs further investigation.
Another very important issue for practical application is the best way
to compute the LLS direction r. This computation can of course be
reduced to solving linear equations, but there are many different ways
to solve linear equations in interior point methods, and some are better
than others see [30].
Although our algorithm has primal-dual aspects, it is principally a
dual algorithm. The step of the algorithm that is most "dual" is the
computation of &, which uses no primal information. Perhaps it is not
a coincidence that this is also the least elegant part of the algorithm.
We suspect that there may be a fully primal-dual algorithm that uses
primal and dual information symmetrically and gives a simpler formula
for c.
Although our upper bound on the total number of crossover events
is 0(n2), we have not been able to construct any examples that have
more than cn crossovers. This raises the possibility that there really
are no more than 0(n) crossover events, but we do not know how to
prove this. Furthermore, another factor of n in the final complexity
bound comes from g3fl appearing (60). The fact that we need C(A) to
be proportional to g0(n) rather than g appears to be an artifact of our
analysis rather than an actual aspect of the complexity.
6. As mentioned earlier, we do not know how to analyze the complexity
of an initialization procedure that involves introduction of a dummy
variable. An alternative initialization procedure gaining popularity is
the so-called "infeasible" interior point method (for example, see Lustig
et al. [11]). It would be interesting if there were a layered version of
these algorithms.
64
7.
8.
As mentioned in Section 3, there is a relationship between XA, XA and
condition numbers of weighted-least problems as demonstrated by [27].
Thus, our new results can be thought of a link between condition num-
ber and complexity of iterative processes (also see [25] and [29]). Such
links are well-known in numerical analysis; the classic example is the
conjugate gradient algorithm (see [6]). Renegar [20] is currently also
developing a theory connecting complexity and conditioning for interior
point methods. His setting, however, is more general than ours (gen-
eral convex constraints in Banach spaces are allowed) but his condition
measure involves b and c as well as A.
Although XA, XA act as condition numbers for weighted-least squares,
it is not clear to us whether they act as condition numbers (that is,
measures of the distance to ill-posedness) for LP problems like (1).
Can the analysis be extended to convex quadratic programming? It is
not clear how the Hessian matrix H of a quadratic objective function
ought to enter the complexity. For the monotone linear complementar-
ity problem: s = Mx t q, s > 0 x> 0 and xTs = 0, we expect that
the running time of LIP depends only on M.
References
[1] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: The-
ory, Algorithms, and Applications. Prentice-Hall, Englewood Cliffs, New
Jersey, 1993.
[2] D. Bayer and J. C. Lagarias. The non-linear geometry of linear pro-
gramming. AT&T Bell Laboratories preprint, 1986.
[3] T. Coleman. Large-scale numerical optimization: introduction and
overview. In A. Kent and J. G. Williams, editors, Encyclopedia of corn-
puter science and technology. Marcel Dekker, New York, 1993.
[4] J. Edmonds. Systems of distinct representatives and linear algebra. J.
Res. Nat. Bur. Standards, 71B:241--H245, 1967.
[5] J. L. Goffin, Z.-Q. Luo, and Y. Ye. Complexity analysis of a column gen-
eration algorithm for convex or quasiconvex feasibility problems. Faculty
65
of Management, McGill University, Montreal, Quebec, Canada, 1993,
1993.
[6] G. II. Golub and C. F. Van Loan. frtatrix Computations, 2nd Edition.
Johns llopkins University Press, Baltimore, 1989.
[7] C. C. Gonzaga. Path-following methods for linear programming. SlAM
Review, 34:167--H224,1992.
[8] N. Karmarkar. A new polynomial-time algorithm for linear program-
ming. Combinatorica, 4:373--H395, 1984.
[9]
[10]
[11]
[12]
[13]
[14]
L. G. Khachiyan. A polynomial algorithm in linear programming. Doki.
Akad. Nauk SSSR, 244:1093--H1086, 1979. Translated in Soviet Math.
Doki. 20:191--H194.
M. Kojima, 5. Mizuno, and A. Yoshise. A polynomial-time algorithm for
a class of linear complementarity problems. Mathematical Programming,
44:1--H26,1989.
I. J. Lustig, R. E. Marsten, and D. F. Shanno. Computational experience
with a primal-dual interior point method for linear programming. Linear
Algebra and Its Applications, 152:191--H222,1991.
L. McLinden. An analogue of Moreau's proximation theorem, with ap-
plications to the nonlinear complementarity problem. Pacific Journal of
Mathematics, 88:101--H161,1980.
N. Megiddo. Pathways to the optimal set in linear programming. In
N. Megiddo, editor, Progress in Mathematical Programming: Interior
Point and Related Method, pages 131--H158. Springer, New York, 1989.
5. Mizuno, M. J. Todd, and Y. Ye. A surface of analytic centers and
infeasible interior-point algorithms for linear programming. Technical
Report 1037, School of Operations Research and Industrial Engineering,
Cornell University, Ithaca, NY. To appear in Math. Oper. Res.
R. C. Monteiro and I. Adler. Interior path following primal-dual algo-
rithm. Part I: linear programming. Mathematical Programming, 44:27--H
41,1989.
66
[15]
[16] D. P. O'Leary. On bounds for scaled projections and pseudoinverses.
Linear Algebra and its Applications, 132:115--H117,1990.
[17]
J. B. Orlin. A faster strongly polynomial minimum cost flow algorithm.
In Proc. 20th ACM Symposium on the Theory of Computing, pages 377--H
387, 1988.
[18] C. II. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Al-
gorithms and Complexity. Prentice Hall, Englewood Cliffs, New Jersey,
1982.
[19] J. Renegar. A polynomial-time algorithm based on Newton's method
for linear programming. Mathematical Programming, 40:59--H94,1988.
[20] J. Renegar. Linear programming, complexity theory, and elementary
functional analysis. Unpublished manuscript, Department of Operations
Research and Industrial Engineering, Cornell University, 1993.
[21]
G. Sonnevend. An analytical center for polyhedrons and new classes of
global algorithms for linear (smooth, convex) programming. In Lecture
Notes in Control and Information Sciences 84, pages 866--H876. Springer--H
Verlag, New York, 1985.
[22] G. W. Stewart. On scaled projections and pseudoinverses. Linear Alge-
bra and its Applications, 112:189--H193,1989.
[23] E. Tardos. A strongly polynomial algorithm to solve combinatorial linear
programs. Operations Research, 34:250--H256,1986.
[24] M. J. Todd. A Dantzig-Wolfe-like variant of Karmarkar's interior-
point linear programming algorithm. Operations Research, 38:1006--H
1018, 1990.
[25] L. Tuncel. A pseudo-polynomial complexity analysis for interior-point al-
gorithms. Technical Report CORR 93--H16, Department of Combinatorics
and Optimization, University of Waterloo, Waterloo, Ontario, Canada,
1993.
[26] 5. A. Vavasis. Nonlinear Optimization: Complexity Issues. Oxford Uni-
versity Press, New York, 1991.
67
[27]
[28]
5. A. Vavasis. Stable numerical algorithms for equilibrium systems.
Technical Report 92-1280, Department of Computer Science, Cornell
University, Ithaca, NY, 1992. Accepted to SlAM J. Matrix Analysis.
5. A. Vavasis. Stable finite elements for problems with wild coefficients.
Technical Report 93--H1364, Department of Computer Science, Cornell
University, Ithaca, NY, 1993.
[29j 5. A. Vavasis and Y. Ye. Condition numbers for polyhedra with real
number data. manuscript, 1993.
[30] M. II. Wright. Interior methods for constrained optimization. In A. Iser-
les, editor, Acta Numerica 1992, pages 341--H407. Cambridge University
Press, Cambridge, 1992.
[31] Y. Ye. On the finite convergence of interior-point algorithms for linear
programming. Mathematical Programming, 57:325--H336, 1992.
68
