BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR93-1388
ENTRY:: 1993-10-14
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: An Interior Newton Method for Quadratic Programming
AUTHOR:: Coleman, Thomas F. 
AUTHOR:: Liu, Jianguo
DATE:: October 1993
PAGES:: 39
ABSTRACT::
Quadratic programming represents an extremely important class of optimization 
problem. In this paper, we propose a new (interior) approach for the general 
quadratic programming problem. We establish that our new method is globally 
and quadratically convergent - published alternative interior approaches do 
not share such strong convergence properties for the nonconvex case. We also 
report on the results of preliminary numerical experiments: the results 
indicate that the proposed method has considerable practical potential.
END:: CORNELLCS//TR93-1388
BODY::
An Interior Newton Method for
Quadratic Programming*
Thomas F. Coleman**
Jianguo Liu
TR 93-1388
October 1993
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
*Research partially supported by the Applied Mathematical Sciences Research
Program (KC-04-02) of the Office of Energy Research of the U.S. Department of
Energy under grant DE-FG02-86ER25013.A000, and by NSF, AFOSR, and ONR
through grant DMS-8920550, and by the Advanced Computing Research Institute, a
unit of the Cornell Theory Center which receives major funding from the National
Science Foundation and IBM Corporation, with additional support from New York
State and members of its Corporate Research Institute.
**Computer Science Department and Center for Applied Mathematics, Cornell
University, Ithaca, NY 14853, USA.
***Center for Applied Mathematics, Cornell University, Ithaca, NY 14853, USA.
AN INTERIOR NEWTON METHOD FOR QUADRATIC
PROGRAMMING *
THOMAS F. COLEMANt AND JIANGUO LIU?
Abstract. Quadratic programming represents an extremely important class of optimization prob
lem. In this paper we propose a new (interior) approach for the general quadratic programming
problem. We establish that our new method is globally and quadratically convergent --H published
alternative interior approaches do not share such strong convergence properties for the nonconvex
case. We also report on the results of preliminary numerical experiments: the results indicate that
the proposed method has considerable practical potential.
* Research partially supported by the Applied Mathematical Sciences Research Program (KC-
04-02) of the Office of Energy Research of the U.S. Department of Energy under grant D?FCo2-
86ER25013.AOOO, and by NSF, AFOSR, and ONR through grant DMS-8920550, and by the Advanced
Computing Research Institute, a unit of the Cornell Theory Center which receives major funding from
the National Science Foundation and IBM Corporation, with additional support from New York State
and members of its Corporate Research Institute.
Computer Science Department and Center for Applied Mathematics, Cornell University, Ithaca,
NY 14853, USA.
? Center for Applied Mathematics, Cornell University, Ithaca, NY 14853, USA.
1. Introduction.
programming problem:
(1.1)
In this paper, we consider the following quadratic
= 2ixTHx + cTx)
subject to			Ax = b
1 < x <u,
where H E ???? is symmetric and, in general, indefinite; c E l??, A ? 1?mxn, b E 3?m?
I E [7Z u [--Hoci J?, tt E [? u ?ooJ Jfl, and, without loss of generality, 1 < u. When H
is indefinite, we are interested in locating a local minimizer and call such a minimizer
a solution to (1.1).
Problem (1.1) has been studied intensively due to its importance in optimization:
many real world problems are posed in the form (1.1). In addition, many algorithms
for general nonlinear programming require successively solving subproblems of this
form (e.g., "SQP" methods).
Surprisingly, problem (1.1) is very difficult to solve from a complexity point of
view. Obtalning a global minimizer is NP-hard; moreover, unless sufficiently strong
nondegeneracy assumptions are in force, Murty and Kabadi [16] establish that even
obtalning a local minimizer is NP-hard. Nevertheless, the efficient determination
of a local minimizer to (1.1), or at least a point satisfying second-order necessary
conditions, is often possible in practise. Computational complexity questions regarding
a bound on the number of steps, polynomial time guarantees, etc., are questions which
we do not address in this paper. Vavasis [21], for example, discusses computational
complexity as applied to quadratic programming and other optimization problems.
There are basically two kinds of existing approaches for solving (1.1) in the general
case. The most popular strategies follow an active-set or gradient projection philos-
ophy: a piecewise linear path is generated, following faces of the polytope defined by
(1.1), e.g., [1], [2], [3], [6], [9], [10], [11], [15], [22]). These methods usually generate
a finite sequence of intermediate "approximations". An alternative philosophy is to
generate an infinite sequence of strictly feasible (or interior) points, converging in the
limit to a local solution. There has been a recent resurgence of interest in such meth-
ods for linear programming, and to some degree quadratic programming, especially
the convex case, e.g., [12], [5], [8], [17], [23], [24]). Most recently, Ye [24] proposed
an affine scaling (interior) method for the general quadratic programming problem
(1.1). Ye's method generates a sequence of points converging, in the limit, to a fea-
sible point that satisfies first-order and second-order necessary optimality conditions.
The convergence rate of Ye's algorithm is believed to be linear.
In this paper, we propose an interior Newton method using the trust region idea
and a new scaling strategy. Our algorithm bears some similarity to Ye's in that the
computational cost of each iteration is almost the same and both methods are "in-
terior", i.e., all the iterates stay in the strict interior of the feasible region. But the
methods differ in three important ways. First, our proposed method has quadratic
convergence properties. Second, in the limit, the resulting linear system of our pro-
posed algorithm is better conditioned1. Third, the convergence properties of our
algorithm are established under weaker assumptions. Preliminary numerical experi-
ments indicate that the new algorithm is efficient. A related work for box-constrained
1 See the comment in Section 6 (Concluding Remarks).
2
minimization problems is given ill [5].
Conditions for a point x E ?? to be a local minimizer of (1.1) are well-known and
can be phrased as follows: Let
(1.2)			d=ef ? x E 1?? Ax--H b 1 <x <u 1
be the feasible point set. If x is a solution to (1.1), then ?w E ?m such that
(1.3)			feasibility:			x E ?,
(Hx + c + ATw)i = 0,			if ii < Xj < Uj
(1.4)			first-order:			(Hx + c + ATw)i > 0,			if Xj = ii
(Hx + c + ATw)i < 0,			if Xj = flj?
(1.5)			second order:			H > 0 in AT
where
(1.6)			AT=AT(x) d=ef			: As=O;s?=O, VicA(x)J,
and
(1.7)			A(x) <??			: xj=lj or xi=ui).
Conditions (1.3)-(1.5) are necessary optimality conditions. Sufficiency conditions
are: ?w E ?m such that
(1.8)			feasibility:			x E ?,
(Hx + c + ATw)i = 0,			if l? <Xj <flj
(1.9)			first-order:			(Hx + c + ATw)i > 0,			if Xj = ii
(Hx + c + ATw)i <0,			if Xj = Uj,
(1.10)			second order:			H > 0 in AT.
The multiplier w mentioned above is referred to as the Lagrange multiplier.
For a given pair 1 < x <t, w E )Z?, let
(1.11)			g = g(x, w) = Hx + c + ATw,
and define a vectors v E l?? as follows: for i = 1,..., n,
Xj --H Ii, Uj --H xi),			if ii > --Hoo or Uj < co
(1.12)			v(x)? =			min(			otherwise.
We also define a map Mx = M(x) : 9?fl			?? as follows: if `i' = Mx(?), then
(1.13)			?1i =
Let
(1.14)
Xj --H			if ?			<0 and Uj < oo
--Hmax(1,v?),			ff?			<0 and Uj = oo
x? --H4,			ff?			>0 and ii> --Hoo
max(1,v?),			ff?			>0 and 4 = --Hoo.
def
v = Mx(g).
3
Clearly,
(1.15)			0 < Vj < Ivil, Vi.
Let ".*" denote componentwise multiplication. It is easy to see that (1.3) and (1.4)
are equivalent to the nonlinear system
(1.16)
which implies
(1.17)
def			V.*9			--H
F(x,w) =			Ax--Hb --H
def
F(x,w) =			Ax--Hb =
Suppose x E int(?) def ?x E T : 1 < x < u?, w C gm are given. It is reasonable
to look for a direction (s, s'), with respect to the variable pair (x, w), with the foliowing
three properties:
(1.18) (i)
(1.19) (ii)
(1.20) (iii)
5 is a feasible direction with respect to :r, i.e., As = 0;
5 is a descent direction for q(x);
(e, s') is a Newton direction with respect to (1.16)
neighborhood of a solution.
or (1.17) in a
In the foliowing sections, we propose an algorithm which generates directions
satisfying (1.18) - (1.20). We establish global and quadratic convergence properties.
In Section 2, we motivate the basic generic method, algorithm Interior-Newton.
In Sections 3 and 4, we discuss the strong convergence properties of a particular
implementation of the basic method, i.e., algorithm Interior-Newton1. In Section 5,
we give an accelerated version of the basic method, algorithm Interior-Newton2, and
discuss results of preliminary numerical experiments. (In the Appendix we show that
the accelerated version of the basic algorithm maintains the theoretical convergence
properties of the basic method.) Finaliy, in Section 6, we have some concluding
remarks and observations.
2. The Basic Algorithm. The basic algorithm can be motivated in the
foliowing way. Let v be either v or v and F be either P or F. 5uppose x E int(?),
? are given such that at (x, u'), F is differentiable and VP is nonsingular. The
Newton direction with respect to F = 0 is defined as the solution to
(2.1)			VFp' =
where (Matlab notation diag(.) will be used to denote a diagonal matrix whose diag-
onal elements are the spedfled vector)
(2.2)			VP = diay(g) * (V?v) + diag(v)H diag(v)AT			I			P
A			0			?= Pw
a
System (2.2) leads to the approximation
Mp+ATp_ --g
(2.3)			Ap			= 0,
where
(2.4) M = II + (diag(v)?1 * (Vxv) * diag(sign(g))) * diag(IgI).
Motivated by (2.4) and in the interests of generality, let us assume that M is a
matrix of the form
(2.5)			M = H + D?2 diag(IgI).
Clearly, in some neighborhood of a solution where the sufficiency conditions hold,
diag(v)--H1 * (""xv) * diag(sign(g)) > 0, and (2.4) is recovered from (2.5) if D =
(diay(v)?1 * (Vxv) * diay(sign(g)))?21; however, for the remainder of this section we
assume only that M has the form (2.5) and D is a positive diagonal matrix.
Assume Z is a matrix such that nnll(A) =< Z >, where < > denotes the
subspace spanned by the columns of a matrix. When zTMz is nonsingular and A is
of full row rank, we may solve (2.3) as follows: first, we solve
(2.6)			zTMzy = ?Tg
for y, then
(2.7)			p = Zy,			and Pw = (AAT)?lA(Mp + g).
To ensure a well-defined descent direction, we solve
(2.8) m,infyTzTg + ?2'yTZTMZy : ID?1Zyj( < AJ,
and let p = Zy, where D is a positive diagonal scaling matrix and A E [A1, Au] for
some given 0 < A? < Au. Equivalently, the direction p is the solution to
(2.9) m?in ??(?) %?ef pTg + 21pTMp : Ap = 0, IID?'pII < AJ.
Note that subproblems (2.8) and (2.9) have the disturbing property that some
elements of M may approach infinity as some diagonal components of D go to zero.
However, substitution of the scaling matrix D can ameliorate this problem. Specifi-
cally, let Z be a matrix with orthogonal columns such that
(2.10) <Z >= nnll(A), where A = AD.
Assume that D-1Z has full column rank, we solve
(2.11) mp?in??(P)d??fPTZTg+ 1-TZTMZ% : lip < Al,
2
where
(2.12)
M = DMD = DHD + diay(IyI), and g =
Then
(2.13)			p=DZp.
Note that all the elements of M are bounded. Also,
(2.14)			?(p) =
It is well known (see, e.g., [20]) that if p is the solution to (2.11), then ?A > 0
such that
(2.15)			(zTMz + AI)p = zTg,
(2.16)			z?TA?z? + Al > 0,
and
(2.17)			A(A --H IlPil) = 0
By the definition of Z, ?Pw ? ?m such that
(2.18)			(M + AI)Zp + g + ATpw = 0,
or equivalently,
(2.19)			(M + AD?2)p+ g + ATPw = 0.
If (AAT)--H1 exists, then
(2.20) Pw = ?(A?AT)-lA( (M + AI)2p + g?),
which is the least square solution to (2.18).
When moving in the direction p, starting from a feasible point, a variable may
reach a bound. Therefore, to maintain strict feasibility and yet allow the solution
(which may have some of the variables at their bounds) to be approached asymp-
totically (sufficiently fast) we define a "step back" procedure as follows. For each k,
compute pk and ?k by a function stepback:
(2.21)
then,
(2.22)
where2
(2.23)
[pk, ok] = stepback(xk,pk,0k,1,u,T1,r2),
k+1 =?k +0kpk,
0kdef vk.*gk?+?k(pk)
1 + 11vk. * gk?? + ?k(pk)1,
and r1 E (0, 1) is given, and r2 = 1.
2 ?? Lemma 3.10, 0 = 0 if and only if the first-order and second-order necessary optimality condi-
tions are satisfied. So 0 is a good measure of the optimality.
6
The function stepback is defined as follows:
function stepback
fnnction [P, o?j = stepback(x,p,O,1,tt,r1,r2,k)
BI? = max?(1--Hx)./p,(u--Hx)./pJ,
= 1wi<n??BRj),
p = ma4r?,1--HO),
= min(r2,pP).
(2.24)
(2.25)
(2.26)			(if k = 0 , set p =
(2.27)
A possible way to update the multiplier is based on the fact that at a local mini-
mizer,
(2.28)			diag(v) (Hx + c + ATw) = v. * g = 0.
So if A diag(v) . ATis nonsingular, the multiplier can be expressed as
(2.29)			w = (Adjag(v)AT)?lAdjag(v)(Hx + c).
Hence we may compute
(2.30)			wk = --H(A. djag(vk) . AT)--Hi A diag(vk)(Hxk + c), Vk.
In this case,
(2.31)
g = Hx+c+ATw
--H (I --H AT(A . dia?(v) . AT)?lA . diag(v)) (Hx + c)
which is a projection of Vq(x) = H + c into the null space of A diag(v), and
diag(v?1). g is the orthorgonal projection of diag(v?21) . (Hx + c) into the null space of
A. diag(v?1).
Global convergence properties require sufficient reduction in the objective func-
tion. In general, the truncated step akpk, where pk solves (2.9) and ?k is determined
by function stepback, does not guarantee sufficient reduction; therefore, similar to
the strategy used in [5], a (truncated) scaled gradient computation is required to
"augment" the truncated step 0?kpk To this end let w solve the following equation:
(2.32)			AD2? = AD2(Hx + c + ATw) = 0,
which is satisfied if AD2AT is nonsingular and w is the solution to
(2.33)			(AD)Tw L=S --HD(Hx + c).
Let
(2.34)			y--H 9
IIgll'
and
(2.35)			_ ____
= ?Dy --H ?Dy
11911 `
7
where, to match with the formulation ill (2.11), ? is defined as a solution to the
following problem:
(2.36) ?mJ?fQ(?) ffi?ef ?yTg? + fYTMY : I?I < A J.
It is clear that
(2.37)			`k(P9) = `k9(It), Ap9= 0,
and, similar to (2.15) --H (2.17), ?A9> 0 such that
(2.38)			(yTMy + A9)? = ?yTg,
yTMy+4 >0,
(2.39)
and
(2.40)			A9(A --H liii) = 0
Also, we may compute P9 and a9 using function stepback:
(2.41) [p9k, agk] = stepback(xk,p9k,ok?1,I,u,r1,r?),
where ?, r2 E (0, 1) are given. We use 0k--H1 here because ?k(pk) may not be available
yet when we compute p9k Note that by (2.27),
(2.42)			a? < r2 < 1, Vk.
We can now state the basic generic algorithm.
Algorithm Interior-Newton
Let x0 E int(?) be given.
Fork = 0,1,2,..
1. Determine Dk, k
and compute w , e.g., by (2.33).
2. Compute sk such that xk + 8k ? int(?),
e.g., skis an affine combination of akpk and a9kp9k,
defined by (2.13), (2.27), (2.35) and (2.41), respectively.
3. Update xk+1 = ?k + 5k
Algorithm Interior-Newtonallows for considerable freedom in its definition; for
example, the positive diagonal matrix Dk is unspecified. Two natural choices are
Dk --H (vk)-21 and Dk --H vkl-2. We consider each choice in turn. In the next two
sections we consider the convergence properties of algorithm Interior-Newtonwhen
Dk --H (vk)21: we call this version of the basic algorithm, Interior-Newton1.
3.			Convergence of Algorithm Interior-Newton1.			In this section
we establish global convergence properties of algorithm Interior-Newton when the
diagonal scaling matrix, Dk, is chosen Dk --H (vk)21; Interior-Newton1 denotes this
specification of the basic algorithm.
In this and the following section, the notations fsky, fxk), wk, and fDk --H (vk)-21J
are reserved for the sequences in algorithm Interior-Newton1. In each iteration,
(3.1)			Ak E [At, An],
where 0 < At < An < 0o are two given scalars.
We will drop the superscript k sometimes when there is no confusion. All the
norms considered here are II 12 except where otherwise specified. When needed, we
denote fk = f(xk) (or f(xk, wk)) for any function f(x) (or f(x,w)). Sometimes, we
- def
use the capital letter to denote diag(.)for a given vector. For example, V = dia?(v)
and IG = diay(lgI).
The following lemma defines a few basic equalities used throughout the remainder
of this paper.
LEMMA 3.1. Letp, p andP be defined by (2.11), (2.13) and (2.25). Let?, y, p?
andP9 be defined by (2.36), (2.34), (2.35) and (2.41). Then
(3.2) q(x)--Hq(x+ 5) = --H?(5) + I5TD?2IGIs;
2
(3.3) `k(ap) = ?(ap) = --Ha(1 --H ?2)pT(z?TM?z + AI)p --H a; A 1p112 < 0,
Va E [0, min(1,P)];
(3.4)
= Q(a?) = --Ha(1 --H ?2)(yTM?y + Ag)?2 --H a2 A ? <0
Va E [0, min(1,P2)].
Proof (i) is a direct consequence of the definition of ?. (ii) and (iii) are true by
(2.15), (2.16), (2.38), and (2.39).			I
Throughout this paper, the following two assumptions will be in force.
(ASi) The level set fl = [x E ? : q(x) < q(x0)) is compact.
(AS2) AVAT is nonsingular Vx E .
The assumption (AS 1) is standard and condition (AS2) is a nondegeneracy
assumption. By (AS 1) and (AS 2), ?C1 > 0 such that
(3.5)			II(AvAT)?lII < C1, Vx E ,
By (1.15) and (AS2), we see that AIVIAT is nonsingular and we may choose C1
sufficiently large so that
(3.6)			II(AIvIAT)?lII < Ci, ,Vx E , Vw E ?m
To distinguish from the computed wk, for any x E , we let Wj, = wi8(x) denote
the solution to
(3.7)
(AD)Tw L=5 -D(Hx + c),
and
(3.8)			Yts = gis(x) = Hx + c + ATwis.
For a specific x*, we write
(3.9) w75 def wt5(x*) and g75 d4 Hx*+c+ATwi*8.
By (AS2), we have
(3.10) wIk5 = ?(AvkAT)?lAvk(Hxk + c), Vk.
Accordingly,
(3.11)			9ik3=Hxk+e+AT%,
while
(3.12)
gk=Hxk+c+ATwk.
A condition we need for ?wk) is as follows:
(ODi) We say that fwky satisfies (ODi) if wkj 4? whenever xkj ?* ?
where C is a complementarity set which is defined by
(3.13) C def fxEf : v.*(Hx+e+ATw)=o for some wE?my.
Clearly (CD1) can be satisfied in practise. For example, choose wk = w1k5 for all
To ensure convergence we need to guarantee that the reduction in the quadratic
model `kk is a fraction of the reduction obtained by the (truncated) scaled gradient
step. The following condition formallzes this.
Let ? > 0 be given.
We say that a vector ? satisfies (CD2) if
(CD2)			?k(?) <
where Q9k and p9k are defined by (2.41) and (2.35).
We note that similar conditions have been used in [5] and [14] for box-constrained
and unconstrained problems respectively.
THEOREM 3.2. Let [sky be generated by Algorithm Interior-Newton1. Suppose
5k satisfies (CD2) for all k sufficiently large. Then [q(xk)y converges and
(3.14)			?kg1k8			0
Proof For all k sufficiently large, by (3.2), (CD2) and (3.4),
(3.15) q(xk) --H q(xk+l) > --H? ?(,k(a,kp9k) > 0
l0
So ?q(xk)? is monotonically decreasing, therefore, ?q(xk)) converges by (AS1). To
show (3.14), using (3.15) and (3.4) again, we have
(3.16)			a?k(1			(Q29k)2 )((yk)TMkyk+A9k)(?k)2??o
Using (2.38) and (2.34), we have
(3.17)			(yTA?Jy + 4)? = yT?g?8 = IlD9isli.
So if (3.14) is false, then a?> 0 and a subsequence f&J such that
(3.18)			?Dkjgjk?? > (
Applying (3.17), we see that ((ykj)TMkiyki + A9kJ)([tk?)2 is bounded away from zero.
Hence (3.16) and (2.27) imply
(3.19)			?9kj
Therefore, by (2.27), p9ki 0. Since n is finite, from the definition of P9, we may
kj
(without loss of generality) assume that pk> --H U1?kXI. Note that II??JI < AuIIDkII is
(pg')i
bounded by (2.35) and (AS 1), so u1 --H xikj 0 and when kj is sufficiently large,
vik? =			--H xikj			0. But by (2.35),
(3.20)
??kjg1k8i? _ viki ___
________ _			0.
1k>(g?i)1 --H (p9ki)1
Therefore, we must have j1?k,g1k?J1 0 since 1?kj(g1k?)11 is bounded above. This
contradicts (3.18). I
CoRoLLARY 3.3. Let fxkl be generated by A(gorithm Interior-Newton1. Sup-
pose x* is any limit point of ?xkJ and xkj x*. If 5k satisfies (CD2) for all k
sufficiently large, then
(3.21)			v*. * gt*s = v* * (Hx* + c + ATw7s) = 0.
Furthermore, if fwky satisfies (0D1), then
(3.22)			wkj			Wts
Proof. It is clear that wik: w1*8. So by Theorem 3.2, (3.21) holds, and then,
(3.22) is true by (CD1).
Next we show that ?xky is convergent. We define
(3.23)			S =			: x is a llmit point of ?xk?),
and, Vx* E S,
(3.24)			13*			=			13(x*) = fx E f : Xj = xt*, Vi E A*J,
(3.25)			13?*			=			13i(x*) = ?x E 13* : v? $ 0, Vi such that vi*. $ 0).
11
By Corollary 3.3 and (Asi), c ? s # ? if 5k satisfies (CD2) for all k sufficiently large.
It is easy to see that B* is the face of ? which includes x* while B*1 is the (relative)
interior of B*. Usually, x* E BI* C B?. But when x* is a vertex, x* = B1* =
We will use the following notations: given any index set 1, any matrix A E 1Zm>Cn,
and any vector x E )?n, let A1 denote the submatrix of A which consists of those
columns of A with their column indices in 1, and let xi denote the subvector of x
which consists of those components of x with their row indices in 1. We assume the
following strict complementarity condition throughout this paper:
(AS3) Vx E C, if (Hx + c + ATWIs)i = 0, then Vj $ 0.
Under assumption (AS3), if x* E C, then
(3.26)			=			: ii < Xt* < Uj 1 =			: (Yi*s)i =
where A is defined by (1.7).
The following crucial result depends on the fact that q is a quadratic function.
LEMMA 3.4. Suppose x* E Cflfl. Then B1*A(CAP =
Proof. The lemma is clearly true if x*is a vertex. Now assume x* is not a vertex.
If the lemma is false, then ?x* E B1* fl(C fl ) and x* $ x*. Let
(3.27)			x(t) = x* + t(x* --H x*), Vt > 0.
Then
(3.28)			Ax(t) = b and x(t)A* = xA*., Vt > 0.
By (AS2), AA*c has full row rank. Hence the projection matnx into flull(AA*e)
(3.29)			P* d=ef ? --H AAT*c(AA.cAAT?c)?1AA.c
is weWdefined. By (3.13), a?* and w* such that
(3.30)			D*(Hx* + c + ATw*) = 0,
(3.31)			D*(Hx* + c + ATw*) = 0.
Since x*, x* E Bi*, we have VA*?c > 0 and VA*?c > 0. Hence
(3.32)			(Hx* + c + ATw*)A*c = 0,
(3.33)			(Hx* + c + ATw*)A?c = 0.
Note that (ATw)A.c = AAT?cw, Vw E ?m, we have
(3.34) P*(Hx* + c)?*c = P*(Hx* + c + ATw*)A?c = 0,
(3.35) P*(Hx* + c)?.c = P*(Hx* + c + ATw*)A*c = 0,
and,
(3.36)
P*(H(x* --H x*))A*c = 0.
12
It follows from (3.34), (3.36) and (3.27),
(3.37) P*(Hx(t) + c)A*c = P*(Hx* + c)A*c + tP*(H(x* --H x*))A?c = 0,Vt > 0.
So Vt> 0, ?w(t) E ?m such that
(3.38)			(Hx(t) + c)A*c = AAT.cw(t),
(3.39)			(Hx(t) + c + ATw(t))A.c = 0, Vt > 0.
Applying (3.39) and the second equality of (3.28), we have
(3.40)			D(x(t))(Hx(t) + c + ATw(t)) = 0, Vt > 0.
Note that x(0) = x*, x(I) = *, VA*$c > 0, and VA**c > 0, we must have
(3.41)			v(x(t))A.c > 0, Vt > 0
since otherwise, there would be a to > 1 such that x(t0) E F and v(x(to))i = 0 for
some ? E A*c. But then (AS3) would be violated by (3.39) and (3.40). Therefore
(3.42)			lA?c < x(t)A.c < UA*c, Vt > 0
and, by (3.28), (3.42), we have
(3.43)			x(t) E f Vt > 0.
Note that
(3.44) (Hx* + c)T(x* --H x*) = (Hx* + c)?T.c(x* --H x*)A*c (since X*A* =
= (ATw*)AT?c(x* --H x*)A?c (by (3.32))
= (ATw*)T(x* --H
= ?(w*)TA(x* --H x*) = 0, (since A(x* --H *) = 0)
and
(3.45)
--H x*)TH(x* --H
--H x*)AT*c(H(x* --H x*))A*c
--H x*)AT.c(?AT(w* --H w*))A?c
--H x*))T(w* --H w*) = 0.
(by (3.32) and (3.33))
Therefore, by (3.27), (3.44) and (3.45), we have
t2
(3.46) q(x(t)) = q(x*) + 1(Hx* + c)T(r* --H x*) + --H2(x* --H x*)TH(x* --H
--H q(x*) < q(x0), Vt > 0.
Hence,
(3.47)
x(t) E 			Vt> 0,
13
which is a contradiction to (AS1).
LEMMA 3.5. Let ?5kj be generated by Algorithm Interior-Newton1 Suppose 5k
satisfies (CD2) for all k sufficiently large. Then 5 is finite and ?x* E 5 such that
B*AS =
Proof. It is clear that 5 c C An. So by Lemma 3.4, the interior of any face or
edge of ? includes at most one element of 5. Since the total number of faces, edges,
and vertices of ? is finite, 5 is finite. Choose x* E 5 satisfying
(3.48)			IA* =			IA(X)I1
We now show that `3* fl5 = r*. In fact, if A* = ?, then by the definitions of `3* and
`31*, `31* = `3* So by Lemma 3.4, our claim follows. The claim is certainly true if x*
is a vertex. Now suppose that A* $ ? and that x* is not a vertex. If the claim is
false, then ?x* E `3*A5, x* $ x*. By Lemma 3.4, x* ? `31*A5 Hence, XA*. =
and Si E A*C such that Vs* = 0 Therefore IA* > A* , which is a contradiction to the
choice of x*			I
To establish convergence of ?xk), we need to first establish a boundedness condi-
tion on the steps 5k Let ? > 0 be given. (Recall that by (3.1), Ak is bounded above
and below.)
We say that a vector ? satisfies (CD3) if
(CD3)			II(Dk)?1?II < ?Ak.
Condition (CD3) is satisfied, for example, if ? = ahpk or = 09kp9k, where ?k, pk,
?2k and p9k are defined by (2.27), (2.13), (2.41) and (2.35).
THEOREM 3.6. Let ?5ky be generated by Algorithm Interior-Newton1 and sup-
pose 5k satisfies (CD2) and (CD3) for all k sufficiently large. Then txkj converges
to x* which satisfies (?.?8).
Proof. Let x* satisfy (3.48). By Lemma 3.5, `3*A5 = x*. If A* = ?, then `3* =
and ?A5 = x* and the theorem is true. So we assume that A* $ ?.
If ?xk1 does not converge to x*, then let
(3.49)			5'			def
(3.50)			E*			(Lef t??A??s ?I(g7s)iIY > 0,
(3.51)			def			1
--H 3(1+?oAu)
Since5isfimteand'3*A5=*,?0<e<?* suchthat0<C?< land
(3.52) V* E 5', ?i* = i(x*) E A*, such that lx*i* --H ?;*l > E.
Therefore, ?K1 > 0 such that for each k > K1,
(3.53) either x,k. --H x5*.J21 < C?e, Vi E A*, or x,k. --H x,*.l > -32e, for some i E A*.
Also, if E is sufficiently small, ?K2 > K1, such that Vk > K2,
(3.54) if x5k, --H xs*l?21 <C?E, Vi E A*, then v4 = x,k --H x5*l, Vi E A*
14
and by (CD3),
(3.55) 1(5k)j1 < ?0?u(V?)21, Vi E A*, if Ixtk --H x;I2' < C?e, Vi E A*.
Since x* E S, ?k0 > K2, such that
(3.56)			Ix,ko --H 4121 < C?e, Vi E A*-
Then by (3.55), we have
(3.57)			IxtkO+1 x*iI <
(1 + ?Au)IxtkO --H x*iI21
1			Vi E A*.
3C? IxtkO --H 4I?21
3'
So, by (3.53),
(3.58)			IxtkO+1 --H Xt*I21 < C?E, Vi E A*
Hence, by induction, we have that Vk > k0,
c
(3.59)			Ixtk --H xt*I < Jx1k. --H xt*I21 < C?e <			Vi E A*.
This is a contradiction to (3.49) and (3.52).
We can now establish a first-order convergence result.
LEMMA 3.7. The sequence ?xky generated by Algorithm Interior-Newton1 con-
verges to a first- order point3 x* if for all k sufficiently large, 5k satisfies (CD2),
(CD3) and the following:
(3.60)
sign((5k)j) --Hsign((g1*?)j), Vi E A*.
Proof Let x* be the point in Theorem 3.6. We only need to show that x* satisfies
(1.4). In fact, by (3.21), v?(g78)j = 0, Vi. So, if l? < xt* < Uj, then (g78)? = 0. Suppose
--H l?. Then we must have (gts)i > 0. Since otherwise, (g78)? < 0 and by (3.60),
x,?+T> xkj for all k sufficiently large. Then, limxk? $ l?, which is a contradiction to
that lim xkj = X*j = li Similarly, we can show that if X*j = Uj, then (gts)i <0 I
Condition (3.60) is crucial to the first-order convergence result; therefore, it is
important to realize it can be satisfied with the scaled gradient direction, p9k This we
prove next.
LEMMA 3.8. The vector pgk satisfies (CD2), (CD3) and (3.60) Vk sufficiently
large, where p9k is defined by (2.35).
Proof. (CD2) and (CD3) are obviously satisfied. To prove (3.60), note that
(9t*s)i $ 0, Vi ? A*. So when k is sufficiently large, sign((g?k8)j) = sign((g??)?), and by
(2.39) and (3.17), ?k <0. Hence, by (2.35), we have
(3.61) sjgn((p9k)j) = sign(jikiv;bk(4k;,)(1) = si9n((?)j) = --Hsign((g1*8)?), ViE A*.
3i.e., z* satisfies condition (1.4).
15
One implication of the previous two results is that aIgonthm Interior-Newton1
guarantees first-order convergence if we replace 5k with the scaled gradient direction
whenever (3.60) is not satisfied.
Now, we turn to the second-order optimality condition. The following lemma
reveals the relation between the definiteness of H in Ar and that of z?TA?z?.
LEMMA 3.9. Suppose x* satisfies (3.21). Let
M* def (z*)TM*2* = (2*)T(D*HD* + I&1*81)z*.
Then
(i) il >0 inAr* if and only if M* >0.
(ii) H >0 inAr* if and only if A?* >0.
Proof. We will prove (ii). (i) can be proved similarly. Assume first that H > 0 in
Af* Let 5 E 1Z?-? and 5 $ 0. By (AS2), AD* has full row rank. So Z* E ?nx(n-m)
and Z*s $ 0. Let
i, d=ef sTM*s = (D*z*s)TH(D*2*s) + (z*s)TIGi*5Iz?*s.
Note that AD*Z* = 0 and (D*Z*s)j = 0, Vi E A*, we have D*Z*s E Ar* So
(D*z*s)TH(D*z*s) > 0. Hence v > 0. If D*Z*s $ 0, then v > 0, we are all set.
Otherwise, (Z*s)j = 0, Vi E A*c. Since 2*s $ 0, ?i E A* such that (Z*s)j $ 0. But
by (AS3), (gts)i $ 0, Vi E A*. So v > 0, and therefore, M* > 0.
Now assume that M* > 0. Let x E Af* and x $ 0. Define x E 1t? as follows:
ViEA*
Xj=			otherwise.
Then x = D*x. Since AD*j = Ax = 0, ?x such that x = Z*x. Note that x E )V* and
x $ 0, we have x $ 0 and x $ 0. Hence
0 < xTM*x = x?T(z?*)TD*HD*z*i + iT(z*)TIG;aIz*x = xTHx,
which implies that H >0 in Ar*			I
Before proceeding to the second-order optimality conditions, we justify the fact
that 0 defined by (2.23) is a good measure of optimality.
LEMMA 3.10. Let 0 be defined by (2.23). Then 0 = 0 if and only if x satisfies the
first-order and second-order optimality conditions.
Proof. We first assume that 0 = 0. Then v. * g = 0 which implies that x satisfies
the first-order condition. In addition, `k(p) = 0. So by (2.14) and (2.15), we have
(3.62)			1 Tz?TMzp +			= 0.
--H2p			Alipli2
By (2.15) and (2.16), pTzTg < 0. So ?(p) = ?(p) = 0 yields
(3.63)			?21pTzTM?z?p > 0.
16
Hence,
(3.64)			AJIp1j2 = 0.
Then, by (2.17) and A> 0 A = 0. Therefore, by (2.16),
(3.65)			zTMz?> 0
which implies that x satisfies the second-order condition by Lemma 3.9.
Now we assume that x satisfies the first-order and second-order conditions. Then
first, IIv. * ?II = 0 Also, g = 0. So (2.15) implies
(3.66)			pTz?TM?zp+ Ailpil2 = 0.
Then by Lemma 3.9 and the second-order conditions, we see that pTz?TM2p > 0
which by (3.66) implies that
(3.67)			?(p) = ????			1 TzT
=-2p			MZp=0.
The next result claims that if the point of convergence is a complementarity point,
i.e., (3.21) is satisfied, then (a? defined by (2.27) is bounded away from zero.
LEMMA 3.11. Suppose ?wky satisfies (CDl). Suppose ?xkJ converges to a point
x* that satisfies (3.21). Then aa0 > 0 such that Qk > a0, Vk, where a? is defined by
(2.27).
Proof. If the lemma is false, then ? a subsequence ?kjJ such that
(3.68)			ak)			0.
Therefore, by (2.27), pk,			0.
Since n is finite, from the definition of p, we may
(without loss of generality) assume that pkj --H Ul?kX?. Note that 1pkj < AuIIDkII is
(p i)1
of ?xkJ, so u1 --H xiki 0 and when kj is
Hence, v1* = 0 and by (AS3), I(g7?)iI > 0.
bounded by (2.13) and the convergence
sufficiently large, viki = tti --H xikj 0.
But by (2.19),
(3.69)
and so
D2Hp + IGIp + Ap + D2g + ?2?Tp? = 0,
(3.70)			?g1ki?+Aki			= v1k?			0.
(Hpkj)1 + g1ki + (?Tp?ki)1 (pkj)1
Therefore, by the fact that g1k3 (gI*s)1? we must have
(3.71)			(Hpki)1 + 91k, + (?Tp?ki)1			00,
which is impossible since by (AS2), (CDl), (2.20) and the convergence of ?xkJ,
IIH? + gk + AT?wII is bounded.
17
Guaranteeing a reduction in ?k which is a fraction of the reduction obtained by
the truncated trust region step, 0Lkpk, where a? and pk are defined by (2.27) and
(2.13), allows us to establish convergence to a second-order point. This is the reason
for the following condition.
(CD4)
We say that a vector ? satisfies (CD4) if
<
where ?k and pk are defined by (2.27) and (2.13).
THEoREM 3.12. Suppose ?wk1 satisfies (CDl) and 5k satisfies (CD4) for all k
sufficiently large. Suppose (xky converges to a point x* that satisfies (3.21). Then x*
satisfies (1.5).
Proof. By (3.2), (CD4) and (3.3), for all k sufficiently large,
(3.72)			q(xk) --H q(xk+l) > ??k(akpk) > ??ffiAkPkII			0.
By Lemma 3.11, we see that ?k11pk1			0, and by (2.17),
(3.73)			Ak 0.
Hence, by (2.16),
(3.74)			(z*)TM*z* > 0.
Therefore, (1.5) holds at x* by Lemma 3.9.
To sum up, we have the following theorem:
TllEoREM 3.13. Suppose fwkj satisfies (CD1) and 5k satisfies (CD2), (CD3),
(CD4), and (3.60) for all k sufficiently large. Then the sequence ?xkJ generated by
Algorithm Interior-Newton1 converges to a point x* which satisfies the first-order
and second-order necessary optimality conditions. Further more, if H > 0 in Af*, then
x* ts a solution to (1.1).
As pointed out before, (CD1) can be satisfied by letting wk = w1k8. We now show
how (CD2), (CD3), (CD4), and (3.60) can be satisfied simultaneously. First, we
define a parameter as follows: let 0 < r < 1 be given, let ?, 7? E ??,
if ?k(?) < r
otherwise.
(3.75)			a = ak(?, TI) %?ef
Clearly, if a?, pk, Q9k and p9k are defined by (2.27), (2.13), (2.41) and (2.35), and
5k = akak? + (1 --H ak)a9kp9k, (CD2), (CD3) and (CD4) are satisfied. Next, we show
that (3.60) is also satisfied.
LEMMA 3.14. Suppose fwkj satisfies (CD1) and 5k satisfies (CD2), (CD3),
and (CD4) for all k sufficiently large. Then for all subsequences ?pk,? such that
pk? 0, we have
s?gn(p?k.J) = --Hsign((g78)?), Vi E A*, Vk5 sufficiently large,
18
(3.76)
where pki is defined by (2.13).
Proof. By Theorem 3.6 and Corollary 3.3, ?xkJ converges to * where x* satisfies
(3.21). Also, by (AS2), (ODi), (2.20) and the convergence of ?xky, IIHpk + 9k +
ATp?k? is bounded, and by (As3), I(97s)iI > 0, Vi E A*. So, by (3.69), ?C2 > 0 such
that for all k sufficiently large,
(3.77)
i?k1 1??(Hpk)i+g2k+(AT?wk)i <c2vk. Vi EA*,
g?k?+Ak
Pi I --H?
and consequently, by (2.13),
1(Zkpk)il = p?k?1 <c2(v?2k.)21 0, Vi E A*.
(v?ik)?2 --H
So,
(3.78)			IGkiz?kpk			0, since 92k			0, Vi E A*C.
Hence by the assumption that Dk, zkipk; = pk3 0, we have
(3.79)			Mk,zk,pki --H DkiHDk,z?kipk? + lck,lzk3pki			0.
Therefore, by (2.20), (AS2), (3.73), and the fact that gk 0, we have
k
(3.80)			pU;			0.
Finally, using (3.69) again, for all k5 sufficiently large and i E A*, we have
(3.81) s?gfl(p2k.J) = 8ign(???k.J (HpkJ)i+g?+(?Tp?k?)j
g,k.iJ + Ak,
--H sigfl(?g?k.J) = --Hsign((g7?)j).
THEoREM 3.15. Suppose for all k sufficiently large, wk = wiks and sk = akakpk +
(1 --H ak)o9kp9k, where ck, pk, Q9k and p9k are defined by (2.27), (2.13), (2.41) and
(2.35). Then the sequence ?xk) generated by Algorithm Interior-Newton1 converges
to a point x* which satisfies the first-order and second-order necessary optimality con-
ditions. Further more, if H > 0 in )v*, then x* is a solution to (1.1).
Proof. By Theorem 3.13, we only need to show that 5k satisfies (3.60) for all k
sufficiently large. By (3.61), it suffices to show that
(3.82) sign(p,k.) = --Hsign((g1*5)?), Vi E A*, Vk sufficiently large, and a? = 1.
In fact, for all k such that ak = 1, we have 5k --H 0kpk So by Lemma 3.11 and the
convergence of fxk
(3.83)			?pk a? = 1?			0.
Hence, by Lemma 3.14, we see that (3.82) holds.			I
19
We complete this section with two observations.
First, we remark on the computational cost of each iteration. If Algorithm
Interior-Newton1 is implemented with wk = wjk8 and sk = akak? + (1 --H ak)a9kp9k,
then the main cost in each iteration is the QR-factorization of ADh (to get zk and
wk) and a trust region subproblem solution. This is reasonably practical for small and
medium-sized problems.
Second, we note that our choice of a in (3.75) is just one of numerous possible ways
to allow for an effective combination of the scaled gradient and trust region directions.
Another possibility, for example, is to follow the "dogleg" idea, (e.g., [18J), a popular
idea used in unconstrained minimization settings. Specifically, let
(3.84) a=ak(?,? def ?O< a <1 : ?k(a)d=ef?k(a?+(1?a)?=minJ.
4. Quadratic Convergence of Algorithm Interior-Newton1. In
this section we consider the local convergence rate properties of algoritlim Interior-
Newton1: we establish superlinear and quadratic convergence results.
Recap: For ease of reference we summarize here the global assumptions used through-
out this section and the next. Recall that V is a diagonal matrix, defined by (1.12),
and C is the set defined by (3.13), i.e.,
(4.1)
def			for some wE)ZmJ.
C =			: v.*(Hx+c+ATw)?o
The least-squares multipliers, Wis, are defined in (3.9) and (3.10).
(ASi) The level set fl = ?x E ? : q(x) < q(x0)J is compact.
(AS 2) AVAT is nonsingular Vx ?
(AS3) Vx E C, if (Hx + c + ATwt ). --H 0, then Vj ? 0.
We also summarize the various conditions that will be referred to in different
results to follow. Bear in mind that ultimately the conditions will be satisfied by
appropriate and computable quantities.
(CDl) We say that ?Why satisfies (CD1) if wkj			?1*5 whenever xkj			x* E C.
A vector ? satisfies (CD2) if
(CD2)			?k(?) <
where O9k and p9k are defined by (2.41) and (2.35).
(CD3)			A			satisfies (CD3) if
< ?Ak.
A vector ? satisfies (CD4) if
< ?`kk(okpk),
where 0k and pk are defined by (2.27) and (2.13).
We assume that for all k sufficiently large, wk satisfies (CD1), sk satisfies (CD2),
20
(CD4)
(CD3), (CD4), and (3.60). Therefore, by Theorem 3.13 and Lemma 3.10,
(4.2)			wk			` W7s,
(4.3)xk x* satisfying the first-order and second-order optimality conditions,
and
(4.4)			0k			0,
where 0k is defined by (2.23).
LEMMA 4.1. Assume H > 0 in Ar*. Then pk 0, p?k 0 Ak --H 0, and
(zh)TMkzk > 0 for all k sufficiently large, where pk and p?k are defined by (2.13) and
(2.20), Ak is defined by (2.15).
Proof. Since H> 0 in Ar*, by Lemma 3.9, M* > 0. So by (3.73), ?C3 > 0, such
that for all k sufficiently large, (2k)TAIk2k + Aki> 0 and
(4.5)			jJ( (zk)TMkzk + AkI)?lII < c3.
Hence, from (2.15) and the fact that g?k			0, we have
(4.6)			pk			0.
Therefore,
(4.7)			pk --H Dkz?kpk			0,
and by (2.17),
(4.8)			Ak = 0, Vk sufficiently large.
So,
(4.9)			(z?k)TMkzk > 0, Vk sufficiently large.
Finally, similar to (3.80), we have
(4.10)			pk?			0.
LEMMA 4.2. Suppose H > 0 in Af* Then ck 1, where ak is defined by
(2.27).
Proof. Vi E A*c, Vt* > 0. So by (4.7),
411			BRk			1			xk			--Hx,k. __
= max?			k			k			00, Vi E
pi
Recall that (Yts)i 96 0, Vi E A*. Without loss of generality, assume that (9i*s)i > 0.
Then Xt* = Ii and similar to (3.81), when k is sufficiently large, p?k. < 0. By considering
(3.69), (4.7), (4.8), and (4.10), we see that
l.?xk			-k
412			BRk. --H 2			--H _ Vj _			IgtkI+Ak
--H p2k. --H p7k. --H (Hpk)i + g?k + (ATp?k)j
21
1, Vi E A*
Therefore, by (2.25), (2.26), (2.27), and (4.4), we have pk			1, pk			1, and
1.			I
Next we cite a standard result (see, e.g., [7]), used subsequently.
THEOREM 4.3. Let V C ?? be an open convex set, F: ?? ?n y* E V,
= 0, VF(y*) nonsingular, and VF is Lipschitz continuous at y* in V. Let ?TkY
be a sequence of nonsingutar matrices in ?nxn, and suppose for some y0 E V that
the sequence of points generated by yk+1 = yk TElF(yk) remains in V, and satisfies
yk $ y* for ati k, and yk
If 11Th --H VF(y*)ll 0, then fyk) converges supertinearly to y*.
If 11Th --H VF(y*)11 = 0(11yh --H y*ll), then [yh) converges quadraticatly to y*
Theorem 4.3 can not be applied directly to F or F since these systems are not
differentiable at some points. Instead, we apply Theorem 4.3 to the following function:
(4.13)			P(x,w)dJ ?V;ffi?? :?m+n??1n+n,
where
?+?			if x* --H _____
(4.14)			v(x)i def ??,2 `			othetrwlse2.
It is clear that v(x*) = v(x*). So
(4.15)			P(x*,w1*8) = 0.
Also, VF exists and continuous everywhere ?? ?m+n Since
= 0, Vi E A
(Vv*)ii = sign((g??)2*.), Vi E A*,
we have, in a neighborhood of (x*, w1*3),
(4.16) VF(x, w) = VVGA+ VH VAT
0
- IGI+VH VAT
A			0
LEMMA 4.4. `7F(X*,Wt*s) is nonsingular.
Proof Let VF*s = 0, where 5 = (5x, 5w) has appropriate size. Then by (4.16),
(4.17)			Asx = 0
(4.18)			(IGi*s + V*H)sx + V*ATsw = 0.
It follows from (4.18) that VA** = 0, we have
(4.19) l(gt*5)??l * (Sx)A = VA*.. * (H5x)A + VA*.. * (ATsw)A = 0.
But (g1*5)? $ 0, Vi E A*, so (Sr)A = 0 and by (4.17), we see that
(4.20)			?x E Af*
22
Also from (4.18), and that (gts)A?c = 0, v2* ? 0, Vi E A*c, we have (Hsx)A*c +
(ATsw)A*c = 0. Thus
(4.21)sfHsz = sfHsx + sf(ATsw) = (sx)AT.c(Hsx)A*c + (sx)AT.c(ATsw)A*c = 0.
By the assumption that H > 0 in Ar*, we have ?x = ? Then from (4.18) and (A.3),
5w = 0. So 5 = 0, which implies that VF* is nonsingular. I
LEMMA 4.5. ?? > 0 and a neighborhood Bs(x*, Wt*a) of (x*, wt*s), such that VF
is Lipschitz continuous at (X*,W7s) in Bs(x*,wt3), and V(x,w) E Bs(x*,wt3), VF is
nonsingular.
Proof. The existence of (VF(x,w))-1 in B6(x*,%) is obvious by Lemma 4.4.
For the Lipschitz continuity, V(x, w) E B6(x*, W7s), ?C6 > 0, such that
(4.22)
IIVF(x, w) --H VP(x*, wt*s)II
VvG+VH -Vv*G1*3 -V*H
0
< C(3 wX$*s
(v?v*)AT
0
It is not clear how to directly apply Theorem 4.3 to the sequence ?(xk, wk)y
since fwkl is not updated in the form of wk+1 = wk + pwk Therefore, to establish
the convergence rates of f(xk, wk)) and ?xk1, intermediate steps need be inspected.
Specifically, we consider the sequence f(xk, wk)y where fwky is defined by
(4.23)			wk =wk?1 +pwk?1, Vk=1,2,...,
and ?pwkl is defined by (2.20). By considering (4.10) and (ODi), we see that
(4.24)			wk			Wts
By (2.19), note that Ak --H 0 for all k sufficiently large, we have
(4.25)			(H + D?2IGI)p+ ATpw =
So,iffork= 1,2,..., we let
(4.26)			Pwk			=			wk+1?wk,
(4.27)			9k			=			Hxk + c + ATwk.
then
(4.28)
Define
(H + D?2IGI)p+ Wpw =
?a'(VD?2IGI + VH) VAT
0
(4.29)			TQ %?ef
2.1
Since vk(Dk)?2IGkI			I&t*sI, by Lemma 4.2, we have
(4.30)			Tak			VF*
So Vk sufficiently large, TQk is nonsingular and
(4.31)			(akpk,p?k) = ?(Tak)?1F(xk,tVk)
Hence,
(4.32) (xk+1,wk+1) = (xk,wk) --H (T?)?1F(xk,wk).
Using Theorem 4.3 and (4.30), we have the following theorem:
THEoREM 4.6. Suppose H > 0 in Af* Suppose for all k sufficiently large,
5k --H ?kpk, where Ok andpk are defined by (2.27) and (2.13). Then (xk,wk) converges
to (x*,w1*3) superlinearly.
Next, we show that ?(xk,wk)y converges quadratically to
LEMMA 4.7 Suppose H >0 in Ar*. Suppose for all k sufficiently large, wk = wiks
and 5k --H 0kpk, where 0k and pk are defined by (2.27) and (2.13). Then 1 --H 0k1 =
O(IIxk --H r*II).
Proof. First, for all k sufficiently large,
(4.33)
wt*5 --H wk = *			k
Wj8 --H w15
(Av*AT)?1Av*(Hx* + c) --H (AvkAT)?lAvk(Hxk + c),
By (3.5), it is easy to verify that
(4.34)			IIwk --H wt*sII = o(IIxk --H x*II).
Consequently,
(4.35)			IIvk. * gkII = o(IIxk --H x*II).
By (2.13) and (2.19), we have
IGkI+vkH			vkAT			-			pk			-
(4.36)			A			0			-			- p?k			-			0
It is easy to see that the limit of the coefficient matrix is vp*. So, by (4.35),
(4.37)			?pk? + ?p?k? = o(IIxk --H x*II),
(4.38)			?k(pk) = o(IIxk --H x*II),
(4.39) Ok = o(I!xk --H x*II), and 1 --H pk = o(IIxk --H x*II).
Meanwhile, similar to (4.12), Vi E A* and all k sufficiently large,
BR2k.I = I(Hpk)j+(ATp?k)j
(4.40)			11 --H			(Hpk)j + gtk + (ATp?k)j I = O(IIxk --H x*II).
24
So,
(4.41)
and therefore, by (4.39),
(4.42)
?i pkj= o(IIxk --H x*II),
11 --H = O(IIxk --H x*II).
THEoREM 4.8. The sequence ?(xk, wk)1 converges to (x*, Wi*s) quadratically under
the assumptions of Lemma 4.7.
Proof. We show that
(4.43)			IlTak --H vF*II = o(IIxk --H x*II).
Then by Theorem 4.3, the result follows. In fact, let II IF denote the Frobenius norm,
we have
(4.44) IIT0k --H vP*IIF = ??vk(Dk)?2IGkI --H ic;5 IF + II(?lvk --H V*) HilF
0Lk
+II(vk --H V*) ATIlF + II(--HJk --H 1) AilF.
It is not hard to show that
(4.45) IIVk(Dk)?2IckI --H IGISI IF + IIVk --H V*IIF = o(IIxk --H x*Ii).
So by (4.42), and the equivalence of the norms, (4.43) holds.
Next we establish the convergence rate of the sequences fxky and f(xk, wk)1. To
do this, we need the following lemma which can be regarded as a complement to
Theorem 4.3. Its proof is similar to that of Theorem 8.2.4 in [7].
LEMMA 4.9. Suppose in Theorem 4.3, there is a partition y = (yi, Y2) and
IITk --H VF(y*)? = O(??y1k --H y1*II) Then ?C > 0 such that for all k sufficiently large,
(4.46)			Ilyik+l --H y;?J < c IIyk+l --H yk? Ilyik --H y1*II
THEOREM 4.10. Suppose H > 0 in Af* Suppose for all k sufficiently large,
wk = w1k5 and 5k = akpk, where 0k and pk are defined by (2.27) and (2.13). Then
?C > 0 such that for all k sufficiently large,
(4.47)			11xk+1 --H r*II < c 1xk?1 --H x*iI 11xk --H x*II,
(4.48)			II(xk+1,wk+1) --H (x*, wi*8)II
< c 1(xk?1, wk?1) --H (x*, wl*s)II 11(xk, wk) --H (x*, wt8)II.
Proof By (4.43) and Lemma 4.9, SC > 0 such that for all k sufficiently large,
(4.49) iIxk+l --H x*ii < cII(xk+1,wk+1) --H (xk,wk) 11xk --H x*II.
25
Therefore, (4.47) and (4.48) follow by (4.23), (4.34), and (4.37).
Theorem 4.10 shows that the convergence rates of fxky to x* and ?(xk, wk)y to
(x*, Wi*a) are almost quadratic, stronger than both superllnear and 2-step quadratic.
The last result of this section shows that 5k = ?kpk, where Pk is given by (2.13)
and a? is given by (2.27), will ultimately hold for all k. Therefore, the convergence
rate results mentioned above will apply to algorithm Interior-Newton1.
THEOREM 4.11. Suppose H > 0 in Ar*. Suppose for all k sufficiently large,
5k = ?k?kpk + (1 --H ak)a9kp,k, where ?k is defined by (3.75) and 0k, pk, 09k and p??k
are defined by (2.27), (2.13), (2.41) and (2.35). Then 5k --H akpk for all k sufficiently
large.
Proof. By (3.75), We need to show that
(4.50) ?k(akpk) < 7?k(??p?), Vk sufficiently large.
In fact, by (2.15) and (4.8), we have
(4.51)			?k(akpk) = ?k(2 --H ak)?k(pk).
On the other hand, j?(Dk)?1o9kp2k? < ?k, ??
(4.52)			?k(a?kp,k) > ?k(pk) = ?k(pk)
Therefore, by Lemma 4.2,
(4.53) ?k(?kpk) = ak(2--H0k)?k(pk) > Q?(2 --H o?k)
`kk(a9kp?)			?k(a?kp??)
Since r < 1, we see that (4.50) is true.
5. An Accelerated Version of Algorithm Interior-Newton and Numer-
ical Experiments. We have shown above that Algorithm Interior-Newton1
has strong convergence properties. However, this version of Interior-Newton uses
Dk = V?21 which is based on the nonllnear system (1.17). System (1.17) is weaker than
(1.16). In particular, (1.16) includes the "sign condition" for optimality, whereas (1.17)
is derived solely from feasibility and complementary slackness conditions. Therefore,
an algorithm based on (1.16) would be expected to outperform a similar algorithm
based on the weaker system (1.17): non-optimal points, satisfying complementary
slackness and feasibility, are attractors for (1.17) but not for (1.16).
This logic leads, apparently, to the use of Dk --H vk(21 in algorithm Interior-
Newton. Unfo?tunately, the unbridled use of Dk --H vkl-21 does not appear to yield a
convergent process. Nevertheless, computational performance using Dk --H Vk I 21 often
yields a significant improvement over the use of Dk --H V?21. Consequently, we have
developed an algorithm that mixes the use of these two different scaling matrices. We
use Dk --H vkl-21 when progress is good --H in most cases this scaling accelerates progress
toward a neighborhood of the solution. If progress is weak Dk --H V?21 is used. In
this fashion impressive computational performance is achieved whilst maintaining the
26
strong global convergence properties of algoritlim Interior-Newton1. The second-
order convergence rate is maintained. We denote this algorithm, Interior-Newton2.
Specifically, algorithm Interior-Newton2 is defined to be Interior-Newton with
Step 1 and Step 2 being described below.
Let 0 < r1, r2, T3, Th, T5 < 1, and ?, ?, A1, A? > 0 be given. For each k, let
(5.1)			Ak E [A1, Au].
(5.2)
Step 1. First, select a trial Dk:
D0 --H (V0)21, and fork>1
`kk?1(?k?1pk?1)			________________
If			< min(r.?,r4ok?1) or
`kk?1(agk?1p9k?1)
Dk = (vk)?1;
else
(5.3)			9-k			--H			Hxk+c+ATwk?1;
(5.4)			vk			=			Mxk(gk);
(5.5)			Dk			=			IvkI2l;
(5.6)
end.
Then, perform an acceptance test to determine Dk:
If Dk#(vk)?21
jk = 11vk. * 9-k1 / (1+ 1vk. * gk11);
compute wk, zk; e.g., by (2.33);
and compute p9k; e.g., by (2.35);
if ?k(09kp,k) / ?k(p,k) < min (T3, ?
Dk - (vk)21;
end
end
Finally:
If Dk (vk)21
compute wk, z?k;
and compute p?k;
end
e.g., by (2.33);
e.g., by (2.35);
Step 2 First, compute pk; e.g., by (2.13); then compute sk such that
(5.7) `kk(sk) < min(?1?k(akpk), ?2?k(a9kp9k))
In the Appendix, we verify that algorithm Interior-Newton2 has the same strong
convergence properties as algorithm Interior-Newton1. Here we make a few remarks
before discussing our computational experiments.
1. As we show in the Appendix, due to the acceptance test in Step 1 for safe-
guarding, the convergence of ?xkJ is independent of the choice between Dk --H
IV??I?21 and Dk --H (vk)-21. However, a proper choice is important for improved
practical performance: the rule we specify has done well in our numerical
experiments.
2. Note that the computation of 9k is based on xk and wk-1 since wk is not
available yet at this stage. (We could compute gk based on xk and wiks but
an additional QR-factorization would be required.)
3. We show in the Appendix that, ultimately, only one computation is needed
for wk, zk and p9k in each iteration; therefore, ultimately there is no extra
cost for the acceptance test.
4. To satisfy (5.7), we compute sk = akakpk + (1--H ok)%kp9k, where ak is defined
by (3.75).
We have implemented our algorithms in Matlab 4.0 and conducted some prelim-
inary testing to investigate the practical viability of our approach. The experiments
were performed on a Sun (5parc) workstation. In the remainder of this section we
present and discuss our preliminary numerical results.
Problem Generating: The More/Toraldo [15] QP-generator was adapted to gen-
erate the test problems. All the problems reported here are dense, and of moderate
size. For positive definite problems the single global solution, with prescribed charac-
teristics, was generated. For indefinite problems, the local minimizer determined by
the algorithm(s) may have little relationship with point generated (with prescribed
characteristics) by the test problem generator: indefinite problems have many local
minimizers.
Starting and Stopping: For each test problem, a feasible starting point was found
using a feasible point determination algoritlim [13]. This procedure does not involve
the matrix H nor the vector c; therefore, the feasible starting point can be considered
somewhat arbitrary with regard to problem (1.1).
As usual, choosing a robust stopping criteria is not easy. We have used stopping
criteria based on three computations: the relative difference in the objective function
values of two successive iterations, the size of o?, and the size of () which is defined by
(2.23). We terminate the iteration if:
(5.8)
or
(5.9)
or
(5.10)
q(xk) --H q(xk+1) ? tol * (1 + q(xk)?)
and
0Lk > ? 1
0k <tot
k = 100.
28
Criterion (5.9) is reasonable since we have proven that 0 = 0 if and only if the first-
order and second-order necessary optimality conditions are satisfied at x. Criterion
(5.8) is a commonly used one which means no significant progress can be made. We
introduced a? > 0.1 since we did not want to stop the iteration if (5.8) was caused by
a very small step length. In our experiments, tot --H 10--H12 This stopping criteria was
successful for our experiments since for every problem we tested, the first-order and
second-order optimal conditions were confirmed.
Computation and Parameter Setting: Our implementation is straightforward: i.e.,
wkis computed by (2.33), p?k is computed by (2.35), and pkis computed by (2.13),
etc..
We compute 5k = akakpk + (1 --H qk)a,kp9k, where akis given by (3.75) and we set
r = 0.95.
Here are the settings we used in our experiments:
(5.11)
ri = 0.8; r2 = 0.9999; r? = 0.001; r4 = r5 = 0.5.
We adjust Ak as follows:
Updating Ak
Let 0 < r6 < r7 < 1, 0 <6? < 1 <?, and 0 < At < Au be given,
A0 = 1;
Fork = 0,1,2,..
bdk --H okak + (1 --H 0k)?,k;
If bdk < r6
Ak+l --H max (At, ?Ak);
end.
If bdk > r7
Ak+l --H min (Au, ?Ak);
end.
The corresponding parameters are set as follows:
r6 = 0.5; T7 = 0.9;
6i = 0.75; ? = 1.25;
At = 0.1; Au = 10;
Experimental Results: We tested four groups of problems using Algorithm Interior-
Newton2 and the results are tabulated in Tables 1 --H4. For each case, three problems
were attempted. The table entries are the numbers of iterations required to satisfy the
stopping criteria. Positive definite problems are reported in Tables 1 and 2; indefinite
problems are reported in Tables 3 and 4. For each problem in Tables 3 and 4, about
10% of the eigenvalues of H are negative.
The lower bounds were all set to zero. The upper bounds were all set to unity in
Table 1 and 3 while in Table 2 and 4, about 10% were set to be infinity (that is what
29
pctinf = 0.1 means). For all the problems, about 0.8 * (n --H m) components of the
generated x were set to the bounds and cond(H) = cond(A) = cond.
We did not encounter any instance where q(xk) --Hoo; however, for our gen-
erated indefinite problems, with some infinite bounds, there is no guarantee that q is
bounded below in the feasible region.
Very few iterations needed more than a single QI? factorization (see Theorem
8.10). Over all 3806 iterations (for the 216 test problems), only 18 extra QI? fac-
torizations were used. On average, Dk = (vk)21 was used about twice per problem
(remember that D1 is always set to be (V1)21). The average cost for finding a feasible
starting point was 4.30 linear system solves, each of order (m + n).
TABLE 1
Positive Definite Problems, pctinf = 0
size			cond=10			cond=10			cond=10
n			m			max			avg			max			avg			max			avg
100			10			18			17			18			16.3			20			17
100			50			16			13			17			15.3			23			17.3
100			90			12			10.3			13			11.3			18			16.7
size			cond=10			cond=10			cond=10
n			m			max			avg			max			avg			max			avg
200			20			17			16			21			18.7			23			21
200			100			20			18.3			22			19.3			22			19.3
200			180			15			13.3			16			13.3			17			15.3
TABLE 2
Positive Definite Problems, pctinf = 0.1
size			cond=10			cond=10			cond=10
n			m			max			avg			max			avg			max			avg
100			10			18			16.3			17			16			18			16.3
100			50			15			12.7			18			16			22			16
100			90			13			11			14			11.7			16			14
size			cond=10			cond=10			cond=10
n			m			max			avg			max			avg			max			avg
200			20			17			16			22			20			19			18.3
200			100			21			18.3			21			19.7			22			19
200			180			15			13.7			14			13			24			17.3
30
TABLE 3
Indefinite Problems, pctinf = 0
size			cond=10			cond=10			cond=10
n			m			max			avg			max			avg			max			avg
100			10			20			18.3			26			20.3			29			23.3
100			50			20			19.3			23			20			17			16
100			90			13			11.3			13			11.7			18			14.7
size			cond=10			cond=10			cond=10
n			m			max			avg			max			avg			max			avg
200			20			21			20.3			26			24.7			32			28.7
200			100			25			21.7			28			23.7			23			21.3
200			180			17			15.7			14			12.7			17			16
TABLE 4
Indefinite Problems, pctinf = 0.1
size			cond=10			cond=10			cond=10
n			m			max			avg			max			avg			max			avg
100			10			19			17.7			26			20.3			34			24.3
100			50			20			18.3			24			21.3			19			17.3
100			90			13			11.3			14			12.3			17			14.3
size			cond=10			cond=10			cond=10
n			m			max			avg			max			avg			max			avg
200			20			39			26			28			25.3			30			30
200			100			27			22			29			23.3			25			21.7
200			180			16			15.3			19			15.7			25			18
Observations: On the whole, the experiments indicate that algorithm Interior-
Newton2 is efficient. The iteration numbers are insensitive to problem size and
condition number. Though our experiments are limited, they clearly indicate that the
new algorithm is promising.
For purposes of comparison, we have used Algorithm Interior-Newton1 to solve
the same set of problems as in Table 1, and the results are presented in the following
table. In most cases both algorithms terminated at (almost) the same function values
for each problem; however, the required numbers of iterations are quite different. On
average, the number of iterations is 16.1 in Table 1 but is 31.0 in Table 1'. The largest
number is 23 in Table 1 but is 89 in Table 1'
31
TABLE 1'
Positive Definite Problems, pctinf = 0
(altk = 1 for all k)
size			cond=10			cond=10			cond=10
n			m			max			avg			max			avg			max			avg
100			10			20			18			27			20.7			35			27
100			50			60			31.3			69			36.7			28			19
100			90			53			34.3			21			15.7			89			58
size			cond=10			cond=10			cond=10
n			m			max			avg			max			avg			max			avg
200			20			20			19			43			27.3			40			31
200			100			59			50.3			56			35.7			59			35
200			180			53			44			56			28.7			40			25.7
6. Concluding Remarks. We have proposed an interior Newton method
with a new scaling strategy for general quadratic programming problems. The algo-
o+ rithm is robust and has stronger convergence properties than existing interior meth-
ods for the general quadratic programming problem. Specifically, the main theoretical
property of our proposed method can be summarized as follows. Under compactness
and nondegeneracy assumptions, i.e., (AS 1), (AS2), and (AS3), our proposed al-
gorithm generates a sequence fxky converging to a point x* satisfying first-order and
second-order necessary conditions. Moreover, if x* satisfies second-order sufficiency
conditions, then the local rate of convergence is 2-step quadratic. Nevertheless, it
may be possible to weaken some of the nondegeneracy assumptions, especially for the
first-order convergence results. This is a subject of future research.
Preliminary numerical experiments suggest that the method is efficient for dense
problems of moderate size. Inspection of the conditioning of the underlying linear
systems, in the limit, indicates that robust asymptotic behavior is to be expected
from this approach, even in the presence of near-degeneracy and ill-conditioning. This
is in sharp contrast to the limiting llnear system in Ye's approach [24] which always
approaches singularity. To further explore this remark, consider the following.
Suppose il > 0 in Af* Then for all k sufficiently large, the resulting coefficient
matrix of Ye's affine scaling method can be formulated as
(6.1)			(zk)TvkHvkzk,
where
(6.2) < zk >= null(Avk), and the columns of zk are orthonormal.
Alternatively, the underlying system can be formulated with the following coefficient
matrix:
k def vkHvk (Avk)T
=			Avk			0
(6.3)
32
The limit matrix of (6.3),
(6.4)			?1* = lim			= - V*HV* (Av*)T E ?(m+n)x(m+n),
?1k			AV*			0
is singular: it is easy to see that there exists a permutation matrix P1 such that
(6.5)			p1T?1*p1 = oo
0 B1
for some B1 E 1Z(m+n--HIA*J)x(m+n--HIA?I) Therefore, the limit matrix
(6.6)			(2*)Tv*Hv*z*
of (6.1) is singular. When vtk 0, the ith row and ith column of ?h1 go to zero at
the rate of O(vtk) This ill-conditioning of ?1k is unavoidable no matter how well H
and A are conditioned.
In contrast, for all k sufficiently large, the resulting coefficient matrix of our algo-
rithm can be formulated as
(6.7)			(zk)TM?k2k --H (zk)T(DkHDk + IGkI)z?k.
Alternatively, the underlying system can be solved using the coefficient matrix
(6.8)			?k <Lef DkHDk + IGkt (ADk)T
ADk			0
The limit matrix of (6.8),
(6.9) = lim ?2k = D*HD* + IG*I (AD*)T E 1?(m+n)x(m+n),
AD*			0
is well-behaved under (AS3) since there exists a permutation matrix P2 such that
(6.10)			p2T?;p2 = diag(Ig?**I) 0
0			B2
for some B2 E 1?(m+n-IA*I)x(m+n-IA?I) and B2 is nonsingular. The limit matrix
(6.11)			(z?*)T(D*HD* + IG*I)z*
of (6.7) is well-conditioned under (AS3). Any possible ill-conditioning of ?2k can only
come from the ill-conditioning of H, the rank defidency of A, and the degeneracy of
the original problem (1.1). Moreover, even in the the degenerate case, i.e., for some
Vt* + Igt*I = 0, the ith row and ith column of ?2k go to zero at the rate of O((vtk)21),
much slower than o(v1k.).
The interior approach we have explored in this paper is an exciting new way to
solve quadratic programming problems: it is theoretically interesting and practically
viable. As it stands it is not suitable for large-scale quadratic programming in that it
requires full-dimensional trust region computations and several matrix factorizations in
each major iteration. In response to this we are currently investigating a modification
33
of this approach involving iterative and approximate linear solvers. This will be the
topic of a future report.
7. Acknowledgements. We thank our colleague Yuying Li for many helpful
remarks on this work. We also thank Yinyu Ye for reading a preliminary draft of this
report and making a number of useful suggestions.
8. Appendix: The convergence properties of Algorithm Interior-Newton2.
In this appendix, we verify that algorithm Interior-Newton2 has the same conver-
gence properties as algorithm Interior-Newton1. To be specific, and simplify the
presentation, we will assume that wk is computed by (2.33) and p9k is computed by
(2.35). We are content with this assumption because it can be satisfied easily, i.e., a
single least-squares solve in each iteration. On the other hand, it appears the con-
vergence results in this appendix remain valid if this assumption is replaced by more
general conditions, e.g., (CDl), (CD2), etc..
The first result we establish is that ?q(xk)) converges if wk, p9k are computed by
(2.33), (2.35) and 5k satisfies (5.7) for all k sufficiently large. However, Theorem 3.2
may not be true since Dk is not always equal to (v?k)21. Instead, we will show directly
that (3.21) is true for any limit point of ?xky, and that (3.55) holds. These two results
imply the convergence of ?xhy. First, we give a simple general result; the proof is
straightforward and so it is omitted.
LEMMA 8.1. Suppose H > 0 is any symmetric matrix. Then
(8.1)			IIHxII2 < xTHx liHil.
LEMMA 8.2. Suppose wkis computed by (2.33). Then ?11wk111 and are
bounded, where A9k is defined by (2.38).
Proof. The boundedness of ?11wk11y can be seen from (3.5), (3.6) and the way wk
is computed. To consider ?A9kJ, note that by (2.38), we have
A9k?k = ?(yk)TMkyk?k --H (yk)Tg?k, Vk.
So, by the boundedness of ?1jwk11J and (AS1), ?C4 > 0 such that
(8.2)			A9ki1ik < c4.
Hence, by (2.40), A9k is bounded.			I
The following set will be useful subsequently:
(8.3)			K %ef			: Dk --H (vk)?2? at the end of Step 1).
We let )CC denote the complementary set of K. It is clear that k E KC if and only if
Dk --H 1vk?21 at the end of Step 1.
34
THEOREM 8.3. Suppose wk and are computed by (2.33) and (2.35) for all k
sufficiently large. Suppose ?&.) c KC and xki x*. Then (3.21) holds and for all
k5 sufficiently large,
(8.4)			= vtki? ViEA*,
where vk is defined by (5.1)
Proof First, assume that zero is a limit point of ?jkj ?, where jk is defined by
(5.6). Without loss of generality, assume jkj 0. Then
(8.5)			vkj.*9kj ??
By the boundedness of ?1wk11J and (1.14), both fwkj?11 and ?vtk)l have at least one
limit point. Assume wk>?1 w* and vtkJ v*. Then
(8.6) v*. * g* = 0, where 9* def Hx* + C + ATw*.
So by (1.15), v*. *9* = 0 which implies that w* = W?s by (AS2). Hence g* = 91*s and
(8.7)			v*.*91*s=v* *9ts=O
Now assume that fiki ? is bounded away from zero. Then for some e > 0,
(8.8)			jk, > e, Vkj.
Since ? E Kc, by the convergence of fq(xk)y, (3.2), (5.7), and by the acceptance test
in Step 1, we have
(8.9)			q(xki)?q(xki+l) >			?2?ki(?,kJpgki)
>			?2min(r3,?4e)(??ki(p?))			0.
So, by (3.3),
(8.10)
((yki)T]?kJykJ + A?)(?9ki)2 0.
Considering Lemma 8.2, we see that ?(yk?)TMkJykJ + A9kJII is bounded. So (2.38) and
Lemma 8.1 yield that (remember that ADk, z?k, --H 0)
(8.11)			II(zkj)TDkj(HxkJ + c)II = II(zk?)TgkjII
--H			(yk3)Tgk? = ((yki)TMk>yk> + A9kJ)It9kJ			0,
where Dkj --H vhi 121 . Let v* be any limit point of fvkij and assume that vhit v*
Let Z* =limz?hi?. Then
(8.12)			(z*)TD*(Hx* + c) = 0.
Since < Z >= null(Ab*), ?w* E izm such that
D*(Hx* + c) + (AD*)Tw* = 0,
35
(8.13)
f)*g* = t)*(Hx* + c + ATw*) = 0,
which gives us (8.6) again and consequently, (8.7) is true.
To show (8.4), let v* be any limit point of ?vk>1. Repeat the arguments above,
we have
(8.14)			v*. * gl*s =0.
Then, Vi E A*, vt*. = vt*. = 0. Therefore, for all kj sufficiently large, (8.4) holds.
THEOREM 8.4. Suppose wk and are computed by (2.33) and (2.33) for all k
sufficiently large. Suppose x* is any limit point of fxkj and xkj x*. Then (3.21)
holds. Furthermore, if 5k satisfies (CD3) with Dk --H (vk)-21 or Dk = vk1-21 for all k
sufficiently large, then ?xkJ converges to x* which satisfies (3.48).
Proof. If k5 E K, then 5kj is computed based on --H (vkJ )21, and otherwise,
based on Dkj --H Ivk, I?21. So if there are infinitely many kj E Kc, then by Theorem 8.3,
(3.21) holds. Otherwise, by (5.7) and Corollary 3.3, (3.21) holds also.
To see the convergence of ?xky, we first show that (3.55) is true. In fact, since 5k
satisfies (OD3) with Dk --H (vk)-2, or Dk = IvkI21 for all k sufficiently large, (3.55) is
obviously true for those k E K and is also true for those k E kC by (8.4). Note that
the conditions used by Lemma 3.4, Lemma 3.5 and Theorem 3.6 are only (3.21) and
(3.55), we see that ?xk1 converges to x* which satisfies (3.48). I
Note that the convergence of fxkj does not depend on whether Dk --H (vk)-21 or
Dk --H vk1-21 for the trial Dk in Step 1. This flexibility gives us freedom to choose Dk
with performance in mind.
The next results establish the convergence of fwkl, and the convergence of ?xk1
to a first-order point.
COROLLARY 8.5. The sequence ?wkj converges to w?*? under the assumptions of
Theorem 8.4.
Proof. Let w* be any limit point of fwkJ and assume that ?wkiy w*. If there
are infinitely many kj E K, then w* = Wi*s Otherwise, for all kj sufficiently large,
kj E KC and by (2.33),
(8.15)			wkj =			(AIvk,IAT)?lAIvk,I(Hxki + c).
Let v* be any limit point of fvk31 and assume that vkji v*. Then by (8.14),
(3.6) and (8.15), we have
(8.16)			w75 =			(AIv*IAT)?lAIv*I(Hx* + c)
--H			lim wk,? = w*
36
LEMMA 8.6. If 5k satisfies (3.60) for all k E K sufficiently large, the limit point
(X*,Wi*s) satisfies (1.4).
Proof. If there are infinitely many k E Kc, say ??Y c Kc, then gki g1*8 by
Corollary 8.5. Let v?* be any limit point of ?vkj1. By Theorem 8.3, v*. * g?*8 = 0 and
ViE A*, Vt* = V?t* = 0. So v*. * g1*? = 0. That is (1.4).
If k E K for all k sufficiently large, then we are identically using Algorithm
Interior-Newton1, and, by Corollary 3.7, the result holds as well. I
Comparing Lemma 8.6 to Lemma 3.7, we see that Corollary 8.6 is stronger since
(3.60) needs to be satisfied only by those 5k with k E K to get a first-order limit
point. As indicated in Lemma 3.8, one possible strategy to guarantee first-order
convergence is to replace the direction 5k with the scaled gradient p9k whenever (3.60)
fails. Alternatively, satisfaction of (CD4) will yield first-order convergence, and more.
The following result, with proof similar to the proof of Theorem 3.12, establishes
convergence to a second-order point.
THEOREM 8.7. Suppose wki5 computed by (2.33) and ?kpk satisfies (CD4) for
all k sufficiently large. Further, assume ?xk1 converges to a point x* that satisfies
(3.21). Then x* satisfies (1.5).
It is easy to verify that all the remaining results in Sections 3 and 4 for Interior-
Newton1 also hold for Interior-Newton2. To sum them up, we have the following
results:
THEOREM 8.8. Suppose for all k sufficiently large, wk and p,k are computed
by (2.33) and (2.35), 5k satisfies (3.60) and (CD3) with Dk --H (vk)-21 or Dk --H
1vk1-21, and ?kpk satisfies (cD4). Then the sequence ?xk1 generated by Algorithm
Interior-Newton2 converges to a point x* which satisfies the first-order and second-
order necessary optimality conditions. Further more, if H > 0 in Ar*, then x* is a
solution to (1.1).
The next result establishes that the point of convergence is a second-order point;
moreover, the rate of convergence is at least 2-step quadratic. (Recall that (4.47)
implies a 2-step quadratic convergence rate.)
COROLLARY 8.9. Suppose for all k sufficiently large, w? and p2k are computed by
(2.33) and (2.35), 5k = 0k?kpk + (1 --H ?k)a9kp??, where a? is defined by (3.75) and a?,
pk and a9k are defined by (2.27), (2.13) and (2.41). Then the sequence ?xkJ generated
by Algorithm Interior-Newton2 converges to a point x* which satisfies the first-order
and second-order necessary optimality conditions. Further more, if H > 0 in Ar*, then
?xk) satisfies (4.47) and x* is a solution to (1.1).
Our final result in this appendix is to show that, ultimately, only one computation
is needed for wk, zk, and p9k in each iteration. Clearly, this is true if we can establish:
(8.17) The trial Dk is unchanged in Step 1 for all k sufficiently large.
37
THEOREM 8.10. Suppose wk and p9k are computed by (2.33) and (2.35). Suppose
xk ?? andil>OinAt*. Then (8.17) holds.
Proof First, we show the following:
(8.18)			?co > 0 such that a2k > 00, Vk.
In fact, by (8.4), whether Dk --H (vk)21 or Dk --H jvk121 we always have that
Dtki = (v?)21, Vi E A*, Vk sufficiently large.
If (8.18) is false, then repeat the arguments between (3.19) and (3.20), note that Il91k:11
is bounded, we would have
(8.19)
?DkJg1k?J ___
k
But on the other hand, since AD2gj? = 0, ?? E ?n-m such that Dyis = Z?. Hence
(8.20) yTMy = ;?g$jj29TsD2MD2?t5 = ?TzTM?z??
II?II2
where y is defined by (2.34). By Lemma 3.9, (2*)TM*z?* > 0. So ?e > 0 such that
(8.21)			(yk)TMkyk > e
Then, using (3.17), we have
j??kjgjk8i
I'tgkjI --H
which is a contradiction to (8.19). Hence, (8.18) is true.
Now, to show (8.17), it suffices to show that ?bk(a2kp9k)/bk(p9k)? is bounded away
from zero since jk 0. This is true because it is easy to see that (p9k)Tgk < 0, so
02k(p9k)Tyk < (ogk)2(pgk)Tyk since 09k < 1, and then, by (8.18),
(8.22)
for all k sufficiently large.
?bk(c9kp,k) > (agk)2?k(p9k)
?k(p9k)			?k(p9k)			> 0o2
38
REFERENCES
[1] E. M. L. Beale, On quadratic programming, Naval Research Logistics Quarterly, 6 (1959), pp.
227-243.
[2] P.H. Calamal and J. J. More', Projected gradient methods for linearly constrained problems,
Mathematical Programming, 39 (1987), pp. 93-116.
[3] T. F. Coleman and L. A. Hulbert, A direct active set algorithm for large sparse quadratic pro-
grams with simple bounds, Mathematical Programming, 45 (1989), pp. 373-406.
[4] T. F. Coleman and Y. Li, On the convergence of reflective Newton methods for large-scale nonlin-
ear minimization subject to bounds, Tech. Rep. TR 92-1314, Computer Science Department,
Cornell University, 1992.
[5] T. F. Coleman and Y. Li, An interior trust region approach for nonlinear minimization subject
to bounds, Tech. Rep. TR 93-1342, Computer Science Department, Cornell University, 1993
[6] R. W. Cottle and G. B. Dantzig, Complementary pivot theory of mathematical programming,
Linear Algebra AppI., 1(1968), pp. 103-125.
[7] J. E. Dennis, Jr. and R. B. Schnabel, Numerical methods for unconstrained optimization and
nonlinear equations, (Prentice-Hall, Englewood Cliffs, NJ, 1983).
[8] I. I. Dildn. Iterative solution of problems of linear and quadratic programming. Dokiady
Al:ademii Naul: SSSR, 174:747--H748, 1967. Translated in : Soviet Mathematics Dokiady,
8:674--H675, 1967.
[9] R. Fletcher, A general quadratic programming algorithm, Journal of the Institute of Mathematics
and Its Applications, 7 (1971), pp. 76-91.
[10] P. E. Gill and W. Murray, Newton type methods for unconstrained and linearly constrained
optimization, Mathematical Programming, 7 (1974), pp. 311-350.
[11] D. Goldfarb, Extensions of Newton's method and simplex methods for solving quadraticprograms,
in Numerical Methods for nonlinear Optimization, F. A. Lootsman, eds., Academic Press,
London, 1972.
[12] N. Karmarkar, A new polynomial time algorithm for linear programming, Combinatorica, 4
(1984) pp. 373-395.
[13] J. Liu, Interior and Exterior Newton Methods for Quadratic Programming, Ph.D. Thesis, Center
for Applied Mathematics, Cornell University, Ithaca, NY, 1994.
[14] J. J. More' and D. Sorensen, Computing a trust region step, SlAM Journal on Scientific and
Statistical Computing, 4 (1983), pp. 553-572.
[15] J. J. More' and G. Toraldo, Algorithms for bound constrained quadratic programming problems,
Numerische Mathematik, 55 (1989), pp. 377-400.
[16] K. G. Murty and 5. N. Kabadi, Some NP-complete problems in quadratic and nonlinear pro-
gramming, Mathematical Programming, 19:200-212, 1987.
[17] P. M. Pardalos, Y. Ye, and C. G. Han. An interior point algorithm for large-scale quadratic
problems with box constraints. In A. Bensoussan and J. L. Lions, editors, Analysis and
Optimization of Systems : Proceedings of the 9th International Conference, Antibes, France,
1990, volume 144 of Lecture Notes in in Control and Information Science, pages 413--H422.
Springer Verlag, Berlin, West-Germany, 1990.
[18] M. 3. D. Powell, A new algorithm for unconstrained optimization, in Nonlinear Programming,
J. B. Rosen, 0. L. Mangasarian and K. Ritter, eds., Academic Press, New York, 1970, pp.
31-65.
[19] G. A. Schultz, R. B. Schnabel, and R. H. Byrd, A family of trust-region-based algorithms for
unconstrained minimization with strong global convergence properties, SlAM Journal on Nu-
merical Analysis, 22 (1985), pp. 47-67.
[20] D. Sorensen, Trust region methods for unconstrained optimization, SlAM Journal on Numerical
Analysis, 19 (1982), pp. 409-426.
[21] S.A. Vavasis, Nonlinear Optimization: Complexity Issues, (Oxford University Press, 1991)
[22] 5. J. Wright, An interior-point algorithm for linearly constrained optimization, SlAM Journal
on Optimization, 2 (1992), pp. 450-473.
[23] Y. Ye, An extension of Karmarkar's algorithm and the trust region method for quadratic pro-
gramming, in: N. Megiddo, ed., Progress in Mathematical Programming (Springer, New
York, 1989).
[24] Y. Ye, On affine scaling algorithms for nonconvex quadratic programming, Mathematical Pr?
gramming, 56 (1992) 285-300.
39
