BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR93-1342
ENTRY:: 1993-10-14
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: An Interior Trust Region Approach for Nonlinear Minimization 
        Subject to Bounds
AUTHOR:: Coleman, Thomas F. 
AUTHOR:: Li, Yuying
DATE:: May 1993
PAGES:: 28
ABSTRACT:: 
We propose a new trust region approach for minimizing a nonlinear function 
subject to simple bounds. By choosing an appropriate quadratic model and 
scaling matrix at each iteration, we show that it is not necessary to solve a 
quadratic programming subproblem, with linear inequalities, to obtain an 
improved step using the trust region idea. Instead, a solution to a trust 
region subproblem is defined by minimizing a quadratic function subject only 
to an ellipsoidal constraint. The iterates generated by these methods are 
always strictly feasible. Our proposed methods reduce to a standard trust 
region approach for the unconstrained problem when there are no upper or lower 
bounds on the variables. Global and quadratic convergence of the methods is 
established; preliminary numerical experiments are reported.
END:: CORNELLCS//TR93-1342
BODY::
An Interior Trust Region Approach for
Nonlinear Minimization Subject to Bounds
Thomas F. Coleman
Yuying Li*
TR 93-1342
May1993
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
*Research partially supported by the Applied Mathematical Sciences Research
Program (KC-04-02) of the Office of Energy Research of the U.S. Department of
Energy under grant DE-FG02-86ER25013.A000, and in part by NSF, AFOSR, and
ONR through grant DMS-8920550, and by the Advanced Computing Research
Institute, a unit of the Cornell Theory Center, which receives major funding from the
National Science Foundation and IBM Corporation, with additional support from New
York State and members of its Corporate Research Institute.
An Interior ?ust Region Approach for
Nonlinear Minirnization Subject to Bounds
Thomas F. Coleman and Yuying Li1
Computer Science Department
Cornell University
Ithaca, New York 14853
April 28, 1993
Abstract. We propose a new trust region approach for minimizing a non1inear function subject to simple
bounds. By choosing an appropriate quadratic model and scaling matrix at each iteration, we show that
it is not necessary to solve a quadratic programming subproblem, with linear inequalities, to obtain an
improved step using the trust region idea. Instead, a solution to a trust region subproblem is defined by
minimizing a quadratic function subject only to an ellipsoidal constraint. The iterates generated by these
methods are always strictly feasible. Our proposed methods reduce to a standard trust region approach
for the unconstrained problem when there are no upper or lower bounds on the variables. Global and
quadratic convergence ofthe methods is established; preliminary numerical experiments are repofled.
1 Research partially supported by the Applied Mathemalical Sciences Research Program (KC-O4-02) of the Office of Energy
Research of the U.S. Departtnent of Energy under grant DE?CO2-86ER25013AOOO. and in part by NSF, AFOSR. and ONR
through grant DMS-892o55o, and by the Advanced Computing Research Institute, a unit of the Cornell Theory Center which
receives major funding from the National Science Foundation and IBM Corporation, with additional support from New York
State and members of its Corporate Research Institute.
1. Introduction. In this paper, we consider the problem of locating a local minimizer of a smooth
nonlinear function subject to bounds on the variables:
(1.1)			:??1? f(x) 1< x <t,
where 1 E ?? U (--HooJJ?, t' E ?? U ?ooiJ?, 1 < i, and f : ??. We denote the feasible set
def			def
= ?x: 1< x <t? andthestrictinteriorint(F) = ?x: 1< x < tJ.
We propose a new stricfly feasible trust region approach for problem (1.1). Global convergence to a
second-order point is achieved, under reasonable assumptions, and a local quadratic convergence rate is
also obtained.
Minimization problems with upper and/or lower bounds on some of the variables form an important
and common class of problems. There are many algorithms for this type of optimization problem, some
of which are restricted to quadratic (in some cases convex quadratic) objective functions and some are
more general (e.g., [1,4,5, 7, 8,9, 10, 12, 14, 15, 16,20,21,22,26]). Almost all ofthe existing methods
for problem (1.1) are active set methods.
Trust region methods form a respected class of algorithms for solving unconstrained minimization
problems. Their high regard is partially due to their strong convergence properties, partially due to their
naturalness, and partially due to the recent development of reliable, efficient software.
The ideabehind atrust regionmethod torminx??n f(x) is very simple. The increment sk = Xk+1--H Xk
is a solution to a quadratic subproblem with a bound on the step:
(1.2)
min??(s)d=ef Ts + 1			: I?DksII2 < Aki.
jE&?			9k			?2sTBks
Here Bk is a symmetric approximation to the Hessian matrix ?f(xk), r)k is a scaling matrix and Ak is a
positive scalar representing the trust region size.
A general scheme for unconstrained minimization of f(x) is described in Figure 1. An iteration with
P?k > ? is said to be succes?uL Otherwise, the iteration is unsucces?uL The aim of trust region size
updatingistolorcep1? > jand hence ensure sufficient reduction of the objective function.
Computing a solution to the trust region problem (1.2) in a reliable and efficient way is a nontrivial
task. There are several papers on this topic, e.g., [2, 3, 7, 8, 13, 19, 23, 24, 25].
Trust region methods have also been developed for the solution of linearly constrained optimization
problems (e.g., [9] and [11]). A quadratic trust region subproblem with linear inequalities is usually
approximately solved to obtain an improved point [9]. An iterative procedure must to be used to solve the
subproblem. For example, Fletcher [11] proposed an algorithm for the linearly constrained optimization
subproblem
min?f(x) : ETx < dJ
in which the subproblem is of the form
1T
(1.3) min fvf(:k)Ts + --Hs Bks : ET(xk + s) < d, IIDksII < AkY.
sE&?			2
2
Algorithm 0: Unconstrained mist Region Method
Fork =0,1,..
1. Compute J(xk) and the model `kk.
2. Define an ayProximate solution Sk to subproblem (1.2).
3. Compute Pk = (f(xk + sk) --H f(xk))/?k(sk).
4. If P1k > ? then set Xk+1 = Xk + Sk. Otherwise set Xk+i = Xk.
5. Update the model `[`k, the scaling matrix ?k and Ak.
Updating mist Region Size
Leto <? < ?` <1 and? <1 <?begiven
I. 1?P?k <!thensetA?+1 E (0,?Ak].
2. If P1k E (i,?)thensetA?+1 E [?Ak,Akj.
3. If P?k > ?thensetA?+1 E [Ak,?Aki.
flo. 1. TruslRegwnMethodfor UnconstrainedMini,nizatton
As pointed out in [18], convergence theory for trust region methods based on quadratic programming
subproblems --H such as (1.3) --H usually require that the computed trial step be a global solution to the
subproblem. However, the subproblem is usually solved by methods which guarantee local optimality
at best. Therefore, there is a mismatch between theory and practise for trust region methods based on
quadratic programming subproblems (with linear inequalities).
In this paper, we propose trust region methods for (1.1) without the need to solve a general quadratic
programming subproblem at each iteration. These methods are related to the line search based reflection
methods proposed by Coleman and Li [7] --H scaling strategies and requlrement of strict feasibility are
common to both approaches. The primary difference is that in [7] a line search based method is proposed,
along with a "reflection strategy to guarantee sufficient descent, whereas here we propose a pure trust
region method.
The proposed methods are developed by forming a quadratic model with an appropriate quadratic
ftinction and scaling matrix such that there is no need to handle the constraints explicitly. In particular, it is
possible to obtain an approximate trust region solution which can guarantee second-order convergence by
simply solving an unconstrained trust region subproblem and then satisfying the feasibility requirement
through further restricting the step if necessary. Our proposed methods become standard trust region
algorithins for unconstrained minimization when 1 = --Hoo and t' = +oo. Moreover, all the convergence
proofs essentially reduce to established proofs for the unconstrained trust region approach when 1 = --Hoo
and t = +oo.
We motivate the method in 2 and establish the convergence in 3. In 4, preliminary numerical
results for small dense problems are presented. The computational investigation of the method for large
problems will be presented in a subsequent paper.
2. mist Region Method for Bound-Constrained Pi?bIem? In this section, we propose a trust re-
gion method for bound-constrained problems by choosing an appropriate scaling matrix flh and quadratic
3
model ?k(x). We motivate our choice by examining the optimality conditions for (1.1).
First, we define a vector flinction v(x): ?fl ?? as follows.
DEFINmON 1. The vector v(x) E ?? is defined:
(i). IfVf(x)i <Oandu? <oo thenv? def Xj --H `Li.
(ii). ffVJ(x)?>?Oandl? > --Hoothenv? d=d xj?4.
(iii). ffvAx)i<oandui=oothenvi `? --H1.
(iv). ffvAx)i >Oand 4=--Hco then Vj def
Following Matlab notation, for any s E ?? diag(s) denotes an n-by-n diagonal matrix with the
vector s defining the diagonal entries in their natural order. In addition, we define
(2.1)			D(x) def diag(iv(x)i??)
i.e., D-2 is a diagonal matrix with the jth diagonal component equal to Ivi(x)I.
Optimality conditions for problem (1.1) are well-established. Assuming feasibility, first-order nec-
essary conditions for x* to be a local minimizer are:
(2.2)			first order:
Vf(x*)i = 0 ifli < (x*)j < flj?
Vf(x*)i < 0 if(x*)j = nj,
Vf(x*)i >0 if(x*)j = ii.
Equivalently, D*-2g* = 0. Second-order conditions involve the Hessian matrix of f, H = H(x) def
V2f(x). Let Free* denote the set of indices corresponding to "free" variables at point x*:
Free* = fi: ii < (x*)j < ni).
Second-order necessary conditions can be written2: if a feasible point x* is a local minimizer of (1.1)
then D*?2g_ --H 0 and H*Free* > 0 where H*Free* is the submatrix of H* = H(x*) corresr)ondin? to tne
index set Free*
These conditions are necessary but not sufficient. Sufficiency conditions that are achievable in
practise often require a nondegeneracy assumption. This is the case here.
DEFINI?oN 2. A point x E ? is nondegenerate if. for each index i:
(2.3)			Vf(x)i = 0 ? 4 < Xj < Uj.
A problem (1.1) is nondegenerate tf(23) holdsfor every x E ?
2 Notation: if a matrix A is a symmetric matrix then we write A > 0 to mean A is positive defirnte A > 0 means A is
positive semidefinite.
4
With this definition we can state second-order sufficiency conditions:
point x* satisfies (2.2) and ji*Free* > 0, then x* is a local minimizer of(l.l).
Similar to [6], we consider the following diagonal system ofnonlinear equations:
(2.4)			D(x)?2Vf(x) = 0.
Fe??ih1
1 a nonciegenerare ????e
It is easy to see that system (2A) is an equivalent statement ofthe first order necessary conditions. System
(2A) is continuous but not everywhere differentiable. Nondifferentiability occurs when Vj = 0; we avoid
such points by restricting Xk E int(?). Strictly speaking Vj may nor De differentiable at a point wh??
Vfj = 0; however, D?2(x)Vf(x) is continuous at such points. Moreover, Coleman and Li [6] show that
it is easy to define a Jacobian matrix at such points to allow for a second-order Newton process.
Assume that Xk E int(Y). A Newton step for (2A) satisfies
(2.3)			(Dk?2Hk + diag(Vf(x?))J??)d? = Dk?2Vf(xk),
where Jv(x) E ?nxn plays the role of the "Jacobian" matrix of Iv(x)I. If all the components oft and tt
are finite, we define J? = diag(sgn(Vf)). If variable Xj has a finite lower bound and an infinite upper
bound (or vice-versa) and Vfj = 0, we define JtVi = 0 at such a point
def			def
In our presentation, B(x) is an approximation to H(x) = ?f(x) and g(x) = Vf(x). Based on the
Newton step for system (2A), we define our quadratic model in the same way as in [6]:
(2.6)			??(s)??sTg?+?2lsTMks
where
def
C(x) = D(x)diag(g(x))Jv(x)D(x),
def
M(x) = B(x) + C(x).
Note that C(x) is a positive semidefinite diagonal matrix.
Define
9k d=ef Dk19k = diag(?v??21)g?,
def			--H1--H1			o+?o+?
Mk = kkk = iagv???1agvkt +diag(9?)j?V,
`kk(w)d=efgkTw+IwTMkw.
2
The following lemma can be easily proved.
LEMMA 1. Assume that x* E J and B(x*) =
(a) 9* = 0 ifand only if(2.2) is satisfied.
(b) M* ispositive definite and 9* = 0 ifand only ifthe second order sufficiency conditions
are satisfied at x*.
5
(c) M* is positive semi-definite and 9* = 0 ?f and only ?f the second order necessary
conditions are satisfied.
to
(2.7)
The results of Lemma limply that x* is a local minimizerof(l.l) if and only if w = 0 is a solution
n??n???(w): 11w112 <Ak?
where xk = x*. Hence solving the subproblem (2.7) is a reasonable step to attempt when Xk is not a
local minimizer. Let s = Dk?' w. Subproblem (2.7) is equivalent to the following problem in the original
variable space:
(2.8)			min??k(s) : IIDksII2 < AkJ.
Moreover, in the neighborhood of a local minimizer, the Newton step to (2A) is a solution to the trust
region subproblem (2.8) if the trust region size Ak is sufficiently large.
The purpose of the scaling matrix Dk in (2.8) is distinctively different from that in an unconstrained
trust region method. The scaling matrix Dk measures the distance to the boundary of the feasible region.
Its purpose is to prevent a step directly towards a boundary point. In contrast, a scaling matrix used in the
unconstrained trust region method is usually employed for numerical reasons: the scaling matrix helps to
improve the conditioning of the problem.
If a bound?nstrained problem (1.1) is badly scaled, the subproblem (2.8) can be replaced by
min???(s) : IIDkflksII2 < AkJ,
where Dk is chosen to improve the scaling and is a diagonal matrix with the propefly that ?Dk?'Y is
bounded and ?Dk) is unifonnly bounded. To emphasize the role of the scaling matrix Dk, we assume
that Dk = 1 in our presentation.
Next we illustratethat it is possibleto develop trust region methods forthe bound-constrained problem
(1.1) based on (2.8). First we introduce a few notations and assumptions.
In our presentation, Pk denotes a solution to (2.8). Assume that dk E ??. The scalar 0Lk denotes the
stepsize along dk to the boundary:
(2.9)			= min?maxf1?--HXki flj--H Xki J :1 <i <
dkj `			dki
If problem (1.1) is unconstrained, i.e., I = --Hoo and u = oo, we define ak = +oo. We use ?k*[dki to
denote the minimum value of `kk(s) along the direction dk within the feasible trust region, i.e.,
(2.10) `k*[dk] det ?k(sk*) = min???(s) : s = rdk, IIDksII < Ak, Xk + s E ?).
Since we always require Xk E int(?), a possible step-back may be necessary to stay strictly feasible.
We use a?*[d?] to denote the step obtained from dk with a possible step-back. The exact definition
6
of a?*[d?] is given below. Let r?* denote the minimizer along dk within the feasible trust region, i.e.,
Tk* = ar?pnin???(rd?) : IIrDicdkII < Ak,xk + Tdk E FJ, 0k E [0i,lJ for some 0 < Ot < 1 and
--H 1 = O(IIdkII). `Then
(2.11)			def O?r*d? def			Th*dk			ifrk + Tk*dk E int(?),
Okrk*dk otherwise.
The above definition implies that 0k = 1 if Xk + rk*dk E int(?).
A few assumptions:
(AS.l)
(AS.2)
(AS.3)
Given an initial point xo E ?, it is assumed that  is compact, where fl is the level set,
i.e.,  = fx: x E Y and f(x) < f(x0)J.
There exists apositive scalar XB such that IIBkII < XB for all k.
There exists a positive scalar x9 such that for x E , II?II? <Xg
Assumption (AS.2) is also required in the convergence analysis of trust region methods for uncon-
strained problems. Assumption (AS.l) is needed for the boundedness of the scaling matrix fDk?1 1
Condition (AS.3) is a weak assumption and is satisfied if the grnlient Vf(x) is continuous on fl. As-
sumptions (AS.l) and (AS.2) imply that there exist positive scalars XD, XM such that
IIDk?'II < XD, IIMkII < XM
Note that ?MkJ is unbounded in general.
Next we will present two trust region algorithms for the bound?constrained problem (1.1). The first,
called the double-trust region method, is theoretically interesting. It illustrates the essential ideas. The
second method represents a more efficient approach.
2.1. The Double-Thust Region Method. Our objective is to develop a trust region method for (1.1)
based on the trust region subproblem (2.8): a solution Pk to the trust region subproblem (2.8) is obtained
and then truncated, i.e., Sk = ak* [p?], to ensure strict feasibility.
The essential idea of trust region methods is to use the trust region size to ensure sufficient decrease
of the objective flinction. Consider the trust region approach to unconstrained problems: the trust region
size is updated to ensure that the reduction of the nonlinear flinction f(x) is at least a fraction of the
reduction of the quadratic model within the trust region. Specifically, the updating ofthe trust region size
forces the condition
f _ f(xk+sk)--Hf(xk)
Pk --H			?k(sk)
for some constant! > 0. (We use the superscript f to emphasize the dependence on f(x).) To obtain first
order convergence of unconstrained trust region methods, a sufficient reduction of the quadratic model
`[;k within the trust region is guaranteed if
`kk(8k) < Pmin???(s): = rD?Tg?, IIDksII < Ak), IIDkskII < PoAk
7
(2.12)
for some constants P, ? > 0. In our notation, (2.12) is the same as
(2.13)			?k(sk) < P?k*[?Dk?Tgk], IIDkskII < ?k
Since our quadratic model `t)k(s) is defined to include the constraint infonnation, a natunil extension
of the definition of pk?is given by
J def f(xk+sk)--Hf(xk)+?skTC(xk)sk
Pk =			`kk(sk)
Similar to unconstrained trust region methods, Pk? measures the agreement between the nonlinear flinction
f(x) and its quadratic approximation. As we will prove from Lemma 2 in 3, for our trust region
approaches, if sk satisfies
(2.14)			v,k(sk) <P??*[--HD?2g?],
the trust region model is sufficiently reduced for first order convergence.
Unfortunately a truncated step along an exact solution Pk of (2.8) may not sufficiently reduce the
quadratic model `kk, due to the effect oftruncation, i.e., (2.14) is not guaranteed when 8k = Qk*tPk] and Pk
solves (2.8). However, a step along the scaled steepest descent direction --HD??2g? does produce enough
reduction of `kk(s). Since, for any trust region subproblem with nonzero gradient, the trust region solution
approaches the gradient direction if the trust region size is reduced to zero, the trust region size can be
used to ultimately guarantee sufficient deerease. In particular, the trust region size updating can be used
to force the condition
pCk =			`t)k(sk)
4,k*[--HDh?2gk]
> ?.
t.e., ?k(sk) <P'k?*[--HD?2g?].
Hence, we can adjustthe trust region size so that both the quadratic model flinction ?k(s) and the nonlinear
ftmction f(x) are sufficiently reduced. This gives us the trust region algorithm described in Figure 2. We
call it the double4rust region method because the trust region size is adjusted for both nonlinearity and
feasibility. For this method, an iteration is succes?ul if both Pk1 > ? and pCk > T). Otherwise, an iteration
is unsuccessful.
In 3, we will prove that the double-trust region method has reasonable convergence propenies under
the nondegeneracy assumption.
2.2. A Practical Trust Region Method. In the last section, we proposed a double-trust region
method for bound-constrained problems (1.1) through solving an ellipsoidal trust region subproblem
(2.8). In this algorithm, sufficient decrease of the quadratic model is achieved through monitoring the
ratio pkC and adjusting the trust region size accordingly. Since pkC is determined by `kk(sk), an exact
solution to the subproblem (2.8) is assumed in order for pCk to be reliable. However, for large problems,
the assumption that 8k be in the direction ofthe exact trust region problem (2.8) is impractical. Moreover,
the convergence of the method is established under the assumption that problem (1.1) is nondegenerate.
In this section, we suggest a more practical model algorithm.
8
Algorithm 1: The Double-Thust Region Method
x0 E int(?)
Fork =Q1,??
1. Compute f(xk), gk, Hk, and C?; Define the quadratic model `t;k(s) =
gkTs + zlsT(Hk + Ck)s.
2. Compute Pk? a solution to (2.8).
3. Compute
8k =
pCk --H			?k(sk)
`kk*[--HDk?2gk]'
j _ f(xk+sk)--Hf(xk)+zlsTkC(xk)sk
Pk --H			`kk(sk)
4. IfP?k > ?andpkc > Pthensetx?+1 = Xk+Sk. Otherwise set Xh+1 = Xh.
5. Update the model ?k? the scaling matrix Dk and Ak as specified.
Updating Ak for the Double-Thust Region Method
Leto < i,P < fl <1 and? <1 <?begiven
1. ifp'k <ithensetAk+1 E (0,?Ak].
2. if Pk? E (it, ii) then set Ak+i E [?Ak,Ak].
3. ??P?k > ?7then
ifpck > ThsetAk E [Ak,?Ak],
if? < pkC < ?, set Ak E [?Ak,Ak],
ifpkc < ?, setA? E (0,7iAk].
FIG. 2. Doubte?Thi&st Region Methodfor Minimization Subject to Bound?
It is clear that sufficient reduction of the quadratic model within the feasible trust region is not
difficult to achieve: for example, moving along the scaled gradient D??2g? guarrntees this. If we assume
the availability ofa step which sufficiently decreases the quadratic fiwction within the feasible trust region,
the trust region size is only needed to force the condition P1k > it
In Figure 3, we describe a trust region method for bound-constrained problems in which the trust
region size is primarily updated according to P?k However, we have allowed more freedom than usual in
the adjustment of Ak to permit further reduction in Ak thus encouraging the use of the trust region step
(2.8).
In order to satisfy the first order necessary conditions, given two positive constants P and Po it is
required that the approximate trust region solution 8k satisfy
(ASA) `t)k(sk) <P'k?*[--HD??2g?j
IIDkskII < PoAk, Xk + sk E int(J).
In other words, the condition pkc > P is assumed. More explicitly, we require that ?k(sk) to be less
9
Algorithm 2: A More Practical Model
x0 E int(?)
Fork =%1,..
1. Compute f(zk), 9k' Hk, and C?; Define the quadratic model `t)k(8) =
gkTs + 21J(Hk + Ck)8.
2. Compute Sh, based on (2.8), such that Xk + Sk E int(?).
3. Compute
P?k = f(xk+sk)--Hf(xk)+2IskTC(rk)sk
?k(sk)
4. If P?k > ?then set Xk+i = Xk + 8k Otherwise set rk+i = Xk.
3. Update the model `t;k, the scaling matrix Dk and Ak as specified.
Updating Trust Region Size Ak
Leto <? <71< 1,? <1 <?ando < Atbegiven
1. If P?k <ithensetA?+1 E (0,?Ak].
2. If P?k E (i, 71)then set Ak+i E [?Ak,Ak].
3. Ifpf > 71then
ifAk > At then
set Ak+i E either [?Ak,Ak] or [Ak, ?2Ak],
otherwise,
set Ak+i E [Ak, ?Ak].
FIG. 3. TrustRegion Methodfor Minimzotion Subject to Bounit
than a fraction ofthe minimum of `t;k(s) along the scaled gradient --HDk29k withinthe feasible trust region.
We point out that condition (ASA) is satisfied for every successfiil step of Algorittirn 1. For Algorithin 2,
an iteration is successfulif the condition P1k > 71holds. Otherwise, an iteration is unsuccessfuL
Condition (ASA) can be easily satisfied for 0 < P < 1. Let Wk be the solution to min[?(s) D2s --H
z'9k, IIDksII < Ak, Xk + s E FJ. Then sk = wk satisfies (AS.4) except for the possible violation
of xk + sk E int(?). Assume that Xk + wk ? int(?). Since ?k(8) is continuous, a small step?ack
Sk = Ow?where0 <0< lcanensureboththeconditionx?+s? E int(?)and?(s?) <P?[--HDk?29k].
Assumption (ASA) will not necessarily guarantee a solution at which the second order necessary
condition is satisfied. To achieve this we make the following stronger assumptions on the quadratic model
and the approximate solution:
(AS.5)
(AS.6)
= vf(xk)Ts+ ?2IsT(?f(xk)+C(xk))s.
Assume that Pk is a solution to ???sE&n fV'?(s): IIDksII < Ak) and p? and p0qaretwo
positive constants. Then Sk satisfies ?k(sk) < ????*[p?j, IIDkskII < p0???, Xk + Sk E
int(?).
Since both conditions (ASA) and (AS.6) can be satisfied by simply solving a quadratic trust region
to
subproblem min, f'k?(s);IIDksII < Ak1? it is not necessary to solve a quadratic programming subproblem
to achieve convergence. For example, one can first compute a solution Pk to the following unconstrained
trust region problem
min???(s) : IIDksII < Aki,
and then choose sk so that ?k(sk) is the minimum of the values `kk*[PkJ and `k?*[--HDk?2g?]. However,
requirements (ASA) and (AS.6) are not restrictive. There are many ways of computing such approxima-
tions. As another example, one can consider the reflection techniques used in [6]. It is also possible to
have a subspace adaptation of this trust region approach in which low?imension subspace trust region
problems are solved. We leave investigation of these computational issues to a subsequent paper.
Before we study the convergence properties of the two trust region methods proposed, we make the
following important observation. If we assume that! = --Hoo and t = +oo then C(x) = 0 and D(x) = I
and the quadratic model is the same as that for unconstrained problems. Moreover, the conditions (ASA)
and (AS.6) are the same as the conditions required for unconstrained trust region methods (e.g., [18])
since the feasibility constraints are always satisfied.
3. Convergence Pi?)pertieL The convergence proofs forthe double-trust region algorithm in Figure
2, and the practical algorithm in Figure 3, follow the same main steps. Lemmas 3 and 4 make it possible to
present the proofs for both algorithins simultaneously in a clean fashion. The major results are Theorems
5,6,11, and 12.
The main difference between the double4rustregion method and thepractical method is that condition
(ASA) is satisfied through the ratio phC for the first while assumed by the latter. However, a common
propefly of the two methods is that, for any successful step, (ASA) is satisfied. Moreover, condition
(AS.6) is always satisfied for the double-trust region method.
The convergence results of the two methods are similar: However, the assumptions required by the
double-trust region method are stronger: For Algorithm 1 we assume mat problem (1.1) is nonclegenerare.
This nondegeneracy assumption is not needed for Algorithm 2.
Notational note: In all expressions to follow, the norm symbol without subscript, ??, refers to the
2-norm.
The following result is required to express (ASA) in a manageable form. It is similar to Lemma (4.8)
in [18].
LEMMA 2. Assume that (AS.1)-(AS.3) are sadsfied. If sk sadsfies (AS.4) then
--H?k(sk) > --H21Plly?llmintA?, IIg?II			119k
llMkll' ll9klloo?
Proof Define? : ?? ?bysetting4 = D?'(jjtji)and
`? `1)kdk].
11
Let r? betheminimizerof?on [O,min?A?,a?J] where ?k isthe firststepsize, along dk, to the boundary:
= min?max? dkXik?? t'j--HXkij :1 < i < nJ.
dki
Since ?k > 0 and the components of 4 have the same sign as that of --Hg?, the absolute value of the
numerator equals Vj , i.e.,
Hence
By definition of ?(r)
= IVkiI _ IvkjIIIg%II
14i1			Vkjllgkjl
IIg%II
IIg?IIoo
= --Hr?IgkII + ?lr2!k?
If Tk* E (O,min?A?,ot?iJthenr?* = IIYkII/!k and thus
1 IIg?II2			1 119k112
--H? !k < --H? IIMkII
def 9kTj?kg?
Ik = IIgk112
If r? = Ak then jkAk < IIg?II and hence
= ?Ak) < --H--H21A?IIg%II.
If Tk* = ?k, ?k?k < IIg?II and hence
= ?(Qk) < ?Ia?IIg?II < 1 Ilyk112
--H? II9kIIoo
Since `t;k(sk) < P?(r*), the result follows from the above estimates.
Assume that Sk is a suceessfiil step from either Algorithm 1 or Algorithm 2. From Lemma 2,
f(xk)--Hf(xk+1) --H
1T
2SkCkSk > --H??(8k)
--H21P?IIg?IIminfA?, \\?I?)II1, 119k
IlYkIlco
Hence, under assumptions (AS.2), (AS.3) and (ASA),
f(xk) --H f(xk+1) > 2lsTkcksk + --H21P?llg?llmintA?, llg%ll 119kllj
XM			X9
Notice that sfCksk > 0. The reduction in f is guaranteed to be better than a multiple of the reduction
achieved in the (negative) scaled gradient direction, i.e.,
f(xk) --H f(xk+1) > --H21P?llg%ll minfA?, 119klI, IlykIl
XM Xg
12
(3.1)
This inequality is important for the convergence proof.
Next, in Theorems 5 and 6, we prove that the first order necessary conditions are satisfied at every
limit point of ?xk1. Several technical results are rtqw.red first
Recall that Pk is a solution to the trust region subproblem (2.8). Using Theorem (3.11) in [18], there
exists a parameter Ak and upper triangular matrix Rk E ???? such that
(A?k + Aki) = RTkRk, (A?k + AkI)Dhph = --Hg?, Ak > 0,
with Ak(Ak --H IIDkpkII) = 0. Equivalently, Pk is the solution to
(3.2) (AkI+ Dk'CkDk1)pk = Dk?2(yk + Bkpk).
LEMMA 3. Suppose that fxk) ts a sequence generated by Algonthm 1. Assume that problem
(1.1) ? nondegenerate, (AS.1) and (AS.2) hold and ?xk) converges. If fAk) converges to zero and
lim ?nfk?oo f IIg?II) > 0 then pkC > 1 for suffictently large k.
Proof Since ?Dkpk) converges to zero and IIPkII < xDIIDkpkII, fPkY converges to zero. By the
assumption that liminfk?oo?IID??19kII1 > 0 and (3.2), it is clear that ?AkY converges to +00. But
?Dk?2(gk + Bkpk)Y is bounded and the problem is nondegenerate. We claim that lim??co ?k = +00
where ak is the stepsize to the boundary of the constraints. This can be easily secn from the additional
fact that, for k sufficiently large, the components of --Hg? and Pk have the same sign if the corresponding
component of the limit point g? is nonzero. Subsequently, 8k = Qk*[Pk] = Pk. Thus, for k sufficiently
large
Hence
min???(s) : s = rPk, IIDksII < Ak, Xk + s E ?J = min???(s) = rpk, IIDksII < Aki.
`kk(Pk) = minf??(s) : s = rp?, IIDksII < Ak, Zk + s E FJ >
and therefore
`kk(sk)			_			?k(Pk)			>1
pCk =			--H `kk*[--HDk?2gk] --H
LEMMA 4. Assume that ?Akj ts updated by Algortthm 2. IfPk1 > TIfor sufficiendy large k then ?Ak1
is bounded awayfrom zero.
Proof By assumption, there exists k such that when k > k, P1k > 1?. We prove, by induction, that for
k>k
(3.3)			Ak > min??iAi,A?J.
First, it is clear that (3.3) is true when k =
13
Assume that (3.3) is true fork, i.e., Ak > minf?iAi,A?J. If Ak <A1, Ak+1 > Ak > min??iA1,A?J.
If Ak > A1, Ak+1 > minf?A1,A?i.
Hence (3.3) is true for all k> k. Therefore (Ak) is bounded away from zero.
The proof of the following theorem is a slight modification ofTheorem (4.10) in [18].
THEOREM 5. Assume that f : ? ts continuously dLfferendable and bounded below on fl,
(AS.1) and (AS.2) hold. ForAlgorithm2, tf(sk) satIsfies (AS.4), then
(3A)			liminfdiag(v?21)y? = 0.
k?oo
For Algortthm 1, (3.4) is true under thefurther assumption thatproblem (1.1) Is nondegenerate.
Proof We need to show that (IIgkI?J is not bounded away from zero. Assume that there is an E > 0
such that 119k II > E for all sufficiently large k. We now show that
(3.5)
ZAk <+00.
k=1
If there are a finite number of successfiil iterations then Ak+1 < ?1Ak for all k sufficiently large and then
(3.5) clearly holds. Ifthere is an infinite sequence (kjj ofsuccessfiil iterations then inequality (3.1) shows
that
ZAki <+00.
i=1
Now the updating rules of Algorithm 1 & 2 imply that
and thus (3.5) holds in this case as well.
Next we prove that (3.5) implies that (lP?k --H 111 converges to zero. First,
IIxk+1 --H xkII < IskIl < xD?()Ak
and hence (3.5) shows that (xk) converges. Now (AS.l) and (AS.2) imply that
l'kk(sk) --H vf(xk)Tsk --H 1T
2skCkskl --H
lgkTsk + 2lsTkBksk --H vf(xk)TskI
21XBYD IIDkskII2.
But IIDkskII < Ak. Therefore
If(xk + sk) --H f(xk) + 2lsTkcksk --H ?k(sk)1 < 21xB4A2k
14
Since Lemma 2 implies that
--H`t)(sk) >?-2PEAk
we readily obtain that ?IPk1 --H 111 converges to zero.
For AIgorithm 2, usingLemma4, fAk1 cannot converge to zero. This contradicts (3.5) and establishes
the result
For AIgorithm 1, using Lemma 3, Ak is not decreased for sufficiently large k. Thus fAkY cannot
converge to zero. This contradicts (3.5) and establishes the result.
The next theorem establishes that ?diag(iv?i? )Vf(xk)J converges to zero. This result is established
despite the fact that ?Dk1 is not uniformly bounded. This may be somewhat surprising, perhaps, since
the analogous convergence result in the unconstrained setting requires the sequence of diagonal scaling
matrices to be uniformly bounded.
THEOREM 6. Assume (AS.1-2) hold and Vf(x) is umformly continuous on . If fxkl is generated
by Algorithm 2 and (AS.4) holdsfor Sk. Then
(3.6)			klL?oo diag(v?z1)Vf(x?) = 0.
Result (3.6) holdsfor Algorithm 1 when problem (1.1) is nondegenerate.
Proof The proof is by contradiction and is the same for AIgorithm 1 and AIgorithm 2. (The
nondegeneracy assumption is needed for AIgorithm 2 because the proofuses Theorem 5.)
Let El in (0, 1) be given and assume that there is a sequence ?m?J such that Ilym II > El. Theorem 5
guarantees that for any E2 in (0, El) there is a subsequence of ?m?J (without loss of generality we assume
that it is the fill' sequence) and a sequence ?l?J such that
(3.7)			II?)kII > E2, m? < k < l?, IIg??II < E2.
If the k-th iteration is successfiil then
f(xk) --H f(xk+l) > 21P??2minfA?, E2,ffi21, m? < k < l?.
XM X9
Since ?f(xk)1 converges, ?J(zk) --H f(xk+l )J converges to zero. From IIxk+1 --H xkII < PoxDAk, it
follows that, for sufficiently large i,
(3.8)			f(xk) --H f(rk+1) > E3IIxk+1 --H xkII m < k < l?,
where E3 = (?P?E2)/(PoxD). Using (3.8) and the triangle inequality,
f(xm?) --H f(xk?) > E3IIXki --H xm?II, m? < kj < li.
The uniform continuity of Vf and the convergence of ?f(xk)Y can now be used to deduce that
(3.9)			IIgm? --H gt?II < E2,
15
for i sufficiently large.
Consider a subsequence of ii (without loss of generality assume that it is the flIll sequence) such that
) converges to x*. Then fxmj J converges to x*. Based on the definitionof v(x), ifthe j-th component
of g(x*) is norizero, then, for i sufficiently large, the corresponding component of IIVmi I --H Vii is no
greater than that of IXmi --H Xii'. Thus (diag(IvrnjI? --H vi?I?)gi?J converges to zero. Therefore, for i
sufficiently large,
(3.10) IIDm?'i(Dmi --H Di?)Dp?1gi?II = diag(jvi,I? --H IvmiI?)?ijI < E2.
Using the triangle inequality for any m and 1
(3.11) ligmil < ?Dm1?I?gm --H ?iII + IIDm?'(Dm --H Di)D?'giII + IIg?II
Combining (3.11) with (3.7), (3.9) and (3.10), we obtain that
Ej < (xD + 2)E2.
Since E2 can be any number in (0, El), this is a contradiction.
Next we consider the second order necessary conditions. As mentioned in 2, we assume the
following quadratic model: ?k(s) = vf(xk)Ts + ?sT(?f(xk) + C(xk))s. Moreover, in addition to
(ASA), the condition (AS.6) holds, i.e., the reduction of the quadratic model satisfies
`Pk(sk) < Pmin???(s) : s = TPk, iiDksii < Ak, Xk + s E ?J,
IIDkskII < ?k, Xk + Sk E int(F),
where Pk is a solution to the unconstrained subproblem
min[??(s): iiDksiI < AkJ.
Before we state the second order convergence result, several technical lemmas are required. First,
we quote Lemma (4.10) in [19] below.
LEMMA 7. Let x* be an isolated limit point ofa sequence [xkJ in ??. If ?xkY does not converge
then there is a subsequence Xij which converges to x*, and an E > 0 such that
IIxi,+i --H Xi> II > E.
Now we examine the consequences of (AS.6) in greater detail. Using Theorem (3.11) in [18], if Pk
is a solution to (2.8), there exists a parameter Ak such that
(Mk + Aki) = RkTRk, (Mk + AkI)Dkpk = ?k, Ak > 0
16
with Ak(Ah --H IIDkpkII) = 0. Furthermore, as mentioned before, Pk satisfies
(Ahi + Dk2Ck)pk = Dk?2(9k + Bkpk).
LEMMA 8. Assume that (AS.6) is satisfied. Then
--H`kk(8k) > --H??[min?i,a2?JA?A2? + minfl,a?JIIR?D?p?II2i
where ?k is the stepsize along Pk to the boundary and Pk is a solution to the trust region subproblem
(2.8).
def
Proof Let ?r) = `kk(rpk) and r E [O,min?l,a?J] where ?k is the stepsize along Pk to the
boundary. Let r* to he the minimizer of ?(r) in [0, min? 1, a?J].
Itis easy to see that
= rg?Tp? +1 2T
-2r pkMkpk
= ?g?TD?p? --H 12			12
--H2r g?TDkpk --H --H2r AkIIDkpkII2
= --HTIIRkDkpkII2 + --H21r2IIRkDkpkII2 --H
2T2AkA2k.
But r2 < r sinee r < 1. Henee, from (AS.6),
--H`kk(sk) > p??(r*) > ?f [min?l,a2?JA?A2?+min?l,Q?JIIR?D?p?II2].
The following lemma provides estimates of the reductions in the objective ft'nction and quadratic
model.
LEMMA 9. Assume that the conditions of Theorem 6 and (AS.6) hold. Furthermore, ?xkY is any
sequence generated by eitherAlgorithm 1 or Algorithm 2. Ifevery limitpoint of fxkj is nondegenerate,
then there &ists 0 < Eo < 1 such that,for k sufrciently large.
2
and (f8k is successful.
f(i
[(x9+A2kJo			A2k2.J?.A2
AkxBxA)x2Di2, [(x9 + AkxBxD)i
min? 1,
A2kJo			A2k
2 [(x? +AkxBxD)x2D]2, [(x9 +AkxBxD)]2
inil
j ?--Hk) --H j ?Xk+1j			Im???,			-			-			- J
Proof Using Lemma 8,
--H?k(sk) > %?minf l,a2klAkA2k
17
where Ok is the stepsize along Pk to the boundary:
ii--HXkj Uj--HXkij :1 < i <
(3.12)			0k = min?max(			Pki
Pki
Since the problem is nondegenerate at every limit point and fxkj is bounded, there exists 0 < Eo < 1 and
2Eo <min(tt --H 1), such that, for sufficiently large k,
min(x? --H			--H xk) + Iy(xk)I > 2Eoe, e = [1,..., 1jT E ??.
(Otherwise, there would be a degenerate limit point of (xkj).
Following Theorem 6, ?diag(?v??z1 )9(xk)) converges to zero. Hence, for sufficiently large k,
Idiag(Iv?I)g(x?)IIoo <
Assume that k is sufficiently laige. If Igki I < 0, then
minfxkj --H lj? flj --H XkiY > 0
Hence, from (3.2), (3.12) and
119k + BkpkIIoo < x9 + xBllDk?'llllDkpkll < x9 + xBxDAk,
we obtain:
Ok>			Ak0
(x2 + AkxBxD)x2D
II l9kil > 0, then IVkjI < 0. ff Ok = 1Vki1/1Pki I, then from (32) and (312),
Ak
Ok >
x9 + AkxBxD
If 0k $ IVkiI/IPkiI, the numerator determining 0k is greater than 0. Hence, using (3.12)
0k >			Ak0
(xg+AkxBxD)x2D
When Mk is positive definite, we denote the Newton step for (2.8) by
(3.13) sNk (? Dk'Mk?'gk, i.e. MkDkskN = --H9k
LEMMA 10. If the sequence oftrust region subproblem (2.8) soludon ?Pk) converges to zero, ?xkY
converges to x* and ]?* is positive definite, then
liminf ?(0k*\Vk])>1, liminf??1 >1.
k?oo			-			k?oo ?(Pk) -
18
Moreover,for suffidently large k,
I?k*[pkiI > Emin(A2?, IIDksNkII2J
for some constant E > 0.
Proof Let ak he the stepsize, along Pk? to the boundary:
ii--HXkj Uj--HXki
= min?maxf			Pki			:1 <i <
Pki
Sincep? isasolutionto(2.8), r* = minfl,a?J in (2.11). By definition (2.11), 0i S Ok < 1,0k --H 1 =
O(IIpkII). Since Mk is positive definite for sufficiently large k, we have that pTkMkpk > 0. Therefore,
liminf?k(Qk*[Pk])			rk*OkgkTpk+?2Irk*2?pTkMkpk
k?oo ?[pk] = li?m??lnt r?*g?Tp? + 1 *2pTMkpk
?Tk k
> lim 0k
k?+oo
=			1.
Since M* is posinve definite, x* is non(1egenerate. If all variab?s are free ar a limit point x*, inen from
the assumption that fPkJ converges to zero, it is clear that
liminfa? > 1.
k?oo
IVkjI --H			i9k?I+Ak
If there exist variables on the boundary at x*, since ?PkJ converges to zero, ak = Pkj --H 19kj+(BkPk)jI
for some vj(x*) = 0 and g; $ 0. This means that
(3.14)			li????aoflf(Qk) > 1.
Assume that Eo > 0 is a lower bound on the eigenvalue of ]?*. Since Pk = sNk if Ak = 0, using Lemma 8,
I'kk*[pk]I > 214)min?1,o2?)minfA2?, IIDkskNII2)
where DksNk = Mk?1Yk. LetO < E < 21Eo. Then, forksufficiently large,
I?k*[pkiI > Emin?A2?, IIDkpkII2l.
In addition,
Hence
`pk*[pk]
liminf
k?oo ?(Pk) k?oo
* T			1 *2 T
rkykpk+2rk pkMkpk
liminf
??`1?Pk + 2lpTkMkpk
li,iii?i?minta?, 1)
1.
limirif ?k*[Pk] > 1.
k?oo ?(Pk) --H
19
The next theorem indicates that the first order and second order necessary conditions can be satisfied.
THEOREM 11. Letthelevelset=(xE?? : ?x)<?x0),xE?Jbecompactandf:F??
be twice continuously dtfferentiable on fl. Let ?xkJ be the sequence generated by Algorithm 2 under
assunption (AS.5) on the model `kk and under assumptions (AS.4) and (AS.6) on the step sk. Then
(a) The sequence ?9k1 converges to zero.
(b) If every limit point is nondegenerate, then there is a limit point x* with M* positive
semidefinite.
(c) If x* is an isolated nondegenerate limitpoint, then M* ispositive semidefinite.
(d) If M* is nonsingularfor some limitpoint x* of ?xkY then M* is positive definite, fxkY
converges to x*, all iterations are eventually succes?u4 and ?Ak1 is bounded away
from zero.
Under the additional assumption that problem (1.1) is nondegenerate, equivalent results holdfor the
sequence generated by Algorithm 1.
Proof We prove each result in order.
(a) The sequence ?gk = diag(Iv?I?)Vf(x?)J converges to zero: this has been proved in
Theorem 6.
If fAhi is not bounded away from zero, the result immediately follows.
We prove that ?AkJ is not bounded away from zero by contradiction. Assume that
Ak > E > 0. First we show that ?AkJ conve?es to zero.
From Lemma 9, we have that, for sufficiently large k,
where
--H?k(sk) > P2?2?,
= mm?1,			E2J0			E2
[(x9 +AkxDxB)x2Di2, [(x2 +AkxBxD)]2?
Moreover, for sufficiently large k and successfiil steps
(3.13)			f(xk) --H f(xk+1) > ???E?EA2?.
--H2
The rest of arguments are similar to the proof ofLemma 5. If there are finite number of
successful steps, ?Ak1 conveiges to zero. Otherwise, let fkjl be the infinite sequence
of successful iterations. Inequality (3.13) implies that there exists a constant El > 0
such that E?j > El. This fact and inequality (3.13) imply that
LAL <00.
20
Now the updating rules of Algoritlim 1 & 2 imply that
A2k<(l+			2
1??2?)?A2ki
Hence ?AkJ converges to zero. Since Ilskil < IIDk'DkSkII < ?????A? and IIPkII <
XDAk, we conclude that both ?sk1 and ??J converge to zero.
From the fact that ?AkJ converges to zero,
Hence
Now a standard estimate is that
E?>E? i0rsome?>0.
--H`kk(sh) >
If(xk + sk) --H Axk) +
2Sk Cksk --H `kk(sk)I
IIshII2om<?i II?f(xk + ?sk) --H ?f(xk)II
and thus the last two inequalities show that ?P?k --H 1 J converges to zero. We conclude
that the entire sequence converges to unity.
For AIgorithm 2, using Lemma 4, fAk) cannot converge to zero, which is a contradic-
tion.
Now we consider AIgorithm 1. Let ak be the stepsize to the boundary along pk;
ii--HXki `Lj--HXki
= minfmaxf
Pki			Pki
If all variables are free at a limit point x*, then a* = +00. Otherwise, consider a limit
point x* with some variables at their bounds. Since this limit point is nondegenerate
and lim??oo diag(iv?i)g? = 0, using (3.2), we have
= IVkjI = Igkjl+Ak
IPkjI			Igkj + (Bkpkbl
for sufficiently large k with Y*j $ 0. But Ak > E. Thus the corresponding limit a? > 1.
Hence
liminfa?> 1.
k?oo
In other words, 8k = a?*[P?] = Pk for sufficiently large k. Therefore pCk > 1 for
sufficiently large k. Hence, fAki cannot converge to zero. This is again a contradiction.
In conclusion, there is a limit point with M* positive semidefinite.
21
(c) If ?xkJ converges to x*, the result follows fiom (b). If ?xkJ does not converge then
Lemma 7 applies. Thus, if fXtj J is the subsequence guaranteed by Lemma 7 then
A1 > ?E. From Lemma 9, fAijAiil converges to zero. Hence ?A1?J converges to
po
zero. Thus M* is positive semidefinite.
(d) If M* is nonsingular at a limit point x* of fxkl, then x* is an isolated limit point and
hence M* is positive definite following (c). Since ?k(8k) = 9kTsk + 2isTkMksk <0 we
have that
--H2IIDkskII < ?M?'IjIjg?jj
whenever Mk is positive definite. This means that
2lIIskII < --H21xDIIDkskII < xDIIM?1IIIIg?II
whenever Mk is positive definite. But fDk?1Yk = diag(?v??21)VJ(?)J converges to
zero. Following Lemma 7, ?xkJ converges. Since llm??oo g? = 0, fxhl converges to
x* and M* is positive definite, fPkJ and ?sk) converge to zero.
Next we prove that ?Ak) is bounded away ftom zero. Assume that E > 0 is a lower
bound on the eigenvalues of Mk. Using Lemma 10, for sufficiently large k,
I?k*[pkiI > (minfA2?, IIDksNkII2J.
But recall that, whenever Mk is positive definite,
--H2IIDkskII < IIMk?1IIII9kII.
Let ? be an upper bound on the condition number of Mk. From 9k = ]?kDksNk,
--H21IIDkskII < ?IIDksNkII.
Hence, there exists E?> 0 such that
--H`kk(sk) > EIIDkskII2 > (?IIskII2.
x2D
This estimate and
If(xk + sk) --H f(xk) + 1T ?k(sk)I
2Sk Cksk --H
IIskII2om?1 II?f(xk + ?sk) --H ?f(xk)II
yield that P?k > ? for k sufficiently large.
For Algorithm 2, using Lemma 4, we immediately conclude that ?AkJ is bounded away
from zero.
22
Now we consider Algoritlirn 1. Using Lemma 10, for k sufficiently large,
`Ph(Pk)			`kk*[pk] ? ?h(ah*[pk]) > ?
pkC = ??*[--HD?2g?j ? `t)k(Pk)			`t)k*[pk]
Therefore all the iterations are eventually successfiil and ?AkJ is bounded away from
zero.
THEOREM 12. Assume the conditions of Theorem 11 hold and ]?* is nonsingularfor some limit
point x* of ?xkY. tt skN be the Newton step (3.13) when it exists. Moreover, sk = ak*[sNk] whenever
IIDksNkII <Ak. Then fxkl converges to x* quadratically
Proof From Theorem 11, ?AkJ is bounded away from zero. But under our assumptions, ?D??1g?J
converges to zero and ?xk) converge to x*. Hence fDkskNj converges to zero where sNk is the Newton
step:
MkDk?lsNk =
Using definition (2.11), for sufficiently large k,
sk?sNk=?[skN]?sNk =r?*oks?N?r?*sN? +rk*sNk sNk
where r?* = minfl,a?1. From Lemma 11 in [6], Ir? --H 11 = O(IIxk --H r*II). But 10k --H 11 = O(IIskNII).
Hence Ilsk --H sNkII = O(IIxk --H z2*il). UsingTheorem lOin [6], ?xk) converges quadratically to x*. I
4. Preliminary Numerical Experiments. In this section we repon on preliminary experiments with
the practical trust region method, Algoritum 2, on a set of standard test problems of low dimension. The
method solves these problems quite satisfactorily indicating that this approach has practical potential.
We implemented the "practical trust region algorithm" described in Figure 3 in a straightforward
manner. Either Sk = Qk*[Pk] where Pk is a solution to the trust region subproblem (2.8), or Sk =
?k* [--HDk?2gk]. The exact implementation is described below.
and
The computed step Sk satisfies the condition
`kk(8k) < P't)k(Qk*[--HDk?2gk]), IIDkskII < Ak, Xk + Sk E int(Y),
`kk(sk) < `k?(a?*[p?i), IIDkskII < Ak, Xk + Sk E int(Y).
Note: It is easy to verify that condition (ASA) can be replaced with
`kk(sk) < P'k?(a?*[--HD?2gk]), IIDkskII < ?[)Ak, Xk + Sk E int(Y)
23
The Method Implemented
Let ? = o.25,p = 0.1,Tj = 0.75, ?6 = o.O625,? = 0.5,? = 2, Ai = 1 and
x0 E int(?)
Fork =o,i,..
1. Compute f(xk)1 9k? Hk, and C?; Define the quadratic model `kk =
gkTsk + 2IskT(Hk + Ck)sk.
2. Computeasoiuflonp?of(2.8). ? ?putep? = ?(fkk(?*D[Pkk219)k1) IfpCk > ?,
-c
5k = a?*[Pk]. Otherwise, Sk = ak*[--HDk?29k].
3. Compute
P1k = f(xk+?k)--Hf(xk)+2IsTkC(xk)sk
?k(sk)
4. If P?k > j then Xk+i = Xk + Sk. Otherwise Xk+i = Xk.
5. Update the model ?k the scaling matrix Dk and Ak as specified.
Updating Trust Region Size Ak
1. 1fP1k <OthenA?+i = ?oAk
2. IfO<p1k <!thenA?+i =max???k,?IID?'skII1.
3. 1?P1k > ?then
ifpkc> ?then
Ak+i = max?Ak,??IID??1skII1
else
ifAk >A?andpkc < ?thw
Ak+i = maxf?A?, IIDk1skIIJ.
and condition (AS.6) can be replaced with
`t)k(sk) < ????(a*?[p?]), IIDkskII < p0?A?, Xk + Sk E int(T).
Thus, the implemented method has the convergence properties listed in Theorems 11 and 12.
The experiments were done on a Sun (Sparc) workstation using MAThAB 4.0 [17]. The stopping
criteria used are:
Mk > 0 and ?k(sk) <0.5 * l0?12
The test problems are taken from [9]. However, the starting points as described in [9] may not be strictly
feasible. Assume that Xatart is the starting point specified in [9]. We modify the starting points as follows:
XOi = (? + 0.1 * (uj --H I?), if Xstarti < i? + lOOE,
XOi = 1j --H 0.1 * (tj --H i?), if Xstarti > Uj --H lOOE,
where E 10--H16 is the machine precision.
In Table 1, we repon the number of flinction and gradient evaluations taken by the method to obtain
the required accuracy. The number of flinction evaluations required by the method in [9] is cited in the
24
last column as a relative comparison. We point out that the starting points and the stopping criteria of the
two methods are different. The approximate solutions obtained by the two methods may also differ. The
stopping accuracy of the method in [9] is not usually as stringent as that used in our experiments --H the
method in [9] terminates when the norm of the projected gradient is less than 10-6
5. ConcIusionL We have proposed a trust region approach to the bound-constrained nonlinear
tI,lern Thi?			nrt1vf?2??P1t?			`i
mIrL?????auonprc??			approacn generares SL???j			??rares an possesses strong convergence
characteristics. In particular, we have established second-order convergence properties. Moreover,
the convergence results match the implementation in the sense that a global solution to a quadratic
programming problem, with linear inequality constraints, is not required by the theory. Instead, an
d.ppw? flaw rniiiimization of a quadratic fi1Ii?uu'i ?ubj?L LU WI ellipsoidal constraint is required (and
11
achievable).
The computatlonal expenments repofled, on a well-known test collection of small-dimensional
problems, indicate that AIgorithm 2 has practical potential. However, from a practical computational
point ofview we believe the real promise ofthe underlying ideas presented here is inthe large-scale setting.
The method as described is not directly suitable for large-scale problems --H computation of a (suitably)
accurate solution to the trust region problem in high dimensions is probably too costly. Nevertheless,
there is considerable scope for modifying and adapting the basic idea, with efficiency in mind, to the
large-scale setting. This is a subject of current investigation.
Finally, we remark that the trust region ideas developed in this paper, for box constraints, can be
extended to the case where there are also linear equality constraints present, i.e., min?f(x): Ax --H b 1 <
x < u). This generalization is a subject of current research.
25
Number ofFuiictionI6radient?EvaIuationsI
PROB n NEW			CGT
GENROSE U			8			45			38			42
GENROSEC			8			15			14			15
CHAINROSE U			25			23			19			20
CHAINROSE C			25			28			23			18
DEGENROSE U			25			30			30			95
DEGENROSE C			25			28			23			17
GENSING U			20			27			27			10
GENSING C			20			23			22			4
CHAINSING U			20			26			26			18
cHMNSING C			20			22			21			3
DEGENSING U			20			26			26			155
DEGENSING C			20			35			34			3
GENWOOD U			8			68			57			107
GENWOOD C			8			10			9			5
CHAINWOOD U			8			57			48			77
cHMNWOOD C			8			10			9			5
HOSC45 U			10			28			27			19
HOSC45 C			10			10			9			12
BROYDENlAU			30			14			14			11
BROYDENlAC			30			14			13			8
BROYDEN1BU			30			7			7			7
BROYDEN1BC			30			9			8			6
BROYDEN2A U			30			19			16			14
BROYDEN2A C			30			24			22			10
BROYDEN2B U			30			9			9			9
BROYDEN2B C			30			15			14			9
TOINTBROY U			30			8			8			8
TOINTBROYC			30			12			11			8
ThIGU			10			13			11			7
ThIGC			10			16			14			8
TO?R1GU			10			8			6			13
TOINlRRlGC			10			8			6			10
CRAGGLEVYU			8			33			31			24
CRAGGLEVY C			8			30			29			20
PENALTY U			15			24			24			27
PENALTY C			15			29			29			80
AUGMLAGNU			15			29			26			31
AUGMLAGNC			15			46			44			31
BROWN1U			10			20			20			27
BROWN1C			10			31			30			27
BROWN3U			10			9			9			7
BROWN3C			10			10			9			6
BVPU			10			21			21			4
BVPC			10			20			20			4
VARU			12			12			6
VARC			20			12			11			6
T?i1
Experiinenls with a Prxtical Thust Region Methodfor Bounded Constrained Problems
REFERENCES
[1] A. BJ?RCK, A direct methodfor sparse least squaresproblemswith tower and ?per bounds, Numerische Mathematik, 54
(1988), pp. 19--H32.
[2] R. H. BYRD AND R. B. ScuNAnsr,Approximatesolution o/the trust regionproblemby minLesization over two-dimensional
subspaces, Mathematical Programming, 40(1988), pp.247--H263.
[3] T. F. Co?'N AND C. HEMPrL, Computing a trust region step/or a penaltyjitection, SlAM Journal on Scientific and
Statistical Computing, 11(1990), pp. 18?201.
[4] T. E COL??AND LA. HuH3ERT,A directactive set algorithmfor large sparse qwdratlcprogramswith simple bounds,
Mathematical Programming, 45 (1989), pp.373406.
[5] , A gLobally and superlinearly convergent algorithm for convex qwdratic programs with simple bounds, Tech.
Rep. TR 90-1092, Computer Science l)eparrmen? Cornell University, February. 1990 (to appear in SlAM Journal on
Optirnization).
[6] T. F. Cou?? AND Y. Lt, On the convergenceofreflectiveNewtonmethods/or targe-scale nonlinearminimization subject
to bounds, Tech. Rep. TR 92-1314, Computer Science Ilepanment, Cornell University, 1992.
[7] , A reflectiveNewton method/or minimizing a qusiraticfimction subject to bounds on the variables, Tech. Rep. TR
92-1315, Computer Science Eepazznent, Cornell University, 1992.
[8] A. R. CoNN, N. L M. GouLi), AND P. L. ToiNT, Global convergence ofa class oftrust region algorithmsfor optimization
with simple bounds, SlAM Journal on Numerical Analysis, 25(1988), pp.433460.
[9] , Testing a class ofmethodsfor solving minimization problems with simple bounds on the variables, Mathematics
of Computation, 50(1988), pp.399430.
[10] R. 5. D?MBo AND U. TtnnwnzKi, On the minimization ofquttraticfiwctions subject to box constraints, Tech. Rep. B
71, Yale University, 1983.
[11] R. F?? An algorithmfor solving linearly constrained optimizationproblems, mprog, 2(1972), pp. 133--H165.
[12] R. nElOJER AND M. P. JACKSON, Minimization ofa qutstraticfimction o/many variables subject only to lower and ?pper
bounds, Journalof the Institute for Mathematics and its Applications, 14(1974), pp. 159--H174.
[13] D. M. GAY, Computing optimal Locally constrainedsteps, SlAM Journal on Scientific and Statistical Computing, 2(1981),
pp. 186197.
[14] P. Gui AND W. MURRAY, Minimization subject to bounds on the variables, Tech. Rep. Report NAC 71, National Physical
Laboratory, England, 1976.
[15] J. J. JflDIcE AND E M. PIRES, Direct methods/or convex qutIfralicprograms subject to box constraints, departamento de
matemAtic? Universida'1e de Coimbra, 3000 Coimbra, Portugal, 1989.
[16] P. LoTsmDT, Solving the minimal least squaresproblemsubject to boundson the variables, BIT, 24(1984), pp.206224.
27
[17] C. B. MoiJ? J. Lmi, 5. BANGERT, AND 5. KU?1MAN, ProMatlab User's guide, MathWorks, Sherborn, MA, 1987.
[18] J. J. MOR? Recent developments in algorithins and softwarefor trust regionmethods, in Mathematical Programming: The
State of the Art, M. G. A. Bachem and e. B. Done, eds., Springer Verlag, Berlin, 1983.
[19] J. J. MO? AND D. SORi?SEN, Con?puting a trust region step, SlAM Journal on Scientific and Statistical Computing, 4
(1983), pp. 553--H572.
[20] J. J. Mosul AND 6. ToRAu)o, Algorithmsfor bound constrained qadrattc programmingproblems, Numeiische Mathe-
matik, 55 (1989), pp.377400.
[21] D. P. O'U?Y, A generalized conjugate gradient algorithm for solving a class of quisiratic programning problems,
Linear Algebra and its Applications, 34 (1980), pp. 371--H399.
[22] U. OREBORN, A direct methodfor sparse nonnegative least squaresproblems, PhD thesis, Department of Mathematics,
Link?ping University, Link5ping, Sweden, 1986.
[23] GA. SCHULTZ, R. B. ScHNABEk AND R. H. BYRD, A family oftrustjegion.basedalgorithmsfor unconstrainedmini-
nization with strong global convergenceproperties, SlAM Journal on Numerical Analysis, 22(1) (1985), pp.4767.
[24] D. SoRENSEN, Trust region methodsfor unconstrained optimization, SlAM Journal on Numerical Analysis, 19 (1982),
pp.409426.
[25] T. SmEnAun, The conjugate gradient methods and trust regions in large scale optimization, SlAM Journal on Numerical
Analysis, 20(1983), pp.626637.
[26] E. K. YANG AND J. W. Toii? A class ofmethodsfor solving large convex quadeaticprograms subject to box constraints,
tech. rep., Department of Cperations Research, University of North Carolina, Chapel Hill, North Carolina, 1988.
28
