BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR92-1268
ENTRY:: 1993-10-14
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: On The Local Convergence of The Byrd-Schnabel Algorithm For 
        Constrained Optimization
AUTHOR:: Coleman, Thomas F. 
AUTHOR:: Liao, Ai-Ping    
DATE:: February 1992
PAGES:: 12
ABSTRACT::
Most reduced Hessian methods for equality constrained problems use a basis 
for the null space of the matrix of constraint gradients and posess 
superlinearly convergent rates under the assumption of continuity of the 
basis. However, computing a continuously varying null space basis is not 
straightforward. Byrd and Schnabel [2] propose an alternative implementation 
that is independent of the choice of null space basis, thus obviating the 
need for a continuously varying null space basis. In this note we prove that 
the primary sequence of iterates generated by one version of their algorithm 
exhibits a local 2-step Q-superlinear convergence rate. We also establish 
that a sequence of ``midpoints'', in a closely related algorithm, is (1-step) 
Q-superlinearly convergent.

Key words: constrained optimization, null space, superlinear convergence, 
reduced Hessian.
END:: CORNELLCS//TR92-1268
BODY::
On The Local Convergence of
The Byrd-Schnabel Algorithrn
For Constrained Optirnization
Thomas F. Coleman*
Al-Ping Liao**
TR 92-1268
February 1992
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
*Computer Science Department, Advanced Computing Research Institute and Center
for Applied Mathematics, Cornell University, Ithaca, NY 14853. Research partially
supported by the Applied Mathematical Sciences Research Program (KC-04-02) of
thee Office of Energy Research of the U.S. Department of Energy under grant DE-
FG02-86ER2501 3.AOOO and by the Computational Mathematics Program of the
National Science Foundation under grant DMS-8706133.
** School of Operations Research aand Industrial Engineering, Cornell University,
Ithaca, NY 14853. Research partially supported by NSF, AFORS, and ONR through
NSF grant DMS-8920550.
ON THE LOCAL CONVERGENCE OF
THE BYRD-SCHNABEL ALGORITHM
FOR CONSTRAINED OPTIMIZATION
Thomas F. Coleman' and Ai-Ping Liao2
Abstract
Most reduced Hessian methods for equality constrained problems use a basis
for the null space of the matrix of constraint gradients and possess superlinearly
convergent rates under the assumption of continuity of the basis. However, com-
puting a continuously varying null space basis is not straightforward. Byrd and
Schnabel [2j propose an aiternative implementation that is independent of the
choice of null space basis, thus obviating the need for a continuously varying
null space basis. In this note we prove that the primary sequence of iterates
generated by one version of their algorithm exhibits a locai 2-step Q-superlinear
convergence rate. We aiso establlsh that a sequence of "midpoints", in a closely
related aigorithm, is (i-step) Q-superlinearly convergent.
Key words. constrained optimization, null space, superllnear convergence,
reduced Hessian.
1Computer Science Department, Advanced Computing Research Institute, and Center for Applied
Mathematics, Cornell University, Ithaca, NY 14853. Research partially supported by the Applied
Mathematical Sciences Research Program (KC-04-02) of the Office of Energy Research of the U.S. Dfr
partment of Energy under grant DEFG02-86ER25013.A000 and by the Computational Mathematics
Program of the National Science Foundation under grant DMS-8706133.
2School of 0perations Research and Industrial Engineering, Cornell University, Ithaca, NY 14853.
Research partially supported by NSF, AFORS and 0NR through NSF grant DMS-8920550.
1 Introduction
The reduced Hessian methods for equality constrained optimization problems usually
use a basis for the null space of the matrix of constraint gradients. Consider the problem
min f(x)			(1)
subject to c(x) = 0
where f : R and c : Rt are smooth nonlinear functions. Suppose that
A(x) is the n x t matrix whose columns are the gradients of the constraint functions
c(x). We assume that A(x) is of full column rank. Let Z(x) be an orthonormal basis
for the null space of A(x)T; hence Z(x) is an n x --H t) full rank matrix satisfying
A(x)TZ(x) = 0. If L(x, A) = f(x) --H c(x)TA is the Lagrangian for problem (1) then the
reduced Hessian matrix can be expressed as Z(x)TVz2L(x, A)Z(x). The reduced Hessian
is dependent on the choice of null space basis Z(x). Many reduced Hessian algorithms,
e.g., Coleman and Conn [5], Nocedal and Overton [9], assume continuity of Z(x). But,
as pointed out by Coleman and Sorensen [6], the standard implementation of the QR
factorization of A(x) via Householder matrices does not necessarily yield a matrix Z(x)
with continuously varying elements. Coleman and Sorensen [6] propose factorization
schemes which guarantee local continuity. In contrast, Byrd and Schnabel [2] propose
an algorithm which is independent of the choice of the null space basis. In section 2 we
present the Byrd-Schnabel algorithm, and in section 3 we prove that their algorithm is
locally 2-step Q-superlinearly convergent.
2 The Byrd-Schnabel algorithm
In this section we describe the Byrd-Schnabel algorithm.
2
Algorithm
0. Choose an initial invertible matrix B0 with the form z0TQz0 where Q is a symmetric
matrix and Zo is a basis for the null space of A(xo)T, and an initial point x0; let
k =0.
1. Compute
where
dk = hk + vk,			(2)
hk = ZkBk?lZkTVf(xk),
Vk = Ak(AT?Ak)?lc(x?).
Set Xk+1 := xk + dk.
2. Compute Zk+l, Tk := ZkTZk+l and Pk.
3. Let
4. Compute
where
5k :=
Yk :=
Bk =TkT(Bk?PkI)Tk+PkI.
ZkT+l(xk+l --H xk),
ZkT+l[V:L(xk+l,Ak+l) --H VxL(Xk+l --H Zk+lZt+idk,Ak+l)],
(3)
(4)
Ak+1 = (ATk+lAk+l)?lAkT+lVf(xk+l).			(5)
Update Sk using the DFP or BFGS update1, Bk+l = U(S?;s?,y?), with secant
equation Bk+lsk = Yk.
5. Set k to k + 1 and go to step 1.
c
We note that dk is the solution to
min Vf(xk)Td + 2idTZkBkZkTd			(6)
subject to c(x?) + A(xk)Td --H 0.
The scaling factor Pk can be regarded as an approximation to IIVx2L(Xk, A?)II; for ex-
ample one can takes Pk = IIBkII (see Byrd and Schnabel [2]). Here we just assume that
[Pk) is bounded.
1See, for example, Dennis and Schnabel [8].
3
The algorithm we have described above is actually a member of the set of algorithms
(or implementations) proposed by Byrd and Schnabel. In this set Byrd and Schnabel
allow for a variety of choices for 5k and Yk We note that Byrd and Schnabel [2] do
not give any convergence result for any member of their set of algorithms. In the next
section we prove that the algorithm described above, which we call the "Byrd-Schnabel
algorithm", is locally 2-step Q-superlinearly convergent.
Next we show that if Pk is restricted to be positive the update formula in this
algorithm preserves positive definiteness.
Theorem 2.1 If Bk is positive definite and ykTsk > 0, p? > 0, then Bk+1 is also
positive definite.
Proof. We need to show that Sk is positive definite if Bk is. For any given x we
have,
(Tkx)T(Tkx) = IITkxII2 = IIZkTZk+lxII2
<--H II4T?I2IIzk+lxII2 = 114T112(xTx) = xTx.
(We use to denote the 2-norm.) If Tkx # 0, then, since Bk is positive definite,
If Tkx = 0, then
xTSkx = (Tkx)TBk(Tkx) + Pk(xTx --H (Tkx)T(Tikx))
> Pk(xTx --H (Tkx)T(Tkx)) > 0, using (7).
xTSkx = (Tkx)TBk(Tkx) + Pk(xTx --H (Tkx)T(Tkx))
--H Pk(xTx) > 0.
4
(7)
(8)
(9)
c
Thus Sk is also positive definite.
We will show below that if we only assume that (PkJ is bounded, then the update
will preserve positive definiteness locally.
3 Superlinear convergence of the Byrd-Schnabel
algorithm
In this section we discuss the local properties of the Byrd-Schnabel algorithm. We
assume that there is an open convex region, say D, containing a point x* and the
following statements hold:
Al: x* is a local minimizer of problem (1).
A2: The functions f and c are twice continuously differentiable in a neighborhood of
A3: A* := A(x*) is of full column rank t.
A4: Vx2L(x*, A*) is positive definite on the null space of At, null(At).
Since the Byrd-Schnabel algorithm is independent of the choice of Zk, we can assume
that Zk = Z(xk) in D where Z(x) is a continuous differentiable function on D. We
assume that Z(x), V2f(x) and V2c(x) are Lipschitz continuous functions of x in D.
We make extensive use of the "0" notation, where ?k = O(?t'k) means that the radio
?k/'kk remains bounded as k tends toward infinity. Coleman and Conn [5] prove the
following result.
Theorem 3.1 JfIIBkII and IIBk111 are bounded, then IIxk+1 --Hx*II = O(IIxk --Hx*II) and
there exist positive scalars K0 and K1 such that
5
(i) IlAk --H A*ll < I?oIIxk --H x*ll,
(ii) IIZkTV:2L(xk,Ak)Zk --H H*I < KlIIxk --H X*II,
where Ak is defined by (5). If in addition, xk x* and
II(Bk--HH*)ZkT+l(Xk+l--HXk)II ___
IIdkII			,. 0,			(10)
where dk = Xk+1 --Hxk and H o+--H ZTVx2L(X*, A*)Z*, then xk			x* 2-step superlinearly.
0
Lemma 3.2 Assuming that IIBkII and IIBk?1II are bounded, 5k is given by (3) and Yk is
given by (4), and there exists a positive scalar ? such that if IIxk --H X*II < ?, then
IlMyk --H M?1skII < --H31IIM?1skII,
where M = H* 2
Proof. First we note that
By Taylor's theorern,
where
IlMyk --H M?1skII < IlAill Ilyk --H H*skII
VrL(Xk+l, Ak+1) --H VxL(Xk+l --H Zk+lZkT+ldk, Ak+1)
= V:2L(Xk+1,Ak+l)Zk+lZkT+ldk + EkZk+lZkT+ldk
(11)
(12)
IlEkIl = O(IIZk+lZkT+ldkll) = O(IIxk+i --H xkIl) = O(maxfIIr?+i --H x*lI, IIXk --H X*II1).
yk = ZkT+l(V?L(Xk+l, Ak+1) --H VxL(Xk+i --H Zk+lZkT+ldk, Ak+1))
--H ZkT+lV:2L(Xk+l, Ak+1 )Zk+l ZkT+ldk + ZkT+l EkZk+1 ZkT+l4.
6
So
Thus, by Theorem 3?1 and provided ? is sufficiently small,
Ilyk --H H*skII < (IIZkT+lV:2L(xk+l,Ak+l)Zk+l --H H*I + JIZkT+lEkZk+lII)IIskII (13)
< (K0 + Ki) O(max(IIx?+i --H x*II, IIxk --H X*II1)II5kII (14)
Hence it follows that for s sufficiently small,
which implies, by (11)
Ilyk --H H*skII < 115kII
311M112
IlMyk --H M?1sk1l < --H31llM?1sk11.
0
Lemma 3.3 If IlMyk --H M?1skll < -31l1M?1skll with 5k # 0, then ykTsk > 0 and thus
Bk+1 is well-defined in this algorithm. Moreover, there are positive constants a0, ol
and ?2 such that
IIBk+l --H H*IIM < [(1 --H ?o0k2)?21 + Q1?k]llBk --H H*IIM + ?2?k,
where a? E (0,1], ?k max?llx?+i --H x*II, IIxk --H x*llJ, and
IIM[Dk--HH.]skII
0k			iIBk--HH.IIMIIM--HlakII			for Bk # H*,
0			otherwise.
Proof. We first note that
Thus,
IITk --H `11 = llZkTZk+i --H ZkTZkll = llZkT(Zk+l --H Zk)II =
llSk --H BkII = llTkTBkTk --H Bk --H Pk(TkTTk --H 1)11
7
< IITkTBkTk --H BkII + IPkIIIi?Tk --H Ill
= IITkTBkTk - TkTBk + TkTBk - BkII + IPkIIITkTTik - Tk + Tk -11
< (IITkTBkII + IIBkII)IITk --H Ill + IPkI(IITkTII + 1)IITk --H Ill
--H O(a?) + O(a?) = O(a?). (Since (Pkl is bounded.)
(15)
This implies
IISk --H H*I < IIBk --H H*I + O(?k).			(16)
Noting (14), this lemma thus follows from the Lemma 3.1 of Dennis and More' [7]. c
Theorem 3.4 Assume that ? IIxk --H X*II < 00, IIBkII and IIBk?1II are bounded. Then
we have
II[IBXkkjH?;?'\II			0.
Proof. Since (1 --H Qo0k2)?21 < 1 Lemma 3.3 of [7] guarantees the existence of
By Theorem 3.4 of [7],
?l%?oo IIH* 2BkH7? --H IlIF.
WOk2IIBk?H*IIM <00.
Either fIIBk --H H*IIMJ converges to zero and we done, or [0k) converges to zero. In the
latter case we have by (15),
II(Sk--HH*)skII ,.
lIskIl
Noting that Il5k II < IIxk+i --H XkII, we thus have
II(??4 H)$III			0,
8
which, by (15), implies
IJ(IIBxkkYlH;$II			0.
From Theorem 3.1, we now need to show that ? IIxk --H x*JJ < oc and IiB?II and
IIBk?'II are bounded. The following lemma is Corollary 3.14 of Coleman and Conn [5].
Lemma 3.5 Provided the smallest eigenvalue of Bk?l and Bk is greater than a positive
scalar K, there exist positive scalars e and ? such that if
then
IIr??i --H x*II < ?, IIxk --H x*II < ?, IIBk1 --H H*?'JJM < ?
IIxk+1 --H x*II < 2111			--H x*II.
c
With the above lemma and Lemma 3.3, using the same technique employed in [1]
and [5], we thus have the following result.
Theorem 3.6 Suppose that the sequence [xk, Bk) is generated by the algorithm with
the initial quantities x0, B0, where B0 is symmetric positive definite, and fPk] is bounded.
Then there exist positive scalars ? and S such that if
lixo --H x*li < s, and liB0 --H H*llM < S,
then llBk--HH*ll <2?fork=O,1,..., and
?IIr??x*II <00.
9
0
Theorem 3.7 Suppose that the sequence (x?, Bk) is generated by the alqorithm with
the initial quantities x0, B0, where B0 --H zTQz0 and Q is a symmetric matrix, and
?Pk1 is bounded. Then there exist positive scalars ? and ? such that if
lixo --H x*I! < ?, and JJQ --H V?2L(x?, A*)II ?
then IIBk --H H*I < 2?, for k = 0, 1,..., and fxk) converges to x* at a 2-step Q-
superlinear rate.
Proof.			Redefine ? if necessary so that
lixo --H x*II < ?, and IIQ --H V?2L(x*, A*)II < ?
imply IIBo--HH*IIM <?. The result follows immediately from Theorem 3.1, Theorem 3.4
and Theorem 3.6			E
As a consequence of Theorem 3.7 we can further restrict ? and ?, if necessary, so
that (Tkx)TBk(Tkx) > ?IIxII2, for some ? > 0, and IIT?xII > (1 --H jt?-1)21IIxII, for all
k = 0, 1,..., and x E R?, where IPkI <tc and we can assume that ?> 1. Thus, by (8),
xTS?x = (Tkx)TBk(Tkx) Ar p?(xTx --H (Tkx)T(Tkx))
> (T?x)TB?(T?x) --H tc(IIxII2 --H (1 --H jttc?1)J?x??2)
> !IIxII2 --H itIIxII2 = 0.
Therefore if we assume that tPkJ is bounded, then the update preserves positive defi-
niteness locally.
4 Concluding remarks
We note that Byrd and Schnabel [2] also suggest that one can take dk = hk + vk
where vk = A?(AT?A?)?l?x? + hk). Since IIvk --H vkII < O(IIhkII2), for this choice
10
of dk, by further restricting ?, if necessary, Lemma 3.2 holds and so do Lemma 3.3
and Theorem 3.4. Noting that Lemma 3.5 is valid for this choice of dk (see Coleman
and Conn [5]), Theorem 3.6 and 3.7 follow. Therefore the algorithm is still 2-step Q-
superlinearly convergent. Moreover, by Theorem 2.5 of Byrd [3], the sequence fxk + hk 1
is (i-step) Q-superlinearly convergent.
Our result applies to our particular choices of sk and Yk. However, other choices are
also possible. For example we can choose sk ZkT(xk+l --H xk) and Yk ZkT[VxL(xk +
hk, Ak) --H VxL(Xk, Ak)], as suggested by Coleman and Conn [5], and it is easy to prove
that all the above results are also valid for this modification (provided the algorithm is
changed by putting step 4 before step 2).
Finally, we note that Coleman [4] suggests a slight generalization of the Byrd-
Schnabel algorithm: in step 3 let
Sk = TkT(Bk - Ck)Tk + Ck,			(17)
where Ck is symmetric but otherwise arbitrary. It is easy to show that, if fCkJ is
bounded, i.e., liCkli < tc?, k = 1, 2,..., for some ?? > 0, then the algorithm is still
lo?ally 2-step Q-superlinearly convergent.
11
References
[1] C. G. Broyden, J. E. Dennis, and J. J. Mor6. On the local and superlinear conver-
gence of quasi-Newton methods. J. Inst. Math. Appi., 12:223--H245, 1973.
[2] R. Byrd and R. Schnabel. Continuity of the null space basis and constrained opti-
mization. Mathematical Programming, 35:32--H41,1986.
[3]
R. H. Byrd. On the convergence of constrained optimization methods with accurate
Hessian information on a subspace. SlAM Journal on Numerical Analysis, 27:141--H
153, 1990.
[4] T. F. Coleman. On characterizations of superlinear convergence for constrained
optimization. In Eugene L. Ailgower and Kurt Georg, editors, Computational So-
lution of Nonlinear Systems of Equations (Lectures in Applied Mathematics, v. 26),
pages 113--H133, Providence, Rhode Island, 1990. American Mathematical Society.
[5]
T. F. Coleman and A. R. Conn. On the local convergence of a quasi-Newton method
for the nonlinear programming problem. SlAM Journal on Numerical Analysis,
21:755--H769,1984.
[6] T. F. Coleman and D. C. Sorensen. A note on the computation of an orthonormal
basis for the null space of a matrix. Mathematical Programming, 29:234--H242,1984.
[7] J. E. Dennis and J. J. More'. A characterization of superlinear convergence and its
application to quasi-Newton methods. Math. Comp., 28:549--H560,1974.
[8] J. E. Dennis, Jr. and R. B. Schnabel. Numerical Methods for Unconstrained Opti-
m?ation and Nonlinear Equations. Prentice-Hall, Inc., 1983.
[9] J. Nocedal and M. Overton. Projected Hessian updating algorithms for nonlinearly
constrained optimization. SlAM Journal on Numerical Analysis, 22:821--H850, 1985.
12
