Minimization of a function.
In many cases we are interested in finding the minimum of a function $f(\mathbf{x})$, where $\mathbf{x}$ is a vector of $N$ coordinates. We shall motivate the study of this problem by considering the linear problem

$$\mathbf{A}\mathbf{x} = \mathbf{b},$$

where $\mathbf{A}$ is a matrix and $\mathbf{b}$ is a vector. This is a widely known and used task. A formal solution to this problem can be written as $\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$, which requires the inversion of the matrix $\mathbf{A}$. Inversion of a matrix can be an expensive operation if the size of the matrix is large; its cost is (in general) proportional to $N^{3}$, where $N$ is the dimensionality of the matrix. Clearly, for large matrices the computations become formidable.
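As a quick illustration (a sketch added here, not part of the original argument), the linear problem can be set up and solved with NumPy; note that `numpy.linalg.solve` factorizes the matrix rather than inverting it, but its cost still scales as $N^{3}$:

```python
import numpy as np

N = 4
rng = np.random.default_rng(0)

# An illustrative symmetric positive-definite matrix and right-hand side.
A = rng.standard_normal((N, N))
A = A.T @ A + N * np.eye(N)
b = rng.standard_normal(N)

# "Formal" solution via explicit inversion: x = A^{-1} b.
x_inv = np.linalg.inv(A) @ b

# Direct solve (LU factorization); also O(N^3), but avoids forming A^{-1}.
x_solve = np.linalg.solve(A, b)

print(np.allclose(x_inv, x_solve))  # True
```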
It is possible to rewrite the above linear problem as a minimization problem in which we seek the minimum of the function

$$f(\mathbf{x}) = \tfrac{1}{2}\,\mathbf{x}^{T}\mathbf{A}\mathbf{x} - \mathbf{b}^{T}\mathbf{x}$$

(taking $\mathbf{A}$ to be symmetric and positive definite). A minimum of $f$ occurs when $\nabla f(\mathbf{x}) = \mathbf{A}\mathbf{x} - \mathbf{b} = 0$, which is exactly what we are looking for. The gradient of $f$, $\nabla f$, is the vector of derivatives of $f$ with respect to all the components of $\mathbf{x}$, $\nabla f = \left(\partial f/\partial x_{1}, \ldots, \partial f/\partial x_{N}\right)$; it defines the direction of maximum change of $f$. In the simplest approach we search for a minimum along the one-dimensional line defined by the direction of the gradient.
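For completeness, a quick component-wise check (under the symmetry assumption $A_{ij} = A_{ji}$) that the gradient of this quadratic form is indeed $\mathbf{A}\mathbf{x} - \mathbf{b}$:

$$\frac{\partial f}{\partial x_{k}}
= \frac{\partial}{\partial x_{k}}\Big(\tfrac{1}{2}\sum_{i,j} x_{i} A_{ij} x_{j} - \sum_{i} b_{i} x_{i}\Big)
= \tfrac{1}{2}\sum_{j} A_{kj} x_{j} + \tfrac{1}{2}\sum_{i} x_{i} A_{ik} - b_{k}
= \sum_{j} A_{kj} x_{j} - b_{k}.$$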
If we come back to the specific example above, we have

$$\nabla f(\mathbf{x}_0) = \mathbf{A}\mathbf{x}_0 - \mathbf{b} = -\mathbf{r}_0,$$

where $\mathbf{r}_0 = \mathbf{b} - \mathbf{A}\mathbf{x}_0$ is the residual at the starting point $\mathbf{x}_0$. The new, partially optimized coordinate is searched for along this gradient (residual) direction,

$$\mathbf{x}_1 = \mathbf{x}_0 + \alpha\,\mathbf{r}_0 .$$

We need to determine the single unknown $\alpha$ such that the function $f$ will be at a minimum along the line defined by $\mathbf{x}_0$ and $\mathbf{r}_0$. At the minimum along that line the scalar product of the gradient of the function and $\mathbf{r}_0$ is zero. We therefore have

$$\nabla f(\mathbf{x}_1)^{T}\mathbf{r}_0 = 0
\quad\Longrightarrow\quad
\alpha = \frac{\mathbf{r}_0^{T}\mathbf{r}_0}{\mathbf{r}_0^{T}\mathbf{A}\,\mathbf{r}_0}.$$
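A minimal sketch of this single line minimization (an illustration added here, assuming NumPy and the residual notation introduced above):

```python
import numpy as np

def steepest_descent_step(A, b, x0):
    """One exact line minimization of f(x) = 1/2 x^T A x - b^T x along the residual."""
    r0 = b - A @ x0                      # r0 = -grad f(x0)
    alpha = (r0 @ r0) / (r0 @ (A @ r0))  # exact step length along r0
    return x0 + alpha * r0, r0

# Tiny illustrative system.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x1, r0 = steepest_descent_step(A, b, np.zeros(2))
print((b - A @ x1) @ r0)  # ~0: the new gradient is orthogonal to the old direction
```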
Exactly the same procedure can be repeated at the new point $\mathbf{x}_1$. We will have a new residual $\mathbf{r}_1 = \mathbf{b} - \mathbf{A}\mathbf{x}_1 = -\nabla f(\mathbf{x}_1)$ and can search for a new $\alpha$ such that $f$ is a minimum along the line defined by $\mathbf{x}_1$ and $\mathbf{r}_1$. Such a search, defined by the local gradient, is referred to as a steepest descent search. This search is not efficient, since the minimization along the $\mathbf{r}_1$ direction may bring us to a new point with a nonzero component of the gradient along $\mathbf{r}_0$. This means that at some point we will have to go back and minimize along $\mathbf{r}_0$ again. It would be nice if we could set the search directions in such a way that once we minimize along a direction, we are never required to minimize along that direction again. If we had such a wonderful algorithm, a system of $N$ dimensions would reach the (global) minimum after $N$ line minimizations. The computational cost of the above problem is then that of operating with the matrix on a coordinate vector $N$ times. If the matrix is sparse, this can be far more efficient than matrix inversion.
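To make the inefficiency concrete, here is an illustrative sketch (not part of the original text) that simply repeats the steepest-descent step until the residual is small; on an ill-conditioned matrix the number of line minimizations greatly exceeds $N$:

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=100_000):
    """Repeated exact line minimizations along the local (negative) gradient."""
    x = x0.copy()
    for k in range(max_iter):
        r = b - A @ x                    # residual = -gradient
        if np.linalg.norm(r) < tol:
            return x, k
        alpha = (r @ r) / (r @ (A @ r))  # exact step length along r
        x = x + alpha * r
    return x, max_iter

# Illustrative ill-conditioned 2x2 system.
A = np.diag([1.0, 100.0])
b = np.array([1.0, 1.0])
x, iters = steepest_descent(A, b, np.zeros(2))
print(iters)  # far more than the N = 2 line minimizations an ideal method would need
```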
The Conjugate Gradient (CG) algorithm does exactly that for a quadratic system, and the system we have above is indeed quadratic. There is no such guarantee for systems that are not quadratic in the variables; however, close to a minimum any smooth function is approximately quadratic, and CG tends to work quite well in general.
How does CG do it?
Rather than minimizing along $\mathbf{r}_1$, we minimize along a new direction $\mathbf{d}_1$. The direction $\mathbf{d}_1$ is set in such a way that at the minimum along this line the function is minimized with respect to the previous direction $\mathbf{r}_0$ as well. $\mathbf{d}_1$ is determined as a mix of the previous direction and the current (negative) gradient,

$$\mathbf{d}_1 = \mathbf{r}_1 + \beta\,\mathbf{r}_0 .$$

The unknown parameter $\beta$ is determined from the requirement that the gradient of the function at the next point $\mathbf{x}_2$ be orthogonal to $\mathbf{r}_0$. Hence we have two unknowns, $\alpha_1$ and $\beta$, and two conditions to satisfy: $\nabla f(\mathbf{x}_2)^{T}\mathbf{r}_0 = 0$ and $\nabla f(\mathbf{x}_2)^{T}\mathbf{d}_1 = 0$. Solving for these conditions we obtain

$$\alpha_1 = \frac{\mathbf{r}_1^{T}\mathbf{r}_1}{\mathbf{d}_1^{T}\mathbf{A}\,\mathbf{d}_1}
\qquad\text{and}\qquad
\beta = \frac{\mathbf{r}_1^{T}\mathbf{r}_1}{\mathbf{r}_0^{T}\mathbf{r}_0}.$$
The next point, the minimum along this new line, is

$$\mathbf{x}_2 = \mathbf{x}_1 + \alpha_1\,\mathbf{d}_1 ,$$

and so on…
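Putting the pieces together, here is a minimal conjugate-gradient sketch for the quadratic problem (an illustration added here; the recurrences are the standard CG updates, equivalent to the formulas above for a symmetric positive-definite $\mathbf{A}$):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    """Conjugate gradient for f(x) = 1/2 x^T A x - b^T x, A symmetric positive definite."""
    x = x0.copy()
    r = b - A @ x                        # residual = -gradient
    d = r.copy()                         # first direction: steepest descent
    for _ in range(len(b)):              # at most N line minimizations in exact arithmetic
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)       # exact line minimization along d
        x = x + alpha * d
        r_new = r - alpha * Ad           # updated residual
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r) # mixes the old direction into the new one
        d = r_new + beta * d             # new direction, conjugate to the previous ones
        r = r_new
    return x

A = np.diag([1.0, 100.0])
b = np.array([1.0, 1.0])
x = conjugate_gradient(A, b, np.zeros(2))
print(np.allclose(A @ x, b))  # True after at most N = 2 steps
```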