Minimization of a function.

 

In many cases we are interested in finding the minimum of a function $f(\mathbf{x})$, where $\mathbf{x}$ is a vector of coordinates. We shall motivate the study of this problem by considering the linear problem:

 

$$A\mathbf{x} = \mathbf{b}$$

 

where $A$ is a matrix and $\mathbf{b}$ is a vector. This is a widely known and used task. A formal solution to this problem can be written as $\mathbf{x} = A^{-1}\mathbf{b}$, which requires the inversion of the matrix $A$. Inversion of a matrix can be an expensive operation if the size of the matrix is large; its cost is (in general) proportional to $N^{3}$, where $N$ is the dimensionality of the matrix.

Clearly, for large matrices the computations become formidable.
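For a sense of scale, the formal solution can be computed with a dense direct solver, whose cost grows roughly as $N^{3}$. A minimal sketch in Python/NumPy follows (the small random test system here is chosen purely for illustration and is not part of the original text):

```python
import numpy as np

# A small symmetric positive-definite example system A x = b
# (the particular numbers are arbitrary, chosen only for illustration).
N = 4
rng = np.random.default_rng(0)
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)        # symmetric positive definite
b = rng.standard_normal(N)

# "Formal" solution x = A^{-1} b via a dense direct solve (O(N^3) work).
x_direct = np.linalg.solve(A, b)
print(np.allclose(A @ x_direct, b))   # expect True
```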

 

It is possible to rewrite the above linear problem as a minimization problem in which we seek the minimum of the function

$$f(\mathbf{x}) = \tfrac{1}{2}\,\mathbf{x}^{T} A\,\mathbf{x} - \mathbf{b}^{T}\mathbf{x} .$$

A minimum of $f$ occurs where the gradient vanishes, $\nabla f(\mathbf{x}) = A\mathbf{x} - \mathbf{b} = 0$ (taking $A$ symmetric), which is exactly the linear system we are looking to solve. The gradient of $f$, $\nabla f$, is the vector of derivatives of $f$ with respect to all the components of $\mathbf{x}$, $\nabla f = (\partial f/\partial x_1, \ldots, \partial f/\partial x_N)$; it defines the direction of maximum change of $f$. In the simplest approach we perform a search for a minimum along the single dimension defined by the direction of $\nabla f$.
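As a quick numerical sanity check (a sketch with an illustrative random $A$ and $\mathbf{b}$, not part of the original text), one can verify that the gradient of $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^{T}A\mathbf{x} - \mathbf{b}^{T}\mathbf{x}$ is indeed $A\mathbf{x} - \mathbf{b}$ by comparing it with a finite-difference derivative:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)        # symmetric positive definite
b = rng.standard_normal(N)

def f(x):
    """Quadratic form f(x) = 1/2 x^T A x - b^T x."""
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    """Analytic gradient of f: A x - b."""
    return A @ x - b

# Compare the analytic gradient with central finite differences.
x0 = rng.standard_normal(N)
eps = 1e-6
fd = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
               for e in np.eye(N)])
print(np.allclose(fd, grad_f(x0), atol=1e-5))   # expect True
```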

 

If we come back to the specific example above, we have

$$\nabla f(\mathbf{x}_0) \equiv \mathbf{g}_0 = A\mathbf{x}_0 - \mathbf{b} .$$

The new, partially optimized coordinate is searched for along the $\mathbf{g}_0$ direction,

$$\mathbf{x}_1 = \mathbf{x}_0 - \lambda_0\,\mathbf{g}_0 .$$

We need to determine the single unknown $\lambda_0$ such that the function $f(\mathbf{x}_1)$ will be at a minimum along the line defined by $\mathbf{x}_0$ and $\mathbf{g}_0$.

At the minimum along that line the scalar product of the gradient of the function and the search direction $\mathbf{g}_0$ is zero. Using $\nabla f(\mathbf{x}_1) = A\mathbf{x}_1 - \mathbf{b} = \mathbf{g}_0 - \lambda_0 A\,\mathbf{g}_0$, we therefore have

$$\nabla f(\mathbf{x}_1)\cdot\mathbf{g}_0 = (\mathbf{g}_0 - \lambda_0 A\,\mathbf{g}_0)\cdot\mathbf{g}_0 = 0
\qquad\Longrightarrow\qquad
\lambda_0 = \frac{\mathbf{g}_0\cdot\mathbf{g}_0}{\mathbf{g}_0\cdot A\,\mathbf{g}_0} .$$

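This orthogonality can be checked numerically. The short sketch below (using an illustrative random symmetric positive definite system that is not part of the original text) performs one exact line minimization and verifies that the new gradient is perpendicular to the old search direction:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)        # symmetric positive definite
b = rng.standard_normal(N)

x0 = np.zeros(N)
g0 = A @ x0 - b                    # gradient at x0
lam0 = (g0 @ g0) / (g0 @ A @ g0)   # exact line-minimization step length
x1 = x0 - lam0 * g0                # move to the minimum along -g0

g1 = A @ x1 - b                    # gradient at the new point
print(abs(g1 @ g0) < 1e-8)         # new gradient orthogonal to old direction
```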
Exactly the same procedure can be repeated at the new point $\mathbf{x}_1$. We compute $\mathbf{g}_1 = \nabla f(\mathbf{x}_1)$ and search for $\lambda_1$ such that $f(\mathbf{x}_1 - \lambda_1\mathbf{g}_1)$ is a minimum along the line defined by $\mathbf{x}_1$ and $\mathbf{g}_1$. Such a search, defined by the local gradient, is referred to as a steepest descent search. This search is not efficient, since the minimization along the $\mathbf{g}_1$ direction may bring us to a new point where the gradient again has a component along $\mathbf{g}_0$. This means that at some point we will have to go back and minimize along $\mathbf{g}_0$ once more.

It would be nice if we could set the search directions in such a way that once we have minimized along a direction, we are never required to minimize along that direction again. With such a wonderful algorithm, a system of $N$ dimensions would reach the minimum after $N$ line minimizations. The computational cost is then that of operating with the matrix on a coordinate vector $N$ times. If the matrix is sparse, this can be far cheaper than matrix inversion.
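A minimal steepest descent loop for the quadratic problem might look as follows (a sketch only; the function name `steepest_descent`, the tolerance, and the random test system are illustrative assumptions, and $A$ is assumed symmetric positive definite):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Minimize f(x) = 1/2 x^T A x - b^T x (A symmetric positive definite)
    by repeated exact line minimizations along the local gradient."""
    x = x0.copy()
    for _ in range(max_iter):
        g = A @ x - b                    # local gradient
        if np.linalg.norm(g) < tol:
            break
        lam = (g @ g) / (g @ A @ g)      # exact step length along -g
        x = x - lam * g
    return x

# Illustrative random test system.
rng = np.random.default_rng(2)
N = 20
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)
b = rng.standard_normal(N)

x = steepest_descent(A, b, np.zeros(N))
print(np.allclose(A @ x, b, atol=1e-6))   # expect True
```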

 

The Conjugate Gradient (CG) algorithm does exactly that for a quadratic system, and the function we constructed above is indeed quadratic. There is no such guarantee for functions that are not quadratic in the variables; however, close to a minimum any smooth function is approximately quadratic, and CG tends to work quite well in general.

 

How does CG do it?

 

Rather than minimizing along $\mathbf{g}_1$, we minimize along a new direction $\mathbf{h}_1$. The direction $\mathbf{h}_1$ is set in such a way that at the minimum along the line defined by $\mathbf{x}_1$ and $\mathbf{h}_1$ the function is minimized with respect to the previous direction $\mathbf{h}_0 = \mathbf{g}_0$ as well. $\mathbf{h}_1$ is constructed as a mix of the previous direction and the current gradient, $\mathbf{h}_1 = \mathbf{g}_1 + \gamma_0\,\mathbf{h}_0$. The unknown parameter $\gamma_0$ is determined from the requirement that the gradient of the function at the new point $\mathbf{x}_2 = \mathbf{x}_1 - \lambda_1\,\mathbf{h}_1$ be orthogonal to $\mathbf{h}_0$ as well as to $\mathbf{h}_1$. Hence we have two unknowns, $\gamma_0$ and $\lambda_1$, and two conditions to satisfy, $\nabla f(\mathbf{x}_2)\cdot\mathbf{h}_1 = 0$ and $\nabla f(\mathbf{x}_2)\cdot\mathbf{h}_0 = 0$. Solving these conditions we obtain

$$\gamma_0 = \frac{\mathbf{g}_1\cdot\mathbf{g}_1}{\mathbf{g}_0\cdot\mathbf{g}_0}
\qquad\text{and}\qquad
\lambda_1 = \frac{\mathbf{g}_1\cdot\mathbf{h}_1}{\mathbf{h}_1\cdot A\,\mathbf{h}_1} .$$

The next point is

$$\mathbf{x}_2 = \mathbf{x}_1 - \lambda_1\,\mathbf{h}_1 ,$$

and so on…
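Below is a compact sketch of the resulting CG iteration, written with the conventions used here ($\mathbf{g}$ for the gradient, $\mathbf{h}$ for the search direction) and the Fletcher–Reeves form of the mixing coefficient $\gamma$; the function name and the random test system are illustrative, and $A$ is assumed symmetric positive definite:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    """Solve A x = b (A symmetric positive definite) by minimizing
    f(x) = 1/2 x^T A x - b^T x along successive conjugate directions."""
    x = x0.copy()
    g = A @ x - b                          # gradient at the current point
    h = g.copy()                           # first direction: steepest descent
    for _ in range(len(b)):                # at most N line minimizations
        if np.linalg.norm(g) < tol:
            break
        Ah = A @ h
        lam = (g @ h) / (h @ Ah)           # exact line minimization along -h
        x = x - lam * h
        g_new = A @ x - b                  # gradient at the new point
        gamma = (g_new @ g_new) / (g @ g)  # mixing coefficient (Fletcher-Reeves)
        h = g_new + gamma * h              # next conjugate direction
        g = g_new
    return x

# Illustrative random test system.
rng = np.random.default_rng(3)
N = 50
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)
b = rng.standard_normal(N)

x = conjugate_gradient(A, b, np.zeros(N))
print(np.allclose(A @ x, b, atol=1e-6))   # converges within N steps
```

Note that each iteration costs essentially one matrix–vector product, which is why CG is attractive for large, sparse systems.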
