Project 3: Adaptive Splines
Due: 2022-04-22
A polyharmonic spline is a type of function approximator for functions from $\mathbb{R}^d$ to $\mathbb{R}$. It has the form
$$s(x) = \sum_{j=1}^m c_j \phi(\|x - \xi_j\|) + d_0 + d_1^T x,$$
where the $\xi_j$ are centers, $\phi$ is a radial basis function, and the last two terms are a low-degree polynomial tail. Common examples include cubic splines ($\phi(r) = r^3$) and thin-plate splines ($\phi(r) = r^2 \log r$).
We will touch on two optimization questions in this project. First, how do we use splines to help optimization? Second, how do we best choose the centers and coefficients that define the spline?
Code setup
In the interest of making all our lives easier, we have a fair amount of starter code for this project. You are not responsible for all of it in detail, but I certainly recommend at least becoming familiar with the interfaces!
Basis functions and evaluation
As a point of comparison, we will want to look at the ordinary spline procedure (where there are as many centers as there are data points, and data is given at the centers). To start with, we need the radial basis function $\phi$ and its derivatives.
Hϕ (generic function with 1 method)
When evaluating the spline, it will be convenient to be able to refer to the centers and the coefficients together.
spline_eval (generic function with 1 method)
spline_eval (generic function with 2 methods)
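For concreteness, here is a minimal sketch of what such an evaluator might look like, assuming a cubic kernel $\phi(r) = r^3$, centers stored as columns of a matrix, and a constant-plus-linear polynomial tail. This is not the starter code, just the idea in its simplest form.

```julia
using LinearAlgebra

# Cubic polyharmonic basis ϕ(r) = r^3 (an assumption; the starter code may use a
# different kernel) together with a simple evaluator.
ϕ(r) = r^3

# Evaluate s(x) = Σ_j c[j] ϕ(‖x - xc[:,j]‖) + d[1] + d[2:end]'x at a single point x,
# where xc is a d×m matrix of centers and (c, d) are the fitted coefficients.
function spline_eval_sketch(x, xc, c, d)
    s = d[1] + dot(d[2:end], x)
    for j in 1:size(xc, 2)
        s += c[j] * ϕ(norm(x - xc[:, j]))
    end
    s
end
```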
We are frequently going to care about the residual on our test function (over many evaluation points), so we also provide small helpers for that computation and for plotting the spline and the residual.
plot_spline (generic function with 1 method)
Fitting the spline
We consider two methods for fitting the spline. For both, we need to compute two matrices:
- The kernel matrix $K$, with entries $K_{ij} = \phi(\|x_i - \xi_j\|)$, where the $x_i$ are the data points and the $\xi_j$ are the centers.
- The polynomial part $\Pi$, consisting of rows of the form $\begin{bmatrix} 1 & x_i^T \end{bmatrix}$.
spline_Π (generic function with 1 method)
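As a sketch (with hypothetical helper names, and again assuming the cubic kernel), the two matrices might be assembled like this:

```julia
using LinearAlgebra

# Kernel matrix K with K[i,j] = ϕ(‖x_i - ξ_j‖), where the data points x_i and the
# centers ξ_j are stored as columns of X (d×n) and Ξ (d×m), respectively.
spline_K_sketch(X, Ξ, ϕ=r->r^3) =
    [ϕ(norm(X[:,i] - Ξ[:,j])) for i in 1:size(X,2), j in 1:size(Ξ,2)]

# Polynomial part Π with rows [1  x_i'] (constant plus linear terms).
spline_Π_sketch(X) = [ones(size(X,2)) X']
```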
Later, we will also need the Jacobian of the kernel matrix $K$ with respect to the center locations.
Dspline_K (generic function with 1 method)
Standard spline fitting
In the standard scheme, we have one center at each data point. Fitting the spline involves solving the system
$$\begin{bmatrix} K & \Pi \\ \Pi^T & 0 \end{bmatrix} \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix}.$$
The first $n$ equations enforce interpolation at the data points; the remaining equations are the discrete moment conditions $\Pi^T c = 0$.
spline_fit (generic function with 1 method)
spline_fit (generic function with 2 methods)
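To make the bookkeeping concrete, here is a rough sketch of the solve given precomputed $K$, $\Pi$, and data values $y$; it is not the provided implementation, which handles the setup and structure more carefully.

```julia
# Solve the standard interpolation system given precomputed K (n×n), Π (n×(d+1)),
# and data values y (length n); returns the coefficient vectors c and d.
function spline_fit_sketch(K, Π, y)
    n, p = size(Π)
    A = [K Π; Π' zeros(p, p)]   # saddle-point system
    b = [y; zeros(p)]
    cd = A \ b
    cd[1:n], cd[n+1:end]
end
```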
Least squares fitting
When the number of data points is large, we may want to use a smaller number of centers as the basis for our approximation. That is, we consider an approximator of the form
$$s(x) = \sum_{j=1}^m c_j \phi(\|x - \xi_j\|) + d_0 + d_1^T x,$$
where the locations $\xi_j$ of the $m$ centers need not coincide with the data points. The coefficients are then determined by a linear least squares problem rather than a square linear solve.
spline_fit (generic function with 3 methods)
spline_fit (generic function with 4 methods)
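For the least squares variant, one rough sketch follows; the Tikhonov regularization term on $c$ and its size are assumptions on my part, not necessarily what the starter code does.

```julia
using LinearAlgebra

# Least squares fit with m centers (K is n×m, Π is n×(d+1)): minimize
# ‖[K Π][c; d] - y‖² + λ²‖c‖² for a small regularization parameter λ.
function spline_fit_ls_sketch(K, Π, y; λ=1e-8)
    n, m = size(K)
    p = size(Π, 2)
    A = [K Π; λ*Matrix(I, m, m) zeros(m, p)]   # stacked regularized system
    b = [y; zeros(m)]
    cd = A \ b                                  # dense least squares solve
    cd[1:m], cd[m+1:end]
end
```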
Sampling strategies
Splines are fit to sampled data, so we want a number of ways to draw samples. Our example is 2D, so we will stick to 2D samplers.
We start with samplers on a regular mesh of points, using either uniformly spaced or Chebyshev nodes in each coordinate.
meshgrid (generic function with 1 method)
meshgrid_uniform (generic function with 1 method)
meshgrid_cheb (generic function with 1 method)
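As an illustration of the two mesh samplers, a compact sketch might look like the following; the names and the $[a,b]^2$ domain parameterization are placeholders, not the provided interfaces.

```julia
# Build a 2×(nx*ny) matrix whose columns are all pairs of the given node sets.
function meshgrid_sketch(xs, ys)
    [repeat(xs, inner=length(ys))'; repeat(ys, outer=length(xs))']
end

# Uniformly spaced nodes on [a,b] in each coordinate.
meshgrid_uniform_sketch(n, a, b) =
    meshgrid_sketch(range(a, b, length=n), range(a, b, length=n))

# Chebyshev (first-kind) nodes on [a,b] in each coordinate.
cheb_points(n, a, b) = [(a+b)/2 + (b-a)/2*cos(π*(2k-1)/(2n)) for k in 1:n]
meshgrid_cheb_sketch(n, a, b) =
    meshgrid_sketch(cheb_points(n, a, b), cheb_points(n, a, b))
```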
In higher-dimensional spaces, it is tempting to choose random samples. But taking independent uniform draws is not an especially effective way of covering a space – random numbers tend to clump up. For this reason, low discrepancy sequences are often a better basis for sampling than (pseudo)random draws. There are many such generators; we use a relatively simple one, an additive recurrence whose multiplier comes from the "generalized golden ratio".
kronecker_quasirand (generic function with 2 methods)
We will provide both random and quasi-random samplers.
qrand2d (generic function with 1 method)
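For reference, here is a sketch of the 2D additive-recurrence generator described above, using the "generalized golden ratio" (the plastic number, the real root of $x^3 = x + 1$); the provided kronecker_quasirand may differ in details such as the seed offset.

```julia
# Sketch of a 2D Kronecker (additive recurrence) sequence; points lie in [0,1]².
function kronecker_quasirand_sketch(n)
    ϕ2 = 1.324717957244746           # real root of x^3 = x + 1
    α = [1/ϕ2, 1/ϕ2^2]               # irrational step vector
    X = zeros(2, n)
    for k in 1:n
        X[:, k] = mod.(0.5 .+ k .* α, 1.0)   # offset 0.5 is an arbitrary seed choice
    end
    X
end
```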
Optimizers
Most of the time, if you are using standard optimizers like Levenberg-Marquardt or Newton, you should use someone else's code and save your ingenuity for problem formulation and initial guesses. In this spirit, we provide two solvers that we will see in class in a couple of weeks: a Levenberg-Marquardt implementation and a Newton implementation.
We will see the Levenberg-Marquardt algorithm for nonlinear least squares as part of our regular lecture material, but we also include it here. This version uses logic for dynamically scaling the damping parameter (courtesy Nocedal and Wright).
levenberg_marquardt (generic function with 1 method)
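You should use the provided solver; purely to fix ideas, a stripped-down Levenberg-Marquardt loop, with a much cruder damping update than the Nocedal and Wright scaling mentioned above, looks roughly like this:

```julia
using LinearAlgebra

# Minimal Levenberg-Marquardt sketch: resid(x) returns the residual vector r(x),
# jac(x) its Jacobian.  The gain-ratio test and the factor-of-two damping update
# are simplifications of what the provided solver does.
function lm_sketch(resid, jac, x0; λ=1e-3, maxiter=100, gtol=1e-8)
    x = copy(x0)
    for _ in 1:maxiter
        r, J = resid(x), jac(x)
        g = J'*r
        norm(g) < gtol && break
        p = -((J'*J + λ*I) \ g)                    # damped Gauss-Newton step
        pred = sum(abs2, r) - sum(abs2, r + J*p)   # predicted decrease in ‖r‖²
        actl = sum(abs2, r) - sum(abs2, resid(x + p))
        ρ = actl / pred                            # gain ratio
        λ = ρ > 0.75 ? λ/2 : (ρ < 0.25 ? 2λ : λ)   # adjust damping
        if ρ > 0                                   # accept only improving steps
            x += p
        end
    end
    x
end
```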
The trust-region Newton solver optimizes a quadratic model over some region in which it is deemed trustworthy. The model does not have to be convex!
solve_tr (generic function with 1 method)
tr_newton (generic function with 1 method)
Incremental QR and forward selection
The forward selection algorithm is a factor selection process for least squares problems in which factors are added to the model greedily, at each step choosing the factor that most reduces the residual.
For fast implementation, it will be convenient for us to be able to extend a QR factorization by adding additional columns to an existing factorization. This is easy enough to do in practice, but requires a little poking around in the guts of the Julia QR code.
windowed_qr (generic function with 1 method)
windowed_qr (generic function with 2 methods)
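Setting the Julia internals aside, the linear algebra behind extending a thin QR factorization by new columns is just a block Gram-Schmidt step; the following sketch illustrates the idea (it is not the provided windowed_qr).

```julia
using LinearAlgebra

# Given a thin QR factorization A1 = Q1*R1 and new columns A2, extend to a thin QR
# factorization of [A1 A2] by one block Gram-Schmidt step.  (A second
# orthogonalization pass would improve robustness; this is only a sketch.)
function qr_extend_sketch(Q1, R1, A2)
    R12 = Q1' * A2                 # components of A2 in range(Q1)
    F   = qr(A2 - Q1 * R12)        # orthogonalize the remainder
    Q2  = Matrix(F.Q)              # thin Q factor for the new block
    Q   = [Q1 Q2]
    R   = [R1 R12; zeros(size(R12, 2), size(R1, 2)) F.R]
    Q, R
end
```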
We will also want to compute residuals for many of our algorithms.
resid_ls! (generic function with 1 method)
resid_ls! (generic function with 2 methods)
solve_ls! (generic function with 1 method)
forward_selection (generic function with 1 method)
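To make the greedy process concrete, here is a deliberately naive sketch that refits from scratch for every candidate at every step; the provided implementation computes the same thing far more efficiently by extending a QR factorization incrementally.

```julia
using LinearAlgebra

# Naive forward selection sketch: starting from the columns in `selected`, greedily
# add `nadd` more columns of A, at each step choosing the candidate that most
# reduces the least squares residual for A[:,idx]*x ≈ b.
function forward_selection_sketch(A, b, selected, nadd)
    idx = collect(selected)
    for _ in 1:nadd
        best_j, best_res = 0, Inf
        for j in 1:size(A, 2)
            j in idx && continue
            trial = [idx; j]
            res = norm(b - A[:, trial] * (A[:, trial] \ b))
            if res < best_res
                best_j, best_res = j, res
            end
        end
        push!(idx, best_j)
    end
    idx
end
```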
Some sanity checks on a random test case (keep first 5, choose 5 more):
selected = [1, 2, 3, 4, 5, 11, 12, 20, 17, 6]
Does A[:,I] = QR? Relerr 3.733265243763775e-16
Does r = b-A[:,I]*x? Relerr 5.105187043779148e-16
Does A[:,I]'*r = 0? Relerr 1.2869532583151618e-17
Test function
The Himmelblau function is a standard multi-modal test function used to explore the properties of optimization methods. We will take the log of the Himmelblau function (shifted up a little in order to avoid log-of-zero issues) as our running example.
The Himmelblau function has a local maximum at approximately $(-0.27, -0.92)$ and four local minima, at each of which the function value is zero.
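For reference, the Himmelblau function and a log transform like the one described above (the exact shift constant used in the notebook is an assumption here):

```julia
# Himmelblau's function and a log-transformed, shifted version of it.
himmelblau(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2
himmelblau(x) = himmelblau(x[1], x[2])

# Shift up by 1 before taking the log to avoid log(0) at the minimizers
# (the notebook's shift constant may differ).
log_himmelblau(x) = log(1 + himmelblau(x))
```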
Tasks
1. Express the Himmelblau function as a nonlinear least squares problem and use the provided Levenberg-Marquardt solver to find the indicated minimum from the given starting guess. Use the monitor function to produce a semilog convergence plot.
2. Run ten steps of an (unguarded) Newton solver for the Himmelblau function from the same starting guess. It may save you some algebra to remember the relation between the Hessian of a nonlinear least squares function and the Gauss-Newton approximation. What does the iteration converge to? Plot convergence (gradient norm vs. iteration) again.
3. Run the same experiment with the Newton solver, but this time using the trust region version that is provided.
Optimizing on a surrogate
Let's now fit a spline surrogate to the (log-transformed) Himmelblau function at 50 sample points drawn from our quasi-random number generator.
Does the spline have minima close to the minima of the true function? The answer depends on how "nice" the minimizers are and how small the approximation error is. Concretely, suppose $x^*$ is a strong local minimizer of $f$ with smallest Hessian eigenvalue $\sigma > 0$, and that the spline error satisfies $|s(x) - f(x)| \leq \epsilon$ near $x^*$. Then comparing $s$ to $f$ on a small ball about $x^*$ shows that $s$ has a local minimizer within a distance on the order of $\sqrt{\epsilon/\sigma}$ of $x^*$.
(We can get better bounds if we can control the error in the derivative approximation, but we will leave this alone in the current assignment.)
Putting aside theoretical considerations for a moment, let's try to find the minimizer (or at least a local minimizer) of the spline approximation by brute force: evaluate the spline on a coarse mesh and keep the best point.
Best point found at 3.473684210526316, -0.9473684210526315
Tasks
1. Let $x^*$ be a strong local minimizer of $f$, where $f$ is twice continuously differentiable, the Hessian $H_f$ has Lipschitz constant $M$ in the operator 2-norm, and the smallest eigenvalue of $H_f(x^*)$ is $\sigma > 0$. Using a Taylor expansion about $x^*$, show that a small enough approximation error $\epsilon$ is sufficient to guarantee that the spline has a local minimizer near $x^*$, and quantify "near" in terms of $\epsilon$, $\sigma$, and $M$.
2. Compute the smallest singular value of the Hessian of the log-transformed Himmelblau function at the minimizer nearest the point found above. Also compute the approximation error in the spline fit at that point, use it as an approximate $\epsilon$, and plug it into the estimate from the first task.
3. Take ten steps of Newton to find the minimum of the spline close to the point found above. Plot the gradient norm on a semilog scale to verify quadratic convergence. What happens with the other suggested starting guess?
For the first task, you may use without proof the standard Taylor bound that follows from Lipschitz continuity of the Hessian: $|f(x) - f(x^*) - \frac{1}{2}(x - x^*)^T H_f(x^*)(x - x^*)| \leq \frac{M}{6}\|x - x^*\|^3$.
Non-adaptive sampling strategies
If we want to use a spline as a surrogate for optimization, we need it to be sufficiently accurate, where what is "sufficient" is determined by the behavior of the Hessian near the desired minima. For the rest of this assignment, we focus on making more accurate splines.
We start by defining a highly refined "ground truth" mesh of about 4000 points at which we tabulate the test function; we will use it to measure the RMS error of our fits.
(Truncated display of the test function values on the ground-truth mesh.)
We would like to choose a sample set that is going to minimize the RMS error. We will talk about adapting the sample later; for now, let's consider different ways that we could lay out a set of about 50 sample points:
- We could choose a 7-by-7 mesh (uniform or Chebyshev)
- We could choose 50 points at random
- We could choose 50 points from a low-discrepancy sequence (quasi-random)
Let's set up an experimental framework to compare the RMS error on the ground truth mesh for each of these approaches.
test_sample (generic function with 1 method)
test_sample (generic function with 2 methods)
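In outline, the comparison just fits a surrogate to each sample set and measures RMS error on the ground-truth mesh. A generic sketch of that loop follows; the sampler, fitter, and mesh arguments are placeholders standing in for the notebook's spline routines, not their actual signatures.

```julia
# Generic sketch of the comparison: sample the target f, fit a surrogate, and report
# RMS error over a ground-truth mesh.  Here Xs is a 2×n matrix of sample points,
# fit(Xs, ys) returns a callable surrogate, and Xg is the 2×N ground-truth mesh.
function rms_error_sketch(f, fit, Xs, Xg)
    ys = [f(Xs[:, j]) for j in 1:size(Xs, 2)]
    s  = fit(Xs, ys)
    errs = [s(Xg[:, j]) - f(Xg[:, j]) for j in 1:size(Xg, 2)]
    sqrt(sum(abs2, errs) / length(errs))
end
```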
Tasks
Run experiments with the test_sample functions for both the standard spline fitting and least squares fitting against the 4000-point data set. Comment on anything you observe from your experiments. Feel free to play with the parameters (e.g., number of points).
Forward stepwise regression
Forward stepwise regression is a factor selection method in which we greedily choose the next factor from some set of candidates based on which candidate will most improve the solution. We have provided you with a reference implementation of the algorithm. Your task is to use this method to choose centers from a candidate set of points in order to improve your least squares fit.
Tasks
Complete the spline_forward_regression function below, then run the code with 500 points chosen from our quasi-random sequence. Run the test harness above to see how much the RMS error measure improves.
spline_forward_regression (generic function with 1 method)
Levenberg-Marquardt refinement
In the lecture notes from 4/11, we discuss the idea of variable projection algorithms for least squares. We can phrase the problem of optimizing the centers in this way; that is, we let
$$r(\xi) = y - \begin{bmatrix} K(\xi) & \Pi \end{bmatrix} \begin{bmatrix} c(\xi) \\ d(\xi) \end{bmatrix},$$
where the coefficient vector is treated as a function of the center locations: for each choice of centers, it solves the regularized least squares fitting problem from before. The regularization parameter keeps that inner solve well behaved even when centers drift close together.
To save us all some trouble, I will provide the code to compute the projected residual and its Jacobian.
spline_rproj (generic function with 1 method)
spline_Jproj (generic function with 1 method)
As usual, we need a finite difference check to prevent programming errors.
Finite difference relerr in Jacobian computation: 1.6565625055732684e-7
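The idea of such a check is worth internalizing: compare the analytic Jacobian against a directional finite difference and look at the relative error. A generic sketch, independent of the spline-specific code:

```julia
using LinearAlgebra

# Compare an analytic Jacobian J(x) against a central finite difference of r(x)
# in a random unit direction; returns the relative error between J(x)*u and the
# finite difference approximation (expect something near sqrt(machine epsilon)).
function fd_check_jacobian(r, J, x; h=1e-6)
    u = randn(length(x)); u /= norm(u)
    dr_fd = (r(x + h*u) - r(x - h*u)) / (2h)
    norm(J(x)*u - dr_fd) / norm(dr_fd)
end
```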
Tasks
Use the provided Levenberg-Marquardt solver to refine the center locations picked by the forward selection algorithm. This is a tricky optimization, and you may need to fiddle with solver parameters to get something you are happy about. As usual, you should also provide a convergence plot.
Newton refinement
Levenberg-Marquardt is popular for nonlinear least squares problems because it only requires first derivatives and often still gives very robust convergence. But the convergence behavior for this problem is disappointing, and we have all the derivatives we need if we only have the fortitude to use them.
As in the Levenberg-Marquardt case, I will not require you to write the function evaluation and derivatives.
spline_ρproj (generic function with 1 method)
Finite difference checks:
For the gradient: 7.256850871224047e-8
For the Hessian: 2.4552525270252068e-8
Tasks
Use the provided Newton solver to refine the center locations picked by one of the earlier algorithms (I recommend using the Levenberg-Marquardt output as a starting point). Give a convergence plot – do you see quadratic convergence?