Proj 1: Harmonious learning
There are many problems that involve optimizing some objective function by making local adjustments to a structure or graph. For example:

- If we want to reinforce a truss with a limited budget, where should we add new beams (or strengthen old ones)?
- After a failure in the power grid, how should lines be either taken out of service or put in service to ensure no other lines are overloaded?
- In a road network, how will road closures or rate-limiting of on-ramps affect congestion (for better or worse)?
- In a social network, which edges are most critical to spreading information or influence to a target audience?
For our project, we will consider a simple method for graph interpolation. We are given a (possibly weighted) undirected graph on $n$ nodes, together with function values assigned at a few of the nodes; the goal is to extend those values to the remaining nodes in a way that respects the graph structure.
Logistics
You are encouraged to work in pairs on this project. You should produce a short report addressing the analysis tasks, and a few short codes that address the computational tasks. You may use any Julia functions you want.
Most of the code in this project will be short, but that does not make it easy. You should be able to convince both me and your partner that your code is right. A good way to do this is to test thoroughly. Check residuals, compare cheaper or more expensive ways of computing the same thing, and generally use the computer to make sure you don't commit silly errors in algebra or coding. You will also want to make sure that you satisfy the efficiency constraints stated in the tasks.
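As a concrete illustration of the advice above, here is a minimal sketch (not part of the assignment code) of checking a residual and cross-checking a cheap computation against a more expensive one:

```julia
using LinearAlgebra

# Hypothetical sanity check: after solving A*x = b, the relative residual
# should be near machine precision for a small well-conditioned system.
A = [4.0 1.0; 1.0 3.0]
b = [1.0, 2.0]
x = A \ b
relres = norm(A*x - b) / norm(b)
@assert relres < 1e-12

# Cross-check two ways of computing the same thing: a Cholesky solve
# should agree with the direct backslash solve for this SPD matrix.
x2 = cholesky(A) \ b
@assert norm(x - x2) <= 1e-12 * norm(x)
```

Checks like these are cheap insurance against silly algebra or indexing errors.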
Background
The (combinatorial) graph Laplacian matrix occurs often when using linear algebra to analyze graphs. For an undirected graph on $n$ vertices with positive edge weights $w_{ij}$, the Laplacian $L$ has entries

$$L_{ij} = \begin{cases} \sum_k w_{ik}, & i = j, \\ -w_{ij}, & i \neq j \text{ and } (i,j) \text{ an edge}, \\ 0, & \text{otherwise.} \end{cases}$$

The unweighted case corresponds to $w_{ij} = 1$ on every edge, so that $L = D - A$ where $D$ is the diagonal matrix of node degrees and $A$ is the adjacency matrix.
In our project, we seek to solve problems of the form

$$\begin{bmatrix} I & 0 \\ L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} \gamma \\ 0 \end{bmatrix},$$

where the leading indices correspond to nodes in the graph at which values $\gamma$ are assigned, and the remaining equations enforce $(Lu)_i = 0$ at each unassigned node $i$:

$$u_i = \frac{\sum_j w_{ij} u_j}{\sum_j w_{ij}},$$

i.e. the value at each unassigned node is a weighted average of the values at its neighbors (a discrete harmonic function).
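A small sanity check of this averaging property, on a hypothetical 4-node path graph (not the assignment data):

```julia
using LinearAlgebra

# Path graph 1-2-3-4 with unit weights; fix u[1] = 0 and u[4] = 3,
# and solve (L*u)[i] = 0 at the interior nodes.
L = [ 1.0 -1.0  0.0  0.0;
     -1.0  2.0 -1.0  0.0;
      0.0 -1.0  2.0 -1.0;
      0.0  0.0 -1.0  1.0]
int = [2, 3]          # nodes with unknown values
bnd = [1, 4]          # nodes with assigned values
g = [0.0, 3.0]        # assigned values

u = zeros(4)
u[bnd] = g
u[int] = -(L[int, int] \ (L[int, bnd] * g))

# Each interior value is the average of its neighbors' values.
@assert u[2] ≈ (u[1] + u[3]) / 2
@assert u[3] ≈ (u[2] + u[4]) / 2
```

On a path, the harmonic interpolant is just linear interpolation between the endpoint values.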
Code setup
We will use the California road network data from the SNAP data set; to retrieve it, download the roadNet-CA.txt file from the class web page. The following loader function will read in the topology and form the graph Laplacian in compressed sparse column format (SparseMatrixCSC in Julia). This is a big enough network that you will not want to form the graph Laplacian or related matrices in dense form. On the other hand, because it is a moderate-sized planar graph, sparse Cholesky factorization of the relevant submatrices is still quite fast.
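To make the CSC construction concrete, here is a hypothetical sketch of forming a graph Laplacian from an edge list (this is just the idea, not the provided loader):

```julia
using SparseArrays

# Build a sparse graph Laplacian from an edge list; the sparse()
# constructor sums duplicate (i,i) entries, which accumulates degrees.
# Names here (laplacian_from_edges) are illustrative, not assignment code.
function laplacian_from_edges(n, edges, w = ones(length(edges)))
    I = Int[]; J = Int[]; V = Float64[]
    for (k, (i, j)) in enumerate(edges)
        # off-diagonal contributions -w_ij, diagonal contributions +w_ij
        push!(I, i, j, i, j)
        push!(J, j, i, i, j)
        push!(V, -w[k], -w[k], w[k], w[k])
    end
    sparse(I, J, V, n, n)
end

L = laplacian_from_edges(4, [(1,2), (2,3), (3,4)])
@assert sum(L) == 0.0                  # rows of a Laplacian sum to zero
@assert L[1,1] == 1.0 && L[2,2] == 2.0 # diagonal holds node degrees
```

Building the triplet arrays first and calling `sparse` once is much faster than inserting entries into a sparse matrix one at a time.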
load_network (generic function with 1 method)
form_laplacian (generic function with 1 method)
For the tasks in this assignment, it is useful to carry around more than just the graph Laplacian. We also want to keep track of which nodes have associated known values and what those values are. For these purposes, it is helpful to use a Julia structure that we pass around.
LabeledLaplacian
We will set this up so that we can easily add node values and adjust edge weights. We do this differently depending on whether or not we have already factored (part of) the Laplacian. If we have an existing factorization, we will keep track of the updates that we would like to apply separately, and handle them via a bordered system approach described below.
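A hypothetical sketch of this kind of structure is below; the actual LabeledLaplacian in the notebook may differ in field names and details.

```julia
using SparseArrays

# Sketch of a labeled-Laplacian container (names are illustrative).
mutable struct LabeledLaplacianSketch
    L::SparseMatrixCSC{Float64,Int}   # graph Laplacian
    values::Vector{Float64}           # assigned values (NaN = unassigned)
    F::Any                            # cached Cholesky factorization, or nothing
    new_nodes::Vector{Int}            # values assigned since last factorization
    edge_updates::Vector{Tuple{Int,Int,Float64}}  # (i, j, Δw) since last factorization
end

LabeledLaplacianSketch(L) = LabeledLaplacianSketch(
    L, fill(NaN, size(L, 1)), nothing, Int[], Tuple{Int,Int,Float64}[])

# Record a new known value.  If a factorization already exists, defer the
# work: remember the node so later solves can handle it via bordering.
function new_value_sketch!(LL::LabeledLaplacianSketch, v, i)
    LL.values[i] = v
    LL.F === nothing || push!(LL.new_nodes, i)
    LL
end

LL = LabeledLaplacianSketch(spdiagm(0 => ones(3)))
new_value_sketch!(LL, 1.5, 2)
```

Deferring updates this way is what lets the later tasks reuse one expensive factorization across many cheap solves.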
new_value! (generic function with 1 method)
new_value! (generic function with 2 methods)
update_edge! (generic function with 1 method)
update_edge! (generic function with 2 methods)
We also provide a factor! routine to compute (or re-compute) the Cholesky factorization of the Laplacian matrix. We only do the computation if there seems to be a need; if there is an existing factorization and no updates have been made (new values or edge weight adjustments), we keep the existing factorization as is.
factor! (generic function with 1 method)
Finally, we provide a residual check routine to provide some reassurance about the correctness of our solutions.
residual (generic function with 2 methods)
And we provide some helper functions for working with the LabeledLaplacian objects.
inactive (generic function with 1 method)
Task 1
For the first part of the assignment, we will improve on a naive solve! command (given below) that always forces a re-factorization.
solve! (generic function with 1 method)
Our modified version of solve! will let us adapt to new values or edge weight updates without recomputing the Cholesky factorization. We can do this by computing with a bordered linear system

$$\begin{bmatrix} A & B \\ C & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} f \\ g \end{bmatrix},$$

where $A$ is the matrix we have already factored. To enforce additional boundary conditions, we use each column of $B$ (with a matching row of $C$) to encode one new constraint, and we can write the solution by block Gaussian elimination:

$$y = (C A^{-1} B)^{-1} (C A^{-1} f - g), \qquad x = A^{-1} (f - B y).$$
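A generic block-elimination solve for a bordered system $[A\ B; C\ 0][x; y] = [f; g]$ can be sketched as follows (illustrative names, assuming $A$ is already factored):

```julia
using LinearAlgebra

# Block Gaussian elimination for the bordered system
#   [A B; C 0] [x; y] = [f; g]
# reusing an existing factorization F of A.
function bordered_solve(F, B, C, f, g)
    AinvB = F \ B                    # one solve per border column
    Ainvf = F \ f
    S = -C * AinvB                   # small (dense) Schur complement
    y = S \ (g - C * Ainvf)
    x = Ainvf - AinvB * y
    x, y
end

A = [4.0 1.0; 1.0 3.0]
B = reshape([1.0, 0.0], 2, 1)
C = reshape([0.0, 1.0], 1, 2)
f = [1.0, 2.0]; g = [0.5]
x, y = bordered_solve(cholesky(A), B, C, f, g)
@assert norm(A*x + B*y - f) < 1e-12
@assert norm(C*x - g) < 1e-12
```

The point is that each new constraint costs only a few triangular solves with the existing factor, plus a small dense solve, rather than a fresh sparse factorization.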
I split this code into two pieces: a compute_bordering function that produces B, C, and f in the system above, and the solve! function that solves the actual system by block Gaussian elimination.
Your updated code should take advantage of any existing factorization, so that solves after the first are much faster than a full refactorization.
Sanity checking
We provide a simple test case to check the correctness of our bordered system approach. This will start off correct (giving small residual values) for the naive "refactor every time" approach; you should ideally keep the residuals small while improving the speed!
test_task1 (generic function with 1 method)
Initial solve: 1.661338208 s, residual norm 1.1692050219556296e-12
New values: 1.599754459 s, residual norm 2.174866173668037e-12
Edge update: 1.587610917 s, residual norm 2.22216338865832e-12
Additional questions
- We have to assign some values before we are able to run the solver. Why can't we safely factor the full Laplacian immediately?
- The largest and smallest entries of the solution vector should always be entries where we've specified values. Why is this?
Task 2
Again using the bordered system idea from the first part, we now want to consider the problem of leave-one-out cross-validation of the assigned values at the nodes. That is, for a given node $j$ with an assigned value, we temporarily discard that value, predict $u_j$ from the remaining assigned values, and compare the prediction to the value we actually assigned.
Complete the cross_validate function below to return the difference between the predicted and assigned values at node $j$.
A useful building block will be a version of the solver code that solves systems with the same bordered matrix but a different right-hand side.
Once we have this building block, it is convenient to partition the nodes into three blocks: block 1 is the set of active nodes other than $j$, block 2 is $j$, and block 3 is the rest of .!active. We know that the full solution $u$ (with the value at $j$ assigned) and the leave-one-out solution $\hat u$ satisfy

$$\begin{bmatrix} L_{22} & L_{23} \\ L_{32} & L_{33} \end{bmatrix} \begin{bmatrix} u_2 \\ u_3 \end{bmatrix} = -\begin{bmatrix} L_{21} \\ L_{31} \end{bmatrix} u_1 + \begin{bmatrix} r_2 \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} L_{22} & L_{23} \\ L_{32} & L_{33} \end{bmatrix} \begin{bmatrix} \hat u_2 \\ \hat u_3 \end{bmatrix} = -\begin{bmatrix} L_{21} \\ L_{31} \end{bmatrix} u_1,$$

and subtracting the two equations gives

$$\begin{bmatrix} L_{22} & L_{23} \\ L_{32} & L_{33} \end{bmatrix} \begin{bmatrix} \hat u_2 - u_2 \\ \hat u_3 - u_3 \end{bmatrix} = -\begin{bmatrix} r_2 \\ 0 \end{bmatrix},$$

where $r_2 = (Lu)_j$ is the residual of the full solution at node $j$, and eliminating the third block gives the scalar system

$$(L_{22} - L_{23} L_{33}^{-1} L_{32}) (\hat u_2 - u_2) = -r_2.$$

Note that $L_{33}$ is exactly the matrix we have already factored, so the Schur complement on the left can be evaluated with the bordered solver machinery from Task 1.
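A small dense check of this leave-one-out identity, on a hypothetical 5-node path graph with values assigned at nodes 1, 3, 5 and node 3 left out:

```julia
using LinearAlgebra

n = 5
L = diagm(0 => [1.0, 2, 2, 2, 1], 1 => -ones(4), -1 => -ones(4))
vals = Dict(1 => 0.0, 3 => 2.0, 5 => 1.0)
act = sort(collect(keys(vals)))          # assigned nodes
inact = setdiff(1:n, act)                # free nodes

# Full solve with all assigned values.
u = zeros(n)
for (i, v) in vals; u[i] = v; end
u[inact] = -(L[inact, inact] \ (L[inact, act] * u[act]))

# Leave node j = 3 out and re-solve directly (the slow reference).
j = 3
act2 = setdiff(act, [j]); free2 = sort([inact; j])
uhat = zeros(n)
for i in act2; uhat[i] = vals[i]; end
uhat[free2] = -(L[free2, free2] \ (L[free2, act2] * uhat[act2]))

# Fast formula: uhat[j] - u[j] = -r_j / s, with r_j the residual of the
# full solution at j and s the Schur complement of L[inact,inact].
rj = (L * u)[j]
w = L[inact, inact] \ L[inact, j]
s = L[j, j] - dot(L[inact, j], w)
@assert abs((uhat[j] - u[j]) - (-rj / s)) < 1e-12
```

In the assignment, the solve with `L[inact, inact]` is exactly one application of the already-computed factorization, which is why the fast path avoids any refactorization.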
cross_validate (generic function with 1 method)
The cross-validation can be done in about the same time as the fast solves described before using the bordered solver approach. We give two tests to compare the fast approach against a reference computation (the second a little harder than the first). The reference version takes a few seconds on my machine (vs a fraction of a second for the fast version).
test_cross_validate1 (generic function with 1 method)
Slow computation: 1.6946128762480739
Fast computation: 0.0
Relerr: 1.0
Time (fast): 0.0
test_cross_validate2 (generic function with 1 method)
Slow computation: 1.6946128762480739
Fast computation: 0.0
Relerr: 1.0
Time (fast): 2.08e-7
Task 3
Using bordered systems lets us recompute the solution quickly after we adjust the edge weights. But what if we want to compute the sensitivity of the value at some target node to small changes to *any* of the edges? That is, for a target node $t$, we want the derivative of $u_t$ with respect to the weight of each edge in the graph, all at once.
Complete the edge_sensitivity function below to carry out this computation. Note that you should not require new factorizations if you already have one; that is, your code should ideally use the bordered system formalism to incorporate any new boundary conditions or edge updates added to the system since the last factorization.
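One standard way to get all edge sensitivities from a single extra solve is the adjoint trick: if $z$ solves $L_{33} z_3 = e_t$ (with $z = 0$ on the assigned nodes), then the sensitivity of $u_t$ to the weight of edge $(i,j)$ is $-(z_i - z_j)(u_i - u_j)$. Here is a small dense check (illustrative example, not the assignment code):

```julia
using LinearAlgebra

# Path graph 1-2-3-4; values fixed at nodes 1 and 4.
n = 4
L = [ 1.0 -1 0 0; -1 2 -1 0; 0 -1 2 -1; 0 0 -1 1]
bnd = [1, 4]; int = [2, 3]; g = [0.0, 3.0]
u = zeros(n); u[bnd] = g
u[int] = -(L[int, int] \ (L[int, bnd] * g))

# Adjoint solve: one linear system for the target node t.
t = 2
z = zeros(n)
et = zeros(length(int)); et[findfirst(==(t), int)] = 1.0
z[int] = L[int, int] \ et

# Sensitivity of u[t] to the weight of edge (i, j).
sens(i, j) = -(z[i] - z[j]) * (u[i] - u[j])

# Compare against a finite difference in the weight of edge (2, 3).
δ = 1e-6
Lp = copy(L)
Lp[2,2] += δ; Lp[3,3] += δ; Lp[2,3] -= δ; Lp[3,2] -= δ
up = zeros(n); up[bnd] = g
up[int] = -(Lp[int, int] \ (Lp[int, bnd] * g))
fd = (up[t] - u[t]) / δ
@assert abs(sens(2, 3) - fd) < 1e-5
```

The appeal of the adjoint formulation is that one solve for $z$ yields the sensitivity to every edge at once, instead of one solve per edge.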
As in task 2, you should also provide a sanity check code.
edge_sensitivity (generic function with 1 method)
test_edge_sensitivity (generic function with 1 method)
Fast sensitivity on (13,14): 27
Slow edge sensitivity on (13,14): -0.0011021940604649672
Relerr: 24497.593629446605
Elapsed time: 0.129984959
Estimated via bordered solves: 6.06597073579445e6