# CS 5220

## Applications of Parallel Computers

### Sparse direct methods

Prof David Bindel


### Reminder: Sparsity and reordering

Permute unknowns for better SpMV or for

• Stability of Gaussian elimination,
• Fill reduction in Gaussian elimination,
• Improved performance of preconditioners...

### Reminder: Sparsity and partitioning

Want to partition sparse graphs so that

• Subgraphs are same size (load balance)
• Cut size is minimal (minimize communication)

Matrices that are “almost” diagonal are good?

### Reordering for bandedness

Reverse Cuthill-McKee

• Select “peripheral” vertex $$v$$
• Order according to breadth first search from $$v$$
• Reverse ordering
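
A minimal sketch of these three steps using stock MATLAB graph tools (the built-in symrcm is the real thing; this version assumes a connected, structurally symmetric sparse A, uses a minimum-degree vertex as a cheap stand-in for a peripheral one, and skips RCM's degree-sorted tie-breaking):

    G = graph(A, 'omitselfloops');   % adjacency graph of A
    [~, v] = min(degree(G));         % cheap stand-in for a peripheral vertex
    order = bfsearch(G, v);          % breadth-first ordering from v
    p = flipud(order);               % reverse the ordering
    spy(A(p, p));                    % reordered matrix should look banded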

### From iterative to direct

• RCM ordering is great for SpMV
• But isn’t narrow banding good for solvers, too?
• LU takes $$O(nb^2)$$ where $$b$$ is bandwidth.
• Great if there’s an ordering where $$b$$ is small!
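
A quick check of the bandwidth story on the 2D model problem (delsq, numgrid, symrcm, and bandwidth are stock MATLAB; the grid size is an arbitrary choice):

    A = delsq(numgrid('S', 40));    % 5-point Laplacian on a square grid
    p = symrcm(A);                  % built-in reverse Cuthill-McKee
    [~, b0] = bandwidth(A);         % upper half-bandwidth, natural order
    [~, b1] = bandwidth(A(p, p));   % ... and after RCM
    fprintf('bandwidth: %d -> %d\n', b0, b1);
    fprintf('fill:      %d -> %d\n', nnz(chol(A)), nnz(chol(A(p, p))));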

### Skylines and profiles

• Profile solvers generalize band solvers
• Skyline storage for lower triangle: for each row $$i$$,
  • Start and end of storage for nonzeros in row.
  • Contiguous nonzero list up to main diagonal.
• In each column, first nonzero defines a profile.
• All fill-in confined to profile.
• RCM is again a good ordering.
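
A sketch of measuring the profile, assuming a sparse matrix A with nonzero diagonal (so every row has a nonzero at or before the diagonal):

    n = size(A, 1);
    first = zeros(n, 1);
    for i = 1:n
        first(i) = find(A(i, 1:i), 1);   % first nonzero column in row i
    end
    profile = sum((1:n)' - first + 1);   % total skyline storage for tril(A)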

### Beyond bandedness

• Minimum bandwidth for 2D model problem? 3D?
• Skyline storage only gets us so much further

### Beyond bandedness

But more general solvers have similar structure

• Ordering (minimize fill)
• Symbolic factorization (where will fill be?)
• Numerical factorization (pivoting?)
• ... and triangular solves
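
In the SPD case (where pivoting is unnecessary), the four phases line up with stock MATLAB calls; a sketch assuming a sparse SPD A and right-hand side b:

    p = amd(A);                % 1. ordering: approximate minimum degree
    cnt = symbfact(A(p, p));   % 2. symbolic: predicted factor nonzero counts
    R = chol(A(p, p));         % 3. numeric factorization
    x = zeros(size(b));        % 4. triangular solves (undo the permutation)
    x(p) = R \ (R' \ b(p));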

### Troublesome Trees

One step of Gaussian elimination completely fills this matrix!

### Terrific Trees

Full Gaussian elimination generates no fill in this matrix!
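
The pictures here are presumably a star graph ordered two ways; an arrowhead matrix reproduces both behaviors (the size and diagonal shift below are illustrative choices that make chol apply):

    n = 100;
    A = speye(n); A(1, 1) = n;     % shift the diagonal so A is SPD
    A(1, 2:n) = 1; A(2:n, 1) = 1;  % star center ordered FIRST
    B = A(n:-1:1, n:-1:1);         % same graph, center ordered LAST
    fprintf('fill, center first: %d\n', nnz(chol(A)));  % dense: ~n^2/2
    fprintf('fill, center last:  %d\n', nnz(chol(B)));  % no fill: 2n-1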

### Graphic Elimination

Consider the first step of GE

    A(2:end,1)     = A(2:end,1)/A(1,1);
    A(2:end,2:end) = A(2:end,2:end) - ...
                     A(2:end,1)*A(1,2:end);

Nonzero in the outer product at $$(i,j)$$ if A(i,1) and A(j,1) both nonzero — that is, if $$i$$ and $$j$$ are both connected to 1.

General: Eliminate variable, connect remaining neighbors.

### Terrific Trees Redux

Order leaves to root $$\implies$$
on eliminating $$i$$, the parent of $$i$$ is the only remaining neighbor.

### Nested Dissection

• Idea: Think of block tree structures.
• Eliminate block trees from bottom up.
• Can recursively partition at leaves.
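
A sketch using MATLAB's built-in nested dissection (dissect, available since R2017b) on the 2D model problem:

    A = delsq(numgrid('S', 60));   % 2D model problem
    p = dissect(A);                % nested dissection ordering
    spy(A(p, p));                  % recursive bordered block-diagonal structure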

### Nested Dissection

• Rough cost estimate: how much just to factor dense Schur complements associated with separators?
• Notice graph partitioning appears again!
• And again we want small separators!

### Nested Dissection

Model problem: Laplacian with 5-point stencil in 2D (7-point in 3D)

• ND gives optimal complexity in exact arithmetic (George 73, Hoffman/Martin/Rose)
• 2D: $$O(N \log N)$$ memory, $$O(N^{3/2})$$ flops
• 3D: $$O(N^{4/3})$$ memory, $$O(N^2)$$ flops
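
A quick sanity check on the 2D flop count, assuming the cost is dominated by dense factorization of separator Schur complements: level $$k$$ of the recursion has $$4^k$$ separators of size $$O(\sqrt{N}/2^k)$$, and dense factorization is cubic in the separator size, so

$$\sum_{k=0}^{O(\log N)} 4^k \, O\left(\left(\frac{\sqrt{N}}{2^k}\right)^{3}\right) = O(N^{3/2}) \sum_{k \geq 0} 2^{-k} = O(N^{3/2}).$$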

### Minimum Degree

• Locally greedy strategy
  • Want to minimize upper bound on fill-in
  • Fill $$\leq$$ (degree in remaining graph)$$^2$$
• At each step
  • Eliminate vertex with smallest degree
  • Update degrees of neighbors
• Problem: Expensive to implement!
  • But better variants via quotient graphs
  • Variants often used in practice
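
A sketch comparing fill under the natural and minimum-degree orderings (amd is MATLAB's approximate minimum degree; the model problem and grid size are arbitrary choices):

    A = delsq(numgrid('S', 60));   % 2D model problem
    p = amd(A);                    % approximate minimum degree ordering
    fprintf('fill, natural: %d\n', nnz(chol(A)));
    fprintf('fill, AMD:     %d\n', nnz(chol(A(p, p))));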

### Elimination Tree

• Variables (columns) are nodes in a tree
• $$j$$ a descendant of $$k$$ if eliminating $$j$$ updates $$k$$
• Can eliminate disjoint subtrees in parallel!
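
A sketch with stock MATLAB (etree, treeplot; dissect needs R2017b+):

    A = delsq(numgrid('S', 20));   % 2D model problem
    p = dissect(A);                % ND orderings give short, balanced trees
    parent = etree(A(p, p));       % parent(j): first k > j with L(k,j) ~= 0
    treeplot(parent);              % disjoint subtrees can factor in parallel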

### Cache locality

Basic idea: exploit “supernodal” (dense) structures in factor

• e.g. arising from elimination of separator Schur complements in ND
• Other alternatives exist (multifrontal solvers)

### Pivoting

Pivoting is painful, particularly in distributed memory!

• Cholesky — no need to pivot!
• Threshold pivoting — pivot when things look dangerous
• Static pivoting — try to decide up front

What if things go wrong with threshold/static pivoting?
Common theme: Clean up sloppy solves with good residuals
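
For the threshold option, MATLAB's sparse LU exposes a pivot tolerance; a sketch assuming a sparse A (the 0.1 value is an arbitrary illustration):

    % A pivot is accepted if it is within a factor 1/thresh of the
    % largest magnitude in its column; smaller thresh favors sparsity.
    thresh = 0.1;
    [L, U, P, Q] = lu(A, thresh);       % sparse LU with threshold pivoting
    resid = norm(P*A*Q - L*U, 'fro');   % sanity-check before trusting solves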

### Direct to iterative

Can improve solution by iterative refinement:

$$\begin{aligned}
PAQ &\approx LU \\
x_0 &\approx Q U^{-1} L^{-1} P b \\
r_0 &= b - A x_0 \\
x_1 &\approx x_0 + Q U^{-1} L^{-1} P r_0
\end{aligned}$$

Looks like approximate Newton on $$F(x) = Ax - b = 0$$.
This is just a stationary iterative method!
Nonstationary methods work, too.
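
The same iteration in MATLAB, assuming a sparse A and right-hand side b (the sparse four-output lu returns the permutations so that $$PAQ = LU$$):

    [L, U, P, Q] = lu(A);                  % P*A*Q = L*U
    x = Q * (U \ (L \ (P * b)));           % x0: initial solve
    for k = 1:3                            % a few refinement steps
        r = b - A * x;                     % residual
        x = x + Q * (U \ (L \ (P * r)));   % approximate Newton update
    end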

### Variations on a theme

If we’re willing to sacrifice a little on the factorization,

• Single precision factor + double precision refinement?
• Sloppy factorizations (marginal stability) + refinement?
• Modify $$m$$ small pivots as they’re encountered (low rank updates), fix with $$m$$ steps of a Krylov solver?
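
A sketch of the first variation, assuming A is sparse but small enough to factor densely here, b is given, and A is well-enough conditioned to survive a single-precision factor:

    [L, U, P] = lu(single(full(A)));     % sloppy single-precision factor
    x = zeros(size(b));
    for k = 1:3
        r = b - A * x;                   % residual in double precision
        d = U \ (L \ (P * single(r)));   % correction from the cheap factor
        x = x + double(d);               % accumulate the solution in double
    end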