CS 5220

Applications of Parallel Computers

Graph partitioning

Prof David Bindel

Please click the play button below.

Sparsity and partitioning

Want to partition sparse graphs so that

Subgraphs are same size (load balance)
Cut size is minimal (minimize communication)

Uses: sparse matvec, nested dissection solves, ...

A common theme

Common idea: partition under static connectivity

Physical network design (telephone, VLSI)
Sparse matvec
Preconditioners for PDE solvers
Sparse Gaussian elimination
Data clustering
Image segmentation

Goal: Big chunks, small “surface area” between

Graph partitioning

Given: $G = (V,E)$, possibly weights + coordinates.
We want to partition $G$ into $k$ pieces such that

Node weights are balanced across partitions.
Weight of cut edges is minimized.

Important special case: $k = 2$.

Vertex separator

Edge separator

Node to edge and back again

Can convert between node and edge separators

Node to edge: cut edges from sep to one side
Edge to node: remove nodes on one side of cut

Fine if degree bounded (e.g. near-neighbor meshes).
Optimal vertex/edge separators very different for social networks!

Cost

How many partitionings are there? If $n$ is even, \[ \begin{pmatrix} n \\ n/2 \end{pmatrix} = \frac{n!}{( (n/2)! )^2} \approx 2^n \sqrt{2/(\pi n)}. \] Finding the optimal one is NP-complete.

We need heuristics!

Partitioning with coordinates

Lots of partitioning problems from “nice” meshes
- Planar meshes (maybe with regularity condition)
- $k$-ply meshes (works for $d > 2$)
- Nice enough $\implies$ cut $O(n^{1-1/d})$ edges
  (Tarjan, Lipton; Miller, Teng, Thurston, Vavasis)
- Edges link nearby vertices
Get useful information from vertex density
Ignore edges (but can use them in later refinement)

Recursive coordinate bisection

Idea: Cut with hyperplane parallel to a coordinate axis.

Pro: Fast and simple
Con: Not always great quality

Inertial bisection

Idea: Optimize cutting hyperplane via vertex density \[\begin{aligned} \bar{\mathbf{x}} &= \frac{1}{n} \sum_{i=1}^n \mathbf{x}_i, \quad \bar{\mathbf{r}_i} = \mathbf{x}_i-\bar{\mathbf{x}} \\ \mathbf{I}&= \sum_{i=1}^n\left[ \|\mathbf{r}_i\|^2 I - \mathbf{r}_i \mathbf{r}_i^T \right] \end{aligned}\] Let $(\lambda_n, \mathbf{n})$ be the minimal eigenpair for the inertia tensor $\mathbf{I}$, and choose the hyperplane through $\bar{\mathbf{x}}$ with normal $\mathbf{n}$.

Inertial bisection

Pro: Simple, more flexible than coord planes
Con: Still restricted to hyperplanes

Random circles (Gilbert, Miller, Teng)

Stereographic projection
Find centerpoint (any plane is an even partition)
In practice, use an approximation.
Conformally map sphere, centerpoint to origin
Choose great circle (at random)
Undo stereographic projection
Convert circle to separator

May choose best of several random great circles.

Coordinate-free methods

Don’t always have natural coordinates
- Example: the web graph
- Can add coordinates? (metric embedding)
Use edge information for geometry!

Breadth-first search

Pick a start vertex $v_0$
- Might start from several different vertices
Use BFS to label nodes by distance from $v_0$
- We’ve seen this before – remember RCM?
- Or minimize cuts locally (Karypis, Kumar)
Partition by distance from $v_0$

Spectral partitioning

Label vertex $i$ with $x_i = \pm 1$. We want to minimize \[\mbox{edges cut} = \frac{1}{4} \sum_{(i,j) \in E} (x_i-x_j)^2\] subject to the even partition requirement \[\sum_i x_i = 0.\] But this is NP hard, so we need a trick.

Spectral partitioning

\[\mbox{edges cut} = \frac{1}{4} \sum_{(i,j) \in E} (x_i-x_j)^2 = \frac{1}{4} \|Cx\|^2 = \frac{1}{4} x^T L x \] where $C=$ incidence matrix, $L = C^T C = $ graph Laplacian: \[\begin{aligned} C_{ij} &= \begin{cases} 1, & e_j = (i,k) \\ -1, & e_j = (k,i) \\ 0, & \mbox{otherwise}, \end{cases} & L_{ij} &= \begin{cases} d(i), & i = j \\ -1, & (i,j) \in E, \\ 0, & \mbox{otherwise}. \end{cases} \end{aligned}\] Note: $C e = 0$ (so $L e = 0$), $e = (1, 1, 1, \ldots, 1)^T$.

Spectral partitioning

Now consider the relaxed problem with $x \in \mathbb{R}^n$: \[\mbox{minimize } x^T L x \mbox{ s.t. } x^T e = 0 \mbox{ and } x^T x = 1.\] Equivalent to finding the second-smallest eigenvalue $\lambda_2$ and corresponding eigenvector $x$, also called the Fiedler vector. Partition according to sign of $x_i$.

How to approximate $x$? Use a Krylov subspace method (Lanczos)! Expensive, but gives high-quality partitions.

Spectral partitioning

Spectral coordinates

Alternate view: define a coordinate system with the first $d$ non-trivial Laplacian eigenvectors.

Spectral partitioning = bisection in spectral coords
Can cluster in other ways as well (e.g. $k$-means)

Spectral coordinates

Refinement by swapping

Gain from swapping $(a,b)$ is $D(a) + D(b) - 2w(a,b)$, where $D$ is external - internal edge costs: \[\begin{aligned} D(a) &= \sum_{b' \in B} w(a,b') - \sum_{a' \in A, a' \neq a} w(a,a') \\ D(b) &= \sum_{a' \in A} w(b,a') - \sum_{b' \in B, b' \neq b} w(b,b') \end{aligned}\]

Greedy refinement

Start with a partition $V = A \cup B$ and refine.

$\operatorname{gain}(a,b) = D(a) + D(b) - 2w(a,b)$
Purely greedy strategy: until no positive gain
- Choose swap with most gain
- Update $D$ in neighborhood of swap; update gains
Local minima are a problem.

Kernighan-Lin

In one sweep, while no vertices marked

Choose $(a,b)$ with greatest gain
Update $D(v)$ for all unmarked $v$ as if $(a,b)$ were swapped
Mark $a$ and $b$ (but don’t swap)
Find $j$ such that swaps $1, \ldots, j$ yield maximal gain
Apply swaps $1, \ldots, j$

Kernighan-Lin

Usually converges in a few (2-6) sweeps. Each sweep is $O(|V|^3)$. Can be improved to $O(|E|)$ (Fiduccia, Mattheyses).

Further improvements (Karypis, Kumar): only consider vertices on boundary, don’t complete full sweep.

Multilevel ideas

Basic idea (same will work in other contexts):

Coarsen
Solve coarse problem
Interpolate (and possibly refine)

May apply recursively.

Maximal matching

One idea for coarsening: maximal matchings

Matching of $G = (V,E)$ is $E_m \subset E$ with no common vertices.
Maximal: cannot add edges and remain matching.
Constructed by an obvious greedy algorithm.
Maximal matchings are non-unique; some may be preferable to others (e.g. choose heavy edges first).

Coarsening via maximal matching

Collapse matched nodes into coarse nodes
Add all edge weights between coarse nodes

Software

All these use some flavor(s) of multilevel:

METIS/ParMETIS (Kapyris)
PARTY (U. Paderborn)
Chaco (Sandia)
Scotch (INRIA)
Jostle (now commercialized)
Zoltan (Sandia)

Graph partitioning: Is this it?

Consider partitioning just for sparse matvec:

Edge cuts $\neq$ communication volume
Should we minimize max communication volume?
Communication volume – what about latencies?

Some go beyond graph partitioning (e.g. hypergraph in Zoltan).

Graph partitioning: Is this it?

Additional work on:

Partitioning power law graphs
Covering sets with small overlaps

Also: Classes of graphs with no small cuts (expanders)

Graph partitioning: Is this it?

Recall: partitioning for matvec and preconditioner

Block Jacobi (or Schwarz) – relax on each partition
Want to consider edge cuts and physics
- E.g. consider edges = beams
- Cutting a stiff beam worse than a flexible beam?
- Doesn’t show up from just the topology
Multiple ways to deal with this
- Encode physics via edge weights?
- Partition geometrically?
Tradeoffs are why we need to be informed users

Graph partitioning: Is this it?

So far, considered problems with static interactions

What about particle simulations?
Or what about tree searches?
Or what about...?

Next time: more general load balancing issues