# CS 5220 ## Parallelism and locality in simulation ### Distributed parameter systems ## 17 Sep 2015

### Distributed parameter problems

Mostly PDEs:

Type Example Time? Space dependence?
Elliptic electrostatics steady global
Hyperbolic sound waves yes local
Parabolic diffusion yes global

Different types involve different communication:

• Global dependence $\implies$ lots of communication
• Finite wave speed $\implies$ local dependence

### Example: 1D heat equation

Consider flow (e.g. of heat) in a uniform rod

• Heat ($Q$) $\propto$ temperature ($u$) $\times$ mass ($\rho h$)
• Heat flow $\propto$ temperature gradient (Fourier’s law)

### Example: 1D heat equation

\begin{aligned} \frac{\partial Q}{\partial t} \propto h \frac{\partial u}{\partial t} &\approx C \left[ \left( \frac{u(x-h)-u(x)}{h} \right) + \left( \frac{u(x)-u(x+h)}{h} \right) \right] \\ \frac{\partial u}{\partial t} &\approx C \left[ \frac{u(x-h)-2u(x)+u(x+h)}{h^2} \right] \rightarrow C \frac{\partial^2 u}{\partial x^2} \end{aligned}

### Spatial discretization

Heat equation with $u(0) = u(1) = 0$ $$\frac{\partial u}{\partial t} = C \frac{\partial^2 u}{\partial x^2}$$

Spatial semi-discretization: $$\frac{\partial^2 u}{\partial x^2} \approx \frac{u(x-h)-2u(x)+u(x+h)}{h^2}$$

### Spatial discretization

Yields a system of ODEs \begin{align} \frac{du}{dt} & = C h^{-2} (-T) u \\ &= -C h^{-2} \begin{bmatrix} 2 & -1 & & & & \\ -1 & 2 & -1 & & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_{n-2} \\ u_{n-1} \end{bmatrix} \end{align}

### Explicit time stepping

Approximate PDE by ODE system (“method of lines”): $$\frac{du}{dt} = C h^{-2} T u$$ Now need a time-stepping scheme for the ODE:

• Simplest scheme is Euler: $$u(t+\delta) \approx u(t) + u'(t) \delta = \left( I - C \frac{\delta}{h^2} T \right) u(t)$$
• Taking a time step $\equiv$ sparse matvec with $\left( I - C \frac{\delta}{h^2} T \right)$
• This may not end well...

### Explicit time stepping data dependence

Nearest neighbor interactions per step $\implies$
finite rate of numerical information propagation

### Explicit time stepping in parallel


for t = 1 to N
communicate boundary data ("ghost cell")
take time steps locally
end


### Overlap communication + computation


for t = 1 to N
start boundary data sendrecv
compute new interior values
finish sendrecv
compute new boundary values
end


### Batching time steps


for t = 1 to N by B
start boundary data sendrecv (B values)
compute new interior values
finish sendrecv (B values)
compute new boundary values
end


### Explicit pain

Unstable for $\delta > O(h^2)$!

### Implicit time stepping - Backward Euler uses backward difference for $d/dt$ $$u(t+\delta) \approx u(t) + u'(t + \delta t) \delta$$ - Taking a time step $\equiv$ sparse matvec with $\left( I + C \frac{\delta}{h^2} T \right)^{-1}$ - No time step restriction for stability (good!) - But each step involves linear solve (not so good!) - Good if you like numerical linear algebra?

### Explicit and implicit

• Propagates information at finite rate
• Steps look like sparse matvec (in linear case)
• Stable step determined by fastest time scale
• Works fine for hyperbolic PDEs

### Explicit and implicit

• No need to resolve fastest time scales
• Steps can be long... but expensive
• Linear/nonlinear solves at each step
• Often these solves involve sparse matvecs
• Critical for parabolic PDEs

### Poisson problems

Consider 2D Poisson $$-\nabla^2 u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f$$

• Prototypical elliptic problem (steady state)
• Similar to a backward Euler step on heat equation

### Poisson problem discretization

$$u_{i,j} = h^{-2} \left( 4u_{i,j}-u_{i-1,j}-u_{i+1,j}-u_{i,j-1}-u_{i,j+1} \right)$$

### Poisson problem discretization

$$L = \left[ \begin{array}{ccc|ccc|ccc} 4 & -1 & & -1 & & & & & \\ -1 & 4 & -1 & & -1 & & & & \\ & -1 & 4 & & & -1 & & & \\ \hline -1 & & & 4 & -1 & & -1 & & \\ & -1 & & -1 & 4 & -1 & & -1 & \\ & & -1 & & -1 & 4 & & & -1 \\ \hline & & & -1 & & & 4 & -1 & \\ & & & & -1 & & -1 & 4 & -1 \\ & & & & & -1 & & -1 & 4 \end{array} \right]$$

### Poisson solvers in 2D

Demmel, Applied Numerical Linear Algebra.

Method Time Space
Dense LU $N^3$ $N^2$
Band LU $N^2$ $N^{3/2}$
Jacobi $N^2$ $N$
Explicit inv $N^2$ $N^2$
CG $N^{3/2}$ $N$
Red-black SOR $N^{3/2}$ $N$
Sparse LU $N^{3/2}$ $N \log N$
FFT $N \log N$ $N$
Multigrid $N$ $N$

Remember: best MFlop/s $\neq$ fastest solution!

### General implicit picture - Implicit solves or steady state $\implies$ solving systems - Nonlinear solvers generally linearize - Linear solvers can be - Direct (hard to scale) - Iterative (often problem-specific) - Iterative solves boil down to matvec!
### PDE solver summary Can be implicit or explicit (as with ODEs) - Explicit (sparse matvec) - fast, but short steps? - works fine for hyperbolic PDEs - Implicit (sparse solve) - Direct solvers are hard! - Sparse solvers turn into matvec again
### PDE solver summary - Differential operators turn into local mesh stencils - Matrix connectivity looks like mesh connectivity - Can partition into subdomains that communicate only through boundary data - More on graph partitioning later - Not all nearest neighbor ops are equally efficient! - Depends on mesh structure - Also depends on flops/point