CS 5220
Parallelism and locality in simulation
Distributed parameter systems
17 Sep 2015
Distributed parameter problems
Mostly PDEs:
Type |
Example |
Time? |
Space dependence? |
Elliptic |
electrostatics |
steady |
global |
Hyperbolic |
sound waves |
yes |
local |
Parabolic |
diffusion |
yes |
global |
Different types involve different communication:
- Global dependence ⟹ lots of communication
- Finite wave speed ⟹ local dependence
Example: 1D heat equation
Consider flow (e.g. of heat) in a uniform rod
- Heat (Q) ∝ temperature (u) × mass (ρh)
- Heat flow ∝ temperature gradient (Fourier’s law)
Example: 1D heat equation
∂Q∂t∝h∂u∂t≈C[(u(x−h)−u(x)h)+(u(x)−u(x+h)h)]∂u∂t≈C[u(x−h)−2u(x)+u(x+h)h2]→C∂2u∂x2
Spatial discretization
Heat equation with u(0)=u(1)=0
∂u∂t=C∂2u∂x2
Spatial semi-discretization:
∂2u∂x2≈u(x−h)−2u(x)+u(x+h)h2
Spatial discretization
Yields a system of ODEs
dudt=Ch−2(−T)u=−Ch−2[2−1−12−1⋱⋱⋱−12−1−12][u1u2⋮un−2un−1]
Explicit time stepping
Approximate PDE by ODE system (“method of lines”):
dudt=Ch−2Tu
Now need a time-stepping scheme for the ODE:
- Simplest scheme is Euler:
u(t+δ)≈u(t)+u′(t)δ=(I−Cδh2T)u(t)
- Taking a time step ≡ sparse matvec with
(I−Cδh2T)
- This may not end well...
Explicit time stepping data dependence

Nearest neighbor interactions per step ⟹
finite rate of numerical information propagation
Explicit time stepping in parallel

for t = 1 to N
communicate boundary data ("ghost cell")
take time steps locally
end
Overlap communication + computation

for t = 1 to N
start boundary data sendrecv
compute new interior values
finish sendrecv
compute new boundary values
end
Batching time steps

for t = 1 to N by B
start boundary data sendrecv (B values)
compute new interior values
finish sendrecv (B values)
compute new boundary values
end
Explicit pain

Unstable for δ>O(h2)!
Implicit time stepping
- Backward Euler uses backward difference for d/dt
u(t+δ)≈u(t)+u′(t+δt)δ
- Taking a time step ≡ sparse matvec with
(I+Cδh2T)−1
- No time step restriction for stability (good!)
- But each step involves linear solve (not so good!)
- Good if you like numerical linear algebra?
Explicit and implicit
- Propagates information at finite rate
- Steps look like sparse matvec (in linear case)
- Stable step determined by fastest time scale
- Works fine for hyperbolic PDEs
Explicit and implicit
- No need to resolve fastest time scales
- Steps can be long... but expensive
- Linear/nonlinear solves at each step
- Often these solves involve sparse matvecs
- Critical for parabolic PDEs
Poisson problems
Consider 2D Poisson
−∇2u=∂2u∂x2+∂2u∂y2=f
- Prototypical elliptic problem (steady state)
- Similar to a backward Euler step on heat equation
Poisson problem discretization

ui,j=h−2(4ui,j−ui−1,j−ui+1,j−ui,j−1−ui,j+1)
Poisson problem discretization
L=[4−1−1−14−1−1−14−1−14−1−1−1−14−1−1−1−14−1−14−1−1−14−1−1−14]
Poisson solvers in 2D
Demmel, Applied Numerical Linear Algebra.
Method |
Time |
Space |
Dense LU |
N3 |
N2 |
Band LU |
N2 |
N3/2 |
Jacobi |
N2 |
N |
Explicit inv |
N2 |
N2 |
CG |
N3/2 |
N |
Red-black SOR |
N3/2 |
N |
Sparse LU |
N3/2 |
NlogN |
FFT |
NlogN |
N |
Multigrid |
N |
N |
Remember: best MFlop/s ≠ fastest solution!
General implicit picture
- Implicit solves or steady state ⟹ solving systems
- Nonlinear solvers generally linearize
- Linear solvers can be
- Direct (hard to scale)
- Iterative (often problem-specific)
- Iterative solves boil down to matvec!
PDE solver summary
Can be implicit or explicit (as with ODEs)
PDE solver summary
- Differential operators turn into local mesh stencils
- Matrix connectivity looks like mesh connectivity
- Can partition into subdomains that communicate only through
boundary data
- More on graph partitioning later
- Not all nearest neighbor ops are equally efficient!
- Depends on mesh structure
- Also depends on flops/point
CS 5220
Parallelism and locality in simulation
Distributed parameter systems
17 Sep 2015