CS 5220
Parallelism and locality in simulation
Particle systems
15 Sep 2015
Particle simulation
Particles move via Newton (F=ma), with
- External forces: ambient gravity, currents, etc.
- Local forces: collisions, Van der Waals (1/r6), etc.
- Far-field forces: gravity and electrostatics (1/r2), etc.
- Simple approximations often apply (Saint-Venant)
A forced example
Example force:
fi=∑jGmimj(xj−xi)r3ij(1−(arij)4),rij=‖
- Long-range attractive force (r^{-2})
- Short-range repulsive force (r^{-6})
- Go from attraction to repulsion at radius a
A simple serial simulation
In Matlab, we can write
npts = 100;
t = linspace(0, tfinal, npts);
[tout, xyv] = ode113(@fnbody, ...
t, [x; v], [], m, g);
xout = xyv(:,1:length(x))';
... but I can’t call ode113 in C in parallel (or can I?)
A simple serial simulation
Maybe a fixed step leapfrog will do?
npts = 100;
steps_per_pt = 10;
dt = tfinal/(steps_per_pt*(npts-1));
xout = zeros(2*n, npts);
xout(:,1) = x;
for i = 1:npts-1
for ii = 1:steps_per_pt
x = x + v*dt;
a = fnbody(x, m, g);
v = v + a*dt;
end
xout(:,i+1) = x;
end
Pondering particles
- Where do particles “live” (esp. in distributed memory)?
- Decompose in space? By particle number?
- What about clumping?
- How are long-range force computations organized?
- How are short-range force computations organized?
- How is force computation load balanced?
- What are the boundary conditions?
- How are potential singularities handled?
- What integrator is used? What step control?
External forces
Simplest case: no particle interactions.
- Embarrassingly parallel (like Monte Carlo)!
- Could just split particles evenly across processors
- Is it that easy?
- Maybe some trajectories need short time steps?
- Even with MC, load balance may not be entirely trivial.
Local forces

- Simplest all-pairs check is O(n^2) (expensive)
- Or only check close pairs (via binning, quadtrees?)
- Communication required for pairs checked
- Usual model: domain decomposition
Local forces: Communication

Minimize communication:
- Send particles that might affect a neighbor
soon
- Trade extra computation against communication
- Want low surface area-to-volume ratios on domains
Local forces: Load balance

- Are particles evenly distributed?
- Do particles remain evenly distributed?
- Can divide space unevenly (e.g. quadtree/octtree)
Far-field forces

- Every particle affects every other particle
- All-to-all communication required
- Overlap communication with computation
- Poor memory scaling if everyone keeps everything!
- Idea: pass particles in a round-robin manner
Passing particles for far-field forces

copy local particles to current buf
for phase = 1:p
send current buf to rank+1 (mod p)
recv next buf from rank-1 (mod p)
interact local particles with current buf
swap current buf with next buf
end
Passing particles for far-field forces
Suppose n = N/p particles in buffer. At each phase
\begin{align}
t_{\mathrm{comm}} &\approx \alpha + \beta n \\
t_{\mathrm{comp}} &\approx \gamma n^2
\end{align}
So we can mask communication with computation if
n \geq \frac{1}{2\gamma} \left( \beta + \sqrt{\beta^2 + 4 \alpha \gamma}
\right) > \frac{\beta}{\gamma}
More efficient serial code
\implies larger n needed to mask communication!
\implies worse speed-up as p gets larger (fixed N)
but scaled speed-up (n fixed) remains unchanged.
Far-field forces: particle-mesh methods

Consider r^{-2} electrostatic potential interaction
- Enough charges looks like a continuum!
- Poisson equation maps charge distribution to potential
- Use fast Poisson solvers for regular grids (FFT, multigrid)
- Approximation depends on mesh and particle density
- Can clean up leading part of approximation error
Far-field forces: particle-mesh methods

- Map particles to mesh points (multiple strategies)
- Solve potential PDE on mesh
- Interpolate potential to particles
- Add correction term – acts like local force
Far-field forces: tree methods

- Distance simplifies things
- Andromeda looks like a point mass from here?
- Build a tree, approximating descendants at each node
- Several variants: Barnes-Hut, FMM, Anderson’s method
- More on this later in the semester
Summary of particle example
- Model: Continuous motion of particles
- Could be electrons, cars, whatever...
- Step through discretized time
- Local interactions
- Relatively cheap
- Load balance a pain
- All-pairs interactions
- Obvious algorithm is expensive (O(n^2))
- Particle-mesh and tree-based algorithms help
An important special case of lumped/ODE models.