CS 5220
Parallelism and locality in simulation
Life
15 Sep 2015
Discrete events
Basic setup:
- Finite set of variables, updated via transition function
- Synchronous case: finite state machine
- Asynchronous case: event-driven simulation
- Synchronous example: Game of Life
Nice starting point — no discretization concerns!
Game of Life
Game of Life (John Conway):
- Live cell dies with < 2 live neighbors
- Live cell dies with > 3 live neighbors
- Live cell lives with 2–3 live neighbors
- Dead cell becomes live with exactly 3 live neighbors
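The four rules above can be written as a single update. This is a minimal sketch, assuming an unbounded grid stored as a set of live (row, col) pairs; the name `life_step` and the set representation are illustrative choices, not something the slides prescribe.

```python
from collections import Counter

def life_step(live):
    """Advance a set of live (row, col) cells by one generation."""
    # Count live neighbors for every cell adjacent to at least one live cell.
    counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    # A live cell with no live neighbors never appears in counts, so it dies.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# A horizontal "blinker" flips to vertical and back.
blinker = {(1, 0), (1, 1), (1, 2)}
flipped = life_step(blinker)
```

Only cells next to a live cell are ever touched, which is already a hint at the "dilute" case discussed later.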
Game of Life
Easy to parallelize by domain decomposition.
- Update work involves volume of subdomains
- Communication per step involves only the surface (cyan in the figure)
Game of Life: Pioneers and Settlers
What if pattern is “dilute”?
- Few or no live cells at surface at each step
- Think of live cell at a surface as an “event”
- Only communicate events!
- This is asynchronous
- Harder with message passing — when do you receive?
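A small sketch of "only communicate events", assuming each block owns a contiguous range of columns and stores its live cells as a set of (row, col) pairs. The function name and the block layout are hypothetical. When the pattern is dilute, both lists are empty on most steps and nothing needs to be sent.

```python
def surface_events(live, col_lo, col_hi):
    """Rows of live cells on the left and right surface columns of a block.

    live is a set of (row, col) cells owned by the block spanning
    columns col_lo..col_hi inclusive (hypothetical layout).
    """
    to_left = sorted(r for (r, c) in live if c == col_lo)
    to_right = sorted(r for (r, c) in live if c == col_hi)
    return to_left, to_right

# A dilute block: one glider far from either edge produces no events.
glider = {(0, 11), (1, 12), (2, 10), (2, 11), (2, 12)}
quiet = surface_events(glider, col_lo=0, col_hi=31)

# The same pattern shifted so it touches the right edge produces events.
touching = surface_events({(r, c + 19) for (r, c) in glider},
                          col_lo=0, col_hi=31)
```

The receiver's difficulty is visible here: an empty list and "no message yet" look the same unless the protocol distinguishes them.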
Asynchronous Game of Life
How do we manage events?
- Could be speculative: assume no communication across the boundary for many steps, back up if needed
- Or conservative: wait whenever communication is possible
- possible ≢ guaranteed!
- Deadlock: everyone waits for everyone else to send
- Can get around this with NULL messages
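A toy model of the conservative scheme, to show why NULL messages break the deadlock. It assumes two processes with no real events at all, and a fixed `lookahead`: a process at simulated time t is assumed unable to produce an event for its neighbor before t + lookahead. The function and variable names are hypothetical, and this is a sketch of the idea rather than a full protocol.

```python
def run(num_rounds, lookahead, use_null_messages):
    """Two logical processes, each waiting conservatively on the other.

    clock[i]: simulated time process i has reached
    bound[i]: time before which process i knows its neighbor sends no event
    """
    clock = [0, 0]
    bound = [0, 0]
    for _ in range(num_rounds):
        for i in (0, 1):
            j = 1 - i
            # Conservative rule: advance only to a time the neighbor vouched for.
            if clock[i] < bound[i]:
                clock[i] = bound[i]
            if use_null_messages:
                # NULL message: "I will send no event before clock[i] + lookahead".
                bound[j] = max(bound[j], clock[i] + lookahead)
    return clock

stuck = run(num_rounds=3, lookahead=1, use_null_messages=False)
moving = run(num_rounds=3, lookahead=1, use_null_messages=True)
```

Without NULL messages neither bound ever rises, so both clocks stay at zero; with them, each promise lets the other side move forward.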
How do we manage load balance?
- No need to simulate quiescent parts of the game!
- Maybe dynamically assign smaller blocks to processors?
- Lots of implementations use fancy bit representations
- Ch 17 and Ch 18 of Abrash's Graphics Programming Black Book have an old but still illuminating discussion of low-level (serial) optimizations
- HashLife is a triumph of algorithm design.
How would I tackle this? Assuming matrix version, I might:
- Build a bit-packed representation
- Use a fast vectorized kernel to update small blocks
- Coarse blocking with generation skipping
- Dynamic scheduling of coarse block updates
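A sketch of the first two bullets, assuming a bounded grid with dead cells outside and one integer per row, where bit c holds cell (r, c). Python integers stand in for machine words or vector registers; `step_bitpacked` is a hypothetical name. Neighbor counts are accumulated with bitwise adders, so every cell in a row is updated at once.

```python
def step_bitpacked(rows, width):
    """One Life generation; rows[r] is an int whose bit c is cell (r, c)."""
    mask = (1 << width) - 1
    padded = [0] + rows + [0]          # dead rows above and below
    out = []
    for r in range(1, len(padded) - 1):
        above, mid, below = padded[r - 1], padded[r], padded[r + 1]
        # The eight neighbor bit-planes for every cell in this row at once.
        planes = [
            (above << 1) & mask, above, above >> 1,
            (mid << 1) & mask, mid >> 1,
            (below << 1) & mask, below, below >> 1,
        ]
        # Bit-sliced counter: s2 s1 s0 hold each cell's neighbor count mod 8.
        s0 = s1 = s2 = 0
        for x in planes:
            c0 = s0 & x      # carry out of bit 0
            s0 ^= x
            c1 = s1 & c0     # carry out of bit 1
            s1 ^= c0
            s2 ^= c1         # overflow past bit 2 is dropped
        # Live next step if count is 3, or count is 2 and the cell is live.
        out.append(s1 & ~s2 & (s0 | mid) & mask)
    return out

blinker = [0b000, 0b111, 0b000]
flipped = step_bitpacked(blinker, width=3)
```

A count of 8 wraps to 0 in the three-bit counter, which is harmless because only counts of 2 and 3 matter.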
What would you do?