## Logistics (5 minutes)

- Second call for part-time TAs
- Status of course computing platforms

## Examples (25 minutes)

- Accelerating the centroid code.
- Accelerating an all-to-all computation on a single core.

## Discussion of Game of Life (10 minutes)

## Breakout (25 minutes)

- Suppose you have a tuned single-core dot product that is limited
by memory bandwidth (12.4 GB/s for one core), and that sending a
message between processors takes 10 microseconds. If a parallel
dot product on p processors requires p-1 messages, what is the
speedup curve for a dot product of two double-precision vectors
of dimension one million?
- Consider a spatial decomposition of “Game of Life” on an n-by-n
grid with periodic boundary conditions in distributed memory.
Assume we have a p-by-q grid of processors, and exchange a “halo”
of d layers of boundary cells every d steps of the simulation.
How would we model the communication and computation costs at
each step? Under what circumstances is it possible to “hide”
the communication under the computation? Use a simple model of
the type discussed toward the end of the particle lecture.
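One way to set up the first breakout question, as a sketch under stated assumptions: take T(p) = T1/p + α(p-1), where T1 is the time to stream both vectors once at 12.4 GB/s and α = 10 µs. This assumes (optimistically) that every processor gets the full single-core bandwidth with no contention, and that the p-1 messages are serialized, for example all received one at a time by a root process. The function names below are illustrative only.

```python
import math

N = 1_000_000        # vector length (from the exercise)
BW = 12.4e9          # bytes/s, single-core memory bandwidth (from the exercise)
ALPHA = 10e-6        # seconds per message (from the exercise)

def dot_time(p, n=N, bw=BW, alpha=ALPHA):
    """Modeled wall time (seconds) for a p-processor dot product.

    Assumes each processor streams its n/p entries of both vectors at the full
    single-core bandwidth (no contention) and that the p-1 messages are
    serialized, e.g. all received one at a time by a root process.
    """
    compute = 2 * 8 * n / p / bw   # two vectors of 8-byte doubles, split p ways
    comm = (p - 1) * alpha         # p-1 messages at alpha seconds each
    return compute + comm

def speedup(p):
    return dot_time(1) / dot_time(p)

if __name__ == "__main__":
    for p in (1, 2, 4, 8, 11, 16, 32, 64, 128):
        print(f"p={p:4d}  T={dot_time(p) * 1e3:6.3f} ms  speedup={speedup(p):5.2f}")
    # Minimizing T1/p + alpha*(p-1) over p gives p* = sqrt(T1/alpha).
    print(f"model optimum: p* ~ {math.sqrt(dot_time(1) / ALPHA):.1f}")
```

Under this model T1 ≈ 1.29 ms, the speedup peaks at roughly 6 near p ≈ 11, and it drops below 1 once p exceeds T1/α ≈ 129. A tree-based reduction with about log2(p) message steps instead of p-1 serialized messages would change the shape of the curve considerably.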
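For the second question, here is a minimal sketch of one possible per-rank cost model. It assumes a latency/bandwidth (α-β) message model plus a fixed time per cell update; the notes do not spell out the particle-lecture model, so `t_cell`, `alpha`, `beta`, and `bytes_per_cell` are hypothetical parameters. It also assumes n is divisible by p and q, and that each exchange sends 8 messages (4 edge strips and 4 corners, with periodic wrap-around). Depth-d halos let a rank take d steps per exchange, at the price of redundantly updating a shrinking band of ghost cells.

```python
def halo_model(n, p, q, d, t_cell, alpha, beta, bytes_per_cell=1):
    """Modeled per-step costs for one rank of a p-by-q decomposition of an
    n-by-n periodic Game of Life grid, exchanging depth-d halos every d steps.

    Hypothetical parameters: t_cell (time per cell update), alpha (latency per
    message), beta (time per byte), bytes_per_cell (cell storage size).
    """
    mx, my = n // p, n // q          # owned block is mx rows by my columns
    # Redundant ghost-zone work: step k = 1..d after an exchange updates the
    # owned block plus d - k extra layers on every side.
    cells = sum((mx + 2 * (d - k)) * (my + 2 * (d - k)) for k in range(1, d + 1))
    # Halo-independent work: at step k, owned cells at least k away from the
    # block edge can be updated before any new halo data arrives.
    interior = sum(max(mx - 2 * k, 0) * max(my - 2 * k, 0) for k in range(1, d + 1))
    halo_cells = 2 * d * (mx + my) + 4 * d * d     # 4 edge strips + 4 corners
    comm = 8 * alpha + beta * bytes_per_cell * halo_cells  # 8 neighbor messages
    return {
        "comp_per_step": cells * t_cell / d,
        "comm_per_step": comm / d,
        # Optimistic: assumes messages overlap perfectly with interior updates.
        "hideable": comm <= interior * t_cell,
    }

if __name__ == "__main__":
    for d in (1, 2, 4, 8):
        print(d, halo_model(4096, 8, 8, d, t_cell=1e-9, alpha=1e-6, beta=1e-9))
```

In this model, larger d amortizes the 8α latency term over more steps but adds redundant ghost-cell work, and communication can be hidden only when the halo-independent interior work per period takes at least as long as the exchange, which favors large blocks (n/p and n/q large relative to d).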

## Report out (10 minutes)

## Afternotes