Due: Wednesday, March 10 by 5 pm.

Problem

The purpose of this assignment is to introduce programming in the shared and distributed memory models. Your goal is to parallelize a toy particle simulator with a short-range (Lennard-Jones) potential force and an external gravitational field. The starting-point code is documented here. Our current version uses a naive algorithm that does not exploit spatial locality. Your mission, should you choose to accept it, is to fix that. In two weeks, you should report on your progress, covering three things:

  1. Characterize the performance of all three basic codes as a function of the number of particles in the system and the number of processors. Simple models are nice here, but mostly I want to see empirical timing data.
  2. Improve the complexity by changing from the naive O(n^2) algorithm to a roughly O(n) force evaluation algorithm based on spatial partitioning. You will want to modify the parallel algorithms to use this spatial decomposition, and you will probably need to restructure the communication so that it does not dominate the computational cost of your new code!
  3. If you have time, play a little! Improve or extend the code in some way that appeals to you: do something clever with the time integrator, add error diagnostics (is monitoring conservation of energy and momentum enough?), implement dynamic load balancing, or tune the performance of the serial implementation. Feel free to suggest your own ideas as well!
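Item 2 can be sketched with a cell list (binning): sort the particles into square cells whose side is at least the interaction cutoff, so each particle only has to check its own cell and the eight neighboring cells instead of every other particle. The C sketch below is a minimal illustration under assumptions that are not taken from nbserial.c: a 2D box [0, L) x [0, L), a Lennard-Jones potential in reduced units (epsilon = sigma = 1) truncated at a hypothetical cutoff rc, and hypothetical function names. The external gravitational field only adds a per-particle term and is omitted. The O(n^2) loop is kept as a reference to check the binned result against.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch only. Assumed, not from nbserial.c: 2D box [0, L) x [0, L),
 * Lennard-Jones in reduced units (epsilon = sigma = 1), hard cutoff rc.
 * All function names here are hypothetical. */

/* Add the LJ force on particle i due to j, where (dx, dy) = r_i - r_j. */
static void lj_pair(double dx, double dy, double rc, double *fx, double *fy) {
    double r2 = dx * dx + dy * dy;
    if (r2 >= rc * rc || r2 == 0.0) return;  /* beyond cutoff or coincident */
    double s6 = 1.0 / (r2 * r2 * r2);         /* (sigma / r)^6 */
    double scale = 24.0 * (2.0 * s6 * s6 - s6) / r2;
    *fx += scale * dx;
    *fy += scale * dy;
}

/* Reference O(n^2) evaluation: every particle checks every other one. */
void forces_naive(int n, const double *x, const double *y, double rc,
                  double *fx, double *fy) {
    for (int i = 0; i < n; ++i) {
        fx[i] = fy[i] = 0.0;
        for (int j = 0; j < n; ++j)
            if (j != i)
                lj_pair(x[i] - x[j], y[i] - y[j], rc, &fx[i], &fy[i]);
    }
}

/* Map a coordinate to a cell index, clamping particles that stray outside. */
static int cell_of(double c, double h, int nc) {
    int k = (int)(c / h);
    return k < 0 ? 0 : (k >= nc ? nc - 1 : k);
}

/* Binned evaluation: cell side h >= rc, so all partners of a particle lie
 * in its own cell or one of the 8 neighboring cells. */
void forces_binned(int n, const double *x, const double *y, double L,
                   double rc, double *fx, double *fy) {
    assert(L > 0.0 && rc > 0.0);
    int nc = (int)(L / rc);
    if (nc < 1) nc = 1;  /* box smaller than rc: one cell holds everything */
    double h = L / nc;   /* h >= rc whenever L >= rc */
    int *head = malloc(sizeof(int) * (size_t)nc * (size_t)nc); /* first particle per cell */
    int *next = malloc(sizeof(int) * ((size_t)n + 1));         /* per-cell linked lists */
    assert(head != NULL && next != NULL);
    for (int c = 0; c < nc * nc; ++c) head[c] = -1;
    for (int i = 0; i < n; ++i) {  /* O(n) binning pass */
        int c = cell_of(y[i], h, nc) * nc + cell_of(x[i], h, nc);
        next[i] = head[c];
        head[c] = i;
    }
    for (int i = 0; i < n; ++i) {
        int cx = cell_of(x[i], h, nc), cy = cell_of(y[i], h, nc);
        fx[i] = fy[i] = 0.0;
        for (int oy = -1; oy <= 1; ++oy)
            for (int ox = -1; ox <= 1; ++ox) {
                int bx = cx + ox, by = cy + oy;
                if (bx < 0 || by < 0 || bx >= nc || by >= nc) continue;
                for (int j = head[by * nc + bx]; j != -1; j = next[j])
                    if (j != i)
                        lj_pair(x[i] - x[j], y[i] - y[j], rc, &fx[i], &fy[i]);
            }
    }
    free(head);
    free(next);
}
```

Since each particle examines only the 3 x 3 block of cells around it, the force loop does O(n) work when the density is roughly uniform (each cell then holds O(1) particles). The same cells suggest one way to split the work in the MPI code: give each process a contiguous block of cells, so that it only needs to exchange the particles in the cells along its boundary with neighboring processes.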

Source Code

You may start with the serial and parallel implementations supplied below. All of them run in O(n^2) time.

hw2code.pdf
a document, produced with my literate programming tool dsbweb, explaining the structure of the code.
nbserial.c
a serial implementation.
nbomp.c
a shared memory parallel implementation using OpenMP.
nbmpi.c
a distributed memory parallel implementation using MPI.
common.c, common.h
common numerical functionality (leapfrog integrator and Lennard-Jones evaluation routines).
params.c, params.h
simulation parameters and command-line processing.
Makefile
a makefile that should work on Crocus.
run_serial.qsub, run_omp.qsub, run_mpi.qsub
sample script files for Crocus.
nbody.tgz
all above files (and more!) in one tarball.
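The leapfrog integrator provided in common.c can be written in kick-drift-kick form. The C sketch below is not the code in common.c: the function names, the flattened coordinate layout, and the accel_fn callback are hypothetical. It drives a unit-mass harmonic oscillator as a toy problem with a known conserved energy, which is one way to sanity-check an integrator before relying on energy monitoring in the full particle system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: fill a[0..n-1] with accelerations at x. */
typedef void (*accel_fn)(int n, const double *x, double *a, void *ctx);

/* One kick-drift-kick leapfrog step over n flattened coordinates (a 2D
 * system with m particles would use n = 2 * m). On entry a[] must hold the
 * accelerations at the current x; on return it holds those at the new x. */
void leapfrog_step(int n, double *x, double *v, double *a, double dt,
                   accel_fn accel, void *ctx) {
    assert(accel != NULL);
    for (int i = 0; i < n; ++i) v[i] += 0.5 * dt * a[i];  /* half kick */
    for (int i = 0; i < n; ++i) x[i] += dt * v[i];        /* full drift */
    accel(n, x, a, ctx);                                  /* forces at new x */
    for (int i = 0; i < n; ++i) v[i] += 0.5 * dt * a[i];  /* half kick */
}

/* Toy test problem: unit-mass harmonic oscillators with a = -x. */
void harmonic_accel(int n, const double *x, double *a, void *ctx) {
    (void)ctx;  /* unused */
    for (int i = 0; i < n; ++i) a[i] = -x[i];
}

/* Total energy of the toy problem, for monitoring conservation. */
double harmonic_energy(int n, const double *x, const double *v) {
    double e = 0.0;
    for (int i = 0; i < n; ++i) e += 0.5 * (v[i] * v[i] + x[i] * x[i]);
    return e;
}
```

Starting from x = 1, v = 0 with dt = 0.01, 1000 steps should end near the exact solution x(10) = cos(10) ≈ -0.839 with energy close to 0.5. Because leapfrog is symplectic, the energy error on this problem oscillates within a bounded range rather than drifting; in the full simulation, a steady drift can point to a bug, a time step that is too large, or the truncated potential.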

To check the correctness of your results, you may want to use the Java visualization program Bouncy.jar. If you feel like hacking on it, its source is Bouncy.java.

Submission

You may work in groups of 2 or 3. If possible, one person in your group should be a non-CS student; beyond that, you are responsible for forming your own group. You do not have to keep the same groups as last time.

Here is a list of items you might show in your report:

Resources

OpenMP tutorial (LLNL), OpenMP tutorial, OpenMP specifications, MPI tutorial (LLNL), and MPI specifications.