
Introduction

The IBM RISC System/6000 SP has established itself as one of the most powerful commercial massively parallel processors (MPPs) because of its fast Power2 processors, high-bandwidth interconnection network, and scalability. Nevertheless, the SP's network latency is 2 to 4 times higher than that of older MPPs such as the TMC CM-5 or Meiko CS-2, mainly due to overheads in the communication software and in the network interface architecture.

Active Messages (AM) provide simple communication primitives that are well suited as building blocks for higher layers of software such as parallel languages and complex message interfaces. AM was originally developed for the CM-5 [12]; implementations are also available for the Meiko CS-2 [10], HP workstations on an FDDI ring [9], the Intel Paragon, and the U-Net ATM cluster of Sun SPARCstations [13]. All the implementations are based on the Generic Active Message Specification Version 1.1 [3].

Message passing is the most widely used communication model in parallel computing and is now standardized in the Message Passing Interface (MPI) [5]. MPI supports blocking and nonblocking sends and receives, collective communication, and noncontiguous messages, and it contains facilities for dealing with groups of processes and libraries. Since much of MPI's functionality is machine-independent, the freely available MPICH [7] implementation of MPI was developed to take care of the upper layers of MPI while providing an abstract device interface (ADI) to the machine-dependent layers.

This paper investigates the use of AM communication primitives as an alternative to message passing on the SP in order to reduce communication overheads and deliver high bandwidth with small messages. The first part of this paper (Section 2) describes the IBM SP implementation of AM (SP AM) which is layered directly on top of the SP's network adapter (TB2) and which does not use any IBM software on the Power2 processor. SP AM achieves a one-word message round-trip time of 51 µs which is only 4 µs higher than the raw application-to-application round-trip latency, and 40% lower than the 88 µs measured using IBM's message passing library (MPL). SP AM bulk transfers achieve an asymptotic network bandwidth of 34.3 MBytes/s which is comparable with the 34.6 MBytes/s measured using IBM MPL. Furthermore, SP AM has a message half-power point ($n_{1/2}$) of only 260 bytes using asynchronous bulk-transfers.
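To make the half-power point concrete, the following sketch relates it to the asymptotic bandwidth under a simple linear cost model, T(n) = T0 + n / r_inf, in which the half-power point is the message size at which half of r_inf is reached. The linear model, the function names, and the convention 1 MByte = 10^6 bytes are assumptions made here for illustration; they are not taken from the measurements themselves, and the implied T0 should be read as an effective per-message cost under pipelined transfers, not as a latency.

```python
# Illustrative sketch only: a linear cost model T(n) = T0 + n / r_inf.
# The model and all names below are assumptions, not the paper's methodology.

def transfer_time_us(n_bytes, startup_us, peak_mbytes_per_s):
    """Time to move n_bytes under the linear model.

    With 1 MByte = 10**6 bytes, 1 MByte/s equals 1 byte/us,
    so n_bytes / peak_mbytes_per_s is already in microseconds.
    """
    return startup_us + n_bytes / peak_mbytes_per_s

def effective_bandwidth(n_bytes, startup_us, peak_mbytes_per_s):
    """Delivered bandwidth in MBytes/s for a message of n_bytes."""
    return n_bytes / transfer_time_us(n_bytes, startup_us, peak_mbytes_per_s)

def implied_startup_us(half_power_bytes, peak_mbytes_per_s):
    """Per-message cost T0 implied by the half-power point: n_half = T0 * r_inf."""
    return half_power_bytes / peak_mbytes_per_s

PEAK = 34.3      # asymptotic SP AM bandwidth quoted above, MBytes/s
N_HALF = 260     # half-power point quoted above, bytes

t0 = implied_startup_us(N_HALF, PEAK)            # about 7.6 us per message
bw_at_half = effective_bandwidth(N_HALF, t0, PEAK)  # half of PEAK by construction

# Relative round-trip reduction of SP AM (51 us) versus MPL (88 us).
reduction = (88 - 51) / 88                       # about 0.42

if __name__ == "__main__":
    print(f"implied per-message cost: {t0:.2f} us")
    print(f"bandwidth at n_half: {bw_at_half:.2f} MBytes/s")
    print(f"round-trip reduction vs. MPL: {reduction:.1%}")
```

Under these assumptions, a 260-byte half-power point at 34.3 MBytes/s corresponds to an effective per-message cost of roughly 7.6 µs, and the 51 µs versus 88 µs round-trip times correspond to a reduction of about 42%, which the text rounds to 40%.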

The second part of the paper demonstrates the power of AM as a communication substrate by porting Split-C, a split-phase shared-memory extension to C, and implementing MPI over SP AM. Split-C benchmarks are used in Section 3 to compare the SP to other MPPs and show that low message overhead and high message throughput compensate for the SP's high network latency.

The MPI implementation (Section 4) is built entirely over SP AM and is based on MPICH. The goal is to demonstrate that the communications core of MPI can be implemented over AM in a simple fashion and still provide very high performance. This simplicity eases portability and makes possible optimizations that might otherwise be unwieldy. The implementation focuses on the basic point-to-point communication primitives used by MPICH's ADI and relies on the higher-level MPICH routines for collective communication and noncontiguous sends. Extending the implementation to specialize these functions to use AM more directly would be straightforward. The current MPI over SP AM matches MPI-F's performance for very small and very large messages and outperforms MPI-F by 10 to 30% for medium-size (8 KByte to 20 KByte) messages. The NAS benchmarks (Section 6) achieve the same performance using MPICH over SP AM as using MPI-F.





Chris Hawblitzel
Thu Sep 19 12:22:33 EDT 1996