
Bandwidth

Several tests are used to measure the asymptotic network bandwidth ($r_\infty$) and the data size at which the transfer rate is half the asymptotic rate ($n_{1/2}$). $r_\infty$ indicates how fast the network adapter moves data from the virtual buffers to the network, while $n_{1/2}$ characterizes the performance of bulk transfers for small messages.
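The two parameters can be related through a simple linear cost model; this is a standard reading offered here for exposition, not a model stated by the authors. If an n-byte transfer costs a fixed startup time $T_0$ plus the wire time $n/r_\infty$, then

    $$ r(n) \;=\; \frac{n}{T_0 + n/r_\infty} \;=\; r_\infty \, \frac{n}{n + n_{1/2}}, \qquad n_{1/2} \;=\; T_0\, r_\infty , $$

so a small $n_{1/2}$ means the fixed per-transfer cost is small relative to the achievable bandwidth, and half of the asymptotic rate is already reached at $n = n_{1/2}$ bytes.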

The bandwidth benchmarks involve two processing nodes and measure the one-way bandwidth for data sizes varying from 16 bytes to 1 MByte. They were run using SP AM bulk transfer primitives as well as IBM MPL send and receive primitives for comparison. The blocking transfer bandwidth test measures synchronous transfers by issuing blocking requests (am_store and am_get) and waiting for their completion; for MPL, an mpc_bsend is followed by a 0-byte mpc_brecv. The pipelined asynchronous transfer bandwidth test uses a number of small requests to transfer a large block: it sends N bytes of data using N/n transfers of n bytes, where N is 1 MByte and n varies from 64 bytes to 1 MByte, using am_store_async and mpc_send, respectively.
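The structure of the two tests can be sketched as follows. This is illustrative code, not the authors' benchmark: the timer now_usec, the am_wait_all completion call, and the exact argument lists of the SP AM primitives are assumptions; only the names am_store, am_store_async, am_get, mpc_bsend, mpc_brecv, and mpc_send come from the text.

/* Sketch of the two SP AM bandwidth tests (assumed signatures, not the paper's code). */
#include <stddef.h>

#define TOTAL_BYTES (1 << 20)   /* 1 MByte, as in the pipelined test */

extern double now_usec(void);                                        /* hypothetical timer */
extern void am_store(int dest, void *src, void *dst, int nbytes);    /* assumed signature  */
extern void am_store_async(int dest, void *src, void *dst, int nbytes); /* assumed signature */
extern void am_wait_all(void);          /* hypothetical: wait for all outstanding transfers */

/* Blocking test: each am_store returns only after the transfer completes,
   so the round-trip latency is paid on every request. */
double blocking_store_bw(int dest, char *src, char *dst, int nbytes, int reps)
{
    double t0 = now_usec();
    for (int i = 0; i < reps; i++)
        am_store(dest, src, dst, nbytes);
    double t1 = now_usec();
    return ((double)nbytes * reps) / (t1 - t0);   /* bytes per microsecond */
}

/* Pipelined test: send TOTAL_BYTES as TOTAL_BYTES/nbytes asynchronous stores
   issued back to back, then wait once for all of them to complete. */
double pipelined_store_bw(int dest, char *src, char *dst, int nbytes)
{
    int transfers = TOTAL_BYTES / nbytes;
    double t0 = now_usec();
    for (int i = 0; i < transfers; i++)
        am_store_async(dest, src + (size_t)i * nbytes, dst + (size_t)i * nbytes, nbytes);
    am_wait_all();
    double t1 = now_usec();
    return (double)TOTAL_BYTES / (t1 - t0);       /* bytes per microsecond */
}

The MPL variants replace am_store with an mpc_bsend followed by a 0-byte mpc_brecv, and am_store_async with mpc_send.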

Figure 3: Bandwidth of blocking and non-blocking bulk transfers.

Figure 3 shows the results. The $r_\infty$ achieved by pipelining am_store_async and am_get is 34.3 MBytes/s, compared to MPL's 34.6 MBytes/s using mpc_send. The $n_{1/2}$ of about 260 bytes for am_store_async (slightly higher for am_get), compared to about 450 bytes for mpc_send, indicates that SP AM achieves better performance with small messages.

The bandwidth of SP AM's synchronous stores and gets also converges to 34.3 MBytes/s, but more slowly because of the round-trip latency: the sender blocks after every transfer waiting for an acknowledgement. For smaller transfer sizes, the performance of gets is also slightly lower than that of stores because of the overhead of the get request. Consequently, the bandwidth curve for synchronous gets shows an $n_{1/2}$ of 3000 bytes, compared to 2800 bytes for stores. The effect of this overhead on the bandwidth vanishes as the transfer size increases, which explains why the two curves overlap for sizes larger than 4 KBytes. Despite a higher $r_\infty$ of 34.6 MBytes/s, synchronous transfers using MPL's sends and receives have an $n_{1/2}$ greater than 3500 bytes.
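One way to interpret the slower convergence (an assumption about the cost structure, not a calculation from the paper): if every blocking request of n bytes must absorb the round-trip time $T_{rt}$ plus a per-request handling overhead $o$ before the next request can be issued, the achieved rate is

    $$ r_{\mathrm{sync}}(n) \;\approx\; \frac{n}{T_{rt} + o + n/r_\infty}, \qquad n_{1/2} \;\approx\; (T_{rt} + o)\, r_\infty . $$

Under this reading, the extra overhead of a get request (a larger $o$) appears directly as the 3000-byte versus 2800-byte gap in $n_{1/2}$, while pipelined transfers overlap $T_{rt}$ with outstanding requests and therefore reach a much smaller $n_{1/2}$.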

Figure 3 also clearly shows that SP AM's asynchronous transfers are no better than their blocking counterparts for message sizes larger than one chunk (8064 bytes), the point at which flow control kicks in.

