Several tests are used to measure the asymptotic network bandwidth ($r_\infty$) and the data size at which the transfer rate is half the asymptotic rate ($n_{1/2}$). $r_\infty$ indicates how fast the network adapter moves data from the virtual buffers to the network, while $n_{1/2}$ characterizes the performance of bulk transfers for small messages.
The bandwidth benchmarks involve two processing nodes and measure the one-way bandwidth for data sizes varying from 16 bytes to 1 MByte. They were run using SP AM bulk transfer primitives as well as IBM MPL send and receive primitives for comparison. The blocking transfer bandwidth test measures synchronous transfer requests by issuing blocking requests (am_store and am_get) and waiting for their completion. For MPL, an mpc_bsend is followed by a 0-byte mpc_brecv. The pipelined asynchronous transfer bandwidth test uses a number of small requests to transfer a large block. This benchmark sends N bytes of data using transfers of n bytes, where N is 1 MByte and n varies from 64 bytes to 1 MByte, using am_store_async and mpc_send respectively.
Figure 3: Bandwidth of blocking and non-blocking bulk transfers.
Figure 3 shows the results. The $r_\infty$ achieved by pipelining am_store_async and am_get is 34.3 MBytes/s compared to MPL's 34.6 MBytes/s using mpc_send. The $n_{1/2}$ value of about 260 bytes for am_store_async (slightly higher for am_get) compared to about 450 bytes for mpc_send indicates that SP AM achieves better performance with small messages.
The bandwidth of SP AM's synchronous stores and gets also converges to 34.3 MBytes/s, but at a slower rate due to the round-trip latency, as the sender blocks after every transfer waiting for an acknowledgement. Also, for smaller transfer sizes, the performance for gets is slightly lower than for stores because of the overhead of the get request. Consequently, the bandwidth curve for synchronous gets shows an $n_{1/2}$ of 3000 bytes compared to the 2800 bytes for stores. The effect of this overhead on the bandwidth vanishes as the transfer size increases, explaining the overlapping of both curves for sizes larger than 4 KBytes. Despite a higher $r_\infty$ of 34.6 MBytes/s, synchronous transfers using MPL's sends and receives have an $n_{1/2}$ greater than 3500 bytes.
Figure 3 clearly shows that SP AM's asynchronous transfers are no better than their blocking counterparts for message sizes larger than one chunk (8064 bytes), which is when the flow control kicks in.