Active Messages: an Efficient Communication Architecture for Multiprocessors

Thorsten von Eicken

Dipl. Ing. (Eidgenoessische Hochschule, Zuerich) 1987

Committee in charge:

November, 1993


Abstract

This dissertation presents Active Messages, a new communication architecture for massively parallel multiprocessors. Active Messages achieves an order of magnitude performance improvement over conventional communication layers and efficiently supports a variety of parallel programming models, including message passing, shared memory and dataflow. The key to this success is that Active Messages is predicated on the use of high-level parallel languages and takes the entire system into account, from the micro-architectural level up to the language system. This dissertation develops a conceptual framework around the notion of a communication architecture extending the sequential instruction set architecture. The focus is on integrating communication into the compilation process and allowing trade-offs to be made across all hardware and software layers of the system. Understanding how the various layers of abstraction interact allows the key communications issues to be addressed at the right level in an approach analogous to RISC architectures.

The analysis of traditional systems (e.g., message passing, shared memory, message driven, and dataflow) concludes that their primitives are too complex and rigid, and are tailored to a specific programming model. In contrast, Active Messages provides simple communication primitives appropriate for code generation and decouples storage allocation and sophisticated scheduling from communication proper. This enables powerful compiler optimizations and results in a more efficient system, as demonstrated through the analysis of programs written in Split-C and Id90. On a single platform, the CM-5, Active Messages today supports message passing, message driven, dataflow, and NUMA shared memory programming models as efficiently as more specialized hardware and, coupled with hardware support for a global address space, is appropriate for implementing cache-coherent shared memory.

The simplicity of Active Messages makes it an attractive basis for the development of new hardware support for communication. This dissertation analyzes the timing of Active Messages in detail on the CM-5 and the nCUBE/2 to uncover simple hardware changes that could double the communication performance. A number of ongoing multiprocessor hardware developments use Active Messages and promise more dramatic improvements. To date, the Active Messages communication architecture offers the most versatile and efficient avenue towards multiprocessors that support a full spectrum of parallel programming languages.
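
As a rough illustration of the idea summarized above, the following sketch simulates the basic Active Messages mechanism in Python: each message names a handler that runs when the message arrives and integrates the data into the ongoing computation, so no receive buffer has to be posted in advance. The example is a fetch&add, the operation listed in Section 4.1.2 of the table of contents. This is a toy simulation under stated assumptions, not the CM-5 or nCUBE/2 implementation analyzed in the dissertation; the names Node, send, poll, fetch_add_request, fetch_add_reply and demo are all hypothetical.

```python
from collections import deque


class Node:
    """One simulated processor with local memory and an incoming message queue.

    Hypothetical model for illustration only.
    """

    def __init__(self, node_id, network):
        self.node_id = node_id
        self.network = network   # shared dict: node id -> Node
        self.memory = {}         # local memory, keyed by address
        self.inbox = deque()     # messages that have arrived but not yet run
        self.replies = {}        # results delivered by reply handlers

    def send(self, dest, handler, *args):
        # The message carries a reference to its handler plus its arguments.
        self.network[dest].inbox.append((handler, args))

    def poll(self):
        # Run the handler of every pending message to completion.
        while self.inbox:
            handler, args = self.inbox.popleft()
            handler(self, *args)


def fetch_add_request(node, addr, increment, requester, tag):
    """Request handler: update local memory and send the old value back."""
    old = node.memory.get(addr, 0)
    node.memory[addr] = old + increment
    node.send(requester, fetch_add_reply, tag, old)


def fetch_add_reply(node, tag, value):
    """Reply handler: store the fetched value where the requester can find it."""
    node.replies[tag] = value


def demo():
    network = {}
    for i in range(2):
        network[i] = Node(i, network)
    network[1].memory["counter"] = 10

    # Node 0 performs a remote fetch&add of 5 on node 1's counter.
    network[0].send(1, fetch_add_request, "counter", 5, 0, "t0")
    network[1].poll()   # node 1 runs the request handler
    network[0].poll()   # node 0 runs the reply handler
    return network[0].replies["t0"], network[1].memory["counter"]
```

Calling demo() returns the value fetched by node 0 together with the updated counter on node 1. The handlers do only a small amount of work and never block, which is the property that lets scheduling and storage allocation stay outside the communication layer in this sketch.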


Table of Contents

(The links refer to PostScript files.)

Title page i

Abstract

Table of Contents iii

List of Figures vii

List of Tables ix

Acknowledgments x

1 Introduction 1

1.1 The Architecture Design Process 2
1.1.1 The RISC argument 2
1.2 Communication Architecture Design 4
1.3 Previous Approaches to Communication Architecture 5
1.3.1 Message driven architectures 5
1.3.2 Message passing architectures 5
1.3.3 Shared memory architectures 7
1.3.4 Conclusions 7
1.4 The Active Messages Approach 9
1.4.1 Active Messages 9
1.4.2 Contributions 9
1.4.3 Dissertation overview 10

2 Communication Architecture 11

2.1 A Vertical Approach 12
2.1.1 Computer architecture in review 12
2.1.2 Communication layers 14
2.1.3 Aspects of a communication architecture 15
2.1.4 Aspects of a communication micro-architecture 17
2.1.5 Glossary 17
2.2 Active Messages Communication Architecture 19
2.3 Metrics for Communication Performance 20
2.3.1 The LogP model of parallel computation 20
2.3.2 Simple metrics 21
2.4 Summary 22

3 Communication Micro-Architectures 23

3.1 The nCUBE/2 Micro-Architecture 25
3.1.1 Node micro-architecture 25
3.1.2 Instruction set architecture summary 26
3.1.3 Communication micro-architecture 28
3.1.4 Summary 31
3.2 The CM-5 Micro-Architecture 33
3.2.1 Node micro-architecture 33
3.2.2 Instruction set architecture summary 34
3.2.3 Communication micro-architecture 36
3.2.4 Summary 40
3.3 Conclusion 41

4 Active Messages Communication Architectures 43

4.1 Active Messages 46
4.1.1 Addressing the four key issues 46
4.1.2 Active Messages example: Fetch&add 49
4.2 Active Messages Architecture on the nCUBE/2 51
4.2.1 Design constraints 51
4.2.2 Design highlights 53
4.2.3 Communication architecture interface 57
4.2.4 Micro-benchmarks 59
4.2.5 Active Messages implementation 61
4.2.6 Macro-benchmarks 68
4.2.7 Conclusions 76
4.3 Active Messages Architecture on the CM-5 78
4.3.1 Design constraints 78
4.3.2 Design highlights 79
4.3.3 Communication architecture interface 84
4.3.4 Micro-benchmarks 88
4.3.5 CMAM Implementation 89
4.3.6 FFT macro-benchmark 93
4.3.7 Pros and cons of asynchronous handler execution 97
4.3.8 Conclusions 100
4.4 Summary 102

5 Traditional Communication Architectures 107

5.1 Send&receive 108
5.1.1 Definition 108
5.1.2 Blocking send&receive 110
5.1.3 Non-blocking send and blocking receive 114
5.1.4 Non-blocking send and non-blocking receive 117
5.1.5 Critique 117
5.1.6 Summary 119
5.2 Dataflow and Message Driven 121
5.2.1 Description 121
5.2.2 Critique 122
5.2.3 Relationship to Active Messages 125
5.2.4 Summary 126
5.3 Shared-memory 127
5.3.1 Definition 127
5.3.2 Shared memory with non-uniform memory access 128
5.3.3 Cache-coherent shared memory with non-uniform memory access 135
5.3.4 Summary 139
5.4 Conclusions 141

6 Impact on New Programming Languages 143

6.1 Split-C 144
6.1.1 Split-C language summary 144
6.1.2 Libsplit-c implementation using CM-5 Active Messages 147
6.1.3 Split-C evaluation 154
6.1.4 Summary 158
6.2 TAM/Id90 160
6.2.1 The TAM/Id90 compilation system 160
6.2.2 Evaluation 163
6.2.3 Conclusions 169

7 Conclusions and Prospect 171

7.1 Impact on Communication Micro-Architectures 174
7.1.1 Improving Network Interfaces 174
7.1.2 Hardware support for message handlers 175
7.1.3 Support for a global address space 177
7.2 Impact on Operating Systems 179
7.3 Summary 180

Bibliography 181


The collection of PostScript files is also available as a 2.7 MB gzipped tar archive.


For more information or for comments, please contact Thorsten von Eicken.