# CS 5220

## Distributed memory

### Networks and models

## 06 Oct 2015
### Basic questions

- How much does a message cost? (see the cost-model sketch below)
  - *Latency*: time to get between processors
  - *Bandwidth*: data transferred per unit time
- How does *contention* affect communication?
- This is a combined hardware-software question!
- We want to understand just enough for reasonable modeling.
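These two quantities combine into the usual first-order cost model: sending an $n$-byte message takes roughly $T(n) = \alpha + \beta n$, where $\alpha$ is the latency and $\beta$ the inverse bandwidth. A minimal sketch in C, with made-up ballpark values for $\alpha$ and $\beta$ (not measurements of any particular machine):

```c
#include <stdio.h>

/* First-order message cost model: T(n) = alpha + beta*n.
 * alpha and beta below are illustrative guesses only. */
int main(void)
{
    double alpha = 2e-6;  /* latency: ~2 us per message (software + wire) */
    double beta  = 1e-9;  /* inverse bandwidth: 1 ns/byte, i.e. a 1 GB/s link */
    for (long n = 8; n <= (8L << 20); n *= 64) {
        double t = alpha + beta * (double) n;
        printf("n = %8ld B: T = %.2e s, effective BW = %.3f GB/s\n",
               n, t, n / t / 1e9);
    }
    return 0;
}
```

Small messages are latency-dominated: effective bandwidth approaches the link rate only once $\beta n \gg \alpha$.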
### Thinking about interconnects

Several features characterize an interconnect:

- *Topology*: who do the wires connect?
- *Routing*: how do we get from A to B?
- *Switching*: circuits, store-and-forward?
- *Flow control*: how do we manage limited resources?
### Thinking about interconnects

- Links are like streets
- Switches are like intersections
- Hops are like blocks traveled
- Routing algorithm is like a travel plan
- Stop lights are like flow control
- Short packets are like cars, long ones like buses?

At some point the analogy breaks down...

### Bus topology

- One set of wires (the bus)
- Only one processor allowed at any given time
  - Contention for the bus is an issue
- Example: basic Ethernet, some SMPs


### Crossbar

- Dedicated path from every input to every output
  - Takes $O(p^2)$ switches and wires!
### Bus vs. crossbar

- Crossbar: more hardware
- Bus: more contention (less capacity?)
- Generally seek happy medium
  - Less contention than bus
  - Less hardware than crossbar
  - May give up one-hop routing
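The tradeoff is easy to quantify in round numbers: $p$ processors sharing a bus each see a $1/p$ slice of its bandwidth, while a crossbar lets all $p$ talk at once but costs $p^2$ crosspoints. A toy sketch:

```c
#include <stdio.h>

/* Hardware cost vs. contention, in round numbers: a bus is one shared
 * channel (p talkers each get a 1/p share); a crossbar gives every
 * pair a dedicated path at the cost of p^2 crosspoints. */
int main(void)
{
    printf("%4s  %11s  %18s\n", "p", "crosspoints", "bus share per proc");
    for (int p = 2; p <= 64; p *= 2)
        printf("%4d  %11d  %17.1f%%\n", p, p*p, 100.0/p);
    return 0;
}
```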
### Network properties

Think about latency and bandwidth via two quantities:

- *Diameter*: maximum distance (in hops) between any two nodes
- *Bisection bandwidth*: minimum bandwidth across any cut that splits the network into two equal halves
  - Particularly important for all-to-all communication

### Linear topology

- $p-1$ links
- Diameter $p-1$
- Bisection bandwidth $1$

### Ring topology

- $p$ links
- Diameter $p/2$
- Bisection bandwidth $2$
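These diameters are easy to sanity-check by brute force: build the adjacency matrix, run breadth-first search from every node, and take the largest distance found. A sketch in C, with an arbitrary example size of $p = 16$:

```c
#include <stdio.h>
#include <string.h>

#define P 16  /* example size; any even p works here */

/* BFS from src; returns the distance to the farthest node. */
static int bfs_ecc(int adj[P][P], int src)
{
    int dist[P], queue[P], head = 0, tail = 0, ecc = 0;
    for (int i = 0; i < P; ++i) dist[i] = -1;
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < P; ++v)
            if (adj[u][v] && dist[v] < 0) {
                dist[v] = dist[u] + 1;
                if (dist[v] > ecc) ecc = dist[v];
                queue[tail++] = v;
            }
    }
    return ecc;
}

/* Diameter = max over all sources of the BFS eccentricity. */
static int diameter(int adj[P][P])
{
    int d = 0;
    for (int s = 0; s < P; ++s) {
        int e = bfs_ecc(adj, s);
        if (e > d) d = e;
    }
    return d;
}

int main(void)
{
    int adj[P][P];

    /* Linear array: i <-> i+1; expect diameter p-1. */
    memset(adj, 0, sizeof(adj));
    for (int i = 0; i + 1 < P; ++i) adj[i][i+1] = adj[i+1][i] = 1;
    printf("linear: diameter %d (expect %d)\n", diameter(adj), P - 1);

    /* Ring: add the wraparound link; expect diameter p/2. */
    adj[0][P-1] = adj[P-1][0] = 1;
    printf("ring:   diameter %d (expect %d)\n", diameter(adj), P / 2);
    return 0;
}
```

Checking bisection bandwidth the same way means minimizing over all balanced cuts, which is much more expensive.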


### Mesh topology

- May be more than two dimensions
- Route along each dimension in turn (dimension-ordered routing; sketch below)
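In two dimensions this is the classic XY routing scheme: correct the $x$ coordinate first, then the $y$ coordinate. A sketch of the path computation (real routers make this decision hop by hop in hardware):

```c
#include <stdio.h>

/* Dimension-ordered (XY) routing on a 2D mesh: walk in x until the
 * x coordinate matches the destination, then walk in y. */
static void route_xy(int sx, int sy, int dx, int dy)
{
    int x = sx, y = sy;
    printf("(%d,%d)", x, y);
    while (x != dx) { x += (dx > x) ? 1 : -1; printf(" -> (%d,%d)", x, y); }
    while (y != dy) { y += (dy > y) ? 1 : -1; printf(" -> (%d,%d)", x, y); }
    printf("\n");
}

int main(void)
{
    route_xy(0, 0, 3, 2);  /* |3-0| + |2-0| = 5 hops */
    return 0;
}
```

On an $m \times m$ mesh the worst case is $2(m-1)$ hops, which is the mesh diameter.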


### Torus topology

Torus : Mesh :: Ring : Linear
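The difference is just the wraparound link: along each dimension a mesh (or linear array) must go straight, while a torus (or ring) may take the short way around. A sketch of the per-dimension hop count:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hops between positions i and j along one dimension of extent n. */
static int dist_line(int i, int j, int n) { (void) n; return abs(i - j); }
static int dist_ring(int i, int j, int n)
{
    int d = abs(i - j);
    return d < n - d ? d : n - d;  /* wraparound: take the short way */
}

int main(void)
{
    int n = 8;
    printf("0 -> 7 with n = %d: linear %d hops, ring %d hop\n",
           n, dist_line(0, 7, n), dist_ring(0, 7, n));
    /* A d-dimensional mesh/torus distance is the sum over dimensions. */
    return 0;
}
```

Wraparound halves the per-dimension diameter, from $p-1$ to $\lfloor p/2 \rfloor$.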


### Hypercube

- Label processors with binary numbers
- Connect $p_1$ to $p_2$ if labels differ in one bit
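Routing then amounts to bit fixing: XOR the two labels and flip one differing bit per hop, so a message takes $\mathrm{popcount}(p_1 \oplus p_2)$ hops and the diameter of a $2^d$-node hypercube is $d$. A sketch:

```c
#include <stdio.h>

/* Hypercube routing by bit fixing: each hop flips one bit on which
 * the current label and the destination still differ. */
int main(void)
{
    int src = 0, dst = 5;           /* 000 -> 101 in a d = 3 cube */
    int cur = src, diff = src ^ dst;
    printf("%d", cur);
    for (int bit = 0; diff; ++bit, diff >>= 1)
        if (diff & 1) {
            cur ^= 1 << bit;        /* correct this dimension */
            printf(" -> %d", cur);
        }
    printf("\n");                   /* prints: 0 -> 1 -> 5 */
    return 0;
}
```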

### Fat tree

- Processors at leaves
- Increase link bandwidth near root
### Others...

- Butterfly network
- Omega network
- Cayley graph
### Conventional wisdom

- Roughly constant latency (?)
  - Wormhole routing (or cut-through) flattens latencies vs. store-and-forward at the hardware level
  - Software stack dominates HW latency!
- Latencies *not* the same between networks (in-box vs. across)
  - May also have store-and-forward at the library level
- Avoid topology-specific optimization
  - Want code that runs on next year's machine, too!
  - Bundle topology awareness in vendor MPI libraries?
  - Sometimes specify a *software* topology (see the MPI sketch below)
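MPI supports software topologies directly through its Cartesian topology routines: you describe the communication pattern you want, and with `reorder` enabled the library may renumber ranks to better match the physical network. A minimal sketch:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Let MPI pick a near-square 2D grid for size processes */
    int dims[2] = {0, 0};
    MPI_Dims_create(size, 2, dims);

    /* Periodic in both dimensions (a torus); reorder = 1 allows the
     * library to renumber ranks to fit the physical topology. */
    int periods[2] = {1, 1};
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);

    /* Each rank finds its grid coordinates and its x-neighbors */
    int rank, coords[2], left, right;
    MPI_Comm_rank(cart, &rank);
    MPI_Cart_coords(cart, rank, 2, coords);
    MPI_Cart_shift(cart, 0, 1, &left, &right);
    printf("rank %d at (%d,%d): x-neighbors %d and %d\n",
           rank, coords[0], coords[1], left, right);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```

Whether reordering actually helps depends on the implementation; `reorder` is permission, not a guarantee.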