CS 5220: Applications of Parallel Computers
27 Aug 2015
Reading Note
This is a supplement to the notes.
Go read them, too!
How Fast Can We Go?
- Speed in flop/s for Linpack: Top500
  - Giga (10^9) -- a single core
  - Tera (10^12) -- a big machine
  - Peta (10^15) -- current top 10 machines (5 in US)
  - Exa (10^18) -- favorite of funding agencies
- Current record-holder: China's Tianhe-2
  - 33.9 Petaflop/s on Linpack (54.9 Petaflop/s theoretical peak)
  - 17.8 MW + cooling
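The power figure becomes an energy-efficiency number with one division. This is a minimal sketch that uses only the Tianhe-2 numbers quoted above. Cooling is excluded because the slide gives no figure for it, and the helper name `gflops_per_watt` is hypothetical.

```python
def gflops_per_watt(flops_per_s: float, watts: float) -> float:
    """Sustained GFlop/s delivered per watt of power drawn."""
    return flops_per_s / watts / 1e9


# Tianhe-2 figures from the slide (cooling power excluded, since none is given)
linpack_rate = 33.9e15  # flop/s on Linpack
power = 17.8e6          # watts

eff = gflops_per_watt(linpack_rate, power)
print(f"Energy efficiency: {eff:.2f} GFlop/s per W")

# Rough extrapolation: power an exaflop/s machine would need at this same efficiency
exa_power_mw = 1e18 / (eff * 1e9) / 1e6
print(f"1 Exaflop/s at this efficiency: about {exa_power_mw:.0f} MW")
```

The extrapolation is crude because it ignores cooling and any efficiency gains. Still, at hundreds of megawatts it shows why power is a central constraint on exaflop machines.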
Tianhe-2 environment
Commodity nodes, custom interconnect:
- Xeon E5-2692 nodes with Xeon Phi accelerators
- Intel compilers + Intel Math Kernel Library (MKL)
- MPICH2 MPI with customized channel
- Kylin Linux
- TH Express-2 interconnect
A US Contender
Sequoia at LLNL (#3 on the Top500)
- 20.1 Petaflop/s theoretical peak
- 17.2 Petaflop/s Linpack benchmark (86% of peak)
- 14.4 Petaflop/s in a bubble-cloud simulation (72% of peak)
  - 2013 Gordon Bell Prize
  - 2010 Prize was 30% of peak on ORNL Jaguar
- Performance on more standard code?
  - 10% of peak is probably very good!
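The percentages on this slide are plain ratios against theoretical peak. The short sketch below reproduces them from the Sequoia numbers above. `fraction_of_peak` is a hypothetical helper, and the last line simply applies the slide's 10% rule of thumb.

```python
PEAK = 20.1  # Sequoia theoretical peak, Pflop/s (from the slide)

measured = {
    "Linpack": 17.2,
    "Bubble-cloud sim (2013 Gordon Bell)": 14.4,
}


def fraction_of_peak(rate: float, peak: float = PEAK) -> float:
    """Achieved rate as a fraction of theoretical peak."""
    return rate / peak


for name, rate in measured.items():
    print(f"{name:40s} {fraction_of_peak(rate):.0%}")

# What the "10% is very good" rule of thumb would mean on this machine
print(f"{'Typical code at 10% of peak':40s} {0.10 * PEAK:.1f} Pflop/s")
```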
Peak > Linpack > Gordon Bell > Typical
- Measuring performance of real applications is hard
  - Typically a few bottlenecks slow things down
  - Figuring out why can be tricky!
- And we really care about time-to-solution
  - Sophisticated methods get answers in fewer flops
  - ... but may look bad in flop rate benchmarks
- Lots of delusion and deception in performance analysis
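A lower flop rate can still win because time-to-solution is the flop count divided by the rate. The numbers below are made up purely for illustration and are not measurements of any real method.

```python
def time_to_solution(flops: float, rate: float) -> float:
    """Seconds needed to perform `flops` operations at `rate` flop/s."""
    return flops / rate


# Hypothetical methods, illustrative numbers only:
# a simple method doing many flops at a high rate, and a
# sophisticated method doing far fewer flops at a lower rate.
simple = {"flops": 1e12, "rate": 10e9}  # 10 GFlop/s
clever = {"flops": 1e10, "rate": 1e9}   # 1 GFlop/s

t_simple = time_to_solution(**simple)
t_clever = time_to_solution(**clever)
print(f"simple: {simple['rate'] / 1e9:.0f} GFlop/s, {t_simple:.0f} s")
print(f"clever: {clever['rate'] / 1e9:.0f} GFlop/s, {t_clever:.0f} s")
```

In this sketch the clever method runs at one tenth the flop rate yet finishes ten times sooner. That is exactly the comparison a flop-rate benchmark hides.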
- Starting point: good serial performance
- Strong scaling: compare parallel to serial time on a fixed problem size
  - Speedup = Serial Time / Parallel Time
  - Efficiency = Speedup / p, where p is the number of processors
  - Ideally, speedup = p; usually it is lower
- Barriers to perfect speedup
  - Serial work (Amdahl's law)
  - Parallel overheads (communication, synchronization)
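Computing speedup and efficiency from a strong-scaling run takes only a couple of divisions. This is a minimal sketch with hypothetical timings. It assumes the serial baseline comes from a tuned serial code rather than from the parallel code run on one processor.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Serial time divided by parallel time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Speedup divided by the number of processors p."""
    return speedup(t_serial, t_parallel) / p


# Hypothetical strong-scaling timings (seconds) for one fixed problem size.
# t_serial is assumed to come from a tuned serial code.
t_serial = 120.0
timings = {1: 125.0, 2: 64.0, 4: 34.0, 8: 19.0, 16: 12.0}

for p, t in timings.items():
    print(f"p={p:2d}  speedup={speedup(t_serial, t):5.2f}  "
          f"efficiency={efficiency(t_serial, t, p):4.0%}")
```

With these made-up numbers, the parallel code on one processor is already slower than the serial baseline. That gap is the kind of overhead an honest serial comparison exposes.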
Amdahl
- $p$ = number of processors
- $s$ = fraction of work that is serial
- $t_s$ = serial time
- $t_p$ = parallel time $\geq s t_s + (1-s) t_s / p$

$$\text{Speedup} = \frac{t_s}{t_p} \leq \frac{1}{s + (1-s)/p} < \frac{1}{s}$$
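Evaluating the bound for a few processor counts shows how quickly it saturates. This sketch assumes a hypothetical serial fraction s = 0.1, so the speedup can never reach 1/s = 10 no matter how large p gets.

```python
def amdahl_speedup(s: float, p: int) -> float:
    """Amdahl upper bound on speedup with serial fraction s on p processors."""
    return 1.0 / (s + (1.0 - s) / p)


s = 0.1  # hypothetical: 10% of the work is serial
for p in (1, 2, 4, 8, 16, 64, 1024):
    print(f"p={p:5d}  speedup <= {amdahl_speedup(s, p):.2f}")
print(f"limit as p -> infinity: {1 / s:.1f}")
```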
Things look better if the problem size n grows with p (a weak scaling study).
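One simple way to model the weak-scaling case is the scaled-speedup argument usually attributed to Gustafson. It is an added assumption here, not something on the slide. Hold the parallel run time fixed and let n grow so that the parallelizable work scales with p. If s is now the serial fraction of the parallel run's time, the serial code would take s + (1 - s)p times as long, and that ratio is the scaled speedup.

```python
def amdahl_speedup(s: float, p: int) -> float:
    """Fixed problem size: s is the serial fraction of the serial run."""
    return 1.0 / (s + (1.0 - s) / p)


def scaled_speedup(s: float, p: int) -> float:
    """Problem size grows with p: s is the serial fraction of the parallel run."""
    return s + (1.0 - s) * p


s = 0.1  # hypothetical serial fraction
for p in (1, 16, 256, 1024):
    print(f"p={p:5d}  fixed n: {amdahl_speedup(s, p):6.2f}  "
          f"n grows with p: {scaled_speedup(s, p):7.1f}")
```

The two columns are not strictly comparable, because the same s means different things in the two models. The contrast still shows why weak-scaling studies tend to report much larger speedups.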
Summary
- We're approaching exaflop peak rates
- Codes rarely get peak performance
  - Better: compare to tuned serial performance
- Measure speedup and efficiency
  - Strong scaling: increase p, fix n
  - Weak scaling: increase both p and n
- Serial overheads and communication kill speedup
- Simple analytical models help understand scaling
Intro to Performance Analysis