The roofline model is a visualization that shows the peak performance we can expect from a kernel with a given arithmetic intensity (flops per byte of memory traffic). The “roofline” consists of a sloped segment, where performance is limited by memory bandwidth, followed by a flat segment at the peak flop rate. There can be multiple slopes and flats, depending on the features of the kernel and its implementation (instruction mix, simplicity of the access pattern, use of SIMD, etc.).
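To make the shape concrete, here is a minimal sketch of the simplest single-slope roofline bound, where attainable performance is the smaller of the peak flop rate and the bandwidth times the arithmetic intensity. The function names and the machine numbers (100 GFLOP/s peak, 25 GB/s bandwidth) are made up for illustration and do not describe any particular machine.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Simple roofline bound in GFLOP/s.

    intensity     -- arithmetic intensity in flops per byte
    peak_gflops   -- peak flop rate in GFLOP/s (the flat part)
    bandwidth_gbs -- memory bandwidth in GB/s (sets the sloped part)
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (flops/byte) where the slope meets the flat part."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine parameters, chosen only for illustration.
PEAK_GFLOPS = 100.0
BANDWIDTH_GBS = 25.0

if __name__ == "__main__":
    ridge = ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)
    for ai in [0.125, 1.0, 4.0, 16.0]:
        bound = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
        regime = "memory-bound" if ai < ridge else "compute-bound"
        print(f"AI = {ai:6.3f} flops/byte -> at most {bound:6.1f} GFLOP/s ({regime})")
```

Kernels with intensity below the ridge point sit under the sloped part of the roof, and those above it sit under the flat part. Additional ceilings (for example, without SIMD) could be modeled the same way by swapping in lower values for the peak or the bandwidth.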

I am not going to try to give a new slide deck on the roofline model. Rather, I am going to point you to a few nice resources:

  1. Roofline: An Insightful Visual Performance Model for Multicore Architectures - in Communications of the ACM, so you may want to use the library passkey service if you are trying to get it from off campus. Alternatively, go read the tech report version.
  2. The Roofline Model: A pedagogical tool for program analysis and optimization - slides from a tutorial talk presented by Sam Williams when the roofline idea was first getting started.
  3. Introduction to the Roofline Model - 2020 talk by Sam Williams at TPUs for Science 2020. The first part of the talk is missing, but it should be OK.

Watch the video if you want to hear someone narrating to you! You’ll be hearing me narrate again soon enough.