Lecture 5: ML Frameworks¶

CS4787/5777 — Principles of Large-Scale Machine Learning Systems¶

$\newcommand{\R}{\mathbb{R}}$

Continuing from last time: Reverse-Mode AD¶

  • Fix one scalar output $\ell \in \R$ (typically the loss)
  • Compute the partial derivative $\frac{\partial \ell}{\partial y}$ for each intermediate value $y$ in the computation
  • We need to do this by going backward through the computation graph (see the sketch below)
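A minimal sketch of this in PyTorch (illustrative, not from the lecture code): we build a single scalar loss $\ell$, call backward() to run the reverse pass, and then read off $\frac{\partial \ell}{\partial y}$ for each leaf tensor from its .grad field.

In [ ]:

import torch

# Reverse-mode AD: fix one scalar output l, then one backward pass
# gives dl/dy for every tensor y that requires gradients.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)

y = w * x        # intermediate values; the graph is recorded as we compute
l = y.sum()      # a single scalar output l

l.backward()     # traverse the recorded graph backward

print(x.grad)    # dl/dx = w = [0.5, -1.0, 2.0]
print(w.grad)    # dl/dw = x = [1.0, 2.0, 3.0]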

"Deep Learning" ML Frameworks¶

Classical Core Components:

  • Numerical linear algebra library
  • Hardware support (e.g. GPU)
  • Backpropagation engine
  • Library for expressing deep neural networks

All embedded in a high-level language

  • Usually Python.

Numerical Linear Algebra¶

You've already seen and used this sort of thing: NumPy.

  • Arrays are objects "owned" by the library
  • Any arithmetic operation on these objects goes through the library
    • The library calls an optimized function to compute the operation
    • This happens outside the python interpreter
    • Control is returned to python when the function finishes
    • By default you're only going to be running one such function at a time.
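For example, the single NumPy call below hands the whole matrix multiply to an optimized native routine (typically a BLAS kernel); Python only regains control when it returns. (Illustrative sketch, not from the lecture code.)

In [ ]:

import numpy as np

A = np.random.rand(1000, 1000)
B = np.random.rand(1000, 1000)

# One Python-level call; the multiply itself runs in optimized native
# code outside the interpreter, and control returns when it finishes.
C = A @ B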

Numerical Linear Algebra: More Details¶

  • Arrays are mutable
  • Multiple references can exist!
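A quick illustration of why this matters (a sketch using plain NumPy): assignment and slicing give you new references to the same underlying buffer, so a mutation through one reference is visible through the others.

In [ ]:

import numpy as np

x = np.zeros(3)
y = x            # another reference to the same array, not a copy
y[0] = 1.0
print(x)         # [1. 0. 0.] -- the mutation is visible through x

z = x[1:]        # slices are views: yet another reference to the same buffer
z[:] = 2.0
print(x)         # [1. 2. 2.]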

Numerical Linear Algebra On-Device¶

  • The simplest version of this is essentially a separate "copy" of NumPy for each kind of hardware we want to run on: one copy of every function we want to support per device type.
    • e.g. one copy that runs on the CPU, one copy that runs on the GPU
  • Arrays are located explicitly on one device
    • in PyTorch, you move them with x.to("device_name")
  • When we try to call a function, the library checks where the inputs are located
    • if they're all on one device, it calls that device's version of the function
    • if they're not all on the same device, it raises an exception
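A small sketch of this behavior in PyTorch (assuming a CUDA GPU may or may not be available):

In [ ]:

import torch

x = torch.randn(3, 3)      # created on the CPU by default
if torch.cuda.is_available():
    y = x.to("cuda")       # explicit copy onto the GPU
    z = y @ y              # both inputs on the GPU: runs the GPU matmul kernel
    # x @ y would raise a RuntimeError, since the inputs are on different devices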

Eager Execution vs Graph Execution¶

When we manifest a node of the compute graph, we can either:

  • (eager) compute and manifest the value at that node immediately
  • (graph) just manifest the node
    • need to call some function to compute the forward pass later

This was the classic distinction between TensorFlow (graph execution) and PyTorch (eager execution).
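To make the distinction concrete, here is a toy sketch (not the real TensorFlow graph API): eager execution computes the value as soon as the expression is written, while graph execution only records the node and computes the value when we explicitly run the forward pass.

In [ ]:

import torch

# Eager: the value is materialized immediately.
a = torch.tensor(2.0)
b = torch.tensor(3.0)
c = a * b                 # c already holds 6.0

# Graph-style (toy illustration): we only record what to compute,
# and run the forward pass later with an explicit call.
class Const:
    def __init__(self, value):
        self.value = value
    def run(self):
        return self.value

class Mul:
    def __init__(self, left, right):
        self.left, self.right = left, right
    def run(self):
        return self.left.run() * self.right.run()

graph = Mul(Const(2.0), Const(3.0))   # nothing is computed yet
result = graph.run()                  # the forward pass happens here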
