Lecture 5: ML Frameworks¶

CS4787/5777 — Principles of Large-Scale Machine Learning Systems¶

$\newcommand{\R}{\mathbb{R}}$

Continuing from last time: Reverse-Mode AD¶

  • Fix one scalar output $\ell \in \R$ (typically the loss)
  • Compute the partial derivative $\frac{\partial \ell}{\partial y}$ for each intermediate value $y$ in the computation
  • We need to do this by going backward through the computation graph (see the sketch below)
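A minimal sketch of this in PyTorch (illustrative, not from the lecture code): we build a single scalar loss $\ell$, call backward() to run the reverse pass, and then read off $\frac{\partial \ell}{\partial y}$ for each leaf tensor from its .grad field.

In [ ]:

import torch

# Reverse-mode AD: fix one scalar output l, then one backward pass
# gives dl/dy for every tensor y that requires gradients.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)

y = w * x        # intermediate values; the graph is recorded as we compute
l = y.sum()      # a single scalar output l

l.backward()     # traverse the recorded graph backward

print(x.grad)    # dl/dx = w = [0.5, -1.0, 2.0]
print(w.grad)    # dl/dw = x = [1.0, 2.0, 3.0]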

"Deep Learning" ML Frameworks¶

Classical Core Components:

  • Numerical linear algebra library
  • Hardware support (e.g. GPU)
  • Backpropagation engine
  • Library for expressing deep neural networks

All embedded in a high-level language

  • Usually Python.

Numerical Linear Algebra¶

You've already seen and used this sort of thing: NumPy.

  • Arrays are objects "owned" by the library
  • Any arithmetic operation on these objects goes through the library
    • The library calls an optimized function to compute the operation
    • This happens outside the python interpreter
    • Control is returned to python when the function finishes
    • By default you're only going to be running one such function at a time.
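For example, the single NumPy call below hands the whole matrix multiply to an optimized native routine (typically a BLAS kernel); Python only regains control when it returns. (Illustrative sketch, not from the lecture code.)

In [ ]:

import numpy as np

A = np.random.rand(1000, 1000)
B = np.random.rand(1000, 1000)

# One Python-level call; the multiply itself runs in optimized native
# code outside the interpreter, and control returns when it finishes.
C = A @ B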

Numerical Linear Algebra: More Details¶

  • Arrays are mutable
  • Multiple references can exist!
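A quick illustration of why this matters (a sketch using plain NumPy): assignment and slicing give you new references to the same underlying buffer, so a mutation through one reference is visible through the others.

In [ ]:

import numpy as np

x = np.zeros(3)
y = x            # another reference to the same array, not a copy
y[0] = 1.0
print(x)         # [1. 0. 0.] -- the mutation is visible through x

z = x[1:]        # slices are views: yet another reference to the same buffer
z[:] = 2.0
print(x)         # [1. 2. 2.]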

Numerical Linear Algebra On-Device¶

  • The simplest version of this is essentially a separate "copy" of NumPy for each kind of hardware we want to run on: one copy of every function we want to support per device type.
    • e.g. one copy that runs on the CPU, one copy that runs on the GPU
  • Arrays are located explicitly on one device
    • in PyTorch, you move them with x.to("device_name")
  • When we try to call a function, the library checks where the inputs are located
    • if they're all on one device, it calls that device's version of the function
    • if they're not all on the same device, it raises an exception
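A small sketch of this behavior in PyTorch (assuming a CUDA GPU may or may not be available):

In [ ]:

import torch

x = torch.randn(3, 3)      # created on the CPU by default
if torch.cuda.is_available():
    y = x.to("cuda")       # explicit copy onto the GPU
    z = y @ y              # both inputs on the GPU: runs the GPU matmul kernel
    # x @ y would raise a RuntimeError, since the inputs are on different devices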

Eager Execution vs Graph Execution¶

When we manifest a node of the compute graph, we can either:

  • (eager) compute and manifest the value at that node immediately
  • (graph) just manifest the node
    • need to call some function to compute the forward pass later

This was the classic distinction between TensorFlow (graph execution) and PyTorch (eager execution).
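To make the distinction concrete, here is a toy sketch (not the real TensorFlow graph API): eager execution computes the value as soon as the expression is written, while graph execution only records the node and computes the value when we explicitly run the forward pass.

In [ ]:

import torch

# Eager: the value is materialized immediately.
a = torch.tensor(2.0)
b = torch.tensor(3.0)
c = a * b                 # c already holds 6.0

# Graph-style (toy illustration): we only record what to compute,
# and run the forward pass later with an explicit call.
class Const:
    def __init__(self, value):
        self.value = value
    def run(self):
        return self.value

class Mul:
    def __init__(self, left, right):
        self.left, self.right = left, right
    def run(self):
        return self.left.run() * self.right.run()

graph = Mul(Const(2.0), Const(3.0))   # nothing is computed yet
result = graph.run()                  # the forward pass happens here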
