Simple example implementations of Automatic Differentiation

By Steve Marschner for Cornell CS6630, Fall 2022

Automatic differentiation is an important tool in scientific computing, machine learning, and computer graphics. It is a simple technique that works well for computing derivatives of nearly any function you can write a program to evaluate. I want to build really simple, but working, model implementations of a couple of basic autodiff methods, to show you how they work and to demonstrate how simple they are at their core, even if the details can get pretty involved in practice.

If you are looking at the static HTML version of this document on the course website, you can download the live notebook to play with yourself. Installing Anaconda is an easy way to set up an appropriate Jupyter/Python environment.

Forward mode

The simplest type of autodiff is known as forward mode, and we'll start with the most basic case: a function of one variable that we want to differentiate, so that every variable in the program is either a constant or something that depends on that one independent variable.

The idea is to think of the program that computes the function as a sequence of simple elementary operations, and to augment every operation so that it computes the derivative of its result from its inputs and their derivatives. I'm going to do this by defining a class var that contains two numbers: a value and the derivative of that value with respect to the (single) independent variable. It implements all the standard arithmetic operations, and in each case it uses the chain rule to compute the derivative of the new object it returns.

In Python this is going to produce computations that do a ton of extra heap allocation, which is not efficient, so it's really just a toy demonstration. I'm not sure how to make it efficient in Python without doing source-to-source translation, but in C++ with expression templates it's possible to make a similar approach produce efficient code.
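Here is a minimal sketch of such a `var` class, assuming we only need the handful of operations used in the examples below; the operator set and the `sqrt` helper are choices made for this illustration.

```python
import math

class var:
    """A value together with its derivative with respect to the single
    independent variable."""

    def __init__(self, val, deriv=0.0):
        self.val = val      # the value of this quantity
        self.deriv = deriv  # d(value)/dx, where x is the independent variable

    @staticmethod
    def _wrap(other):
        # treat plain numbers as constants (derivative zero)
        return other if isinstance(other, var) else var(other)

    # sum rule: (u + v)' = u' + v'
    def __add__(self, other):
        other = var._wrap(other)
        return var(self.val + other.val, self.deriv + other.deriv)
    __radd__ = __add__

    # product rule: (u v)' = u' v + u v'
    def __mul__(self, other):
        other = var._wrap(other)
        return var(self.val * other.val,
                   self.deriv * other.val + self.val * other.deriv)
    __rmul__ = __mul__

    # quotient rule: (u / v)' = (u' v - u v') / v^2
    def __truediv__(self, other):
        other = var._wrap(other)
        return var(self.val / other.val,
                   (self.deriv * other.val - self.val * other.deriv) / other.val ** 2)

    def __rtruediv__(self, other):
        return var._wrap(other) / self

    def __repr__(self):
        return f"var(val={self.val}, deriv={self.deriv})"

# chain rule for the square root: d sqrt(u)/dx = u' / (2 sqrt(u))
def sqrt(u):
    if isinstance(u, var):
        s = math.sqrt(u.val)
        return var(s, u.deriv / (2 * s))
    return math.sqrt(u)
```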

That's pretty simple. Let me try it on a simple example: $$f(x) = \frac{x}{\sqrt{1 + x}}.$$ The derivative, according to my scratch paper, ought to be $$f'(x) = \frac{x + 2}{2(1+x)^{3/2}}.$$ To evaluate the function and its derivative using my autodiff framework, I just have to compute the function using values of my var class rather than regular numbers.
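For instance, a quick evaluation at the arbitrarily chosen point $x = 3$, using the sketch above:

```python
x = var(3.0, 1.0)   # seed the independent variable with dx/dx = 1

f = x / sqrt(1 + x)

# analytic derivative for comparison: f'(x) = (x + 2) / (2 (1 + x)^(3/2))
fprime = (3.0 + 2) / (2 * (1 + 3.0) ** 1.5)

print(f.val, f.deriv)   # value and autodiff derivative at x = 3
print(fprime)           # should agree with f.deriv to about machine precision
```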

Well, that's good: the derivatives stored in the variables match the derivatives computed separately, with errors at most around the machine precision. A more general way to check our results is with finite differences; here is a function that can do that.
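A simple sketch, assuming a one-sided difference with a step size near the square root of machine precision (the helper names and the test point are my own choices):

```python
import math

def fd_deriv(f, x, h=1e-8):
    """One-sided finite-difference estimate of f'(x); with h near
    sqrt(machine epsilon) the error is also around sqrt(machine epsilon)."""
    return (f(x + h) - f(x)) / h

# compare the finite-difference estimate against the autodiff derivative
def f_plain(t):
    return t / math.sqrt(1 + t)

x0 = 3.0
x = var(x0, 1.0)
f_ad = x / sqrt(1 + x)
print(abs(fd_deriv(f_plain, x0) - f_ad.deriv))   # error on the order of 1e-8
```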

That is encouraging; the errors are on the order of the square root of the machine precision (we are using double precision here) which is about the best we can hope for from a finite difference derivative.

Forward mode, multidimensional range

Without fundamentally changing anything, we can upgrade the forward mode autodiff variable so that it supports functions $f : \mathbb{R} \rightarrow \mathbb{R}^n$. The idea is to make a datatype that is like a NumPy array but which knows the derivative of all its components with respect to a single independent variable, and define arithmetic operations that operate on this datatype and compute the derivatives along with the values. For elementary operations it's simple to compute the derivatives of the result from the derivatives of the operands using familiar formulas from calculus. The matrix multiply derivative looks just like a regular product rule because it's just a sum of products.
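A sketch of such a datatype, here called `vecvar` (the name, and the small set of operations implemented, are choices made for this illustration), holding a NumPy array of values and a matching array of derivatives:

```python
import numpy as np

class vecvar:
    """An array of values together with the derivative of every component
    with respect to the single independent variable."""

    def __init__(self, val, deriv=None):
        self.val = np.asarray(val, dtype=float)
        self.deriv = (np.zeros_like(self.val) if deriv is None
                      else np.asarray(deriv, dtype=float))

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, vecvar) else vecvar(other)

    # sum rule, componentwise
    def __add__(self, other):
        other = vecvar._wrap(other)
        return vecvar(self.val + other.val, self.deriv + other.deriv)
    __radd__ = __add__

    # product rule, componentwise
    def __mul__(self, other):
        other = vecvar._wrap(other)
        return vecvar(self.val * other.val,
                      self.deriv * other.val + self.val * other.deriv)
    __rmul__ = __mul__

    # matrix product rule: (A B)' = A' B + A B', just a sum of products
    def __matmul__(self, other):
        other = vecvar._wrap(other)
        return vecvar(self.val @ other.val,
                      self.deriv @ other.val + self.val @ other.deriv)

# chain rule for the square root, componentwise
def vsqrt(u):
    s = np.sqrt(u.val)
    return vecvar(s, u.deriv / (2 * s))

# the derivative of a sum is the sum of the derivatives
def vsum(u):
    return vecvar(np.sum(u.val), np.sum(u.deriv))
```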

Let's try it out computing the derivative of the length of a vector, first doing it all with scalars...
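For example, take a vector that depends on a single parameter, say $v(t) = (t, t^2, 1)$ (my choice of test curve), and hold its components in three scalar `var`s:

```python
import math

t = var(2.0, 1.0)                  # the independent variable
x, y, z = t, t * t, var(1.0, 0.0)  # components of v(t) = (t, t^2, 1)

length = sqrt(x * x + y * y + z * z)
print(length.val, length.deriv)

# analytic check: d/dt sqrt(t^2 + t^4 + 1) = (t + 2 t^3) / sqrt(t^2 + t^4 + 1)
t0 = 2.0
print((t0 + 2 * t0 ** 3) / math.sqrt(t0 ** 2 + t0 ** 4 + 1))
```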

...and then trying the same thing with a vector operand.
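With the `vecvar` sketch above, seeding the componentwise derivatives $dv/dt$ by hand (same test curve and parameter value as before):

```python
t0 = 2.0
v = vecvar([t0, t0 ** 2, 1.0],     # value of v(t) at t = t0
           [1.0, 2 * t0, 0.0])     # dv/dt at t = t0

length = vsqrt(vsum(v * v))        # |v| = sqrt(sum of squared components)
print(length.val, length.deriv)    # should match the scalar computation above
```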

Let's try this on a little more of an application: computing the tangent vector of a parametric curve. Here's one called the Folium of Descartes: $$ x(t) = \frac{3at}{1+t^3} \quad y(t) = \frac{3at^2}{1+t^3} $$
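Evaluating the curve with `var` arithmetic gives the point and the tangent $(x'(t), y'(t))$ at once; here is a sketch with $a = 1$ and an arbitrarily chosen parameter value:

```python
a = 1.0
t = var(0.5, 1.0)

denom = 1 + t * t * t
x = 3 * a * t / denom
y = 3 * a * t * t / denom

print(x.val, y.val)        # the point on the curve at this t
print(x.deriv, y.deriv)    # the tangent vector (dx/dt, dy/dt)
```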

That all worked well.

Reverse mode

One big problem with forward mode autodiff comes up when we have many independent variables. Differentiating with respect to multiple independent variables is easy, but it amounts to performing $n$ separate derivative computations, each costing as much as the single-variable case, to compute the derivative (known as the gradient in this context) of a function $f : \mathbb{R}^n \rightarrow \mathbb{R}$.

There is a second way to do automatic differentiation, known as reverse mode, which takes exactly the same idea but propagates derivatives from the output towards the inputs. The payoff is that the entire gradient comes out of a single reverse pass, however many inputs there are.
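Here is a minimal sketch of a reverse-mode variable, here called `rvar` with a `backward` method (both names, and the tiny operator set, are choices for this illustration). During the forward evaluation each operation records its inputs and local partial derivatives; the reverse pass then walks the recorded graph from the output back to the inputs, accumulating gradients with the chain rule.

```python
import math

class rvar:
    """Reverse-mode variable: the forward evaluation records the expression
    graph, and backward() then propagates derivatives from the output back
    to the inputs."""

    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents   # list of (input rvar, local partial derivative)
        self.grad = 0.0          # d(output)/d(this variable), set by backward()

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, rvar) else rvar(other)

    def __add__(self, other):
        other = rvar._wrap(other)
        return rvar(self.val + other.val, [(self, 1.0), (other, 1.0)])
    __radd__ = __add__

    def __mul__(self, other):
        other = rvar._wrap(other)
        return rvar(self.val * other.val, [(self, other.val), (other, self.val)])
    __rmul__ = __mul__

    def __truediv__(self, other):
        other = rvar._wrap(other)
        return rvar(self.val / other.val,
                    [(self, 1.0 / other.val), (other, -self.val / other.val ** 2)])

    def backward(self):
        """Visit the recorded graph in reverse topological order, adding each
        node's contribution to its inputs' gradients (the chain rule)."""
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p, _ in v.parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0                 # d(output)/d(output) = 1
        for v in reversed(order):
            for p, local in v.parents:
                p.grad += v.grad * local

def rsqrt(u):
    s = math.sqrt(u.val)
    return rvar(s, [(u, 1.0 / (2 * s))])

# the whole gradient of f(x, y) = x / sqrt(x^2 + y^2) from one reverse pass
x, y = rvar(3.0), rvar(4.0)
f = x / rsqrt(x * x + y * y)
f.backward()
print(f.val, x.grad, y.grad)   # f = 0.6, df/dx = 0.128, df/dy = -0.096 (up to roundoff)
```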

...so that works, with code that looks remarkably like the code used in forward mode. The difference is that the intermediate objects persist until the reverse pass.