Neural Acceleration

January 5, 2013

Last month, I had the honor of presenting my group’s work on neural acceleration at MICRO 2012. This project, which I worked on with my co-conspirator Hadi Esmaeilzadeh and our advisors, represents the beginning of a particularly exciting research direction for us that uses hardware neural networks to make programs faster and more energy-efficient.

The core of the idea is related to how other accelerators work: if a “hot” part of a program has a certain characteristic, it can be much more efficient to execute it on an accelerator—a hardware structure separate from the CPU core but often on the same chip—than on the main processor. GPGPUs and vector units, for example, are good at doing lots of identical math in parallel over large arrays of data. When a program has more irregular, data-dependent parallelism, an FPGA can bring huge benefits in performance and power.
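To make that concrete, here is a made-up sketch (not from the paper) of the kind of hotspot vector hardware loves: every iteration performs the same arithmetic on a different array element, with no dependences between iterations.

```cpp
// Hypothetical vector-friendly hotspot: the same multiply-add is applied
// independently to every element of a large array, so a vector unit or GPGPU
// can process many elements at once.
void scale_and_offset(float *out, const float *in, float a, float b, int n) {
    for (int i = 0; i < n; ++i) {
        out[i] = a * in[i] + b;
    }
}
```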

I’m talking here about configurable accelerators. Fixed-function circuits, like the H.264 video decoder found in most smartphone SoCs, are less interesting: it’s infeasible to manually design and verify a special circuit for every important algorithm you might want to run. More promising are some recent proposals for transparent accelerators that, unlike GPUs or FPGAs, benefit unmodified programs written in traditional programming languages (e.g., C or Java, not CUDA or Verilog) by speeding up their hot spots. Examples include DySER from Wisconsin, BERET from Michigan, and QsCores from UC San Diego.

Our paper is about a new kind of accelerator. The key idea is to accelerate approximate parts of programs. Just as vector units are useful because many programs have SIMD-like hotspots, we’ve found that many programs have components that are error-resilient: if small differences occur in certain parts of a program, they’re unlikely to affect the output too much. We call programs with these error-resilient components approximate programs, and we’ve thought a lot about them in prior work. My favorite example is a lossy image decoder: the codec has already made a trade-off between image quality and file size. So if we can accelerate the decoder at the cost of a small amount of image quality, we can exploit a new trade-off between efficiency and quality that takes advantage of humans’ tolerance to small differences in images.
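For a concrete flavor of what “error-resilient” means, here is a small hypothetical kernel in the spirit of an image decoder (my illustration, not code from the paper): it converts one YCbCr pixel to RGB, and a small numerical error anywhere in it only nudges the resulting color slightly.

```cpp
// Hypothetical error-tolerant kernel: YCbCr-to-RGB conversion for one pixel.
// Slightly perturbing the coefficients or the arithmetic shifts the pixel's
// color by an amount most viewers would never notice, which is exactly the
// kind of slack an approximate accelerator can exploit.
void ycbcr_to_rgb(float y, float cb, float cr, float *r, float *g, float *b) {
    *r = y + 1.402f * (cr - 128.0f);
    *g = y - 0.344f * (cb - 128.0f) - 0.714f * (cr - 128.0f);
    *b = y + 1.772f * (cb - 128.0f);
}
```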

To build an approximate accelerator, we turned to hardware neural networks. You may know artificial neural networks as a machine learning algorithm used for tasks like handwriting recognition and robot steering, but you might not know that they have a long history of extremely efficient hardware implementations. Hardware structures that implement NNs are fast, low-power, parallel, and regular. They can even be implemented using analog circuitry—which can be orders of magnitude more efficient than the digital logic we typically use to build computers today. (Analog computers went out of style in the ’60s.)
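To see why the structure is so hardware-friendly, here is a minimal software sketch (mine, for illustration) of one layer of a multilayer perceptron: every neuron does the same multiply-accumulate work followed by a sigmoid, so the whole thing is regular, parallel arithmetic with almost no control flow.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One layer of a multilayer perceptron: each output neuron is a weighted sum
// of the inputs pushed through a sigmoid. The computation is nothing but
// regular multiply-accumulates, which is why it maps so naturally onto small,
// parallel hardware structures (or even analog circuits).
std::vector<float> layer(const std::vector<float> &in,
                         const std::vector<std::vector<float>> &weights,
                         const std::vector<float> &bias) {
    std::vector<float> out(weights.size());
    for (std::size_t j = 0; j < weights.size(); ++j) {
        float sum = bias[j];
        for (std::size_t i = 0; i < in.size(); ++i) {
            sum += weights[j][i] * in[i];
        }
        out[j] = 1.0f / (1.0f + std::exp(-sum));  // sigmoid activation
    }
    return out;
}
```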

Our idea is to use a hardware neural network to mimic approximate program fragments. With help from the programmer, the compiler finds chunks of code that are approximable—that is, approximating their results won’t compromise the program’s usefulness too much. (Our previous work on programming languages for approximate computation can help with this step.) Then, the compiler runs the program a few times to collect sample inputs and outputs for the target code chunk. You can imagine the compiler instrumenting a function with statements that log the arguments and return values to a file. These input/output pairs are a sampling of the mathematical function that the target code computes: exactly the data needed to train a neural network to mimic that function. We use a standard training algorithm, backpropagation, to configure a neural network from the sampled data so that it mimics the original code. Finally, we compile a version of the program that invokes the neural network instead of the original chunk of C++ or Java code—introducing small errors but, hopefully, saving lots of time and energy.
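Here is a rough source-level sketch of that pipeline. Everything in it is illustrative: the kernel, the logging scheme, and especially the npu_invoke interface are stand-ins I invented for this post, not the system’s actual mechanics.

```cpp
#include <cstdio>

// An approximable chunk the programmer has identified. The body here is a
// placeholder; imagine something genuinely expensive and error-tolerant.
float approx_kernel(float a, float b, float c) {
    return 0.299f * a + 0.587f * b + 0.114f * c;
}

// Training mode: the compiler wraps the chunk so every call appends its
// arguments and return value to a log, producing the input/output samples
// that backpropagation later uses to train the neural network.
float approx_kernel_logged(float a, float b, float c) {
    float result = approx_kernel(a, b, c);
    if (FILE *log = std::fopen("samples.txt", "a")) {
        std::fprintf(log, "%f %f %f -> %f\n", a, b, c, result);
        std::fclose(log);
    }
    return result;
}

// Deployment mode: the compiled program calls the trained network instead of
// the original code. npu_invoke is a hypothetical stand-in for however the
// network is reached; in hardware it would dispatch to the accelerator, and
// here a software model of the trained network takes its place.
float npu_invoke(const float inputs[3]) {
    // ... evaluate the trained neural network on inputs[0..2] ...
    return 0.3f * inputs[0] + 0.6f * inputs[1] + 0.1f * inputs[2];
}

float approx_kernel_accelerated(float a, float b, float c) {
    const float inputs[3] = {a, b, c};
    return npu_invoke(inputs);  // small error, big savings in time and energy
}
```

In the real system the compiler performs this transformation automatically and the network runs on the accelerator rather than in software; the sketch only shows where the logging and the substitution happen.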

And, indeed, our simulations suggest that neural acceleration can bring significant benefits in performance and energy efficiency. We accelerated a portion of each of six different approximate programs and found, on average, that we could speed up the programs by 2.3x and reduce their energy consumption by two thirds. These results are particularly encouraging because they are just the tip of the neural-network iceberg: we used a simple digital design for this project, and more outlandish analog approaches could perform even better.

I think dark silicon and the imminent faltering of multicore scaling mean that heavily accelerator-driven designs are nearly inevitable in the next few years. We’ll need many different kinds of accelerators to have any hope of covering the wide variety of modern performance- and power-sensitive applications. We found that neural networks can be useful in accelerating even programs that have nothing to do with learning, classification, or other traditional NN tasks. With the proliferation of approximate, soft applications, accelerators based on neural networks have the potential to be particularly effective additions to the CPU.

If you’re curious to learn more about this project, check out the conference paper, the talk slides, or the poster.