Adrian Sampson: Holy-Grail Approximation

A central challenge in approximate computing research is programmability. It is not enough to implement relaxed execution; developers need support to select good program relaxations that make a profitable quality–efficiency trade-off.

Existing systems vary in their effectiveness at finding good relaxations and the strengths of the guarantees they provide. Techniques can be seen as approaching a mythical “holy grail” approximation system that finds optimal program relaxations without intervention. This post outlines such an ideal programming model, called HGApprox for holy-grail approximation, and argues the fundamental reasons that it is unrealizable. Even though HGApprox is unrealistic, it is useful as a benchmark for comparing existing work and for setting the agenda for future work that draws closer to the ideal.

HGApprox

HGApprox takes two inputs:

A program, $P$, that the user wants to optimize.
A quality metric, $Q$, which is a function from an input $x$ and $P$’s output, $P(x)$, to a score between 0 and 1 reflecting the output’s quality.

HGApprox automatically produces a set of transformed programs, $P_i$, that are Pareto-optimal with respect to resource usage (i.e., performance) and total quality over all possible inputs, $\sum_x Q(x, P_i(x))$. (You could also imagine specifying a quality loss threshold; HGApprox would then return a single optimal transformed program $P_i$ from the Pareto frontier.) HGApprox requires no further guidance from the programmer and is guaranteed to produce optimal relaxations—no program may exist that has better quality and performance.

Compromises

Realistic approximate programming systems compromise with respect to the HGApprox ideal in three ways that acknowledge three unfortunate realities.

Tractability: real systems need hints.

The space of possible program relaxations $P_i$ is too large to search for optimal relaxations. On hardware with approximate ALUs, for example, the search space size is exponential in the number of operators in $P$. The space may also be rife with local optima. It is infeasible for an automated tool to find the best relaxations in this space for general programs. Realistic systems need hints from the programmer, such as a distinguished subset of code or data where approximation is allowed, to help shrink the search space and guide the system to a better local optimum.

Quality metrics are imperfect.

Fully automatic, “black-box” tools make programmers wary—and they should. HGApprox produces programs that are optimal with respect to the quality metric $Q$, so any shortcomings in $Q$ are reflected in the output programs $P_i$.

It is difficult to write a quality metric that accounts for every possible way that things could go wrong due to approximation. An image metric that constrains pixel value deltas works well for random errors but can permit low-magnitude error patterns that nonetheless lead to distracting visual artifacts—a contingency that the programmer did not anticipate when devising $Q$.

In practice, programmers need some control over how approximation is applied to ensure that it does not break the program in unexpected ways.

Quality is uncomputable.

Nontrivial properties of programs are undecidable (by Rice’s theorem). Quality is no exception: the problem of checking whether a candidate program $P_i$ always meets a quality threshold (i.e., $\forall x \; Q(x, P_i(x)) > c$) is uncomputable.

If a realistic system cannot even evaluate a candidate relaxation’s quality, how can it hope to produce good relaxations? Realistic systems resort to imprecise quality evaluators that fall into a few categories:

Offline testing. Check the quality on representative inputs and hope that they generalize. This approach has the same benefits and drawbacks as traditional testing.
Static analysis, which must be incomplete, unsound, or both. Sound, conservative analyses are promising but require “escape hatches” when safety cannot be proven.
Runtime checks. On-line systems can decide post facto whether a particular computation was sufficiently accurate. Dynamic checks must be made cheap enough as to avoid obviating the savings offered by approximation. There are no bounds on the complexity of quality metrics $Q$, however, so a run-time check might need to use a cheaper version $Q'$ that sacrifices soundness or completeness.

Conclusion

These three compromises are inevitable, but the community should focus on ways that we can approach the ideal in each case. Which programmer hints are most effective in guiding the search for approximate programs? How can we help programmers specify better quality metrics that match their expectations? What quality measurement strategies best capture general run-time behavior? Each area of compromise represents an opportunity to make approximate computing more usable for mainstream programmers.