In a different era, I worked on a project about approximate storage. This post is about a problem we never solved during that project—a problem that haunts me to this day.
One of our ideas was to abuse multilevel cells (MLCs), which are where you pack more than one bit into a single physical memory element. Because memory cells are analog devices, MLC designs amount to a choice of how to quantize the underlying signal to a digital value. Packing in the analog levels more tightly gives you more bits per cell in exchange for slower reads and writes—or more error.
Our idea was to pack more levels into a cell, beyond what would be allowed in a traditional, precise memory. Without adjusting the timing to compensate, we exposed the resulting errors to the rest of the system. Approximate computing!
The nice thing about these approximate memories is that analog storage errors are more often small than large. In a fourlevel (twobit) cell, for example, when you write a 0 into the cell, you are more likely to read a 1 back later than a 3. Put differently, error probabilities are monotonic in the value distance. If $v$ is the value you originally wrote and $v_1$ and $v_2$ are other cell values where $v  v_1 \ge v  v_2$, then the probability of reading $v_1$ is at most the probability of reading $v_2$. Applications enjoy small errors more than they enjoy large errors, so MLCstyle monotonic errors are a good fit.
The problem, however, is that real programs don’t use many twobit numbers. It’s not feasible to cram 65,536 levels into a single cell in most technologies, but we’d really like to be able to use 16bit numbers in our programs. It’s easy to combine, say, two twobit cells to represent a fourbit number under ordinary circumstances: just split up the bits or use a Gray code to minimize the expected cost of small changes. But these strategies ruin our nice error monotonicity property: small changes in one cell might cause large changes in our fourbit number.
Stating the Problem
Let’s compare strategies for encoding $n$bit numbers onto $c$ cell values of $b$ bits each. A given code will consist of an encoding function $e$ and a decoding function $d$. Encoding turns a single $n$bit number into a $c$tuple of $b$bit numbers, so we’ll write $e(x) = \overline{v} = \langle v_1, v_2, \ldots, v_c \rangle$ where each $v_i$ consists of $b$ bits.
We assume that, within a given cell, small errors are more likely than large errors. We hope that small percell errors translate to small errors in the decoded value. To make this precise, define an overloaded function $\Delta$ that gets the size of errors in either encoded or plaintext values. For plain numbers, for example, $\Delta(1000, 0110) = 2$; this function is the absolute difference between the values. For encoded cellvalue tuples, $\Delta(\langle 01, 10 \rangle, \langle 10, 01 \rangle) = 2$; the function is the sum of the differences for each cell. Here’s a formal statement of an errormonotonicity property we’d like:
In other words, if some celllevel error is smaller than another celllevel error, then it also translates to a smaller error in the space of decoded numbers.
The Options
Let’s consider a few options. For simplicity, I’ll give examples for $n=4$, $c=2$, and $b=2$, but each strategy should generalize to any problem size where $n = c \times b$.

I’ll call the naïve strategy a chunking code because it just breaks the number into $c$ equallysized pieces. For example, $e(0110) = \langle 01, 10 \rangle$. But a small, onelevel error in the first cell causes a large error in the represented value. For example, an error of size one can turn $\langle 01, 10 \rangle$ into $\langle 00, 10 \rangle$. The decoded error size is $\Delta(0110, 0010) = 4$. So a distanceone error in the cells can lead to a distancefour error in the value. Such an error is just as likely as a distanceone value error (when the second cell is faulty instead of the first).

A Gray code tries to avoid situations where incrementing a number makes many cells change simultaneously. This property minimizes the cost of the most common writes, so it’s a popular strategy for memory coding. But I contend that, in an abstract sense, it’s the opposite of what we want for error robustness. A Gray code takes small changes in the value and turns them into small changes in the cells. We want this implication to go the other way around: small changes in the cells should lead to small changes in the corresponding values. A small change in a cell can still lead to an arbitrarily large change in the represented number.

Grasping at straws, we could try a striping code where the bits are interleaved: the first cell holds all the bits at positions that are zero mod $b$; the next cell gets all the bits at 1 mod $b$, and so on. For example, $e(0011) = \langle 01, 01 \rangle$. But clearly, a small error in one cell can still lead to a large error in the value. For example, a singlelevel error can produce $\langle 10, 01 \rangle$. And $d(\langle 10, 01 \rangle) = 1001$, which is a valuespace error of 6.
A Question
None of these three options meets the goal I wrote above. Worse, none of them seems meaningfully closer to satisfying errormonotonicity than any other. For about five years now, I’ve wondered whether it’s possible to do any better than the naïve chunking code. I would be equally satisfied with a definitive no as with an existence proof. But so far, I have no traction at all in either direction and Google has failed me. Let me know if you have any bright ideas—I’d be thrilled.