A Framework for Coded Computation

Eric Rachlin and John E. Savage
Computer Science, Brown University, Providence, RI 02912-1910

Abstract— Error-correcting codes have been very successful in protecting against errors in data transmission. Computing on encoded data, however, has proved more difficult. In this paper we extend a framework introduced by Spielman [14] for computing on encoded data. This new formulation offers significantly more design flexibility, reduced overhead, and simplicity. It allows a larger variety of codes to be used in computation and makes explicit the conditions on codes that are compatible with computation. We also provide a lower bound on the overhead required for a single step of coded computation.

I. INTRODUCTION

When symbols are transmitted across a noisy channel, one of the most basic ways to protect against errors is to use a repetition code. Since the birth of coding theory in the 1940s, however, many far more efficient codes have been discovered. Unfortunately, analogous results have not been obtained for noisy computation. In this paper we consider networks of noisy computing elements (logic gates, for example) in which each element can "fail" independently at random with probability ε. When an element fails, it outputs an incorrect value.

In 1956, von Neumann proposed the first systematic approach to building logic circuits from noisy gates [1]. His approach was to repeat each gate r times and then periodically suppress errors by taking many random majorities. This is very similar to protecting transmitted data with a repetition code. The main complication is that the majority gates can themselves fail. Thus, the new goal is to avoid error accumulation while keeping the size of the majority gates constant. Thirty years later, Pippenger successfully analyzed von Neumann's construction [2].
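Von Neumann's restoring step can be illustrated with a small simulation. The sketch below is ours, not a construction from the paper: each wire in a bundle of r copies is flipped independently with probability ε, and one restoring stage replaces every wire with a noisy 3-input majority of randomly sampled wires. The function names and parameters (eps, r, fanin) are illustrative choices.

```python
import random

def noisy_gate(fn, eps):
    """Wrap a Boolean gate so it outputs the wrong value with probability eps."""
    def g(*args):
        v = fn(*args)
        return 1 - v if random.random() < eps else v
    return g

def restore(bundle, eps, fanin=3):
    """One restoring stage: each output wire is a noisy 3-input majority
    of wires sampled at random from the bundle."""
    maj = noisy_gate(lambda a, b, c: 1 if a + b + c >= 2 else 0, eps)
    return [maj(*random.sample(bundle, fanin)) for _ in range(len(bundle))]

random.seed(0)
eps, r = 0.01, 101
# a bundle of r wires that should all carry the value 1, each flipped w.p. eps
bundle = [noisy_gate(lambda: 1, eps)() for _ in range(r)]
restored = restore(bundle, eps)
# the fraction of incorrect wires remains small after restoration
print(sum(1 for w in restored if w == 0) / r)
```

Because the majority gates themselves fail with probability ε, each restoring stage can only keep the error fraction bounded, not drive it to zero, which is exactly the accumulation problem described above.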
He demonstrated that given a fault-free circuit C, a fault-tolerant version of it, C′, could be constructed of size O(|C| log |C|) such that for all inputs the probability that an output of C′ is in error is within O(ε). An excellent description of this analysis can be found in [3]. Unfortunately, the analysis also suggests that the constant associated with the O(log |C|) bound is large, as do experimental results [4].

After Pippenger obtained an upper bound on r, he and others obtained lower bounds on the size of von Neumann fault-tolerant circuits. Under the assumption that all gates fail independently with probability ε, the size and depth required for repetition-based fault tolerance has been shown to be within a constant factor of optimal for many basic functions (XOR, for example) [5], [6], [7]. The derivation of these bounds highlights a shortcoming of the von Neumann model. Since all gates can fail with probability ε, the inputs and outputs of a circuit always have probability ε of being incorrect. For a sensitive function of N inputs, XOR for instance, each input must be sampled O(log N) times simply to ensure that information about its correct value reaches the output with high probability. In other words, since the inputs to a circuit are essentially encoded using repetition, the amount of redundancy for a reliable encoding is Ω(log N) = Ω(log |C|).

This research was funded in part by NSF Grants CCF-0403674 and CCF-0726794.

In this paper, we consider a more general, and more realistic, model of noisy computation in which some gates can be larger but highly reliable (much like today's CMOS gates), while most gates are small but susceptible to transient failures (an anticipated characteristic of nanoscale technologies [8]). In this new model, most computational steps are performed using noisy gates, interspersed with a few steps in which reliable gates are used to decode and re-encode data without errors.
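The decode/re-encode cycle of this model can be sketched with the simplest possible code. This is our illustration, not the paper's construction: data is held in a repetition code, a noisy componentwise XOR is computed on the encoded words, and a reliable majority decoder then clears the accumulated errors before re-encoding. All names and parameters are illustrative.

```python
import random

def noisy_xor(a, b, eps):
    """XOR gate that fails (flips its output) with probability eps."""
    v = a ^ b
    return v ^ 1 if random.random() < eps else v

def encode(bit, r):
    """Reliable encoder: length-r repetition code."""
    return [bit] * r

def decode(word):
    """Reliable decoder: majority vote."""
    return 1 if 2 * sum(word) > len(word) else 0

def coded_xor(x_word, y_word, eps):
    """Componentwise noisy XOR on encoded data, followed by a reliable
    decode / re-encode step that removes the accumulated errors."""
    noisy = [noisy_xor(a, b, eps) for a, b in zip(x_word, y_word)]
    return encode(decode(noisy), len(noisy))

random.seed(1)
r, eps = 25, 0.05
x, y = encode(1, r), encode(0, r)
z = coded_xor(x, y, eps)
print(decode(z))  # recovers 1 ^ 0 = 1 with high probability
```

The reliable gates appear only in encode and decode, matching the model's division of labor: cheap noisy gates do the bulk of the computation, and a few reliable gates periodically restore a clean encoding.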
This model more closely parallels data being transmitted over a noisy channel using a reliable encoder and decoder. As we demonstrate, this coded computation model introduces a wide range of new design possibilities, including new codes. Although this model has not been extensively studied, lower bounds on circuit size have been obtained. We will review and generalize these bounds.

II. RELATED WORK

Linear error-correcting codes have been in use since the 1950s [9]. These are codes defined over finite fields in which the check symbols are linear combinations of the information symbols. When the computations that need to be coded are linear, it is known how to compute on such encoded data [10]. The problem is much more complex when the computations are non-linear. Early work in coded computation established simple lower bounds [11], [12], [13] (see Section IV) that suggest the difficulty of this problem.

Spielman [14] has proposed a general-purpose approach to coded computation that we extend in this paper. First, he proposed that codes be used over alphabets that are supersets of the source data alphabets; these codes may use symbols from the larger alphabet for check symbols. Second, he proposed that the definition of functions over the smaller alphabet be extended to functions over the larger alphabet using interpolation polynomials. Third, he proposed that data be encoded using 2D Reed-Solomon (RS) codes.

In Spielman's approach, the result of computing on RS-encoded data is RS-encoded data under an RS code with smaller error-correcting capability than the original code. This overcomes the lower bounds referenced above. However, it necessitates that the newly computed data be "transcoded" back
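The loss of error-correcting capability can be seen in a minimal example, which is ours rather than Spielman's exact 2D construction: an RS codeword is the evaluation of a message polynomial at n field points, so the pointwise product of two codewords is itself an RS codeword, but of the product polynomial, whose degree is the sum of the original degrees. Higher degree means fewer redundant evaluations, hence smaller error-correcting capability. The field size and polynomials below are illustrative.

```python
p = 97  # a small prime field GF(97); parameters are illustrative

def rs_encode(msg, n):
    """Evaluate the message polynomial (coefficient list, low degree
    first) at the points 1..n, giving a length-n RS codeword."""
    return [sum(c * pow(x, i, p) for i, c in enumerate(msg)) % p
            for x in range(1, n + 1)]

n = 10
f = [3, 1, 4]   # degree-2 message polynomial 3 + x + 4x^2
g = [2, 7]      # degree-1 message polynomial 2 + 7x
cf, cg = rs_encode(f, n), rs_encode(g, n)

# pointwise product of the two codewords
prod = [(a * b) % p for a, b in zip(cf, cg)]

# coefficients of the polynomial product f*g (degree 3), by convolution
fg = [0] * (len(f) + len(g) - 1)
for i, a in enumerate(f):
    for j, b in enumerate(g):
        fg[i + j] = (fg[i + j] + a * b) % p

# the pointwise product is still an RS codeword, but of higher degree
assert prod == rs_encode(fg, n)
```

Each multiplication step therefore consumes part of the code's redundancy, which is why Spielman's approach must periodically transcode the result back to a stronger code.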