ABSTRACT

We provide an overview of the notion of error tolerance and describe the context that motivated its development. We then summarize some of our case studies, which demonstrate the significant potential benefits of error tolerance, and present the testing and design techniques we have developed for error-tolerant systems. Finally, we conclude by identifying the shifts in paradigm required for wide exploitation of error tolerance.

I. BACKGROUND AND MOTIVATION

The notion of error tolerance is motivated by three important trends in information processing: changes in fabrication technology, changes in the mix of applications, and the emergence of new paradigms of computation.

Fabrication technology: As we get closer to what some call the “end of CMOS,” we see the emergence of highly unreliable and defect-prone technologies. This is accompanied by the rapid development of new computing technologies such as biological, molecular, and quantum devices. Most of these new technologies are also extremely unreliable and defect-prone (e.g., see [12]). However, they also provide the ability to carry out massive numbers of computations in parallel, at speeds that far exceed those currently achieved by CMOS devices.

Applications: Increasingly large fractions of the total number of chips fabricated in any given year implement multimedia applications and process signals representing audio, speech, images, video, and graphics. The outputs of such systems eventually become input signals to human users. There are several interesting aspects to the computational requirements of such systems.

1) The result of a computation, i.e., the output data, is not judged as being right or wrong, but rather on its perceptual quality to human users.
For example, in the case of an image the perceptual quality may be defined in terms of the absence of visible artifacts, clarity, color, and intensity. In other words, the criterion is not correctness but whether the end product is acceptable to the human user.

2) Most such systems are lossy by design, in the sense that the outputs deviate from perfection due to the sampling of input signals, conversion to digital form, quantization, lossy encoding and decoding, and conversion back to analog signals.

3) Many such applications require parallel architectures, as they are computationally intensive and operate under real-time performance constraints.

Emerging paradigms of computation: Several new paradigms are emerging regarding how functions are computed and what requirements are placed on the “correctness” and “accuracy” of the results. With tongue in cheek: in our school systems, 5 + 7 = 13 is not considered “wrong,” but rather “that is close, Jimmy.” Increasingly this is also the case for many emerging computation paradigms, which carry out computations somewhat differently than the classical computations performed for applications such as bookkeeping and flight-control systems. Consider the following paradigms.

Evolutionary computation is a simplified attempt to solve a problem based on several analogies with evolution as it occurs in biological systems. One important aspect of such heuristic computations, and of many other heuristics, can be summarized as in [14]: “Evolutionary computing deals with the process where ‘a computer can learn on its own and become an expert in any chosen area.’ Such systems often rely on neural nets for their implementation. The process can adapt over time, e.g., one can modify the score function.” Neural nets also define the acceptability of computational results in a similarly less stringent manner.

*This research was supported by the National Science Foundation (0428940).
For example, [25] states: “Neural nets typically provide a greater degree of fault tolerance than von Neumann sequential computers because there are many more processing elements, each with primarily local connections. Damage to a few elements or links thus need not impair the overall performance significantly.”

Approximate computations: In [24], Partridge states that a challenge is “to develop a science of approximate computation and derive from it a well-founded discipline for engineering approximate software.” To meet this challenge, a radical departure from discrete correct/incorrect computation is needed.

Error tolerance: Why and how to use slightly defective digital systems*

Melvin Breuer, Keith Chugg, Sandeep Gupta, and Antonio Ortega
Ming Hsieh Department of Electrical Engineering
University of Southern California, Los Angeles, CA 90089
mb@poisson.usc.edu; chugg@usc.edu; sandeep@poisson.usc.edu; ortega@sipi.usc.edu
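The “5 + 7 = 13 is close, Jimmy” remark and Partridge’s call for approximate computation can be made concrete with an approximate adder. The sketch below is illustrative only and is not from the paper: it implements a simplified lower-part-OR adder, a known approximate-adder design in which the carry chain of the k least significant bits is replaced by a carry-free bitwise OR, trading a bounded numerical error for reduced delay and area.

```python
def loa_add(a, b, k=4):
    """Simplified lower-part-OR (LOA) approximate adder for unsigned ints.

    The k low bits are combined with a carry-free bitwise OR; the high
    bits are added exactly.  The identity x + y = (x | y) + (x & y)
    shows the absolute error equals (a & b & low_mask), so the result
    is never more than 2**k - 1 below the exact sum.
    """
    low_mask = (1 << k) - 1
    low = (a & low_mask) | (b & low_mask)      # approximate low part, no carries
    high = (a & ~low_mask) + (b & ~low_mask)   # exact high part (low k bits stay 0)
    return high | low                          # the two bit ranges are disjoint

# Every 8-bit sum lands within 2**k of the exact result.
assert all(0 <= (a + b) - loa_add(a, b, k=4) < 16
           for a in range(256) for b in range(256))
```

With k = 2, loa_add(5, 7) returns 11 rather than the exact 12 — “close, Jimmy” — and the worst-case error shrinks to 3; with k = 0 the adder is exact. This is the error-tolerance trade in miniature: a deliberately imperfect circuit whose output remains acceptably close to the ideal one.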