Highly Energy and Performance Efficient Embedded Computing through Approximately Correct Arithmetic
A Mathematical Foundation and Preliminary Experimental Validation

Lakshmi N. B. Chakrapani
Department of Computer Science
Rice University
Houston, Texas, USA
chakra@rice.edu

Kirthi Krishna Muntimadugu
Department of Electrical and Computer Engineering
Rice University
Houston, Texas, USA
kirthi.krishna@rice.edu

Avinash Lingamneni
Department of Electrical and Computer Engineering
Rice University
Houston, Texas, USA
Avinash.Lingamneni@rice.edu

Jason George
School of Electrical and Computer Engineering
Georgia Institute of Technology
Atlanta, Georgia, USA
george@ece.gatech.edu

Krishna V. Palem
Department of Computer Science
Department of Electrical and Computer Engineering
Rice University
Houston, Texas, USA
palem@cs.rice.edu

ABSTRACT
We develop a theoretical foundation to characterize a novel methodology for low energy and high performance DSP for embedded computing. Computing elements are operated at a frequency higher than that permitted by a conventionally correct circuit design, enabling a trade-off between the error that is deliberately introduced and the energy consumed. Similar techniques considered previously were relevant to deeply scaled future technology generations. Our work extends this idea to be applicable to current-day designs through: (i) a mathematically rigorous foundation characterizing a trade-off between energy consumed and the quality of solution, and (ii) a means of achieving this trade-off through very aggressive voltage scaling beyond that of a conventionally designed circuit. Through our “CMOS inspired” mathematical model, we show that our approach is better (by an exponential factor) than the conventional uniform voltage scaling approach for comparable computational speed or performance.
We further establish through experimental study that a similar improvement, by a factor of 3.4x in the SNR over conventional voltage-scaled approaches, can be achieved in the context of the ubiquitous discrete Fourier transform.

This author wishes to thank the Moore Distinguished Faculty Fellow program at the California Institute of Technology, which enabled pursuing this work in part.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CASES’08, October 19–24, 2008, Atlanta, Georgia, USA.
Copyright 2008 ACM 978-1-60558-469-0/08/10 ...$5.00.

General Terms
Design, Experimentation, Performance

Categories and Subject Descriptors
B.2.0 [Hardware]: Arithmetic and Logic Structures: general

Keywords
Digital signal processing, voltage overscaling

1. INTRODUCTION
High performance and low energy operation are of great importance in embedded and mobile systems. A huge class of such power and performance constrained systems realizes various forms of digital signal processing or “DSP” workloads. Hence techniques for energy and performance efficiency at various levels (circuit, architecture, algorithmic and application) have been sought and invented in this context. These techniques may be classified under two categories: (i) techniques which improve energy and performance with no degradation of quality of solution. For example, these techniques include using better algorithms to replace complex operations such as multiplications with simpler operations such as additions, and eliminating redundant computations.
At the circuit level, techniques for energy and performance efficiency typically seek more efficient circuit implementations of DSP primitives. (ii) Techniques which trade off energy for quality of solution. These are typically at the algorithmic level, where parameters such as the number of quantization levels and the precision of coefficients are traded off for the quality of solution [8, 13, 1, 2]. These conventional techniques for low energy and high performance DSP utilize deterministic building blocks and