0018-9340 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2017.2672976, IEEE Transactions on Computers

IEEE TRANSACTIONS ON COMPUTERS

Design of Approximate Radix-4 Booth Multipliers for Error-Tolerant Computing

Weiqiang Liu, Senior Member, IEEE, Liangyu Qian, Chenghua Wang, Honglan Jiang, Jie Han, Senior Member, IEEE, Fabrizio Lombardi, Fellow, IEEE

Abstract—Approximate computing is an attractive design methodology for achieving low power, high performance (low delay) and reduced circuit complexity by relaxing the requirement of accuracy. In this paper, approximate Booth multipliers are designed based on approximate radix-4 modified Booth encoding (MBE) algorithms and a regular partial product array that employs an approximate Wallace tree. Two approximate Booth encoders are proposed and analyzed for error-tolerant computing. The error characteristics are analyzed with respect to the so-called approximation factor, which is related to the inexact bit width of the Booth multipliers. Simulation results for delay, area and power consumption in a 45 nm CMOS technology are also provided. The results show that the proposed 16-bit approximate radix-4 Booth multipliers with approximation factors of 12 and 14 are more accurate than existing approximate Booth multipliers, with moderate power consumption. The proposed R4ABM2 multiplier with an approximation factor of 14 is the most efficient design when both the power-delay product and the normalized mean error distance (NMED) are considered. Case studies in image processing show the validity of the proposed approximate radix-4 Booth multipliers.
Index Terms—Radix-4 multiplier, Booth encoder, approximate computing, low power.

1 INTRODUCTION

MULTIPLIERS are widely used in the arithmetic units of microprocessors, multimedia processors and digital signal processors; moreover, high-performance and low-power multipliers are in high demand for embedded systems. It is becoming extremely difficult to further improve the performance and reduce the power consumption of multipliers under the requirement of full accuracy; however, the requirements of high precision and exactness are not so strict for many applications related to human perception, such as multimedia signal processing and machine learning. High precision and exactness in the operations of digital logic circuits stem from the generally accepted requirement of correct information processing; yet numerous error-tolerant applications can be found in computing, and by relaxing the requirement of strict accuracy, performance and power consumption can be substantially improved [1]. This design principle is generally known as approximate or inexact computing [2].

As the basic operations of an arithmetic processor, addition and multiplication are very important for achieving high performance. Addition has been extensively studied in approximate computing to reduce power consumption and delay [3-5]. New metrics, including the error distance (ED), mean error distance (MED) and normalized error distance (NED), have been proposed for evaluating the designs of approximate adders [6].

Approximate multiplication has not been as extensively studied, despite its importance for arithmetic processing and systems; multiplication is more complex than addition because it requires the accumulation of partial product rows. The most widely used high-performance multiplier uses modified Booth encoding (MBE) as its first step to reduce the number of partial product rows by half [7, 8].
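The halving of partial product rows by MBE can be illustrated with a small software model. The following is a minimal Python sketch, assuming an n-bit two's-complement multiplier y with n even; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical. It recodes y into n/2 radix-4 digits d_j = -2y_{2j+1} + y_{2j} + y_{2j-1} (with y_{-1} = 0), each in {-2, -1, 0, 1, 2}, so every partial product is 0, ±x or ±2x. It models only the exact arithmetic, not the encoder circuit or the approximate encoders proposed in this paper.

```python
def booth_radix4_digits(y, n):
    """Radix-4 modified Booth recoding of an n-bit two's-complement multiplier y.

    Returns n/2 digits in {-2, -1, 0, 1, 2}, least significant first, such that
    y == sum(d * 4**j for j, d in enumerate(digits)).
    Assumes n is even and -2**(n-1) <= y < 2**(n-1).
    """
    bits = y & ((1 << n) - 1)       # n-bit two's-complement pattern of y
    digits, prev = [], 0            # prev holds y_{2j-1}; y_{-1} is 0
    for i in range(0, n, 2):
        lo = (bits >> i) & 1        # y_{2j}
        hi = (bits >> (i + 1)) & 1  # y_{2j+1}
        digits.append(-2 * hi + lo + prev)
        prev = hi
    return digits


def booth_multiply(x, y, n=16):
    """Exact product accumulated from n/2 Booth partial products.

    Each partial product d_j * x is 0, +-x or +-2x and is shifted left by
    2j bits (weight 4**j) before accumulation.
    """
    return sum((d * x) << (2 * j) for j, d in enumerate(booth_radix4_digits(y, n)))


if __name__ == "__main__":
    print(booth_radix4_digits(7, 4))    # [-1, 2]
    print(booth_multiply(-123, 456))    # -56088
```

For example, `booth_radix4_digits(7, 4)` returns `[-1, 2]` because 7 = -1 + 2·4, and a 16-bit multiplier yields 8 digits, i.e., 8 partial product rows instead of 16.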
The current designs of approximate multipliers can be categorized into truncation and non-truncation schemes. A truncation-based design relies on a simple approximation in which either the lower part of the partial products is removed, or the least significant partial products are estimated by a constant (referred to as fixed-width multiplier design [9]); however, the error generated by the truncated partial product rows can be rather large. Therefore, error compensation strategies have been proposed to increase the accuracy of truncated multipliers; an inexact array multiplier has been proposed that ignores some of the least significant columns of the partial products and treats them as a constant [3]. In [10], the truncated multiplier utilizes a correction constant selected according to both the reduction and rounding errors. However, this truncated multiplier incurs a very large error if the partial products in the least significant columns are all ones or all zeros; therefore, a truncated multiplier with variable correction has also been proposed in [11]. Recently, a few error compensation strategies have been proposed to further improve the accuracy of fixed-width Booth multipliers [9, 12-15]. In [9], the error is compensated using the outputs of the Booth encoders; the error compensation circuit proposed in [12] mainly uses a simplified sorting network. An adaptive conditional-probability estimator has been proposed in [15] to compensate for the quantization error of a fixed-width Booth multiplier. These truncated Booth multipliers use error compensation circuits to improve accuracy; however, the extra compensation circuits require additional hardware, and approximate computing can be employed to reduce such overhead.

A non-truncation scheme utilizes approximate circuits to assemble an approximate multiplier.
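To make the truncation error and the ED and MED metrics of [6] concrete, the sketch below models an unsigned n×n array multiplier that drops every partial product bit in the k least significant columns and adds a constant in their place. ED is taken as |M' - M| for an approximate product M' and exact product M, and MED as the mean ED over all input pairs. The correction constant used here (the expected value of the dropped bits for uniformly distributed inputs) is an illustrative assumption, not the constant of [3] or [10], and all function names are hypothetical.

```python
from itertools import product


def truncated_multiply(a, b, k, correction=0):
    """Approximate unsigned product: drop every partial product bit a_i*b_j in a
    column i + j < k and add a constant instead. Assumes 0 <= k <= n."""
    # The dropped bits of row i are a_i * (b mod 2^(k-i)), weighted by 2^i.
    dropped = sum((((a >> i) & 1) * (b & ((1 << (k - i)) - 1))) << i
                  for i in range(k))
    return a * b - dropped + correction


def expected_dropped(k):
    """Hypothetical correction constant: mean value of the dropped bits for
    uniform inputs. Column c < k holds c + 1 bits, each 1 with probability 1/4."""
    return round(sum((c + 1) << c for c in range(k)) / 4)


def med(n, k, correction):
    """Mean error distance: average of ED = |approximate - exact| over all
    pairs of n-bit unsigned inputs (exhaustive, so keep n small)."""
    eds = [abs(truncated_multiply(a, b, k, correction) - a * b)
           for a, b in product(range(1 << n), repeat=2)]
    return sum(eds) / len(eds)


if __name__ == "__main__":
    n, k = 8, 8
    c = expected_dropped(k)
    print("MED, plain truncation:", med(n, k, 0))
    print("MED, constant correction", c, ":", med(n, k, c))
```

For n = k = 8, plain truncation gives an MED of 448.25 (the mean value of the dropped bits), and adding the constant 448 lowers the MED. Inputs whose dropped bits are all ones or all zeros still produce large errors, which is the weakness that motivates variable correction [11].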
Published by the IEEE Computer Society

• W. Liu, L. Qian and C. Wang are with the College of Electronic Information and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China, 210016. E-mail: {liuweiqiang, qliangyu, chwang}@nuaa.edu.cn.
• H. Jiang and J. Han are with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada. E-mail: {honglan, jhan8}@ualberta.ca
• F. Lombardi is with the Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115. E-mail: lombardi@ece.neu.edu.

An approximate 2×2 multiplier has been proposed in [16] by simplifying its logic expression using a Karnaugh map (K-map); this