Energy-Efficient Approximate Wallace-Tree Multiplier using Significance-Driven Logic Compression

Issa Qiqieh, Rishad Shafik, Ghaith Tarawneh, Danil Sokolov, Shidhartha Das, Alex Yakovlev

School of Electrical and Electronic Engineering, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
ARM, 110 Fulbourn Rd, Cambridge CB1 9NJ, UK

Emails: {i.qiqieh1, rishad.shafik, ghaith.tarawneh, danil.sokolov, alex.yakovlev}@newcastle.ac.uk, Shidhartha.Das@arm.com

Abstract: In this paper, we propose an energy-efficient approximate multiplier design approach. Fundamental to this approach is configurable lossy logic compression, coupled with low-cost error mitigation. The logic compression aims to reduce the number of product rows using progressive bit significance, thereby decreasing the number of reduction stages in Wallace-tree accumulation. This results in substantially lower logic counts and shorter critical paths, at the cost of errors in the less significant bits. These errors are minimised through parallel error detection logic and a compensation vector. To validate the effectiveness of our approach, multiple 8-bit multipliers were designed and synthesized using Synopsys Design Compiler with different logic compression levels. Post-synthesis experiments showed the trade-offs between energy and accuracy for these compression levels, featuring up to 70% reduction in power-delay product (PDP) and 60% lower area in the case of a multiplier with 4-bit logic compression. These gains are achieved at a low loss of accuracy, estimated at a mean relative error of less than 0.0554. To demonstrate the impact of approximation on a real application, a case study of an image convolution filter was extensively investigated, which showed up to 62% (without error compensation) and 45% (with error compensation) energy savings when processing an image with a multiplier using 4-bit logic compression.

I. INTRODUCTION

Approximate computing has been introduced as an efficient solution for achieving higher computational performance at low energy cost for imprecision-resilient applications. The basic premise of approximate computing is to replace traditional complex and energy-wasteful data processing blocks with low-complexity ones with reduced logic counts. As a result, effective chip area and energy consumption are reduced at the cost of imprecision introduced to the processed data [1].

Research has shown that the majority of modern applications could be classed under the domain of approximate computing, such as digital signal processing, computer vision, robotics, multimedia and data analytics [2]. This can be leveraged as an opportunity for energy-efficient system design for current and future generations of application-specific systems.

Approximate arithmetic, such as approximate adders and multipliers, can be exploited as a means of reducing energy requirements, increasing speed, reducing cost and increasing reliability in many of these applications [3], [4]. Multipliers are crucial arithmetic units in many of the aforementioned applications, for two major reasons. Firstly, they are characterized by complex logic design, being one of the most energy-demanding data processing units in modern microprocessors. Secondly, compute-intensive applications typically exercise a large number of multiplication operations [5]. These factors have prompted approximate multiplier design research, since improvements made in the power/speed of a multiplier are expected to substantially influence the overall system power/performance trade-offs [6].

Recently reported multiplier design approaches can be largely categorized as modifications of either timing or functional behaviors. Timing behavior can be modified by lowering the supply voltage below its nominal value, which allows for reductions in energy consumption at the cost of timing-induced errors [7].
Since timing errors are caused by long carry chains, and thus impact the most significant bits of the final product, it is necessary to quantify the impact of timing violations by modifying the conventional multiplier to allow for graceful degradation [8].

Functional modifications deal with logic reduction techniques and can be performed by relaxing the need for accurate Boolean equivalence in favor of energy and circuit area reductions. For example, truncating multiplier product terms allows for the elimination of some of the least significant partial product terms [9]. As more columns are eliminated, further energy reduction is achieved; however, errors also increase. Building large efficient multipliers from inaccurate small multiplier blocks is another effective technique [10], [11]; however, the hierarchical organization of small approximate blocks may not significantly reduce the critical path, and it eventually propagates more errors as the size of the multiplier increases. Automated design approaches [12]–[14] present design flows for generating approximate circuits using circuit activity profiles, quality bounds and evolutionary processes. The key principle of the above studies is to achieve reduced logic complexity, which is also the main aim of our work.

In this paper, we propose an energy-efficient approximate multiplier using significance-driven logic compression. Compared to our previous work in [15], we present the following new key contributions:

1) Incorporate a Wallace-tree accumulation method together with the significance-driven logic compression (SDLC) approach to shorten the reduction stages.
2) Add a parallel error detection and compensation method to minimise the impact of lossy compression.

978-1-5386-0446-5/17/$31.00 © 2017 Crown
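The accuracy side of the truncation trade-off discussed above (eliminating more of the least significant partial-product columns increases the error) can be illustrated with a small behavioural model. The sketch below is not the SDLC multiplier proposed in this paper, and it does not model energy or area. It only mimics column truncation in software. The function names `truncated_multiply` and `mean_relative_error` are hypothetical, and averaging the relative error exhaustively over all non-zero operand pairs is an assumption made for illustration.

```python
def truncated_multiply(a: int, b: int, bits: int = 8, dropped_columns: int = 0) -> int:
    """Unsigned multiply that ignores partial-product bits in the lowest columns.

    Behavioural sketch of a truncated multiplier (illustrative assumption,
    not the circuit described in the paper).
    """
    # Mask that clears every bit whose column index is below dropped_columns.
    keep_mask = ~((1 << dropped_columns) - 1)
    product = 0
    for i in range(bits):
        if (b >> i) & 1:
            # Row i of the partial-product array is a shifted left by i;
            # bits falling in the dropped columns are discarded.
            product += (a << i) & keep_mask
    return product


def mean_relative_error(bits: int = 8, dropped_columns: int = 0) -> float:
    """Average of (exact - approx) / exact over all non-zero operand pairs."""
    total = 0.0
    count = 0
    for a in range(1, 1 << bits):
        for b in range(1, 1 << bits):
            exact = a * b
            approx = truncated_multiply(a, b, bits, dropped_columns)
            total += (exact - approx) / exact
            count += 1
    return total / count


if __name__ == "__main__":
    # Error grows as more low-order columns are eliminated.
    for k in (0, 2, 4, 6):
        print(f"dropped columns = {k}: MRE = {mean_relative_error(8, k):.4f}")
```

With `dropped_columns=0` the model reduces to exact multiplication, which gives a convenient sanity check before sweeping the truncation depth.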