Power Efficient Implementation of Bit-Parallel Unrolled CORDIC Structures for FPGA Platforms

Burhan Khurshid
Department of CSE, National Institute of Technology, Srinagar (J and K), India, 190006

Roohie Naaz Mir
Department of CSE, National Institute of Technology, Srinagar (J and K), India, 190006

Abstract: Power consumption is one of the major concerns when mapping designs onto FPGAs. Dynamic power dissipation in FPGAs is a strong function of the switching activity of the nodes and of the charging and discharging capacitances associated with the critical path. This paper focuses on reducing the power dissipation in bit-parallel unfolded CORDIC structures by modeling the switching activity and the charging/discharging capacitances within the critical path. Two approaches are used: the first reduces the switching activity by hiding the high-activity nodes within look-up tables, and the second retimes the structure to reduce the critical path and the associated charging/discharging capacitances. A comparative analysis of our implementation results against the traditional approach has been carried out for input word-lengths ranging from 4-bit to 32-bit parallel operands. The implementation targets two different FPGA families, viz. Spartan-6 and Virtex-5. The analysis concludes that a 10 to 20 percent reduction in dynamic power dissipation and a 35 to 40 percent reduction in total power dissipation is achievable with these approaches.

Keywords: FPGA, CORDIC, DSP, ASIC, look-up table

I. INTRODUCTION

The energy performance of digital signal processing (DSP) architectures is one of the important metrics that needs to be considered during the design flow [1], [2]. In general, digital designers try to maximize performance while keeping the cost down [3]. In the context of general digital design, performance may be measured in terms of the amount of hardware circuitry and resources required, the speed of execution, and the amount of power dissipated.
There is always an application-driven trade-off between these opposing performance parameters. Recently, power dissipation has proven to be a deciding factor in many DSP applications [4], [5]. This demands low-power realization of the circuits used in these DSP systems [6], [7].

The DSP market has long been driven by processor-oriented solutions, where the emphasis is on designing efficient high-level code with some thought given to the underlying processor architecture to optimize performance [8]. For high-speed applications, platform-based solutions such as application specific integrated circuits (ASICs) and structured ASICs have been used [9]. Recently, field programmable gate arrays (FPGAs) have become a favored platform for VLSI design engineers. FPGAs offer many advantages over ASICs and programmable systems. The high-speed and low-power advantage of FPGAs over microprocessors is a sustainable trend for a wide variety of applications [10], [11], [12]. Other advantages include post-production design modifications, low non-recurring engineering (NRE) costs, and a reconfigurable design approach [12], [13]. However, relative to ASICs with comparable functionality, FPGAs are power-hungry and are typically not well suited to ultra-low-power design techniques [14]. Developing FPGA-specific low-power structures is therefore an important concern.

CORDIC (COordinate Rotation DIgital Computer) [15], [16] is an iterative algorithm used for calculating various linear, trigonometric, hyperbolic and transcendental functions. The algorithm operates by rotating a vector in a linear, circular or hyperbolic coordinate system, using only add and shift operations. CORDIC is unparalleled in its ability to encapsulate a diversity of math functions in one basic set of iterations [17]. Since the algorithm involves only add and shift operations, it has very good hardware efficiency and minimal control overhead.
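The add-and-shift character of the iteration can be illustrated with a short software sketch of circular-mode rotation. This is only an illustrative fixed-point model, not the hardware structure studied in this paper; the word length `FRAC_BITS`, the iteration count `ITERATIONS`, and the function name `cordic_sin_cos` are assumptions chosen for the example.

```python
import math

FRAC_BITS = 16    # assumed fixed-point fraction width (illustrative)
ITERATIONS = 16   # assumed number of micro-rotations (illustrative)
ONE = 1 << FRAC_BITS

# Elementary angles arctan(2^-i), stored as fixed-point integers.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# Each micro-rotation stretches the vector by sqrt(1 + 2^-2i); pre-scaling
# the start vector by the inverse of the accumulated gain compensates for it.
_gain = 1.0
for _i in range(ITERATIONS):
    _gain /= math.sqrt(1.0 + 2.0 ** (-2 * _i))
GAIN_FIXED = round(_gain * ONE)


def cordic_sin_cos(angle):
    """Return (sin, cos) of angle in radians, for angle in [-pi/2, pi/2]."""
    x, y = GAIN_FIXED, 0
    z = round(angle * ONE)          # residual angle still to be rotated
    for i in range(ITERATIONS):
        # The loop body uses only add, subtract, shift and a table lookup.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y / ONE, x / ONE


if __name__ == "__main__":
    s, c = cordic_sin_cos(0.5)
    print(s, c, math.sin(0.5), math.cos(0.5))
```

In an unrolled hardware realization each pass of the loop would correspond to a dedicated stage with hard-wired shifts; the floating-point conversions here exist only to make the software model easy to check.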
The CORDIC algorithm has been applied to many different applications and has been used as a core arithmetic engine in many VLSI signal-processing implementations [18]. It has been used for computing the fast Fourier transform (FFT) [19], [20], the discrete cosine transform (DCT) [21], and the discrete Hartley transform [22]. Much work has focused on CORDIC-based approaches for implementing various types of linear operations, including singular value decomposition (SVD) [23], Givens rotations [24], and recursive least squares (RLS) filtering [25].

The rest of the paper is organized as follows. Section II briefly discusses the CORDIC algorithm, its operating modes, and how the algorithm can be used for evaluating trigonometric functions. Section III discusses the unrolled CORDIC architectures. Section IV discusses how power reduction can be achieved by reducing the switching activity and the charging/discharging capacitances associated with the critical path. Section V presents the synthesis, implementation and simulation. Conclusions are drawn in Section VI, and references are listed at the end.

II. CORDIC ALGORITHM

The CORDIC algorithm was first introduced by Volder [15] in 1959 as a technique for calculating the trigonometric functions required for real-time aircraft navigation. The algorithm operates by rotating a vector in a linear, circular or hyperbolic coordinate system through a sequence of fixed elementary angles. Since its introduction, the basic algorithm has been extended to evaluate a very rich set of functions from one basic set of equations. Different versions of the CORDIC algorithm can be defined under the circular, hyperbolic, and linear coordinate