High Performance FIR Filter Design for 6-input LUT Based FPGAs

Ugur Cini
Dept. of Electrical Engg., Trakya University, Edirne, Turkey
ugurcini@trakya.edu.tr

Mustafa Aktan
TeraHz Microelectronics, Hacettepe Teknokent, Ankara, Turkey
mustafa.aktan@tera-micro.com

Abstract—Advanced FPGA structures contain 6-input LUTs suitable for implementing complex logic functions in a more compact structure. In this paper, high performance fixed coefficient FIR filters are designed by exploiting the advantages of 6-input LUT structures. Using the proposed methodology, fixed coefficient multiplication and accumulation are implemented with only two cascaded 6-input LUTs in the critical path. Therefore, high performance FIR filtering is possible without any pipelining in the system. For the multiply-accumulate operations, only (6, 3) counters are employed, together with redundant carry double save operations. A clock frequency of 440 MHz is reached for the designed 25 tap FIR filter on a Stratix II family FPGA. The proposed arithmetic structure provides more than 90% speed advantage over hardware multiplier based multiply-accumulate operations.

Keywords—FIR Filtering; carry save adder; carry double save; FPGA arithmetic

I. INTRODUCTION

Generic field programmable gate array (FPGA) devices are based on 4-input look-up table (LUT) based logic elements. High performance FPGA devices offer 6-input LUT logic elements, by which more complex functions can be realized with higher performance. In this work, an extra redundant arithmetic scheme is proposed to further exploit 6-input LUT devices. By exploiting the extra redundant number system, a fixed coefficient finite impulse response (FIR) filter methodology is presented that is especially suitable for 6-input LUT structures. Redundant architectures are based on signed-digit systems and carry-save arithmetic, both of which provide carry-free addition schemes [1-3].
In carry-save arithmetic, each digit of a number is represented by two bits, namely carry (c) and sum (s) [2, 3], whereas in conventional binary (e.g. 2’s complement) representation, each digit is represented by a single bit. The redundancy in number representation provides carry-free arithmetic implementations. In this paper, a special case of redundant arithmetic is selected to fully exploit 6-input LUT structures. The result of each filter tap is encoded in double carry-save representation, i.e. as the output of (6, 3) counter arrays. The proposed structure is based entirely on double carry-save encoding, where each digit of an arbitrary number is represented by three bits.

Redundant architectures implemented on FPGAs are not very common. Publications related to FPGA arithmetic can be found in [6-7]; in particular, [7] focuses on 6-input LUT structures. Increased redundancy enables an addition to be handled within a single LUT delay on a 6-input LUT based architecture, which is not possible with a conventional redundant carry-save addition scheme. Addition of two double carry-save numbers can be done using a parallel array of (6, 3) counters. Using the proposed arithmetic, addition of two redundant numbers requires a single LUT delay in 6-input LUT structures, which is the core idea of the paper.

For 6-input LUT based FPGAs, (6, 3) counters are the best suited multi-operand addition scheme [4]. In multiplication, (6, 3) counters can be used to reduce six partial products to three. Together with the proposed double carry-save arithmetic, both addition and multiplication can be handled using only (6, 3) counters, which provides a very regular structure. A multiply-accumulate operation based on the proposed system takes 2 LUT delays if the multiplication coefficients are 12 bits wide.
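
The role of the (6, 3) counter array described above can be illustrated with a small bit-level software model. This is only a minimal sketch under simplifying assumptions: operands are unsigned integers, sign handling is omitted, and the names counter_6_3, reduce_6_to_3, dcs_add and dcs_value are hypothetical helpers introduced here for illustration, not part of the proposed hardware design.

```python
def counter_6_3(bits):
    """(6, 3) counter: encode the number of ones among six bits as three bits."""
    assert len(bits) == 6
    n = sum(bits)  # 0..6, always fits in three bits
    return n & 1, (n >> 1) & 1, (n >> 2) & 1  # weights 1, 2, 4


def reduce_6_to_3(operands):
    """Compress six unsigned words into a double carry-save triple (s, c1, c2).

    One (6, 3) counter is applied per bit column. Columns are independent,
    so no carry propagates from one column's counter to the next.
    Assumption: unsigned operands only; sign extension is not modelled.
    """
    assert len(operands) == 6 and all(x >= 0 for x in operands)
    width = max(x.bit_length() for x in operands)
    s = c1 = c2 = 0
    for i in range(width):
        b0, b1, b2 = counter_6_3([(x >> i) & 1 for x in operands])
        s |= b0 << i          # sum bit stays in column i
        c1 |= b1 << (i + 1)   # first carry bit moves to column i + 1
        c2 |= b2 << (i + 2)   # second carry bit moves to column i + 2
    return s, c1, c2


def dcs_add(a, b):
    """Add two double carry-save triples with one more (6, 3) counter array."""
    return reduce_6_to_3(list(a) + list(b))


def dcs_value(triple):
    """Conventional integer value represented by a double carry-save triple."""
    return sum(triple)


# Multiplication phase: six partial products reduced to one triple.
partial_products = [3, 5, 7, 9, 11, 13]
tap = reduce_6_to_3(partial_products)

# Accumulation phase: add the tap result to a running triple.
acc = reduce_6_to_3([1, 2, 3, 4, 5, 6])
total = dcs_add(acc, tap)

print(dcs_value(tap), dcs_value(total))
```

In this model the two calls on the multiply-accumulate path (reduce_6_to_3 followed by dcs_add) correspond to the two counter stages mentioned in the text. The result stays in redundant form throughout, and a carry-propagating addition is needed only when a conventional binary value is finally required.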
The 12-bit width also suits (6, 3) counters, since canonic signed-digit (CSD) [5, 8] recoding of a 12-bit coefficient yields at most 6 non-zero partial products, which matches the six inputs of a (6, 3) counter. Moreover, wider fixed coefficient FIR filters are possible using the proposed structure, provided that every coefficient has at most 6 non-zero digits. To guarantee that each CSD encoded FIR filter coefficient has at most 6 non-zero digits, a mixed integer programming model is presented, so that higher coefficient bit widths are possible. In the paper, two example FIR structures are synthesized, one with 12-bit and one with 16-bit coefficients.

978-1-5090-0246-7/15/$31.00 ©2015 IEEE

In the proposed system, the multiplication by each constant coefficient is achieved through a (6, 3) counter array with redundant outputs. Moreover, backward sign-extension is implemented to remove the extra sign bit in the system, which will be explained in the following sections. After the multiplication phase, the redundant addition operation, i.e. the accumulation phase, is also implemented by a single stage (6, 3) counter array, yielding a regular multiply-accumulate