High Performance FIR Filter Design
for 6-input LUT Based FPGAs
Ugur Cini
Dept. of Electrical Engg.
Trakya University
Edirne, Turkey
ugurcini@trakya.edu.tr
Mustafa Aktan
TeraHz Microelectronics
Hacettepe Teknokent
Ankara, Turkey
mustafa.aktan@tera-micro.com
Abstract—Advanced FPGA structures contain 6-input look-up
tables (LUTs) suitable for implementing complex logic functions
in a more compact structure. In this paper, high performance
fixed coefficient FIR filters are designed by exploiting the
advantages of 6-input LUT structures. With the proposed
methodology, fixed coefficient multiplication and accumulation
are implemented with only two cascaded levels of 6-input LUTs
in the critical path. Therefore, high performance FIR filtering
is possible without any pipelining in the system. For the
multiply-accumulate operations, only (6, 3) counters are
employed, together with redundant carry double save operations.
A clock frequency of 440 MHz is reached for the designed 25-tap
FIR filter on a Stratix II family FPGA. The proposed arithmetic
structure provides more than 90% speed advantage over
multiply-accumulate operations based on hardware multipliers.
Keywords—FIR Filtering; carry save adder; carry double save;
FPGA arithmetic
I. INTRODUCTION
Generic field programmable gate array (FPGA) devices are
built from 4-input look-up table (LUT) logic elements. High
performance FPGA devices offer 6-input LUT logic elements,
with which more complex functions can be realized with higher
performance. In this work, an extra redundant arithmetic scheme
is proposed to further exploit 6-input LUT devices. Using this
extra redundant number system, a fixed coefficient finite
impulse response (FIR) filter methodology is presented that is
especially suitable for 6-input LUT structures.
Redundant architectures are based on signed-digit systems
and carry-save arithmetic, both of which provide carry-free
addition [1-3]. In carry-save arithmetic, each digit of a
number is represented by two bits, namely carry (c) and sum
(s) [2, 3], whereas in conventional binary (e.g. 2’s
complement) representation each digit is represented by a
single bit. This redundancy in number representation enables
carry-free arithmetic implementations. In this paper, a special
case of redundant arithmetic is selected to exploit the full
advantages of 6-input LUT structures. The result of each filter
tap is encoded in a double carry-save representation, i.e., as
the outputs of an array of (6, 3) counters. The proposed
structure is based entirely on double carry-save encoding, in
which each digit of an arbitrary number is represented by three
bits. Redundant architectures implemented on FPGAs are not
very common. Publications related to FPGA arithmetic can be
found in [6, 7]; in particular, [7] focuses on 6-input LUT
structures.
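To make the three-bits-per-digit encoding concrete, the following minimal Python sketch models a double carry-save number as three bit vectors whose sum is the represented value. This three-vector model is our reading of the encoding, and the helper names (`dcs_value`, `column_digits`, `digits_value`) are hypothetical. Each column then holds a digit in {0, 1, 2, 3}, compared with {0, 1, 2} for ordinary carry-save, and the same value admits several encodings.

```python
# Hypothetical behavioral model (not from the paper): a double carry-save
# (DCS) number is a triple of bit vectors (x, y, z) held as non-negative
# Python ints, and its value is simply x + y + z.

def dcs_value(v):
    """Integer value of a DCS number given as three bit vectors."""
    x, y, z = v
    return x + y + z

def column_digits(v, width):
    """Digit of each column i: d_i = x_i + y_i + z_i, so d_i is in {0, 1, 2, 3}.
    Digits are listed least significant column first."""
    return [sum((vec >> i) & 1 for vec in v) for i in range(width)]

def digits_value(digits):
    """Recombine column digits as the sum of d_i * 2**i."""
    return sum(d << i for i, d in enumerate(digits))

# Two different DCS encodings of the same value (the representation is redundant).
a = (0b0110, 0b0011, 0b0101)   # 6 + 3 + 5
b = (0b1110, 0b0000, 0b0000)   # plain binary, also a valid DCS encoding
print(dcs_value(a), dcs_value(b))   # 14 14
print(column_digits(a, 4))          # [2, 2, 2, 0]
print(column_digits(b, 4))          # [0, 1, 1, 1]
```

Keeping the three vectors separate, rather than resolving them into one binary word, is what avoids carry propagation.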
The increased redundancy enables an addition to be completed
within a single LUT delay on a 6-input LUT based architecture,
which is not possible with a conventional redundant carry-save
addition scheme. Two numbers in double carry-save form can be
added using a parallel array of (6, 3) counters. Hence, with the
proposed arithmetic, the addition of two redundant numbers
requires a single LUT delay in 6-input LUT structures; this is
the core idea of the paper.
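A minimal Python sketch of this addition is given below, assuming the three-vector model of a double carry-save number (the sum of the vectors is the value). The names `counter_6_3` and `dcs_add` are hypothetical, and the code models the arithmetic behavior only, not the FPGA netlist. The two operands contribute six bits to every column, one (6, 3) counter per column compresses them to a 3-bit count, and the count bits are placed in three output vectors at weights 2^i, 2^(i+1) and 2^(i+2).

```python
import random

def counter_6_3(bits):
    """(6, 3) counter: count six equally weighted bits, giving 0..6 as 3 bits.
    Each output bit is a function of the same six inputs, so each one can be
    mapped to a single 6-input LUT."""
    if len(bits) != 6:
        raise ValueError("a (6, 3) counter takes exactly six bits")
    n = sum(bits)
    return n & 1, (n >> 1) & 1, (n >> 2) & 1

def dcs_add(a, b, width):
    """Add two DCS numbers (triples of `width`-bit vectors) with one (6, 3)
    counter per bit position. The result is again a DCS triple, whose
    vectors are at most width + 2 bits wide."""
    vectors = list(a) + list(b)          # six vectors, i.e. six bits per column
    s0 = s1 = s2 = 0
    for i in range(width):
        c0, c1, c2 = counter_6_3([(v >> i) & 1 for v in vectors])
        s0 |= c0 << i          # weight 2^i
        s1 |= c1 << (i + 1)    # weight 2^(i+1)
        s2 |= c2 << (i + 2)    # weight 2^(i+2)
    return s0, s1, s2

# Random check that the counter array preserves the value exactly.
width = 16
for _ in range(1000):
    x = tuple(random.randrange(1 << width) for _ in range(3))
    y = tuple(random.randrange(1 << width) for _ in range(3))
    assert sum(dcs_add(x, y, width)) == sum(x) + sum(y)
```

Every bit of the result depends on the six input bits of exactly one column (column j, j-1 or j-2), so no carry chain forms and the whole addition is one LUT level.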
For 6-input LUT based FPGAs, (6, 3) counters are the best
suited multi-operand addition scheme [4]. In multiplication,
(6, 3) counters can be used to reduce six partial products to
three. Together with the proposed double carry-save arithmetic,
both addition and multiplication can be handled using only
(6, 3) counters, which yields a very regular structure. A
multiply-accumulate operation based on the proposed system
takes 2 LUT delays if the multiplication coefficients are 12
bits wide. The 12-bit width also matches the (6, 3) counters,
since canonic signed-digit (CSD) [5, 8] recoding of a 12-bit
coefficient yields at most 6 non-zero partial products, which
fit the six inputs of a (6, 3) counter.
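The two-LUT-level multiply-accumulate can be checked with the Python sketch below. It is a behavioral model under stated assumptions: coefficients are 12-bit two's-complement values in [-2048, 2047] (an unsigned 12-bit value such as 2731 would need seven non-zero CSD digits), the accumulator width `W = 24` and the test data are arbitrary, and all function names are hypothetical. Negative partial products are modeled with arithmetic modulo 2^W; real hardware would instead use inverted bits with correction constants together with the backward sign-extension mentioned in the paper, which this sketch does not reproduce.

```python
W = 24                    # hypothetical accumulator width (arithmetic mod 2**W)
MASK = (1 << W) - 1

def csd_digits(n):
    """Canonic signed-digit (non-adjacent form) recoding of an integer.
    Returns digits in {-1, 0, +1}, least significant first."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)      # n mod 4 == 1 -> +1, n mod 4 == 3 -> -1
            n -= d               # n is now divisible by 4: next digit is 0
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def partial_products(x, coeff):
    """One shifted (and possibly negated) copy of x per non-zero CSD digit,
    padded with zero vectors to exactly six W-bit vectors."""
    pps = [(d * (x << k)) & MASK for k, d in enumerate(csd_digits(coeff)) if d]
    if len(pps) > 6:
        raise ValueError("coefficient has more than 6 non-zero CSD digits")
    return pps + [0] * (6 - len(pps))

def counter_layer(vectors):
    """One level of column-wise (6, 3) counters: six W-bit vectors -> three."""
    s0 = s1 = s2 = 0
    for i in range(W):
        n = sum((v >> i) & 1 for v in vectors)   # 0..6
        s0 |= (n & 1) << i
        s1 |= ((n >> 1) & 1) << (i + 1)
        s2 |= ((n >> 2) & 1) << (i + 2)
    return s0 & MASK, s1 & MASK, s2 & MASK

def mac(acc, x, coeff):
    """acc + coeff * x, with acc and the result in double carry-save form."""
    product = counter_layer(partial_products(x, coeff))   # LUT level 1
    return counter_layer(list(acc) + list(product))       # LUT level 2

def to_signed(v):
    """Interpret a W-bit value as two's complement."""
    v &= MASK
    return v - (1 << W) if v >> (W - 1) else v

# Every 12-bit two's-complement coefficient needs at most 6 non-zero CSD digits.
assert max(sum(d != 0 for d in csd_digits(c)) for c in range(-2048, 2048)) == 6

coeffs = [-1365, 7, 2047, -2048, 300]
samples = [100, -128, 127, -3, 55]
acc = (0, 0, 0)
for c, x in zip(coeffs, samples):
    acc = mac(acc, x, c)
# The redundant accumulator resolves to the exact dot product.
print(to_signed(sum(acc)), sum(c * x for c, x in zip(coeffs, samples)))
```

Accumulating one tap at a time here only exercises the arithmetic; it is not meant to reflect the filter structure used in the paper.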
Moreover, fixed coefficient FIR filters with wider coefficients
are possible using the proposed structure, provided that every
coefficient has at most 6 non-zero digits. To guarantee this for
the CSD encoded FIR filter coefficients, a mixed integer
programming model is presented that yields at most 6 non-zero
digits in each coefficient, so that higher coefficient bit widths
become possible. In the paper, two example FIR structures are
synthesized, with 12-bit and 16-bit coefficients respectively.
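The paper enforces the six-digit limit with a mixed integer programming model, which is not reproduced here. As a much simpler stand-in, the Python sketch below rounds a single integer coefficient to the nearest value whose CSD form has at most six non-zero digits. The helper names `csd_weight` and `nearest_limited` are hypothetical, and unlike an optimization over the whole filter, this rounding treats each coefficient independently and ignores the resulting frequency response.

```python
from itertools import count

def csd_weight(n):
    """Number of non-zero digits in the canonic signed-digit (NAF) form of n.
    The NAF has the fewest non-zero digits of any signed-digit representation,
    so this is the number of partial products a multiplication by n needs."""
    weight = 0
    while n != 0:
        if n & 1:
            n -= 2 - (n & 3)   # remove a +1 or -1 digit
            weight += 1
        n >>= 1
    return weight

def nearest_limited(target, max_nonzero=6):
    """Closest integer to `target` whose CSD form has at most `max_nonzero`
    non-zero digits (ties resolved toward the smaller value)."""
    for delta in count():
        for cand in (target - delta, target + delta):
            if csd_weight(cand) <= max_nonzero:
                return cand

coeff16 = 0b0101010101010101        # 21845: 8 non-zero CSD digits
q = nearest_limited(coeff16)
print(coeff16, csd_weight(coeff16), q, csd_weight(q))
```

The search always terminates because powers of two have a CSD weight of one.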
In the proposed system, the multiplication by each constant
coefficient is achieved through a (6, 3) counter array with
redundant outputs. Moreover, backward sign-extension is
implemented to remove the extra sign bits in the system, as
explained in the following sections. After the multiplication
phase, the redundant addition operation, i.e. the accumulation
phase, is also implemented by a single-stage (6, 3) counter
array, yielding a regular multiply-accumulate
978-1-5090-0246-7/15/$31.00 ©2015 IEEE 653