Hardware efficient design of Variable Length FFT
Processor
Vinay Gautam¹, Kailash Chandra Ray², Pauline Haddow¹
¹ Department of Computer and Information Science, NTNU
Trondheim, Norway-7491
{vkgautam, pauline}@idi.ntnu.no
² Department of Electrical Engineering, Indian Institute of Technology
Patna, India-800013
kcr@iitp.ac.in
Abstract—Proliferation of handheld devices and growing
interest in pervasive computing have led to the need for more
flexible communication solutions in which a single device integrates
various wired and wireless communication standards, e.g.
Asymmetric Digital Subscriber Line (ADSL), Very-high-speed
Digital Subscriber Line (VDSL), Digital Audio Broadcasting
(DAB), Digital Video Broadcasting (DVB-T/H) and 802.11. In
this paper, such a flexible communication solution is presented,
applicable to all useful FFT processor lengths, 2^n (n = 6, 7, ..., 13),
and implemented on a flexible platform, a Field Programmable
Gate Array (FPGA). The solution is optimized for efficient
resource usage whilst meeting the throughput requirements of
the individual standards. The key features of the design
include: a conflict-free in-place memory replacement scheme for
intermediate data storage; a dynamic address generator scheme;
and the CORDIC (COordinate Rotation DIgital Computer)
technique for twiddle factor multiplication.
Keywords— OFDM, FFT, Pipelined CORDIC, Dynamic Address
Generator.
I. INTRODUCTION AND MOTIVATION
A Fast Fourier Transform (FFT) processor is one of the major
components of an Orthogonal Frequency Division
Multiplexing (OFDM) communication system [1]. There are a
number of communication standards for both wired and
wireless communication, each requiring a different FFT length
[2] and minimum throughput. In recent years, variable length
FFT (VL-FFT) processors, covering all such standards, have
received much attention because users of portable and handheld
devices require flexible access to various communication
channels. The FFT operation is computationally intensive and is
therefore commonly implemented as a separate module: on a
Digital Signal Processor (DSP), as an application-specific FFT
processor on an FPGA, or as an ASIC design. A DSP solution is
relatively simple to implement and generally exhibits high
throughput due to its higher clock frequency compared to FPGAs.
However, its high power and resource usage [3] do not fit
with the move to handheld and portable devices.
Achieving the minimum throughput requirements of the
different standards on a less power-hungry FPGA requires a
highly optimized design. As such, the focus of this work has
been to create a flexible FPGA solution that meets these
throughput requirements whilst remaining efficient with
respect to resource usage. Further refining the proposed
solution to minimize power usage would be a natural extension,
but it is not included in this work.
There are two common architectures for FFT processors:
pipelined architectures and memory-based architectures [2], [3],
[4], [5]. Pipelined FFT processors provide higher performance
but consume more hardware resources, whilst memory-based
FFT processors need fewer hardware resources but must
operate at a higher clock frequency to meet the throughput
requirement. It is the latter approach that is addressed in this
work, so as to support an efficient design.
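To illustrate why a memory-based design needs a higher clock frequency, the following minimal sketch counts the radix-2 butterfly operations for each supported length. It relies only on the standard result that an N-point radix-2 FFT has log2(N) stages of N/2 butterflies each. The function name `butterfly_count` is hypothetical, and reading the count as a cycle count assumes a single processing element that completes one butterfly per clock cycle, which is an assumption of this sketch rather than a figure from this work.

```python
def butterfly_count(n_exp):
    """Number of radix-2 butterflies in a 2**n_exp-point FFT: (N/2) * log2(N)."""
    n_points = 1 << n_exp
    return (n_points // 2) * n_exp


# Lengths 2^6 .. 2^13, as listed in the abstract.
for n_exp in range(6, 14):
    n_points = 1 << n_exp
    # Assumption: one butterfly per clock cycle on a single processing element,
    # so this count is also a lower bound on the cycles per transform.
    print(f"N = {n_points:5d}: {butterfly_count(n_exp):6d} butterflies")
```

Under that assumption, the number of cycles per transform grows as (N/2) log2(N), whereas a fully pipelined design accepts new samples every cycle at the cost of more hardware.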
II. FAST FOURIER TRANSFORM
Cooley and Tukey [9] proposed the Fast Fourier Transform (FFT)
as a computationally efficient method for computing the Discrete
Fourier Transform (DFT). Equation (1) represents the N-point DFT,
where X(k) and x(n) are N-point sequences in the frequency
domain and time domain respectively.
$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad \text{for } 0 \le k \le N-1 \qquad (1)$$

Twiddle factor: $W_N = e^{-j 2\pi / N}$
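As a reference point, Equation (1) can be evaluated directly. The sketch below is a plain O(N^2) software evaluation of the DFT definition, not the FFT processor described in this paper; the function names `twiddle` and `dft` are hypothetical.

```python
import cmath


def twiddle(n_points, exponent):
    """Twiddle factor W_N^exponent, with W_N = exp(-j*2*pi/N)."""
    return cmath.exp(-2j * cmath.pi * exponent / n_points)


def dft(x):
    """Direct evaluation of Eq. (1): X(k) = sum_{n=0}^{N-1} x(n) * W_N^(n*k)."""
    n_points = len(x)
    return [
        sum(x[n] * twiddle(n_points, n * k) for n in range(n_points))
        for k in range(n_points)
    ]


if __name__ == "__main__":
    # A unit impulse has a flat spectrum: every X(k) equals 1.
    print(dft([1, 0, 0, 0]))
```

Each output X(k) requires N complex multiplications, which is the cost that the FFT reduces by reusing partial sums.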
III. MEMORY BASED FFT PROCESSOR
Memory-based architectures [][][] for variable length FFT
processors require: a Processing Element (PE); memory
(RAM) for storing the initial as well as the intermediate
processed data; a conflict-free memory access scheme (read/write
address generation); and a twiddle factor multiplication
technique.
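The interplay of these components can be sketched in software. The following is a generic textbook in-place radix-2 decimation-in-time FFT, offered as an illustration only: it is not the conflict-free addressing scheme or the CORDIC multiplier proposed in this paper. The list `ram` stands in for the data memory, the pair `addr_a`, `addr_b` for the generated read/write addresses, and the inner update for the processing element. All names are hypothetical, and the twiddle factor is computed in floating point rather than by CORDIC.

```python
import cmath


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft_in_place(ram):
    """In-place radix-2 DIT FFT; len(ram) must be a power of two."""
    n_points = len(ram)
    stages = n_points.bit_length() - 1
    assert 1 << stages == n_points, "length must be a power of two"

    # Reorder the input into bit-reversed order.
    for i in range(n_points):
        j = bit_reverse(i, stages)
        if i < j:
            ram[i], ram[j] = ram[j], ram[i]

    for stage in range(stages):
        half = 1 << stage   # distance between the two butterfly inputs
        span = half << 1    # size of each sub-transform in this stage
        for start in range(0, n_points, span):
            for offset in range(half):
                # Address generation: one read/write address pair per butterfly.
                addr_a = start + offset
                addr_b = addr_a + half
                # Twiddle factor W_span^offset (floating point, not CORDIC).
                w = cmath.exp(-2j * cmath.pi * offset / span)
                # Processing element: radix-2 butterfly, results written back
                # to the same two addresses (in-place replacement).
                t = w * ram[addr_b]
                ram[addr_a], ram[addr_b] = ram[addr_a] + t, ram[addr_a] - t
    return ram
```

Because each butterfly writes its two results back to the addresses it read from, a single N-word memory suffices for every stage, which is the property an in-place scheme exploits.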
A. Processing Element:
The basic processing unit of the FFT operation is known as a
butterfly or processing element (PE). The radix-2 butterfly (see
Fig. 1) is one such basic processing element. A radix-2 FFT
operation is performed on two time domain values and
978-1-4244-9756-0/11/$26.00 ©2011 IEEE