Hardware efficient design of Variable Length FFT Processor

Vinay Gautam 1, Kailash Chandra Ray 2, Pauline Haddow 1
1 Department of Computer and Information Science, NTNU Trondheim, Norway-7491, {vkgautam, pauline}@idi.ntnu.no
2 Department of Electrical Engineering, Indian Institute of Technology Patna, India-800013, kcr@iitp.ac.in

Abstract: The proliferation of handheld devices and growing interest in pervasive computing have led to the need for more flexible communication solutions, where a single device integrates various wired and wireless communication standards, e.g. Asymmetric Digital Subscriber Line (ADSL), Very high speed Digital Subscriber Line (VDSL), Digital Audio Broadcasting (DAB), Digital Video Broadcasting (DVB-T/H) and 802.11. In this paper, such a flexible communication solution is presented, applicable to all useful FFT processor lengths, 2^n (n = 6, 7, ..., 13), and implemented on a flexible platform: a Field Programmable Gate Array (FPGA). The solution is optimized for efficient resource usage whilst ensuring that it meets the throughput requirements of the individual standards. The key features of the design include: a conflict free in-place memory replacement scheme for intermediate data storage; a dynamic address generator scheme; and the CORDIC (COordinate Rotation DIgital Computer) technique for twiddle factor multiplication.

Keywords: OFDM, FFT, Pipelined CORDIC, Dynamic Address Generator.

I. INTRODUCTION AND MOTIVATION

A Fast Fourier Transform (FFT) processor is one of the major components of an Orthogonal Frequency Division Multiplexing (OFDM) communication system [1]. There are a number of communication standards for both wired and wireless communication, each requiring a separate FFT length [2] and minimum throughput.
In recent years, variable length FFT (VL-FFT) processors covering all such standards have received much attention, so as to meet the availability requirements of users of portable and handheld devices who need flexible access to various communication channels. To meet its computational demands, the FFT operation is commonly implemented as a separate module: on a Digital Signal Processor (DSP), as an application specific FFT processor on an FPGA, or as an ASIC design. A DSP solution is relatively simple to implement and generally exhibits high throughput due to its higher clock frequency compared to FPGAs. However, its high power and resource usage [3] does not fit with the move to handheld and portable devices. Achieving the minimum throughput requirement of the different standards on a less power hungry FPGA requires a highly optimized design. As such, the focus of this work has been to create a flexible FPGA solution that meets such throughput requirements whilst remaining efficient with respect to resource usage. An extension to this work would be to further refine the proposed solution to minimize power usage. However, such an extension is not included in this work.

There are two common architectures for FFT processors: pipelined architectures and memory based architectures [2], [3], [4], [5]. Pipelined FFT processors provide higher performance but consume more hardware resources, whilst memory based FFT processors need fewer hardware resources but must operate at a higher clock frequency to meet the throughput requirement. It is the latter approach that is addressed in this work, so as to support an efficient design.

II. FAST FOURIER TRANSFORM

Cooley and Tukey [9] proposed the Fast Fourier Transform (FFT) as a computationally efficient method for computing the Discrete Fourier Transform (DFT). Equation (1) represents the N-point DFT, where X(k) and x(n) are N-point sequences in the frequency domain and time domain respectively.
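As an illustration of the relationship between the DFT definition and the FFT, the following minimal Python sketch evaluates the N-point DFT sum directly, X(k) = sum over n of x(n) W_N^(nk) with W_N = exp(-j 2 pi / N), and compares it with a recursive radix-2 decimation-in-time FFT. The function names `dft` and `fft_radix2` are hypothetical, and the recursive software formulation is an assumption made for clarity only; it is not the memory based, in-place hardware architecture described in this paper.

```python
import cmath


def dft(x):
    """Direct O(N^2) evaluation of the N-point DFT sum."""
    n_points = len(x)
    return [
        # Each term is x(n) * W_N^(n*k), with W_N = exp(-j*2*pi/N).
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n_points = len(x)
    if n_points == 1:
        return list(x)
    if n_points % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])  # N/2-point FFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # N/2-point FFT of odd-indexed samples
    half = n_points // 2
    out = [0j] * n_points
    for k in range(half):
        # Radix-2 butterfly: one twiddle multiplication, one add, one subtract.
        t = cmath.exp(-2j * cmath.pi * k / n_points) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


if __name__ == "__main__":
    samples = [complex(n % 5, n % 3) for n in range(64)]  # N = 2^6
    error = max(abs(a - b) for a, b in zip(dft(samples), fft_radix2(samples)))
    print("largest difference between DFT and FFT outputs:", error)
```

The direct sum needs on the order of N^2 complex multiplications, whereas the radix-2 decomposition needs on the order of N log2 N, which is why the FFT is the practical choice for the lengths 2^6 to 2^13 considered here.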
$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad 0 \le k \le N-1 \qquad (1)$$

Twiddle factor: $W_N = e^{-j 2\pi / N}$

III. MEMORY BASED FFT PROCESSOR

Memory based architectures for variable length FFT processors require: a Processing Element (PE); memory (RAM) for storing the initial as well as the intermediate processed data; a conflict free memory access scheme (read/write address generation); and a twiddle factor multiplication technique.

A. Processing Element: The basic processing unit of the FFT operation is known as a butterfly or processing element (PE). The radix-2 butterfly (see Fig. 1) is such a basic processing element. Radix-2 FFT operation is performed on two time domain values and
978-1-4244-9756-0/11/$26.00 ©2011 IEEE