Journal of VLSI Signal Processing 25, 187–193, 2000
© 2000 Kluwer Academic Publishers. Manufactured in The Netherlands.

Numerical Accuracy of Fast Fourier Transforms with CORDIC Arithmetic

M. BEKOOIJ AND J. HUISKEN
Philips Research Laboratories, Prof. Holstlaan 4, 5656 AA Eindhoven, The Netherlands

K. NOWAK
Department of Electrical Engineering, Eindhoven University of Technology, The Netherlands

Abstract. The vector rotation operation in the butterfly of a Fast Fourier Transform (FFT) can be calculated by a complex multiplier as well as by a CORDIC (COordinate Rotation DIgital Computer). For these vector rotation blocks, expressions for the maximum numerical error are derived. It is shown that the error introduced by the CORDIC can be reduced by increasing the size of the input vector of the CORDIC and decreasing the size of the output vector by the same amount. This input vector scaling makes it possible to reduce the number of bits in the data path of the CORDIC. The impact on the Signal to Noise Ratio (SNR) of the FFT is evaluated when a CORDIC is applied in the FFT butterfly.

1. Introduction

The N-point Discrete Fourier Transform (DFT) of an N-point sequence is by definition:

$$X(k) = \sum_{n=0}^{N-1} x(n) \cdot W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \tag{1}$$

with $W_N = e^{-j(2\pi/N)}$.

The input sequence x(n) and output sequence X(k) are complex numbers. In this article a complex number, $v = a + j \cdot b$, is denoted by a variable in bold face and represents a 2D-vector, $\mathbf{v} = \begin{pmatrix} a \\ b \end{pmatrix}$.

The direct computation of the DFT requires a number of computations proportional to $N^2$. Cooley and Tukey [1] published in 1965 an algorithm for the computation of the discrete Fourier transform which exploits the symmetry and the periodicity of the sequence $(W_N^k)$ and requires only a number of computations proportional to $N \log N$. After this publication many other DFT algorithms were published in which the number of computations was drastically reduced.
These algorithms became known as Fast Fourier Transforms (FFT). The algorithm of Cooley and Tukey remains very attractive for a hardware implementation because only one butterfly type is used and the regularity of the memory address calculation makes an efficient in-place FFT implementation possible.

To achieve the dramatic increase in computational efficiency, the DFT is decomposed into DFTs of a smaller length. Algorithms in which the sequence x(n) is decomposed into smaller subsequences are called decimation-in-time algorithms, and algorithms in which the sequence X(k) is decomposed are called decimation-in-frequency algorithms. The computational complexity of both algorithms is $O(N \log N)$. The data flow graph of an eight-point decimation-in-time FFT is depicted in Fig. 1.

The basic computation units in the FFT are called butterflies because of their shape in the data flow graph. In Fig. 2, a radix-2 decimation-in-time butterfly is shown, which consists of a complex adder, a subtracter and a complex multiplier (CM). The operation of a butterfly is described by Eqs. (2)–(4).

$$X_{m+1}(p) = X_m(p) + W_N^r X_m(q) \tag{2}$$

$$X_{m+1}(q) = X_m(p) - W_N^r X_m(q) \tag{3}$$
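
As an illustration that is not part of the original article, the butterfly of Eqs. (2) and (3) can be sketched in Python with ordinary floating-point complex arithmetic. The function names `twiddle`, `butterfly`, `dft` and `fft_dit` are hypothetical, the twiddle factor is assumed to be $W_N^r = e^{-j 2\pi r/N}$ in line with the definition of $W_N$ given with Eq. (1), and N is assumed to be a power of two. The sketch models neither a CORDIC nor any finite word length effect; it only checks that repeated butterflies reproduce the direct DFT.

```python
import cmath


def twiddle(r, n):
    """Assumed twiddle factor W_N^r = exp(-j*2*pi*r/N), from W_N in Eq. (1)."""
    return cmath.exp(-2j * cmath.pi * r / n)


def butterfly(xp, xq, w):
    """Radix-2 decimation-in-time butterfly, Eqs. (2) and (3)."""
    t = w * xq  # complex multiplier (CM): rotate X_m(q) by W_N^r
    return xp + t, xp - t


def dft(x):
    """Direct O(N^2) evaluation of Eq. (1), used only as a reference."""
    n = len(x)
    return [sum(x[i] * twiddle(i * k, n) for i in range(n)) for k in range(n)]


def fft_dit(x):
    """Radix-2 decimation-in-time FFT built from butterflies.

    Works on a copy of x; len(x) is assumed to be a power of two.
    """
    n = len(x)
    a = list(x)

    # Reorder the input into bit-reversed order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(N) stages; each stage applies N/2 butterflies in place.
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for k in range(span):
                p = start + k
                q = p + span
                w = twiddle(k * (n // (2 * span)), n)
                a[p], a[q] = butterfly(a[p], a[q], w)
        span *= 2
    return a


if __name__ == "__main__":
    x = [1, 2 - 1j, 0, -1 + 2j, 3, 0.5j, -2, 1]
    worst = max(abs(u - v) for u, v in zip(fft_dit(x), dft(x)))
    print("largest deviation from the direct DFT:", worst)
```

Every stage overwrites the pair at indices p and q with its own butterfly outputs, which is the in-place property mentioned above. The multiplication `w * xq` is the vector rotation that the article proposes to carry out with either a complex multiplier or a CORDIC.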