18 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 6, NO. 1, MARCH 1998

A CORDIC Processor for FFT Computation and Its Implementation Using Gallium Arsenide Technology

Roberto Sarmiento, Member, IEEE, Félix Tobajas, Valentín de Armas, Roberto Esper-Chaín, José F. López, Member, IEEE, Juan A. Montiel-Nelson, Member, IEEE, and Antonio Núñez

Abstract—In this paper, the architecture and the implementation of a complex fast Fourier transform (CFFT) processor using 0.6 µm gallium arsenide (GaAs) technology are presented. This processor computes a 1024-point FFT of 16-bit complex data in less than 8 µs, working at a frequency beyond 700 MHz, with a power consumption of 12.5 W. The architecture of the processor is based on the COordinate Rotation DIgital Computer (CORDIC) algorithm, which avoids the use of conventional multiplication-and-accumulation (MAC) units and instead evaluates the trigonometric functions using only add and shift operations. Improvements to the basic CORDIC architecture are introduced in order to reduce the area and power of the processor. This, together with the use of pipelining and carry save adders, produces a very regular and fast processor. The CORDIC units were fabricated and tested in order to anticipate the final performance of the processor. This work also demonstrates the maturity of GaAs technology for implementing ultrahigh-performance signal processors.

Index Terms—Application specific integrated circuits (ASIC's), carry save adders, COordinate Rotation DIgital Computer (CORDIC), fast Fourier transform (FFT), full-custom, gallium arsenide (GaAs) VLSI design, high-performance systems.

I. INTRODUCTION

The fast Fourier transform (FFT) is the most popular algorithm in digital signal processing. The FFT has many high-end applications, such as radar, sonar, spread-spectrum communications, image processing, general filtering, and convolution. Many of them require good precision and real-time response.
With the advent of VLSI, digital signal processors (DSP's) provide a convenient way to develop these applications. However, high-performance applications are out of the reach of a single processor, and parallel DSP chips and application specific DSP chips have been introduced for this reason. Application specific circuits developed in silicon compute a 1024-point FFT in tens of milliseconds [1]. Higher performance is only possible using several of these silicon chips in parallel. However, digital gallium arsenide (GaAs) technology can provide superior performance using monolithic solutions.

Manuscript received February 3, 1997. This work was supported in part by the ESPRIT project CT93-0385, and the Spanish National Science Foundation (DGICYT) project CE94-0013. The authors are with the Centro de Microelectrónica Aplicada, Universidad de Las Palmas de Gran Canaria, 35017 Las Palmas de Gran Canaria, Spain (e-mail: roberto@cma.ulpgc.es). Publisher Item Identifier S 1063-8210(98)01319-5.

Most signal processing algorithms (e.g., the fast Fourier transform) require the evaluation of trigonometric functions, which normally create a bottleneck in digital signal processors. These functions are frequently implemented using a multiplication-and-accumulation (MAC) unit, which basically consists of multipliers, adders, and registers. The design and implementation of these primitives have already been studied in gallium arsenide [2]. However, mimicking the silicon solutions in GaAs could produce inefficient systems in terms of performance or cost.

The CORDIC (COordinate Rotation DIgital Computer) algorithm has been shown to be an efficient way of evaluating the elementary functions [3], such as trigonometric, exponential, and logarithmic functions. Although the CORDIC algorithm was proposed in 1959, current VLSI technology has created a new interest in a number of applications of the algorithm.
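To make the add-and-shift idea concrete, the following is a minimal software sketch of the textbook radix-2 CORDIC in rotation mode. It is not the architecture described in this paper: the function name `cordic_rotate`, the 16-iteration count, and the 16 fractional bits are assumptions chosen for illustration, and the final gain correction is done in floating point for brevity, whereas hardware would fold that constant in elsewhere.

```python
import math


def cordic_rotate(x, y, angle, iterations=16, frac_bits=16):
    """Rotate the vector (x, y) by `angle` radians with shift-and-add steps.

    Illustrative sketch only (hypothetical name and parameters).
    Converges for |angle| up to roughly 1.74 rad.
    """
    scale = 1 << frac_bits

    # Lookup table of atan(2^-i) in fixed point; a hardware unit would
    # store these constants in a small ROM or hard-wire them.
    atan_table = [round(math.atan(2.0 ** -i) * scale) for i in range(iterations)]

    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i);
    # the product of these factors is the constant CORDIC gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Convert the inputs to fixed-point integers.
    xi = round(x * scale)
    yi = round(y * scale)
    zi = round(angle * scale)  # residual angle still to be rotated

    for i in range(iterations):
        if zi >= 0:
            # Rotate counter-clockwise by atan(2^-i): shifts and adds only.
            xi, yi = xi - (yi >> i), yi + (xi >> i)
            zi -= atan_table[i]
        else:
            # Rotate clockwise by atan(2^-i).
            xi, yi = xi + (yi >> i), yi - (xi >> i)
            zi += atan_table[i]

    # Undo the fixed-point scaling and the constant gain (done in floating
    # point here purely for convenience).
    return xi / (scale * gain), yi / (scale * gain)


if __name__ == "__main__":
    # Rotating (1, 0) by 30 degrees approximates (cos 30°, sin 30°).
    print(cordic_rotate(1.0, 0.0, math.pi / 6))
```

Multiplying an FFT sample by a twiddle factor is a rotation of a complex number by a known angle, so a loop of this kind can stand in for the complex multiplier; the per-iteration work is two shifts, three additions, and one table lookup.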
In this paper, the architecture and the implementation of a complex fast Fourier transform (CFFT) processor using 0.6 µm gallium arsenide technology are presented. This processor computes a 1024-point FFT of 16-bit complex data in less than 8 µs, working at a frequency beyond 700 MHz, with a power consumption of only 12.5 W. The architecture of the processor is based on the CORDIC algorithm, which evaluates the trigonometric functions using only add and shift operations. Previous work [4], [5] has shown that the CORDIC algorithm usually yields slow and area-demanding circuits. Hence, an application specific CORDIC has been used and other improvements (such as a novel mixed radix-2/radix-4 approach) have been made in order to overcome these drawbacks. To increase the throughput, pipelining and redundant arithmetic representation have been used.

For this architecture, functional units have been designed and optimized for an enhancement/depletion self-aligned process, taking into account such issues as process spread and temperature variations. The processor is laid out in full-custom using the merged logic approach [6], optimizing area and power consumption.

The organization of the paper is as follows. In Section II, an overview of the CORDIC processor is presented. A basic description of the FFT processor architecture, which includes a mixed radix-2/radix-4 approach, is presented in Section III. The design and implementation of the different units and the design of the system using the selected technology are given in Section IV. This section also focuses on the implementation challenges posed by this demanding technology. In Section V, the performance of the processor is highlighted

1063–8210/98$10.00 © 1998 IEEE