Macro-programmable DSP architecture for parallel/pipelined data path units, targeted for FFT based algorithms

Andreas Drollinger, Alexandre Heubi, Peter Balsiger, Fausto Pellandini
Institute of Microtechnology, University of Neuchâtel
Rue A.-L. Breguet 2, 2000 Neuchâtel, Switzerland
URL: www-imt.unine.ch/esplab
e-mail: andreas.drollinger@imt.unine.ch

Abstract

A macro-programmable DSP architecture is presented that is very well suited to the implementation of algorithms with regular data flow graphs, such as FFTs. A smart grouping of the algorithm, together with the macrocode concept, drastically reduces the control and address generation overhead of the DSP and shortens the computation time. This results in very low power consumption, small DSP size and high throughput, combined with high flexibility of the DSP architecture.

1. Introduction

Several DSP algorithms have regular data flow structures and use just a small set of basic operations, which are executed repetitively. Take an FFT filterbank as an example: its operation set is limited to the radix-2 butterfly. Butterfly operations of higher radix make the FFT computation more efficient. With a few additional simple operations, even sophisticated FFT-based filterbank algorithms such as the WOLA¹ algorithm can be performed, and gain coefficients can be applied to the processed data in the frequency domain [1], [2].

It is quite inefficient to use general-purpose DSPs for such algorithms, because they cannot exploit the algorithm's regularity to save program code or to shorten the execution time. General-purpose DSPs execute each operation individually and therefore need a detailed description of it, including the internal (macro-) operation sequence and the parameter addresses. For each operation, a general-purpose DSP has to read all this information together with control instructions.
This lengthens the execution time and increases the power consumption. A hardwired processor, on the other hand, works much faster because it does not lose cycles on control instructions. However, its flexibility is low because the control and datapath units are hardwired.

The presented DSP architecture offers a solution that is flexible, because it is macro-programmable, and that has an efficiency similar to that of hardwired processors. It is especially efficient for well-organized algorithms like FFTs.

The remainder of this paper describes the development of a macro-programmable DSP architecture. Section 2 presents the structuring concept for regular algorithms, while Section 3 describes the DSP architecture for this kind of algorithm. Section 4 shows an application example and some results. Finally, conclusions are presented in Section 5.

¹ WOLA: weighted overlap-add

2. The algorithm structuring

Before considering the algorithm's structuring, some terms should be introduced:
• A function is a part of the algorithm that has to be executed without interruption.
• A pass is a sub-part of a function. It groups identical operations together in such a way that all data are read once from the memory, processed and written back to the memory.
• An operation is a sub-part of a pass and corresponds to a basic datapath operation.
• A cycle is the number of clock cycles needed by the datapath unit to process one operation.

Proceedings of the International Conference on Signal Processing, Applications and Technology (ICSPAT), 1-5, 2000 which should be used for any reference to this work

Figure 1 shows schematically the data flow of a 16-complex-point FFT spectrum analyzer with a windowing and time folding feature, as used in the WOLA transformation. It has three