Journal of VLSI Signal Processing 26, 333–359, 2000
© 2000 Kluwer Academic Publishers. Manufactured in The Netherlands.

Reconfigurable Filter Coprocessor Architecture for DSP Applications

S. RAMANATHAN, S.K. NANDY AND V. VISVANATHAN
Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore-560 012, India

Received September 22, 1999; Revised April 20, 2000

Abstract. Digital Signal Processing (DSP) is widely used in high-performance media processing and communication systems. In the majority of these applications, critical DSP functions are realized as embedded cores to meet the low-power budget and high computational complexity. Usually these cores are ASICs that cannot be easily retargeted for other similar applications that share certain commonalities. This stretches the design cycle, which in turn affects time-to-market constraints. In this paper, we present a reconfigurable, high-performance, low-power filter coprocessor architecture for DSP applications. The coprocessor architecture, apart from having the performance and power advantages of its ASIC counterpart, can be reconfigured to support a wide variety of filtering computations. Since filtering computations abound in DSP applications, an implementation of this coprocessor architecture can serve as an important embedded hardware IP.

Keywords: reconfigurable coprocessors, filter coprocessor architecture, low-power architectures, pipelined architectures, systolic architectures and digital signal processing

I. Introduction

Digital signal processing (DSP) is pervasive in current-day high-performance media processing and communication systems [1–10]. With advances in VLSI technology, it is possible today to realize DSP algorithms with throughput demands exceeding a gigabit/sec within a low-power budget [11–15]. The demand for high throughput is driven by the fact that computations in DSP are real-time in nature and DSP applications are becoming more complex.
The demand for low power is driven by the proliferation of portable/mobile systems.

The current approach to the design of high-performance DSP systems is to implement the compute-intensive/power-critical portion of the DSP application as an ASIC and the rest of the application as software running on a programmable processor. The former could be treated as the hardware partition of the target application, while the latter could be treated as the software partition. For example, a filter coprocessor is integrated along with the DSP56300 core [16] (Motorola DSP processor) to double the chip's performance for telecommunication applications [2] while keeping the footprint small and power consumption low. As another example, a variable length decoder (VLD) coprocessor and an image coprocessor are embedded in the TriMedia CPU64 core [17] (Philips Media Processor) to enhance the chip's performance for media processing applications [1]. In yet another example, video signal processors [18] use coprocessors for DCT, motion estimation and entropy coding. Although this design approach meets the performance objectives, it fails to meet shorter "time-to-market" requirements. This is primarily because each new DSP application most often involves minimal reuse of designs from similar applications, which further increases the system cost.

The advances in general purpose programmable processors and special purpose programmable processors, such as:
RISC architectures based on superscalar [19–23] and multiscalar [24, 25] approaches improve instruction-level parallelism (ILP) [26],