Analyzing heterogeneous system architectures by means of cost functions: A comparative study for basic operations

H. Blume, H. T. Feldkämper, H. Hübert, T. G. Noll
Chair of Electrical Engineering and Computer Systems, Institute of Technology, RWTH Aachen
Schinkelstr. 2, 52062 Aachen, GERMANY
blume@eecs.rwth-aachen.de

Abstract
Optimum partitioning plays a crucial role in the implementation of systems on heterogeneous target architectures. Typical target architectures for such systems include FPGAs, semi-custom or physically optimized ASICs, as well as programmable digital signal processors or general-purpose processors. The goal of this work is to estimate implementation-specific parameters such as throughput rate, power dissipation and required silicon area by means of cost functions. Taking technology normalization into account, we provide implementation parameters for selected basic operations, namely linear and nonlinear filtering, an exemplary arithmetic operation and simple matching operations, all of which are required in the digital video processing domain. These operations were optimized for each architecture block. We show quantitatively that the cost ratio between different architecture blocks depends strongly on the operation to be performed. This information is essential for finding the optimum partitioning when implementing a system.

1. Introduction
Until recently, computationally intensive high-throughput applications such as video processing were implemented on dedicated ASICs. Today, modern reconfigurable components and digital signal processors (DSPs) can also be used owing to their increasing capacity [2], [3]. Such applications are often based on systems composed of system blocks such as filters (Figure 1). These systems are partitioned onto heterogeneous target architectures, which consist of different architecture blocks, e.g. an FPGA or a dedicated coprocessor (CP), chosen for their inherent advantages in flexibility (FPGA) or performance (CP).
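The central idea of comparing architecture blocks through a cost function over throughput rate, power dissipation and silicon area can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: the `Implementation` record, the metric area × power / throughput (silicon area times energy per sample), and the first-order area normalization with the square of the minimum feature size are hypothetical choices, not the cost functions defined in this study. The demo values are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    """Parameters of one basic operation on one architecture block.

    Assumed units for this sketch: throughput in samples/s, power in W,
    area in mm^2 (already normalized to a common reference technology).
    """
    block: str
    operation: str
    throughput: float
    power: float
    area: float


def normalize_area(area_mm2: float, feature_um: float, ref_feature_um: float) -> float:
    """Hypothetical first-order technology normalization: area is assumed
    to scale with the square of the minimum feature size."""
    return area_mm2 * (ref_feature_um / feature_um) ** 2


def cost(impl: Implementation) -> float:
    """Hypothetical cost function: area times energy per sample.

    power / throughput is the energy per sample (J/sample); multiplying by
    area penalizes implementations that are large, power-hungry and slow.
    """
    return impl.area * impl.power / impl.throughput


def cost_ratio(a: Implementation, b: Implementation) -> float:
    """cost(a) / cost(b) for the same operation on two architecture blocks."""
    if a.operation != b.operation:
        raise ValueError("cost ratios compare the same operation on two blocks")
    return cost(a) / cost(b)


if __name__ == "__main__":
    # Placeholder numbers for illustration only; they are NOT results of this study.
    impls = [
        Implementation("ASIC", "2D FIR filter", throughput=1e8, power=0.1, area=1.0),
        Implementation("DSP", "2D FIR filter", throughput=1e7, power=1.0, area=10.0),
        Implementation("ASIC", "median filter", throughput=1e8, power=0.2, area=2.0),
        Implementation("DSP", "median filter", throughput=2e6, power=1.0, area=10.0),
    ]
    by_key = {(i.block, i.operation): i for i in impls}
    for op in ("2D FIR filter", "median filter"):
        ratio = cost_ratio(by_key[("DSP", op)], by_key[("ASIC", op)])
        print(f"{op}: DSP/ASIC cost ratio = {ratio:.0f}")
```

With this particular metric, a block that is ten times slower but equally large and power-hungry scores ten times worse; weighting the three parameters differently would shift the ratios, so the choice of cost function directly affects which partitioning looks optimal.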
These different architecture blocks provide implementation alternatives for the system blocks. After partitioning, each system block is mapped to a specific architecture block. A comparison of the architecture blocks with respect to implementation-specific parameters such as throughput rate, power dissipation and required silicon area supports finding the optimum partitioning for a heterogeneous target architecture. In the past, several authors have provided comparisons covering a subset of the architecture blocks or of these parameters [1], [4], [5].

[Figure 1: Partitioning of a system to a heterogeneous target architecture. System blocks (filter, image analysis modules, motion estimation, ...) are partitioned and mapped to the architecture blocks of a system-on-chip (DSP core, FPGA, coprocessors CP1 ... CPx, µController, on-chip memory), which serve as implementation alternatives.]

This paper is organized as follows: Section 2 presents the selected basic operations and explains the applied evaluation metrics, including the technology normalizations. Section 3 describes the different architecture blocks. Section 4 discusses optimizations for the architecture blocks and the results for the different basic operations. Conclusions are given in Section 5.

2. Selected basic operations and evaluation metrics for digital signal processing
With regard to the toolboxes of modern multimedia applications, we selected the following basic operations to evaluate as reference examples:
• 1D/2D linear filters, e.g. for interpolation, edge detection and noise reduction,