Published in IET Computers & Digital Techniques
Received on 1st June 2007
Revised on 28th November 2007
doi: 10.1049/iet-cdt:20070085
ISSN 1751-8601

Scheduling methodology for conditional execution of kernels onto multicontext reconfigurable architectures

F. Rivera 1, M. Sanchez-Elez 1, R. Hermida 1, N. Bagherzadeh 2

1 Departamento Arquitectura de Computadores y Automática, Universidad Complutense de Madrid, Madrid 28040, Spain
2 Department of Electrical Engineering and Computer Science, University of California, Irvine, CA 92697-2625, USA
E-mail: farivera@fis.ucm.es

Abstract: The authors present a scheduling methodology for conditional execution of kernels onto single instruction stream/multiple data stream multicontext reconfigurable architectures. Data flow graphs are used to describe the target applications, in which some kernels are conditionally executed depending on runtime conditions. Only after a condition has been tested is the next kernel to be processed known, so its configurations and input data can be loaded only at that point, producing a computation stall while these transfers are performed. A compilation-time kernel scheduling is proposed to handle conditional branches (CBs) by determining a kernel sequence that minimises these computation stalls, thereby reducing the application latency. Target applications are first partitioned taking into account the presence of CBs, and then kernels are ordered for execution and mapped onto the reconfigurable system. Experimental results obtained for interactive and synthetic applications demonstrate the effectiveness of the proposal.

1 Introduction

Configurable computing systems [1] represent an intermediate approach between general purpose and application specific systems. Configurable computers can potentially achieve performance similar to that of customised hardware, while maintaining flexibility similar to that of general purpose machines.
The fundamental principle of configurable computing is that the hardware organisation, functionality and/or interconnections may be customised after fabrication. The most common devices used in configurable computing systems are field programmable gate arrays (FPGAs) [2]. Multicontext coarse-grained systems are a configurable alternative to FPGAs. Whereas the processing elements (PEs) in fine-grained systems are dedicated to bit-oriented operations, in coarse-grained systems the PEs may contain complete functional units, such as arithmetic and logic units (ALUs) and/or multipliers, that operate upon multiple-bit words. Many coarse-grained multicontext reconfigurable systems have been proposed by academia and industry: MorphoSys [3], Remarc [4], Matrix [5], Chameleon/Montium [6], DAPDNA [7] and XPP [8].

Abundant parallel resources, high computational density and the flexibility to change behaviour at runtime make coarse-grained reconfigurable architectures well suited to many applications in the multimedia and communication domains. The applications usually implemented on them are data-intensive. They are also computation-intensive because of the large number of computations they have to perform, which implies an intensive context-switching workload. Data and configuration transfers therefore become a problem that must be taken into account when implementing these applications.

© The Institution of Engineering and Technology 2008. IET Comput. Digit. Tech., 2008, Vol. 2, No. 3, pp. 199–213

Initially, the problem seems to be solved because the applications usually implemented have a behaviour that