Runtime decision of hardware or software execution on a heterogeneous reconfigurable platform
Vlad-Mihai Sima, Koen Bertels
Computer Engineering, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands

Abstract
In this paper, we present a runtime optimization targeting the speedup of applications running on a reconfigurable platform that supports the MOLEN programming paradigm. More specifically, for functions whose execution time depends on their parameters, we propose an online adaptive decision algorithm that determines whether the gain of running the function in hardware outweighs the overhead of transferring the parameters, managing the start and stop of the execution, and obtaining the result. Our approach is dynamic in the sense that it does not rely on compile-time information. The algorithm is applied to a real video codec for which one function is implemented in hardware, and we show that improvements of up to 24% can be obtained for the specific kernel. We also determine the overhead and execution time ranges in which this optimization is useful, and what other factors can influence it.¹

1. Introduction
Due to the increasing heterogeneity of computer systems and applications, hardware and software designers develop new approaches that use the available, limited resources more efficiently. A wide range of such problems can be solved in a fast and efficient manner by Reconfigurable Computing, which combines the flexibility of a GPP (general purpose processor) with the speed of (reconfigurable) hardware. One issue with such complex systems is deciding the mapping between the tasks that have to be performed and the available hardware. Within a single application context, this can be solved by the compiler. However, as soon as multiple applications compete for the same resources, the compiler cannot solve this.
In this paper, we propose an online decision algorithm called AMAP (adaptive mapping algorithm) that, taking into account the particularities of the function and the history of its execution, decides which implementation should be used for each invocation of that function. One main novelty of the algorithm is that it takes this decision as late as possible, before each run, so it can make better decisions than a compile-time algorithm. In addition, the profile information is stored and used when taking the decision for the next function call.

The paper is organized as follows: in Section 2 we briefly present the Molen programming paradigm for reconfigurable architectures and related work. Next, we give a motivational example and also define the exact problem. A detailed description of the runtime algorithm is presented in Section 4. The results of the algorithm are shown in Section 5. In Section 6, we present conclusions and outline new research directions.

¹ This research has been funded by the hArtes project EU-IST-035143, the Morpheus project EU-IST-027342 and the Rcosy Progress project DES-6392.

2. Background and related work
The MOLEN programming paradigm [1] offers the programmer an abstraction of the available resources, together with a model of interaction between the components. Using a 'one-time' architectural or operating system extension, the Molen programming paradigm allows a virtually infinite number of new hardware operations to be executed on the reconfigurable hardware.

Work on hardware/software partitioning has so far considered static partitioning performed at compile time, with the objective of minimizing the total cost, or of minimizing the cost while satisfying one constraint [2]. Various techniques have been used to solve this problem, such as simulated annealing [3] [4], integer linear programming [5], mixed integer linear programming [6], a knapsack problem formulation [7] and genetic algorithms [8].
Other problems were also considered when doing the partitioning, such as area allocation [5], granularity selection [9] and scheduling [4] [8]. One common characteristic of all these approaches is that they assume the profile information and execution trace are available at compile time, and they optimize for just one specific set of cases ([5] [4] [9]). From the runtime and operating system point of view, previous work focused on online scheduling of tasks that are already mapped to hardware [10]. The use of a cache together with software dispatch was proposed in [11], but there the software dispatch was used when the contention was too high.
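
To make the contrast with compile-time partitioning concrete, the following is a minimal sketch of a per-call hardware/software decision driven by execution history, of the kind outlined in the introduction. It is not the AMAP algorithm itself, which is described in Section 4. The names `History`, `update`, `choose_hardware` and the smoothing factor `alpha` are hypothetical and chosen for illustration only, and the measured hardware time is assumed to already include the overhead of transferring parameters, starting and stopping the execution, and reading back the result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class History:
    """Smoothed execution-time estimates for one function (hypothetical)."""
    alpha: float = 0.5          # weight of the newest measurement (assumed value)
    sw: Optional[float] = None  # estimated software execution time
    hw: Optional[float] = None  # estimated hardware time, overhead included

    def update(self, used_hw: bool, elapsed: float) -> None:
        """Fold one measured run into the estimate of the implementation used."""
        old = self.hw if used_hw else self.sw
        if old is None:
            new = elapsed
        else:
            new = (1 - self.alpha) * old + self.alpha * elapsed
        if used_hw:
            self.hw = new
        else:
            self.sw = new


def choose_hardware(history: History) -> bool:
    """Decide, just before a call, whether to run it in hardware."""
    if history.sw is None:
        return False  # no software measurement yet: run in software first
    if history.hw is None:
        return True   # no hardware measurement yet: try hardware once
    # Hardware pays off only if its time, overhead included, is lower.
    return history.hw < history.sw
```

A caller would invoke `choose_hardware` before each call, time the chosen implementation, and pass the measurement to `update`. Note that this simple sketch never retries an implementation once its estimate is worse, and it ignores the dependence of execution time on the function parameters.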