A Framework for Mapping Dynamic Virtual Kernels onto Heterogeneous Reconfigurable Platforms

Harry Sidiropoulos, Kostas Siozios, Dimitrios Soudris
School of Electrical and Computer Engineering
National Technical University of Athens
Athens, Greece
Email: {harry, ksiop, dsoudris}@microlab.ntua.gr

Abstract—Field Programmable Gate Arrays (FPGAs) promise a low-power, flexible alternative for the heterogeneous systems found in today's market. To be widely accepted, they require novel solutions and approaches for fast and flexible application implementation. In this paper we propose a methodology, together with its supporting toolflow, for the fast implementation of multiple applications onto heterogeneous FPGAs. For this purpose we introduce the concept of dynamic virtual kernels. Experimental results demonstrate the efficiency of the introduced solution: application mapping is 30× faster on average than with a state-of-the-art approach, with negligible performance degradation. Additionally, we enable the dynamic mapping of multiple applications onto a single FPGA with only a small penalty of 4.7% in the maximum operating frequency of those applications compared with our reference solution.

I. INTRODUCTION

Existing applications impose a continuously increasing demand for processing power. This trend affects not only scientific and industrial applications but also consumer and end-user applications. As a result, a number of design strategies and methodologies have been proposed that take advantage of the additional flexibility offered by heterogeneous systems. For instance, existing consumer electronics (e.g., smart-phones, TVs) implement a wide variety of kernels, usually with diverse functionalities spanning from multimedia players to telecommunication platforms.
Notably, in many cases these functionalities are not known at design time because they are user defined (smart-phones used as gaming consoles, smart TVs as web browsers, etc.).

Both industry and academia consider Field Programmable Gate Arrays (FPGAs) a viable alternative implementation medium. The inherent parallelism and reprogrammability of FPGAs can be exploited either at design time or at run time, depending on the application's requirements. There is a continuously increasing interest in employing FPGAs as hardware accelerators and processing modules in the high-performance and embedded computing domains.

To integrate efficiently into this new landscape, FPGAs need to support fast application development and implementation. Industry has taken steps towards faster application development by exploring diverse solutions. Examples can be found in the EDA tools of the leading commercial FPGA companies: Xilinx integrated a High-Level Synthesis environment into its new Vivado toolflow [1], and Altera supports OpenCL kernels [2].

For faster application implementation, the main body of research focuses on faster mapping algorithms and tools. The most computationally intensive task when implementing an application onto an FPGA is the placement and routing (P&R) step. To overcome this limitation, researchers have already proposed a number of solutions [3] [4] [5]. The authors of [3] developed a parallel placer based on a simulated annealing algorithm in order to decrease execution time, and incorporated this placer into Altera's FPGA toolflow. In [4] and [5], the authors incorporate known techniques from the Application-Specific Integrated Circuit (ASIC) domain in order to reduce the placer's execution time. It is important to note that the reconfigurable platforms in [3] and [4] are realistic heterogeneous FPGAs that consist of logic, DSP, memory and I/O blocks.
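To make the cost of the placement step more concrete, the following minimal sketch illustrates the general idea behind a simulated-annealing placer. It is not the parallel placer of [3], nor the method proposed in this paper. The grid model, the two-pin wirelength cost, the cooling schedule and all names (`wirelength`, `anneal_placement`, and the parameters) are assumptions chosen for illustration, and the sketch ignores block heterogeneity (logic, DSP, memory, I/O) entirely.

```python
import math
import random


def wirelength(placement, nets):
    """Sum of Manhattan distances over two-pin nets (illustrative cost)."""
    return sum(
        abs(placement[a][0] - placement[b][0])
        + abs(placement[a][1] - placement[b][1])
        for a, b in nets
    )


def anneal_placement(blocks, nets, width, height, seed=0,
                     t_start=5.0, t_end=0.01, alpha=0.9, moves_per_temp=200):
    """Toy simulated-annealing placer on a width x height grid of sites.

    Each move swaps the contents of two sites (either may be empty).
    Returns the best placement seen and its cost.
    """
    rng = random.Random(seed)
    sites = [(x, y) for x in range(width) for y in range(height)]
    if len(blocks) > len(sites):
        raise ValueError("not enough sites for the given blocks")

    # Random initial placement: block -> site, and site -> block (or None).
    shuffled = sites[:]
    rng.shuffle(shuffled)
    placement = dict(zip(blocks, shuffled))
    occupant = {s: None for s in sites}
    for block, site in placement.items():
        occupant[site] = block

    cost = wirelength(placement, nets)
    best, best_cost = dict(placement), cost

    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            s1, s2 = rng.sample(sites, 2)
            b1, b2 = occupant[s1], occupant[s2]
            if b1 is None and b2 is None:
                continue  # swapping two empty sites changes nothing

            # Tentatively apply the swap.
            occupant[s1], occupant[s2] = b2, b1
            if b1 is not None:
                placement[b1] = s2
            if b2 is not None:
                placement[b2] = s1

            new_cost = wirelength(placement, nets)
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / t) (Metropolis criterion).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(placement), cost
            else:
                # Reject: undo the swap.
                occupant[s1], occupant[s2] = b1, b2
                if b1 is not None:
                    placement[b1] = s1
                if b2 is not None:
                    placement[b2] = s2
        t *= alpha  # geometric cooling
    return best, best_cost


# Usage: place a chain of six hypothetical blocks on a 4 x 4 grid.
blocks = ["a", "b", "c", "d", "e", "f"]
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
final_placement, final_cost = anneal_placement(blocks, nets, 4, 4, seed=1)
print(final_cost, final_placement)
```

Even in this toy form, every candidate move requires a cost evaluation, and the number of moves grows with the design size. This is the kind of execution-time burden that motivates the faster placement approaches discussed above.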
Another approach for supporting fast application implementation relies on reconfiguring parts of the FPGA at run time in order to change an already implemented design or replace it with another. Research on this topic usually aims to identify a suitable region of the target architecture, with a sufficient amount of contiguous free hardware resources, before placing a new configuration bitstream [6] [7] [8] [9] [10] [11]. A related problem arises when no such region can be identified on the target architecture. Algorithms that deal with this problem re-arrange the already configured hardware resources [12] [13]. Unfortunately, these re-allocation algorithms are applicable almost exclusively at design time (off-line), due to limitations such as the increased computational effort and the data hazard problems that occur while application functionality is being transferred.

In this paper we introduce a novel methodology and the supporting toolflow for performing rapid application mapping. We consider a heterogeneous FPGA platform as a pool of hardware resources, including logic, memory and DSP blocks, onto which applications can be mapped as dynamic hardware kernels. Each of these virtual dynamic kernels, named VKernels, can realize only one application, whereas multiple VKernels can be mapped onto a single FPGA.

The contributions of the proposed methodology, named Het-JITPR, and the supporting toolflow are summarized as follows:
- It enables mapping of multiple virtual hardware kernels (VKernels) onto a single heterogeneous FPGA.