Abstract—Datapath merging is an efficient high-level synthesis method that merges the Data Flow Graphs (DFGs) of two or more computationally intensive loops. It creates a single general-purpose datapath (the merged datapath) instead of multiple datapaths, which shortens the bit-stream and therefore reduces the configuration time in reconfigurable systems. The merged datapath, however, has a worse loop execution time. This paper presents two datapath merging algorithms that address this problem. Both algorithms consider the impact of the multiplexer latency added to the critical path delay of the merged datapath. The first algorithm merges DFGs in order from the largest to the smallest to produce a high-speed merged datapath. The second merges DFGs in steps and, in the final step, combines the resources inside the merged datapath to achieve an additional reduction in configuration time. The proposed techniques are evaluated using several Mediabench applications. The experimental results show a reduction in loop execution time of up to 35% for the first algorithm and up to 27% for the second, compared with the previous datapath merging algorithm.

I. INTRODUCTION

Many applications contain computationally intensive loops, which in some cases can be accelerated by reconfigurable devices such as FPGAs. FPGA resources, however, are limited. To share them among different applications, run-time reconfiguration is employed when the hardware is needed [1]. Run-time reconfiguration, in turn, imposes considerable overhead on system performance, so the configuration should be done as efficiently as possible. The bit-stream length and the configuration time of the hardware are directly proportional. In fact, the configuration time corresponds to the time needed to transmit the bit-stream into the FPGA [2]; therefore, reducing the bit-stream length reduces the configuration time.
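The trade-off described above can be illustrated with a minimal sketch. It assumes that configuration time is the bit-stream length divided by a fixed configuration-port rate, and that each loop iteration takes one pass over the critical path. All function names and numbers are hypothetical and chosen only for illustration; none of them come from the paper or its experiments.

```python
# Illustrative model only: every name and number below is a hypothetical
# assumption, not a measurement or a definition taken from the paper.

def configuration_time(bitstream_bits: int, port_bits_per_second: float) -> float:
    """Time to transmit a bit-stream into the FPGA (proportional to its length)."""
    return bitstream_bits / port_bits_per_second


def loop_execution_time(iterations: int, critical_path_delay: float) -> float:
    """Loop time, assuming each iteration takes one pass over the critical path."""
    return iterations * critical_path_delay


PORT_RATE = 100e6    # assumed configuration-port throughput, bits per second
MUX_DELAY = 0.5e-9   # assumed latency of one multiplexer on the critical path, s

# Two separate datapaths versus one merged datapath (assumed sizes and delays).
separate_bits = [400_000, 300_000]
merged_bits = 500_000
base_delay = 5e-9
iterations = 1_000_000

config_separate = sum(configuration_time(b, PORT_RATE) for b in separate_bits)
config_merged = configuration_time(merged_bits, PORT_RATE)

exec_separate = loop_execution_time(iterations, base_delay)
exec_merged = loop_execution_time(iterations, base_delay + MUX_DELAY)

print(f"configuration time saved by merging: {config_separate - config_merged:.6f} s")
print(f"execution time added by the multiplexer: {exec_merged - exec_separate:.6f} s")
```

Under these assumed numbers, merging shortens configuration but lengthens every iteration, so the added execution time grows with the iteration count while the configuration saving stays fixed.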
Previous research has reduced the configuration time by using compression and caching techniques. For instance, the authors in [3,4] used compression and the authors in [5] used caching to reduce the bit-stream length. Although these techniques reduce the configuration time, they are costly. To avoid this additional cost, configuration time reduction can be addressed during High Level Synthesis (HLS). HLS is mostly used to create the Data Flow Graphs (DFGs), with a number of iterations, for the computationally intensive loops [6, 7]. HLS shares resources among the DFGs to make a more generic datapath and thereby reduces the hardware cost of the datapath. The synthesis process comprises the major tasks of scheduling, resource allocation, resource binding, and interconnection binding [8]. Making a multimode datapath instead of multiple datapaths can reduce the hardware cost [9]. Datapath merging is an efficient HLS approach that makes a multimode datapath for partially reconfigurable systems [10,11]. We showed in [12] that datapath merging is a suitable method to reduce the datapath configuration time. The method in [12] heuristically chooses a sequence of DFGs to merge; consequently, it cannot optimize the configuration time of the merged datapath. In addition, datapath merging algorithms add multiplexers at the input ports of the functional units in the merged datapath and, as such, increase the execution time of the loops that run on the merged datapath. Merging more DFGs also causes more functional units to be shared in the merged datapath, which means that the final merged datapath employs larger multiplexers. If a multiplexer is on the critical path of the merged datapath, it increases the loop execution time. For loops with many iterations, this time overhead becomes unacceptable. In order to provide a high-speed merged datapath, we present two new datapath merging algorithms.
High Speed Merged-datapath Design for Run-Time Reconfigurable Systems
Mahmood Fazlali #*1, Ali Zakerolhosseini #2, Asadollah Shahbahrami #†3 and Georgi Gaydadjiev *4
# Department of Computer Engineering, Shahid Beheshti University G.C, Tehran, Iran
1 fazlali@cc.sbu.ac.ir, 2 a-zaker@sbu.ac.ir
* Computer Engineering Lab., Delft University of Technology, Delft, The Netherlands
† Department of Computer Engineering, University of Guilan, Rasht, Iran
{3 a.shahbahrami, 4 g.n.gaydadjiev}@tudelft.nl

The first algorithm merges DFGs to reduce the configuration time while avoiding large multiplexers on the critical path of the merged datapath. The second algorithm merges DFGs in steps in the same way and, in the final step, combines the resources inside the merged datapath to achieve an additional reduction in configuration time. The rest of the paper is organized as follows. In Section II, the trade-off between the conflicting factors, configuration time reduction and increase in loop