A Hierarchical C2RTL Framework for FIFO-Connected Stream Applications

Shuangchen Li, Yongpan Liu, Daming Zhang, Xinyu He, Pei Zhang, Huazhong Yang
{ypliu,yanghz}@tsinghua.edu.cn, zhangpei@gmail.com
TNList, EE Dept., Tsinghua University, Beijing, 100084, China
Y Explorations Inc., San Jose, CA 95134, U.S.

Abstract—In modern embedded systems, C2RTL (high-level synthesis) technology helps designers greatly reduce time-to-market while satisfying performance and cost constraints. To address the performance challenges of complex designs, we propose a FIFO-connected hierarchical approach to replace the traditional flattened one in stream applications. Furthermore, we develop an analytical algorithm to find the optimal FIFO capacity for efficiently connecting multiple modules. Finally, we demonstrate the advantages of the proposed method and the feasibility of our algorithm on seven real applications. Experimental results show that the hierarchical approach achieves up to a 10.43 times speedup over the flattened design. Our analytical FIFO sizing algorithm also shrinks design time from hours to seconds, with the same accuracy as the simulation-based approach.

I. INTRODUCTION

As CMOS technology continues to scale down, the gap between design productivity and available transistor resources grows ever larger. To close this gap, the design community is seeking a level of abstraction higher than the register transfer level (RTL). Modern SoC designs also involve extensive use of embedded processors, huge silicon capacity, reuse of behavioral IPs, widespread adoption of accelerators, and increasing time-to-market pressure. Compared with the traditional RTL approach, the C2RTL flow improves productivity by orders of magnitude and so better meets these requirements. Recently, demand for high-quality C-language-to-RTL (C2RTL) tools has risen rapidly [1].
This work was supported in part by the NSFC under grant 60976032, the National Science and Technology Major Project under contract 2010ZX03006-003-01, eXCite software donated by Y Explorations Inc., and the High-Tech Research and Development (863) Program under contract 2009AA01Z130.

In practice, designers have used C2RTL tools to develop various applications in much shorter design time, such as face detection [2], 3G/4G wireless communication [3], and digital video broadcasting [4]. Among these tools, many [5]–[8] focus on stream applications. They generate architectures composed of modules connected by first-in first-out (FIFO) channels. Other tools target general-purpose applications. For example, Catapult C [9] takes different timing and area constraints and generates Pareto-optimal solutions from common C algorithms. However, its limited control over the architecture leads to suboptimal results. As [10] has shown, a FIFO-connected architecture can produce much faster and smaller results in stream applications.

Among C2RTL tools for stream applications, GAUT [5] transforms C functions into pipelined modules consisting of processing units, memory units, and communication units. It uses globally asynchronous, locally synchronous (GALS) interconnections to connect modules running on multiple clocks. ROCCC [6] creates efficient pipelined circuits from C that can be reused in other modules or in system code. Impulse C [7] provides a C language extension for defining parallel processes and the communication channels among them. ASC [8] provides a design environment in which users optimize systems from the algorithm level down to the gate level, all within the same C++ program. All of the above tools leave it to the user to determine the FIFO capacity between modules, which is nontrivial. As shown in Section II, FIFO capacity has a great impact on system performance and memory resources.
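The following minimal Python sketch shows why FIFO capacity matters and what a simulation-based sizing sweep looks like. It uses a hypothetical two-module model, not the paper's formulation:

- A producer writes token i `p_gap[i]` cycles after its previous write.
- A consumer reads token i `c_gap[i]` cycles after its previous read.
- A token becomes readable one cycle after it is written.
- A write into a full FIFO stalls until a slot is freed.

The names `makespan`, `min_capacity`, `p_gap` and `c_gap` are illustrative assumptions. The sweep simply tries every capacity until the run is as fast as with an unbounded FIFO.

```python
from typing import List, Tuple


def makespan(p_gap: List[int], c_gap: List[int], capacity: int) -> int:
    """Return the cycle in which the consumer reads the last token.

    Hypothetical blocking-FIFO model (an assumption, not the paper's model):
      * the producer needs p_gap[i] cycles after its previous write
        before it can write token i;
      * a write into a full FIFO stalls until the slot held by token
        i - capacity is freed by a read (usable one cycle later);
      * the consumer needs c_gap[i] cycles after its previous read
        before it can read token i, and a token becomes readable one
        cycle after it is written.
    """
    n = len(p_gap)
    if len(c_gap) != n or n == 0 or capacity < 1:
        raise ValueError("need equal, non-empty gap lists and capacity >= 1")
    reads = [0] * n  # cycle in which each token is read
    last_write = last_read = 0
    for i in range(n):
        write = last_write + p_gap[i]
        if i >= capacity:
            # Back-pressure: the FIFO stays full until token i - capacity leaves.
            write = max(write, reads[i - capacity] + 1)
        reads[i] = max(last_read + c_gap[i], write + 1)
        last_write, last_read = write, reads[i]
    return reads[-1]


def min_capacity(p_gap: List[int], c_gap: List[int]) -> Tuple[int, int]:
    """Exhaustive simulation sweep: smallest capacity that reaches the
    makespan of an unbounded FIFO (capacity n never blocks)."""
    n = len(p_gap)
    target = makespan(p_gap, c_gap, n)
    for cap in range(1, n + 1):
        # For non-negative gaps the makespan is non-increasing in capacity,
        # so the first capacity that hits the target is the minimum.
        if makespan(p_gap, c_gap, cap) == target:
            return cap, target
    return n, target  # not reached: cap == n always matches the target


if __name__ == "__main__":
    p_gap = [5, 1, 1, 1] * 3  # hypothetical bursty producer: 4 tokens per 8 cycles
    c_gap = [2] * 12          # hypothetical steady consumer: 1 token per 2 cycles
    for cap in (1, 2, 3, 4):
        print(cap, makespan(p_gap, c_gap, cap))  # 34, 30, 28, 28
    print(min_capacity(p_gap, c_gap))            # (3, 28)
```

In this toy setting both modules average one token every two cycles. Even so, a one-entry FIFO stretches the run from 28 to 34 cycles, and three entries are needed to absorb the producer's bursts. Sizing k FIFOs jointly by such a sweep can require up to n^k runs. That is the kind of cost an analytical sizing method aims to avoid.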
Although determining FIFO capacity through extensive simulation may work for a few modules, the exploration space becomes prohibitively large in the multiple-module case. The previous simulation-based method is therefore neither time-efficient nor optimal.

To design stream applications, researchers have also investigated input stream rates. The goal is to ensure that the FIFOs between processing elements (PEs) do not overflow while real-time processing requirements are still met. On-chip traffic analysis of SoC architectures has been explored [11]. However, such simulation-based approaches suffer from long execution times and cannot explore large design spaces. A mathematical framework of rate analysis for stream applications has been proposed [12]. Based on network calculus, [13] extended service curves to show how to shape an input stream so that it meets buffer constraints. Furthermore, [14] discussed generalized rate analysis for multimedia processing platforms. However, all of these works adopt a more complicated behavioral model of PE streams than the hierarchical C2RTL framework requires.

This paper proposes a novel C2RTL framework that implements complex stream applications hierarchically and determines FIFO capacities automatically. The framework may also apply to other applications, but it gives the greatest improvement in stream applications. Our contributions are as follows:
1) Unlike the flattened design, which treats the whole algorithm as one module, we partition the complex stream algorithm into modules and connect them with FIFOs. Experimental results show that the hierarchical implementation provides up to a 10.43 times speedup over the flattened design.
2) We formulate the parameters of modules in stream applications and derive analytical results for the optimal FIFO capacity in the two-module case. We validate these results with exhaustive simulations.
We also develop a heuristic algorithm to find the optimal FIFO capacities in a multiple-module design.
3) We demonstrate the proposed method on seven real applications. Compared to a uniform FIFO capacity, our method reduces memory resources by 14.46 times. In addition, the algorithm optimizes FIFO capacities in seconds, while extensive simulations may need hours.

The rest of the paper is organized as follows. Section II describes the motivation of our work. We present our model framework in Section III.

978-1-4673-0772-7/12/$31.00 ©2012 IEEE 2A-4 133

The algorithm for optimal FIFO size