A Hierarchical C2RTL Framework for
FIFO-Connected Stream Applications
Shuangchen Li∗, Yongpan Liu∗, Daming Zhang∗, Xinyu He∗, Pei Zhang†, Huazhong Yang∗
{ypliu,yanghz}@tsinghua.edu.cn, zhangpei@gmail.com
∗TNList, EE Dept., Tsinghua University, Beijing, 100084, China
†Y Explorations Inc., San Jose, CA 95134, U.S.
Abstract—In modern embedded systems, C2RTL (high-level synthesis)
technology helps designers greatly reduce time-to-market while
satisfying performance and cost constraints. To address the
performance challenges in complex designs, we propose a
FIFO-connected hierarchical approach to replace the traditional
flattened one in stream applications. Furthermore, we develop an
analytical algorithm to find the optimal FIFO capacity to connect
multiple modules efficiently. Finally, we demonstrate the advantages
of the proposed method and the feasibility of our algorithm in seven
real applications. Experimental results show that the hierarchical
approach achieves up to a 10.43 times speedup compared to the
flattened design, while our analytical FIFO sizing algorithm shrinks
design time from hours to seconds with the same accuracy as the
simulation-based approach.
I. INTRODUCTION
With the continuous scaling down of CMOS technology, the gap between
design productivity and transistor resources becomes ever larger. To
resolve this challenge, the design community is seeking a higher level
of abstraction than the register transfer level (RTL). Furthermore,
modern SoC designs involve extensive use of embedded processors, huge
silicon capacity, reuse of behavioral IPs, extensive adoption of
accelerators, and increasing time-to-market pressure. Compared with
the traditional RTL approach, the C2RTL flow provides
orders-of-magnitude improvements in productivity to better meet those
requirements. Recently, a rapidly rising demand for high-quality
C language to RTL (C2RTL) tools has been observed [1].
In practice, designers have successfully developed various
applications using C2RTL tools with much shorter design time, such as
face detection [2], 3G/4G wireless communication [3], and digital
video broadcasting [4]. Among those tools, many [5]–[8] focus on
stream applications. They create design architectures consisting of
different modules connected by first-in first-out (FIFO) channels.
Other tools focus on general-purpose applications. For example,
Catapult C [9] takes different timing and area constraints to
generate Pareto-optimal solutions from common C algorithms. However,
its limited control over the architecture leads to suboptimal results.
As [10] has shown, a FIFO-connected architecture can generate much
faster and smaller results in stream applications.
Among C2RTL tools for stream applications, GAUT [5] transforms C
functions into pipelined modules consisting of processing units,
memory units, and communication units. Globally asynchronous, locally
synchronous interconnections are adopted to connect different modules
with multiple clocks. ROCCC [6] can create efficient pipelined
circuits from C to be reused in other modules or system code.
Impulse C [7] provides a C language extension to define parallel
processes and communication channels among modules. ASC [8] provides
a design environment for users to optimize systems from the algorithm
level to the gate level, all within the same C++ program.
This work was supported in part by the NSFC under grant 60976032,
the National Science and Technology Major Project under contract
2010ZX03006-003-01, eXCite software donated by Y Explorations Inc.,
and the High-Tech Research and Development (863) Program under
contract 2009AA01Z130.
All of the above tools leave the user to determine the FIFO capacity
between modules, which is nontrivial. As shown in Section II, the
FIFO capacity has a great impact on the system performance and memory
resources. Though determining the FIFO capacity via extensive
simulations may work for several modules, the exploration space
becomes prohibitively large in the multiple-module case. Therefore,
the previous simulation-based method is neither time-efficient nor
optimal.
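To make the dependence of performance on FIFO capacity concrete, the following is a minimal sketch of a cycle-based simulation of two modules connected by one bounded FIFO. It is not the model developed in this paper: the function `simulate` and the parameters `compute`, `burst`, and `period` are hypothetical and chosen only for illustration. The producer alternates between a compute phase and a burst of writes and stalls when the FIFO is full, while the consumer reads one token every `period` cycles when data is available.

```python
from collections import deque


def simulate(capacity, n_tokens=64, compute=8, burst=8, period=2):
    """Return the cycles needed to move n_tokens through one bounded FIFO.

    All parameters are illustrative assumptions, not values from the paper.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    fifo = deque()
    produced = consumed = 0
    prod_wait = compute   # compute cycles left before the next burst
    burst_left = 0        # tokens still to write in the current burst
    cons_busy = 0         # cycles left before the consumer can read again
    cycle = 0
    while consumed < n_tokens:
        cycle += 1
        # Consumer: read one token when idle and the FIFO is non-empty.
        if cons_busy > 0:
            cons_busy -= 1
        elif fifo:
            fifo.popleft()
            consumed += 1
            cons_busy = period - 1
        # Producer: compute, then write a burst, stalling on a full FIFO.
        if produced < n_tokens:
            if burst_left == 0:
                prod_wait -= 1
                if prod_wait == 0:
                    burst_left = burst
            elif len(fifo) < capacity:
                fifo.append(produced)
                produced += 1
                burst_left -= 1
                if burst_left == 0:
                    prod_wait = compute
    return cycle


if __name__ == "__main__":
    # Sweep a few capacities to see where extra storage stops helping.
    for cap in (1, 2, 4, 8, 16):
        print(cap, simulate(cap))
```

In this toy setting, a small FIFO stalls the producer during its burst and starves the consumer during the compute phase, while beyond some capacity additional storage no longer changes the cycle count. Sweeping every FIFO in a multiple-module design in this way is what makes the simulation-based search expensive.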
To design a stream application, researchers have also investigated
input stream rates to ensure that the FIFOs between processing
elements (PEs) do not overflow while the real-time processing
requirements are met. On-chip traffic analysis of the SoC architecture
has been explored [11]. However, such simulation-based approaches
suffer from long execution times and fail to explore large design
spaces. A mathematical framework of rate analysis for stream
applications has been proposed [12]. Based on network calculus, [13]
extended the service curves to show how to shape an input stream to
meet buffer constraints. Furthermore, [14] discussed generalized rate
analysis for multimedia processing platforms. However, all of them
adopt a more complicated behavior model for PE streams, which is not
necessary in the hierarchical C2RTL framework.
This paper proposes a novel C2RTL framework, which supports a
hierarchical way to implement complex stream applications and
determines the FIFO capacity automatically. Note that this framework
may be applicable to other applications, but it yields the greatest
improvement in stream applications. Our contributions are as follows:
1) Unlike the flattened design, which treats the whole algorithm as
one module, we cut the complex stream algorithm into modules and
connect them with FIFOs. Experimental results show that the
hierarchical implementation provides up to a 10.43 times speedup
compared to the flattened design.
2) We formulate the parameters of modules in stream applications and
derive analytical results for the optimal FIFO capacity in the
two-module case, which are validated by exhaustive simulations.
Furthermore, we develop a heuristic algorithm to find the optimal
FIFO capacity in a multiple-module design.
3) We demonstrate the proposed method in seven real applications.
Compared to a uniform FIFO capacity, our method reduces memory
resources by a factor of 14.46. Furthermore, the algorithm can
optimize the FIFO capacity in seconds, while extensive simulations
may need hours.
The rest of the paper is organized as follows. Section II
describes the motivation of our work. We present our model
framework in Section III. The algorithm for optimal FIFO size
978-1-4673-0772-7/12/$31.00 ©2012 IEEE