Token Flow Control
Amit Kumar, Li-Shiuan Peh and Niraj K. Jha
Department of Electrical Engineering
Princeton University, Princeton, NJ 08544
Email: {amitk, peh, jha}@princeton.edu
Abstract
As companies move towards many-core chips, an efficient on-chip
communication fabric to connect these cores assumes critical
importance. To address the limited scalability of wire delay
and increasing bandwidth demands, state-of-the-art on-chip
networks use a modular packet-switched design with routers
at every hop, which allow network channels to be shared among
multiple packet flows. This, however, leads to packets going
through a complex router pipeline at every hop, resulting in
the overall communication energy/delay being dominated by the
router overhead, as opposed to just wire energy/delay.
In this work, we propose token flow control (TFC), a flow
control mechanism in which nodes in the network send out
tokens in their local neighborhood to communicate information
about their available resources. These tokens are then used
in both routing and flow control: to choose less congested
paths in the network and to bypass the router pipeline along
those paths. These bypass paths are formed dynamically, can
be arbitrarily long, and are flexible enough to match
a packet’s exact route. Hence, TFC allows packets
to potentially skip all routers along their path from source
to destination, approaching the communication energy-delay-
throughput of dedicated wires. Our detailed implementation
analysis shows TFC to be highly scalable and realizable at an
aggressive target clock cycle delay of 21 FO4 for large networks
while requiring low hardware complexity.
Evaluations of TFC using both synthetic traffic and traces
from the SPLASH-2 benchmark suite show a reduction in packet
latency of up to 77.1% and a reduction in average router energy
consumption of up to 39.6% compared to a state-of-the-art
baseline packet-switched design. For the same saturation
throughput as the baseline network, TFC is able to reduce the
amount of buffering by 65%, leading to a 48.8% reduction in
leakage energy and a 55.4% lower total router energy.
1. Introduction
The current trend in utilizing the growing number of tran-
sistors provided by each technology generation is to use a
modular design with several computation cores on the same
chip. As the number of such on-chip cores increases, a scalable
and high-bandwidth communication fabric to connect them
becomes critically important. As a result, packet-switched on-chip
networks are fast replacing buses and crossbars to emerge
as the pervasive communication fabric in both general-purpose
chip multi-processor (CMP) [1]–[3] and application-specific
system-on-a-chip (SoC) [4] domains.
Apart from providing scalable and high-bandwidth communication,
on-chip networks are required to provide ultra-low
latency within an extremely constrained power envelope and a low
area budget. Most state-of-the-art packet-switched designs use
a complex router at every node to orchestrate communication,
and packets travel only a short distance on the link wires
before having to go through a complete router pipeline at every
intermediate hop along their path. As a result, communication
energy/delay in such networks is dominated by the router
overhead, in contrast to an ideal network where packet latency
and energy are solely due to the wires between the source and
destination. For instance, routers consume around 61% of the
average network power in the MIT Raw chip as opposed to 39%
consumed by the links [5]. Similarly, the Intel 80-core teraflops
chip has router power taking 83% of network power versus 17%
consumed by the links [3]. The large energy-delay-throughput
gap between the state-of-the-art packet-switched network and
the ideal interconnect of dedicated point-to-point wires was
pointed out in [6].
In this work, we propose TFC, a flow-control mechanism
which aims to deliver the energy-delay-throughput of dedicated
wires through the use of tokens. Tokens are indications of
resource availability in the network. Each node in the network
sends out tokens in its fixed local neighborhood of d_max hops to
disseminate information about availability of resources, such as
buffers and virtual channels (VCs) at its input ports. Individual
packets then use these tokens during both routing (to find less
congested routes in chunks of up to d_max hops) and flow control
(to bypass the router pipeline at intermediate nodes along these
d_max-hop routes). When one such d_max-hop token route ends,
another token route can be chained to it seamlessly without
any additional energy-delay overhead. Thus, packets can use
an arbitrary number of tokens to bypass all intermediate routers
between their source and destination, as in an ideal network.
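As a rough illustration of the two ideas in this paragraph, the sketch below computes which nodes of a 2D mesh lie within d_max hops of a given node (the nodes that would receive its tokens) and splits a packet's path into chained segments of at most d_max hops each. It is a minimal sketch under stated assumptions, not the TFC implementation: the function names, the (x, y) coordinate scheme, and the use of Manhattan distance as the hop count are all illustrative choices that do not come from the paper.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]  # hypothetical (x, y) coordinate of a node in a 2D mesh


def token_neighborhood(node: Node, d_max: int, width: int, height: int) -> Set[Node]:
    """Return the nodes within d_max hops of `node`, excluding `node` itself.

    Assumption: in a mesh, the hop count between two nodes is their
    Manhattan distance.
    """
    x0, y0 = node
    return {
        (x, y)
        for x in range(max(0, x0 - d_max), min(width, x0 + d_max + 1))
        for y in range(max(0, y0 - d_max), min(height, y0 + d_max + 1))
        if 0 < abs(x - x0) + abs(y - y0) <= d_max
    }


def chain_token_routes(path: List[Node], d_max: int) -> List[List[Node]]:
    """Split a hop-by-hop path into chained segments of at most d_max hops.

    Consecutive segments share their boundary node, mirroring how one
    token route is chained to the next.
    """
    segments: List[List[Node]] = []
    start = 0
    last = len(path) - 1
    while start < last:
        end = min(start + d_max, last)
        segments.append(path[start:end + 1])
        start = end
    return segments


if __name__ == "__main__":
    # A 7-hop path along one row of an 8x8 mesh, with d_max = 3.
    path = [(x, 0) for x in range(8)]
    for segment in chain_token_routes(path, d_max=3):
        print(len(segment) - 1, "hops:", segment)
    print(len(token_neighborhood((3, 3), 3, 8, 8)), "nodes receive tokens from (3, 3)")
```

In this sketch a 7-hop path with d_max = 3 is covered by three chained segments, so a packet holding tokens for each segment could in principle bypass every intermediate router; whether tokens are actually available depends on the resource state that the sketch does not model.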
In the rest of this paper, Section 2 provides background for
this work by looking at router energy/delay overhead in state-of-the-art
packet-switched designs. Section 3 describes how TFC works, and
Section 4 covers its implementation details.
Evaluation results are presented in Section 5. Section 6 presents
related work, and Section 7 concludes the paper.
2. Background
2.1. Baseline state-of-the-art router
Fig. 1(a) shows the microarchitecture of a state-of-the-art
baseline VC router used for comparison in all our experiments.
We assume a two-dimensional mesh topology for simplicity.
Flit-level buffering and on/off VC flow control [7] are used
to minimize the amount of buffering per router and hence its
area footprint. This design incorporates several features which
are critical to on-chip networks – low pipeline delay using
lookahead routing [8], speculation [11], [12], no-load bypassing
978-1-4244-2837-3/08/$25.00 ©2008 IEEE