Low-Latency Virtual-Channel Routers for On-Chip Networks
Robert Mullins, Andrew West and Simon Moore
Computer Laboratory, University of Cambridge
William Gates Building, JJ Thomson Avenue, Cambridge CB3 0FD, UK
Robert.Mullins@cl.cam.ac.uk
Abstract
The on-chip communication requirements of many
systems are best served through the deployment of a regular
chip-wide network. This paper presents the design of a
low-latency on-chip network router for such applications.
We remove control overheads (routing and arbitration
logic) from the critical path in order to minimise cycle-time
and latency. Simulations show that dramatic cycle-time
improvements are possible without compromising router
efficiency. Furthermore, these reductions permit flits to be
routed in a single cycle, maximising the effectiveness of the
router’s limited buffering resources.
1. Introduction
The ability to fully exploit modern fabrication
technologies is tempered by both physical and logical
design complexity. The cost of this complexity suggests
the reuse of design and verification effort wherever
possible. This is often achieved by composing systems
from commodity IP or by reusing custom blocks repeatedly
in the same design. The relatively poor scaling of global
interconnects and the need to achieve architectural
performance gains in an energy-efficient manner
create pressure to decentralise computation. Together
these trends suggest a move towards an increasingly
communication-centric view of processor and system
architecture [16, 21, 15, 14].
One proposed solution to the problem of chip-wide
communication is a network of top-level point-to-point
communication channels [1, 8, 12] (see Figure 1). This
highly regular wiring strategy aims to reuse a small number
of highly optimised wiring layout and driver designs. As
channel layouts are reused to create the network, effort in
characterising delay, power and verifying signal integrity
is minimised. The simple behaviour of the network also
aids in predicting performance and ensuring correctness. In
contrast, large bus-based communication networks present
a complex verification task at every level. In addition,
the poor scaling of interconnect delay makes long global
wires and buses increasingly undesirable.
[Figure 1 image: labelled Tile, Router, Physical Channel]
Figure 1. On-Chip Network. Each tile may contain
identical logic, as in the case of a multiprocessor or
tiled system, or may simply represent a partitioning of a SoC
design.
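Figure 1 shows why per-router latency matters: every hop a packet takes adds a router traversal. The Python sketch below is a hypothetical zero-load latency model, not taken from the paper. It assumes that the tiles form a 2D mesh, that packets take a minimal route, that each router and each channel add a fixed number of cycles, and that body flits follow the head flit one per cycle. The function names and parameters are illustrative only.

```python
# Hypothetical zero-load latency model for the tiled network of Figure 1.
# Assumptions (not from the paper): tiles form a 2D mesh, packets take a
# minimal route, each router adds `router_cycles` and each channel adds
# `link_cycles`, and body flits follow the head flit one per cycle.


def mesh_hops(src, dst):
    """Minimal hop count between tiles (x, y) in a 2D mesh (Manhattan distance)."""
    (sx, sy), (dx, dy) = src, dst
    return abs(sx - dx) + abs(sy - dy)


def zero_load_latency(src, dst, router_cycles, link_cycles, packet_flits):
    """Cycles until the tail flit of an uncontended packet reaches dst."""
    hops = mesh_hops(src, dst)
    # The head flit visits hops + 1 routers (source and destination included)
    # and crosses `hops` channels.
    head = (hops + 1) * router_cycles + hops * link_cycles
    # Serialisation: the remaining flits arrive on consecutive cycles.
    return head + (packet_flits - 1)


if __name__ == "__main__":
    for depth in (3, 1):
        cycles = zero_load_latency((0, 0), (3, 3), depth, 1, 4)
        print(f"router_cycles={depth}: {cycles} cycles")
```

Consider a four-flit packet travelling corner to corner in a 4×4 mesh with 1-cycle channels. Under these assumptions, a 3-cycle router pipeline gives 30 cycles, and a single-cycle router gives 16. The router term is multiplied by the hop count, so shortening the router pipeline has the largest effect on latency.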
Similar observations have already been made in the case
of inter-chip and wider-area communication. While much
of this work is applicable, some important differences
exist [8]. In particular, on-chip designs can exploit a far greater
number of pins and wires, while inter-chip designs are
often pin-limited. In addition, while inter-chip router
designs may exploit a large number of buffers, on-chip
designs must aim to minimise buffer count in order to
maximise the silicon real estate available for computation.
Area pressures, together with the need to minimise on-chip
communication latencies, suggest the implementation of
relatively simple on-chip routers.
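A simplified model of credit-based flow control makes the link between router latency and buffer count concrete. Credit-based flow control is an assumption here, since this section does not name a flow-control scheme. A freed downstream buffer slot cannot be reused until its credit returns upstream. A virtual channel with fewer buffers than the credit round-trip time (in cycles) therefore cannot sustain one flit per cycle. The function names and the one-cycle credit-processing delay below are hypothetical.

```python
from fractions import Fraction

# Simplified credit-based flow-control model (an assumption, not the paper's
# design). Time is measured in cycles; the upstream router may send a flit on a
# virtual channel only while it holds a credit for a free downstream buffer.


def credit_round_trip(router_cycles, link_cycles, credit_cycles=1):
    """Cycles from sending a flit until the credit for its buffer is usable again."""
    return (
        link_cycles        # flit crosses the channel into the downstream buffer
        + router_cycles    # flit traverses the downstream router and frees its slot
        + link_cycles      # credit travels back upstream
        + credit_cycles    # upstream router processes the credit (hypothetical delay)
    )


def max_throughput(buffers, router_cycles, link_cycles, credit_cycles=1):
    """Best-case flits per cycle on one virtual channel with `buffers` slots."""
    round_trip = credit_round_trip(router_cycles, link_cycles, credit_cycles)
    # At most `buffers` flits can be in flight during one credit round trip.
    return min(Fraction(1), Fraction(buffers, round_trip))


if __name__ == "__main__":
    for depth in (3, 1):
        print(f"router_cycles={depth}: throughput {max_throughput(4, depth, 1)}")
```

Assume 4 buffers per virtual channel and 1-cycle links. A 3-cycle router pipeline then gives a 6-cycle round trip, which limits the channel to 2/3 of a flit per cycle. A single-cycle router shortens the round trip to 4 cycles, so the same 4 buffers sustain full throughput. In this model, a shallower pipeline lets a small buffer pool do more work.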
This paper describes how router latency may be
significantly reduced by hiding control overheads. The
Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA’04)
1063-6897/04 $ 20.00 © 2004 IEEE