Low-Latency Virtual-Channel Routers for On-Chip Networks

Robert Mullins, Andrew West and Simon Moore
Computer Laboratory, University of Cambridge
William Gates Building, JJ Thomson Avenue, Cambridge CB3 0FD, UK
Robert.Mullins@cl.cam.ac.uk

Abstract

The on-chip communication requirements of many systems are best served through the deployment of a regular chip-wide network. This paper presents the design of a low-latency on-chip network router for such applications. We remove control overheads (routing and arbitration logic) from the critical path in order to minimise cycle-time and latency. Simulations illustrate that dramatic cycle-time improvements are possible without compromising router efficiency. Furthermore, these reductions permit flits to be routed in a single cycle, maximising the effectiveness of the router's limited buffering resources.

1. Introduction

The ability to fully exploit modern fabrication technologies is tempered by both physical and logical design complexity. The cost of this complexity suggests the reuse of design and verification effort wherever possible. This is often achieved by composing systems from commodity IP or by reusing custom blocks repeatedly in the same design. The relatively poor scaling of global interconnects and the need to achieve architectural performance gains in an energy-efficient manner provide pressure to decentralise computation. Together these trends suggest a move towards an increasingly communication-centric view of processor and system architecture [16, 21, 15, 14].

One proposed solution to the problem of chip-wide communication is a network of top-level point-to-point communication channels [1, 8, 12] (see Figure 1). This highly regular wiring strategy aims to reuse a small number of highly optimised wiring layout and driver designs. As channel layouts are reused to create the network, the effort spent characterising delay and power and verifying signal integrity is minimised.
The simple behaviour of the network also aids in predicting performance and ensuring correctness. In contrast, large bus-based communication networks present a complex verification task at every level. In addition, the limited ability to scale interconnect delays makes the presence of long global wires and buses increasingly undesirable.

Figure 1. On-Chip Network (diagram labels: physical channel, router, tile). Each tile may contain identical logic, as in the case of a multiprocessor or tiled system, or simply represent a partitioning of a SoC design.

Similar observations have already been made in the case of inter-chip and wider-area communication. While much of this work is applicable, some important differences exist [8]. In particular, on-chip designs exploit a far greater number of pins and wires, while inter-chip designs are often pin limited. In addition, while inter-chip router designs may exploit a large number of buffers, on-chip designs must aim to minimise buffer count in order to maximise the silicon real-estate available for computation.

Area pressures, together with the need to minimise on-chip communication latencies, suggest the implementation of relatively simple on-chip routers. This paper describes how router latency may be significantly reduced by hiding control overheads.

The Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA'04) 1063-6897/04 $20.00 © 2004 IEEE