SAFARI Technical Report No. 2011-008 (September 13, 2011)

MinBD: A Minimally-Buffered Deflection Router Approaching Conventional Buffered-Router Performance

Chris Fallin, cfallin@cmu.edu
Gregory Nazario, gnazario@cmu.edu
Xiangyao Yu, yxythu@gmail.com
Kevin Chang, kevincha@andrew.cmu.edu
Rachata Ausavarungnirun, rausavar@ece.cmu.edu
Onur Mutlu, onur@cmu.edu

Computer Architecture Lab (CALCM)
Carnegie Mellon University

Abstract

As interconnect becomes an important component of modern Chip Multiprocessors (CMPs), significant work has gone into finding the best tradeoff between performance and energy efficiency for the Network-on-Chip (NoC). Increasing core counts lead to high bandwidth demands and tight power and area constraints. One recent line of work examines bufferless deflection routing, which eliminates buffers in the NoC and instead handles contention by deflecting (misrouting) traffic. While bufferless NoC design has shown promising area and power reductions, and offers similar performance to conventional buffered designs for many workloads, such designs provide lower throughput than conventional, more complex, buffered routers at high network load. This degradation is a significant hurdle for widespread adoption of bufferless NoCs.

In this work, we introduce a new minimally-buffered deflection router design, MinBD (Minimally-Buffered Deflection), that aims to obtain the performance of buffered design with nearly the area and power reductions of bufferless design. Unlike past routers that combine buffers and deflection, MinBD buffers only a fraction of the traffic that passes through it, and it decides which traffic to buffer on a fine-grained basis.
We observe that a significant portion of the performance degradation in bufferless networks comes not from fundamental limits of deflection routing, but from small inefficiencies in previous bufferless designs that can be corrected: specifically, a bottleneck in ejecting traffic from the network, and inefficient deflection arbitration. The remaining performance gap can be closed by using small router buffers. Compared to previous bufferless deflection router designs, MinBD contributes (i) a router microarchitecture with a wide ejection path and improved flit arbitration, and (ii) small side-buffers that hold some traffic that would otherwise have been deflected. We show that MinBD degrades performance by only 4.6% relative to a conventional buffered network over a set of 60 network-intensive workloads in 4x4 networks, with 72.5% less network power on average, 80.0% less router area, and a competitive clock frequency.

1 Introduction

Interconnect is a first-order component of current and future multicore and manycore CMPs (Chip Multiprocessors). As manycore designs scale up, packet-switched interconnects replace conventional buses or crossbars [7]. Such networks (Networks-on-Chip, or NoCs) transfer packets, each consisting of one or more flits, between cores on a chip. As manycore chips incorporate increasing numbers of cores, on-chip accelerators, and other components, the bandwidth demand on the network increases. The NoC's design can thus become critical for system performance. Many current commercial CMPs use ring topologies [14, 27, 28], but several have moved to 2D-mesh NoCs (e.g., Tilera [34]), and