Future Generation Computer Systems 25 (2009) 489–498

Performance analysis of deterministically-routed bi-directional torus with non-uniform traffic distribution

S. Loucif a,*, M. Ould-Khaoua b

a Faculty of Engineering, Moncton University, Moncton, N.B, E1A 3E9, Canada
b Department of Electrical & Computer Engineering, Sultan Qaboos University, P.O. Box 50 Muscat 123, Oman

Article history: Received 6 March 2008; received in revised form 14 October 2008; accepted 19 October 2008; available online 25 October 2008

Keywords: Bi-directional torus; Deterministic routing; Hot-spot; Performance modelling; Message latency

Abstract

Existing research has most often relied on simulation and considered the uniform traffic distribution when investigating the performance properties of multicomputer networks (e.g. the torus). However, there are numerous parallel applications that generate non-uniform traffic patterns, such as hot-spot. Furthermore, much more attention has been paid to capturing the impact of non-uniform traffic on network performance, resulting in the development of a number of analytical models for predicting message latency in the presence of hot-spots in the network. For instance, analytical models have been reported for the adaptively-routed torus with uni-directional as well as bi-directional channels. However, models for the deterministically-routed torus have considered uni-directional channels only. In an effort to fill in this gap, this paper describes an analytical model for the deterministically-routed torus with bi-directional channels when subjected to hot-spot traffic. The modelling approach adopted for deterministic routing is totally different from that for adaptive routing due to the inherently different nature of the two types of routing.
The validity of the model is demonstrated by comparing analytical results against those obtained through extensive simulation experiments.

© 2008 Elsevier B.V. All rights reserved.

* Corresponding author. E-mail address: samia_loucif@hotmail.com (S. Loucif).

1. Introduction

Interconnection networks, which are constructed from routers and channels, are one of the key components in the architecture of parallel machines, since the performance of those machines depends heavily on the efficiency of their underlying networks. Several network topologies have been proposed in the literature; nevertheless, the k-ary n-cube and its variants, namely the hypercube and the torus, have been the most popular networks and have been used in the implementation of many parallel machines [3,10,17,18,21], owing to their desirable properties, such as ease of implementation, recursive structure, and ability to exploit communication locality to reduce message latency.

Network performance can be affected by several parameters, such as topology, switching technique, routing strategy and traffic distribution. Wormhole routing [6,16] has been a very popular switching technique in recent generations of parallel machines because of its low buffering requirements at routers, which also makes it a good candidate for on-chip networks [14], and, more importantly, because it makes message latency insensitive to distance under low traffic loads. In this technique, a message is divided into flits, each of a few bytes, for transmission and flow control. The header flit establishes the route and the remaining data flits follow in a pipelined fashion. When the header is blocked because a channel is unavailable, the other flits are blocked in situ.

Many routing algorithms have been suggested for wormhole-routed k-ary n-cubes and can be broadly classified as deterministic [6] or adaptive [8].
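To make the deterministic class concrete, the following minimal sketch computes a fixed path between two nodes of a bi-directional k-ary n-cube. It assumes dimension-order routing, in which each dimension is corrected in turn along the shorter direction around the ring; this is one common deterministic scheme and is not necessarily the exact variant analysed in this paper. The function name `dimension_order_route` and the rule that ties are broken in favour of the positive direction are illustrative assumptions.

```python
def dimension_order_route(src, dst, k):
    """Return the nodes visited from src to dst (inclusive) in a
    bi-directional k-ary n-cube, assuming dimension-order routing.

    src, dst: tuples of n coordinates, each in range(k).
    Assumption: when both directions are equally long (k even,
    distance k/2), the positive direction is chosen.
    """
    path = [tuple(src)]
    cur = list(src)
    for dim in range(len(cur)):
        forward = (dst[dim] - cur[dim]) % k   # hops needed in the + direction
        backward = k - forward                # hops needed in the - direction
        step = 1 if forward <= backward else -1
        for _ in range(min(forward, backward)):
            cur[dim] = (cur[dim] + step) % k  # wrap around the ring
            path.append(tuple(cur))
    return path


if __name__ == "__main__":
    # In a 4-ary 2-cube, (0, 0) -> (3, 1) uses the wrap-around link in dimension 0.
    print(dimension_order_route((0, 0), (3, 1), k=4))
```

Because the path depends only on the source, the destination and k, every message between a given pair of nodes follows the same route regardless of network load.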
In deterministic routing, messages always use the same path between a given pair of nodes, while in adaptive routing messages are given more flexibility in choosing their path through the network, so that they can avoid congested regions and thereby reduce their latency. However, this flexibility comes at the expense of complex router hardware [2], needed to guarantee deadlock freedom, because of the time required to decide a route and the use of virtual channels; a virtual channel [4] has its own flit queue, but shares the bandwidth of the physical channel with other virtual channels in a time-multiplexed fashion. Moreover, the authors of [25] have shown that, under realistic traffic patterns generated by typical parallel applications, the performance of deterministic routing can approach that of adaptive routing without requiring complex routers.

0167-739X/$ – see front matter. doi:10.1016/j.future.2008.10.008

Analytical modelling represents a cost-effective tool to investigate the performance of systems under different network conditions, which may not be feasible using simulation due to the excessive computation demands. Several analytical models have been proposed for different systems [1,4,5,7,9,13,15,19,20,23,24,27], including models of both deterministic and adaptive routing in