Future Generation Computer Systems 25 (2009) 489–498
Performance analysis of deterministically-routed bi-directional torus with
non-uniform traffic distribution
S. Loucif a,∗, M. Ould-Khaoua b
a Faculty of Engineering, Moncton University, Moncton, N.B, E1A 3E9, Canada
b Department of Electrical & Computer Engineering, Sultan Qaboos University, P.O. Box 50 Muscat 123, Oman
article info
Article history:
Received 6 March 2008
Received in revised form 14 October 2008
Accepted 19 October 2008
Available online 25 October 2008
Keywords:
Bi-directional torus
Deterministic routing
Hot-spot
Performance modelling
Message latency
abstract
Existing research has most often relied on simulation and considered the uniform traffic distribution
when investigating the performance properties of multicomputer networks (e.g. the torus). However,
there are numerous parallel applications that generate non-uniform traffic patterns, such as hot-spot traffic.
As a result, increasing attention has been paid to capturing the impact of non-uniform traffic on
network performance, resulting in the development of a number of analytical models for predicting
message latency in the presence of hot-spots in the network. For instance, analytical models have been
reported for the adaptively-routed torus with uni-directional as well as bi-directional channels. However,
models for the deterministically-routed torus have considered uni-directional channels only. In an effort
to fill in this gap, this paper describes an analytical model for the deterministically-routed torus with bi-
directional channels when subjected to hot-spot traffic. The modelling approach adopted for deterministic
routing is totally different from that for adaptive routing due to the inherently different nature of the two
types of routing. The validity of the model is demonstrated by comparing analytical results against those
obtained through extensive simulation experiments.
© 2008 Elsevier B.V. All rights reserved.
1. Introduction
Interconnection networks, which are constructed from routers
and channels, are one of the key components in the architecture
of parallel machines, since the performance of those machines
depends heavily on the efficiency of their underlying networks.
Several network topologies have been proposed in the literature.
Nevertheless, the k-ary n-cube and its variants, namely the hypercube
and the torus, have been the most popular networks and have been used in the
implementation of many parallel machines [3,10,17,18,21], due
to their desirable properties, such as ease of implementation,
recursive structures, and ability to exploit communication locality
to reduce message latency. Network performance can be affected
by several parameters such as topology, switching technique, routing
strategy and traffic distribution.
∗ Corresponding author.
E-mail address: samia_loucif@hotmail.com (S. Loucif).
Wormhole routing [6,16] has been a very popular switching
technique in recent generations of parallel machines due to its low
buffering requirements at routers, which also makes it a good
candidate for on-chip networks [14]; more importantly, it
makes message latency insensitive to distance under low traffic
loads. In this technique, a message is divided into flits, each of a few
bytes, for transmission and flow control. The header flit establishes
the route and the remaining data flits follow in a pipelined fashion.
When the header is blocked because a channel is unavailable,
the other flits are blocked in situ.
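The pipelining described above can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not taken from the paper: every channel forwards one flit per cycle, there is no contention and no router delay, and the first flit of the message is treated as the header. The function names and the 4-byte flit size are hypothetical. Store-and-forward switching is included only as a familiar point of comparison.

```python
def split_into_flits(message, flit_bytes=4):
    """Cut a message into fixed-size flits (assumed flit size: 4 bytes).

    The first flit is treated as the header for illustration only.
    """
    return [message[i:i + flit_bytes] for i in range(0, len(message), flit_bytes)]


def wormhole_latency(num_flits, hops):
    """Cycles for a message to cross `hops` channels with no contention.

    The header needs `hops` cycles to reach the destination; the remaining
    flits follow it in a pipeline, one per cycle.
    """
    return hops + num_flits - 1


def store_and_forward_latency(num_flits, hops):
    """Cycles when every router buffers the whole message before forwarding."""
    return hops * num_flits


flits = split_into_flits(b"x" * 64)
for hops in (2, 8):
    print(hops, wormhole_latency(len(flits), hops),
          store_and_forward_latency(len(flits), hops))
```

Under these assumptions the wormhole figure grows additively with the number of hops, whereas the store-and-forward figure grows multiplicatively, which is the sense in which latency is insensitive to distance at low load.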
Many routing algorithms have been suggested for wormhole-routed
k-ary n-cubes; they can be broadly classified as deterministic [6]
or adaptive [8]. In deterministic routing, messages always
use the same path between a given pair of nodes, while in adaptive
routing messages have more flexibility in choosing their path
through the network, which lets them avoid congested regions and
thereby reduce their latency. However, this flexibility is achieved
at the expense of more complex router hardware [2] needed to
guarantee deadlock freedom, owing to the time required to decide
on a route and the use of virtual
channels; a virtual channel [4] has its own flit queue, but shares the
bandwidth of the physical channel with other virtual channels in a
time-multiplexed fashion. Moreover, the authors of [25] have shown
that, under realistic traffic patterns generated by typical parallel
applications, the performance of deterministic routing
can approach that of adaptive routing without requiring
complex routers.
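As an illustration of what "the same path between a given pair of nodes" means in a bi-directional torus, the sketch below uses dimension-order routing, a common deterministic scheme. It is an assumption made here for illustration, not necessarily the exact algorithm analysed in this paper. Dimensions are corrected one at a time, taking the shorter way around each ring. The function name, the tuple representation of nodes, and the tie-breaking rule (the positive direction when both ways are equally long) are hypothetical.

```python
def dimension_order_path(src, dst, k):
    """Nodes visited from src to dst in a bi-directional k-ary n-cube.

    Assumed scheme: dimension-order routing, shortest direction per ring,
    ties broken towards the positive direction.
    """
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same number of dimensions")
    current = list(src)
    path = [tuple(current)]
    for dim in range(len(src)):
        forward = (dst[dim] - current[dim]) % k  # hops in the + direction
        backward = k - forward                   # hops in the - direction
        if forward <= backward:
            step, hops = 1, forward
        else:
            step, hops = -1, backward
        for _ in range(hops):
            current[dim] = (current[dim] + step) % k  # wrap around the ring
            path.append(tuple(current))
    return path


# Example in an 8-ary 2-cube: dimension 0 wraps backwards, dimension 1 goes forwards.
print(dimension_order_path((0, 0), (6, 3), k=8))
```

Because the path depends only on the source, the destination and k, every message between the same pair of nodes follows the same sequence of channels. An adaptive router would instead choose among several candidate channels at each hop according to network conditions.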
Analytical modelling represents a cost-effective tool to investigate
the performance of systems under different network
conditions, which may not be feasible using simulation due to the
excessive computation demands. Several analytical models have
been proposed for different systems [1,4,5,7,9,13,15,19,20,23,24,
27], including models of both deterministic and adaptive routing in
doi:10.1016/j.future.2008.10.008