An Analytical Model of k-Ary n-Cube under Spatial Communication Locality
HU Kai, WANG Zhe
School of Computer Science and Engineering, Beihang University
Beijing, 100083, China
Email:{wangzhe@cse.buaa.edu.cn, hukai@buaa.edu.cn}
Abstract— Spatial communication locality is exploited in many real parallel programs. However, to our knowledge, the definition of spatial communication locality is not consistent among existing analytical models, and its impact on latency and throughput has not been reported systematically. The k-ary n-cube, which supports communication locality well, has been widely used in practical parallel computers. In this paper, we use two parameters, the local message fraction and the local domain's radius, to describe spatial locality. We then give an analytical model of the k-ary n-cube under spatial communication locality based on the M/G/1 queuing model, and we also consider the case in wormhole switching where a message's length is less than the network radius. Simulation results show close agreement with our analytical model.
Keywords— k-ary n-cube; spatial communication locality;
wormhole; virtual channel; M/G/1 queuing model.
1. Introduction and Preliminaries
The performance of a parallel computer depends greatly on the efficiency of its interconnection network. The k-ary n-cube is one of the most popular networks in parallel computing due to its ease of implementation, its recursive structure and its ability to exploit communication locality to reduce message latency. K-ary n-cubes have been used in many supercomputers [1][4][12], and many analytical models of them have been reported [2-3, 5-7, 9, 11, 14-18]. Dally [6] has shown that under a constant wiring density constraint, 2- and 3-dimensional tori outperform the hypercube. Communication locality has been considered in [2-3][10][17]; paper [2] focuses on the performance of the mesh under different workloads using the mean value analysis technique. A latency comparison between the torus and the mesh has been carried out in [9]. Paper [7] gives a comprehensive analytical model of wormhole switching based on M/G/1 queueing theory. A mathematical model of virtual channels is proposed in [16] and [18], whose authors also give a simpler method to predict average message latency. Message latency of the torus under hot-spot traffic is studied in [14-15].
Nevertheless, to our knowledge, existing analytical models have not examined two questions in depth: (1) The characterization of spatial communication locality is not consistent across models, and each of them gives only a partial result on the relationship between spatial communication locality and network performance (such as latency and throughput). (2) Most analytical models of wormhole switching assume that the message length (in flits) is greater than the network radius k/2, but this assumption does not hold in some situations.
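To gauge how often a message is shorter than the path it travels, the following Python sketch enumerates hop distances in a k-ary n-cube and reports the fraction of destinations that lie more than M hops from the source. It assumes bidirectional rings with minimal routing in every dimension and uniform random destinations; the function names are hypothetical and are not part of the paper's model.

```python
from collections import Counter
from itertools import product


def ring_distance(a: int, b: int, k: int) -> int:
    """Hops between positions a and b on a bidirectional ring of k nodes."""
    d = abs(a - b)
    return min(d, k - d)


def hop_distribution(k: int, n: int) -> Counter:
    """Histogram of hop distances from node 0 to every other node of a k-ary n-cube.

    By the symmetry of the torus, every source node sees the same histogram.
    """
    counts: Counter = Counter()
    for dest in product(range(k), repeat=n):
        hops = sum(ring_distance(0, x, k) for x in dest)
        if hops > 0:  # skip the source itself
            counts[hops] += 1
    return counts


def mean_distance(k: int, n: int) -> float:
    """Average hop count under uniform random destinations (source excluded)."""
    dist = hop_distribution(k, n)
    return sum(h * c for h, c in dist.items()) / sum(dist.values())


def fraction_longer_than(k: int, n: int, m_flits: int) -> float:
    """Fraction of destinations that are more than m_flits hops away."""
    dist = hop_distribution(k, n)
    return sum(c for h, c in dist.items() if h > m_flits) / sum(dist.values())
```

For a 16-ary 2-cube, `mean_distance(16, 2)` is 2048/255 (about 8.03 hops) and `fraction_longer_than(16, 2, 8)` is 113/255 (about 0.44), so with M = 8 flits under uniform traffic almost half of the messages would travel farther than their own length in flits.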
The interconnection network of a parallel computer can be characterized by four elements: topology, routing algorithm, switching strategy and flow control mechanism. Our analytical model considers the k-ary n-cube topology with dimension-ordered routing, wormhole switching and virtual channel flow control. Details of these techniques can be found in [8] and many other research papers.
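As an illustration of dimension-ordered routing on this topology, the sketch below lists the nodes a message visits in a k-ary n-cube (torus): dimension 0 is corrected first, then dimension 1, and so on. Taking the shorter way around each ring and breaking ties (an offset of exactly k/2) in the positive direction are assumptions of this sketch, not details taken from [8]; `dor_path` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Node = Tuple[int, ...]


def dor_path(src: Sequence[int], dst: Sequence[int], k: int) -> List[Node]:
    """Nodes visited by dimension-ordered routing from src to dst in a k-ary n-cube."""
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same number of dimensions")
    if not all(0 <= x < k for x in (*src, *dst)):
        raise ValueError("coordinates must lie in [0, k)")
    cur = list(src)
    path: List[Node] = [tuple(cur)]
    for dim in range(len(src)):  # correct one dimension at a time, in order
        forward = (dst[dim] - cur[dim]) % k  # hops needed in the + direction
        step = 1 if forward <= k - forward else -1  # shorter way; tie -> +1
        while cur[dim] != dst[dim]:
            cur[dim] = (cur[dim] + step) % k  # wraparound link when crossing k-1 <-> 0
            path.append(tuple(cur))
    return path
```

For example, `dor_path((0, 0), (3, 1), 4)` first takes the wraparound link in dimension 0 and then one hop in dimension 1, giving `[(0, 0), (3, 0), (3, 1)]`. The hop count is `len(path) - 1`. On the wraparound links, dimension-ordered routing alone can deadlock; virtual channels are the usual remedy.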
Moreover, the traffic loads of real parallel applications also strongly affect the performance of interconnection networks. In an analytical model, the workload is an approximation of the real traffic load. A workload model should contain three elements: the traffic pattern, the message length distribution and the message arrival pattern, where the traffic pattern describes the distribution of destination nodes. Basic traffic patterns include uniform random traffic, fixed source/destination pairs, communication locality, hot-spot communication, etc. Communication locality is an important means of optimizing a parallel application's performance; it has two aspects: temporal locality and spatial locality. Message length and message generation rate are also significant to the analytical model. Most researchers assume a fixed message length and Poisson message arrivals. Kim and Chien [11] have indicated that a bimodal message length model is closer to the communication of real parallel programs, and a study of non-Poisson arrival processes such as batch message arrivals is presented in [20].
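The abstract describes spatial locality with two parameters, the local message fraction and the local domain's radius. The sketch below shows one plausible way to draw destinations from such a traffic pattern. The exact definitions are assumptions made here: the local domain is taken to contain every node within `radius` hops of the source in each dimension, and non-local messages go uniformly to nodes outside it. The names `beta`, `radius` and `pick_destination` are hypothetical.

```python
import random
from itertools import product
from typing import List, Tuple

Node = Tuple[int, ...]


def ring_distance(a: int, b: int, k: int) -> int:
    """Hops between positions a and b on a bidirectional ring of k nodes."""
    d = abs(a - b)
    return min(d, k - d)


def in_local_domain(src: Node, dst: Node, k: int, radius: int) -> bool:
    """Assumed local domain: dst != src and within `radius` hops in every dimension."""
    return dst != src and all(
        ring_distance(s, d, k) <= radius for s, d in zip(src, dst)
    )


def pick_destination(src: Node, k: int, beta: float, radius: int,
                     rng: random.Random) -> Node:
    """Draw a destination: local with probability beta, otherwise non-local.

    Enumerates all k**n nodes on every call; a real traffic generator would
    precompute the local and non-local sets once per source.
    """
    others: List[Node] = [d for d in product(range(k), repeat=len(src)) if d != src]
    local = [d for d in others if in_local_domain(src, d, k, radius)]
    remote = [d for d in others if not in_local_domain(src, d, k, radius)]
    # Fall back to whichever set is non-empty (e.g. radius 0 gives no local nodes).
    if local and (not remote or rng.random() < beta):
        return rng.choice(local)
    return rng.choice(remote)
```

With `beta = 0.7` and `radius = 1` in an 8-ary 2-cube, about 70% of the destinations drawn from source `(0, 0)` fall among its 8 neighbours within one hop per dimension; the rest are spread uniformly over the remaining nodes.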
2. Analytical Model under Communication Locality
2.1 Assumptions and Definitions
The proposed model is based on the following assumptions, which are widely used in similar studies:
• Nodes generate messages independently of each other, following a Poisson process with a mean rate of $\lambda_g$ messages/node/cycle. The message arrival process at each channel can also be approximated by an independent Poisson process.
• Messages have a fixed length of M flits, and a flit takes one cycle to cross a channel.
• The injection channel in the source node has infinite input buffers; thus the injection queue in the source node has infinite capacity to hold new messages.
• Each physical channel is shared by V virtual channels
and the organization of virtual channels follows the
definition in [18].
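Because arrivals at a source node are Poisson and its injection queue is unbounded, the source queue can be treated as an M/G/1 queue, whose mean waiting time follows the Pollaczek-Khinchine formula $W = \lambda E[S^2] / (2(1-\rho))$ with $\rho = \lambda E[S]$. The sketch below evaluates that formula. The service-time variance is usually not known in closed form; the hypothetical helper `source_queue_wait` approximates it as $(\bar{S} - M)^2$, treating M cycles as the minimum service time. That approximation is an assumption often made in wormhole models, not a result of this paper.

```python
def mg1_mean_wait(lam: float, mean_service: float, service_variance: float) -> float:
    """Mean waiting time in an M/G/1 queue via the Pollaczek-Khinchine formula.

    W = lam * E[S^2] / (2 * (1 - rho)),  where E[S^2] = Var[S] + E[S]^2
    and rho = lam * E[S] is the server utilisation.
    """
    rho = lam * mean_service
    if rho >= 1.0:
        raise ValueError("unstable queue: utilisation rho = lam * E[S] must be < 1")
    second_moment = service_variance + mean_service ** 2
    return lam * second_moment / (2.0 * (1.0 - rho))


def source_queue_wait(lambda_g: float, mean_service: float, m_flits: int) -> float:
    """Hypothetical helper: source-queue wait with Var[S] ~ (E[S] - M)^2 (an assumption)."""
    variance = (mean_service - m_flits) ** 2
    return mg1_mean_wait(lambda_g, mean_service, variance)
```

For illustrative values $\lambda_g = 0.01$, $\bar{S} = 50$ cycles and M = 32 flits, `source_queue_wait(0.01, 50.0, 32)` gives about 28.24 cycles, compared with 25 cycles for a deterministic service time (`mg1_mean_wait(0.01, 50.0, 0.0)`).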
Message latency and throughput are critical indices of network performance. In wormhole switching, a message's latency is the time from its generation at the source node until its tail flit arrives at the destination node. Throughput is defined as the maximum rate at which information can be injected at the source nodes.
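Under this definition, a simple channel-load argument gives an upper bound on throughput in the uniform-traffic case, against which modeled or simulated saturation points can be compared. The sketch assumes uniform random destinations, minimal routing, even k, bidirectional channels that each carry one flit per cycle, and perfectly balanced load. It is an idealized ceiling rather than the paper's model, real networks saturate below it, and `torus_saturation_rate` is a hypothetical name.

```python
def torus_saturation_rate(k: int, n: int, m_flits: int) -> float:
    """Upper bound on lambda_g (messages/node/cycle) for a bidirectional k-ary n-cube.

    Each node injects lambda_g * m_flits flits per cycle; each flit crosses
    n*k/4 channels on average, and each node owns 2n outgoing channels, so the
    load on every channel is lambda_g * m_flits * (n*k/4) / (2n) flits/cycle.
    Setting that load to 1 flit/cycle gives the bound 8 / (m_flits * k).
    """
    if k % 2:
        raise ValueError("this closed form assumes an even radix k")
    mean_hops = n * k / 4        # average distance, self-destination included
    channels_per_node = 2 * n    # one outgoing channel per direction per dimension
    load_per_unit_rate = m_flits * mean_hops / channels_per_node
    return 1.0 / load_per_unit_rate
```

For example, `torus_saturation_rate(16, 2, 32)` returns 1/64 (about 0.0156 messages/node/cycle), i.e. 8/k = 0.5 flits/node/cycle. Traffic with spatial locality shortens the mean distance and would therefore raise this bound.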
2010 IEEE 24th International Conference on Advanced Information Networking and Applications Workshops
978-0-7695-4019-1/10 $26.00 © 2010 IEEE
DOI 10.1109/WAINA.2010.30