An Analytical Model of k-Ary n-Cube under Spatial Communication Locality

HU Kai, WANG Zhe
School of Computer Science and Engineering, Beihang University
Beijing, 100083, China
Email: {wangzhe@cse.buaa.edu.cn, hukai@buaa.edu.cn}

Abstract: Spatial communication locality is exploited by many real parallel programs. However, to our knowledge, the definition of spatial communication locality is not consistent among existing analytical models, and its impact on latency and throughput has not been reported systematically. The k-ary n-cube, which supports communication locality well, has been widely used in practical parallel computers. In this paper, we use two parameters, the local message fraction and the local domain's radius, to describe spatial locality. We then give an analytical model of the k-ary n-cube under spatial communication locality based on the M/G/1 queuing model, and we also consider the case in which a message's length is less than the network radius in wormhole switching. Simulation results show close agreement with our analytical model.

Keywords: k-ary n-cube; spatial communication locality; wormhole; virtual channel; M/G/1 queuing model.

1. Introduction and Preliminaries

The performance of a parallel computer depends greatly on the efficiency of its interconnection network. The k-ary n-cube is one of the most popular networks in parallel computing, due to its ease of implementation, recursive structure, and ability to exploit communication locality to reduce message latency. The k-ary n-cube has been used in many supercomputers [1][4][12], and many analytical models of it have been reported, for example in [2-3, 5-7, 9, 11, 14-18].
Dally [6] has shown that, under a constant wiring density constraint, the 2- and 3-dimensional torus outperform the hypercube. References [2-3][10][17] consider communication locality. Paper [2] focuses on the performance of the mesh under different workloads using the mean value analysis technique. A latency comparison between the torus and the mesh is carried out in [9]. Paper [7] gives a comprehensive analytical model of wormhole switching based on M/G/1 queueing theory. A mathematical model of virtual channels is proposed in [16] and [18], where the authors also give an easier method to predict average message latency. Message latency of the torus under hot-spot traffic is studied in [14-15].

Nevertheless, to our knowledge, the existing analytical models have not considered two issues in depth: (1) The characterization of spatial communication locality is not consistent across models, and each of them gives only partial results on the relationship between spatial communication locality and network performance (such as latency and throughput). (2) Most analytical models of wormhole switching assume that the message length (in flits) is greater than the network radius k/2, but this assumption does not hold in some situations.

A parallel computer's network can be characterized by four elements: topology, routing algorithm, switching strategy, and flow control mechanism. Our analytical model considers the k-ary n-cube topology with dimension-ordered routing, wormhole switching, and virtual channel flow control. Details of these techniques can be found in [8] and many other research papers.

Moreover, the traffic loads of real parallel applications also strongly affect the performance of interconnection networks. In an analytical model, the workload definition is an approximation of the real traffic load.
A workload model should contain three elements: traffic pattern, message length distribution, and message arrival pattern, where the traffic pattern describes the distribution of destination nodes. Basic traffic patterns include uniform-random traffic, fixed source/destination node distributions, communication locality, hot-spot communication, etc. Communication locality is an important means of optimizing a parallel application's performance; it has two aspects: temporal locality and spatial locality. The message length and generation rate are also significant to the analytical model. Most researchers assume a fixed message length and traffic that follows a Poisson arrival process. Kim and Chien [11] have indicated that a bimodal message length model is closer to the communication of real parallel programs, and a study of non-Poisson arrival processes such as batch message arrival is presented in [20].

2. Analytical Model under Communication Locality

2.1 Assumptions and Definitions

The model proposed in this paper is based on the following assumptions, which are widely used in similar studies:

The nodes generate messages independently of each other, following a Poisson process with a mean rate of λ_g messages/node/cycle. For each channel, the message arrival process can also be approximated by an independent Poisson process.

Messages have a fixed length of M flits, and a flit takes one cycle to pass through a channel.

The injection channel in the source node has infinite input buffers, so the injection queue in the source node has infinite capacity to hold new messages.

Each physical channel is shared by V virtual channels, and the organization of virtual channels follows the definition in [18].

Message latency and throughput are critical indices of network performance. In wormhole switching, a message's latency is the time from its generation at the source node until its tail flit arrives at the destination node.
Throughput is defined as the maximum information injection rate at the source nodes.

2010 IEEE 24th International Conference on Advanced Information Networking and Applications Workshops. 978-0-7695-4019-1/10 $26.00 © 2010 IEEE. DOI 10.1109/WAINA.2010.30
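The Poisson-arrival and fixed-length assumptions of Section 2.1 can be illustrated with a minimal Python sketch. It is not the model derived in this paper. It only samples Poisson message generation times and evaluates the textbook Pollaczek-Khinchine mean waiting time of an M/G/1 queue. The function names, the example values (M = 32 flits, 0.01 messages/cycle), and the treatment of the service time as exactly M cycles (that is, ignoring blocking inside the network) are assumptions made here for illustration.

```python
import random


def poisson_arrival_times(rate, horizon, rng):
    """Sample the event times of a Poisson process on [0, horizon).

    `rate` is the mean number of messages generated per cycle; the gaps
    between consecutive messages are exponentially distributed.
    """
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def mg1_mean_wait(arrival_rate, mean_service, second_moment_service):
    """Pollaczek-Khinchine mean waiting time in queue for an M/G/1 server.

    W = lambda * E[S^2] / (2 * (1 - rho)), with rho = lambda * E[S].
    """
    rho = arrival_rate * mean_service
    if not 0 <= rho < 1:
        raise ValueError("queue is unstable: utilisation must be below 1")
    return arrival_rate * second_moment_service / (2.0 * (1.0 - rho))


if __name__ == "__main__":
    M = 32          # assumed fixed message length in flits (illustrative)
    lam = 0.01      # assumed arrival rate at one channel, messages/cycle

    # Illustrative simplification: with one cycle per flit and no blocking,
    # the service time is the constant M, so E[S] = M and E[S^2] = M * M.
    wait = mg1_mean_wait(lam, M, M * M)
    print("mean waiting time (cycles):", wait)

    arrivals = poisson_arrival_times(lam, 100_000, random.Random(0))
    print("messages generated in 100000 cycles:", len(arrivals))
```

The waiting time grows without bound as the utilisation λ·E[S] approaches 1. This is why the sketch rejects unstable inputs, and it matches the definition of throughput as the maximum sustainable injection rate.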