Visualization Tool for Development of Communication Algorithms and a Case Study Using the K Computer

Syunji Yazaki, The University of Electro-Communications, Tokyo, Japan. Email: yazaki.syunji@uec.ac.jp
Ryohei Suzuki, Cresco Ltd., Tokyo, Japan. Email: r-suzuki@cresco.co.jp
Fumiyoshi Shoji, RIKEN Advanced Institute for Computational Science, Hyogo, Japan. Email: shoji@riken.jp
Kenichi Miura, Fujitsu Limited, Kanagawa, Japan. Email: k.miura@jp.fujitsu.com
Hiroaki Ishihata, Tokyo University of Technology, Tokyo, Japan. Email: ishihata@stf.teu.ac.jp

Abstract—In this paper, we introduce our visualization tool, the Communication Log Viewer (CLV), which assists the development of collective communication algorithms. We also present visualization results as a case study. CLV visualizes information regarding node events and network statistics in multiple linked views. CLV can also analyze the results obtained from network simulators and actual machines in the same framework, which is useful when developers repeatedly test their algorithms on a simulator and an actual system. As a case study, we visually evaluated two all-to-all algorithms on the full system of the K computer, which has 82,944 nodes. As a result, we confirmed that an optimized all-to-all algorithm implemented for the K computer performed better than the all-to-all implemented in Open MPI. We also confirmed that the barrier operation used in the K computer's Message Passing Interface (MPI) functions keeps link utilization high. However, there is a trade-off between the number of barriers and link utilization.

Keywords–Visualization; Mesh/Torus network; All-to-all

I. INTRODUCTION

Parallel application programmers frequently use collective communications implemented in Message Passing Interface (MPI) libraries to design applications. Collective communications usually produce a large number of communications, especially on peta-scale parallel systems that consist of tens of thousands of nodes.
In the applications running on such large systems, communication takes longer than computation. Optimizing communication algorithms is an important means of maximizing the performance of parallel applications [1].

Many parallel systems listed in the Top500 [2], such as the Cray XK7 [3], Blue Gene/Q [4], and K computer [5], employ mesh/torus topology. Mesh/torus topology generally provides better scalability with respect to hardware cost. However, its bisection bandwidth is relatively narrow compared to that of other topologies, such as fat tree [6] and Dragonfly [7].

Visualization tools that abstract and visualize communication behavior are necessary for optimizing communication algorithms [8]. Developers repeatedly test communication algorithms under development on network simulators and actual systems to find potential areas for optimization. The test results are usually obtained as huge logfiles and extensive numerical data. From the logfiles and numerical data alone, it is difficult to determine potential areas for optimization.

We briefly presented our visualization tool, the Communication Log Viewer (CLV), which supports the design of collective communication algorithms, in [9]. Our tool visualizes the results obtained from both a network simulator and an actual system in the same framework. It also visualizes events that occur in the nodes and statistics regarding traffic in the network simultaneously, using multiple linked views. This enables the user to quickly distinguish which events in the nodes correspond to which congested network links.

In this paper, we describe the details of CLV, which can visualize both node events and network statistics. We also show a visual evaluation of all-to-all on the full system of the K computer, which has 82,944 nodes. In the rest of this paper, Section II explains the workflow for developing communication algorithms and Section III presents related work.
Section IV then describes the features of CLV. Section V provides a case study, and Section VI concludes the paper.

II. WORKFLOW AND REQUIREMENTS FOR VISUALIZATION

A workflow to develop collective communication algorithms involves the following steps.
1) Designing an algorithm that takes into account the network architecture of the target system
2) Testing the algorithm on a network simulator and generating simulation logfiles
3) Analyzing and evaluating the behavior and efficiency of the algorithm based on the information in the logfiles
4) Implementing the algorithm on the target system, if the algorithm has achieved the expected performance in the simulation
5) Evaluating the algorithm based on logfiles and numerical data obtained from the performance counters of the target system

Considering this workflow, the following functions are needed in visualization tools:
R1 Visualizing the simultaneous network statistics and node events with concise association
R2 Mapping the information to the actual network structure of the target system
R3 Showing the information in multiple linked views
R4 Displaying concise information by filtering
R5 Supporting the outputs obtained both from simulators and actual systems in the same framework

54 Copyright (c) IARIA, 2015. ISBN: 978-1-61208-389-6 FUTURE COMPUTING 2015 : The Seventh International Conference on Future Computational Technologies and Applications
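
Requirements R1, R4, and R5 can be illustrated with a minimal sketch. It is not CLV's implementation or log format: the record types NodeEvent and LinkSample, their fields, and the helper functions busy_links and events_during are hypothetical names introduced only for illustration. The sketch assumes that simulator logs and performance-counter data have both been converted into the same two record types (R5), filters link samples by utilization (R4), and associates a congested link with the node events at its endpoints that overlap it in time (R1).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class NodeEvent:
    """One event on a node (hypothetical record, not CLV's format)."""
    node: int      # node identifier
    name: str      # e.g. "send", "recv", "barrier"
    start: float   # start time in seconds
    end: float     # end time in seconds
    source: str    # "simulator" or "machine" (R5: same record for both)


@dataclass(frozen=True)
class LinkSample:
    """Traffic statistics of one directed link over one time window."""
    src: int            # node at the sending end of the link
    dst: int            # node at the receiving end of the link
    start: float        # window start in seconds
    end: float          # window end in seconds
    utilization: float  # fraction of link bandwidth in use, 0.0 to 1.0
    source: str         # "simulator" or "machine"


def busy_links(samples: Iterable[LinkSample], threshold: float) -> List[LinkSample]:
    """R4: keep only the samples whose utilization reaches the threshold."""
    return [s for s in samples if s.utilization >= threshold]


def events_during(events: Iterable[NodeEvent], sample: LinkSample) -> List[NodeEvent]:
    """R1: node events at either end of the link that overlap its time window."""
    endpoints = (sample.src, sample.dst)
    return [
        e for e in events
        if e.node in endpoints and e.start < sample.end and sample.start < e.end
    ]
```

Because both data sources share one record type, the same filter and association code can be applied unchanged to simulator output and to measurements from the target system; only the converters that produce the records would differ.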