Visualization Tool for Development of Communication Algorithms and a Case Study Using the K Computer

Syunji Yazaki
The University of Electro-Communications, Tokyo, Japan
Email: yazaki.syunji@uec.ac.jp

Ryohei Suzuki
Cresco Ltd., Tokyo, Japan
Email: r-suzuki@cresco.co.jp

Fumiyoshi Shoji
RIKEN Advanced Institute for Computational Science, Hyogo, Japan
Email: shoji@riken.jp

Kenichi Miura
Fujitsu Limited, Kanagawa, Japan
Email: k.miura@jp.fujitsu.com

Hiroaki Ishihata
Tokyo University of Technology, Tokyo, Japan
Email: ishihata@stf.teu.ac.jp

Abstract: In this paper, we introduce our visualization tool, the Communication Log Viewer (CLV), which assists the development of collective communication algorithms. We also present visualization results as a case study. CLV visualizes information regarding node events and network statistics in multiple linked views. CLV can also analyze results obtained from network simulators and from actual machines in the same framework, which is useful when developers repeatedly test their algorithms on a simulator and an actual system. As a case study, we visually evaluated two all-to-all algorithms on the full system of the K computer, which has 82,944 nodes. We confirmed that an all-to-all algorithm optimized for the K computer performed better than the all-to-all implemented in Open MPI. We also confirmed that the barrier operation used in the K computer's Message Passing Interface (MPI) functions keeps link utilization high. However, there is a trade-off between the number of barriers and link utilization.

Keywords: Visualization; Mesh/Torus network; All-to-all

I. INTRODUCTION

Parallel application programmers frequently use the collective communications implemented in Message Passing Interface (MPI) libraries to design applications. Collective communications usually generate a large number of messages, especially on petascale parallel systems that consist of tens of thousands of nodes.
In applications running on such large systems, communication takes longer than computation. Optimizing communication algorithms is therefore an important means of maximizing the performance of parallel applications [1].

Many parallel systems listed in the Top500 [2], such as the Cray XK7 [3], Blue Gene/Q [4], and K computer [5], employ a mesh/torus topology. A mesh/torus topology generally provides better scalability with respect to hardware cost. However, its bisection bandwidth is relatively narrow compared to that of other topologies, such as Fat-tree [6] and Dragonfly [7].

Visualization tools that abstract and visualize communication behavior are necessary for optimizing communication algorithms [8]. Developers repeatedly test communication algorithms under development on network simulators and actual systems to find potential areas for optimization. The test results are usually obtained as huge logfiles and extensive numerical data. From the logfiles and numerical data alone, it is difficult to identify potential areas for optimization.

We briefly presented our visualization tool, the Communication Log Viewer (CLV), which supports the design of collective communication algorithms, in [9]. Our tool visualizes the results obtained from a network simulator and from an actual system in the same framework. It also visualizes the events that occur in the nodes and the traffic statistics of the network simultaneously in multiple linked views. This enables the user to quickly determine which events in the nodes correspond to which congested network links.

In this paper, we describe the details of CLV, which can visualize both node events and network statistics. We also show a visual evaluation of all-to-all on the full system of the K computer, which has 82,944 nodes.

In the rest of this paper, Section II explains the workflow for developing communication algorithms and Section III presents related work.
Section IV then describes the features of CLV. Section V provides a case study, and Section VI concludes the paper.

II. WORKFLOW AND REQUIREMENTS FOR VISUALIZATION

The workflow for developing collective communication algorithms involves the following steps.

1) Designing an algorithm that takes into account the network architecture of the target system
2) Testing the algorithm on a network simulator and generating simulation logfiles
3) Analyzing and evaluating the behavior and efficiency of the algorithm based on the information in the logfiles
4) Implementing the algorithm on the target system, if the algorithm has achieved the expected performance in the simulation
5) Evaluating the algorithm based on logfiles and numerical data obtained from the performance counters of the target system

Considering this workflow, visualization tools need the following functions:

R1 Visualizing network statistics and node events simultaneously, with a clear association between them
R2 Mapping the information to the actual network structure of the target system
R3 Showing the information in multiple linked views
R4 Displaying concise information by filtering
R5 Supporting the outputs obtained from both simulators and actual systems in the same framework

Copyright (c) IARIA, 2015. ISBN: 978-1-61208-389-6. FUTURE COMPUTING 2015: The Seventh International Conference on Future Computational Technologies and Applications
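As a rough illustration of requirements R4 and R5, the following Python sketch converts records from a simulator log and from a performance-counter dump into one common record type, and then sums the traffic per link within a time window. It is not CLV's implementation. The two text formats, the `LinkSample` schema, and all function names are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Coord = Tuple[int, int, int]  # (x, y, z) position of a node in a 3D mesh/torus


@dataclass(frozen=True)
class LinkSample:
    """One traffic sample on a directed link (hypothetical common schema)."""
    time_s: float  # seconds since the start of the run
    src: Coord     # sending node
    dst: Coord     # neighboring node that receives the data
    nbytes: int    # bytes observed on the link in this sample


def parse_simulator_line(line: str) -> LinkSample:
    # Assumed simulator format: "time_us,sx,sy,sz,dx,dy,dz,bytes"
    f = line.strip().split(",")
    src = (int(f[1]), int(f[2]), int(f[3]))
    dst = (int(f[4]), int(f[5]), int(f[6]))
    return LinkSample(float(f[0]) / 1e6, src, dst, int(f[7]))


def parse_machine_line(line: str) -> LinkSample:
    # Assumed counter-dump format: "sx sy sz dx dy dz time_s bytes"
    f = line.split()
    src = (int(f[0]), int(f[1]), int(f[2]))
    dst = (int(f[3]), int(f[4]), int(f[5]))
    return LinkSample(float(f[6]), src, dst, int(f[7]))


def link_totals(
    samples: Iterable[LinkSample],
    t_start: float = 0.0,
    t_end: float = float("inf"),
) -> Dict[Tuple[Coord, Coord], int]:
    """Sum bytes per link for samples with t_start <= time_s < t_end (R4)."""
    totals: Dict[Tuple[Coord, Coord], int] = defaultdict(int)
    for s in samples:
        if t_start <= s.time_s < t_end:
            totals[(s.src, s.dst)] += s.nbytes
    return dict(totals)


if __name__ == "__main__":
    sim = [parse_simulator_line(l) for l in
           ["0,0,0,0,1,0,0,100", "2000000,0,0,0,1,0,0,50"]]
    real = [parse_machine_line(l) for l in
            ["0 0 0 1 0 0 0.0 100", "0 0 0 1 0 0 2.0 50"]]
    # Both sources now go through the same aggregation code path (R5).
    print(link_totals(sim))
    print(link_totals(real, t_end=1.0))
```

Because both parsers return the same record type, any later step, such as filtering by time window or mapping link totals onto the network structure, can be written once and reused for simulator and machine data.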