Scalable Interconnection Network Models for Rapid Performance Prediction of HPC Applications

Kishwar Ahmed, Jason Liu
Florida International University
Email: {kahme006,liux}@cis.fiu.edu

Stephan Eidenbenz, Joe Zerr
Los Alamos National Laboratory
Email: {eidenben,rzerr}@lanl.gov

Abstract—The Performance Prediction Toolkit (PPT) is a simulator, developed mainly at Los Alamos National Laboratory, that facilitates rapid and accurate performance prediction of large-scale scientific applications on existing and future HPC architectures. In this paper, we present three interconnect models for performance prediction of large-scale HPC applications. They are based on interconnect topologies widely used in HPC systems: torus, dragonfly, and fat-tree. We conduct extensive validation tests of our interconnect models, in particular using configurations of existing HPC systems. Results show that our models provide good accuracy for predicting network behavior. We also present a performance study of a parallel computational physics application to show that our models can accurately predict the parallel behavior of large-scale applications.

Index Terms—Modeling and simulation; High-performance computing; Interconnection network; Performance evaluation

I. INTRODUCTION

Recent years have witnessed dramatic changes in high-performance computing (HPC) to accommodate the increasing computational demand of scientific applications. New architectural developments, including the rapid growth of multi-core and many-core systems, deeper memory hierarchies, and complex interconnection fabrics that facilitate more efficient data movement for massive-scale scientific applications, have complicated the design and implementation of HPC applications.
Translating architectural advances into application performance improvements may involve delicate changes to sophisticated algorithms, including new programming structures, different data layouts, more efficient buffer management and cache-effective methods, and alternative parallel strategies, which typically require highly skilled software architects and domain scientists.

Modeling and simulation plays a significant role in identifying performance issues, evaluating design choices, performing parameter tuning, and answering what-if questions. It is thus not surprising that there exists today a large body of literature on HPC modeling and simulation, ranging from coarse-level models of full-scale systems, to cycle-accurate simulations of individual components (such as processors, cache, memory, networks, and I/O systems), to analytical approaches. We note, however, that none of the existing methods is capable of modeling a full-scale HPC architecture running large scientific applications in detail. To do so would be both unrealistic and unnecessary.

Today's supercomputers are rapidly approaching exascale. Modeling and simulation needs to address important questions related to the performance of parallel applications on existing and future HPC systems at a similar scale. Although a cycle-accurate model may render good fidelity for a specific component of the system (such as a multi-core processor) and a specific time scale (such as within a microsecond), it cannot be naturally extended to handle arbitrarily larger systems or longer time durations. This is partially due to the computational complexity of the models (both spatial and temporal). More importantly, no existing models are known to be capable of capturing the entire system's dynamics in detail.
HPC applications are written in specific programming languages; they interact with other software modules, libraries, and operating systems, which in turn interact with the underlying resources for processing, data access, and I/O. Any uncertainties involving these hardware and software components (e.g., a compiler-specific library) can introduce significant modeling errors, which may undermine the fidelity achieved by cycle-accurate models of each individual component. As the statistician George Box once said: "All models are wrong, but some are useful."

To support full-system simulation, we must raise the level of modeling abstraction. Conceptually, we can adopt an approach called "selective refinement codesign modeling": we begin with both architecture and application models at a coarse level, gradually refine the models where potential performance bottlenecks appear, and eventually stop at models sufficient to answer the specific research questions. This iterative process rests on the assumption that we can identify performance issues from the models in a timely manner. To do so, we need methods that facilitate rapid and yet accurate assessment and performance prediction of large-scale scientific applications on current and future HPC architectures.

To this end, we set out to design and develop a simulator, called the Performance Prediction Toolkit (PPT). Four major aspects distinguish our effort from existing approaches. First, our simulator needs to easily integrate large-scale applications (especially computational physics codes) with full-scale architecture models (processors, memory/cache, interconnect, and so on). Second, our simulator must be able to combine selected models of various components, potentially at different levels of modeling abstraction, providing a trade-off between the computational demand of the simulator and the accuracy of the models.
Third, the simulator needs to adopt a minimalistic approach in order to achieve a short development cycle. It