Scalable Interconnection Network Models for Rapid Performance Prediction of HPC Applications

Kishwar Ahmed, Jason Liu
Florida International University
Email: {kahme006,liux}@cis.fiu.edu

Stephan Eidenbenz, Joe Zerr
Los Alamos National Laboratory
Email: {eidenben,rzerr}@lanl.gov

Abstract—The Performance Prediction Toolkit (PPT) is a simulator, developed mainly at Los Alamos National Laboratory, that facilitates rapid and accurate performance prediction of large-scale scientific applications on existing and future HPC architectures. In this paper, we present three interconnect models for performance prediction of large-scale HPC applications. They are based on interconnect topologies widely used in HPC systems: torus, dragonfly, and fat-tree. We conduct extensive validation tests of our interconnect models, in particular using configurations of existing HPC systems. Results show that our models provide good accuracy for predicting network behavior. We also present a performance study of a parallel computational physics application to show that our models can accurately predict the parallel behavior of large-scale applications.

Index Terms—Modeling and simulation; High-performance computing; Interconnection network; Performance evaluation

I. INTRODUCTION

Recent years have witnessed dramatic changes in high-performance computing (HPC) to accommodate the increasing computational demand of scientific applications. New architectural developments, including the rapid growth of multi-core and many-core systems, deeper memory hierarchies, and complex interconnection fabrics that facilitate more efficient data movement for massive-scale scientific applications, have complicated the design and implementation of HPC applications.
Translating architectural advances into application performance improvements may involve delicate changes to sophisticated algorithms, including new programming structures, different data layouts, more efficient buffer management and cache-effective methods, and alternative parallel strategies, which typically require highly skilled software architects and domain scientists.

Modeling and simulation plays a significant role in identifying performance issues, evaluating design choices, performing parameter tuning, and answering what-if questions. It is thus not surprising that there exists today a large body of literature on HPC modeling and simulation, ranging from coarse-level models of full-scale systems, to cycle-accurate simulations of individual components (such as processors, cache, memory, networks, and I/O systems), to analytical approaches. We note, however, that none of the existing methods is capable of modeling a full-scale HPC architecture running large scientific applications in detail. To do so would be both unrealistic and unnecessary.

Today's supercomputers are rapidly approaching exascale. Modeling and simulation needs to address important questions related to the performance of parallel applications on existing and future HPC systems at a similar scale. Although a cycle-accurate model may render good fidelity for a specific component of the system (such as a multi-core processor) and a specific time scale (such as within a microsecond), it cannot be naturally extended to handle arbitrarily larger systems or longer time durations. This is partially due to the computational complexity of the models (both spatial and temporal). More importantly, no existing models are known to be capable of capturing the entire system's dynamics in detail.
HPC applications are written in specific programming languages; they interact with other software modules, libraries, and operating systems, which in turn interact with the underlying resources for processing, data access, and I/O. Any uncertainties involving these hardware and software components (e.g., a compiler-specific library) can introduce significant modeling errors, which may undermine the fidelity achieved by cycle-accurate models of each individual component. As the statistician George Box once said: "All models are wrong, but some are useful."

To support full-system simulation, we must raise the level of modeling abstraction. Conceptually, we can adopt an approach called "selective refinement codesign modeling": we begin with both architecture and application models at a coarse level, gradually refine the models where potential performance bottlenecks appear, and eventually stop at models sufficient to answer the specific research questions. This iterative process rests on the assumption that we can identify performance issues from the models in a timely manner. To do so, we need methods that facilitate rapid and yet accurate assessment and performance prediction of large-scale scientific applications on current and future HPC architectures.

To this end, we set out to design and develop a simulator, called the Performance Prediction Toolkit (PPT). Four major aspects distinguish our effort from existing approaches. First, our simulator needs to easily integrate large-scale applications (especially computational physics codes) with full-scale architecture models (processors, memory/cache, interconnect, and so on). Second, our simulator must be able to combine selected models of various components, potentially at different levels of modeling abstraction, providing a trade-off between the computational demand of the simulator and the accuracy of the models.
Third, the simulator needs to adopt a minimalistic approach in order to achieve a short development cycle. It