Hybrid Performance Modeling and Prediction of Large-Scale Computing Systems

Sabri Pllana and Siegfried Benkner
Institute of Scientific Computing, Faculty of Computer Science, University of Vienna
Nordbergstrasse 15/C/3, 1090 Vienna, Austria
Email: {pllana,sigi}@par.univie.ac.at

Fatos Xhafa
Department of Languages and Informatics Systems, Polytechnic University of Catalonia
C/Jordi Girona 1-3, 08034 Barcelona, Spain
Email: fatos@lsi.upc.edu

Leonard Barolli
Department of Information and Communication Engineering, Fukuoka Institute of Technology
3-30-1 Wajiro-Higashi, Higashi-ku, Fukuoka 811-0295, Japan
Email: barolli@fit.ac.jp

Abstract: Performance is a key feature of large-scale computing systems. However, the performance achieved when a certain program is executed is significantly lower than the maximal theoretical performance of the large-scale computing system. Model-based performance evaluation may be used to support performance-oriented program development for large-scale computing systems. In this paper we present a hybrid approach for performance modeling and prediction of parallel and distributed computing systems, which combines mathematical modeling and discrete-event simulation. We use mathematical modeling to develop parameterized performance models for components of the system. Thereafter, we use discrete-event simulation to describe the structure of the system and the interaction among its components. As a result, we obtain a high-level performance model, which combines the evaluation speed of mathematical models with the structure awareness and fidelity of the simulation model. We empirically evaluate our approach with a real-world material science program that comprises more than 15,000 lines of code.

I. INTRODUCTION

The solution of resource-demanding scientific and engineering computational problems involves the execution of programs on large-scale computing systems, which commonly consist of multiple computational nodes, in order to solve large problems or to reduce the time to solution for a single problem. However, there is a widening gap between the maximal theoretical performance and the achieved performance when a certain program is executed on a large-scale parallel and distributed computing system. This gap may be reduced by tuning the performance of a program for a specific computing system. Commonly, the programmer develops multiple versions of the program following various parallelization strategies. Thereafter, the programmer assesses the performance of each program version and selects the version that achieves the highest performance. Code-based performance tuning of a program is a time-consuming and error-prone process that involves many cycles of code editing, compilation, execution, and performance analysis. This problem may be alleviated by using model-based performance evaluation.

In this paper we present a methodology and the corresponding tool support for performance modeling and prediction of parallel and distributed computing systems, which may be used in the process of performance-oriented program development to provide performance prediction results starting from the early program development stages. Based on the performance model, the performance can be predicted and design decisions can be influenced without time-consuming modifications of large parts of an implemented program.

We propose a hybrid approach for performance modeling and prediction of parallel and distributed computing systems, which combines mathematical modeling and discrete-event simulation. Our aim is to combine the evaluation speed of mathematical models with the structure awareness and fidelity of the simulation model.
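The idea of combining the two techniques can be illustrated with a minimal sketch. It is not the implementation of the tool described in this paper. The scenario (workers that compute and then each send one message over a single shared link), the function names, the transfer-time formula, and all parameter values are assumptions made for illustration. Component costs are closed-form expressions, while a small event queue models the interaction among the components.

```python
import heapq


def compute_time(op_count, avg_op_time):
    """Analytic component model: operation count times average time per operation."""
    return op_count * avg_op_time


def transfer_time(n_bytes, latency, bandwidth):
    """Analytic component model (assumed): fixed latency plus size over bandwidth."""
    return latency + n_bytes / bandwidth


def simulate(op_counts, avg_op_time, msg_bytes, latency, bandwidth):
    """Discrete-event simulation of workers that compute, then send one
    message each over a single shared link served in arrival order.
    Returns the time at which the last message has been delivered."""
    events = []  # heap of (time, worker, kind)
    for worker, ops in enumerate(op_counts):
        done = compute_time(ops, avg_op_time)
        heapq.heappush(events, (done, worker, "compute_done"))

    link_free_at = 0.0
    finish = 0.0
    while events:
        now, worker, kind = heapq.heappop(events)
        if kind == "compute_done":
            # The link is a shared resource: wait until it is free.
            start = max(now, link_free_at)
            link_free_at = start + transfer_time(msg_bytes, latency, bandwidth)
            heapq.heappush(events, (link_free_at, worker, "send_done"))
        else:  # "send_done"
            finish = max(finish, now)
    return finish


if __name__ == "__main__":
    # Two workers finish computing at the same time and contend for the link.
    print(simulate([2, 2], 0.5, 1024, 0.5, 2048.0))
```

With these illustrative values, each worker computes for 1.0 time unit and each transfer takes 1.0 time unit. The simulated completion time is 3.0 rather than the 2.0 that the per-component formulas alone would suggest, because the second message has to wait for the link. This contention effect is the kind of structural information that the simulation part contributes.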
For the purpose of evaluating our approach, we have developed a performance modeling and prediction system called Performance Prophet. We demonstrate the usefulness of Performance Prophet by modeling and simulating a real-world material science program that comprises more than 15,000 lines of code. In our case study, the model evaluation with Performance Prophet on a single-processor workstation is several thousand times faster than the execution of the real program on our cluster.

The rest of this paper is organized as follows. Our approach for hybrid performance modeling and prediction of parallel and distributed computing systems is described in Section II. We empirically evaluate our approach in Section III. Related work is discussed in Section IV. Finally, Section V concludes the paper and briefly describes future work.

II. HYBRID PERFORMANCE MODELING AND PREDICTION

Commonly, mathematical modeling (MathMod) or discrete-event simulation (DES) is used for performance modeling of computing systems. When applied separately, each of these approaches has severe limitations.

Mathematical models commonly represent the whole computing system as a symbolic expression that lacks structural information [1]. An example of a mathematical performance model that models the program execution time is expressed as follows,

$$T_{ProgExec} = C_{Op} \, T_{Av},$$

International Conference on Complex, Intelligent and Software Intensive Systems
0-7695-3109-1/08 $25.00 © 2008 IEEE
DOI 10.1109/CISIS.2008.20