Software Probes: towards a quick method for machine characterization and application performance prediction
Alexandre Strube (alexandre@caos.uab.cat), Dolores Rexachs (dolores.rexachs@uab.es), Emilio Luque (emilio.luque@uab.es)
University Autonoma of Barcelona, Computer Architecture and Operating System Department (CAOS), Barcelona, SPAIN
Abstract
Computers perform differently on different applications. To characterize an application's performance on a machine, the usual method is a thorough execution of it. This work is a step towards a synthetic probe able to characterize a master-worker application's performance in a fraction of the time required to run it entirely. This is especially important for CPU-intensive scientific applications, which run for a very long time and therefore should run as efficiently (and as fast) as possible. Knowing how, and for how long, a master-worker application is going to run can guide the decision of whether or not to use a given machine. Our software probe takes into account only the performance-relevant parts of the application, discovering the program's relevant phases. Running solely these significant phases is a powerful way to quickly characterize the application's performance on a machine. It can help to select the best computing nodes in a grid or in a multi-cluster to run this application, and even to quickly predict the total execution time for this application/data set on the machine analyzed. We also present ongoing work on a fully synthetic probe generated from programs' phases.¹
¹ This work has been supported by the MEC-Spain under contracts TIN 2004-03388 and TIN2007-64974.
1 Introduction
Our objective is to build an application probe that quickly characterizes an application. If the probe is representative enough of the whole program, we can determine the performance of a machine just by running the probe, which will be hundreds, or even thousands, of times quicker than the application's execution.
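The idea of predicting total execution time by running only the significant phases can be illustrated with a simple weighted-sum model. The sketch below is a minimal illustration under our own assumptions, not the authors' method: each phase is assumed to have a stable per-execution time on the target machine (measured once by the probe) and a known repetition count in the full run, so the full-run time is estimated as the sum of time times repetitions. The names `Phase`, `predict_total_time`, `probe_cost`, and `rank_machines`, and the node names and timings in the usage section, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """A performance-relevant phase of the application (hypothetical representation)."""
    name: str
    seconds_per_run: float  # time measured by running this phase once on the target machine
    repetitions: int        # how many times this phase occurs in a full application run


def predict_total_time(phases: list[Phase]) -> float:
    """Estimate the full-run time as the repetition-weighted sum of phase times."""
    return sum(p.seconds_per_run * p.repetitions for p in phases)


def probe_cost(phases: list[Phase]) -> float:
    """Time spent by the probe itself, which runs each phase only once."""
    return sum(p.seconds_per_run for p in phases)


def rank_machines(measurements: dict[str, list[Phase]]) -> list[tuple[str, float]]:
    """Order candidate machines by predicted total execution time, fastest first."""
    predictions = {machine: predict_total_time(phases) for machine, phases in measurements.items()}
    return sorted(predictions.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Illustrative numbers only; these are not measurements from the paper.
    measurements = {
        "node-a": [Phase("compute", 2.0, 1000), Phase("exchange", 0.5, 1000)],
        "node-b": [Phase("compute", 1.5, 1000), Phase("exchange", 0.75, 1000)],
    }
    print(rank_machines(measurements))  # [('node-b', 2250.0), ('node-a', 2500.0)]
    a = measurements["node-a"]
    print(predict_total_time(a) / probe_cost(a))  # 1000.0
```

The weighted sum assumes that each phase's time is stable across repetitions and independent of the other phases; effects such as contention between phases or data-dependent repetition counts would require a richer model.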
Computers deliver different performance depending on the application they are running. The most precise way to determine the performance of a given application on a computer is to run the application itself on that machine, as a benchmark can hardly give us a picture of a machine's performance precise enough to match the behavior of our applications. Instead, most benchmarks reflect a narrow set of applications at best, and it is hard to reflect the behavior of new programs using old benchmarks [20].
In changing heterogeneous environments, such as grids or multi-clusters, where the computational resources are not necessarily known until the execution begins, it is useful to decide quickly whether a machine is good enough to run our application. Running the application thoroughly can be too slow, and running a benchmark is not precise enough.
This work is focused on parallel master/worker applications running on multi-clusters. In those environments, master-worker applications can benefit from correct node selection: on the one hand, by selecting the fastest nodes through an examination of their computation and communication characteristics, and on the other hand, by selecting the nodes that can contribute to the computation while staying busy most of the time. That is, node selection serves both raw performance and overall cluster efficiency. Argollo [4] proposed a methodology to select the best number of nodes in a multi-cluster environment while keeping the efficiency of master/worker applications above a defined threshold. According to his work, efficiency can be achieved through correct node selection.
An advantage of a quick probe of an application has to do with administrative issues. A large application running for hours and disrupting a whole cluster of machines just for testing purposes can be a nuisance, to say the least. A spe-