A job scheduling approach for multi-core clusters based on virtual malleability

Gladys Utrera¹, Siham Tabik², Julita Corbalan¹, and Jesús Labarta³

¹ Technical University of Catalonia (UPC), 08034 Barcelona, Spain
{gutrera, juli}@ac.upc.edu
² University of Malaga, 29071 Malaga, Spain
stabik@uma.es
³ Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
jesus.labarta@bsc.es

Abstract. Many commercial job scheduling strategies in multiprocessing systems tend to minimize the waiting times of short jobs. However, long jobs cannot be left aside, because their impact on system performance is also decisive. In this work we propose a job scheduling strategy that maximizes resource utilization and improves overall performance by allowing jobs to adapt to variations in the load. The experimental evaluation includes both simulations and executions of real workloads. The results show that our strategy provides significant improvements over the traditional EASY backfilling policy, especially under medium to high machine loads.

Keywords: job scheduling, MPI, malleability

1 Introduction

Modern computational clusters tend to have thousands of execution units [5]. To make these investments profitable, such clusters must serve many users (clients). This leads to a large number of job submissions that often exceeds the cluster's capacity. Figure 1 shows a typical weekly load of the MareNostrum machine [1]. Many of these clusters are composed of nodes with multi-core processors. A multi-core processor integrates two or more complete computational cores on the same chip. Since a processing core can act as an independent processor or CPU, in this work the terms core and CPU are used interchangeably. A job scheduling strategy (JSS) is an algorithm that allocates resources to submitted jobs while applying the system's administrative policies and priorities.
A JSS has to deal with a wide variety of applications, from sequential to highly parallel codes, with execution times that range from minutes to days. This diversity makes comparing two JSSs a difficult task. Because clusters are expensive, user satisfaction is usually the main objective when improving the performance of a JSS. For this reason, waiting times for short jobs that far exceed their execution times are unacceptable. However, long jobs also play an important role in overall performance, and ultimately they affect short jobs as well.
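The abstract compares the proposed strategy with the traditional EASY backfilling policy. As background only, the following Python sketch shows one scheduling pass of EASY backfilling as it is commonly described: jobs start in first-come-first-served (FCFS) order, the first job that cannot start receives a reservation, and later jobs may jump ahead only if they cannot delay that reservation. It assumes a single pool of identical cores and user-supplied runtime estimates; `Job` and `easy_backfill` are hypothetical names, and this is not the authors' simulator or scheduler.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int       # cores requested
    runtime: float   # user-supplied runtime estimate (assumption)


def easy_backfill(now, total_cores, running, queue):
    """One EASY backfilling pass (hypothetical, illustrative sketch).

    running: list of (cores, estimated_end_time) for executing jobs.
    queue:   waiting Jobs in FCFS order; each is assumed to request at
             most total_cores.
    Returns the Jobs to start at time `now`.
    """
    running = list(running)
    queue = list(queue)
    free = total_cores - sum(c for c, _ in running)
    started = []

    # 1. Plain FCFS: start jobs from the head of the queue while they fit.
    while queue and queue[0].cores <= free:
        job = queue.pop(0)
        free -= job.cores
        running.append((job.cores, now + job.runtime))
        started.append(job)
    if not queue:
        return started

    # 2. The head job is blocked: find the earliest time ("shadow time")
    #    at which enough cores will be free, and the cores left over then.
    head = queue[0]
    avail, shadow, extra = free, now, 0
    for cores, end in sorted(running, key=lambda r: r[1]):
        avail += cores
        if avail >= head.cores:
            shadow, extra = end, avail - head.cores
            break

    # 3. Backfill later jobs only if they cannot delay the head job:
    #    either they finish by the shadow time or they fit in the extra cores.
    for job in queue[1:]:
        if job.cores > free:
            continue
        finishes_in_time = now + job.runtime <= shadow
        if finishes_in_time or job.cores <= extra:
            free -= job.cores
            if not finishes_in_time:
                extra -= job.cores  # still holds these cores at shadow time
            started.append(job)
    return started


# Usage: 8 cores, one running job holds 6 cores until t = 10.
running = [(6, 10.0)]
queue = [Job("A", 4, 5.0), Job("B", 2, 3.0), Job("C", 2, 20.0)]
print([j.name for j in easy_backfill(0.0, 8, running, queue)])  # → ['B']
```

Job A is blocked and gets a reservation at t = 10. Job B backfills because it finishes before that reservation, while job C must wait because no free cores remain once B has started.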