Using On-the-Fly Simulation for Estimating the Turnaround Time on Non-dedicated Clusters⋆

Mauricio Hanzich², Josep L. Lérida¹, Matías Torchinsky², Francesc Giné¹, Porfidio Hernández², and Emilio Luque²

¹ Dept. Computer Science, University of Lleida, Spain
{sisco, jlerida}@diei.udl.es
² Dept. Computer Architecture and Operating Systems, University Autònoma of Barcelona, Spain
{porfidio.hernandez, emilio.luque}@uab.es, {mauricio, matias}@aomail.uab.es

Abstract. In almost every university, the computing capacity of the workstations in an open laboratory is sufficient to execute not only the local workload but also some distributed computation. Unfortunately, the local workload introduces considerable uncertainty into the predictability of the system, which hinders the applicability of job scheduling strategies. In this work, we incorporate a simulator into our job scheduling system, termed CISNE, which allows its scheduling decisions to be enhanced by estimating the future state of the cluster. This estimation process is backed by analytic procedures, which are also described in this study. Likewise, the simulation allows us to guarantee an upper limit on the turnaround time for the parallel user. This paper analyses the performance of the simulation process in relation to different scheduling policies. The results reveal that policies that respect an FCFS order for the waiting jobs are more predictable than those that alter the job ordering, such as Backfilling.

1 Introduction

Several studies [1] have revealed that a high percentage of the computing resources (CPU and memory) in a Network Of Workstations (NOW/Cluster) are idle. The possibility of using this computing power to execute distributed applications with a performance equivalent to that of a Massively Parallel Processor (MPP), without perturbing the performance of the local users' applications on each workstation, has led to proposals for new resource management environments [2,3].
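To make the estimation idea in the abstract more concrete, the following is a minimal Python sketch of an on-the-fly simulation under a strict FCFS policy: starting from the current cluster state, it replays the waiting queue and predicts when each job would start and what its turnaround time would be. It is not CISNE's simulator or its analytic procedures. The `Job` fields, the `estimate_fcfs` function, and its input format are hypothetical, and the sketch assumes dedicated nodes and exact runtime estimates, ignoring the local workload and the time sharing that CISNE also has to handle.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job description used only for this sketch.
    name: str
    nodes: int       # number of nodes requested
    runtime: float   # estimated execution time (assumed exact)
    submit: float    # submission time


def estimate_fcfs(total_nodes, running, queue, now):
    """Simulate strict FCFS from the current state.

    running: list of (remaining_time, nodes) for jobs already executing.
    queue:   list of Job objects in arrival order.
    Returns a dict mapping each queued job's name to its estimated turnaround time.
    """
    # Min-heap of (finish_time, nodes) for every job currently holding nodes.
    busy = [(now + remaining, nodes) for remaining, nodes in running]
    heapq.heapify(busy)
    free = total_nodes - sum(nodes for _, nodes in running)
    clock = now  # never decreases, so no job starts before an earlier one (FCFS)
    estimates = {}
    for job in queue:
        if job.nodes > total_nodes:
            raise ValueError(f"job {job.name} requests more nodes than the cluster has")
        # Release nodes in order of finish time until the job fits.
        while free < job.nodes:
            finish, nodes = heapq.heappop(busy)
            clock = max(clock, finish)
            free += nodes
        start = clock
        heapq.heappush(busy, (start + job.runtime, job.nodes))
        free -= job.nodes
        estimates[job.name] = start + job.runtime - job.submit
    return estimates


if __name__ == "__main__":
    # 4-node cluster; one running job holds 2 nodes for 10 more time units.
    queue = [Job("A", 4, 5, 0), Job("B", 1, 3, 0)]
    print(estimate_fcfs(4, [(10, 2)], queue, now=0))
```

In this example, A must wait for the running job to release its nodes at time 10, and B cannot overtake A under FCFS, so B starts only when A finishes at time 15. A backfilling policy could instead start B at time 0 on the two idle nodes without delaying A; the sketch deliberately does not model that reordering.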
⋆ This work was supported by the MEyC-Spain under contract TIN 2004-03388.
W.E. Nagel et al. (Eds.): Euro-Par 2006, LNCS 4128, pp. 177–187, 2006.
© Springer-Verlag Berlin Heidelberg 2006

With the aim of taking advantage of these idle computing resources (CPU and memory) available across the cluster, we have developed a new scheduling environment, named CISNE [3], which combines space-sharing and time-sharing scheduling techniques. The space-sharing scheduling component of CISNE is a job scheduler named LoRaS (Long Range Scheduler). When a parallel job is submitted to LoRaS, the job waits in a queue until it is scheduled and