DRAFT

CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2000; 00:1–7

Reparallelization techniques for migrating OpenMP codes in computational grids

Michael Klemm, Matthias Bezold, Stefan Gabriel, Ronald Veldema, and Michael Philippsen
Computer Science Department 2 • University of Erlangen-Nuremberg
Martensstr. 3 • 91058 Erlangen • Germany
{klemm, veldema, philippsen}@cs.fau.de, bezold@msbezold.de, stefan-gabriel@gmx.net

SUMMARY

Typical computational grid users target only a single cluster and have to estimate the runtime of their jobs. Job schedulers prefer short-running jobs to maintain high system utilization. If the user underestimates the runtime, premature termination causes loss of computation; overestimation is penalized by long queue times. As a solution, we present automatic reparallelization and migration of OpenMP applications. A new OpenMP work distribution is computed dynamically whenever the number of CPUs changes, and the application can be migrated between clusters when an allocated time slice is exceeded. Migration is based on a coordinated, heterogeneous checkpointing algorithm. Together, reparallelization and migration enable the user to freely use computing time at more than a single point of the grid. Our demo applications successfully adapt to changed CPU settings and migrate smoothly between, for example, clusters in Erlangen, Germany, and Amsterdam, the Netherlands, that use different kinds and numbers of processors. Benchmarks show that reparallelization and migration impose average overheads of about 4% and 2%, respectively.

1. Introduction

While offering novel computing opportunities, the boundaries between the individual clusters of a computational grid are still visible to users. In addition to the problem of heterogeneity (e.g.,
different architectures and different interconnects), the user is faced with each cluster's job-scheduling mechanism, which assigns computing resources to jobs. When a job is submitted to the cluster, the scheduler asks the user for an estimate of the job's runtime. From all submitted jobs, the scheduler then creates an execution plan that assigns the jobs to nodes. Usually, the scheduler prefers short-running jobs over long-running ones, and jobs that need only a few CPUs over more demanding ones. Short jobs with only a few CPUs increase the cluster's utilization, while long-running jobs or jobs that require many CPUs often cause

Received December 6th, 2006
Copyright © 2000 John Wiley & Sons, Ltd.