Processing data intensive Matlab jobs through Condor

Fanar M. Abed, Stephen McGough
Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
Email: {f.m.al-fadhly, stephen.mcgough}@newcastle.ac.uk

Abstract

Condor provides a powerful job invocation environment which is capable of successfully executing large sets of parameter sweep jobs. However, eviction of jobs from execution nodes can become expensive if the amount of data which needs to be sent to a node is large (such as Lidar data) and checkpointing and migration are not possible. In this paper we propose a mechanism which can be used alongside Condor to convert the normal Condor job push model into a pull model, in which sweeps are separated from data transfer, allowing multiple sweeps to run on a node. This allows for smaller sub-jobs and better efficiency with respect to data transfer. We exemplify this work through the analysis of Lidar data processed using Matlab code.

1 Introduction

The Condor system [8] provides a high throughput environment for processing computationally independent runs of executions, often referred to as parameter sweep operations, where many similar jobs are run changing only the input parameters. Many Condor deployments exploit cycle stealing, where idle execution time on computers normally used for other purposes (such as an open access cluster within a University) is used to run Condor jobs. This lends itself well to sweeps of jobs which require little data transfer and short execution times, as the eviction of a Condor job from a computer, when it reverts to normal use, then has little impact on the overall flow of the sweep. Conversely, if the amount of time required to stage data to or from a computational resource is high, there is a desire to perform the maximum amount of work on that resource to reduce the effective overhead of transferring the data.
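The amortisation argument above can be made concrete with a small back-of-the-envelope model. This is a sketch with hypothetical numbers chosen for illustration, not measurements from the paper:

```python
def effective_overhead(transfer_time_s, sub_jobs_per_staging):
    """Staging cost amortised over the sub-jobs that reuse the data.

    Both arguments are hypothetical inputs for illustration: the time
    to stage the data set once, and how many sub-jobs run against it
    on the same node before it is discarded.
    """
    return transfer_time_s / sub_jobs_per_staging

# Staging a (hypothetical) 600 s Lidar transfer once for 30 sub-jobs
# costs 20 s per sub-job, versus the full 600 s per job if every job
# re-stages the data for itself.
print(effective_overhead(600, 30))  # 20.0
print(effective_overhead(600, 1))   # 600.0
```

The more sub-jobs that can be run against one staged copy of the data, the smaller the per-job share of the transfer cost, which is the motivation for keeping data resident on the node across sweeps.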
Condor provides the ability to perform checkpointing and migration of executions on remote computers, along with file transparency, where inputs and outputs from a user's program are staged back to the submitting computer. This does, however, require that the user can compile their code against the Condor libraries and that the jobs run under a UNIX based operating system. This is not always possible - for example when using a commercial package such as Matlab. It would nevertheless be desirable to provide some equivalent functionality to help reduce failed execution time.

In this abstract we propose an execution environment for use within Condor which provides the following benefits:

- When data is staged to a Condor computer it can be used many times.
- Data generated on the Condor computer is staged back to the submission computer as soon as possible.

We therefore separate the data staging part of the Condor job submission from the job deployment phase and provide a mechanism for returning data to the user while code is still running on the remote computer. The user is required to provide new logic in the form of how to process the returned data and how to deal with incomplete returns, where the job is evicted before it completes execution. Our approach lends itself best to programs where a large data set is used repeatedly, which means that a large number of jobs need to be run concurrently on Condor.

2 General Architecture

Our general architecture separates the data staging and the job execution stages of a Condor submission. Figure 1 illustrates the overall architecture: a server distributes sub-jobs, together with the data and executable, to a set of Condor clients, each of which performs sub-job and data processing.

Figure 1: The general architecture
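The pull-model client described above might be sketched as follows. The queue-based server stand-in and the names `run_client`, `sub_jobs` and `results` are our own illustrative assumptions, not the paper's implementation:

```python
import queue


def run_client(sub_jobs, results, process):
    """Pull-model client sketch: the data set and executable are staged
    to the node once, then the client keeps pulling sub-jobs until the
    server has none left, returning each result as soon as it is
    produced.  All names here are illustrative assumptions, not the
    paper's actual implementation."""
    # Data and executable are staged to the node once, up front.
    staged_data = "lidar-tile"  # stand-in for the large data set
    while True:
        try:
            params = sub_jobs.get_nowait()  # pull the next sweep point
        except queue.Empty:
            break  # no work left on the server
        # Run one sweep over the already-staged data ...
        result = process(staged_data, params)
        # ... and stage the output back immediately, so a later
        # eviction loses at most the sub-job currently in progress.
        results.put((params, result))


# Usage: three parameter-sweep points processed by one client against
# a single staged copy of the data.
jobs, out = queue.Queue(), queue.Queue()
for p in (1, 2, 3):
    jobs.put(p)
run_client(jobs, out, lambda data, p: f"{data}:{p}")
print(out.qsize())  # 3
```

Because each result is pushed back as soon as it is produced, an eviction part-way through the sweep leaves only the in-flight sub-job to be rerun, rather than the whole staged batch.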