Dynamic Processor Assignment in a Task System with Time-varying Load

Anna Brunstrom, Rahul Simha
brunstro@cs.wm.edu, simha@cs.wm.edu
Department of Computer Science
The College of William and Mary
Williamsburg, VA 23185

Abstract- In many applications a task is repeatedly executed on several sets of data in a pipeline fashion. For example, image processing software is frequently executed on a sequence of images. Due to varying semantic content in the data, each subtask of the overall task may experience variations in execution time for different instances of the data. We consider the problem of efficiently executing a task on a large parallel machine. In particular, we focus on dynamically assigning processors to subtasks in response to changing workloads seen by the subtasks. We present several processor assignment algorithms and study their performance through simulation. Our simulation study is based on an application in computer vision. Our results suggest dynamic re-assignment can perform very close to the theoretical optimum and distinctly better than static assignments.

I. INTRODUCTION

As parallel machines with large numbers of processors get cheaper and more accessible, it is anticipated that several generic computations which are now executed on serial machines will be ported to large parallel machines to obtain improved response times. In this paper, we consider task systems and parallel machines in which the number of processors exceeds the number of subtasks and in which the task structure is known. We are particularly interested in tasks for pipelined applications wherein the task is repeatedly executed on different data sets.
For example, image processing and computer vision applications execute the same subtasks on each image in a sequence of images; in this case, the number of subtasks is usually small (about 10 subtasks), the number of processors in the target parallel machines is larger (typically more than 32 processors), and the relationship between the subtasks (a serial pipeline) is known. It is often the case that benchmark studies provide response time information about each individual subtask, i.e., the response time for each subtask given that a certain number of processors are assigned to the subtask. A processor assignment problem then naturally arises [2]: given that the subtasks must execute in parallel, how should the (large number of) processors be divided among the subtasks? In [2] algorithms are presented for an optimal assignment of processors to subtasks in pipeline computations with static or unchanging response times.

0-7803-2642-3/95/$4.00 © 1995 IEEE

In this paper, we present algorithms to allocate processors to tasks in which response times for subtasks change with time. In applications, this variability in time is typically caused by heterogeneity in the data. For example, in a computer vision system, the semantic content of images often determines how long it takes for some subtasks to process the images. In this case, it is desirable to dynamically alter the assignment of processors to subtasks in order to improve overall response times. We present two classes of algorithms: algorithms which rely on extensive accumulated information and, on the other hand, simple heuristics that move a few processors to heavily loaded subtasks. Our simulation study compares their performance relative to each other and to theoretically optimal assignments. Our investigation is based on a parallelized computer vision system consisting of a simple pipeline of nine subtasks, in which subtasks synchronize after processing an image.
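To make the idea of moving processors toward heavily loaded subtasks concrete, the following is a minimal sketch and not one of the algorithms studied in this paper. It assumes a hypothetical linear-speedup model in which a subtask with workload `w` on `p` processors takes `w / p` time units, and it assumes that, because subtasks synchronize after each image, the time per image is set by the slowest subtask. The names `stage_times` and `rebalance_once` are illustrative only.

```python
def stage_times(work, procs):
    """Per-subtask response times under an ASSUMED linear-speedup model."""
    return [w / p for w, p in zip(work, procs)]


def rebalance_once(work, procs):
    """Move one processor from the least loaded subtask to the most loaded one.

    Returns a new assignment and leaves the input unchanged. The move is
    skipped when no other subtask can spare a processor or when it would
    not lower the bottleneck (slowest-subtask) time.
    """
    times = stage_times(work, procs)
    # Index of the bottleneck subtask (largest response time).
    slow = max(range(len(procs)), key=times.__getitem__)
    # Candidate donors must keep at least one processor after the move.
    donors = [i for i in range(len(procs)) if i != slow and procs[i] > 1]
    if not donors:
        return list(procs)
    fast = min(donors, key=times.__getitem__)
    trial = list(procs)
    trial[slow] += 1
    trial[fast] -= 1
    # Accept the move only if the time per image actually improves.
    if max(stage_times(work, trial)) < max(times):
        return trial
    return list(procs)


if __name__ == "__main__":
    work = [8.0, 2.0, 2.0]   # hypothetical workloads for three subtasks
    procs = [2, 2, 2]        # initial static assignment
    for _ in range(3):
        procs = rebalance_once(work, procs)
        print(procs, max(stage_times(work, procs)))
```

Calling `rebalance_once` repeatedly as workloads change shifts processors toward the current bottleneck one at a time. A real system would use measured response times in place of the assumed `w / p` model and would account for the cost of moving a processor.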
Some other related processor assignment problems have been studied in the literature, mostly focused on static allocations. An approximation algorithm for assigning processors to a set of independent tasks is given by Krishnamurti and Ma in [10]. A comparison of several algorithms for static assignment of processors during initialization of an application is presented in [14]. The algorithms are based on varying degrees of information on the parallelism inherent in the application. In [15] an algorithm is presented for mapping a DAG, representative of the tasks to be performed and their dependencies, to the Pipelined Image-Processing Engine. Petri nets are used to provide