Random Seeking: A General, Efficient, and Informed Randomized Scheme for Dynamic Load Balancing

Nihar R. Mahapatra and Shantanu Dutt
{mahapatra, dutt}@ee.umn.edu
Department of Electrical Engineering, University of Minnesota, Minneapolis, MN 55455

Abstract

We propose a completely general, informed randomized dynamic load balancing method called random seeking (RS), suitable for parallel algorithms with characteristics found in many search algorithms used in artificial intelligence and operations research and in many divide-and-conquer algorithms. In it, source processors randomly seek out sink processors for load balancing by flinging “probe” messages. These probes not only locate sinks, but also collect load distribution information, which is used to efficiently regulate load balancing activities. We empirically compare RS with a well-known randomized dynamic load balancing method, the random communication (RC) strategy, by using them in parallel best-first branch-and-bound algorithms on up to 512 processors of an nCUBE2 multicomputer. We find that the RC execution times exceed those of RS by 8–67% when used to perform combined dynamic quantitative and qualitative load balancing, and by 5–74% when used to perform just dynamic quantitative load balancing.

1. Introduction

In this paper, we consider randomized methods for dynamic load balancing in parallel algorithms with the following characteristics. (1) The work available at any processor either (a) consists of independent work pieces or (b) can be partitioned into such pieces as long as it is more than some non-decomposable unit; in the latter case, work partitioning takes much less time than work processing. (2) The time to transfer a piece of work from one processor to another is small compared to its processing time. (3) It is not possible, or is very difficult, to estimate the processing time for a piece of work.
These are characteristics of many search algorithms used in artificial intelligence and operations research and of many divide-and-conquer algorithms [6]. Randomized methods are of interest because of their simplicity, ease of implementation, and good performance. Since different work pieces in the above applications can be of widely differing and unpredictable sizes and/or quality, these applications may in general require either combined dynamic quantitative and qualitative load balancing (i.e., balancing of both “quantity” and “quality” of work pieces between different processors), or only dynamic quantitative load balancing.

The random communication (RC) strategy of [4, 5] is a well-known method for performing both dynamic quantitative and qualitative balancing. In it, a processor, on generating a new piece of work, transfers it to a random processor. The random quantitative load balancing schemes of [7, 9] are similar to the RC strategy. Although randomization ensures that some work is sent from source to sink processors in the RC strategy, there is a reasonable likelihood of useless work transfers between source processors, between sink processors, and from sink to source processors (the last of which actually aggravate the existing load imbalance). Also, because of these useless transfers, the overhead per useful work transfer can be quite high. For these reasons, there is substantial scope for obtaining better performance with some type of informed randomized method that avoids these pitfalls. The random seeking (RS) strategy described in the next section is such a method.

This research was funded in part by a Grant-in-Aid from the University of Minnesota and in part by NSF grant MIP-9210049. Sandia National Labs provided access to their 1024-processor nCUBE2 parallel computer.

2. The Random Seeking Strategy

In this section, we describe our general RS strategy for informed, randomized dynamic load balancing.
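As a point of contrast for the strategy described in this section, the concern raised in the introduction about useless RC transfers can be illustrated with a small enumeration. This is a sketch under stated assumptions, not the authors' analysis: the thresholds `low` and `high`, the function names, and the example loads are all hypothetical, and the sketch assumes that every ordered pair of distinct processors is an equally likely (sender, receiver) pair, which simplifies RC (where the sender is whichever processor has just generated work).

```python
from collections import Counter
from itertools import permutations


def classify(load, low, high):
    """Label a processor by its load. The thresholds are illustrative
    assumptions, not definitions taken from the paper."""
    if load > high:
        return "source"
    if load < low:
        return "sink"
    return "neutral"


def rc_transfer_breakdown(loads, low, high):
    """Return the fraction of ordered (sender, receiver) pairs in each
    role category, assuming all ordered pairs of distinct processors
    are equally likely (a simplification of RC)."""
    roles = [classify(x, low, high) for x in loads]
    counts = Counter(
        (roles[s], roles[r]) for s, r in permutations(range(len(loads)), 2)
    )
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical work-piece counts on eight processors:
# three heavily loaded processors and five lightly loaded ones.
loads = [9, 8, 7, 1, 1, 1, 1, 1]
breakdown = rc_transfer_breakdown(loads, low=2, high=6)

for (sender, receiver), fraction in sorted(breakdown.items()):
    print(f"{sender:>6} -> {receiver:<6} {fraction:.3f}")
```

With three sources and five sinks among eight processors, only 15 of the 56 ordered pairs are source-to-sink, so under these assumptions roughly 27% of random transfers would reduce the imbalance; an equal share would run from sink to source and worsen it.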
Every processor has a load attribute associated with it that characterizes its work load and is application dependent (e.g., it may be the number or cost of the work pieces the processor has). Depending upon their load attributes, any two processors may have a source-sink, peer-peer, or sink-source relationship. The load attribute should be defined so that: (1) a source-sink relationship implies that the first processor is more likely than the second to have useful work load to process, and can grant some of its work load to the latter to make it equally likely to perform useful computation; (2) a peer-peer relationship indicates that if one processor is performing useful computation, then the other is also likely to be doing the same; and (3) a sink-source relationship signifies the converse of case (1).

The RS strategy strives to establish peer-peer relationships between all processor pairs by probabilistically locating source-sink pairs via “probe” messages flung to random processors and transferring work from the sources to the corresponding sinks, and thereby attempts to maximize processor utilization. These probes not only locate sinks, but also collect load distribution information, which is used to efficiently regulate load balancing activities. RS is designed to take advantage of the fact that in most parallel algorithms with the characteristics mentioned earlier, the load distribution across processors does not change drastically in a short time (i.e., most processors that are sources at one instant do not become sinks the next, and vice versa). Even if this is not true for an application, RS will still perform better than RC, since it performs only useful work transfers from source to sink processors. The load balancing overhead of RS is directly related to how stringent the load balancing requirement implied by peer-peer relationships is, which should therefore be chosen