Scalable approaches to load sharing in the presence of multicasting

Craig E Wills and David Finkel

This work examines policies based on multicasting for load sharing in a local area network environment. In these policies, lightly-loaded nodes join multicast groups to indicate their ability to accept additional work, and heavily-loaded nodes send multicast messages to locate these lightly-loaded nodes. Simulation is used to study the performance of these load sharing policies and to compare them with previously proposed load sharing policies. The results show that multicasting is an efficient method for locating lightly-loaded nodes, yielding better response time than previous policies. In addition, the results show that multicast-based policies can be used to lessen network traffic to busy nodes and nodes on remote LANs, while scaling to large numbers of machines.

Keywords: load sharing, multicasting, distributed systems, local area networks

As network environments grow to contain more machines, the possibility of transferring work from heavily-loaded to lightly-loaded machines is attractive. Studies have shown that in large networks there are many idle workstations at any time, even at peak usage periods in the day [1]. The process of redistributing workload in a network is the problem of load sharing, which has been examined in detail by much previous work. We previously examined approaches to load sharing based on multicasting in a small local network environment [2]. This work specifically examines load sharing policies based on multicasting that can be scaled to networks with hundreds, even thousands of machines. The use of multicasting is appealing because it supports efficient delivery of messages to a subset of machines, as opposed to broadcast, which delivers messages to all machines, or unicast, which delivers messages from one machine to another.
Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA
Paper received: March 1994; revised paper received: 22 May 1994

Multicasting, which supports logical addresses, is used in our work to group lightly-loaded machines together in a single multicast address group. Requests to locate lightly-loaded machines can be multicast to this address rather than probing random machines or broadcasting information between machines. There are three benefits to this approach. First, use of multicasting increases the chance that a lightly-loaded machine can be found efficiently even if the overall load of the network is high. Second, multicasting allows only lightly-loaded nodes to be recipients of load sharing messages. Most load sharing policies involve sending probe messages to potential transfer destinations, or broadcasting state information periodically. Either method interrupts both heavily- and lightly-loaded nodes. Because interrupt handling and context switching are relatively more expensive in modern architectures [3], avoiding interruptions to busy nodes is of even greater importance. The third benefit is to limit traffic on shared resources such as backbone networks that interconnect individual local area networks (LANs). Although bandwidth of the backbone network is a potential bottleneck, the ability of bridges to switch packets between a LAN and the backbone network is often a more serious problem. By using separate multicast addresses to represent lightly-loaded nodes on individual LANs, bridges can be configured to restrict multicast messages to the local LAN or to accept only multicast messages directed specifically to the LAN.

Our approach is to examine load sharing policies based on multicasting in such a way as to make direct comparisons with other load sharing work. We examine how these policies work as we scale to networks of hundreds or thousands of machines using selective multicasting to a LAN.
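The join-and-multicast mechanism described above can be sketched as a small simulation. This is an illustrative sketch only, not the authors' implementation: the class names, the load threshold of two queued tasks, and in-process message delivery (standing in for real IP multicast) are all assumptions.

```python
# Sketch of a multicast-based load sharing policy. Assumptions (not from
# the paper): "lightly loaded" means fewer than 2 queued tasks, and an
# in-process MulticastGroup stands in for a real IP multicast address.

class MulticastGroup:
    """A logical multicast address: delivers a message only to members."""
    def __init__(self):
        self.members = set()

    def multicast(self, message):
        # Only current members (the lightly-loaded nodes) receive the
        # request; busy nodes are never interrupted.
        return [node.receive(message) for node in self.members]

class Node:
    LIGHT_THRESHOLD = 2  # assumed threshold for "lightly loaded"

    def __init__(self, name, group):
        self.name, self.group = name, group
        self.queue_len = 0
        self.messages_handled = 0
        self.update_membership()

    def update_membership(self):
        # Join the group when lightly loaded, leave when load rises.
        if self.queue_len < self.LIGHT_THRESHOLD:
            self.group.members.add(self)
        else:
            self.group.members.discard(self)

    def submit_task(self):
        self.queue_len += 1
        self.update_membership()

    def receive(self, message):
        self.messages_handled += 1
        return self.name

# A heavily-loaded node locates lightly-loaded ones with one multicast.
group = MulticastGroup()
nodes = [Node(f"n{i}", group) for i in range(4)]
for _ in range(3):          # n0 becomes heavily loaded and leaves the group
    nodes[0].submit_task()
replies = group.multicast("can you accept a transferred task?")
print(sorted(replies))      # -> ['n1', 'n2', 'n3']; n0 was not interrupted
```

Note the design point this illustrates: the sender issues a single message regardless of network size, and membership maintenance (join/leave) is the only per-node cost, which is what makes the approach a candidate for scaling to hundreds of machines.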
In addition to increasing the scale, we examine non-homogeneous workloads and the effect of message costs on the policies. The policies are evaluated in terms of the overall task response time, the number of messages transmitted on the network and the number transmitted by each machine. We also examine the number of messages handled by busy machines, those not willing to accept transferred tasks, and the number of

0140-3664/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved
Computer Communications volume 18 number 9 September 1995