Hierarchical Partitioning and Dynamic Load Balancing for Scientific Computation

James D. Teresco¹, Jamal Faik², and Joseph E. Flaherty²

¹ Department of Computer Science, Williams College, Williamstown, MA 01267, USA
terescoj@cs.williams.edu
² Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
{faikj,flaherje}@cs.rpi.edu

Abstract. Cluster and grid computing has made hierarchical and heterogeneous computing systems increasingly common as target environments for large-scale scientific computation. A cluster may consist of a network of multiprocessors. A grid computation may involve communication across slow interfaces. Modern supercomputers are often large clusters with hierarchical network structures. For maximum efficiency, software must adapt to the computing environment. We focus on partitioning and dynamic load balancing, in particular on hierarchical procedures implemented within the Zoltan Toolkit, guided by DRUM, the Dynamic Resource Utilization Model. Here, different balancing procedures are used in different parts of the domain. Preliminary results show that hierarchical partitionings are competitive with the best traditional methods on a small hierarchical cluster.

Modern three-dimensional scientific computations must execute in parallel to achieve acceptable performance. Target parallel environments range from clusters of workstations to the largest tightly-coupled supercomputers. Hierarchical and heterogeneous systems are increasingly common as symmetric multiprocessing (SMP) nodes are combined to form the relatively small clusters found in many institutions as well as many of today's most powerful supercomputers. Network hierarchies arise as grid technologies make Internet execution more likely and modern supercomputers are built using hierarchical interconnection networks.
MPI implementations may exhibit very different performance characteristics depending on the underlying network and message passing implementation (e.g., [32]). Software efficiency may be improved using optimizations based on system characteristics and domain knowledge. Some have accounted for clusters of SMPs by using a hybrid programming model, with message passing for inter-node communication and multithreading for intra-node communication (e.g., [1, 27]), with varying degrees of success, but always with an increased burden on programmers, who must program both levels of parallelization explicitly.

Our focus has been on resource-aware partitioning and dynamic load balancing, achieved by adjusting target partition sizes or the choice of a dynamic