Partitioning & Mapping of Unstructured Meshes to Parallel Machine Topologies

C. Walshaw, M. Cross, M. G. Everett, S. Johnson, and K. McManus

Parallel Processing Group, Centre for Numerical Modelling & Process Analysis, University of Greenwich, London, SE18 6PF.
E-mail: C.Walshaw@gre.ac.uk

Abstract. We give an overview of some strategies for mapping unstructured meshes onto processor grids. Sample results show that the mapping can make a considerable difference to the communication overhead in the parallel solution time, particularly as the number of processors increases.

1 Introduction

The use of unstructured mesh codes on parallel machines can be one of the most efficient ways to solve large Computational Mechanics problems. Completely general geometries and complex behaviour can be readily modelled and, in principle, the inherent sparsity of many such problems can be exploited to obtain excellent parallel efficiencies. An important issue for such codes is the problem of distributing the mesh across the memory of the machine at runtime so that the computational load is evenly balanced and the communication overhead is minimised. It is well known that this problem is NP-complete, so in recent years much attention has been focused on developing suitable heuristics, and some powerful methods, many based on a graph corresponding to the communication requirements of the mesh, have been devised, e.g. [2].

A pertinent but often ignored factor in parallel processing is the underlying topology of the machine’s interconnection network. For example, even on machines with small numbers of processors, it is possible to detect variations between the latencies of processors which are closely linked and those which are ‘far apart’. Although most machines now have facilities for ‘wormhole routing’ (i.e.
the passing of messages between two non-adjacent processors without interrupting intermediate processors), high contention of the interprocessor links can result if adjacent partitions are mapped to, say, opposite corners of a processor array. As the trend towards massively parallel machines continues, these effects are likely to be exacerbated and the machine topologies will have an increasingly important effect on the parallel overhead arising from any given partition.

Most of the current generation of mesh partitioning algorithms, however, take no account of the topology. The mapping to the machine is either treated as a post-processing step, applied after the data has been partitioned, or even ignored. For some machines with small numbers of processors this may be a legitimate simplification, but as machine sizes increase it is likely that a poor mapping will cause significant performance degradation.

In: A. Ferreira and J. Rolim, editors, Proc. Irregular ’95: Parallel Algorithms for Irregularly Structured Problems, volume 980 of LNCS, pages 121-126. Springer, 1995.
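
As a rough illustration of why the mapping matters, the sketch below scores a placement of partitions on a processor grid by summing, over every pair of communicating partitions, the message volume multiplied by the number of hops between their processors. The hop-weighted cost, the function names, the 2 x 2 grid without wraparound links and the communication volumes are all assumptions made for illustration; they are not taken from the paper and are not its cost model.

```python
def hop_distance(p, q):
    """Manhattan distance between two processors on a 2D grid (no wraparound)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def mapping_cost(comm, placement):
    """Sum of (message volume x hop count) over all communicating partition pairs."""
    return sum(
        volume * hop_distance(placement[a], placement[b])
        for (a, b), volume in comm.items()
    )


# Hypothetical data: four partitions forming a chain 0-1-2-3,
# each adjacent pair exchanging 10 units of data.
comm = {(0, 1): 10, (1, 2): 10, (2, 3): 10}

# Placement that keeps adjacent partitions on neighbouring processors.
good = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}

# Placement that puts some adjacent partitions on opposite corners.
poor = {0: (0, 0), 1: (1, 1), 2: (0, 1), 3: (1, 0)}

print(mapping_cost(comm, good))  # 30
print(mapping_cost(comm, poor))  # 50
```

The partition itself is identical in both cases; only the assignment of partitions to processors changes, which is the effect a topology-unaware partitioner cannot see.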