RecTOR: A New and Efficient Method for Dynamic Network Reconfiguration

Åshild Grønstad Solheim, Olav Lysne, and Tor Skeie

Networks and Distributed Systems Group, Simula Research Laboratory, Lysaker, Norway
Department of Informatics, University of Oslo, Oslo, Norway

Abstract. Reconfiguration of an interconnection network is fundamental for the provisioning of a reliable service. Current reconfiguration methods either include deadlock-avoidance mechanisms that impose performance penalties during the reconfiguration, or are tied to the Up*/Down* routing algorithm, which achieves relatively low performance. In addition, some of the methods require complex network switches, and some are limited to distributed routing systems. This paper presents a new dynamic reconfiguration method, RecTOR, which ensures deadlock-freedom during the reconfiguration without causing performance degradation such as increased latency or decreased throughput. Moreover, it is based on a simple concept, is easy to implement, is applicable to both source and distributed routing systems, and assumes Transition-Oriented Routing, which achieves excellent performance. Our simulation results confirm that RecTOR provides a better network service to the applications than Overlapping Reconfiguration does.

1 Introduction

Reliable interconnection networks [1] are essential for the operation of current high-performance computing systems. An important challenge in the effort to support a reliable network service is the ability to efficiently restore a coherent routing function when a change has occurred in the interconnection network's topology. Such a change in topology could be the result of an unplanned fault in one of the network's components, and, as the size of systems grows, the probability of a failing component increases. Furthermore, planned system updates, where network components are removed or added, could also cause changes in the topology.
H. Sips, D. Epema, and H.-X. Lin (Eds.): Euro-Par 2009, LNCS 5704, pp. 1052–1064, 2009. © Springer-Verlag Berlin Heidelberg 2009

Regardless of the cause of the topology change, the disturbance of the network service provided to the running applications should be minimized. When a change has occurred, a new routing function must be calculated for the resulting topology, and we refer to the transition from the old routing function to the new one as reconfiguration. A main challenge related to reconfiguration is deadlock-avoidance. The transition from one routing function to another may result in deadlock even if each routing function is deadlock-free, as packets that