Framework for performance engineering of OSPF software

H. El-Sayed, M. Ahmed, M. Jaseemuddin and D.C. Petriu

Abstract: A performance engineering study of Open Shortest Path First (OSPF) routing protocol software is presented, which includes measuring its performance, analysing the results, proposing solutions for improvement and evaluating their effect. First, a reusable framework for evaluating the performance of routing software is proposed, which allows reproducible experiments to be performed in a controlled environment with different network topologies. Then, performance bottlenecks are identified and the relative performance of several low-level optimisations suggested to improve the route computation code and data structures is discussed. In addition, the design and implementation of an algorithm-level optimisation, based on the Incremental Shortest Path First (ISPF) algorithm, is presented together with its performance benefits. ISPF achieves substantial performance gains, greater than those obtainable from code optimisation techniques and from efficient data structures in the implementation of Dijkstra’s SPF algorithm. Finally, the effect of topological change on the size of the affected subtree is investigated, and it is found that most of the time a topological change affects a small number of nodes in an OSPF area, causing a small number of route updates in the routing table and, consequently, a shorter execution time for ISPF.

1 Introduction

Routing protocols are a critical component of the Internet. Their performance in laying down the data path is crucial to achieving high performance for data delivery within a network. Routing protocols update their routing tables in response to network changes. For example, failures of communication links or routers in the network can change the optimal routes to certain destinations.
It takes some time for any routing protocol to compute new stable optimal routes after a network change, and the routes used in the interim might be sub-optimal or even non-functional. The process of finding new optimal routes after the network changes is called convergence [1, 2]. The convergence time of a routing protocol should be short to avoid packet losses due to transient routing black-holes, which can occur when a non-functional route is used while routing converges to the new stable optimal routes. Routing protocols with short convergence times are important for building high-performance stable networks.

Open Shortest Path First (OSPF) is widely used as an intra-domain routing protocol in IP networks today [1]. OSPF is a link-state protocol, in which each router generates and reliably floods Link State Advertisements (LSAs) to create and maintain a local, consistent view of the topology of the entire routing domain. Currently, there are several OSPF implementations available, which are generally robust and of high quality [3, 4]. However, the performance of these implementations in large operational IP networks is not well understood, especially under transient stress [5, 6].

Service level agreements and quality assurance depend on routing stability. Any internal topological or OSPF configuration change generally alters traffic flows throughout the network, especially during the convergence time. Further, intra-domain routing changes also cause inter-domain routing changes, since the Border Gateway Protocol uses intra-domain (OSPF) distance calculations to break ties between candidates for traffic egress points [7]. Thus, OSPF events potentially impact a very large number of flows and a very large number of customers [8].
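To make the link-state idea concrete, the following is a minimal Python sketch, not taken from any OSPF implementation discussed here. It assumes the flooded LSAs have already been assembled into a weighted topology graph, and it computes a router's routing table with Dijkstra's algorithm, the shortest-path computation conventionally used by OSPF routers. The function names `spf` and `without_link`, the four-router topology and the link costs are all hypothetical and chosen only for illustration.

```python
import heapq


def spf(topology, root):
    """Dijkstra's SPF over a link-state view of the network.

    topology: {router: {neighbour: link_cost}} with non-negative costs.
    Returns {destination: (cost, first_hop)} as seen from `root`.
    """
    dist = {root: 0}
    first_hop = {root: None}
    heap = [(0, root)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry left over from an earlier, worse path
        done.add(node)
        for nbr, link_cost in topology[node].items():
            new_cost = cost + link_cost
            if nbr not in dist or new_cost < dist[nbr]:
                dist[nbr] = new_cost
                # A direct neighbour of the root is its own first hop;
                # otherwise inherit the first hop of the node we came through.
                first_hop[nbr] = nbr if node == root else first_hop[node]
                heapq.heappush(heap, (new_cost, nbr))
    return {d: (dist[d], first_hop[d]) for d in dist if d != root}


def without_link(topology, u, v):
    """Return a copy of the topology with the bidirectional link u-v removed."""
    return {
        n: {m: c for m, c in nbrs.items() if {n, m} != {u, v}}
        for n, nbrs in topology.items()
    }


# Hypothetical four-router area (illustrative costs only).
topology = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}

before = spf(topology, "A")
after = spf(without_link(topology, "B", "C"), "A")  # link B-C fails
changed = {d for d in after if after[d] != before.get(d)}
```

In this toy topology, the failure of link B-C changes router A's routes to C and D, while its route to B is unaffected. Until A has recomputed its table, packets for C and D would still be forwarded towards B along the stale routes, which is the interim period that convergence time measures.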
A number of key tasks internal to OSPF implementations affect the speed at which updates propagate in the network, the load on individual routers, and the time needed for both intra-domain and inter-domain routing to re-converge following an internal topology or configuration change. Hence, it is important to characterise the performance of any given OSPF implementation, in order to ensure its efficiency and to improve the understanding of its performance after deployment. Eramo et al. [9] show that, through proper analysis, the performance of an open-source OSPF implementation can be improved to the extent that it outperforms a widely used commercial implementation.

© The Institution of Engineering and Technology 2006
IEE Proceedings online no. 20060032
doi:10.1049/ip-sen:20060032
Paper first received 21st June and in revised form 24th October 2006
H. El-Sayed is with UAE University, Al-Ain, PO Box 17555, UAE
M. Ahmed is with Juniper Networks, Sunnyvale, CA, USA
M. Jaseemuddin is with Ryerson University, Toronto, Canada, ON M5B 2K3
D.C. Petriu is with Carleton University, Ottawa, Canada, ON K1S 5B6
E-mail: helsayed@uaeu.ac.ae
IEE Proc.-Softw., Vol. 153, No. 6, December 2006

In this paper, we present a framework for measuring the performance of the OSPF routing protocol. Our main focus is to measure the overheads involved in the routing table computation, which is known to be one of the most CPU-intensive activities within OSPF-based routers [1, 8]. Dijkstra’s Shortest Path First (DSPF) algorithm [10] is the de facto standard used in almost all OSPF implementations for routing table computation. We have developed a set of white-box tests to analyse and characterise the performance