Improving Scalability and Robustness of NQOSP Algorithm in Dynamic Traffic’s Network

Said Hoceini, Abdelhamid Mellouk, Yacine Amirat
Image, Signal and Intelligent Systems Lab – LISSI
University of Paris XII-Val de Marne, IUT de Créteil-Vitry
120-122, Rue Paul Armangot - 94400 Vitry / Seine - FRANCE
Tel.: 00 33 (0)1 41 80 73 74 - Fax: 00 33 (0)1 41 80 73 76
E-mail: {hoceini, mellouk, amirat}@univ-paris12.fr

Abstract. This paper improves the scalability and robustness of our earlier approach, an adaptive packet routing algorithm based on reinforcement learning called N Q-routing Optimal Shortest Paths (NQOSP). In contrast with other algorithms that are also based on Reinforcement Learning (RL) methods, NQOSP combines a multi-path routing technique with the Q-Routing algorithm. The exploration space is limited to the N best loop-free paths in terms of hop count (the number of routers in a path), which substantially reduces convergence time. Moreover, each router uses an online learning module to optimize the path in terms of average packet delivery time. The performance of NQOSP is evaluated experimentally with the OPNET simulator for different levels of traffic load and compared to the standard shortest path, N-best and Q-routing algorithms on a large interconnected network. Our approach proves superior to the classical algorithms and routes efficiently even when critical aspects of the network, such as broken links, are allowed to vary dynamically.

1. Introduction

Conventional routing algorithms are based on the hop-by-hop shortest-path paradigm. The source of a packet specifies the address of the destination, and each router along the route forwards the packet to the neighbor located “closest” to the destination. The best path is chosen according to given criteria.
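The hop-by-hop shortest-path paradigm described above can be illustrated with a minimal sketch. It assumes that “closest” means fewest hops and uses a breadth-first search to build each router's next-hop table. The topology and the function names `next_hop_table` and `forward` are hypothetical and serve only as an illustration; they are not part of NQOSP.

```python
from collections import deque

def next_hop_table(graph, source):
    """Map each reachable destination to the neighbor of `source`
    that lies on a minimum-hop path (breadth-first search)."""
    first_hop = {}
    visited = {source}
    queue = deque()
    # Direct neighbors are their own first hop.
    for nb in graph[source]:
        if nb not in visited:
            visited.add(nb)
            first_hop[nb] = nb
            queue.append(nb)
    # Farther nodes inherit the first hop of the node that discovered them.
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                first_hop[nb] = first_hop[node]
                queue.append(nb)
    return first_hop

def forward(graph, source, destination):
    """Hop-by-hop forwarding: every router consults only its own table.
    Raises KeyError if the destination is unreachable."""
    path = [source]
    current = source
    while current != destination:
        current = next_hop_table(graph, current)[destination]
        path.append(current)
    return path

# Hypothetical 5-router topology (adjacency lists), for illustration only.
TOPOLOGY = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D", "E"],
    "D": ["B", "C", "E"],
    "E": ["C", "D"],
}

if __name__ == "__main__":
    print(forward(TOPOLOGY, "A", "E"))
```

Because the tables depend only on the topology, the chosen path stays the same however congested a router becomes, which is the limitation of static shortest-path routing under heavy load.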
When the network is heavily loaded, some of the routers introduce an excessive delay while others are under-utilized. In some cases, this non-optimized usage of the network resources may introduce not only excessive delays but also a high packet loss rate. Among the routing algorithms extensively employed in Autonomous System routers, one can note distance vector algorithms such as RIP [1] and link state algorithms such as OSPF [2]. These kinds of algorithms do not take variations of load into account, which limits their performance.

Many studies have sought an alternative routing paradigm that would integrate dynamic criteria. The most popular formulation of the optimal distributed routing problem in a data network is based on a multicommodity flow optimization, whereby a separable objective function is minimized with respect to the types of flow, subject to multicommodity flow constraints [3]. However, due to their complexity and increased processing burden, few of the proposed routing schemes have been accepted for the Internet. We list here some QoS-based routing algorithms proposed in the literature:

QOSPF (Quality Of Service Path First) [4] is an extension of OSPF. Combined with a reservation protocol, this QoS routing protocol makes it possible to announce to all the routers the capacity of the links to support QoS constraints.

MPLS (Multiprotocol Label Switching) [5] is a protocol that assigns a fixed path to each flow toward its destination. It is based on the concept of label switching. A traffic characterization [6], representing the required QoS, is associated with each flow.

Wang-Crowcroft algorithm [7] consists of finding a bandwidth-delay-constrained path by Dijkstra’s

Proceedings of the Joint International Conference on Autonomic and Autonomous Systems and International Conference on Networking and Services (ICAS/ICNS 2005) 0-7695-2450-8/05 $20.00 © 2005 IEEE