A Hybrid Distributed Fault-Management Protocol for Combating Single-Fiber Failures in Mesh-based DWDM Optical Networks

Chadi Assi 1, Y. Ye 2, A. Shami 1, S. Dixit 2, and M. Ali 1
1 Dept. of Electrical Engineering; Graduate School of The City University of New York, {assi, shami, eeali}@ees1s0.engr.ccny.cuny.edu
2 Nokia Research Center, {yinghua.ye, sudhir.dixit}@nokia.com

Abstract: This paper presents a novel hybrid distributed fault-management protocol for combating single-fiber failures in mesh-based DWDM optical networks. The proposed hybrid approach combines a Link State Protocol, used to disseminate and update information only about the physical connectivity of the network, with a distributed local information-based signaling algorithm for connection management. The hybrid approach offers two advantages: 1) it reduces the signaling overhead associated with the global information-based Link State Protocol by using a distributed approach where only local information is maintained at each node; 2) it eases the implementation of the routing protocol where physical constraints, such as link/node diversity, are imposed. The performance of the proposed hybrid approach is evaluated by comparing the dedicated-path protection and the shared-path protection schemes in terms of blocking probability, restoration time under a failure assumption, and data loss incurred during the recovery phase.

I- INTRODUCTION

Recent advances in optical networking technology, first with wavelength-division multiplexing (WDM) and more recently with optical cross-connects (OXCs), along with the wide deployment of high-speed IP/MPLS routers, are setting the foundation for the next-generation data-centric networking paradigm.
In this scenario, the role of synchronous digital hierarchy/synchronous optical network (SDH/SONET) will diminish, and future IP networks will evolve towards a model comprising high-performance IP/MPLS routers interconnected by intelligent optical core networks (IP-over-WDM) that will directly provide a global transport infrastructure for legacy and new IP services. A major driver for realizing this evolution is the potential ability of such networks to provide fast automatic setup and teardown of lightpaths across the optical network, with the capability of supporting diverse client signals on the paths. Provisioning of lightpaths requires control and management protocols to perform routing and wavelength assignment (RWA) [1] functions, as well as to exchange signaling information and to reserve resources along the provisioned paths.

Equally important to the process of dynamically provisioning lightpaths in mesh-based wavelength-routed networks is the reliability offered by the network to the services and lightpaths it supports. This requires the development of appropriate protection and restoration schemes, which minimize the data loss when a link failure occurs [5, 6]. In mesh-based WDM networks, end-to-end path-protection schemes can be employed to achieve efficient resource utilization.

Several approaches for connection/fault management have been proposed and compared in the literature, including the global information-based link state approach and the local information-based distributed-routing approach [1-4, 6]. In the link state approach [1, 3], each node in the network must maintain complete network state information, including the network topology and the wavelength usage on each link. Based on this global information, the source node can calculate an optimum route to the destination and a wavelength to be assigned along the route. Thus, information about the state of every wavelength on every link must be propagated throughout the network.
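To make the source-side computation in the link state approach concrete, the following minimal sketch assumes minimum-hop routing by breadth-first search, first-fit wavelength assignment, and no wavelength conversion, so the same wavelength must be free on every link of the route. These choices, the four-node ring, and all names (`shortest_route`, `first_fit_wavelength`, `in_use`) are illustrative assumptions and are not taken from the protocols cited in the text.

```python
from collections import deque

def shortest_route(topology, src, dst):
    """Breadth-first search for a minimum-hop route; returns a node list or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent pointers back to the source.
            route = []
            while node is not None:
                route.append(node)
                node = parent[node]
            return route[::-1]
        for neighbor in topology[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

def first_fit_wavelength(route, in_use, num_wavelengths):
    """Lowest-indexed wavelength that is free on every link of the route, else None."""
    links = [frozenset(pair) for pair in zip(route, route[1:])]
    for w in range(num_wavelengths):
        if all(w not in in_use.get(link, set()) for link in links):
            return w
    return None

# Hypothetical global state held at the source node: the full topology
# plus the set of busy wavelengths on every link in the network.
topology = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
in_use = {frozenset(("A", "B")): {0}}  # wavelength 0 is busy on link A-B

route = shortest_route(topology, "A", "C")
wavelength = first_fit_wavelength(route, in_use, num_wavelengths=2)
print(route, wavelength)
```

In this sketch the `in_use` table is the part that must be kept current at every node for every link, which is the per-wavelength state the link state approach has to flood through the network.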
The state required and the overhead involved in maintaining this per-wavelength, per-link information at every node would be excessive. Previous studies have also shown that the link state protocol exhibits longer connection setup delays, higher bandwidth requirements for control messages, and higher blocking probability compared to the distributed approach [1]. The distributed-routing approach [2, 4] works well in terms of the amount of information stored at each node and the connection setup delay. Although this approach incurs less overhead than the link state approach, its drawback is that the chosen wavelength is only locally optimized, and consequently the utilization of network resources is not globally optimized.

This work proposes a hybrid distributed fault-management protocol that attempts to combine the best of both the link state and distributed-routing approaches. Specifically, the proposed hybrid approach combines a Link State Protocol, used to disseminate and update information only about the physical connectivity of the network, with a distributed local information-based signaling algorithm for connection and fault management. We employ end-to-end path-protection schemes to examine the applicability of the proposed approach. The traffic pattern considered here is dynamic: connection requests arrive one at a time, and each connection exists for only a finite duration, called the connection-holding time. The performance of the proposed hybrid approach is then evaluated by comparing the dedicated-path protection and the shared-path protection schemes in terms of blocking probability, restoration time under a failure assumption, and data loss incurred during the recovery phase. The inherent complexity associated with the shared-path protection scheme is addressed here through the introduction of two novel concepts, namely, the “sharability database” and the “most shared wavelength (MSW)