Proceedings of the 2009 Winter Simulation Conference
M. D. Rossetti, R. R. Hill, B. Johansson, A. Dunkin, and R. G. Ingalls, eds.

SURVIVABILITY MODELING WITH STOCHASTIC REWARD NETS

Poul E. Heegaard
Department of Telematics
Norwegian University of Science and Technology (NTNU)
Trondheim, N-7491, Norway

Kishor S. Trivedi
Pratt School of Engineering
Duke University, Durham, NC 27708, USA

ABSTRACT

Critical services in a telecommunication network should survive and be continuously provided even when undesirable events such as sabotage, natural disasters, or network failures happen. Network survivability is quantified as defined by the ANSI T1A1.2 committee, that is, as the transient performance from the instant an undesirable event occurs until a steady state with an acceptable performance level is attained. Performance guarantees such as minimum throughput, maximum delay, or maximum loss should be considered. This paper demonstrates alternative modeling approaches to quantify network survivability, including stochastic reward nets and continuous time Markov chain models, and cross-validates these with a process-oriented simulation model. The experience with these modeling approaches applied to networks of different sizes clearly demonstrates the trade-offs that need to be considered with respect to flexibility in changing and extending the model, model abstraction and readability, and scalability and complexity of the solution method.

1 INTRODUCTION

Our society is critically dependent on a wide variety of telecommunication services, and telecommunication networks and services today are part of the national critical infrastructure that needs to be protected. Hence, evaluating network survivability under a variety of threats, such as attacks, accidents, and failures, that may cause minor or major service degradations is of utmost importance.
Specifically, survivability is quantified by the transient performance after an undesired event has occurred, as specified by (ANSI T1A1.2 Working Group on Network Survivability Performance 2001).

In a multi-service telecommunication network it is essential to provide virtual connections between peering nodes that ensure good overall utilization of the network resources and, at the same time, meet differentiated and guaranteed Quality of Service and resilience requirements. The management of such virtual connections is a challenging task, since virtual connections need to be continuously operational without unnecessary delays and with priority given to highly critical services, even when undesired events occur. Many management techniques exist. They apply to different network layers, use pre-planned or reactive techniques, and use various setup methods that differ in resource utilization, in operational domain (local or global), and in scope of repair. See (Cholda et al. 2007) for an excellent classification of recovery techniques and the recent state of the art.

A model for the evaluation of virtual connection management needs to consider both the behavioral and the structural aspects of the system. This means that the model must capture how the performance of the virtual connection is affected by routing and rerouting, by failures, by traffic load variations, by changes in network capacities, and by different service requirements. Structural dependability models typically focus on the probabilities of terminal connectivity, while behavioral models, e.g., as proposed in (Gan and Helvik 2006), take the network dynamics into account and provide steady state service availability.
Combining structural and behavioral aspects is typically done using simulation models, stochastic Petri nets such as stochastic reward nets, or continuous time Markov chains, e.g., using Markov dependability models or queuing network models for performance analysis, or combined performance and dependability Markov reward type models as in (Meyer 1980, Haverkort et al. 2001, Trivedi 2001).

978-1-4244-5771-7/09/$26.00 ©2009 IEEE
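To make the Markov reward view of survivability concrete, the following minimal sketch computes the expected transient performance after an undesired event for a small continuous time Markov chain. Everything specific in it is an assumption made for illustration and is not taken from the paper: the three states (failed, rerouted, restored), the transition rates, and the reward values (fraction of nominal throughput delivered in each state). The transient solution uses uniformization, a standard technique that is chosen here for brevity and is not claimed to be the solution method used by the authors.

```python
import math

# Hypothetical 3-state recovery model (illustrative values only):
#   state 0 = failed, state 1 = rerouted, state 2 = restored
# Q is the CTMC generator matrix; rates are per hour and assumed.
Q = [
    [-2.0, 2.0, 0.0],   # failed   -> rerouted at rate 2.0
    [0.0, -0.5, 0.5],   # rerouted -> restored at rate 0.5
    [0.0, 0.0, 0.0],    # restored is absorbing in this sketch
]
# Reward = assumed fraction of nominal throughput in each state.
REWARDS = [0.0, 0.6, 1.0]
# The undesired event has just occurred, so the system starts in "failed".
P0 = [1.0, 0.0, 0.0]


def transient_probs(Q, p0, t, tol=1e-12, max_terms=10000):
    """Return p(t) = p0 * exp(Q t) computed by uniformization.

    Suitable for small lam * t only; exp(-lam * t) underflows for
    very large values, which this sketch does not guard against.
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # uniformization rate
    if lam == 0.0 or t == 0.0:
        return list(p0)
    # Embedded DTMC of the uniformized chain: P = I + Q / lam
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    v = list(p0)                  # p0 * P^k, starting with k = 0
    weight = math.exp(-lam * t)   # Poisson(lam * t) probability of k = 0
    result = [weight * x for x in v]
    acc = weight                  # accumulated Poisson mass
    k = 0
    while 1.0 - acc > tol and k < max_terms:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k
        result = [r + weight * x for r, x in zip(result, v)]
        acc += weight
    return result


def expected_reward(t):
    """Expected performance level at time t after the undesired event."""
    p = transient_probs(Q, P0, t)
    return sum(pi * ri for pi, ri in zip(p, REWARDS))


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0, 4.0, 8.0):
        print(f"t = {t:4.1f} h  expected reward = {expected_reward(t):.4f}")
```

Plotting `expected_reward(t)` against `t` gives a survivability curve in the sense used above: performance starts at the level of the failed state and rises toward the steady-state level as recovery proceeds. A stochastic reward net for the same system would generate a Markov reward model of this kind automatically from a higher-level description.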