Restoration Based on Bandwidth Degradation and Service Restoration Delay for Optical Cloud Networks

Felipe Lisboa¹, Keiko V. O. Fonseca¹, Luis C. Vieira¹, Paolo Monti², Gustavo B. Figueiredo³, Juliana de Santi¹

¹ Federal University of Technology - Paraná (UTFPR) - Curitiba, Brazil
Email: felipe@ecentric.com.br
Email: {keiko, vieira, jsanti}@utfpr.edu.br
² KTH Royal Institute of Technology, Sweden
Email: pmonti@kth.se
³ Federal University of Bahia, Brazil
Email: gustavo@dcc.ufba.br

Abstract: The use of an optical fiber infrastructure to accommodate cloud services is gaining momentum. This is mainly driven by the bandwidth and latency performance that optical transmission can guarantee. On the other hand, failures in the optical infrastructure may result in the concurrent loss of a potentially large number of cloud services. For this reason, being able to offer cloud service resiliency at a contained cost is of the utmost importance for operators. This paper proposes a heuristic for the restoration of optical cloud services in the presence of a single fiber link failure. The heuristic leverages two parameters specified in the service class (i.e., restoration delay and bandwidth degradation) in order to make the best use of the available optical resources during the recovery process. The numerical results presented in the paper show that the proposed restoration algorithm is able to improve cloud service restorability without a negative impact on the cloud service blocking probability.

I. INTRODUCTION

Optical networks are an appealing solution to support a wide range of cloud services in scientific, business, and consumer-based applications (e.g., content delivery services). Large bandwidth availability, low latencies, and reduced energy consumption are some of the important characteristics that make optical networks well-suited for this task [1].
Nonetheless, the optical network infrastructure is susceptible to failures caused by accidental fiber cuts, equipment faults, or even malicious attacks. In the presence of a failure, one or more lightpaths are disrupted, potentially affecting several cloud services and consequently causing the loss of a large amount of data. For this reason, network operators must implement recovery schemes that maintain an acceptable level of cloud service survivability while, at the same time, making sure that the resiliency is provided at a contained extra cost (i.e., in terms of how efficiently optical resources are used).

Protection strategies are based on the allocation of redundant optical resources, to be used only in the occurrence of a failure. As a result, these strategies guarantee 100% recovery but have an inherent cost in terms of resource efficiency (i.e., protection resources are unused most of the time) [2]. In order to reduce this cost, operators may use strategies based on the restoration concept. With this approach, no backup optical resources are reserved beforehand. After the occurrence of a failure, the affected lightpaths are re-routed (based only on the available optical resources) in order to restore as many cloud services as possible. As a result, restoration strategies are more resource efficient, but cannot guarantee 100% recovery [2].

In the literature, there are a number of studies that try to improve the recovery performance of restoration strategies. Some of them leverage the concept of cloud service degradation, whenever this possibility is allowed by the specific cloud service class. One possibility is to allow for bandwidth degradation during the restoration process [3]–[6]. Another option is to leverage the maximum allowable restoration delay (i.e., the time between the occurrence of the failure and the time at which a cloud service is restored) to decide when to restore a cloud service [7]–[12].
Although the aforementioned research directions have been extensively studied, the combined use of both bandwidth degradation and restoration delay has not been investigated. This paper proposes an approach that combines these two concepts to offer a resource-efficient restoration-based recovery strategy for optical cloud services. The joint use of bandwidth-degradation-based and restoration-delay-based approaches can be particularly useful since cloud services have different requirements [13], which are usually specified in the Service Level Agreements (SLAs) and can be leveraged to decide which strategy to use to recover a cloud service in the presence of a failure. Hard-real-time applications (e.g., surgical procedures) must be immediately restored using an alternative lightpath with enough capacity to recover the entire cloud service. Alternatively, soft-real-time applications (e.g., video streaming) can be restored through an alternative lightpath with a reduced capacity, degrading the cloud service [4] to a minimum acceptable level. On the other hand, non-real-time applications (e.g., grid services for data processing) have some flexibility in both the bandwidth and time domains.
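The class-dependent recovery options described above can be illustrated with a minimal Python sketch. It is not the heuristic proposed in this paper; it only encodes the three service classes and the flexibility each one allows. All names (ServiceClass, CloudService, decide), the SLA fields min_bw and max_delay, and the choice to prefer immediate degraded restoration over deferral for non-real-time services are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceClass(Enum):
    HARD_REAL_TIME = "hard"   # no bandwidth or delay flexibility
    SOFT_REAL_TIME = "soft"   # bandwidth may be degraded
    NON_REAL_TIME = "non"     # bandwidth may be degraded, restoration may wait


@dataclass(frozen=True)
class CloudService:
    service_class: ServiceClass
    requested_bw: float  # full bandwidth of the service (e.g., in Gbps)
    min_bw: float        # hypothetical SLA field: minimum acceptable bandwidth
    max_delay: float     # hypothetical SLA field: maximum restoration delay (s)


@dataclass(frozen=True)
class Decision:
    action: str       # "restore_full", "restore_degraded", "defer", or "drop"
    bandwidth: float  # bandwidth granted now (0.0 if none)


def decide(service: CloudService, available_bw: float, elapsed: float) -> Decision:
    """Pick a recovery action for one failed service.

    available_bw: capacity found on an alternative lightpath.
    elapsed: time since the failure occurred, in seconds.
    """
    # Every class is fully restored when enough capacity is available.
    if available_bw >= service.requested_bw:
        return Decision("restore_full", service.requested_bw)

    # Hard-real-time: full capacity is required immediately.
    if service.service_class is ServiceClass.HARD_REAL_TIME:
        return Decision("drop", 0.0)

    # Soft- and non-real-time: accept reduced capacity above the SLA minimum.
    if available_bw >= service.min_bw:
        return Decision("restore_degraded", available_bw)

    # Non-real-time only: wait for resources while the delay budget lasts.
    if (service.service_class is ServiceClass.NON_REAL_TIME
            and elapsed < service.max_delay):
        return Decision("defer", 0.0)

    return Decision("drop", 0.0)


if __name__ == "__main__":
    grid_job = CloudService(ServiceClass.NON_REAL_TIME, 10.0, 5.0, 30.0)
    print(decide(grid_job, available_bw=2.0, elapsed=10.0))
```

In this sketch a deferred service would be re-evaluated later with an updated elapsed time, so a non-real-time service is dropped only once its delay budget is exhausted and no acceptable capacity has become available.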