Performance Evaluation 70 (2013) 197–211
journal homepage: www.elsevier.com/locate/peva

Dynamic software rejuvenation policies in a transaction-based system under Markovian arrival processes

Hiroyuki Okamura, Tadashi Dohi
Department of Information Engineering, Graduate School of Engineering, Hiroshima University, 1–4–1 Kagamiyama, Higashi-Hiroshima 739–8527, Japan

Article history: Available online 31 August 2012

Keywords: Software aging; Software rejuvenation; Long-run average reward; Power efficiency; Markov decision process; Optimality of rejuvenation policy

Abstract

This paper presents a Markov decision process (MDP) formulation for a transaction-based system with software aging and rejuvenation. In our formulation, the arrival process of transactions is described as a Markovian arrival process (MAP), and software aging is modeled by a probabilistically degrading processing rate. The paper focuses on two performance criteria for determining the optimal rejuvenation strategy: the long-run average reward and the power efficiency. For each criterion, we formulate the MDP optimality equations for its maximization. Numerical experiments, consisting of sensitivity and statistical analyses with real traffic and aging data, show that the optimal rejuvenation policy has a monotone property and can be characterized by a threshold policy in the number of transactions.
© 2012 Elsevier B.V. All rights reserved.

1. Introduction

The concept of software aging and rejuvenation has spread widely in system design as a low-cost fault-tolerance technique. Software aging is caused by aging-related bugs [1]. Such bugs cause performance degradation or a sudden hang or crash of the system, which is called the software aging phenomenon.
Typical examples of software aging are memory leaks and round-off errors, which lead to the exhaustion of system resources and the accumulation of errors. In general, software aging can be predicted by monitoring elapsed time, workload, or other system attributes. Based on the monitored attributes, a proactive action such as a system reboot can be taken to prevent the performance degradation and system failure caused by software aging. Such proactive actions are called software rejuvenation. Garbage collection, flushing operating system kernel tables, reinitializing internal data structures, and hardware reboot are examples of software rejuvenation [2,3]. Software rejuvenation has been recognized as an important technique for software applications that execute continuously for long periods of time.

One of the significant issues in software rejuvenation is how to determine when to rejuvenate, because rejuvenation incurs overhead time. There are two main approaches to determining the rejuvenation time: model-based and measurement-based. Huang et al. [4] provided a seminal work on the model-based approach for the software aging and rejuvenation process in a real telecommunication billing application. Their model was based on a continuous-time Markov chain (CTMC) with four states, and focused on the steady-state system availability and the expected operation cost per unit time in steady state. Since Huang et al.'s work, many authors have discussed software rejuvenation policies from the viewpoint of model-based analysis [5–11]. On the other hand, Garg et al. [12] tried to characterize and predict software aging using system attributes that can be observed in real systems. Their approach is called measurement-based analysis. Vaidyanathan et al. [13], Alonso et al. [14],

Corresponding author. Tel.: +81 82 424 7697; fax: +81 82 422 7025.
E-mail address: okamu@rel.hiroshima-u.ac.jp (H. Okamura).
0166-5316/$ – see front matter © 2012 Elsevier B.V. All rights reserved.
doi:10.1016/j.peva.2012.07.004