Modeling of User Perceived Webserver Availability

Wei Xie*, Hairong Sun†, Yonghuan Cao* and Kishor S. Trivedi*
{wxie, hairong, ycao, kst}@ee.duke.edu

* Center for Advanced Computing and Communications, Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708
† High Reliability and Availability Technology Center, Motorola, Elk Grove Village, IL 60007, USA

Abstract—We propose to use Markov regenerative process (MRGP) models to study the availability of Internet-based services as perceived by a Web user. These models capture the interactions between the service facility and the user. The necessity of the more sophisticated MRGP modeling is evidenced by comparisons with the corresponding continuous time Markov chain (CTMC) models, which show that the popular and convenient CTMC models tend to overestimate user-perceived service unavailabilities by 26% to 125%. We study two different online service scenarios: (1) single-user-single-host and (2) single-user-multiple-host. It is found that user-perceived service unavailability depends not only on the infrastructure's failure-recovery characteristics but also, more importantly, on the user's behavior. Also, for a service provider seeking to improve users' satisfaction, inventing a fast recovery mechanism is more effective than striving for a more reliable platform, given that the platform availability is the same.

Index Terms—User-perceived online service availability, Web user behavior, Markov regenerative process (MRGP)

I. INTRODUCTION

The trend of e-commerce poses an increasingly imperative demand on the availability and reliability of Internet-based services. The so-called "24×7" (24-hours-a-day-and-7-days-a-week) requirement for online services presents an unprecedented technical challenge, given that the exponentially growing Internet is large-scale, vastly distributed, and heterogeneous in nature.
To design high-availability (HA) service systems, it is critical to deepen our understanding of not only the causes of the failure-and-recovery behavior of the service infrastructure, but also the users' behavior and their subjective perceptions of and reactions to the provided services. There have been separate research efforts on either the behavior of the underlying infrastructure or the behavior of online users. However, the lack of effort connecting the two is obvious. This paper intends to fill this gap by providing a more complete model of online service availability, which results from the interactions between service platforms and users.

This research was supported in part by the Air Force Office of Scientific Research under MURI Grant No. F49620-00-1-0327, and in part by DARPA and US Army Research Office under Award No. C-DAAD19 01-1-0646. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the sponsoring agencies. This work was done while K. Trivedi was a visiting Professor in the Department of Computer Science and Engineering holding the Poonam and Prabhu Goel Chair at the Indian Institute of Technology, Kanpur.

The unavailability of Internet-based services stems from various types of failures, malfunctions, and planned outages across a broad range of network components, service provider equipment, and user access facilities. Govindan et al. revealed that both the route availability and the mean reachability duration have degraded with the growth of the Internet [1]. Li et al. studied the Webserver aging phenomenon and proactive software rejuvenation techniques [2]. Long et al. evaluated the mean time to failure (MTTF), mean time to repair (MTTR), availability, and reliability of a sample of hosts by repeatedly polling the hosts, and discovered that daily and weekly shutdowns were very common in the Internet [3].
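As a brief aside, the MTTF, MTTR, and availability measures mentioned above are tied together by the standard steady-state availability formula for a host that alternates between up and down periods. The sketch below states that relation and works one example; the numerical values are hypothetical and chosen only for illustration, not taken from [3].

```latex
% Steady-state availability of a host alternating between up and down periods
\[
  A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
\]
% Hypothetical illustration (assumed values): MTTF = 990 h, MTTR = 10 h
\[
  A = \frac{990}{990 + 10} = 0.99,
  \qquad
  (1 - A) \times 8760~\text{h/year} = 87.6~\text{h/year of expected downtime}
\]
```

Note that this is a platform-side measure only; it says nothing about whether a user happens to issue a request while the host is down.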
By periodically collecting data on a set of nearly 100 popular Web sites, Kalyanakrishnan et al. found that the mean availability of Internet hosts is two-nines, i.e., about 0.99, which is far below that of telephone systems [4].

The aforementioned research efforts all focused on the platform outage-recovery behavior of Internet-based services. However, for a particular Web user, a more important performance index is the service availability perceived by that user, i.e., the probability that the user's service request is fulfilled. Studying the platform availability alone is inadequate for this purpose. We have yet to characterize the behavior of online users and reveal the interplay between the service platform and users.

It is widely accepted that the behavior of Web browsers is fairly complicated. Deng of the then-GTE lab proposed a tractable empirical model that captures the behavior of World-wide-web (WWW) browsers [5]. The activity of a Web browser is modeled as an ON-OFF process, with the ON period having a Weibull distribution and the OFF period following a long-tailed Pareto distribution. ON periods are initiated by the user's clicking on the hypertext links on a Web service page, while OFF periods are those in which the user is reading and/or thinking and hence no requests are generated. In this study, we adopt this model as the starting point of user behavior modeling.

The purpose of this paper is to evaluate the service availability for Web users. We assume that the time to failure (TTF) and the time to repair/recovery (TTR) of the Internet