Abstract - In this paper we present a method that helps the system engineer improve the service performance of a Web server by means of the session-based Web workload, which is the best indicator of the users' perception of Web quality. Bytes transferred per session is one of the intra-session characteristics that collectively describe the session-based Web workload. This characteristic exhibits heavy-tailed behavior, and its distribution matches the Pareto Type I distribution well [Goseva-Popstojanova et al. (2006)]. For the performance study, we therefore use the maximum likelihood estimator to estimate the probability $R = P(X > Y)$, where X and Y are two independent but not identically distributed random variables, each following a Pareto Type I distribution. Extensive simulation studies are carried out to assess the performance of the estimator. A generalized two-sided confidence interval for R of the Pareto Type I distribution is constructed; it is suitable for both small and large samples. The average width and the coverage probability of this confidence interval are compared with those of the usual asymptotic confidence interval through simulations. Using real data, we illustrate how R and its generalized confidence interval can be used to improve the service performance of a Web server.

Keywords - Generalized confidence interval, Generalized pivotal quantity, Heavy tail distributions, Pareto distribution.

I. INTRODUCTION

The World Wide Web (WWW), the largest distributed system ever built, has made it possible to access vast amounts of information at the touch of a button and has become part of the fabric of our society. Its tremendous growth has brought huge challenges to system engineers, Web site designers, maintainers and content producers. A clear understanding and characterization of WWW workloads is fundamental to the goal of improving Web performance. The alarming growth of Web traffic has sparked much research activity on improving the World Wide Web.
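The estimation and interval procedures summarized in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes (this excerpt does not say so) that X and Y share a common, known scale σ, in which case R = α₂/(α₁ + α₂) and the maximum likelihood estimator of each shape parameter is n/Σ ln(xᵢ/σ). The specific generalized pivotal quantity, the delta-method asymptotic interval and all function names (`mle_R`, `asymptotic_ci`, `generalized_ci`, `coverage_study`) are illustrative choices.

```python
import math
import random
from statistics import NormalDist

# ASSUMPTION (not stated in this excerpt): X ~ Pareto I(alpha1, sigma) and
# Y ~ Pareto I(alpha2, sigma) share one known scale sigma, so that
# R = P(X > Y) = alpha2 / (alpha1 + alpha2).


def sample_pareto1(alpha, sigma, n, rng):
    """n Pareto Type I draws; random.paretovariate uses scale 1."""
    return [sigma * rng.paretovariate(alpha) for _ in range(n)]


def log_sum(data, sigma):
    """Sufficient statistic T = sum(ln(x_i / sigma))."""
    return sum(math.log(v / sigma) for v in data)


def mle_alpha(data, sigma):
    """MLE of the shape parameter with sigma known: n / T."""
    return len(data) / log_sum(data, sigma)


def mle_R(x, y, sigma):
    """Plug-in MLE of R = alpha2 / (alpha1 + alpha2)."""
    a1, a2 = mle_alpha(x, sigma), mle_alpha(y, sigma)
    return a2 / (a1 + a2)


def asymptotic_ci(x, y, sigma, level=0.95):
    """Delta-method interval using Var(R_hat) ~ R^2 (1-R)^2 (1/n1 + 1/n2)."""
    r = mle_R(x, y, sigma)
    se = r * (1 - r) * math.sqrt(1 / len(x) + 1 / len(y))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return max(0.0, r - z * se), min(1.0, r + z * se)


def generalized_ci(x, y, sigma, level=0.95, draws=10_000, rng=None):
    """Percentile interval of a generalized pivotal quantity (GPQ) for R.

    2 * alpha_i * T_i ~ chi-square(2 n_i), so G_i = U_i / (2 t_i) with
    U_i ~ chi-square(2 n_i) is a GPQ for alpha_i; G_R = G_2 / (G_1 + G_2).
    """
    rng = rng or random.Random()
    t1, t2 = log_sum(x, sigma), log_sum(y, sigma)
    g = []
    for _ in range(draws):
        # chi-square(2n) is Gamma(shape=n, scale=2)
        g1 = rng.gammavariate(len(x), 2.0) / (2 * t1)
        g2 = rng.gammavariate(len(y), 2.0) / (2 * t2)
        g.append(g2 / (g1 + g2))
    g.sort()
    lo = g[int((1 - level) / 2 * draws)]
    hi = g[int((1 + level) / 2 * draws) - 1]
    return lo, hi


def coverage_study(a1, a2, sigma, n1, n2, reps=200, seed=0):
    """Monte Carlo coverage probability and average width of both intervals."""
    rng = random.Random(seed)
    true_r = a2 / (a1 + a2)
    tally = {"asymptotic": [0, 0.0], "generalized": [0, 0.0]}
    for _ in range(reps):
        x = sample_pareto1(a1, sigma, n1, rng)
        y = sample_pareto1(a2, sigma, n2, rng)
        intervals = {
            "asymptotic": asymptotic_ci(x, y, sigma),
            "generalized": generalized_ci(x, y, sigma, draws=2000, rng=rng),
        }
        for name, (lo, hi) in intervals.items():
            tally[name][0] += lo <= true_r <= hi  # count intervals covering R
            tally[name][1] += hi - lo
    return {k: (hits / reps, width / reps) for k, (hits, width) in tally.items()}


if __name__ == "__main__":
    rng = random.Random(2011)
    x = sample_pareto1(2.0, 1.0, 30, rng)  # true R = 3 / (2 + 3) = 0.6
    y = sample_pareto1(3.0, 1.0, 30, rng)
    print(mle_R(x, y, 1.0))
    print(asymptotic_ci(x, y, 1.0), generalized_ci(x, y, 1.0, rng=rng))
    print(coverage_study(2.0, 3.0, 1.0, n1=10, n2=10, reps=100))
```

Under this assumption the generalized interval needs only the two log-sums and chi-square draws and does not rely on a large-sample normal approximation; `coverage_study` mimics, in miniature, the kind of width and coverage comparison the abstract describes.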
Although several studies have been reported in the literature [Braun and Claffy (1994), Bestavros et al. (1995), Arlitt and Williamson (1997), Fengbin et al. (2007)], most of them focus on characterizing Web clients rather than Web servers. In our earlier work [Dais and Sebastian (2008)], we studied the workload characteristics of Internet Web servers using data from a college Web server. In this paper we consider the performance study, especially the service performance, of a Web server, giving due importance to user sessions. A session is defined as a sequence of requests from the same user during a single visit to the Web site. A considerable amount of research in the literature focuses on characterizing Web user sessions for purposes such as capacity planning and discovering user navigational patterns. Arlitt (2000) presented a detailed characterization of user sessions of the 1998 World Cup Web site and showed how these characteristics can be used to improve Web server performance. Goseva-Popstojanova et al. (2006) introduced several inter-session and intra-session characteristics which collectively describe session-based workload. In this work we concentrate on one intra-session characteristic, bytes transferred per session. This characteristic exhibits heavy-tailed behavior, and its distribution matches the Pareto Type I distribution well [Goseva-Popstojanova et al. (2006)].
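Bytes transferred per session can be computed from a Web server access log once requests are grouped into sessions. The Python sketch below shows one possible way to do this. It assumes a hypothetical record layout of (client identifier, timestamp in seconds, bytes sent), treats each client identifier as one user, and closes a session after 30 minutes of inactivity; this excerpt does not say how sessions were delimited, so the timeout is an illustrative assumption.

```python
from collections import defaultdict

# ASSUMPTION: a new session starts when the same client has been idle for
# longer than TIMEOUT seconds; the 30-minute value is illustrative only.
TIMEOUT = 30 * 60


def bytes_per_session(records, timeout=TIMEOUT):
    """Split each client's requests into sessions and total the bytes in each.

    records: iterable of hypothetical (client_id, timestamp_seconds, bytes_sent).
    Returns one byte total per session.
    """
    by_client = defaultdict(list)
    for client, ts, nbytes in records:
        by_client[client].append((ts, nbytes))

    totals = []
    for requests in by_client.values():
        requests.sort()  # chronological order within one client
        current = 0
        last_ts = None
        for ts, nbytes in requests:
            if last_ts is not None and ts - last_ts > timeout:
                totals.append(current)  # idle gap ends the previous session
                current = 0
            current += nbytes
            last_ts = ts
        totals.append(current)  # close the client's final session
    return totals


log = [
    ("10.0.0.1", 0, 500),
    ("10.0.0.1", 60, 1500),
    ("10.0.0.1", 4000, 200),  # more than 30 min after the previous request
    ("10.0.0.2", 10, 7000),
]
print(sorted(bytes_per_session(log)))  # [200, 2000, 7000]
```

The resulting list of per-session totals is the kind of sample to which a Pareto Type I model would then be fitted.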
Generalized Confidence Interval for R = P(X > Y) of Pareto Distribution and Its Application in Web Performance

Dais George¹, Pit Pichappan² and Sebastian George³
¹ Catholicate College, Pathanamthitta, Kerala, India
² Faculty of Computer and Information Sciences, Al Imam University, Riyadh
³ St. Thomas College, Palai, Kerala, India
(daissaji@rediffmail.com, ppichappan@gmail.com, sthottom@gmail.com)

The Pareto distribution is a power-law probability distribution that is used to describe social, scientific, geophysical, actuarial and many other types of observable phenomena. The univariate Pareto distribution is a simple model for non-negative data with a power-law probability tail. It is a useful model in the analysis of income data, reliability studies, risk modeling and business failure data [Lomax (1954)]. Arnold and Press (1983) gave an extensive historical survey of its uses in the context of income distribution. Jan Beirlant et al. (1996), Embrechts et al. (1997), Reed (2003) and Vandewalle et al. (2007) discuss applications of the Pareto distribution in various fields. Examples of random variables following the Pareto Type I distribution include the sizes of human settlements, the file size distribution of Internet traffic that uses the TCP protocol, clusters of Bose-Einstein condensate near absolute zero, the values of oil reserves in oil fields, the length distribution of jobs assigned to supercomputers, the standardized price returns of individual stocks, the sizes of sand particles, the sizes of meteorites, the number of species per genus, the areas burnt in forest fires, and the severity of large casualty losses for certain lines of business such as general liability, commercial auto and workers' compensation. Power laws have been discovered for Web file sizes, Web site connectivity and router connection degrees. The Web file size distribution is important for Web server scheduling.
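The power-law tail of the Pareto Type I distribution can be checked numerically. The Python sketch below uses the standard form with shape α and scale σ, whose survival function is P(X > x) = (σ/x)^α for x ≥ σ. It draws a sample by inverse-CDF sampling and fits a least-squares line to the log-log empirical survival curve, whose slope should be close to -α. The parameter values and function names are hypothetical, and a log-log fit is only a quick diagnostic, not the maximum likelihood approach used in the paper.

```python
import math
import random


def pareto1_survival(x, alpha, sigma):
    """P(X > x) = (sigma / x) ** alpha for x >= sigma, and 1 below the scale."""
    return 1.0 if x < sigma else (sigma / x) ** alpha


def pareto1_sample(alpha, sigma, n, rng):
    """Inverse-CDF sampling: X = sigma * U ** (-1 / alpha), U in (0, 1]."""
    return [sigma * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def loglog_tail_slope(data):
    """Least-squares slope of log empirical survival against log x.

    Since log P(X > x) = alpha * log(sigma) - alpha * log(x), the slope for a
    Pareto Type I sample should be close to -alpha: the straight line on a
    log-log plot that characterizes a power-law tail.
    """
    xs = sorted(data)
    n = len(xs)
    # empirical P(X >= x_(i)) for the i-th smallest value; never zero
    pts = [(math.log(v), math.log((n - i) / n)) for i, v in enumerate(xs)]
    mx = sum(u for u, _ in pts) / n
    my = sum(w for _, w in pts) / n
    sxy = sum((u - mx) * (w - my) for u, w in pts)
    sxx = sum((u - mx) ** 2 for u, _ in pts)
    return sxy / sxx


rng = random.Random(7)
# hypothetical scale of 1000 bytes and shape 1.5
sample = pareto1_sample(1.5, 1000.0, 20_000, rng)
print(min(sample) >= 1000.0)       # the support starts at sigma
print(loglog_tail_slope(sample))   # expected to be near -1.5
print(pareto1_survival(8000.0, 1.5, 1000.0))
```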
More importantly, these discoveries motivate us to identify the mechanisms behind the

978-1-4244-9825-3/11/$26.00 ©2011 IEEE 153