Abstract – In this paper we present a method that helps
the system engineer improve the service
performance of a Web server by analysing session-based
Web workload, the best indicator of the users'
perception of Web quality. Bytes transferred per
session is one of the intra-session characteristics
that collectively describe session-based Web
workload. This characteristic exhibits heavy-tailed
behavior and its distribution matches well with the
Pareto Type I distribution [Goseva-Popstojanova et al.
(2006)]. For the performance study, we therefore estimate the
probability R = P(X > Y), where X and Y are two
independent but not identically distributed random
variables following Pareto Type I distribution, using
the maximum likelihood estimator. Extensive
simulation studies are carried out to study the
performance of the estimator. A generalized two-sided
confidence interval for R under the Pareto Type I
distribution is constructed. The derived confidence
interval suits both small samples and large samples.
The average width and the coverage probability of this
confidence interval are compared with those of the usual
asymptotic confidence interval through simulations.
Using real data, we illustrate how R and its generalized
confidence interval can be used for improving the
service performance of a Web server.
Keywords - Generalized confidence interval, Generalized
pivotal quantity, Heavy tail distributions, Pareto
distribution.
I. INTRODUCTION
The World Wide Web (WWW), the largest
distributed system ever built, has made it possible to
access vast amounts of information at the touch of a
button and has become part of the fabric of our society. Its
tremendous growth has brought huge challenges to system
engineers, Web site designers, maintainers and content
producers. A clear understanding of WWW workloads
and their characterization is fundamental to the goal of
improving Web performance.
The alarming growth of Web traffic has sparked
much research activity on improving the World Wide
Web. Although several studies are reported in the
literature [Braun and Claffy (1994), Bestavros et al.
(1995), Arlitt and Williamson (1997), Fengbin et al.
(2007)], most focus on characterizing Web clients
rather than Web servers. In our earlier work [Dais and
Sebastian (2008)], we studied the workload characteristics
of Internet Web servers using data from a college Web server.
In this paper we study the performance, in particular
the service performance, of a Web server
with due importance to user sessions. A session is defined
as a sequence of requests from the same user during a
single visit to the Web site. The literature contains a considerable
amount of research on characterizing Web
user sessions for purposes such as capacity
planning and finding user navigational patterns.
Arlitt (2000) presented a detailed
characterization of user sessions of the 1998 World Cup
Web site and showed how these characteristics can be
utilized in improving Web server performance. Goseva-
Popstojanova et al. (2006) introduced several inter-session
and intra-session characteristics which collectively
describe session-based workload. In this work we
concentrate on one intra-session characteristic, bytes
transferred per session. This characteristic exhibits
heavy-tailed behavior and its distribution matches well with the
Pareto Type I distribution [Goseva-Popstojanova et al.
(2006)].
Generalized Confidence Interval for R = P(X > Y) of Pareto
Distribution and Its Application in Web Performance
Dais George (1), Pit Pichappan (2) and Sebastian George (3)
(1) Catholicate College, Pathanamthitta, Kerala, India
(2) Faculty of Computer and Information Sciences, Al Imam University, Riyadh
(3) St. Thomas College, Palai, Kerala, India
(daissaji@rediffmail.com, ppichappan@gmail.com, sthottom@gmail.com)
The Pareto distribution is a power law
probability distribution that describes many social,
scientific, geophysical, actuarial and other types of
observable phenomena. The univariate Pareto distribution
is a simple model for non-negative data with a power law
probability tail. It is a useful model in the analysis of
income data, reliability studies, risk modeling and
business failure data [Lomax (1954)]. Arnold and Press
(1983) gave an extensive historical survey of its uses in
the context of income distribution. Beirlant et al.
(1996), Embrechts et al. (1997), Reed (2003) and
Vandewalle et al. (2007) discuss the applications of the
Pareto distribution in various fields. The sizes of human
settlements, the file size distribution of Internet traffic that
uses the TCP protocol, clusters of Bose-Einstein
condensate near absolute zero, the values of oil reserves in
oil fields, the lengths of jobs assigned to
supercomputers, the standardized price returns on
individual stocks, sizes of sand particles, sizes of
meteorites, number of species per genus, areas burnt in
forest fires and severity of large casualty losses for
certain lines of business such as general liability,
commercial auto and workers' compensation are examples
of random variables following the Pareto Type I
distribution.
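As a minimal illustrative sketch, and not the estimator derived in this paper, the Python code below assumes the Pareto Type I survival function P(X > x) = (sigma/x)^alpha for x >= sigma. It further assumes that X and Y share a common, known scale sigma; under that assumption R = P(X > Y) reduces to beta/(alpha + beta), where alpha and beta are the shape parameters of X and Y. The function names, the seed and the parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def pareto_sample(shape, scale, n, rng):
    """Draw n Pareto Type I variates by inverse-transform sampling."""
    # If U is uniform on (0, 1], then scale * U**(-1/shape) has
    # survival function (scale / x)**shape for x >= scale.
    return [scale * (1.0 - rng.random()) ** (-1.0 / shape) for _ in range(n)]


def shape_mle(data, scale):
    """Maximum likelihood estimate of the shape when the scale is known."""
    return len(data) / sum(math.log(x / scale) for x in data)


def r_common_scale(alpha, beta):
    """R = P(X > Y) for independent Pareto I variables with a shared scale."""
    return beta / (alpha + beta)


if __name__ == "__main__":
    rng = random.Random(2011)                  # fixed seed, illustrative only
    scale = 1.0                                # assumed common, known scale
    x = pareto_sample(1.5, scale, 5000, rng)   # hypothetical shape for X
    y = pareto_sample(3.0, scale, 5000, rng)   # hypothetical shape for Y
    # Plug-in estimate: substitute the shape MLEs into the closed form.
    r_hat = r_common_scale(shape_mle(x, scale), shape_mle(y, scale))
    print("plug-in estimate of R:", round(r_hat, 3))
    print("value at the true shapes:", round(r_common_scale(1.5, 3.0), 3))
```

A smaller shape parameter means a heavier tail, so in this sketch the variable with the smaller shape is the one more likely to produce the larger value. If the two scales differ or are unknown, the closed form above no longer applies as written.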
Power laws have been discovered for Web file
sizes, Web site connectivity and router connection
degrees. The Web file size distribution is important for Web
server scheduling. More importantly, these discoveries
motivate us to identify the mechanisms behind the
978-1-4244-9825-3/11/$26.00 ©2011 IEEE 153