1414 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 20, NO. 7, SEPTEMBER 2002

WebGraph: A Framework for Managing and Improving Performance of Dynamic Web Content

Prasant Mohapatra, Senior Member, IEEE, and Huamin Chen, Student Member, IEEE

Abstract—The proportion of dynamic objects in the World Wide Web has been growing at a fast rate. In the e-commerce environment, these objects form the core of all Web transactions. However, because of additional resource requirements and the changing nature of these objects, the performance of accessing dynamic Web contents has been observed to be poor in current-generation Web services. We propose a framework called WebGraph that helps improve the response time for accessing dynamic objects. The WebGraph framework manages a graph for each of the Web pages. The nodes of the graph represent weblets, which are components of the Web pages that either stay static or change simultaneously. The edges of the graph define the inclusiveness of the weblets. Both the nodes and the edges have attributes that are used in managing the Web pages. Instead of recomputing and recreating the entire page, the node and edge attributes are used to update a subset of the weblets, which are then integrated to form the entire page. In addition to the performance benefits in terms of lower response time, the WebGraph framework facilitates Web caching, quality-of-service (QoS) support, load balancing, overload control, personalized services, and security for both dynamic and static Web pages. A detailed implementation methodology for the proposed framework is also described. We have implemented the WebGraph framework in an experimental setup and have measured the performance improvement in terms of server response time, throughput, and connection rate. The results demonstrate the feasibility and validate a subset of the advantages of the proposed framework.
Index Terms—Dynamic content caching, Internet, quality-of-service, Web servers, WebGraph, weblet, World Wide Web.

Manuscript received April 30, 2001; revised January 10, 2002. This work was supported in part by the National Science Foundation under Grant CCR-9988179.
The authors are with the Department of Computer Science, University of California, Davis, CA 95616 USA (e-mail: prasant@cs.ucdavis.edu; chenhua@cs.ucdavis.edu).
Publisher Item Identifier 10.1109/JSAC.2002.802072.

I. INTRODUCTION

PERFORMANCE and management of Web servers have been a very active area of research in recent years. Several techniques have been proposed for improving the performance and management of Web services. The most common of these include mirroring, caching Web contents at proxy servers, and distributed server farms with load balancers. These approaches are effective for Web sites that have predominantly static Web contents. However, all of these techniques are limited in terms of handling dynamic requests, scalability, overload, personalized services, and quality-of-service (QoS) assurances.

With the increase in Web usage and applications, several challenges are being faced in the Web environment. The proportion of dynamic Web contents has been increasing in most Web sites. E-commerce has been a major business model in the expanding economy. In these environments, almost all of the Web pages are dynamic in nature. Dynamic contents change with respect to time and events in varying granularity or with respect to the nature of queries. Processing of dynamic requests is usually compute-intensive and could be network-intensive if it needs access to back-end servers such as databases and application servers. Because of the changing nature of dynamic Web objects, they are usually not cached and, thus, do not exploit the caching advantages of proxy servers.
Since most dynamic requests need access to back-end servers, mirroring and load distribution techniques do little good for these types of requests.

The current-generation Web services used in the e-commerce environment suffer from several serious problems. Because of the presence of a high proportion of dynamic contents, the response time for accessing these sites has been very poor [6]. Both the network and the server contribute to the response delay. Long response delays lead to significant revenue losses in e-commerce environments. The revenue loss in 1998 was estimated to be 1.9 billion dollars owing to long response delays [38]. In addition, overload conditions have a serious impact on the performance of Web servers. Overload situations can degrade the server performance drastically, cause denial of service, and in some cases, crash the server. These situations arise because of the inherent nature of Internet traffic, which includes unpredictability, burstiness, and short-term cyclic shifts (e.g., hourly, daily, and seasonal variations). Applications with higher variability in resource requirements and lower delay tolerance are dominant contributors to the overload situations. Web environments with more dynamic and multimedia components certainly add to the server load. Furthermore, QoS support through service differentiation and personalized services are highly desirable features in most Web sites, especially the ones used in commercial environments.

An examination of several dynamic pages on the Web revealed that in most cases, only a part or a few parts of the pages are dynamic in nature. Other portions of these pages constitute static images or text. However, for every access of the dynamic pages, the entire page gets constructed and rendered to the clients or the proxies. Thus, the static characteristics of these pages are not being exploited in the current model of Web access.
Our initial motivation was to exploit this nature of the dynamic pages. Thus, we developed a framework called WebGraph that uses a graphical representation of Web pages to serve dynamic pages efficiently. Parts of the Web page that have the same attributes are called weblets. The weblets can be static or dynamic and are used to construct and reconstruct the Web pages. We have also enriched the framework such that it can be used for other important attributes such as overload