A New Model for Web Pre-fetching Using Popularity-Based Prediction

Dr. Neena Gupta
Department of Computer Science
Kanya Gurukul Mahavidyalaya
Dehradun (Uttrakhand), India
neena71@hotmail.com

Mahesh Manchanda
Department of Computer Application
Graphic Era University
Dehradun (Uttrakhand), India
manchandamahesh@rediffmail.com

Abstract--Due to the fast development of internet services and the huge amount of network traffic, reducing the latency perceived by World Wide Web users has become an essential issue. Although caching improves web performance, its benefit is limited. Web prefetching is an attractive way to reduce retrieval latency further. In this work, we introduce a simple and transparent popularity-based prefetching algorithm that combines the top 10 and next-n prefetching approaches. In addition to using access frequency as the criterion for prefetching, we also use the time of access of web documents to generate the top 10 list. This combination of access frequency and time of access is known as the GDSP approach, which has been used in cache management. Instead of generating a next-n list for every document accessed by the users, we log the next-n documents for the top 10 documents only, thus reducing complexity and overhead. The results obtained from the algorithm, in terms of hit rate and prefetching effectiveness, show the efficacy of our proposed algorithm compared to other approaches.

Keywords: GDSP, Predictive Pre-fetching, Access Latency.

I. INTRODUCTION

The growing demand for WWW resources has increased the response time perceived by users when retrieving web objects, also referred to as access latency. The World Wide Web became the “World Wide Wait”: users started experiencing long delays for their requests over the web. The growth of the Internet and the increase in the number of clients using it have challenged researchers to improve web performance.
Even with the availability of high-bandwidth Internet connections, fast processors and large amounts of storage, access latency has remained a challenge for researchers.

One way to improve web performance, that is, to reduce user-perceived latency, is web caching, which provides local storage and management of previously accessed web documents. A request for a document already present in the web cache can be serviced directly instead of being forwarded to the intended web server. This reduces both the latency experienced by clients and the Internet traffic. Web caching can be done at different locations in the Internet architecture. Depending on its location, a cache falls into one of three categories: client cache, proxy cache or server cache. A proxy server, also referred to as a proxy, sits in the middle tier of the network architecture and acts as an intermediary between clients and the actual web servers.

However, web caching has limitations that can be overcome by other techniques that work alongside it. Abrams et al. [1] have shown that the maximum achievable hit rate of caching proxies is 30% to 50%. Caching offers no benefit for web pages that have not been visited before. To improve cache performance, researchers have introduced web prefetching to work in conjunction with web caching; prefetching means retrieving web documents from web servers before the user requests them. Prefetching techniques rely on predictive approaches to speculatively retrieve web objects and store them in the cache for future use. Predictions about what to prefetch are based on criteria such as history, popularity and content. Much work has already been done in the field of prefetching [2, 3], which has been shown to reduce web latency effectively by utilizing the user's idle time. The idle time is the time elapsed between the user's current request and next request.
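The interplay of caching and prefetching described above can be illustrated with a minimal sketch. It assumes a toy in-memory proxy cache with no eviction or expiry; the names PrefetchingProxyCache, fetch_from_origin and fake_origin are hypothetical, introduced only for illustration, and are not part of the proposed algorithm. How the predicted URLs are chosen is deliberately left open.

```python
from typing import Callable, Dict, Iterable


class PrefetchingProxyCache:
    """Toy proxy cache (illustrative assumption, not the paper's algorithm).

    Hits are served locally; misses are forwarded to the origin server.
    """

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin      # hypothetical origin callback
        self._store: Dict[str, str] = {}     # url -> document body
        self.hits = 0
        self.requests = 0

    def get(self, url: str) -> str:
        """Serve a client request, counting cache hits."""
        self.requests += 1
        if url in self._store:
            self.hits += 1
            return self._store[url]
        body = self._fetch(url)              # miss: forward to the web server
        self._store[url] = body
        return body

    def prefetch(self, predicted_urls: Iterable[str]) -> None:
        """Speculatively load predicted documents, e.g. during user idle time."""
        for url in predicted_urls:
            if url not in self._store:
                self._store[url] = self._fetch(url)

    def hit_rate(self) -> float:
        """Fraction of client requests served from the cache."""
        return self.hits / self.requests if self.requests else 0.0


def fake_origin(url: str) -> str:
    """Stand-in for a real web server (assumption for the example)."""
    return f"<html>{url}</html>"
```

In this sketch a document that was never requested before can still be a hit if it was prefetched, which is the case that plain caching cannot cover.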
Web prefetching is based on an idea similar to the prefetching used in computer memory management. Although analogous, the techniques applied in web prefetching are very different from those applied in hardware memory management. Web prefetching approaches exploit, in different ways, two facts: loading and displaying a page takes a few seconds, and there is a substantial time gap between two consecutive requests from a user. Thus, different strategies need to be developed for web prefetching. Algorithms developed for web prefetching fall into three main categories:
• Popularity-based algorithms make predictions based on the popularity of the web pages.
• Structure-based algorithms make use of the

Annual International Conference on Advances in Distributed and Parallel Computing (ADPC 2010) Copyright © GSTF 2010 ISBN: 978-981-08-7656-2 doi:10.5176/978-981-08-7656-2 A-43 R-39