A New Model for Web Pre-fetching Using
Popularity-Based Prediction
Dr. Neena Gupta
Department of Computer Science, Kanya Gurukul Mahavidyalaya
Dehradun (Uttrakhand), India
neena71@hotmail.com

Mahesh Manchanda
Department of Computer Application, Graphic Era University
Dehradun (Uttrakhand), India
manchandamahesh@rediffmail.com
Abstract--Due to the rapid growth of Internet services and the huge volume of network traffic, reducing user-perceived latency on the World Wide Web has become an essential issue. Although caching improves web performance, the benefit of caches is limited. Web prefetching is an attractive way to reduce retrieval latency further. In this work, we introduce a simple and transparent popularity-based prefetching algorithm that combines the top-10 and next-n prefetching approaches. In addition to using access frequency as the criterion for prefetching, we also use the time of access of web documents to generate the top-10 list. This approach of using both access frequency and time of access is known as the GDSP approach, which has been used in cache management. Instead of generating a next-n list for every document accessed by users, we log the next-n documents for the top-10 documents only, thus reducing complexity and overhead. The results obtained from the algorithm, in terms of hit rate and prefetching effectiveness, show the efficacy of our proposed algorithm compared to other approaches.
Keywords: GDSP, Predictive Pre-fetching, Access Latency.
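The idea of logging next-n successors only for the top-ranked documents can be illustrated with a minimal Python sketch. The log format (`(timestamp, client, url)` tuples ordered by time), the function names `build_top_list`, `build_next_n` and `prefetch_candidates`, and the ranking key (access frequency first, most recent access time as tie-breaker) are all hypothetical. In particular, the ranking key is a simplified stand-in and not the GDSP computation the paper uses.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# One log entry: (timestamp, client id, requested URL), ordered by time.
LogEntry = Tuple[float, str, str]


def build_top_list(log: List[LogEntry], k: int = 10) -> List[str]:
    """Rank documents by access frequency, breaking ties by most recent access.

    ASSUMPTION: this (frequency, last-access time) key is a stand-in for the
    paper's GDSP-based ranking, whose exact formula is not given here.
    """
    freq: Counter = Counter()
    last_access: Dict[str, float] = {}
    for t, _client, url in log:
        freq[url] += 1
        last_access[url] = t  # log is time-ordered, so this ends as the latest access
    ranked = sorted(freq, key=lambda u: (freq[u], last_access[u]), reverse=True)
    return ranked[:k]


def build_next_n(log: List[LogEntry], top: List[str], n: int = 3) -> Dict[str, List[str]]:
    """Record successors only for top-list documents, keeping the n most frequent."""
    top_set = set(top)
    followers: Dict[str, Counter] = defaultdict(Counter)
    previous: Dict[str, str] = {}  # last URL requested by each client
    for _t, client, url in log:
        prev = previous.get(client)
        # Only top-list documents get a successor log; self-reloads are skipped.
        if prev in top_set and prev != url:
            followers[prev][url] += 1
        previous[client] = url
    return {doc: [u for u, _ in followers[doc].most_common(n)] for doc in top}


def prefetch_candidates(url: str, next_n: Dict[str, List[str]]) -> List[str]:
    """Documents to prefetch after a request for url (empty if url is not in the top list)."""
    return next_n.get(url, [])
```

For example, if most clients who request a home page go on to a news page, the news page appears in the home page's next-n list and would be prefetched on the next home-page request. Because successors are kept only for the k top documents, the returned prediction table holds at most k × n entries.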
I. INTRODUCTION
The increasing demand for WWW resources has worsened the response time that users perceive when retrieving web objects, also referred to as access latency. The World Wide Web became the “World Wide Wait”, with users experiencing long delays for their requests. The growth of the Internet and the increasing number of clients using it have challenged researchers to improve web performance. Even with high-bandwidth Internet connections, fast processors and large amounts of storage, access latency has remained a challenge for researchers.
One solution to improving web performance, or reducing user-perceived latency, is web caching, which provides local storage and management of previously accessed web documents. A request for a document already present in the web cache can be serviced directly instead of being forwarded to the intended web server. This reduces both the latency experienced by clients and Internet traffic. Web caching can be done at different locations in the Internet architecture. Depending on the location of the cache, it can be classified into three categories: client cache, proxy cache and server cache. A proxy server, also referred to simply as a proxy, sits in the middle tier of the network architecture. It acts as an intermediary between clients and the actual web servers.
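The hit-or-forward behaviour of a proxy cache can be sketched as follows. The class name `ProxyCache`, the `fetch_from_origin` callback, and the least-recently-used (LRU) eviction policy are assumptions chosen for illustration; the paper does not prescribe a replacement policy here.

```python
from collections import OrderedDict
from typing import Callable


class ProxyCache:
    """Hypothetical LRU proxy cache: serve hits locally, forward misses to the origin."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin
        self.store: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.requests = 0

    def get(self, url: str) -> bytes:
        self.requests += 1
        if url in self.store:
            # Hit: answer from the local copy, no round trip to the web server.
            self.hits += 1
            self.store.move_to_end(url)  # mark as most recently used
            return self.store[url]
        # Miss: forward the request to the origin server and keep a copy.
        body = self.fetch_from_origin(url)
        self.store[url] = body
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry
        return body

    def hit_rate(self) -> float:
        return self.hits / self.requests if self.requests else 0.0
```

With a capacity of two documents, the request sequence /a, /b, /a, /c, /b yields one hit out of five requests (a hit rate of 0.2): only the repeated request for /a is served locally, while /b is evicted before it is requested again.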
However, web caching has limitations that can be overcome by other techniques that work alongside it. Abrams et al. [1] have shown that the maximum achievable hit rate of caching proxies is 30% to 50%. Caching cannot help when the requested pages have not been visited before. To improve cache performance, researchers have introduced web prefetching to work in conjunction with web caching: web documents are fetched from web servers before the user requests them. Prefetching techniques rely on predictive approaches to speculatively retrieve web objects and store them in the cache for future use. Predictions about what to prefetch are based on different criteria, such as history, popularity and content. Much work has already been done on prefetching [2, 3], and it has been shown to reduce web latency effectively by utilizing user idle time, that is, the time elapsed between the user's current request and the next request. Web prefetching is based on an idea similar to the prefetching used in computer memory management. Although the two are analogous, the techniques applied in web prefetching are very different from those used in memory management of computer hardware.
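To make the idle-time definition concrete, the sketch below computes the gaps between one user's consecutive requests and greedily fills a gap with predicted documents. The function names `idle_times` and `plan_prefetch` and the per-document fetch-time estimates are hypothetical; a real prefetcher would only have rough estimates of these costs.

```python
from typing import List, Sequence, Tuple


def idle_times(request_times: Sequence[float]) -> List[float]:
    """Idle time: the gap between each request and the same user's next request."""
    return [nxt - cur for cur, nxt in zip(request_times, request_times[1:])]


def plan_prefetch(gap: float, candidates: Sequence[Tuple[str, float]]) -> List[str]:
    """Choose predicted documents, in priority order, whose estimated fetch times fit in one gap.

    A candidate that does not fit is skipped and the next one is tried (first-fit).
    """
    chosen: List[str] = []
    used = 0.0
    for url, fetch_seconds in candidates:
        if used + fetch_seconds <= gap:
            chosen.append(url)
            used += fetch_seconds
    return chosen


if __name__ == "__main__":
    print(idle_times([0.0, 12.5, 14.0, 40.0]))  # [12.5, 1.5, 26.0]
    print(plan_prefetch(5.0, [("/a", 2.0), ("/b", 4.0), ("/c", 1.5)]))  # ['/a', '/c']
```

A short gap (1.5 seconds above) leaves little room for prefetching, whereas a long gap can absorb several speculative fetches without delaying the user's next request.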
Researchers exploit, in different ways, the facts that loading and displaying a page takes a few seconds and that there is a substantial time gap between two consecutive requests from a user. Thus, different strategies need to be developed for web prefetching.
Algorithms developed for web prefetching fall broadly into three categories:
• Popularity-based algorithms make predictions based
on the popularity of the web pages.
• Structure-based algorithms make use of the
Annual International Conference on Advances in Distributed and Parallel Computing (ADPC 2010)
Copyright © GSTF 2010
ISBN: 978-981-08-7656-2
doi:10.5176/978-981-08-7656-2 A-43
R-39