An Alternate Downloading Methodology of Webpages

Anirban Kundu 1, Alok Ranjan Pal 1, Tanay Sarkar 1, Moutan Banerjee 1, Subhendu Mandal 1, Rana Dattagupta 2 and Debajyoti Mukhopadhyay 3

1 Netaji Subhash Engineering College (West Bengal University of Technology), West Bengal-700 152, India
{anik76in, chhaandasik, tanay.sarkar, moutanbanerjee, subhendu.mndl}@gmail.com
2 Jadavpur University, West Bengal-700 032, India
rdattagupta@cse.jdvu.ac.in
3 Calcutta Business School, Diamond Harbour Road, Bishnupur, West Bengal-743 503, India
debajyoti.mukhopadhyay@gmail.com

Abstract

We propose an advanced method for downloading Webpages from the internet. In this technique, the whole system is considered as a bundle of crawlers that are created dynamically at execution time. The number of crawlers used depends on the requirement of downloading Webpages. The software module that interacts with the WWW to search for one or more Webpages is known as a crawler. The crawlers are generated according to the hierarchical structure of the Web server from which the data is to be downloaded. A Webpage downloader is an important tool for retrieving Web documents from the internet and helping a Web user gather knowledge. Such downloaders are very popular in the Information Technology field. All kinds of public data, accessible throughout the world without any authentication, can be retrieved at any time from any geographic location using this downloading methodology. Typically, a downloading technique accumulates Webpages of different domains on a single machine one at a time. Our aim in this paper is to present an advanced technique for downloading many related Webpages with minimum effort and time using a Hierarchical Downloader consisting of several dynamic crawlers.

Keywords - Multi-downloading, Hierarchical downloading.

1 Introduction

In recent years, it has become important to perform downloading operations efficiently in terms of information retrieval [1] due to the enormous growth of the World Wide Web (WWW). The world at present generates about 1 to 2 exabytes of unique information each year, which translates to about 250 megabytes for every man, woman and child on earth (an exabyte is a billion gigabytes). The World Wide Web Worm (WWWW) was one of the first Web search engines and was basically a store of a huge volume of information [2]. With the advent of the WWW, users now try to propagate information to a much wider audience more quickly via some medium of communication. In this information and technology era, anyone who wishes to gather information can find a lot of data related to a topic through the WWW from any location in the world using a Web browser [3]. A Web browser helps people reach the desired information instantaneously over the internet. In a practical scenario, a typical Web browser invokes Webpages one at a time. One has to check all the links available on a Webpage to download more than one Webpage and collect overall information on a particular topic [4]. For example, if a user wishes to read a tutorial on a specific subject, all the hyperlinks must be checked on a trial-and-error basis. The related information is specified within a Webpage in terms of URLs [5]; a typical Webpage consists of a set of URLs that possibly contain the sought information. As a result, it takes a long time to retrieve complete information using the available methods. A rough sketch of this idea follows.
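The following Python sketch is only an illustration of the general idea, not the authors' Hierarchical Downloader: it fetches a seed page, extracts the hyperlinks it contains, and spawns one crawler thread per discovered link, mimicking crawlers created dynamically at execution time. The seed URL and function names are placeholders introduced for this example.

import threading
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects href targets of anchor tags found on a downloaded page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def fetch(url):
    """Single crawler: download one Webpage and return its raw content."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()

def hierarchical_download(seed_url):
    """Download the seed page, then spawn one crawler per discovered link."""
    parser = LinkExtractor(seed_url)
    parser.feed(fetch(seed_url).decode("utf-8", errors="replace"))

    pages = {}
    def crawler(link):
        try:
            pages[link] = fetch(link)
        except OSError:
            pass  # skip links that cannot be retrieved

    threads = [threading.Thread(target=crawler, args=(link,))
               for link in parser.links]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pages

if __name__ == "__main__":
    # Placeholder seed URL; any tutorial index page with hyperlinks would do.
    downloaded = hierarchical_download("http://example.com/tutorial/index.html")
    print("Downloaded", len(downloaded), "linked pages")

In this sketch only one level of the hierarchy is followed; the paper's method instead derives the number of crawlers from the hierarchy structure of the target Web server.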