International Journal of Computer Applications (0975 – 8887)
Volume 67 – No. 14, April 2013

A Parametric Layered Approach to Perform Web Page Ranking

Ratika Goel
Dept. of Computer Science
Amity School of Engineering and Technology
Amity University, India

Anchal Garg
Dept. of Computer Science
Amity School of Engineering and Technology
Amity University, India

ABSTRACT
Web crawling is the first step in effective and efficient web content search, so that the user receives the most relevant web pages in indexed form. Web crawling is used not only to search for web pages but also to order them according to user interest. A number of search engines and crawlers accept a user query and return matching pages, but there is still scope to improve the search mechanism. In the present work, a dynamic, user-interest-driven parametric approach is defined to perform web crawling and to order web pages more precisely. A layered approach is defined in which initial indexing is performed using keyword-oriented content matching, and the index is later refined using user recommendations. The presented work provides recommendation-based web page indexing so that effective web crawling can be performed.

Keywords
Crawling, Indexing, Recommender system

1. INTRODUCTION
A web crawler is the heart of a search engine and works as its central component. The efficiency and reliability of a search engine depend directly on the efficiency of its web crawler. A search engine passes the user query to the web crawler, and the crawler searches for information over public web pages. The crawler's job is to process the query and identify its keywords; using these keywords, the page search is performed over the web.
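As an illustration, the layered approach outlined in the abstract (a first layer of keyword-oriented content matching, refined by a second layer of user recommendations) can be sketched as follows. This is a minimal sketch, not the proposed system's implementation; the scoring weight and the `recommendations` mapping are illustrative assumptions.

```python
def keyword_score(page_text, keywords):
    """First layer: count keyword occurrences in the page text."""
    words = page_text.lower().split()
    return sum(words.count(k.lower()) for k in keywords)

def layered_rank(pages, keywords, recommendations, weight=2.0):
    """Second layer: adjust the keyword score using user recommendation
    counts (hypothetical structure: {url: number_of_recommendations})."""
    scored = []
    for url, text in pages.items():
        score = keyword_score(text, keywords)
        score += weight * recommendations.get(url, 0)
        scored.append((url, score))
    # Highest combined score first, i.e. the final page ordering
    return sorted(scored, key=lambda p: p[1], reverse=True)

pages = {
    "a.html": "web crawling and web indexing",
    "b.html": "cooking recipes",
}
# b.html has 3 user recommendations, which outweigh a.html's keyword match
ranking = layered_rank(pages, ["web", "crawling"], {"b.html": 3})
```

Here the recommendation layer promotes `b.html` above `a.html` even though `a.html` matches more keywords, mirroring how the index is re-ordered after user feedback.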
As shown in figure 1, the basic architecture of a web crawler uses a queue and a scheduler to manage the dynamic web search, and fixed storage for statistical analysis. A number of algorithmic approaches have been provided by different authors to process user web requests. The main challenges faced by any algorithmic approach are given as under:

Figure 1: Web crawler architecture

A) Scale
The web is very large and evolves frequently. Crawlers that aim for broad coverage and good freshness must achieve extremely high throughput, which poses many difficult engineering problems. Present search engines run multiple servers simultaneously to handle the maximum number of user requests without service delay.

B) Content-Specific Issues
It is not possible for any crawler to perform a content check on all sites over the web. A further difficulty is that web content is updated regularly. Because of this, selective crawling is required, and content-based search is performed over the web.

C) Social Obligations
Crawlers should not add much extra load on a website while performing the crawling. They use safety mechanisms, such as politeness policies, so that high throughput is obtained without burdening the site.

D) Adversaries
Search engines also filter web content before presenting it. The filtering targets duplicate content and relevance to the user's requirements.

1.1 DATA DETECTION APPROACHES
The main objective of the web crawler is to identify data over the web based on user query filtering as well as user query analysis. Some such approaches are discussed in this section.

A) Copy Detection Approach
In this approach, documents are searched over the web based on an exact match. These kinds of approaches are used by many plagiarism detection software systems.
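The queue-and-scheduler design of figure 1 can be sketched as a simple breadth-first frontier. This is an illustrative sketch only; the in-memory `links` graph is a hypothetical stand-in for real downloading, and the visited set plays the scheduler's role of avoiding repeat fetches.

```python
from collections import deque

# Hypothetical link graph standing in for downloaded web pages
links = {
    "seed.html": ["a.html", "b.html"],
    "a.html": ["b.html", "c.html"],
    "b.html": [],
    "c.html": [],
}

def crawl(seed, link_graph):
    """Breadth-first crawl: the deque is the URL queue, and the
    visited list keeps the scheduler from re-fetching pages."""
    frontier = deque([seed])
    visited = []
    while frontier:
        url = frontier.popleft()
        if url in visited:
            continue  # scheduler skips already-crawled pages
        visited.append(url)
        for out_url in link_graph.get(url, []):
            if out_url not in visited:
                frontier.append(out_url)
    return visited

order = crawl("seed.html", links)
```

A production crawler would add the politeness and throughput controls discussed under "Social Obligations" (per-host delays, robots.txt checks) on top of this queue discipline.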
They basically perform the data check using different copy detection forms, such as sentence-based checks, word-based checks, and paragraph-based checks. Some crawlers also perform word substitution as well as partial matching.
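One of the copy detection forms above, the word-based check, can be sketched as a set-overlap (Jaccard) similarity between the word sets of two documents. This is a minimal illustration of the idea, not a specific plagiarism detector's algorithm; real systems combine several such checks.

```python
def word_jaccard(doc_a, doc_b):
    """Word-based copy check: Jaccard similarity of the two word sets.
    A score near 1.0 suggests heavily copied content."""
    words_a = set(doc_a.lower().split())
    words_b = set(doc_b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty documents are trivially identical
    return len(words_a & words_b) / len(words_a | words_b)

original = "web crawling orders pages by user interest"
copied = "web crawling orders pages by user interest"
unrelated = "cooking recipes for the weekend"

exact_score = word_jaccard(original, copied)
low_score = word_jaccard(original, unrelated)
```

Because only word sets are compared, this check is robust to reordering but blind to word substitution, which is why the substitution-aware checks mentioned above are applied as a complement.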