A Comparative Study of Page Ranking Algorithms for Information Retrieval

Ashutosh Kumar Singh, Ravi Kumar P

Abstract—This paper gives an introduction to Web mining, then describes Web Structure mining in detail and explores the data structure used by the Web. It also explores different Page Rank algorithms and compares those used for Information Retrieval. Under Web Mining, the basics of Web mining and the Web mining categories are explained. Different Page Rank based algorithms, namely PageRank (PR), Weighted PageRank (WPR), HITS (Hyperlink-Induced Topic Search), DistanceRank and DirichletRank, are discussed and compared. Page ranks are calculated with the PageRank and Weighted PageRank algorithms for a given hyperlink structure. A simulation program is developed for the PageRank algorithm because, of these algorithms, PageRank is the only one implemented in a commercial search engine (Google). The outputs are shown in table and chart format.

Keywords—Web Mining, Web Structure, Web Graph, Link Analysis, PageRank, Weighted PageRank, HITS, DistanceRank, DirichletRank

I. INTRODUCTION

The World Wide Web (WWW) is growing tremendously in all aspects and is a massive, explosive, diverse, dynamic and mostly unstructured data repository. Today the WWW is the largest information repository for knowledge reference. The Web poses many challenges [1]: the Web is huge, Web pages are semi-structured, Web information tends to be diverse in meaning, the quality of the extracted information varies, and knowledge must be inferred from the extracted information. A Google report [5] of 25th July 2008 states that there are 1 trillion (1,000,000,000,000) unique URLs (Uniform Resource Locators) on the Web. The actual number could be larger, and Google could not index all of these pages. When Google created its first index in 1998 there were 26 million pages, and by 2000 the Google index had reached 1 billion pages.
Over the last nine years the Web has grown tremendously and its usage has become unimaginably widespread. It is therefore important to understand and analyze the underlying data structure of the Web for effective Information Retrieval. Web mining techniques, together with other areas such as Databases (DB), Information Retrieval (IR), Natural Language Processing (NLP) and Machine Learning, can be used to address the above challenges.

Ashutosh Kumar Singh is with the Department of Electrical and Computer Engineering, Curtin University of Technology, Miri, Sarawak, Malaysia (e-mail: ashutosh.s@curtin.edu.my).
Ravi Kumar P is with the Department of Computer & I.T., Jefri Bolkiah College of Engineering, Brunei, and is pursuing his PhD at Curtin University of Technology, Miri, Malaysia (e-mail: ravi2266@gmail.com).

With the rapid growth of the WWW and users' demand for knowledge, it is becoming more difficult to manage the information on the WWW and to satisfy user needs. Users are therefore looking for better information retrieval techniques and tools to locate, extract, filter and find the necessary information. Most users rely on information retrieval tools such as search engines to find information on the WWW. Hundreds of search engines are available, but a few, such as Google, Yahoo and Bing, are popular because of their crawling and ranking methodologies. Search engines download, index and store hundreds of millions of web pages and answer tens of millions of queries every day. Web mining and ranking mechanisms therefore become very important for effective information retrieval. The sample architecture [2] of a search engine is shown in Fig. 1.

Fig. 1 Sample Architecture of a Search Engine (components: WWW, Web Crawler, Indexer, Index, Query Interface, Query Processor, Web Mining)

There are three important components in a search engine: the crawler, the indexer and the ranking mechanism. The crawler, also called a robot or spider, traverses the Web and downloads web pages.
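The three components named above (crawler, indexer and ranking mechanism) can be illustrated together with a minimal Python sketch. Everything here is an assumption for illustration only: the toy link graph `TOY_WEB` stands in for fetching real pages over HTTP, the functions `crawl`, `build_index`, `in_link_counts` and `answer_query` are hypothetical, and the in-link count is merely a crude stand-in for link-based ranking algorithms such as PageRank. It is not the method of any real search engine.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
# A real crawler would download pages over HTTP; this toy graph stands in for that.
TOY_WEB = {
    "a.html": ("web mining and page rank", ["b.html", "c.html"]),
    "b.html": ("page rank algorithms", ["c.html"]),
    "c.html": ("information retrieval with page rank", ["a.html"]),
    "d.html": ("web crawler basics", ["c.html"]),
}


def crawl(seed):
    """Crawler: breadth-first traversal from a seed URL, 'downloading' each page once."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        if url not in TOY_WEB:  # dead link: nothing to download
            continue
        text, links = TOY_WEB[url]
        pages[url] = (text, links)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Indexer: map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, (text, _) in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def in_link_counts(pages):
    """Crude link-based score (assumption): how many crawled pages link to each URL."""
    counts = defaultdict(int)
    for _, links in pages.values():
        for link in set(links):
            counts[link] += 1
    return counts


def answer_query(query, index, scores):
    """Query processing + ranking: keep pages containing every keyword, best score first."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits, key=lambda url: (-scores[url], url))


pages = crawl("a.html")
index = build_index(pages)
print(answer_query("page rank", index, in_link_counts(pages)))
# → ['c.html', 'a.html', 'b.html']
```

Here `c.html` ranks first because two crawled pages link to it. Note that `d.html` is never crawled, since no page reachable from the seed links to it, so a query for `crawler` returns no results in this sketch.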
The downloaded pages are sent to an indexing module that parses them and builds an index based on the keywords in those pages. An alphabetical index of the keywords is generally maintained. When a user types a keyword query into the search engine's interface, the query processor matches the query keywords against the index and returns the URLs of the matching pages to the user. Before presenting the pages, however, the search engine applies a ranking mechanism so that the most relevant pages appear at the top and less relevant ones at the bottom. This makes navigating the search results easier for the user. The ranking mechanism is explained in detail later in this paper.

This paper is organized as follows. Section II provides the basic Web mining concepts and the three areas of Web mining. In this section Web Structure mining is described in detail because most of the Page Rank algorithms are based on

World Academy of Science, Engineering and Technology
International Journal of Computer and Information Engineering Vol:3, No:4, 2009 1154
International Scholarly and Scientific Research & Innovation 3(4) 2009 ISNI:0000000091950263
Open Science Index, Computer and Information Engineering Vol:3, No:4, 2009 publications.waset.org/3153/pdf