International Journal of Computer Applications (0975-8887)
Volume ICACEA - No. 2, 2014

A Complete Survey on Web Document Ranking

Shashank Gugnani
BITS-Pilani, K.K. Birla Goa Campus, Goa, India - 403726

Tushar Bihany
BITS-Pilani, K.K. Birla Goa Campus, Goa, India - 403726

Rajendra Kumar Roul
BITS-Pilani, K.K. Birla Goa Campus, Goa, India - 403726

ABSTRACT
Today, the web plays a critical role in human life and simplifies it to a great extent. However, due to the rapid increase in the number of web pages, the challenge of providing quality, relevant information to users also needs to be addressed. Thus, search engines need to implement algorithms that span the pages according to the user's interest and satisfaction and rank them accordingly. Web mining assists greatly in this scenario: it helps in retrieving potentially useful information and patterns from the web. This paper surveys different page ranking algorithms used for information retrieval and compares them. It also presents some interesting facts about research in page ranking in order to identify further scope for research in this area.

General Terms: web document ranking, page rank

Keywords: web structure mining, web content mining, web usage mining, document ranking

1. INTRODUCTION
With the size of the World Wide Web increasing at an exponential rate, it is becoming increasingly difficult to find relevant information. The main task of a search engine is to reduce this difficulty. It is the duty of a search engine to provide relevant information to the user on receiving a query. However, considering the size of the World Wide Web, a typical query might return more than a million results. The user does not have the time or patience to go through such a huge list. Thus, ranking of web documents becomes a critical component of a search engine.
Search engines constantly need to find better and more efficient ranking methods, which can return high-quality information to the user in as short a time as possible. Search engines first create an index of all the web documents and store it on the server. After the user submits a query, the query is given to the index, which returns the documents containing the words in the query. The returned documents are then sent to a ranking function, which assigns a rank to each document, and the top-k documents are returned to the user. Figure 1 shows the working of a typical search engine.

Fig. 1: Working of a Search Engine

Web Mining is the task of extracting useful information from web documents. Web Mining comprises three types: Web Structure Mining (WSM), Web Content Mining (WCM), and Web Usage Mining (WUM). Web Structure Mining uses the structure of the web, i.e. the hyperlinks between web pages; Web Content Mining uses the content of the web documents; and Web Usage Mining uses user click-through data available in server logs. Every ranking algorithm employs a combination of one or more of these three types of Web Mining.

The purpose of this paper is to list the important page ranking algorithms developed to date and to compare their strengths, weaknesses, run time, and efficiency so as to help further research in this field. In addition, the page ranking algorithms are compared according to three evaluation measures. We also present a summary of research work in ranking over the years, together with an analysis of the same.

The rest of this paper is organized as follows. Section 2 summarizes ranking algorithms in ascending order of year to trace their development. Section 3 compares various ranking algorithms on a number of factors such as methodology, type of web mining, and quality of results. Section 4 compares the algorithms on the quality of their results based on three evaluation measures (NDCG, P@n, and MAP).
Section 5 presents some interesting facts about research work in web document ranking and finally, Section 6 concludes the paper.
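
The index, retrieve, rank, and top-k flow described for Figure 1 can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from any algorithm surveyed in this paper: the function names (build_index, retrieve, score, search), the three-document corpus, the rule that a document must contain every query word, and the toy ranking function that simply counts query-term occurrences.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def retrieve(index, query):
    """Return ids of documents containing every query term (assumed AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


def score(text, query):
    """Toy ranking function (an assumption): count query-term occurrences."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def search(docs, index, query, k=2):
    """Retrieve candidates from the index, rank them, and return the top-k ids."""
    candidates = retrieve(index, query)
    # Sort by descending score; ties are broken by document id for determinism.
    ranked = sorted(candidates, key=lambda d: (-score(docs[d], query), d))
    return ranked[:k]


# Hypothetical corpus used only for this illustration.
docs = {
    "d1": "web mining extracts patterns from web documents",
    "d2": "ranking web documents is critical for a search engine",
    "d3": "server logs record user clicks",
}
index = build_index(docs)
top = search(docs, index, "web documents")
print(top)
```

In a real search engine the ranking function is where the algorithms surveyed in the following sections would be plugged in; the term-count score above only stands in for that step.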