Comparative Study of Different Web Mining Algorithms to Discover Knowledge on the Web

Mohd Shoaib 1,∗ and Ashish K. Maurya 1
PG Student Faculty of Computer Science and Engineering, Shri Ramswaroop Memorial University, Uttar Pradesh 225 053, India.
e-mail: md.shoaibs@gmail.com, er.ashishmaurya@gmail.com

Abstract. The World Wide Web (commonly called the Web) is now widely used and affects almost every facet of our lives. Searching for and retrieving information from the Web requires effective and efficient techniques, a task made challenging by the Web's expanding size and complexity. Web mining tackles this problem by gathering useful information from the Web through its three categories: Web structure mining, Web content mining and Web usage mining. This paper explains the area of Web mining, its categories and the algorithms associated with them. The algorithms discussed are PageRank, SimRank, TF-IDF, k-nearest neighbour, PageGather and CDL4. We then summarize the algorithms in terms of their working, input parameters, complexity, and pros and cons. We also analyze the algorithms with respect to relevance, technique and regression analysis.

Keywords: Web mining, Web structure mining, Web content mining, Web usage mining, PageRank, SimRank, TF-IDF, kNN, PageGather, CDL4.

1. Introduction

Web mining is a data mining technique, commonly defined as the process of discovering useful patterns, knowledge or information from data sources on the Web, in the form of text, images, databases, multimedia, etc. The patterns must be suitable, convincing, potentially useful and understandable. Web mining is a multi-disciplinary area that combines fields such as artificial intelligence, information retrieval, machine learning, statistics, databases and visualization [1].
Web mining can be decomposed into the following subtasks [2]:
• Resource finding
• Information selection and pre-processing
• Generalization
• Analysis

Thus, the aim of Web mining is to discover useful knowledge or information from usage data, page content and the Web hyperlink structure. Although Web mining uses many data mining techniques, it is not simply an application of traditional data mining, because data on the Web is multifarious and semi-structured or unstructured. A variety of new mining tasks and algorithms have therefore been invented. Based on the primary type of data used in the mining process, Web mining tasks can be categorized into three types [3–5]: Web content mining, Web structure mining and Web usage mining, as shown in figure 1.

2. Web Structure Mining

Web structure mining discovers useful knowledge from hyperlinks (links), which represent the structure of the Web. For example, links can be used to identify important Web pages, which is a key technology in search engines. They can also be used to discover communities of users who share common interests.

∗ Corresponding author
Elsevier Publications 2014.