IOSR Journal of Computer Engineering (IOSRJCE)
ISSN: 2278-0661, ISBN: 2278-8727 Volume 5, Issue 4 (Sep-Oct. 2012), PP 31-36
www.iosrjournals.org

Literature Survey on Web Mining

Geeta R. Bharamagoudar 1, Shashikumar G. Totad 2, Prasad Reddy PVGD 3
1 (Associate Professor of Information Technology, GMRIT RAJAM, Andhra Pradesh, India)
2 (Professor, Department of Computer Science and Engineering, GMRIT RAJAM, Andhra Pradesh, India)
3 (Professor, Department of CS & SE, Andhra University Vizag, India)

Abstract: Web mining is the process of retrieving non-trivial and potentially useful information or patterns from the Web. It is the union of three categories: Web Structure Mining, Web Usage Mining, and Web Content Mining. This paper describes and compares these three categories. It also compares various page ranking algorithms with Link Editing, General Utility Mining, and the Topological Frequency Utility Mining Model, using criteria such as web mining activity, topology, process, weighting factor, time complexity, and limitations. It further compares the WPs-Tree and WPs-Itree structures.

Keywords: Log File, Page Rank, Topology, Web Structure Mining, Web Usage Mining

I. INTRODUCTION

The World Wide Web, or simply the Web, is the largest and most popular information source. It is easily available and accessible at low cost, responds quickly to its users, and spares them the burden of physical travel. The Web is gigantic: it is made up of billions of interconnected hypertext documents (web pages) designed by millions of authors. Ted Nelson conceived the idea of hypertext in 1965. The Web also supports hypermedia documents, which contain images, audio, and video in addition to text. A web page is a document marked up with Hypertext Markup Language (HTML) tags. A web site is a collection of several interrelated web pages.
A web site designer can connect one web page to another page anywhere on the Web by means of links called hyperlinks, each of which joins one hypertext document to another. In this respect the Web resembles a virtual society. The Web follows the client/server model: the client acts as a service consumer and the server acts as a service provider. Client and server interact using the Hypertext Transfer Protocol (HTTP). The client navigates the Web by means of a client program called a browser, e.g. Internet Explorer, Google Chrome, Netscape Navigator, or Mozilla. The client sends a request to the server through the browser. The server analyzes the request and, if the requested information is available on the server side, delivers it to the client. The request takes the form of a Uniform Resource Locator (URL), which identifies a resource available on the Web.

Tim Berners-Lee invented the Web in 1989 at CERN in Switzerland. He coined the term World Wide Web and wrote both the first World Wide Web server, httpd, and the first client program (a browser and editor), “WorldWideWeb”.

With a vast amount of information being shared worldwide, there was a need to find the required information in a systematic and effective way, and search engines came into existence. Six Stanford students introduced the search system Excite in 1993. The MCC Research Consortium at the University of Texas established EINet Galaxy in 1994. Yahoo was created by Jerry Yang and David Filo in 1994 and offered a directory search containing their favorite websites. In subsequent years many more search engines were developed, e.g. Lycos, Infoseek, AltaVista, Inktomi, Ask Jeeves, and Northernlight. Sergey Brin and Larry Page, then at Stanford University, founded Google in 1998. Microsoft launched its MSN Search engine in spring 2005. MIT and CERN took the lead in the formation of the World Wide Web Consortium (W3C) in December 1994.
The main goal of W3C was to promote standards for the evolution of the Web and to enable interoperability between WWW products by producing specifications and reference software. The First International Conference on WWW was held in 1994 [21]. Many businesses then began operating on the Web.

Data mining is also referred to as knowledge discovery in databases (KDD). It is the process of discovering useful patterns or knowledge from data sources. It is a multidisciplinary field involving artificial intelligence, statistics, information retrieval, and visualization. Web mining is the process of discovering useful and intelligent information or knowledge from web site topology, web page content, and web usage data.

This paper is organized as follows. Section I describes related work. Section II compares the Web Mining categories, i.e. Web Structure Mining, Web Content Mining, and Web Usage Mining. Section III describes page ranking algorithms along with the Link Editing, General Utility Mining, and Topological Frequency Utility Mining Model algorithms. Section IV describes the WPs-Tree and WPs-Itree algorithms. Section V draws conclusions and presents future developments of the proposed approach.
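
To make the three data sources named in the definition of web mining above more concrete, the following minimal Python sketch pulls each of them out of a toy example: hyperlinks (structure), text (content), and a server log record (usage). The page, the URL, the log line, and the names PageParser and parse_log_line are all hypothetical and invented for illustration. The five-field log format is a simplifying assumption and not a format taken from the surveyed work.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class PageParser(HTMLParser):
    """Collects hyperlink targets (structure) and text nodes (content)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def parse_log_line(line):
    """Split an assumed 'client timestamp method path status' record."""
    client, timestamp, method, path, status = line.split()
    return {
        "client": client,
        "timestamp": timestamp,
        "method": method,
        "path": path,
        "status": int(status),
    }


# Hypothetical inputs, invented for this illustration only.
PAGE_URL = "http://example.org/index.html"
PAGE_HTML = (
    "<html><body><h1>Home</h1>"
    '<a href="/about.html">About</a> '
    '<a href="http://example.com/">Partner</a>'
    "</body></html>"
)
LOG_LINE = "10.0.0.1 2012-09-01T10:15:00 GET /about.html 200"

parser = PageParser(PAGE_URL)
parser.feed(PAGE_HTML)

structure = parser.links                 # input to web structure mining
content = " ".join(parser.text_parts)    # input to web content mining
usage = parse_log_line(LOG_LINE)         # input to web usage mining
url_parts = urlsplit(PAGE_URL)           # scheme, host, and path of the URL
```

In this sketch, structure holds the outgoing links of the page, content holds its text, and usage holds one parsed request. A real study would gather these from many pages and from complete server log files.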