International Journal on Recent and Innovation Trends in Computing and Communication
ISSN: 2321-8169
Volume: 7 Issue: 2, pp. 01 - 04
IJRITCC | February 2019, Available @ http://www.ijritcc.org

A Novel Approach of Development of Web Pattern by Focusing on Web Structure Mining Techniques

Prabhat Kumar Bharti* (Asst. Prof.)
Dept. of Computer Science and Engineering
Goel Institute of Technology and Management
Lucknow, India
e-mail: dept.csprabhat@gmail.com

Deena Nath (Asst. Prof.)
Dept. of Computer Science and Engineering
Goel Institute of Technology and Management
Lucknow, India
e-mail: sonu.deena19@gmail.com

Vandana Yadav (Asst. Prof.)
Goel Institute of Technology and Management
Lucknow, India
e-mail: Vandanayadav771@gmail.com

Abstract: The World Wide Web is a very useful and interactive resource of information such as hypertext and multimedia. When a user searches for information on Google, a large number of URLs is returned. The sheer volume of information makes it very difficult for users to find, extract and filter what is relevant, so techniques are needed to solve these problems. The objective of the current manuscript is to focus on the mining of structured and unstructured data. Websites and web portals that provide downloadable data to the user have grown tremendously. The semantic web is about machine-understandable web pages that make the web more intelligent and able to provide useful services to users. Defining and recognizing the data structure helps to estimate page rank accurately and to produce better results when searching web data.

Keywords: Web Structure, Weighted Page Rank, Topic Sensitive Page Rank and TC-Page Rank, Hypertext Induced Topic Search.
I. INTRODUCTION

Web structure mining concentrates on the link structure of a web site. The different web pages are linked in some fashion, and the potential correlation among web pages makes the web site design efficient. This process assists in discovering and modeling the link structure of the web site; generally, the topology of the web site is used for this purpose. The linking of web pages within a web site is the central challenge for web structure mining. The structure of a link in a web page is shown below.

<html> <a href="filename">link</a> </html>

The WWW is a collection of hyperlinked pages, and the analysis of these linked pages is of very high importance. In addition to the text contents of a page, the link structure of such pages should be considered while searching for a particular resource. Consider the significance of a link A → B: with such a link, A recommends that surfers visiting A follow the link and visit B. This may reflect the fact that pages A and B share a common topic of interest, and that the author of A thinks well of the contents of page B. Such links are called informative links [1].

There are a number of link structure algorithms. This paper gives only an introduction to three of them, namely PageRank, Weighted PageRank and HITS. The PageRank method is used by the Google web search engine to compute the importance of web pages. The method can be interpreted from two different views [2]: (a) the stochastic, or random surfer, view, in which the PageRank values are the steady-state distribution of a Markov chain, and (b) the algebraic view, in which the PageRank values are taken as the eigenvector corresponding to the dominant eigenvalue of the link structure matrix. The quality of search results has been immensely improved by analyzing the link structure of web pages [3].
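The two views of PageRank mentioned above can be written side by side. The following is a sketch in the commonly used formulation, not notation taken from this paper: the symbols $N$ (number of pages), $P$ (row-stochastic link matrix), $d$ (damping factor), $G$ and $\pi$ are all assumptions introduced here, and every page is assumed to have at least one out-link.

```latex
% Link matrix: a surfer on page i follows one of its out-links uniformly at random.
P_{ij} =
\begin{cases}
  \dfrac{1}{\operatorname{out}(i)} & \text{if page } i \text{ links to page } j,\\[6pt]
  0 & \text{otherwise.}
\end{cases}

% With probability 1 - d the surfer jumps to a page chosen uniformly at random.
G = d\,P + \frac{1-d}{N}\,\mathbf{1}\mathbf{1}^{\top}, \qquad 0 < d < 1.

% (a) Stochastic view: \pi is the steady-state distribution of the Markov chain G.
\pi^{\top} = \pi^{\top} G, \qquad \sum_{i=1}^{N} \pi_i = 1, \qquad \pi_i \ge 0.

% (b) Algebraic view: the same \pi is the eigenvector of G^{\top} for eigenvalue 1.
G^{\top}\pi = 1 \cdot \pi.
```

Under these assumptions both statements describe the same vector: transposing the steady-state equation in (a) gives the eigenvector equation in (b).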
The search engine Google uses an iterative algorithm that determines the importance of a web page based on the importance of its parent pages. Web logs include information about the referring pages, user identification, the time a user spends at a site and the sequence of pages visited. A number of algorithms based on link analysis have been proposed, but none of them addresses the temporal interest shown by a user. As a page becomes older, it earns more links. Analysis of the link structure reveals user behavior and the amount of time a user spends on a specific web page. This temporal interest, which changes according to the time a user spends on a page, affects the importance of that page.

In the present paper, a different interpretation of PageRank is proposed, namely a dynamic systems viewpoint, by showing that the PageRank method can be formally interpreted as a particular case of the Interaction Information Retrieval method for a particular time.
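The iterative computation described in this passage, in which a page's importance is derived from the importance of its parent pages, can be illustrated with a minimal sketch. This is the textbook power-iteration form of PageRank, not the time-aware method proposed in this paper. The function name `pagerank`, the damping factor 0.85, the convergence tolerance and the four-page example graph are all assumptions chosen for illustration.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Power-iteration PageRank on a small link graph (illustrative sketch).

    `links` maps each page to the list of pages it links to (its children).
    The damping value 0.85 is a conventional choice, not taken from the paper.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(max_iter):
        # Rank held by pages without out-links is spread uniformly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each parent passes an equal share of its rank to every child.
        for parent, targets in links.items():
            if not targets:
                continue
            share = damping * rank[parent] / len(targets)
            for child in targets:
                new_rank[child] += share

        # Stop once the ranks change by less than the tolerance.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical four-page site: A -> B, A -> C, B -> C, C -> A, D -> C.
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.4f}")
```

In this example page C is linked from three parents and ends up with the highest score, while page D has no parents and keeps only the baseline share (1 - damping) / N. A time-aware variant along the lines discussed in this paper would need an additional per-page weight derived from the logs, which is not sketched here.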