TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

Mu. Annalakshmi
Research Scholar, Department of Computer Science, Alagappa University, Karaikudi.
annalakshmi_mu@yahoo.co.in

Dr. A. Padmapriya
Associate Professor, Department of Computer Science, Alagappa University, Karaikudi.
mailtopadhu@yahoo.co.in

Abstract - The World Wide Web has accumulated an enormous amount of data in recent years. Web Mining is the application of data mining techniques to find interesting and potentially useful knowledge in web data, and it has therefore gained significant importance. Information filtering (IF) has recently emerged as a technique for effectively delivering required and relevant information. Most information filtering on the web is done with the help of search engines, which return a large number of documents matching the search query. Users then have to pick out the relevant documents from the search results, which is a cumbersome process. Various methods have been developed for information filtering in search engines using ontology. The proposed system retrieves the results from the search engine and ranks them in decreasing order of relevance. Relevance is determined by the occurrence of the search query terms, or their corresponding synonyms, in the HTML source code of the web page. Different weights are assigned to matching terms in the URL of the web page and in different tags such as the title, meta tags, headings and image tags. Weights are also given to occurrences of the keywords/synonyms as free text in the body of the code, and to their nearness to one another. A relevance score, calculated for each web page from these weights, determines its position in the search results. The experimental results are analyzed with the popular search engine Google.
The proposed method places the more relevant pages at the top of the result list, thereby helping users find the information they need more easily.

Keywords : Information Filtering, Web mining, Search engine, Relevance score

1. INTRODUCTION

The World Wide Web has accumulated an enormous amount of data in recent years, and finding useful information in such a large and rapidly growing repository has become increasingly difficult. Web Mining is the application of data mining techniques to find interesting and potentially useful knowledge in web data, and it has therefore gained significant importance. Mining the web is difficult for the following reasons.

1. Vast size of the web.
2. Diverse nature of the web, since it consists of text, images, video, audio and other multimedia content.
3. Exponential increase in the size of the web due to the addition of new information.
4. Dynamic nature of the web, since most web content is modified frequently.
5. Duplicate content in the web.
6. Hyperlinks to other web documents.
7. Unstructured or semi-structured nature of information.

Web Mining is classified into three types, namely Web Content Mining, Web Usage Mining and Web Structure Mining.

ISSN (Print) : 2319-8613 ISSN (Online) : 0975-4024
Mu. Annalakshmi et al. / International Journal of Engineering and Technology (IJET)
DOI: 10.21817/ijet/2018/v10i1/181001302 Vol 10 No 1 Feb-Mar 2018 23
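The term-weighting idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weight values, the proximity window, the proximity bonus and the names WEIGHTS, FieldCollector, relevance_score and rank_pages are all assumptions made for illustration, since this section does not state the actual weights. The caller is assumed to supply the query terms together with any synonyms.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical weights: the abstract says each location is weighted
# differently but gives no values, so these numbers are placeholders.
WEIGHTS = {"url": 5.0, "title": 4.0, "meta": 3.0,
           "heading": 2.5, "img": 1.5, "body": 1.0}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED = {"script", "style"}


def tokens(text):
    """Lower-case a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldCollector(HTMLParser):
    """Collects the words found in each weighted location of a page."""

    def __init__(self):
        super().__init__()
        self.fields = {name: [] for name in WEIGHTS if name != "url"}
        self._stack = []  # currently open title/heading/script/style tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            self.fields["meta"] += tokens(attrs.get("content") or "")
        elif tag == "img":
            self.fields["img"] += tokens(attrs.get("alt") or "")
        elif tag == "title" or tag in HEADINGS or tag in SKIPPED:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        current = self._stack[-1] if self._stack else None
        if current in SKIPPED:
            return  # ignore script and style content
        if current == "title":
            self.fields["title"] += tokens(data)
        elif current in HEADINGS:
            self.fields["heading"] += tokens(data)
        else:
            self.fields["body"] += tokens(data)


def relevance_score(url, html, terms, window=5, near_bonus=0.5):
    """Weighted count of query terms/synonyms per location, plus a small
    bonus when two different terms occur close together in the body."""
    terms = {t.lower() for t in terms}
    parser = FieldCollector()
    parser.feed(html)
    parser.close()
    parsed = urlparse(url)
    fields = dict(parser.fields, url=tokens(parsed.netloc + parsed.path))
    score = sum(WEIGHTS[name] * sum(word in terms for word in words)
                for name, words in fields.items())
    # Nearness (assumed form): consecutive body hits of different terms
    # that lie within `window` words of each other earn `near_bonus`.
    hits = [(i, w) for i, w in enumerate(fields["body"]) if w in terms]
    for (i, a), (j, b) in zip(hits, hits[1:]):
        if a != b and j - i <= window:
            score += near_bonus
    return score


def rank_pages(pages, terms):
    """Sort (url, html) pairs in decreasing order of relevance score."""
    return sorted(pages, key=lambda p: relevance_score(p[0], p[1], terms),
                  reverse=True)
```

Calling rank_pages with a list of (url, html) pairs and a set of query terms returns the pairs reordered so that pages matching the terms in heavily weighted locations come first. Only the Python standard library is used, which keeps the sketch self-contained; a production system would likely need a more tolerant HTML parser and a synonym source.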