Clustering of Web Search Results using Suffix Tree Algorithm and Avoidance of Repetition of same Images in Search Results using L-Point Comparison Algorithm
Manne Suneetha, Assistant Professor, Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India, manne_suni@vrsiddhartha.ac.in
Dr. S Sameen Fatima, Professor, Dept. of Computer Science and Engg., University College of Engineering, Osmania University, Hyderabad, Andhra Pradesh, India, sameenf@gmail.com
Shaik Mohd. Zaheer Pervez, 4/4 B.Tech., Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, zaheerimpeccable@gmail.com
Abstract—Users of existing search engines such as Google, Yahoo, MSN and Ask commonly find that a query returns a long ranked list of results (snippets). It is cumbersome for the user to go through each title, each snippet and sometimes even each link until results relevant to the query are found. Clustering of search results is a data mining technique that organizes the retrieved results into meaningful groups, easing the user's work. This paper deals with a generalized suffix tree based clustering approach in which the most repeated phrase in the document tags is used as the cluster name. In short, web search results fetched from existing web search engines are grouped under phrases that contain one or more of the search keywords. The paper aims to organize web search results into clusters that offer quick browsing options and a precise interface to the results. Suffix tree clustering produces comparatively more accurate and informative groupings. A basic problem during image searching in any search engine is image repetition.
This problem can be avoided by using the L-Point Comparison algorithm, a technique worked out specifically for Information Retrieval systems, which is also discussed with a practical example.
Keywords- Coherent clustering, Cleaning of Document, Suffix Tree Based Clustering (STBC), L-point image Comparison (LPC), Shared phrase
I. INTRODUCTION
The Internet is undoubtedly the fastest and easiest means of access to unlimited information resources. However, this same abundance reduces the efficiency of accessing information: it is aptly said that the Internet is an unorganized, unstructured and decentralized place for accessing information [4]. As the number of web pages grew into the billions, scientists realized that web directories are particularly beneficial to users who are not familiar with the topics and their relations. Yahoo was the first service to provide the most complex human-made directory of the Web, in 2001. Although some directory entries show cross-links to related topics, they do not show the relations between topics at the same level; instead, topics are sorted alphabetically or by popularity. Moreover, because the web grows rapidly and is unstable, such directories often point to outdated or even nonexistent documents. Organizing large numbers of search results into groups of similar data (web directories) is becoming one of the most complex tasks among emerging Web applications. It poses challenges not only in data mining but also in information retrieval systems and data warehousing. Clustering is the grouping of similar data; here, search results are grouped with respect to the phrases they share. The search result clustering mechanisms investigated so far have faced serious drawbacks such as screening cluster labels, assessing cluster quality and controlling overlapping clusters.
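The phrase-based grouping described in the abstract can be illustrated with a small sketch. The sketch groups snippets under shared phrases that contain a search keyword and uses the most repeated phrase as the cluster name. The following Python is a minimal illustration under stated assumptions, not the authors' implementation. It builds an uncompressed word-level suffix trie instead of a compressed generalized suffix tree. It also caps phrases at a hypothetical max_phrase_len of four words and ranks candidate labels by the number of snippets that share them. The names TrieNode, build_suffix_trie and base_clusters are illustrative only. The merging of heavily overlapping base clusters performed in full suffix tree clustering is omitted.

```python
import re


class TrieNode:
    """Node of a word-level suffix trie; the path from the root spells a phrase."""

    def __init__(self):
        self.children = {}  # next word -> TrieNode
        self.docs = set()   # ids of snippets that contain this phrase


def tokenize(text):
    # Simplifying assumption: lowercase and keep alphanumeric words only.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_suffix_trie(snippets, max_phrase_len=4):
    """Insert every word suffix of every snippet, truncated to max_phrase_len words.
    max_phrase_len is a hypothetical cap used only to keep the trie small."""
    root = TrieNode()
    for doc_id, text in enumerate(snippets):
        words = tokenize(text)
        for start in range(len(words)):
            node = root
            for word in words[start:start + max_phrase_len]:
                node = node.children.setdefault(word, TrieNode())
                node.docs.add(doc_id)
    return root


def base_clusters(root, keywords, min_docs=2):
    """Return (label, doc ids) for phrases shared by >= min_docs snippets
    that contain at least one query keyword, most repeated phrase first."""
    keywords = {k.lower() for k in keywords}
    best = {}  # frozenset of doc ids -> longest phrase with exactly that doc set
    stack = [(root, [])]
    while stack:
        node, phrase = stack.pop()
        for word, child in node.children.items():
            # A longer phrase occurs in at most as many snippets as its prefix,
            # so subtrees shared by fewer than min_docs snippets are skipped.
            if len(child.docs) < min_docs:
                continue
            path = phrase + [word]
            if keywords & set(path):
                key = frozenset(child.docs)
                if key not in best or len(path) > len(best[key]):
                    best[key] = path
            stack.append((child, path))
    # Rank by number of snippets sharing the phrase, then by phrase length.
    ranked = sorted(best.items(), key=lambda kv: (-len(kv[0]), -len(kv[1]), kv[1]))
    return [(" ".join(words), sorted(docs)) for docs, words in ranked]


# Hypothetical example snippets for the query "search".
snippets = [
    "Suffix tree clustering of web search results",
    "Clustering web search results with suffix trees",
    "Image search results without repeated images",
    "Web search engines return ranked lists",
]
clusters = base_clusters(build_suffix_trie(snippets), keywords=["search"])
for label, docs in clusters:
    # Labels in order: 'search', 'search results', 'web search', 'web search results'
    print(f"{label!r}: snippets {docs}")
```

When several phrases cover exactly the same set of snippets, the sketch keeps only the longest one as the label. In this example, that keeps "web search results" rather than a shorter phrase with the same snippet set.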
Identifying the best clustering algorithm in terms of low time complexity and multilingual clustering features has been an intensely investigated topic for developers. Web search result clustering based on the suffix tree clustering (STC) algorithm is a promising approach for handling the long lists of snippets returned by search engines. The original STC algorithm can often construct long paths in the suffix tree, particularly when identical snippets are fed to it [5]. The modeling and analysis section throws light on the structure and design aspects of the suffix tree while the
PROCEEDINGS OF ICETECT 2011 978-1-4244-7925-2/11/$26.00 ©2011 IEEE 1041