International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 3, June 2016, pp. 995~1001
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i3.9878
Journal homepage: http://iaesjournal.com/online/index.php/IJECE

Automatic Detection of Illegitimate Websites with Mutual Clustering

K. Kanaka Durga, V. Rama Krishna
Dept of CSE, K L University, Guntur, AP

Article Info
Article history: Received Jan 6, 2016; Revised Mar 10, 2016; Accepted Mar 25, 2016

ABSTRACT
The content of websites is often similar when results are compared across search engines. Checking websites and their content for such similarity creates an overhead for the search engine, which can severely affect its performance and quality. The web crawling research community has implemented several techniques to detect similar or identical web content and web documents. This matters because a major goal of a search engine is to provide relevant data to users on the first page of results. To address these issues, we propose a methodology called Automatic Detection of Illegitimate Websites with Mutual Clustering (ADIWMC). In this paper we present a novel and efficient approach to detecting similarities between web pages in web clustering. Identical and similar web pages and web content are detected by storing the crawled web pages in a repository. First, the adwords are extracted from the crawled pages, and similarity is checked between two pages based on their usage of adwords. A threshold value is set for this check: if the similarity percentage is greater than the threshold, the similar content is reduced, which improves the repository and the quality of the search engine. The sections on the existing analysis and the proposed analysis explain how the approach works.

Keywords: Illegitimate, Mutual Clustering, Phishing, Web Crawled

Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author: K.
Kanaka Durga, M. Tech Student, Dept of CSE, K L University, Vaddeswaram 522502, Guntur District, Andhra Pradesh, India.

1. INTRODUCTION
Cybercriminals have adopted two familiar strategies for cheating consumers online: large-scale and targeted attacks. Many scams are designed to succeed at large scale [1]. Phishing scams impersonate banks and online service providers by the thousand, sending millions of spam messages to lure a small fraction of users to fake websites under criminal control [2]. In fact, many criminals operate somewhere in between, faithfully reproducing the logic of a fraud without copying all the material from previous versions of the attack. For example, criminals engaged in advance-fee fraud create websites for non-existent banks, complete with online banking in which the victim can log in to inspect their "deposits". When one fake bank is shut down, the criminals create a new one adapted from the old site. Criminals also set up fake escrow services as part of a larger advance-fee fraud. On the surface the escrow sites look different, but they often share similarities in page text or HTML structure. Yet another example is the online Ponzi schemes known as high-yield investment programs (HYIPs). These programs offer investors extravagant interest rates, which means they inevitably collapse when new deposits dry up. The perpetrators behind the scenes then create new programs that often share similarities with previous versions [3]. The designers of these scams have a strong incentive to keep their new copies distinct from the old ones. Potential victims may be scared off if they realize that an earlier version of the site has been reported as fraudulent. Criminals therefore make a concerted effort to distinguish new copies from old ones.
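As a rough illustration of the threshold-based similarity check described in the abstract, the following Python sketch compares two pages by the overlap of their extracted keywords. It is a minimal sketch under stated assumptions, not the ADIWMC implementation: the stop-word list, the use of Jaccard similarity as the similarity measure, the threshold of 0.5, and the function names `extract_keywords`, `jaccard_similarity` and `is_near_duplicate` are all hypothetical choices made for this example, since the text does not specify them.

```python
import re

# Hypothetical stop-word list; the paper does not specify one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is",
              "for", "on", "with", "your", "our"}


def extract_keywords(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def is_near_duplicate(page_a, page_b, threshold=0.5):
    """Flag two pages as similar when keyword overlap exceeds the threshold.

    The default threshold is an assumption made for this sketch.
    """
    score = jaccard_similarity(extract_keywords(page_a),
                               extract_keywords(page_b))
    return score > threshold


# Two near-identical escrow pages and one unrelated page (made-up text).
escrow_old = "Secure escrow service for your online payments"
escrow_new = "Secure escrow service for online payments and transfers"
hyip_page = "High yield investment program with daily interest"

print(is_near_duplicate(escrow_old, escrow_new))
print(is_near_duplicate(escrow_old, hyip_page))
```

In a crawler, a page flagged this way would be a candidate for removal from the repository or for grouping with its earlier copy. A set-based measure ignores word order and frequency, so a real system would likely need a richer representation of page text and HTML structure.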