International Journal of Scientific & Engineering Research, Volume 6, Issue 1, January-2015 351
ISSN 2229-5518
IJSER © 2015 http://www.ijser.org

A Survey of Different Methods of Clustering for Anomaly Detection

Sarita Tripathy, Prof. (Dr.) Laxman Sahoo

Abstract - Anomaly detection is the process of identifying unusual behavior. It is widely used in data mining, for example, to identify fraud, customer behavioral change, and manufacturing flaws. Data mining techniques make it possible to search large amounts of data for characteristic rules and patterns. With the ever-increasing number of new attacks in today’s world, the amount of data will keep increasing, and because of the base-rate fallacy the number of false alarms will also increase. Another problem with attack detection is that attacks are usually not detected until after the attacker has already taken information. Most current network intrusion detection systems employ signature-based methods or data mining-based methods which rely on labeled training data. Clustering is now one of the most widely used techniques for intrusion detection.

Index Terms: Anomaly detection, Unsupervised learning, K-means clustering, Fuzzy C-means clustering, Genetic algorithm, Non-negative matrix factorization, Principal component analysis, Co-clustering, ID3 decision tree, Hierarchical clustering.

1 INTRODUCTION

Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior. These non-conforming patterns are often referred to as anomalies, outliers, discordant observations, exceptions, aberrations, surprises, peculiarities or contaminants in different application domains. Of these, anomalies and outliers are the two terms used most commonly in the context of anomaly detection. Anomaly detection finds extensive use in a wide variety of application domains. For example, an anomalous traffic pattern in a computer network could mean that a hacked computer is sending out sensitive data to an unauthorized destination.
An anomalous MRI image may indicate the presence of a malignant tumor, anomalies in credit card transaction data could indicate credit card or identity theft, and anomalous readings from a spacecraft sensor could signify a fault in some component of the spacecraft. Detecting outliers or anomalies in data has been studied in the statistics community since as early as the 19th century. Over time, a variety of anomaly detection techniques have been developed in several research communities. Many of these techniques have been developed specifically for certain application domains, while others are more generic.

Clustering is the task of dividing data points into homogeneous classes, or clusters. Homogeneous means that the items within the same class are as similar to one another as possible. This process can therefore also be referred to as grouping. Clustering is a popular unsupervised pattern classification technique which partitions the input space into a number of regions based on some similarity/dissimilarity metric, such that similar elements are placed in the same cluster while dissimilar ones are placed in separate clusters.

This survey provides an overview of various clustering methods used for anomaly detection. The remainder of this paper is organized as follows. The second section gives an overview of how clustering is useful in anomaly detection. The third section describes different anomaly detection approaches, the fourth section describes feature selection and reduction, the fifth section gives an overview of different clustering algorithms for anomaly detection, and the sixth section presents the conclusion.

2 HOW IS CLUSTERING USEFUL IN ANOMALY DETECTION

Clustering can be used as a technique for training the normality model, where similar data points are grouped together into clusters using a distance function. Clustering is suitable for anomaly detection, since no knowledge of the attack classes