International Journal of Scientific & Engineering Research, Volume 6, Issue 1, January-2015 351
ISSN 2229-5518
IJSER © 2015
http://www.ijser.org
A Survey of different methods of
clustering for anomaly detection
Sarita Tripathy, Prof. (Dr.) Laxman Sahoo
Abstract - Anomaly detection is the process of identifying unusual behavior. It is widely used in data mining, for example to identify fraud, customer behavioral change, and manufacturing flaws; data mining techniques make it possible to search large amounts of data for characteristic rules and patterns. With the ever-increasing number of new attacks in today's world, the amount of data to be analyzed will keep growing. Because attacks are rare compared with normal events (the base-rate fallacy), the number of false alarms will also increase. Another problem is that attacks usually are not detected until after the attacker has already taken information. Most current network intrusion detection systems employ signature-based methods or data mining-based methods that rely on labeled training data. Clustering, which does not require labeled training data, has become one of the most widely used techniques for intrusion detection.
Index Terms: Anomaly detection, Unsupervised learning, K-means clustering, Fuzzy C-means clustering, Genetic algorithm, Non-negative matrix factorization, Principal component analysis, Co-clustering, ID3 decision tree, Hierarchical clustering.
1 INTRODUCTION
Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior. These non-conforming patterns are often referred to as anomalies, outliers, discordant observations, exceptions, aberrations, surprises, peculiarities or contaminants in different application domains. Of these, anomalies and outliers are the two terms used most commonly in the context of anomaly detection. Anomaly detection finds extensive use in a wide variety of application domains. For example, an anomalous traffic pattern in a computer network could mean that a hacked computer is sending out sensitive data to an unauthorized destination; an anomalous MRI image may indicate the presence of a malignant tumor; anomalies in credit card transaction data could indicate credit card or identity theft; and anomalous readings from a spacecraft sensor could signify a fault in some component of the spacecraft. Detecting outliers or anomalies in data has been studied in the statistics community since as early as the 19th century. Over time, a variety of anomaly detection techniques have been developed in several research communities. Many of these techniques have been developed specifically for certain application domains, while others are more generic.
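As a minimal illustration of "patterns that do not conform to expected behavior", the sketch below uses one of the simplest statistical outlier tests: it treats the mean and population standard deviation of the data as the expected behavior and flags values whose z-score exceeds a threshold. The z-score test, the function name `zscore_anomalies`, the threshold, and the sample readings are all illustrative assumptions, not a method taken from this survey.

```python
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Return indices of values lying more than `threshold` standard
    deviations from the mean (a hypothetical helper for illustration).

    "Expected behavior" is modeled crudely by the mean and population
    standard deviation of the data itself.
    """
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 25.0]  # hypothetical sensor readings
print(zscore_anomalies(readings, threshold=2.0))  # [6]
```

The usage line lowers the threshold to 2.0 because, with only seven points, no population z-score can exceed sqrt(6) ≈ 2.45. A single extreme value also inflates the standard deviation it is measured against. Limitations like these are one reason the clustering-based approaches surveyed here are attractive.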
Clustering is the task of dividing data points into homogeneous classes, or clusters. Homogeneous means that the items within the same cluster are as similar to one another as possible; the process can therefore also be referred to as grouping. Clustering is a popular unsupervised pattern classification technique which partitions the input space into a number of regions based on some similarity/dissimilarity metric, such that similar elements are placed in the same cluster while dissimilar ones are placed in separate clusters.
This survey provides an overview of various clustering methods used for anomaly detection. The remainder of this paper is organized as follows: the second section gives an overview of how clustering is useful in anomaly detection, the third section describes different anomaly detection approaches, the fourth section describes feature selection and reduction, the fifth section gives an overview of different clustering algorithms for anomaly detection, and the sixth section concludes the paper.
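The partitioning described in the definition of clustering above can be sketched with Lloyd's k-means algorithm, one of the algorithms named in the index terms. This is a minimal sketch, not any surveyed paper's implementation. It assumes the data are already numeric feature vectors and uses Euclidean distance as the dissimilarity metric. The function names `kmeans` and `anomaly_score` are hypothetical, as is the choice to seed the centroids with the first k points so that the result is reproducible.

```python
import math


def kmeans(points, k, max_iter=100):
    """Partition `points` into k clusters with Lloyd's k-means algorithm.

    Euclidean distance is the dissimilarity metric. For reproducibility
    this sketch seeds the centroids with the first k points; practical
    implementations usually use random or k-means++ seeding instead.
    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, labels


def anomaly_score(point, centroids):
    """Distance from `point` to its nearest centroid; larger means more unusual."""
    return min(math.dist(point, c) for c in centroids)


if __name__ == "__main__":
    data = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
    centroids, labels = kmeans(data, k=2)
    print(labels)  # [0, 1, 0, 0, 1, 1]
    print(anomaly_score((5, 5), centroids) > anomaly_score((0.5, 0.5), centroids))  # True
```

The `anomaly_score` helper is an assumed, simple way to use the resulting clusters as a model of normal data. A new point far from every centroid does not resemble any learned group and can be flagged. This is the general idea the second section discusses.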
2 HOW IS CLUSTERING USEFUL IN ANOMALY DETECTION
Clustering can be used as a technique for training the normality model, in which similar data points are grouped together into clusters using a distance function. Clustering is suitable for anomaly detection, since no knowledge of the attack classes