International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 03 Issue: 11 | Nov -2016 www.irjet.net p-ISSN: 2395-0072
© 2016, IRJET | Impact Factor value: 4.45 | ISO 9001:2008 Certified Journal | Page 563
COMPARISON OF CLUSTERING ALGORITHMS BASED ON OUTLIERS
Shivanjli Jain¹, Amanjot Kaur Grewal²
¹Research Scholar, Punjab Technical University, Dept. of CSE, Baba Banda Singh Bahadur Engineering College, Fatehgarh Sahib, Punjab, India
²Assistant Professor, Punjab Technical University, Dept. of CSE, Baba Banda Singh Bahadur Engineering College, Fatehgarh Sahib, Punjab, India
---------------------------------------------------------------------------------------------------------------------------------------------------------
Abstract: Data mining, in general, deals with the discovery of non-trivial, hidden and interesting knowledge from different types
of data. With the development of information technologies, the number of databases, as well as their dimension and complexity,
grows rapidly. Automated analysis of large amounts of information is therefore necessary. The analysis results are then used
by a human or a program to make decisions. One of the basic problems of data mining is outlier detection. In some cases the
outlier detection problem is similar to the classification problem. For example, the main concern of clustering-based outlier
detection algorithms is to find clusters and outliers, where outliers are often regarded as noise that should be removed to make
the clustering more reliable. In this research, the ability to detect outliers is improved by combining the perspectives of outlier
detection and cluster identification. The proposed work compares four methods: K-means, K-medoids, iterative
K-means and a density-based method. Unlike traditional clustering-based methods, the proposed algorithm provides much
more efficient outlier detection and data clustering in the presence of outliers, which is why the comparison has been made. The purpose
of our method is not only to cluster the data but, at the same time, to find outliers from the resulting clusters. The goal is to
model an unknown nonlinear function based on observed input-output pairs. The whole simulation of this proposed work has been
carried out in the MATLAB environment.
Keywords: Outliers, Data mining, Clustering, K-means, K-medoids, DBSCAN, Iterative K-means
1. Introduction
Outliers are the additional data points that appear in a
dataset, outside the groups formed, when clustering is
performed. Outliers are patterns in data that do not
conform to a well-defined notion of normal behavior.
Data objects that are grossly different from or
inconsistent with the remaining data are called
outliers, as shown in Figure 1.
Figure 1: Outliers
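To make the definition concrete, the following is a minimal Python sketch of clustering-based outlier flagging: cluster the data with plain K-means, then mark any point that lies far from its nearest centroid. It is an illustration only, not the authors' MATLAB implementation. The function names `kmeans` and `flag_outliers`, the distance `threshold`, the seeding rule and the sample points are all assumptions made for this example.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd-style K-means on (x, y) tuples.

    Assumption: the first k points seed the centroids, which keeps the
    sketch deterministic but is not a recommended initialisation.
    """
    centroids = [points[i] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
    return centroids

def flag_outliers(points, centroids, threshold):
    """Return the points whose distance to the nearest centroid exceeds threshold."""
    return [p for p in points if min(math.dist(p, c) for c in centroids) > threshold]

# Hypothetical data: two tight groups plus one point far from both.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (1, 1),
          (10, 11), (11, 10), (11, 11), (5, 30)]
centroids = kmeans(points, k=2)
outliers = flag_outliers(points, centroids, threshold=6.0)
print(outliers)  # [(5, 30)]
```

In this toy run the distant point is still assigned to a cluster and pulls that cluster's centroid toward itself, which illustrates why outliers are often treated as noise to be removed before the clustering can be considered reliable.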