Research Article
Canonical PSO Based K-Means Clustering Approach
for Real Datasets
Lopamudra Dey¹ and Sanjay Chakraborty²
¹Heritage Institute of Technology, Kolkata, West Bengal 700 107, India
²Institute of Engineering & Management, Kolkata, West Bengal 700 091, India
Correspondence should be addressed to Sanjay Chakraborty; sanjay ciem@yahoo.com
Received 14 June 2014; Revised 19 September 2014; Accepted 2 October 2014; Published 13 November 2014
Academic Editor: Francesco Camastra
Copyright © 2014 L. Dey and S. Chakraborty. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.
Clustering is a technique whose significance and applications span many fields. Because clustering is an unsupervised process
in data mining, properly evaluating the results and measuring the compactness and separability of the clusters are
important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measurement. Different
types of indices are used to solve different types of problems, and index selection depends on the kind of data available. This paper
first proposes a Canonical PSO based K-means clustering algorithm, then analyses some important clustering indices (intercluster,
intracluster), and then evaluates the effects of those indices on a real-time air pollution database and on the wholesale customer, wine, and vehicle
datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and Hierarchical clustering
algorithms. This paper also describes the nature of the clusters and finally compares the performance of these clustering algorithms
according to the validity assessment. It also identifies which of these algorithms is most desirable for producing
properly compact clusters on these particular real-life datasets. It deals with the behaviour of these clustering algorithms with
respect to validation indices and presents the evaluation results in mathematical and graphical form.
1. Introduction
Clustering is one of the best known problems in data
mining. It is the task of categorising objects
having several attributes into different classes such that the
objects belonging to the same class are similar, and those
assigned to different classes are not [1]. Several
clustering algorithms have been proposed to date.
Because no prior information is available in clustering, a suitable
evaluation of the results is necessary. Evaluation means
measuring the similarity between clusters, the
compactness of the clusters, and the separation between them [2]. Evaluation
measurement is also proposed as a key feature in internal and
external cluster validation indexes [3]. Such a measure can be
used to compare the performance of different data clustering
algorithms on different real-life datasets. These measures
are usually tied to the type of criterion being considered in
assessing the quality of a clustering method. Three different
techniques are available to evaluate clustering results:
external, internal, and relative [4]. Both internal and external
criteria are based on statistical methods and have a high
computational demand. The external validity methods evaluate
the clustering based on some user-specific intuitions [4]. The
objective of this paper is to compare the different
clustering schemes that have already been proposed [5] with
the Canonical PSO based K-means clustering algorithm.
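The notions of compactness and separation mentioned above can be illustrated with a minimal sketch. It assumes one common formulation, which is not necessarily the one defined later in the paper: the intracluster measure is the mean squared distance from each point to its own cluster centroid, and the intercluster measure is the minimum squared distance between any two centroids. The function names and the toy data are hypothetical and chosen only for illustration.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def intracluster_distance(clusters):
    """Compactness (assumed definition): mean squared distance from
    each point to the centroid of its own cluster. Lower is better."""
    total, count = 0.0, 0
    for points in clusters:
        c = centroid(points)
        for p in points:
            total += math.dist(p, c) ** 2
            count += 1
    return total / count

def intercluster_distance(clusters):
    """Separation (assumed definition): minimum squared distance
    between any two cluster centroids. Higher is better."""
    cents = [centroid(points) for points in clusters]
    return min(
        math.dist(a, b) ** 2
        for i, a in enumerate(cents)
        for b in cents[i + 1:]
    )

# Hypothetical toy data: two well-separated 2-D clusters.
clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 0.0), (10.0, 2.0)],
]

intra = intracluster_distance(clusters)
inter = intercluster_distance(clusters)
validity = intra / inter  # a small ratio suggests compact, well-separated clusters

print(intra, inter, validity)
```

Under these assumed definitions, a clustering is preferable when the intracluster value is small and the intercluster value is large, so their ratio gives a single number for comparing the outputs of different algorithms on the same dataset.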
The rest of the paper is organized as follows. The Canonical
PSO based K-means algorithm is proposed in Section 2
along with some other existing clustering algorithms. Some popular
and widely used validity indices are introduced in Section 3.
Section 4 demonstrates the clustering compactness measurements
on a toy example dataset using the K-means and DBSCAN
clustering algorithms. Section 5 presents the clustering
compactness measurements with experimental results and
outlines a comparison of the indices, and
Section 7 gives a brief conclusion of this paper. Interested
Hindawi Publishing Corporation
International Scholarly Research Notices
Volume 2014, Article ID 414013, 11 pages
http://dx.doi.org/10.1155/2014/414013