International Journal of Research and Reviews in Soft and Intelligent Computing (IJRRSIC)
Vol. 2, No. 3, September 2012, ISSN: 2046-6412
161
© Science Academy Publisher, United Kingdom
www.sciacademypublisher.com/journals/index.php/IJRRSIC
Restarted Simulated Annealing Particle Swarm Optimization
used in Cluster Analysis
Yudong Zhang and Lenan Wu
School of Information Science and Engineering, Southeast University, Nanjing China
Email: zhangyudongnuaa@gmail.com, wuln@seu.edu.cn
Abstract – In order to solve the cluster analysis problem more efficiently, we present a new approach based on Restarted
Simulated Annealing Particle Swarm Optimization (RSAPSO). First, we created the optimization model using the
variance ratio criterion (VRC) as the fitness function. Second, RSAPSO was introduced to find the maximal point of the VRC. The
experimental dataset contained 400 data points in 4 groups with three different degrees of overlap: non-overlapping, partially
overlapping, and severely overlapping. We compared RSAPSO with the genetic algorithm (GA) and combinatorial particle
swarm optimization (CPSO). Each algorithm was run 20 times. The results showed that RSAPSO found the largest VRC
values among the three algorithms while costing the least time. We conclude that RSAPSO is effective and rapid for
the cluster analysis problem.
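The abstract describes two steps: the VRC serves as the fitness function, and a swarm optimizer searches for its maximum. The following pure-Python sketch illustrates both steps under stated assumptions; it is not the authors' implementation. It assumes the standard Calinski-Harabasz form VRC = (B/(k-1)) / (W/(n-k)), where B and W are the between-cluster and within-cluster sums of squares. It encodes each particle as k candidate centres and uses a plain global-best PSO with an inertia weight, omitting the restarts and the simulated-annealing component that distinguish RSAPSO. The helper names (`vrc`, `assign`, `pso_cluster`), the parameter values (`w`, `c1`, `c2`) and the rule that an empty cluster scores 0 are all hypothetical choices.

```python
import math
import random

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(xs) / len(points) for xs in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vrc(data, labels, k):
    """Variance ratio criterion, assumed in the Calinski-Harabasz form
    VRC = (B / (k - 1)) / (W / (n - k)), for k >= 2; larger is better."""
    n, overall = len(data), centroid(data)
    between = within = 0.0
    for j in range(k):
        members = [p for p, lab in zip(data, labels) if lab == j]
        if not members:            # assumption: an empty cluster scores 0
            return 0.0
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    return math.inf if within == 0 else (between / (k - 1)) / (within / (n - k))

def assign(data, centres):
    """Label each point with the index of its nearest centre."""
    return [min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))
            for p in data]

def pso_cluster(data, k, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best PSO maximizing the VRC (no restarts, no annealing).
    Each particle is a flat vector holding k candidate centres."""
    rng = random.Random(seed)
    d = len(data[0])
    lo = [min(p[i] for p in data) for i in range(d)]
    hi = [max(p[i] for p in data) for i in range(d)]
    dim = k * d

    def decode(x):
        return [tuple(x[j * d:(j + 1) * d]) for j in range(k)]

    def fitness(x):
        return vrc(data, assign(data, decode(x)), k)

    # Initialize centres uniformly inside the bounding box of the data.
    pos = [[rng.uniform(lo[t % d], hi[t % d]) for t in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in pos]
    pbest_f = [fitness(x) for x in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for t in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][t] = (w * vel[i][t]
                             + c1 * r1 * (pbest[i][t] - pos[i][t])
                             + c2 * r2 * (gbest[t] - pos[i][t]))
                pos[i][t] += vel[i][t]
            f = fitness(pos[i])
            if f > pbest_f[i]:          # update personal and global bests
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return assign(data, decode(gbest)), gbest_f
```

For example, `vrc([(0, 0), (0, 2), (10, 0), (10, 2)], [0, 0, 1, 1], 2)` returns `50.0`: B = 100 and W = 4, so (100/1)/(4/2) = 50. Encoding centres rather than a label per point keeps the search dimension at k·d, independent of the number of observations.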
Keywords – Cluster Analysis, Variance Ratio Criterion, Genetic Algorithm, Particle Swarm Optimization, Simulated
Annealing
1. Introduction
Cluster analysis is the assignment of a set of observations
into subsets, without any a priori knowledge, so that
observations in the same cluster are more similar to each other
than to those in other clusters [1]. Clustering is a method of
unsupervised learning and a common technique for statistical
data analysis used in many fields [2], including machine
learning [3], data mining [4], pattern recognition [5], image
analysis [6] and bioinformatics [7]. Cluster analysis can be
performed by many algorithms that differ significantly.
These methods can be broadly classified into four
categories:
1) Hierarchical Methods. They find successive clusters
using previously established clusters. They can be
further divided into agglomerative methods and
divisive methods [8]. Agglomerative algorithms start
with one-point clusters and recursively merge the two or
more most appropriate clusters [9]. Divisive
algorithms begin with the whole set and proceed to
divide it into successively smaller clusters [10].
2) Partition Methods. They generate a single partition of
the data with a specified or estimated number of
non-overlapping clusters, in an attempt to recover natural
groups present in the data [11].
3) Density-based Methods. They are devised to discover
arbitrary-shaped clusters. In this approach, a cluster is
regarded as a region in which the density of data
objects exceeds a threshold. DBSCAN [12] is a
typical algorithm of this kind.
4) Subspace Methods. They look for clusters that can
only be seen in a particular projection (subspace,
manifold) of the data. These methods thus can ignore
irrelevant attributes [13].
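To make the agglomerative procedure of category 1) concrete, the sketch below starts from one-point clusters and repeatedly merges the closest pair until a requested number of clusters remains. It is a minimal illustration, not taken from the cited works. It assumes single linkage (the distance between the closest members of two clusters) as the "most appropriate" merge criterion and always merges exactly two clusters per step. The function name `agglomerative` is hypothetical. Its running time is roughly cubic in the number of points, so it suits only small inputs.

```python
import math

def agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering (illustrative sketch).

    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]   # start with one-point clusters

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x].extend(clusters[y])   # merge the closest pair
        del clusters[y]                   # y > x, so index x is unaffected
    return [sorted(c) for c in clusters]
```

With `points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]`, `agglomerative(points, 3)` returns `[[0, 1], [2, 3], [4]]`: the two tight pairs merge first and the isolated point stays on its own. A divisive method would run in the opposite direction, starting from one cluster holding every index.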
In this study, we focus our attention on partition
clustering methods. K-means clustering [14] and
fuzzy c-means clustering (FCM) [15] are two typical
algorithms of this type. Both are iterative algorithms whose
solution depends on the choice of the initial partition; if
the initial partition is not properly chosen, they may
converge to a local minimum of the criterion function
[16]. A branch-and-bound algorithm was proposed to find the
globally optimal clustering; however, it requires too much
computation time [17].
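The dependence on the initial partition can be seen with a small Lloyd-style K-means sketch. It is illustrative only. `kmeans` and `nearest` are hypothetical helpers, and `kmeans` takes explicit initial centres so that the effect of initialization is visible. Consider four points at the corners of a 4-by-1 rectangle with k = 2. Starting from centres at the midpoints of the left and right edges converges to the optimal split, with a sum of squared errors (SSE) of 1.0. Starting from centres at the midpoints of the bottom and top edges converges immediately to a bottom/top split with SSE 16.0, a local minimum that the algorithm cannot leave.

```python
import math

def nearest(p, centres):
    """Index of the centre closest to point p (Euclidean distance)."""
    return min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))

def kmeans(points, init_centres, max_iter=100):
    """Lloyd-style K-means started from explicit centres.

    Returns (centres, labels, sse); the result depends on init_centres.
    """
    centres = [tuple(c) for c in init_centres]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        labels = [nearest(p, centres) for p in points]
        # Update step: move each centre to the mean of its members
        # (an empty cluster keeps its previous centre).
        new_centres = []
        for j, old in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centres.append(old)
        if new_centres == centres:   # converged: no centre moved
            break
        centres = new_centres
    labels = [nearest(p, centres) for p in points]
    sse = sum(math.dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))
    return centres, labels, sse
```

With `pts = [(0, 0), (0, 1), (4, 0), (4, 1)]`, `kmeans(pts, [(0, 0.5), (4, 0.5)])[2]` is `1.0`, while `kmeans(pts, [(2, 0), (2, 1)])[2]` is `16.0`. Both runs stop at a fixed point of the two steps, which is why a local search of this kind cannot by itself escape a poor start.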
In the last decade, evolutionary algorithms have been applied
to the clustering problem because they are not sensitive to
initial values and are able to escape from local minima. For
example, Elcio Sabato de Abreu e Silva et al. [18] proposed
applying a genetic algorithm (GA) to determine global
minima that serve as seeds for a higher-level ab initio
analysis such as density functional theory (DFT).
Water clusters were used as a test case, and four empirical
potentials (TIP3P, TIP4P, TIP5P and ST2) were considered
as initial guesses for the GA calculations. Two types of
analysis were performed, namely rigid (DFT_RM) and
non-rigid (DFT_NRM) molecules, for the corresponding
structures and energies. For the DFT analysis, the PBE
exchange-correlation functional and the large basis set
A-PVTZ were used. All structures and their respective
energies calculated through the GA method, DFT_RM and
DFT_NRM were compared and discussed.
The proposed methodology proved very efficient at
obtaining quasi-accurate global minima at the level of ab
initio calculations, and the data were discussed in light of
previously published results, with particular attention to
(H₂O)ₙ clusters. Lin et al. [19] pointed out that k-anonymity
has been widely adopted as a model for protecting publicly
released microdata from individual identification. Their work