Applied Soft Computing 13 (2013) 1592–1607
journal homepage: www.elsevier.com/locate/asoc
A multivariate fuzzy c-means method
Bruno A. Pimentel, Renata M.C.R. de Souza∗
Centro de Informática, Av. Jornalista Anibal Fernandes, s/n – Cidade Universitária 50.740-560, Recife (PE), Brazil
Article info
Article history:
Received 10 July 2012
Received in revised form 27 November 2012
Accepted 30 December 2012
Available online 11 January 2013
Keywords:
Fuzzy c-means method
Unsupervised pattern recognition
Clustering
Membership degree
Abstract
Fuzzy c-means (FCM) is an important and popular unsupervised partitioning algorithm used in several
application domains such as pattern recognition, machine learning and data mining. Although FCM
has shown good performance in detecting clusters, the membership values computed for each individual
with respect to each cluster cannot indicate how well the individuals are classified. In this paper, a new
approach to handling the memberships, based on the inherent information in each feature, is presented.
The algorithm produces a membership matrix for each individual; the membership values lie between zero
and one and measure the similarity of this individual to the center of each cluster according to each feature.
These values can change at each iteration of the algorithm, and they differ from one feature to
another and from one cluster to another, in order to improve the performance of the fuzzy c-means
clustering algorithm. To obtain a fuzzy partition by class of the input data set, a way to compute the class
membership values is also proposed in this work. Experiments with synthetic and real data sets show
that the proposed approach produces clusterings of good quality.
© 2013 Elsevier B.V. All rights reserved.
1. Introduction
A growing number of application domains, such as pattern
recognition, machine learning, data mining, computer vision and
computational biology, use clustering algorithms [1–3]. Clustering
is a method of unsupervised learning whose objective is to
group a set of elements into clusters such that elements within a
cluster have a high degree of similarity, while elements belonging
to different clusters have a high degree of dissimilarity.
Mathematically, the degree of dissimilarity can be measured using,
for instance, distance, angle, curvature, symmetry, connectivity or
intensity, with information taken from the data set [4]. Hierarchical and
partitioning methods are the most popular clustering techniques.
Hierarchical clustering finds a sequence of partitions: the algorithm
starts from a single group containing all objects and proceeds until
it reaches singleton groups, or vice versa. Partitioning clustering,
in contrast, directly divides the data objects into a fixed number of clusters
[5] using a suitable objective function. An advantage of the
partitioning approach is its ability to handle large data sets, since
the construction of a dendrogram by the hierarchical method may
be computationally impractical in some applications.
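The similarity criterion described above can be illustrated with a minimal sketch. It assumes Euclidean distance as the dissimilarity measure (only one of the options listed in the text); the function name `mean_dissimilarity` and the two toy groups are hypothetical and are not taken from the paper.

```python
import numpy as np


def mean_dissimilarity(a, b):
    """Mean Euclidean distance between every row of `a` and every row of `b`."""
    # Broadcasting builds all pairwise differences: shape (len(a), len(b), n_features).
    diffs = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=2)).mean())


# Two hypothetical, well-separated groups of 2-D points (illustrative data only).
group_a = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
group_b = np.array([[10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Within-group dissimilarity (zero self-distances are included in the mean).
within_a = mean_dissimilarity(group_a, group_a)
within_b = mean_dissimilarity(group_b, group_b)
# Dissimilarity between the two groups.
between = mean_dissimilarity(group_a, group_b)

print(f"within A: {within_a:.3f}, within B: {within_b:.3f}, between: {between:.3f}")
```

For a good partition, the within-group values are expected to be small relative to the between-group value, which is what the toy data above is built to show.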
Partitioning clustering can be divided into hard and fuzzy methods.
The concept of a fuzzy set was initially explored by Zadeh [6] and
applied to clustering by Ruspini [7]. Applications of fuzzy sets
in cluster analysis have since been proposed and used in several
areas [8]. The concept of fuzzy sets has enabled work in both industrial
and academic fields [9]. Termini [10] used the definition of fuzzy sets
to create an interaction with the human sciences. Wong and Lai [11]
described the applications of fuzzy set theory in production and
operations management, for example, planning, quality control and
artificial intelligence (AI) techniques. Their work is based on
402 journal articles published between 1998 and 2009 that deal
with applications of fuzzy set theory techniques. In the medical
field, Kuo et al. [12] used fuzzy set theory and health care failure
mode and effect analysis to study patients according to the
decision-making factors of severity, incidence and detection.
∗ Corresponding author.
E-mail address: rmcrs@cin.ufpe.br (R.M.C.R. de Souza).
In the hard approach, each element of the data set can be associated
with only one cluster, while in the fuzzy approach each element
of the data set may belong to all clusters, but
with different membership degrees. Therefore, the calculation of
membership functions is an important problem in fuzzy clustering.
When each pattern is assigned to the cluster for which its
membership is largest, the fuzzy clustering reduces to a hard
clustering. Three categories of fuzzy methods used in cluster
analysis are: fuzzy clustering based on fuzzy relations, fuzzy
clustering based on objective functions, and the fuzzy generalized
k-nearest neighbor rule [8]. The most popular fuzzy clustering
method based on objective functions is fuzzy c-means (FCM)
[1,13]. An advantage of FCM is that it can be used in applications
where the clusters overlap [8].
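The difference between fuzzy and hard assignment can be sketched with the conventional FCM membership update for fixed cluster centers. This is the standard single-membership FCM rule, not the multivariate method proposed in this paper. The squared Euclidean distance, the fuzzifier value m = 2, the function names `fcm_memberships` and `harden`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def fcm_memberships(data, centers, m=2.0, eps=1e-12):
    """Conventional FCM membership update for fixed cluster centers.

    Returns an (n_samples, n_clusters) matrix whose rows sum to 1.
    """
    # Squared Euclidean distance from every sample to every center.
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Guard against division by zero when a sample coincides with a center.
    d2 = np.maximum(d2, eps)
    # Membership is proportional to d^(-2/(m-1)), i.e. d2^(-1/(m-1)).
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def harden(memberships):
    """Assign each sample to the cluster with the largest membership."""
    return memberships.argmax(axis=1)


# Hypothetical 2-D samples and two fixed centers (illustrative data only).
samples = np.array([[1.0, 0.0], [2.0, 0.0], [3.5, 0.0]])
centers = np.array([[0.0, 0.0], [4.0, 0.0]])

u = fcm_memberships(samples, centers)
labels = harden(u)
print(u)
print(labels)
```

The sample halfway between the two centers receives equal membership in both clusters, which is the overlapping case that a hard partition cannot express; `harden` discards that information by keeping only the largest membership.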
Several papers address the theory and applications of the FCM
algorithm, such as stochastic and numerical
theorems, image processing, parameter estimation and many
1568-4946/$ – see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.asoc.2012.12.024