Applied Soft Computing 13 (2013) 1592–1607
Journal homepage: www.elsevier.com/locate/asoc

A multivariate fuzzy c-means method

Bruno A. Pimentel, Renata M.C.R. de Souza ∗
Centro de Informática, Av. Jornalista Anibal Fernandes, s/n – Cidade Universitária 50.740-560, Recife (PE), Brazil

Article info
Article history: Received 10 July 2012; Received in revised form 27 November 2012; Accepted 30 December 2012; Available online 11 January 2013
Keywords: Fuzzy c-means method; Unsupervised pattern recognition; Clustering; Membership degree

Abstract
Fuzzy c-means (FCM) is an important and popular unsupervised partitioning algorithm used in several application domains such as pattern recognition, machine learning and data mining. Although FCM has shown good performance in detecting clusters, the membership values computed for each individual with respect to each cluster cannot indicate how well the individuals are classified. In this paper, a new approach to handling the memberships, based on the information inherent in each feature, is presented. The algorithm produces a membership matrix for each individual. The membership values lie between zero and one and measure the similarity of the individual to the center of each cluster according to each feature. These values can change at each iteration of the algorithm, and they differ from one feature to another and from one cluster to another in order to increase the performance of the fuzzy c-means clustering algorithm. To obtain a fuzzy partition by class of the input data set, a way to compute the class membership values is also proposed in this work. Experiments with synthetic and real data sets show that the proposed approach produces clusterings of good quality.
© 2013 Elsevier B.V. All rights reserved.

1.
Introduction

A growing number of application domains such as pattern recognition, machine learning, data mining, computer vision and computational biology have used clustering algorithms [1–3]. Clustering is a method of unsupervised learning whose objective is to group a set of elements into clusters such that elements within a cluster have a high degree of similarity, while elements belonging to different clusters have a high degree of dissimilarity. Mathematically, the degree of dissimilarity can be measured using, for instance, distance, angle, curvature, symmetry, connectivity or intensity computed from the data set [4]. Hierarchical and partitioning methods are the most popular clustering techniques. Hierarchical clustering finds a sequence of partitions: the algorithm starts from a single group containing all objects and proceeds until it reaches singleton groups, or vice versa. Partitioning clustering, by contrast, directly divides the data objects into a fixed number of clusters [5] using a suitable objective function. An advantage of partitioning methods is their ability to handle large data sets, since the construction of the dendrogram by a hierarchical method may be computationally impractical in some applications.

∗ Corresponding author. E-mail address: rmcrs@cin.ufpe.br (R.M.C.R. de Souza).

Partitioning clustering can be divided into hard and fuzzy methods. The concept of a fuzzy set was initially explored by Zadeh [6] and applied to clustering by Ruspini [7]. Applications of fuzzy sets in cluster analysis have since been proposed and used in several areas [8]. The concept of fuzzy sets has enabled work in both industrial and academic fields [9]. Termini [10] used the definition of fuzzy sets to establish an interaction with the human sciences. Wong and Lai [11] described the applications of fuzzy set theory in production and operations management, for example, planning, quality control and artificial intelligence (AI) techniques.
Moreover, the work of Wong and Lai is based on 402 journal articles published between 1998 and 2009 that deal with applications of fuzzy set theory techniques. In the medical field, Kuo et al. [12] used fuzzy set theory together with health care failure mode and effect analysis to study patients according to the decision-making factors severity, incidence, and detection.

1568-4946/$ – see front matter. http://dx.doi.org/10.1016/j.asoc.2012.12.024

In the hard approach, each element of the data set can be associated with only one cluster, while in the fuzzy approach each element of the data set can belong to all clusters, but with different membership degrees. Therefore, the calculation of the membership functions is an important problem in fuzzy clustering. When each pattern is assigned to the cluster in which it has the largest membership degree, the fuzzy clustering is equivalent to a hard clustering. Three categories of fuzzy methods used in cluster analysis are: fuzzy clustering based on fuzzy relations, fuzzy clustering based on objective functions, and the fuzzy generalized k-nearest neighbor rule [8].

The most popular fuzzy clustering method based on objective functions is fuzzy c-means (FCM) [1,13]. An advantage of FCM is that it may be used in applications where the clusters overlap [8]. Several papers address the theory and applications of the FCM algorithm, such as stochastic and numerical theorems, image processing, parameter estimation and many