A cluster validity index for fuzzy clustering

Kuo-Lung Wu a, Miin-Shen Yang b,*

a Department of Information Management, Kun Shan University of Technology, Yung-Kang, Tainan 71023, Taiwan, ROC
b Department of Applied Mathematics, Chung Yuan Christian University, Chung-Li 32023, Taiwan, ROC

Received 26 September 2004
Available online 19 December 2004

Abstract

Cluster validity indexes have been used to evaluate the fitness of partitions produced by clustering algorithms. This paper presents a new validity index for fuzzy clustering called the partition coefficient and exponential separation (PCAES) index. It uses the factors from a normalized partition coefficient and an exponential separation measure for each cluster and then pools these two factors to create the PCAES validity index. Considering the compactness and separation measures for each cluster provides different cluster validity merits. In this paper, we also discuss the problem that validity indexes face in a noisy environment. The efficiency of the proposed PCAES index is compared with that of several popular validity indexes. Further information about these indexes is obtained from a series of numerical comparisons and from three real data sets: Iris, Glass and Vowel. The results of the comparative study show that the proposed PCAES index is highly capable of producing a good estimate of the cluster number and, in addition, provides a new point of view for cluster validity in a noisy environment.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Fuzzy clustering; Cluster validity; Fuzzy c-means; Fuzzy c-partitions; Partition coefficient and exponential separation

1. Introduction

Cluster analysis is a method for partitioning a data set into groups with similar characteristics. It is an approach to unsupervised learning and also one of the major techniques in pattern recognition. The conventional (hard) clustering methods restrict each point of the data set to exactly one cluster.
* Corresponding author. Tel.: +886 3 265 3100; fax: +886 3 265 3199.
E-mail address: msyang@math.cycu.edu.tw (M.-S. Yang).
0167-8655/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.patrec.2004.11.022
Pattern Recognition Letters 26 (2005) 1275–1291
www.elsevier.com/locate/patrec

Since Zadeh (1965) proposed fuzzy sets, which introduced the idea of allowing each data point to have memberships in all clusters, fuzzy clustering has been widely studied and applied in a variety of substantive areas (Bezdek, 1981; Höppner et al., 1999; Yang, 1993; Baraldi and Blonda, 1999a,b). In the fuzzy clustering literature, the fuzzy c-means (FCM) clustering algorithm and its variations are the best-known and most widely used methods (Bezdek, 1981; Höppner et al., 1999; Yang,