Pattern Recognition Letters 11 (1990) 7-12
January 1990
North-Holland

Cluster validity based on the hard tendency of the fuzzy classification

F.F. RIVERA, E.L. ZAPATA
Dept. of Electronics, Faculty of Physics, Univ. Santiago de Compostela, Santiago de Compostela, Spain

J.M. CARAZO
Centro de Biología Molecular, Universidad Autónoma de Madrid, Madrid, Spain

Received 26 April 1989

Abstract. We present two new fuzzy cluster validity functionals (minimum and mean hard tendencies), based on the analysis of the hard tendency of the fuzzy classification generated by the fuzzy c-means algorithm. We have used the bootstrap technique to avoid the possible influence of local minima reached by the fuzzy c-means algorithm.

Key words: Cluster validity, fuzzy c-means algorithm, bootstrap technique.

1. Introduction

By cluster validity analysis we refer to the process of studying the data set and deciding whether the data occur in a basically uniform distribution or whether they have a certain cluster structure. In the latter case, the number of clusters that best represents the data structure should also be obtained. Ideally, this analysis should be carried out before performing any classification on the data set. Unfortunately, the basic question of 'how many clusters (if any) are best' is very difficult to answer without having performed any classification (Dubes and Jain, 1979; Dubes, 1987).

One of the most complicated questions to be addressed is the testing for randomness in the data, that is, determining whether the data set constitutes a realization of a uniform distribution. If this were the case, then of course no clustering process should be applied to the data, since they lack any valid clustering structure (although a continuum can certainly be 'dissected', and Everitt (1979) reported that important conclusions can be drawn from this). Smith and Jain (1984), Windham (1982) and Dubes (1987) have proposed a number of methods to test for randomness, with no conclusive findings.

We have extensively used the bootstrap technique (sampling with replacement) proposed by Efron (1979) for the calculation of all these functionals. Bootstrapping was first used in multidimensional pattern recognition by Moreau and Jain (1986) in the definition of a validity functional. The rationale for this approach is that, if the cluster structure has been correctly found, the solution to the cluster validity problem should stay the same when moderate variations are applied to the data set. Our preference for sampling with replacement (bootstrap) over sampling without replacement (jackknife technique) is based on the statistical results presented by Efron (1979).

Practically, the use of this technique amounts to performing a number of different classifications for

This work was supported by the Ministry of Education and Science (CICYT) of Spain under contracts TIC88-0094, MIC88-0549 and Xunta de Galicia XUGA80406488.

0167-8655/90/$3.50 © 1990, Elsevier Science Publishers B.V. (North-Holland)
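
As an illustration of the bootstrap procedure introduced above (sampling with replacement, then classifying each replicate with the fuzzy c-means algorithm), a minimal sketch in Python with NumPy follows. It is not the implementation used in this work. The function names, the fuzziness exponent m = 2, the iteration limits and the number of replicates are assumptions made for the example. Since the hard tendency functionals are not yet defined at this point, Bezdek's partition coefficient is used as a stand-in validity measure.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, rng=None):
    """Plain fuzzy c-means (illustrative). Returns (centers, U), U of shape (n, c)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Random initial fuzzy partition: each row of U sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distances from every point to every center, shape (n, c).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def bootstrap_sample(X, rng):
    """Draw n points from X with replacement (one bootstrap replicate)."""
    idx = rng.integers(0, X.shape[0], size=X.shape[0])
    return X[idx]


def partition_coefficient(U):
    """Bezdek's partition coefficient (stand-in measure); lies in [1/c, 1]."""
    return float((U ** 2).sum() / U.shape[0])


def bootstrap_validity(X, c, n_boot=20, seed=0):
    """Classify n_boot bootstrap replicates of X; return one score per replicate."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_boot):
        Xb = bootstrap_sample(X, rng)
        _, U = fuzzy_c_means(Xb, c, rng=rng)
        scores.append(partition_coefficient(U))
    return np.array(scores)


if __name__ == "__main__":
    # Synthetic data (hypothetical): two separated Gaussian groups in the plane.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(10.0, 1.0, (50, 2))])
    for c in (2, 3, 4):
        s = bootstrap_validity(X, c)
        print(c, s.mean(), s.std())
```

Each replicate starts from a different random partition and a different resampled data set, so a small spread of the scores across replicates suggests that the result for that number of clusters does not depend on a particular local minimum.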