Pattern Recognition Letters 11 (1990) 7-12 January 1990
North-Holland
Cluster validity based on the
hard tendency of the
fuzzy classification
F.F. RIVERA, E.L. ZAPATA
Dept. of Electronics, Faculty of Physics, Univ. Santiago de Compostela, Santiago de Compostela, Spain
J.M. CARAZO
Centro de Biología Molecular, Universidad Autónoma de Madrid, Madrid, Spain
Received 26 April 1989
Abstract. We present two new fuzzy cluster validity functionals (the minimum and mean hard tendencies), based on an analysis
of the hard tendency of the fuzzy classification generated by the fuzzy c-means algorithm. We have used the bootstrap technique
to avoid the possible influence of the local minima obtained by the fuzzy c-means algorithm.
Key words: Cluster validity, fuzzy c-means algorithm, bootstrap technique.
1. Introduction
By cluster validity analysis we refer to the process
of studying the data set and deciding whether the
data occur in a basically uniform distribution, or
whether they have a certain cluster structure. In the
latter case, the number of clusters that best represent
the data structure should also be obtained. Ideally,
this analysis should be carried out before performing
any classification on the data set. Unfortunately,
the basic question of 'how many clusters (if any)
are best' is very difficult to answer without having
performed any classification (Dubes and Jain, 1979;
Dubes, 1987).
One of the most complicated questions to be addressed
is testing the data for randomness, that is,
determining whether the data set constitutes a
realization of a uniform distribution. If it does,
no clustering process should be applied to the data,
since they lack any valid clustering structure
(although a continuum can certainly be 'dissected',
and Everitt (1979) reported that important conclusions
can be drawn from doing so). Smith and Jain (1984),
Windham (1982) and Dubes (1987) have proposed a
number of methods to test for randomness, none of
them with conclusive findings.

This work was supported by the Ministry of Education and
Science (CICYT) of Spain under contracts TIC88-0094 and
MIC88-0549, and by the Xunta de Galicia under contract XUGA80406488.
We have extensively used the bootstrap technique
(sampling with replacement) proposed by Efron (1979)
to calculate all of these functionals. Bootstrapping
was first used in multidimensional pattern recognition
by Moreau and Jain (1986), in the definition of a
validity functional. The rationale for this approach
is that, if the cluster structure has been correctly
found, the solution to the cluster validity problem
should stay the same when moderate variations are
applied to the data set. Sampling with replacement
(bootstrap) is preferred over sampling without
replacement (the jackknife technique) on the basis of
the statistical results presented by Efron (1979).
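The stability argument above (a correctly found cluster structure should survive moderate perturbations of the data) can be illustrated with a minimal Python sketch. It is not the authors' hard-tendency computation: it only resamples the data with replacement (bootstrap), or drops one point at a time (the jackknife, leave-one-out, alternative), and reruns a plain fuzzy c-means on each bootstrap resample to see how much the cluster centers move. Several choices are assumptions made for illustration: one-dimensional data, fuzzifier m = 2, a fixed 100 iterations, evenly spaced initial centers, hypothetical toy data, and hypothetical function names.

```python
import random
import statistics


def bootstrap_sample(data, rng):
    """Efron's bootstrap: draw len(data) points uniformly *with* replacement."""
    return [rng.choice(data) for _ in data]


def jackknife_samples(data):
    """Leave-one-out subsets: each drops one point (sampling *without* replacement)."""
    return [data[:i] + data[i + 1:] for i in range(len(data))]


def fcm_memberships(x, centers, m):
    """Fuzzy c-means membership of scalar x in each cluster (values sum to 1)."""
    d = [abs(x - v) for v in centers]
    for k, dk in enumerate(d):
        if dk == 0.0:  # x sits exactly on a center: full membership there
            return [1.0 if j == k else 0.0 for j in range(len(d))]
    p = 2.0 / (m - 1.0)
    # Standard FCM update: u_k = 1 / sum_j (d_k / d_j)^(2/(m-1))
    return [1.0 / sum((dk / dj) ** p for dj in d) for dk in d]


def fuzzy_c_means(points, c, m=2.0, iters=100):
    """Plain 1-D fuzzy c-means (hypothetical helper); returns sorted centers."""
    lo, hi = min(points), max(points)
    # Assumed deterministic start: c centers evenly spaced over the data range.
    centers = [lo + (k + 0.5) * (hi - lo) / c for k in range(c)]
    for _ in range(iters):
        u = [fcm_memberships(x, centers, m) for x in points]
        new_centers = []
        for k in range(c):
            w = [row[k] ** m for row in u]
            total = sum(w)
            # Keep the old center if no point puts any weight on cluster k.
            new_centers.append(
                sum(wi * x for wi, x in zip(w, points)) / total if total > 0 else centers[k]
            )
        centers = new_centers
    return sorted(centers)


def bootstrap_centers(data, c, replicates=50, seed=0):
    """Run fuzzy c-means on `replicates` bootstrap resamples of the data."""
    rng = random.Random(seed)
    return [fuzzy_c_means(bootstrap_sample(data, rng), c) for _ in range(replicates)]


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated 1-D groups of 20 points each.
    data = [0.1 * i for i in range(20)] + [10.0 + 0.1 * i for i in range(20)]
    full = fuzzy_c_means(data, c=2)
    runs = bootstrap_centers(data, c=2)
    for k in range(2):
        spread = statistics.pstdev([run[k] for run in runs])
        print(f"center {k}: full data {full[k]:.2f}, bootstrap spread {spread:.3f}")
```

With well-separated groups the centers move little from one resample to the next; a small bootstrap spread is the kind of stability the argument relies on, whereas a spurious partition would be expected to drift between resamples.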
In practice, the use of this technique amounts to
performing a number of different classifications for
0167-8655/90/$3.50 © 1990, Elsevier Science Publishers B.V. (North-Holland) 7