Pattern Recognition, Vol. 21, No. 2, pp. 169-174, 1988. Printed in Great Britain. 0031-3203/88 $3.00+.00 Pergamon Press plc, Pattern Recognition Society

OPTIMALITY OF REASSIGNMENT RULES IN DYNAMIC CLUSTERING

J. KITTLER
Dept Electronic and Electrical Engineering, University of Surrey, Guildford GU2 5XH, U.K.

and

D. PAIRMAN*
Atmospheric Physics Dept, Oxford University, Oxford, OX1 3PU, U.K.

(Received 22 January 1986; in revised form 14 August 1987)

Abstract: The paper is concerned with reassignment rules for the dynamic clustering algorithm family, which includes ISODATA. It is shown that, contrary to popular belief, these iterative clustering algorithms do not guarantee that each stable partition is locally optimal. The main result derived herein is a multiple-point reassignment rule which assumes a Gaussian density model for each cluster. The new rule should reduce the chances of the iterative optimisation algorithm yielding partitions which do not correspond to local minima of the clustering criterion.

Iterative clustering    ISODATA    Gaussian models    Reassignment rules

1. INTRODUCTION

Iterative variance-minimising clustering algorithms of the late nineteen sixties, such as ISODATA of Ball and Hall(1) and the k-means algorithm of MacQueen(2), have made a considerable impact on clustering methodology. Their popularity is due to their simplicity, efficacy and computational efficiency, which for many users seem to outweigh the well-known drawbacks associated with using variance as a clustering criterion.(3) In general, when natural clusters in a data structure have covariance matrices which differ from the identity matrix, a minimum variance solution can result in undesirable artifacts. Other problems include the presence of multiple local minima in the clustering criterion, which impede the iterative minimisation algorithm in finding the globally optimum partition.
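The dependence of a variance-minimising iteration on its starting point can be illustrated with a small sketch. The code below is not taken from the paper: it is a minimal batch k-means loop written for illustration, and the function names (`sq_dist`, `centroid`, `k_means`) and the four-point data set are assumptions chosen so that the arithmetic can be checked by hand.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def k_means(points, centres, max_iter=100):
    """Illustrative batch k-means: alternate nearest-centre assignment
    and mean update until the partition stops changing."""
    centres = list(centres)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [
            min(range(len(centres)), key=lambda k: sq_dist(p, centres[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # stable partition reached
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for k in range(len(centres)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centres[k] = centroid(members)
    # Sum of squared distances to the cluster means (the variance criterion).
    criterion = sum(sq_dist(p, centres[l]) for p, l in zip(points, labels))
    return labels, centres, criterion


# Hypothetical data: the corners of a 4 x 1 rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good_labels, _, good_ss = k_means(points, [(0.0, 0.0), (4.0, 0.0)])
bad_labels, _, bad_ss = k_means(points, [(0.0, 0.0), (0.0, 1.0)])

print(good_labels, good_ss)  # [0, 0, 1, 1] 1.0
print(bad_labels, bad_ss)    # [0, 1, 0, 1] 16.0
```

Both runs stop at a partition that the nearest-mean rule leaves unchanged, yet the second has a criterion value sixteen times larger than the first. Which stable partition is reached depends only on the initial centres.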
Various attempts have been made to overcome these problems while retaining the computationally attractive shell of the algorithm. Among the most notable developments is Diday's generalisation of ISODATA, referred to as dynamic clustering,(4,5) which permits the use of arbitrary models for cluster representation rather than the cluster mean. The main premise of using more sophisticated cluster models was that this would reduce the number of plausible minima of the clustering criterion function and therefore enhance the chances of finding the globally optimum partition. However, the straightforward extension of ISODATA to dynamic clustering led to the propagation of other, less visible problems of the ISODATA algorithm into its more general equivalent.

The first of these less visible problems lies in the postulation of the objective function in dynamic clustering, which by a direct analogy to ISODATA is defined as the sum of values of a (dis)similarity measure between data points and their associated cluster models. If such a measure of affinity between a point and a cluster model becomes the cornerstone of a clustering philosophy, then it may seem inappropriate also to take cluster size into account when the cluster allocation of data points is being decided. Yet the failure to consider cluster sizes can lead to clustering solutions which do not reflect the natural structure of the data. This has been pointed out in Refs (6-8) and demonstrated on synthetically generated data in Ref. (9). This particular drawback has been overcome by Symons, who arrived at the dynamic clustering algorithm by formulating the clustering problem in the framework of maximum likelihood estimation, and by Kittler and Pairman, who approached the problem from a statistical pattern recognition point of view.

* Supported by NRAC (NZ) fellowship while on study leave from the Department of Scientific and Industrial Research, New Zealand.
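The effect of ignoring cluster size when allocating points can be made concrete with a small sketch. It is not the formulation used in the paper or in the cited references: it assumes one-dimensional Gaussian cluster models, takes the dissimilarity to be minus twice the log-density (up to a constant), and adds a size term of the form -2 log(n_k / n) in the spirit of a maximum likelihood treatment. All names and numbers below are hypothetical.

```python
import math


def gaussian_cost(x, mean, var):
    """Assumed dissimilarity: -2 log N(x; mean, var), dropping the
    constant log(2*pi) that is common to all clusters."""
    return (x - mean) ** 2 / var + math.log(var)


def size_aware_cost(x, mean, var, size, total):
    """Same dissimilarity with an assumed cluster-size term added:
    -2 log(size / total), which penalises small clusters."""
    return gaussian_cost(x, mean, var) - 2.0 * math.log(size / total)


def assign(x, models, total=None):
    """Index of the cheapest cluster for x. Each model is a
    (mean, variance, size) triple; sizes are used only if total is given."""
    costs = []
    for mean, var, size in models:
        if total is None:
            costs.append(gaussian_cost(x, mean, var))
        else:
            costs.append(size_aware_cost(x, mean, var, size, total))
    return costs.index(min(costs))


# Hypothetical models: a large cluster at 0 and a small one at 4,
# both with unit variance, holding 90 and 10 of 100 points.
models = [(0.0, 1.0, 90), (4.0, 1.0, 10)]
x = 2.1  # slightly closer to the small cluster's mean

plain = assign(x, models)             # size ignored
sized = assign(x, models, total=100)  # size taken into account

print(plain, sized)  # 1 0
```

With the size term omitted, the point is allocated to the nearer but much smaller cluster. Once the relative sizes enter the cost, the same point is allocated to the large cluster, which is the kind of difference the paragraph above describes.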
The second potential source of difficulties is the use of the actual similarity measure as the point-reassignment rule. The establishment of this practice again has its roots in the direct analogy between the dynamic clustering and ISODATA algorithms. ISODATA