Pattern Recognition 36 (2003) 1901–1912
www.elsevier.com/locate/patcog

Premature clustering phenomenon and new training algorithms for LVQ

Mohammad-Taghi Vakil-Baghmisheh ∗, Nikola Pavešić

Faculty of Electrical Engineering, Laboratory of Artificial Perception, Systems and Cybernetics, University of Ljubljana, Tržaška 25, Ljubljana, Slovenia

Received 4 March 2002; received in revised form 26 September 2002; accepted 26 September 2002

Abstract

Five existing LVQ algorithms are reviewed. The premature clustering phenomenon, which degrades the performance of LVQ, is explained. By introducing and applying the "equalizing factor" as a remedy for the premature clustering phenomenon, a breakthrough is achieved in improving the performance of the LVQ network, and its performance becomes competitive with that of the best known classifiers. For estimating the equalizing factor, four different formulas are suggested, which result in four different versions of the LVQ4a algorithm. A new weight-updating formula for LVQ is presented, and the LVQ4b algorithm is presented as an implementation of this formula in batch mode training. In addition, four variants of the LVQ4c algorithm are presented as the LVQ4b algorithm customized for pattern mode training. The performances of the new algorithms and of the five earlier training algorithms have been analyzed in detail and compared against each other on 16 databases of the Farsi optical character recognition problem.

© 2003 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Neural networks; LVQ; Pattern recognition; Farsi optical character recognition; Premature clustering phenomenon; Equalizing factor; LVQ4a; LVQ4b; LVQ4c

1. Introduction

Learning vector quantization (LVQ) was originally introduced by Linde et al. [1] and Gray [2] as a tool for image data compression, and was later adapted by Kohonen [3] for pattern recognition.
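As a concrete picture of LVQ used for pattern recognition, the following minimal sketch shows nearest-prototype classification: each class is represented by labeled codebook vectors, and an input receives the label of the closest one. The Euclidean distance, the function name `classify`, and the toy 2-D data are assumptions made for illustration only; they are not taken from the paper.

```python
import math


def classify(x, codebook, labels):
    """Return the label of the codebook vector nearest to x.

    Assumption: "nearest" is measured with the Euclidean distance.
    """
    nearest = min(range(len(codebook)), key=lambda i: math.dist(x, codebook[i]))
    return labels[nearest]


# Hypothetical 2-D codebook: two vectors for class "A", one for class "B".
codebook = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
labels = ["A", "A", "B"]

print(classify((0.9, 0.2), codebook, labels))  # closest to (1.0, 0.0)
print(classify((4.0, 4.5), codebook, labels))  # closest to (5.0, 5.0)
```

Training algorithms differ only in how they move the entries of `codebook`; the classification rule itself stays the same.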
The main idea of LVQ is to divide the input space R^n into a number of distinct regions, called decision regions (Voronoi cells), and to assign one codebook (Voronoi) vector to each region. Classification is based on the vicinity of the input vector x to the codebook vectors: x is assigned the label of its nearest neighbor among the codebook vectors. During training, the codebook vectors, and consequently the borders of the decision regions, are adjusted through an iterative process. There already exist five versions of the training algorithm: LVQ1, LVQ2.1, LVQ3, OLVQ1 and CLVQ (Combined LVQ) [4–6]. The main drawbacks of the existing algorithms are their slow convergence and weak recognition rate, because of the premature clustering phenomenon.

∗ Corresponding author. Tel.: +386-1-4768839; fax: +386-1-4768316.
E-mail addresses: vakil@luz.fe.uni-lj.si (M.-T. Vakil-Baghmisheh), nikola.pavesic@fe.uni-lj.si (N. Pavešić).
0031-3203/03/$30.00 © 2003 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(02)00291-1

In the following, we first review the existing algorithms. In Section 3, the premature clustering phenomenon is explained, and the equalizing factor, our solution to the problem, is introduced. A new weight-updating formula is presented in Section 4. In Section 5, the final versions of the new training algorithms are presented. For estimating the equalizing factor, four different formulas are suggested, which result in four different versions of the LVQ4a algorithm. Then, the LVQ4b and LVQ4c algorithms are presented as implementations of the new weight-updating formula. In Section 6, experimental results