Efficient Vector Quantization Using the WTA-Rule with Activity Equalization

Gunther Heidemann (gheidema@techfak.uni-bielefeld.de) and Helge Ritter (helge@techfak.uni-bielefeld.de)
AG Neuroinformatik, Universität Bielefeld, Germany

February 10, 2000

Abstract. We propose a new algorithm for vector quantization, Activity Equalization Vector quantization (AEV). It is based on the winner-takes-all rule with an additional supervision of the average node activities over a training interval and a subsequent re-positioning of nodes with low average activity. The re-positioning aims both at an exploration of the data space and at a better approximation of already discovered data clusters by equalizing the node activities. We introduce a learning scheme for AEV which requires as prior knowledge about the data only their bounding box. Using an example of Martinetz et al. [1], AEV is compared with the Neural Gas, Frequency Sensitive Competitive Learning (FSCL) and other standard algorithms. It turns out to converge much faster and to require less computational effort.

Keywords: Vector Quantization, Clustering, Competitive Learning, Codebook Generation, Unsupervised Learning, Winner Takes All, Neural Gas

Neural Processing Letters, Vol. 13(1): 17–30, 2001

1. Outline of the problem

Vector quantization is an important standard method for data compression and data mining [2, 5]. Given a set of data vectors Ω = {x_i}, x_i ∈ ℝ^d, vector quantization approximates this data distribution by a set of reference vectors (or codewords) Γ = {w_i} with |Γ| < |Ω|. Γ is called the code book; the reference vectors w are often called nodes in the context of neurally motivated algorithms. A data vector x_i is approximated by the best-match node w_{k(x_i)}, which minimizes the distance ‖x_i − w_j‖ with respect to j, where ‖·‖ denotes the chosen distance measure (usually Euclidean).
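The winner selection described above can be sketched as follows; this is a minimal illustration, not the paper's implementation, and the function name and use of NumPy are assumptions.

```python
import numpy as np

def best_match(x, W):
    """Return the index k(x) of the best-match node: the row of the
    codebook W (shape N_Gamma x d) closest to the data vector x
    under the Euclidean distance."""
    dists = np.linalg.norm(W - x, axis=1)  # distance from x to every node
    return int(np.argmin(dists))           # index of the winning node
```

For compression, only this index is stored in place of the full vector x.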
Compression is achieved by replacing the data vector x with the index k(x) of the best matching node. The compression is the higher, the smaller the number of nodes N_Γ = |Γ| is chosen. On the other hand, with decreasing N_Γ the average reconstruction error ⟨‖x_i − w_{k(x_i)}‖⟩ increases. For a given number of nodes, their distribution is crucial for achieving a low mean squared reconstruction error

    E(Ω, Γ) = (1/N_Ω) Σ_{x_i ∈ Ω} ‖x_i − w_{k(x_i)}‖²,   with N_Ω = |Ω|.   (1)
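The error of Eq. (1) can be evaluated directly from a data set and a codebook; the sketch below assumes NumPy arrays and a hypothetical function name.

```python
import numpy as np

def reconstruction_error(X, W):
    """Mean squared reconstruction error E(Omega, Gamma) of Eq. (1):
    the average, over all data vectors in X (N_Omega x d), of the
    squared Euclidean distance to the best-match node in W (N_Gamma x d)."""
    # squared distances between every data vector and every node: N_Omega x N_Gamma
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    # pick each vector's best-match node, then average over the data set
    return d2.min(axis=1).mean()
```

A perfect codebook containing every data vector gives E = 0; shrinking N_Γ trades this error against compression, as discussed above.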