Improving the H2MLVQ algorithm by the Cross Entropy Method

Abderrahmane Boubezoul, Sébastien Paris, and Mustapha Ouladsine
Laboratoire des Sciences de l'Information et des Systèmes
Domaine Universitaire de Saint-Jérôme, avenue Escadrille Normandie-Niemen, 13397 MARSEILLE CEDEX 20, France
email: {abderrahmane.boubezoul,sebastien.paris,mustapha.ouladsine}@lsis.org

Keywords: Generalized Learning Vector Quantization, Relevance Learning, Cross Entropy method, Initialization sensitivity

Abstract— This paper addresses the use of a stochastic optimization method, the Cross Entropy (CE) method, to improve the recently proposed H2MLVQ (Harmonic to Minimum LVQ) algorithm, which was introduced as an initialization-insensitive variant of the well-known Learning Vector Quantization (LVQ) algorithm. This paper has two aims: the first is to use the CE method to tackle the initialization sensitivity problem associated with the original LVQ algorithm and its variants, and the second is to use a weighted norm instead of the Euclidean norm in order to select the most relevant features. The results in this paper indicate that the CE method can successfully be applied to this kind of problem and efficiently generates high-quality solutions. Good competitive numerical results on several datasets are also reported.

1 Introduction

Prototype-based learning has been an ongoing research problem over the last decades and has been approached in various ways. It has been gaining more interest lately due to its ability to generate fast and intuitive classification models with good generalization capabilities. One prominent prototype-based learning algorithm is Learning Vector Quantization (LVQ), introduced by Kohonen [2]; this algorithm and its variants have been intensively studied because of their robustness, adaptivity and efficiency.
The idea of LVQ is to define class boundaries based on prototypes, a nearest-neighbor rule and a winner-takes-all paradigm. The standard LVQ has some drawbacks: i) LVQ adjusts the prototypes using heuristic error-correction rules; ii) it does not directly minimize an objective function, so convergence cannot be guaranteed, which leads to unstable behavior, especially in the case of overlapping data; iii) the results depend strongly on the initial positions of the prototypes. In order to improve the standard LVQ algorithm, several modifications were proposed by the author himself (see [3]) and by other researchers. A good description of the state of the art of Learning Vector Quantization and its variants is given in a recent survey (see [5]).

Standard LVQ does not distinguish between more and less informative features because it uses the Euclidean distance; to remedy this, extensions have been suggested by various authors (see [4], [6] and [7]). These approaches follow heuristic update rules for the relevance factors, and the prototype vectors are adapted using simple perceptron learning, which may cause problems for nonlinearly separable data. For these reasons, another variant, Generalized LVQ (GLVQ), based on the minimization of a cost function by stochastic gradient descent, was proposed by Sato and Yamada (see [14]). Hammer et al. suggested modifying the GLVQ cost function by using a weighted norm instead of the Euclidean distance. This algorithm, called Generalized Relevance LVQ (GRLVQ), showed competitive results compared to SVM on several tasks, and it has been proved that GRLVQ can be considered a large-margin classifier (see [8]). Although the GRLVQ algorithm guarantees convergence and shows better classification performance than other LVQ algorithms, it suffers from initialization sensitivity due to the presence of numerous local minima incurred by the use of the gradient descent method, especially for multi-modal problems.
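To make the winner-takes-all idea and the weighted norm concrete, the following is a minimal sketch in Python of a single LVQ1-style update step with a relevance-weighted squared distance. The function names and the single-winner rule are our simplification for illustration only; they are not the exact GLVQ/GRLVQ updates, which adapt both the closest correct and closest incorrect prototypes.

```python
import numpy as np

def weighted_distance(x, w, lam):
    """Squared weighted Euclidean distance: sum_i lam_i * (x_i - w_i)^2.
    With lam = ones, this reduces to the plain squared Euclidean distance."""
    return float(np.sum(lam * (x - w) ** 2))

def lvq1_step(prototypes, proto_labels, x, y, lam, lr=0.05):
    """One winner-takes-all update: find the nearest prototype, then
    attract it toward x if its label matches y, repel it otherwise."""
    dists = np.array([weighted_distance(x, w, lam) for w in prototypes])
    j = int(np.argmin(dists))                       # winner index
    sign = 1.0 if proto_labels[j] == y else -1.0    # heuristic error correction
    prototypes[j] += sign * lr * (x - prototypes[j])
    return j
```

Iterating this step over the training samples reproduces the heuristic error-correction behavior described above; since no objective function is minimized, the outcome depends on the prototype initialization.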
The same authors proposed another algorithm, Supervised Relevance Neural Gas (SRNG), to tackle initialization sensitivity (see [9]). They propose to combine GRLVQ with the neighborhood-oriented learning of the neural gas (NG) algorithm. The algorithm mentioned above requires choosing several parameters, such as the learning rate and the size of the update neighborhood. A suitable choice of these parameter values may not always be obvious and can change from one data set to another.

In this paper, we present an initialization-insensitive H2MRLVQ which is based on the well-known and efficient cross-entropy (CE) method [10]. We refer to this new algorithm as the Cross Entropy Method H2MRLVQ (CEMH2MRLVQ).

The rest of the paper is structured as follows. In Section 2, we introduce the basics of classification and prototype learning, and we review the formulations of both GLVQ and H2MLVQ. In Section 3, we explain how the CE method can be used as a stochastic global optimization procedure for the H2MLVQ algorithm. In Section 4, we present the results of numerical experiments using our proposed algorithm on some benchmark data sets and compare with the results obtained using the H2MLVQ and GLVQ stochastic
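As a generic illustration of the CE machinery invoked above, the following is a minimal sketch of a CE loop for continuous minimization: sample candidates from a Gaussian, keep an elite fraction, and refit the sampling distribution to the elites. The parameter names, Gaussian parameterization and stopping rule here are illustrative assumptions; the actual CE formulation applied to H2MLVQ is given in Section 3.

```python
import numpy as np

def cross_entropy_minimize(f, dim, n_samples=100, n_elite=10,
                           n_iter=50, seed=0):
    """Generic CE loop: sample from N(mu, sigma^2), rank by f,
    refit mu and sigma on the best n_elite samples."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.full(dim, 5.0)
    for _ in range(n_iter):
        X = rng.normal(mu, sigma, size=(n_samples, dim))
        scores = np.array([f(x) for x in X])
        elite = X[np.argsort(scores)[:n_elite]]    # best candidates
        mu = elite.mean(axis=0)
        sigma = elite.std(axis=0) + 1e-6           # floor avoids collapse
    return mu
```

Because the search is population-based rather than gradient-based, such a loop does not require a good starting point and is far less prone to getting trapped in the local minima that plague gradient-descent LVQ variants, which is precisely the property exploited in this paper.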