Simultaneous pattern and variable weighting during topological clustering

Nistor Grozavu 1,2 and Younès Bennani 1,2

1 Université Paris 13, 99, av. J-B Clément, 93430 Villetaneuse
2 LIPN-UMR 7030, Université Paris 13, 99, av. J-B Clément, 93430 Villetaneuse, France
email: {firstname.secondname}@lipn.univ-paris13.fr

Abstract. This paper addresses the problem of detecting a subset of the most relevant features and observations from a dataset through a local weighted learning paradigm. We introduce a new learning approach that simultaneously provides a Self-Organizing Map (SOM) and a double local weighting. The proposed approach is computationally simple, and learns a different feature weight vector for each cell (relevance vector) together with an observation weighting matrix. Building on lwo-SOM and lwd-SOM [7], we present a new weighting approach, called dlw-SOM, that takes into account the importance of the observations and of the variables simultaneously. After the learning phase, a selection method is applied to the weight vectors to prune the irrelevant variables, which allows us to characterize the clusters. Experiments on a number of synthetic and real datasets show the benefits of the proposed double local weighting using self-organizing models.

1 Introduction

The size of a dataset can be measured along two dimensions: the number of features and the number of observations. Both can take very high values, which can cause problems during the exploration and analysis of the dataset. Models and tools are therefore required to process data for an improved understanding. Feature selection is commonly used in machine learning, wherein a subset of the features available from the data is selected for application of a learning algorithm. The best subset contains the features that give the highest accuracy score. In order to find the relevant features, we combine feature weighting with variable selection techniques.
In variable selection, the task is reduced to simply eliminating variables which are completely irrelevant. Variable weighting is an extension of the selection process in which the variables are associated with continuous weights that can be regarded as degrees of relevance. Continuous weighting provides a richer representation of feature relevance. Hence, the clustering and variable selection/weighting tasks are coupled, and applying them in sequence can degrade the performance of the learning system. Consequently, it is necessary to develop an algorithm that performs clustering and variable weighting/selection simultaneously.
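
The difference between selection and weighting can be illustrated with a minimal Python sketch. It is not the dlw-SOM algorithm itself: it assumes a weighted squared Euclidean distance between an observation and a prototype, and the function name `weighted_sq_distance`, the toy vectors, and the weight values are all hypothetical choices made for illustration. Under this assumption, variable selection corresponds to binary weights (keep or drop), while variable weighting allows any non-negative degree of relevance.

```python
def weighted_sq_distance(x, w, weights):
    """Squared Euclidean distance with one non-negative weight per variable.

    Assumption for illustration only: the source does not specify this
    exact distance; it is a common way to apply variable weights.
    """
    if not (len(x) == len(w) == len(weights)):
        raise ValueError("x, w and weights must have the same length")
    return sum(p * (a - b) ** 2 for a, b, p in zip(x, w, weights))


# Hypothetical observation and prototype described by three variables.
x = [1.0, 2.0, 9.0]
prototype = [1.5, 2.0, 0.0]

# Variable selection: each variable is either kept (1) or eliminated (0).
selection_mask = [1.0, 1.0, 0.0]

# Variable weighting: continuous weights read as degrees of relevance.
relevance_weights = [0.6, 0.3, 0.1]

d_selected = weighted_sq_distance(x, prototype, selection_mask)
d_weighted = weighted_sq_distance(x, prototype, relevance_weights)
```

With the binary mask the third variable contributes nothing to the distance, whereas with continuous weights it still contributes, only with a reduced influence. A simultaneous approach would adapt such weights while the clustering is being learned rather than fixing them beforehand.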