IEEE TRANSACTIONS ON NEURAL NETWORKS, PRE-PRINT, ACCEPTED FOR PUBLICATION

The Growing Hierarchical Self-Organizing Map: Exploratory Analysis of High-Dimensional Data

Andreas Rauber, Dieter Merkl, and Michael Dittenbach
Department of Software Technology and Interactive Systems
Vienna University of Technology
Favoritenstr. 9-11 / 188, A-1040 Vienna, Austria
e-mail: {andi, dieter, mbach}@ifs.tuwien.ac.at

Abstract: The Self-Organizing Map is a very popular unsupervised neural network model for the analysis of high-dimensional input data as in data mining applications. However, at least two limitations have to be noted, which are related, on the one hand, to the static architecture of this model, as well as, on the other hand, to the limited capabilities for the representation of hierarchical relations of the data. With our novel Growing Hierarchical Self-Organizing Map presented in this paper we address both limitations. The Growing Hierarchical Self-Organizing Map is an artificial neural network model with hierarchical architecture composed of independent growing self-organizing maps. The motivation was to provide a model that adapts its architecture during its unsupervised training process according to the particular requirements of the input data. Furthermore, by providing a global orientation of the independently growing maps in the individual layers of the hierarchy, navigation across branches is facilitated. The benefits of this novel neural network are first, a problem-dependent architecture, and second, the intuitive representation of hierarchical relations in the data. This is especially appealing in explorative data mining applications, allowing the inherent structure of the data to unfold in a highly intuitive fashion.

Keywords: Self-Organizing Map (SOM), Data Mining, Hierarchical Clustering, Exploratory Data Analysis, Pattern Recognition.

I. Introduction

Data mining, or more generally, pattern recognition and knowledge acquisition, heavily depend on suitable unsupervised learning methods. The purpose of these methods is to develop an optimal partitioning, i.e. clustering, of the data set to be analyzed. Cluster analysis is the organization of a collection of patterns, which are usually represented as vectors of measurements or points in a multidimensional space, into clusters based on similarity. Intuitively, patterns within a valid cluster are more similar to each other than to a pattern belonging to a different cluster [1]. In other words, the objective of unsupervised learning methods in data mining applications is to identify groupings in an unlabeled set of data vectors that share semantic similarities. This helps the user to build a cognitive model of the data, thus fostering the detection of the inherent structure and the interrelationship of data. However, in many applications little to no prior information about underlying models for the data is available. In such a situation clustering provides a particularly appropriate approach to the analysis of data.

The Self-Organizing Map (SOM) is being widely used as a tool for mapping high-dimensional data into a two-dimensional representation space [2]. This mapping retains the relationship between in-