Evolutionary subspace search in biologically inspired optimal niches

Ira Assent, Ralph Krieger, Emmanuel Müller, Andreas Steffens and Thomas Seidl
Data management and data exploration
RWTH Aachen University, D-52056 Aachen, Germany
Phone: +49-241-8021901, Fax: +49-241-8022931
email: {assent, krieger, mueller, steffens, seidl}@cs.rwth-aachen.de

ABSTRACT: Knowledge discovery in large multimedia databases permits information extraction from ubiquitous data archives. Clustering groups objects based on mutual (dis-)similarity, yet does not scale to high-dimensional databases. In high-dimensional data, clusters are typically hidden by locally irrelevant attributes. Subspace clustering aims to detect clusters in locally relevant subspaces of the attributes. We propose an evolutionary subspace search approach to detect locally relevant attributes for clustering. Our evolutionary subspace search optimizes a multi-objective goal, as more than one subspace is interesting for clustering. We propose modelling evolutionary niches to ensure that subspace individuals in local optima are preserved while novel evolutionary solutions arise. We analyze our evolutionary model and provide an experimental evaluation to demonstrate the quality of our evolutionary subspace search.

KEYWORDS: nature inspired computing, evolutionary subspace search, genetic algorithms, evolutionary multi-objective optimization, biological niches, knowledge discovery in databases, data mining, clustering, subspace search

INTRODUCTION

Today’s multimedia databases archive huge amounts of application data, ranging from magnetic resonance imaging and gene expression data to hydrological models and sensor measurements from mobile devices. A common requirement for all of these applications is the effective and efficient extraction of information hidden in the data, termed knowledge discovery in databases (KDD).
The KDD task of clustering aims at grouping data such that similar objects are summarized and separated from dissimilar ones. The resulting clusters can be used as a resource for detecting similar diagnostic data, finding new gene correlations, etc. In high-dimensional databases, where numerous attributes are used to describe multimedia data, clusters tend to be obscured by irrelevant attributes. Detecting clusters in projections of these attributes, i.e. in subspaces, is the focus of subspace clustering. As the number of subspaces is exponential in the number of dimensions, efficient mining is challenging.

In this work, we take an evolutionary approach to subspace search. Our subspace search method focuses clustering on relevant subspaces: we search for the subspaces that are most relevant for finding clusters. Mimicking evolutionary theory, a population of subspaces is subjected to nature-inspired optimization. Subspaces are evaluated by a fitness function that reflects the clustering tendency of the subspace.

Subspace clustering is not interested in a single optimal subspace only. Different local solutions represent interesting subspace combinations. These combinations are not necessarily related, yet constitute regions of interest to clustering algorithms. Such diverse local optima can be modeled by biological niches in our evolutionary approach. Evolutionary multi-objective optimization (EMOO) aims at detecting a set of near-optimal solutions [3]. Our approach ensures that the population may evolve differently according to locally optimal conditions in such niches.

This paper is structured as follows: in the next section we review subspace search for subspace clustering, before presenting our evolutionary subspace search approach. A section on subspace niches details our proposed multi-objective evolutionary search model.
In the experiments, we demonstrate the effectiveness of our model and show its ability to successfully detect niches that describe different local subspace optima. Finally, we conclude our paper.
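
To make the ingredients named in this introduction concrete (a population of subspaces, a fitness function, and niches that protect local optima), the following minimal Python sketch may help. It is an illustration under stated assumptions, not the model proposed in this paper. Subspaces are encoded as bit tuples over the attributes. The function `toy_fitness` and the planted subspaces in `PLANTED` are hypothetical stand-ins for a real clustering-tendency measure. Niches are approximated by classic fitness sharing over Hamming distance, a standard niching technique swapped in here for illustration. All names and parameter values are assumptions.

```python
import random

# Hypothetical "relevant" subspaces used only by the toy fitness below.
PLANTED = [frozenset({0, 1, 2}), frozenset({5, 6, 7})]


def toy_fitness(individual):
    """Stand-in fitness: best Jaccard similarity to any planted subspace."""
    chosen = {i for i, bit in enumerate(individual) if bit}
    return max(len(chosen & p) / len(chosen | p) for p in PLANTED)


def hamming(a, b):
    """Number of attributes in which two subspace bitmasks differ."""
    return sum(x != y for x, y in zip(a, b))


def shared_fitness(population, raw_fitness, niche_radius):
    """Fitness sharing: divide raw fitness by a niche count so that
    crowded regions of the search space are penalised."""
    scores = []
    for ind in population:
        # Triangular sharing function; the individual itself contributes 1.0.
        niche_count = sum(
            1.0 - hamming(ind, other) / niche_radius
            for other in population
            if hamming(ind, other) < niche_radius
        )
        scores.append(raw_fitness(ind) / niche_count)
    return scores


def evolve(raw_fitness, n_dims, pop_size=20, generations=30,
           niche_radius=3, mutation_rate=0.1, seed=0):
    """Simple generational GA over subspace bitmasks with fitness sharing."""
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(n_dims))
                  for _ in range(pop_size)]
    for _ in range(generations):
        scores = shared_fitness(population, raw_fitness, niche_radius)
        # Small offset keeps the selection weights strictly positive.
        weights = [s + 1e-9 for s in scores]
        parents = rng.choices(population, weights=weights, k=2 * pop_size)
        next_population = []
        for i in range(0, 2 * pop_size, 2):
            mother, father = parents[i], parents[i + 1]
            cut = rng.randint(1, n_dims - 1)           # one-point crossover
            child = mother[:cut] + father[cut:]
            child = tuple(bit ^ 1 if rng.random() < mutation_rate else bit
                          for bit in child)            # bit-flip mutation
            next_population.append(child)
        population = next_population
    return population


if __name__ == "__main__":
    final = evolve(toy_fitness, n_dims=8)
    for subspace in sorted(set(final), key=toy_fitness, reverse=True)[:5]:
        print(subspace, round(toy_fitness(subspace), 2))
```

Dividing by the niche count lowers the selection probability of individuals that sit in a crowded region, which is one simple way to keep several local optima populated at once instead of letting the whole population collapse onto a single subspace.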