Ensemble Clustering via Random Walker Consensus Strategy

D. D. Abdala, P. Wattuya, and X. Jiang
Department of Computer Science, University of Münster, Germany
{abdalad, wattuya, xjiang}@math.uni-muenster.de

Abstract: In this paper we adapt a random walker algorithm for combining image segmentations so that it works with clustering problems. To achieve this, we pre-process the ensemble of clusterings to generate its graph representation. We show experimentally that a very small neighborhood produces results similar to those obtained with larger choices. This fact alone reduces the computational time needed to produce the final consensual clustering. We also present an experimental comparison of our results against other graph-based and well-known combination clustering methods in order to assess the quality of this approach.

Keywords: ensemble clustering; random walker.

I. INTRODUCTION

Clustering combination has emerged as a valid option in data clustering. It is an elegant way to deal with the problem of choosing the fittest clustering result in cases where little or nothing is known about the data set. It also works as a way to smooth the final result when different clusterings can potentially present dissimilar partitionings. Finally, it is a valid way to improve the final result by gathering correct evidence among all the clusterings and merging it into a final consensual result.

A number of methods addressing this topic have already been published. One of the most popular is the median partition (MP) formulation [1]. It can be formally stated as follows: given $M$ clusterings $C_1, \dots, C_M$ over a set $P$ of $N$ input patterns and a symmetric distance measure $d(\cdot, \cdot)$ between clusterings, find $C^*$ such that:

$$C^* = \arg\min_{C} \sum_{i=1}^{M} d(C_i, C) \qquad (1)$$

This problem is known to be NP-complete [1], which has directed research toward the development of heuristics to approximate it.
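To make the objective in Equation (1) concrete, the following minimal sketch evaluates the sum of distances for a few candidate clusterings. It rests on two assumptions that are not part of the formulation itself: the distance $d$ is taken to be the number of pattern pairs on which two clusterings disagree, and the search is restricted to the ensemble members rather than to all possible partitions. The function names are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def pair_disagreement(a, b):
    """Assumed distance d: number of pattern pairs that are grouped together
    in one clustering but separated in the other. It is symmetric and does
    not depend on how the cluster labels are named."""
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def sum_of_distances(candidate, ensemble, d=pair_disagreement):
    """Objective of Equation (1) for a single candidate clustering C."""
    return sum(d(c_i, candidate) for c_i in ensemble)


def best_ensemble_member(ensemble, d=pair_disagreement):
    """Heuristic (assumption): search only among the ensemble members
    instead of over the full space of partitions."""
    return min(ensemble, key=lambda c: sum_of_distances(c, ensemble, d))


# Three clusterings of N = 6 patterns, each stored as a label vector.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, labels permuted
]
consensus = best_ensemble_member(ensemble)
```

Restricting the candidates to the ensemble keeps the search linear in $M$, at the cost of never returning a partition that is better than every input clustering.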
Among the relevant works, Goder and Filkov [2] present a collection of six heuristics. Strehl and Ghosh [3] proposed three graph-based heuristics to address the combination problem. In [4] the authors explore the idea of evidence accumulation by combining the results of M clusterings into a co-association matrix. This matrix is later used as a new similarity measure for a standard agglomerative hierarchical clustering algorithm. Finally, Ayad and Kamel [5] proposed three new cumulative voting methods. The problem is formulated as finding a compressed summary of the estimated distribution that preserves the maximum relevance.

In [6] we presented a combination approach based on a random walker algorithm to fuse multiple image segmentations. In this work we adapt this strategy to address the ensemble clustering problem.

The remainder of the paper is organized as follows. Section 2 gives the details of the random walker based image fusion strategy that are needed to understand the rest of the paper. Section 3 describes the adaptation to ensemble clustering. Section 4 describes the experiments performed to evaluate the validity of the method. Finally, some remarks in Section 5 conclude the paper.

II. RANDOM WALKER ALGORITHM FOR IMAGE SEGMENTATION FUSION

To better explain the changes needed to adapt our image segmentation combination method to general clustering, let us revisit the original work. First, an ensemble is created during the generation step; the consensus step then produces the final consensual result. During the generation step, different clustering algorithms, different initialization parameters, or different views of the data are used to create an ensemble of clusterings with sufficient variability. The combination process developed in [6] follows the general consensus clustering model presented in Figure 1.

Figure 1. General Combination Clustering Process.
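The generation step described above can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain Lloyd's k-means written from scratch, varies the number of clusters and the random initialization to obtain variability, and stores each clustering as a label vector. None of these choices, nor the function names, are taken from [6].

```python
import random


def kmeans_labels(points, k, seed, iterations=20):
    """Plain Lloyd's k-means on tuples of floats; returns one label per point.
    The seed controls the initial centers, which is the source of variability."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def generate_ensemble(points, ks, seeds):
    """One clustering per (k, seed) pair: different parameters and
    different initializations of the same algorithm."""
    return [kmeans_labels(points, k, seed) for k in ks for seed in seeds]


# Toy data set (hypothetical): two well-separated groups in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
ensemble = generate_ensemble(points, ks=[2, 3], seeds=[0, 1, 2])
```

The resulting `ensemble` is a list of M = 6 label vectors over the same N = 6 patterns, which is the form of input a consensus step would consume.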
Once the ensemble of results is gathered, a consensus step combines them all into a final consensual result. The consensus step can be divided into three parts: a) graph generation; b) seed region generation; and c) ensemble combination.

A. Graph Generation

For our random walker algorithm we need to pre-process the data to generate a graph representation G(V, E, W).

2010 International Conference on Pattern Recognition 1051-4651/10 $26.00 © 2010 IEEE DOI 10.1109/ICPR.2010.354 1437