Ensemble Clustering via Random Walker Consensus Strategy
D. D. Abdala, P. Wattuya, and X. Jiang
Department of Computer Science, University of Münster, Germany
{abdalad, wattuya, xjiang}@math.uni-muenster.de
Abstract—In this paper we adapt a random walker algorithm, originally developed for combining image segmentations, to the clustering combination problem. To this end, we pre-process the ensemble of clusterings to generate its graph representation. We show experimentally that a very small neighborhood produces results similar to those obtained with larger neighborhoods, which by itself reduces the computational time needed to produce the final consensual clustering. We also present an experimental comparison of our results against other graph-based and well-known clustering combination methods in order to assess the quality of this approach.
Keywords-ensemble clustering; random walker.
I. INTRODUCTION
Clustering combination has emerged as a valid option in data clustering. It is an elegant way to deal with the problem of choosing the most suitable clustering result when little or nothing is known about the data set. It also smooths the final result when the individual clusterings present dissimilar partitionings. Finally, it can improve the final result by gathering correct evidence from all the clusterings and merging it into a final consensual result. A number of methods addressing this topic have already been published.
One of the most popular is the median partition (MP) formulation [1]. It can be formally stated as follows: given M clusterings $C_1, \dots, C_M$ over a set P of N input patterns and a symmetric distance measure $d(\cdot, \cdot)$ between clusterings, find $C^{*}$ such that:

$$C^{*} = \arg\min_{C} \sum_{i=1}^{M} d(C_i, C) \qquad (1)$$
This problem is known to be NP-complete [1], which has directed research toward heuristics that approximate it. Among the relevant works, Goder and Filkov [2] present a collection of six heuristics. Strehl and Ghosh [3] proposed three graph-based heuristics to address the combination problem. In [4] the authors explore the idea of evidence accumulation by combining the M clustering results into a co-association matrix. This matrix is then used as a new similarity measure for a standard agglomerative hierarchical clustering algorithm. Finally, Ayad and Kamel [5] proposed three new cumulative voting methods, in which the problem is formulated as finding a compressed summary of the estimated distribution that preserves maximum relevance.
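The evidence accumulation idea of [4] can be sketched as follows: a co-association matrix records, for each pair of patterns, the fraction of the M clusterings that place them in the same cluster, and a hierarchical clustering is then run on it. For brevity, this sketch assumes single linkage and cuts the dendrogram at a fixed similarity threshold of 0.5, which is equivalent to taking connected components of the pairs whose co-association exceeds the threshold; the linkage and cut criterion used in [4] may differ. Function names are hypothetical.

```python
from itertools import combinations

def co_association(ensemble):
    """C[i][j] = fraction of clusterings that put patterns i and j together."""
    m, n = len(ensemble), len(ensemble[0])
    count = [[m if i == j else 0 for j in range(n)] for i in range(n)]
    for labels in ensemble:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                count[i][j] += 1
                count[j][i] += 1
    return [[c / m for c in row] for row in count]

def single_link_cut(C, threshold=0.5):
    """Single-linkage clusters at similarity `threshold`: connected components
    of the graph linking every pair with C[i][j] > threshold (union-find)."""
    n = len(C)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if C[i][j] > threshold:
            parent[find(i)] = find(j)
    roots = {}
    # relabel components 0, 1, 2, ... in order of first appearance
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

ensemble = [[0, 0, 1, 1, 2], [0, 0, 1, 1, 1], [0, 0, 0, 1, 1]]
labels = single_link_cut(co_association(ensemble))
# labels == [0, 0, 1, 1, 1]
```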
In [6] we presented a combination approach based on a random walker algorithm to fuse multiple image segmentations. In this work we adapt this strategy to the ensemble clustering problem. The remainder of the paper is organized as follows: Section II gives the details of the random walker based image segmentation fusion strategy needed to understand the rest of this paper. Section III describes the adaptation to ensemble clustering. Section IV describes the experiments performed to evaluate the validity of the method. Finally, Section V presents some concluding remarks.
II. RANDOM WALKER ALGORITHM FOR IMAGE SEGMENTATION FUSION
To better explain the changes needed to adapt our image segmentation combination method to general clustering, we first revisit the original work. The combination process developed in [6] follows the general consensus clustering model shown in Figure 1: first, an ensemble is created during the generation step; then the consensus step produces the final consensual result. During the generation step, different clustering algorithms, different initialization parameters, or different views of the data are used to create an ensemble of clusterings with sufficient variability.
Figure 1. General Combination Clustering Process.
Once the ensemble has been gathered, the consensus step combines all its members into a final consensual result. This step can be divided into three parts: a) graph generation; b) seed region generation; and c) ensemble combination.
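Part c) relies on the random walker: each unseeded node receives the label of the seed that a random walker starting from it is most likely to reach first, with transition probabilities proportional to the edge weights. The sketch below follows Grady's standard random walker formulation, in which these probabilities are harmonic functions on the graph. It assumes the weight matrix W and the seeds (the outputs of parts a and b) are already available, and it solves the resulting linear system with plain Gauss-Seidel iterations, which is not necessarily the solver used in [6]. Names and parameters are hypothetical.

```python
def random_walker(W, seeds, iters=10_000, tol=1e-12):
    """Assign each node the seed label a random walker starting there is most
    likely to reach first. W: symmetric n x n weights with zero diagonal;
    seeds: dict node -> label. Every unseeded node must have positive degree
    and be connected by some path to at least one seed."""
    n = len(W)
    labels = sorted(set(seeds.values()))
    free = [i for i in range(n) if i not in seeds]
    prob = {}
    for s in labels:
        # Dirichlet boundary: 1 at seeds labeled s, 0 at all other seeds
        x = [0.0] * n
        for node, lab in seeds.items():
            x[node] = 1.0 if lab == s else 0.0
        # Gauss-Seidel on the harmonic equations x_i = sum_j w_ij x_j / d_i
        for _ in range(iters):
            delta = 0.0
            for i in free:
                new = sum(W[i][j] * x[j] for j in range(n)) / sum(W[i])
                delta = max(delta, abs(new - x[i]))
                x[i] = new
            if delta < tol:
                break
        prob[s] = x
    # each node takes the label with the highest arrival probability
    return [max(labels, key=lambda s: prob[s][i]) for i in range(n)]

# toy chain 0-1-2-3-4 with a weak edge between nodes 2 and 3
W = [[0.0] * 5 for _ in range(5)]
for i, j, w in [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.1), (3, 4, 1.0)]:
    W[i][j] = W[j][i] = w
result = random_walker(W, {0: "a", 4: "b"})
# result == ["a", "a", "a", "b", "b"]
```

The weak edge acts as a boundary: node 2 reaches seed "a" first with probability 11/13, while node 3 does so with probability only 1/13.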
A. Graph Generation
Our random walker algorithm requires pre-processing the data to generate a graph representation $G(V, E, W)$.
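The text above does not yet specify how V, E, and W are obtained for clustering data. As a hedged illustration only, the sketch below assumes that V is the set of input patterns, that each pattern is connected to its k most co-associated patterns (echoing the small neighborhood mentioned in the abstract), and that an edge weight grows with the fraction p of clusterings placing both endpoints in the same cluster via w = exp(-beta(1 - p)). The neighborhood rule, the weighting function, and the names build_graph, k, and beta are all assumptions, not the construction used in this paper.

```python
import math
from itertools import combinations

def build_graph(ensemble, k=1, beta=10.0):
    """Hypothetical construction: symmetric n x n weight matrix W
    (0.0 means no edge) of a k-nearest-neighbor graph over the patterns."""
    m, n = len(ensemble), len(ensemble[0])
    # count[i][j]: number of clusterings placing patterns i and j together
    count = [[0] * n for _ in range(n)]
    for labels in ensemble:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                count[i][j] += 1
                count[j][i] += 1
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # k most co-associated patterns; ties broken by the smaller index
        neighbors = sorted(others, key=lambda j: (-count[i][j], j))[:k]
        for j in neighbors:
            p = count[i][j] / m
            # keep the edge if either endpoint selects the other (symmetric)
            W[i][j] = W[j][i] = math.exp(-beta * (1.0 - p))
    return W

ensemble = [[0, 0, 1, 1, 2], [0, 0, 1, 1, 1], [0, 0, 0, 1, 1]]
W = build_graph(ensemble, k=1)
# edges: (0, 1) with weight 1.0, (2, 3) and (3, 4) with weight exp(-10/3)
```

Even with k = 1 every pattern ends up with at least one edge, which keeps the random walker well defined on every node.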
2010 International Conference on Pattern Recognition
1051-4651/10 $26.00 © 2010 IEEE
DOI 10.1109/ICPR.2010.354
1437