E. Corchado et al. (Eds.): HAIS 2012, Part I, LNCS 7208, pp. 255–266, 2012.
© Springer-Verlag Berlin Heidelberg 2012
A Max Metric to Evaluate a Cluster
Hosein Alizadeh¹, Hamid Parvin², Sajad Parvin², Zahra Rezaei², and Moslem Mohamadi²
¹ Islamic Azad University, Mahdi Shahr Branch, Mahdi Shahr, Iran
halizadeh@iust.ac.ir
² Islamic Azad University, Nourabad Mamasani Branch, Mamasani Nourabad, Iran
hamidparvin@mamasaniiau.ac.ir,
{s.parvin,rezaei,mohamadi}@iust.ac.ir
Abstract. In this paper a new criterion for cluster validation is proposed. This
criterion is used to approximate the goodness of a cluster. The clusters that
satisfy a threshold on the proposed measure are selected to participate in the
clustering ensemble. To combine the chosen clusters, several methods are
employed as aggregators. The ensemble obtained with the new criterion is
evaluated on some well-known standard datasets. The empirical studies show
promising results for the ensemble obtained using the proposed criterion
compared with the ensemble obtained using the standard cluster validation
criterion. Besides reaching the best results, the method provides an algorithm
for selecting the best subset of clusters from a pool of clusters.
Keywords: Clustering Ensemble, Stability Measure, Extended EAC,
Co-association Matrix, Cluster Evaluation.
1 Introduction
Data clustering, or unsupervised learning, is an important and very difficult problem.
The objective of clustering is to partition a set of unlabeled objects into homogeneous
groups or clusters [3], [4], [10]. Many applications use clustering techniques to
discover latent structures in data, such as data mining [11], information
retrieval [2], image segmentation [9], linkage learning [15], and machine learning. In
real-world problems, clusters can appear with different shapes, sizes, levels of data
sparseness, and degrees of separation. Clustering techniques require the definition
of a similarity measure between patterns. Since there is no prior knowledge about
cluster shapes, choosing a specific clustering method is not easy [16]. Studies in the
last few years have tended toward combinational methods. Cluster ensemble methods
attempt to find better and more robust clustering solutions by fusing information from
several primary data partitions [8].
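
As an illustration of how information from several primary partitions can be fused, the following minimal sketch builds a co-association matrix, the structure named in the keywords. It is only a sketch under stated assumptions and not the method proposed in this paper: the function name co_association and the toy partitions are hypothetical, and each partition is assumed to be a list of cluster labels, one label per object.

```python
def co_association(partitions):
    """Return an n x n matrix whose entry (i, j) is the fraction of
    partitions in which objects i and j fall in the same cluster.

    Assumption: every partition is a list of n cluster labels.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


# Three hypothetical primary partitions of four objects.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
matrix = co_association(partitions)
```

Label values need not agree across partitions: only whether two objects share a label within the same partition matters, so the third partition above counts as agreeing with the first.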
Fern and Lin [8] have suggested a clustering ensemble approach which selects a
subset of solutions to form a smaller but better-performing cluster ensemble than
using all primary solutions. The ensemble selection method is designed based on
quality and diversity, the two factors that have been shown to influence cluster