I.J. Information Technology and Computer Science, 2017, 3, 71-79
Published Online March 2017 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijitcs.2017.03.08
Copyright © 2017 MECS I.J. Information Technology and Computer Science, 2017, 3, 71-79
Priority Based New Approach for Correlation Clustering
Aaditya Jain
M.Tech Scholar, Department of Computer Science & Engg., R. N. Modi Engineering College,
Rajasthan Technical University, Kota, Rajasthan, India
E-mail: aadityajain58@gmail.com
Dr. Suchita Tyagi
Associate Professor, Department of Computer Science & Engg., Sushila Devi Bansal College of
Technology, Indore, MP, India
E-mail: suchitatyagi625@gmail.com
Abstract—Emerging sources of information such as social
networks, bibliographic data and protein interaction
networks have complex relations among data objects and
need to be processed differently from traditional data
analysis. Correlation clustering is one such new way of
viewing data and analyzing it to detect patterns and
clusters. Being a new field, it offers a lot of scope for
research. This paper discusses a method to solve the
problem of chromatic correlation clustering, where data
objects are nodes of a graph connected through
color-labeled edges representing the relations among
objects. The proposed heuristic performs better than
previous works.
Index Terms—Clustering Problems, Correlation
Clustering, Chromatic Balls, and Priority Based
Chromatic Balls.
I. INTRODUCTION
Clustering is an unsupervised form of machine learning
that aims to group data objects so that similar objects
fall in the same group, called a “cluster”. Traditional
clustering algorithms such as k-means [1] and fuzzy
c-means [2] use the notion of similarity or closeness
among objects to group them. Thus, they view objects as
having a binary or a fuzzy relationship between them. A
binary relationship decides which objects are similar and
should be grouped in the same cluster, using some
similarity or distance metric between them. Fuzzy
relations, on the other hand, yield a percentage of
similarity between data objects, and objects with a higher
percentage are more likely to fall in the same cluster. In
real-world problems, the relations among objects are more
complex, like those existing among people in social
networks, who have varying kinds of relationships:
family, professional, friendly, etc. Such complex
relations also exist in libraries of authored documents,
protein-protein interactions, etc.
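The contrast between binary and fuzzy relationships described above can be illustrated with a minimal sketch. It assumes one-dimensional points and fixed cluster centers, and uses the standard fuzzy c-means membership formula with fuzzifier m. The function names and the sample values are hypothetical and chosen only for illustration.

```python
def hard_assignment(point, centers):
    """Binary view (k-means style): the point belongs only to its nearest center."""
    distances = [abs(point - c) for c in centers]
    return distances.index(min(distances))


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy view (fuzzy c-means style): one membership degree per cluster.

    The degrees are non-negative and sum to 1. `m` > 1 is the fuzzifier.
    """
    distances = [abs(point - c) for c in centers]
    zeros = distances.count(0.0)
    if zeros:
        # The point coincides with one or more centers: share membership among them.
        return [1.0 / zeros if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical example: one point and two cluster centers on a line.
centers = [0.0, 4.0]
label = hard_assignment(1.0, centers)      # index of the nearest center
degrees = fuzzy_memberships(1.0, centers)  # graded membership in both clusters
```

In both views an object pair is described only by how close it is, which is why neither captures relations of different kinds, such as family versus professional ties.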
The scenarios discussed above are best described through
categorical relationships among objects, which are easily
represented through graphs. Using graphs is advocated
because:
They are flexible and powerful data structures.
They can easily range from very simple to very
complicated relationships.
They can be used to represent many kinds of
relations, whether independent or co-existing.
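As a minimal sketch of the graph representation argued for above, categorical relationships can be stored as an undirected graph whose edges carry a label (a "color"). The people, relationship types, and function names below are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical toy data: people as nodes, relationship types as edge colors.
edges = [
    ("Ann", "Bob", "family"),
    ("Ann", "Cid", "professional"),
    ("Bob", "Cid", "friend"),
    ("Cid", "Dee", "professional"),
]


def build_colored_graph(edge_list):
    """Return an adjacency map: node -> {neighbor: color} (undirected)."""
    graph = defaultdict(dict)
    for u, v, color in edge_list:
        graph[u][v] = color
        graph[v][u] = color
    return dict(graph)


def neighbors_by_color(graph, node, color):
    """Neighbors of `node` connected through an edge of the given color."""
    return sorted(n for n, c in graph[node].items() if c == color)


graph = build_colored_graph(edges)
colleagues_of_cid = neighbors_by_color(graph, "Cid", "professional")
```

A single structure of this kind holds several kinds of relations at once, which matches the flexibility claimed in the list above.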
Once a graph has been formed, the problem of analysis is
converted into the problem of partitioning the graph.
Bansal et al. defined the problem of correlation
clustering in [3]. It addresses several issues encountered
in traditional clustering algorithms, and so it is being
used in many applications such as parallel and distributed
systems, pattern recognition, and image segmentation.
Bonchi et al. [4] further extended the concept of
correlation clustering to chromatic correlation clustering
by assigning colors to edges instead of the positive or
negative sign labels used in correlation clustering.
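The two formulations can be compared by counting the "disagreements" of a given clustering. The sketch below is illustrative only and is not the algorithm of this paper. It assumes the usual objectives: in the signed setting, a pair disagrees when a positive pair is split or a negative pair is kept together; in the chromatic setting, each cluster is given one color, and a pair disagrees when it lies inside a cluster without an edge of that cluster's color, or lies across clusters but is connected by an edge. All names and the toy graph are hypothetical.

```python
from itertools import combinations


def signed_disagreements(nodes, positive_edges, cluster_of):
    """Signed setting: pairs in `positive_edges` are '+', all other pairs are '-'."""
    cost = 0
    for u, v in combinations(nodes, 2):
        similar = frozenset((u, v)) in positive_edges
        together = cluster_of[u] == cluster_of[v]
        if similar != together:  # '+' pair split, or '-' pair kept together
            cost += 1
    return cost


def chromatic_disagreements(nodes, colored_edges, cluster_of, cluster_color):
    """Chromatic setting: edges carry a color, and each cluster is assigned one color."""
    cost = 0
    for u, v in combinations(nodes, 2):
        color = colored_edges.get(frozenset((u, v)))  # None means no edge
        if cluster_of[u] == cluster_of[v]:
            # Inside a cluster, the pair should have an edge of the cluster's color.
            if color != cluster_color[cluster_of[u]]:
                cost += 1
        elif color is not None:
            # Across clusters, any existing edge is a disagreement.
            cost += 1
    return cost


# Hypothetical toy instance: clusters {a, b, c} and {d}.
nodes = ["a", "b", "c", "d"]
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1}

positive_edges = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]}
signed_cost = signed_disagreements(nodes, positive_edges, cluster_of)

colored_edges = {
    frozenset(("a", "b")): "red",
    frozenset(("a", "c")): "red",
    frozenset(("b", "c")): "blue",
    frozenset(("c", "d")): "blue",
}
cluster_color = {0: "red", 1: "blue"}
chromatic_cost = chromatic_disagreements(nodes, colored_edges, cluster_of, cluster_color)
```

On this toy instance the same partition has one disagreement in the signed setting (the split pair c, d) and two in the chromatic setting, because the blue edge between b and c does not match the red color assigned to its cluster.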
This paper contributes toward solving the chromatic
correlation clustering problem by revisiting the work of
Bonchi et al. [4, 5]. A Priority Based Chromatic Balls
algorithm is presented that increases the probability of
the algorithm finding a better solution while retaining
its advantage of speed.
The rest of the paper is organized as follows. Section II
gives a brief literature survey related to this work.
Section III describes the Chromatic Balls algorithm along
with its drawbacks, to show the problem addressed.
Section IV describes the proposed algorithm in both of its
versions. The experimental setup and comparative results
are provided in Sections V and VI. Finally, the paper
concludes in Section VII.
II. RELATED WORK
Many authors have carried out research in this direction
over the years. A detailed analysis and literature survey
on this topic is given in our previous work [6]. Some of
these works are introduced here.
Bansal et al. in 2004 [3] introduced the concept of