Journal of Theoretical and Applied Information Technology
20th December 2014. Vol.70 No.2
© 2005 - 2014 JATIT & LLS. All rights reserved.
ISSN: 1992-8645 www.jatit.org E-ISSN: 1817-3195
AN ALGORITHM TO CONSTRAINTS BASED MULTI-DIMENSIONAL DATA
CLUSTERING AIDED WITH ASSOCIATIVE CLUSTERING
¹B.KRANTHI KIRAN, ²Dr. A VINAYA BABU
¹Assistant Professor, Department of Computer Science and Engineering,
JNTUHCEJ, Karimnagar, Telangana, India
²Professor, Department of Computer Science and Engineering,
JNTUniversity Hyderabad, Telangana, India
¹kranthikiran9@gmail.com, ²avb1222@jntuh.ac.in
ABSTRACT
A number of techniques have been proposed to address the problem of clustering multi-dimensional data.
This paper proposes a constraint-based multi-dimensional data clustering algorithm that, aided by
associative clustering, can determine the optimal number of clusters present in a multi-dimensional
data set. The associative constraint-based clustering process is carried out by computing the Bayes
factor. In addition, a genetic algorithm is applied in the optimization process to discover the optimal
clustering result. The constraints help to identify the right data to be clustered, and knowledge about
the data is treated as a constraint, which improves the precision of clustering. The data constraints
also help to indicate which data are relevant to the clustering task. The proposed optimal associative
clustering algorithm is compared with an existing algorithm on two multi-dimensional datasets. The
experimental results demonstrate that the proposed method achieves a better clustering solution than
the existing algorithm.
Keywords: Associative Clustering, Genetic Algorithm, Multi-dimensional Data, Bayes Factor, Contingency
Table
1. INTRODUCTION
Data mining has emerged as a promising
solution for discovering knowledge hidden in
databases. It has been defined as
“the non-trivial extraction of implicit, previously
unknown and potentially useful
information from data in databases” [1], [2]. Data
mining has been used for many purposes in both
the private and public sectors. Typical applications
of data mining include market segmentation, fraud
detection, direct marketing, interactive marketing,
market basket analysis, trend analysis and more [3,
4, 5, 7]. Advances in computing and
communication over wired and wireless networks
have resulted in many pervasive distributed
computing environments. These environments
frequently come with different distributed sources
of data and computation. Mining in such
environments naturally calls for proper utilization
of these distributed resources. Most off-the-shelf
data mining systems, on the other hand, are
designed to work as monolithic centralized
applications. They usually download the relevant
data to a centralized location and then perform the
data mining operations [1-7]. This centralized
approach does not work well in many of the
emerging distributed, ubiquitous, possibly
privacy-sensitive data mining applications.
Distributed Data Mining (DDM) offers an
alternative approach to this problem of mining data
by using distributed resources [6].
For more than forty years, clustering [16, 26]
has been studied widely in the data mining field and
across several disciplines owing to its broad
applications. Clustering is the process of partitioning
data objects into a set of disjoint groups called
clusters, so that objects in each cluster are more
similar to each other than to objects from different
clusters. The literature offers a vast number of
algorithms for efficient clustering of data.
These algorithms can be classified into
nearest-neighbor clustering, fuzzy clustering, partitional
clustering, hierarchical clustering, artificial neural
networks for clustering, statistical clustering
algorithms, density-based clustering algorithms and