International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 02 | Feb -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1784
CANCER DATA PARTITIONING WITH DATA STRUCTURE AND
DIFFICULTY INDEPENDENT CLUSTERING SCHEME
K.R.Kavitha¹, G. Angeline Prasanna²
¹Research Scholar, Department of Computer Science, Kaamadhenu Arts and Science College, Tamilnadu, India
²Head and Assistant Professor, Dept. of Computer Application & IT, Kaamadhenu Arts and Science College, Sathy,
--------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Hidden knowledge extraction is the main
operation of the data mining applications. Decision making
processes are carried out with the support of the discovered
knowledge. Relevant records are grouped by using the
clustering methods. Cancer diagnosis data values are
maintained in a high dimensional model. Microarray data
models are adapted to process the high dimensional data
values. Distance measures are estimated to identify the
record relationship levels. The cluster representative
elements are referred to as cluster ensembles. All
relationship analysis is carried out through the ensemble
analysis mechanism.
A cluster ensemble consolidates the transactions of
the individual cluster results. Distributed Computing,
Knowledge Reuse, and Quality and Robustness are the key
features of the cluster ensemble models. The ensemble
members are fetched using the Incremental Ensemble
Membership Selection (IEMS) scheme. The clustering
operations are performed with the Incremental Semi-Supervised
Cluster Ensemble (ISSCE) framework. The cancer
expressions are compared using the Similarity Functions
(SF). Data and structure dependency is inherent in the ISSCE
scheme.
The cancer data partitioning process uses breast
cancer data values. Noisy data removal and missing value
replacement operations are carried out during data
preprocessing. The Dynamic Ensemble Membership Selection
(DEMS) scheme is built to support a clustering process that is
independent of data structure and complexity. Data clustering
operations are performed through the Partition Around
Medoids (PAM) clustering technique. The PAM clustering
technique and the DEMS scheme are combined to handle the
ensemble based data partitioning process. The clustering
accuracy level is increased in the healthcare data
partitioning process.
Key Words: ISSCE (Incremental Semi-Supervised Cluster
Ensemble), IEMS (Incremental Ensemble Membership
Selection), SF (Similarity Function), DEMS (Dynamic
Ensemble Membership Selection), PAM (Partition Around
Medoids).
1. INTRODUCTION
1.1 Clustering Concepts
Clustering is the classification of objects into
different groups, or more precisely, the partitioning of a data
set into subsets, so that the data in each subset share some
common trait - often proximity according to some defined
distance measure. Data clustering is a common technique for
statistical data analysis, which is used in many fields,
including machine learning, data mining, pattern recognition,
image analysis and bioinformatics.
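The idea of grouping by proximity can be illustrated with a minimal sketch. It assumes Euclidean distance as the "defined distance measure" and a set of representatives chosen in advance; the function names and sample points are hypothetical and are not taken from the proposed scheme.

```python
import math

def euclidean(a, b):
    # Assumed distance measure: straight-line distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_to_nearest(points, representatives):
    # Partition the points into subsets, one subset per representative.
    clusters = {i: [] for i in range(len(representatives))}
    for p in points:
        nearest = min(
            range(len(representatives)),
            key=lambda i: euclidean(p, representatives[i]),
        )
        clusters[nearest].append(p)
    return clusters

# Hypothetical two-dimensional records and two hand-picked representatives.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
representatives = [(1.0, 1.0), (9.0, 9.0)]
clusters = assign_to_nearest(points, representatives)
```

Each record ends up in the subset of the representative it is closest to, so records in one subset share the common trait of proximity to the same representative.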
It is possible to guarantee that homogeneous
clusters are created by breaking apart any cluster that is
inhomogeneous into smaller clusters that are homogeneous.
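The splitting argument above can be sketched as a short recursive routine. The sketch assumes one-dimensional values and treats a cluster as homogeneous when its spread (maximum minus minimum) does not exceed a chosen tolerance; both the homogeneity test and the rule of cutting at the widest gap are illustrative assumptions, not part of the source.

```python
def is_homogeneous(values, max_spread):
    # Assumed homogeneity test: all values lie within max_spread of each other.
    return max(values) - min(values) <= max_spread

def split_until_homogeneous(values, max_spread):
    # Break a cluster apart until every resulting cluster is homogeneous.
    values = sorted(values)
    if len(values) <= 1 or is_homogeneous(values, max_spread):
        return [values]
    # Cut at the widest gap between neighbouring values.
    gaps = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return (split_until_homogeneous(values[:cut], max_spread)
            + split_until_homogeneous(values[cut:], max_spread))

# Hypothetical values and tolerance.
result = split_until_homogeneous([1, 2, 3, 20, 21, 50], max_spread=5)
```

Because every cut yields two smaller non-empty clusters and a single value is always homogeneous, the recursion terminates with only homogeneous clusters.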
Clustering is used mostly for consolidating data into a
high-level view and for general grouping of records with like
behaviours. The space is the default n-dimensional
space, or is defined by the user, or is a
predefined space driven by part.
Besides the term data clustering, there are a
number of terms with similar meanings, including
cluster analysis, automatic classification, numerical
taxonomy, botryology and typological analysis.
Clustering is called an unsupervised
learning technique. When such a technique is
run, the models are not created for a particular
purpose such as performing prediction. In
clustering, there is no particular sense of why
certain records are near each other or why they all
fall into the same cluster.
Use of Clustering in Data Mining
Clustering is often one of the first steps in data
mining analysis. It identifies groups of related records that
can be used as a starting point for exploring further
relationships. This technique supports the development of
population segmentation models, such as demographic-based
customer segmentation. A company that sells a variety
of products may need to review the sales of all of its
products to see which products sell well and which lag
behind. This is done with data mining techniques. If the
system instead clusters the products with low sales, only
that cluster needs to be checked, rather than comparing
the sales values of all the products. Clustering thus
facilitates the mining process.
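The product example can be sketched as follows, assuming a simple one-dimensional two-means grouping of sales figures. The product names and sales values are hypothetical illustrative data, and two-means is a stand-in chosen for brevity rather than the clustering technique proposed in this paper.

```python
def two_means_1d(values, iterations=20):
    # Start the two centres at the smallest and largest value.
    low, high = min(values), max(values)
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_group:  # all values identical, nothing to separate
            break
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group)
        if (new_low, new_high) == (low, high):  # centres stopped moving
            break
        low, high = new_low, new_high
    return low, high

# Hypothetical unit sales per product.
sales = {"A": 120, "B": 15, "C": 110, "D": 9, "E": 20}
low_centre, high_centre = two_means_1d(list(sales.values()))

# Only the low-sales cluster needs to be inspected further.
low_sellers = sorted(
    name for name, v in sales.items()
    if abs(v - low_centre) <= abs(v - high_centre)
)
```

Once the low-sales cluster is isolated, follow-up analysis can be restricted to its members instead of comparing every product against every other.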