Available Online at www.ijcsmc.com
International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
ISSN 2320–088X
IJCSMC, Vol. 2, Issue. 4, April 2013, pg.197 – 204
RESEARCH ARTICLE
© 2013, IJCSMC All Rights Reserved
AN ADAPTIVE PARTITIONAL CLUSTERING
METHOD FOR CATEGORICAL ATTRIBUTE
USING K-MEDOID
A. Selvakumar¹
¹Assistant Professor of Computer Science, Dept. of Computer Science, Erode, Tamil Nadu, India
¹deesel@rediffmail.com
Abstract: Partitioning a large set of objects into homogeneous clusters is a fundamental operation in data
mining. The operation is needed in a number of data mining tasks, such as unsupervised classification and
data summation, as well as segmentation of large heterogeneous data sets into smaller homogeneous subsets
that can be easily managed, separately modeled and analyzed. Clustering is a popular approach used to
implement this operation. Partitional clustering algorithms attempt to directly decompose the data set into a
set of disjoint clusters. More specifically, they attempt to determine an integer number of partitions that
optimize a certain criterion function. The criterion function may emphasize the local or global structure of
the data, and its optimization is an iterative procedure. The intention of this work is to analyze the
observation that partitional clustering algorithms perform more efficiently on numerical attributes than on
categorical attributes, and to determine which algorithm best suits matrix data. These algorithms work with
larger datasets that have many attributes. For the analysis, the Iris dataset was retrieved from the UCI data
repository and used with K-Medoid. The outcome of the algorithm is a partition into clusters, which can also
be visualized in graphical format. The cluster figures distinguish the clusters by color and mark the
centroid measure distinctly. Finally, it has been determined that K-Medoid is the better partitional algorithm.
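To make the criterion function and the iterative optimization mentioned in the abstract concrete, the following is a minimal Python sketch of one common K-Medoid variant. It alternates an assignment step and a medoid-update step, which is an assumption on our part: the text does not state whether this variant or the PAM swap procedure was used. The names `manhattan`, `mismatches`, `total_cost` and `k_medoids`, the choice of Manhattan distance, and the explicit `init_medoids` argument are all hypothetical and chosen only for illustration.

```python
def manhattan(a, b):
    """Dissimilarity for numerical attributes: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mismatches(a, b):
    """Dissimilarity for categorical attributes: number of differing fields."""
    return sum(x != y for x, y in zip(a, b))


def total_cost(points, medoids, dist):
    """Criterion function: summed distance of every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def k_medoids(points, init_medoids, dist=manhattan, max_iter=100):
    """Alternating K-Medoid sketch (illustrative, not the paper's exact method).

    points       -- list of equal-length tuples
    init_medoids -- list of k indices into points used as starting medoids
    Returns (medoids, labels); labels[i] is a position in the medoids list.
    """
    medoids = list(init_medoids)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(i)
        # Update step: in each cluster, the new medoid is the member with the
        # smallest summed distance to the other members. An empty cluster
        # (possible with duplicate points) keeps its old medoid.
        new_medoids = [
            min(members,
                key=lambda c: sum(dist(points[c], points[j]) for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break  # medoids unchanged, so the partition is stable
        medoids = new_medoids
    labels = [
        min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
        for p in points
    ]
    return medoids, labels
```

Because the medoids are always actual data objects and only pairwise dissimilarities are needed, the same routine can be applied to categorical records by passing `dist=mismatches`. The result depends on the starting medoids, so in practice several initializations would typically be compared using `total_cost`.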
I. INTRODUCTION
The amount of data kept in computer files and databases is growing at a phenomenal rate. At the same time,
the users of these data are expecting more sophisticated information from them. Simple structured query
language (SQL) queries are not adequate to support these increased demands for information. Generally, data mining
(sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and
summarizing it into useful information that can be used to increase revenue. Data mining software is one of a
number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or
angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of
finding correlations or patterns among dozens of fields in large relational databases. Although data mining is a
relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of
supermarket scanner data and analyze market research reports for years. However, continuous innovations in
computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of
analysis while driving down the cost.
1.1 METHODOLOGY OF DATA MINING
While large-scale information technology has been evolving separate transaction and analytical systems, data
mining provides the link between the two. Data mining software analyzes relationships and patterns in stored