Darshan Sonagara et al, International Journal of Computer Science and Mobile Computing, Vol.3 Issue.10, October- 2014, pg. 58-61
© 2014, IJCSMC All Rights Reserved
Available Online at www.ijcsmc.com
International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
ISSN 2320–088X
RESEARCH ARTICLE
Comparison of Basic Clustering Algorithms
Darshan Sonagara¹, Soham Badheka²
¹Student, G H Patel College of Engineering & Technology, Gujarat, India
²Student, Chandubhai S. Patel Institute of Technology, Charusat, Gujarat, India
¹darshan.sonagara@gmail.com; ²sohambadheka008@gmail.com
Abstract: This paper presents a theoretical study of some common document clustering techniques.
Clustering is a machine learning technique used in data mining; put simply, it groups similar data for
analysis. We compare the two main approaches to document clustering, hierarchical clustering and partitional
clustering. We survey the algorithms and list their advantages and disadvantages. For hierarchical
clustering, we discuss its two basic approaches, agglomerative and divisive. In partitional clustering,
partitioning algorithms such as K-Means generate various partitions of the data. K-Means, however, is very
different from the hierarchical algorithms. Which approach is better depends on the situation. Partitional
clustering is faster than hierarchical clustering, but it rests on stronger assumptions. In contrast, a
hierarchical algorithm needs only a similarity measure and does not require input parameters, such as the
number of clusters, to be given in advance.
Keywords: Document clustering, Clustering algorithms, K-Means algorithm, Hierarchical algorithm,
Partitional algorithm
I. INTRODUCTION
The goal of this survey is to review the two main clustering techniques in data mining. As the amount of data
on the web grows, it becomes harder to store it in a meaningful way or to extract useful information from it,
which is why document clustering is needed. This data, which may be structured or unstructured, has to be
processed and analyzed. Document clustering is a traditional data mining technique that groups related
documents and organizes them. Today it has become necessary to apply these techniques to the World Wide Web
to give users a better experience and to give business analysts better-organized data.
Generally, there are two basic clustering models. The first is the connectivity-based model, which includes
the hierarchical algorithms; the second is the centroid-based model, which includes the K-Means algorithm.
In the first section, we briefly describe the classification of clustering techniques, and then we discuss the
algorithms. Finally, we compare the algorithms and identify which is most suitable for a given situation.
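
To make the two models concrete, the following is a minimal Python sketch, not the implementation studied in
this paper. It assumes 2-D points with Euclidean distance in place of documents with a document similarity
measure. The function names kmeans and agglomerative, the choice of the first k points as initial centroids,
the fixed iteration count, and the use of single-link merging are all illustrative assumptions. The sketch
shows the contrast drawn above: the centroid-based routine must be told the number of clusters, while the
connectivity-based routine needs only a distance function.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=10):
    """Centroid-based sketch: the caller must supply k in advance.

    Assumption: the first k points seed the centroids (illustrative only).
    """
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters


def agglomerative(points, distance=dist):
    """Connectivity-based sketch (single link): needs only a distance function.

    Returns the merge history as (distance, left_cluster, right_cluster) tuples,
    which is the information a dendrogram would display.
    """
    clusters = [[p] for p in points]  # start with every point in its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: cluster distance is the closest pair of members.
                d = min(distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((d, clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return merges


# Hypothetical toy data: two well-separated groups of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(kmeans(points, k=2))
for d, left, right in agglomerative(points):
    print(round(d, 2), left, right)
```

In this sketch the hierarchical routine produces the whole sequence of merges, so a flat clustering can be
obtained afterwards by stopping at any chosen distance, whereas the K-Means routine commits to k clusters
from the start.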