Darshan Sonagara et al, International Journal of Computer Science and Mobile Computing, Vol. 3, Issue 10, October 2014, pg. 58-61
© 2014, IJCSMC All Rights Reserved
Available Online at www.ijcsmc.com
A Monthly Journal of Computer Science and Information Technology
ISSN 2320–088X

RESEARCH ARTICLE

Comparison of Basic Clustering Algorithms

Darshan Sonagara 1, Soham Badheka 2
1 Student, G H Patel College of Engineering & Technology, Gujarat, India
2 Student, Chandubhai S. Patel Institute of Technology, Charusat, Gujarat, India
1 darshan.sonagara@gmail.com; 2 sohambadheka008@gmail.com

Abstract — This paper presents the results of a theoretical study of some common document clustering techniques. Clustering is a machine learning technique used in data mining; in simple terms, it groups similar data for the purpose of analysis. We compare the two main approaches to document clustering: hierarchical clustering and partitional clustering. We survey and list the algorithms together with their advantages and disadvantages. Hierarchical clustering is discussed along with its two basic approaches, agglomerative and divisive. In partitional clustering, partitions are generated by partitioning algorithms such as K-Means, which differs considerably from the hierarchical algorithms. Each approach is preferable in different situations. Partitional clustering is faster than hierarchical clustering, but it rests on stronger assumptions. In contrast, a hierarchical algorithm needs only a similarity measure and does not require additional input, such as the number of clusters, to be given in advance.

Keywords— Document clustering, Clustering algorithms, K-means algorithm, Hierarchical algorithm, Partitional algorithm

I.
INTRODUCTION

The goal of this survey is to review the two main clustering techniques in data mining. As the amount of data on the web increases, it becomes harder to store it in a meaningful way or to extract useful information from it, which is why document clustering is needed. This large volume of data, which may be structured or unstructured, needs to be processed and analyzed. Document clustering is a traditional data mining technique that groups related documents and organizes them. Today it has become necessary to apply these techniques to the World Wide Web, both to give users a better experience and to give business analysts better-organized data.

Generally, there are two very basic clustering models. The first is the connectivity-based model, which includes the hierarchical algorithms; the second is the centroid-based model, which includes the K-Means algorithm. In the first section we briefly describe the classification of clustering techniques, and then we discuss the algorithms. Finally, we compare the algorithms and identify which is most suitable in which situation.
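
To make the distinction between the two models concrete, the following is a minimal illustrative sketch in Python. It is not taken from the paper. The function names `kmeans` and `single_link_merges`, the toy `points`, the use of Euclidean distance, the single-link merge rule, and the choice of the first k points as initial centroids are all assumptions made for the example. The centroid-based routine must be told the number of clusters, while the connectivity-based routine needs only a distance measure and returns the full merge history.

```python
import math

def kmeans(points, k, max_iters=20):
    """Centroid-based sketch: the caller must supply k in advance.
    Assumption: the first k points serve as the initial centroids."""
    centroids = [points[i] for i in range(k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

def single_link_merges(points):
    """Connectivity-based sketch (agglomerative, single link).
    Needs only a distance measure; returns every merge, bottom up."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# Hypothetical toy data: two visually separate groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

labels, centroids = kmeans(points, k=2)
merges = single_link_merges(points)
print(labels)
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 3))
```

On this toy data both routines separate points 0 to 2 from points 3 to 5: K-Means does so directly because it was given k = 2, while the hierarchical routine reveals the same split as its final, much larger merge distance.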