IJSRD - International Journal for Scientific Research & Development | Vol. 4, Issue 11, 2017 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com 494

A Brief Survey on Clustering Algorithms in Data Mining

Sandip S. Kankal 1, Amol R. Dhakne 2, Yogesh R. Tayade 3
1,2,3 Assistant Professor
1,3 Department of Computer Science and Engineering
2 Department of Computer Engineering
1 Maharashtra Institute of Technology, Aurangabad, India
2 Flora Institute of Technology, Pune, India
3 Jawaharlal Nehru Engineering College, Aurangabad, India

Abstract: Data mining is the process of extracting valuable information from large data sets of different categories. Extraction transforms the information in a data set into an understandable structure for further use. Clustering is one of the most important concepts underlying data mining and data analysis applications. In clustering, data is divided into groups of similar objects. Representing data by fewer clusters necessarily loses certain fine details but achieves simplification. In modern research, clustering algorithms are vital tools for data analytics. They have been applied in a variety of fields such as neural networks, economics, image processing and biology. The most challenging problem in clustering is the unsupervised grouping of patterns. This paper provides a survey of clustering algorithms.

Key words: Data Mining, Clustering, Data Analysis, Unsupervised, Partitioning, Medoids, Supervised learning

I. INTRODUCTION

In today's world of advanced science and technology, a huge amount of data has been collected, and collection will continue. For example, patient data collected by hospitals and customer transaction data collected by banks grow day by day, on the scale of gigabytes. This large amount of data must be processed and analyzed by applying different data mining techniques.
One of the most important techniques in data mining is clustering, which groups the objects of a given data set into clusters according to the similarity between them. A cluster is a group of data objects that are similar to one another and dissimilar to the data objects in other clusters. The data is partitioned into a certain number of clusters or subsets. Most researchers define a cluster in terms of internal homogeneity and external separation [1],[2],[3], i.e. objects in the same cluster should be similar to each other, while objects in different clusters should not. Both similarity and dissimilarity should therefore be defined clearly. The aim of clustering is to divide data into meaningful and useful groups.

Clustering is a vital area of research with applications in fields such as pattern recognition, data mining, neural networks, image processing, marketing, spatial database applications, Web analysis and many others. A good clustering technique yields high intra-class similarity and low inter-class similarity; that is, the more similar the data objects within a cluster and the less similar the objects in different clusters, the better the clustering.

In data mining, clustering is further complicated by very large data sets whose attributes are of different types. This imposes unique computational requirements on clustering algorithms. A variety of clustering techniques have emerged that meet these requirements and can be successfully applied to real-life data mining problems. These clustering techniques are the subject of this survey.

II. COMPONENTS OF CLUSTERING TASK

Following are the steps in a typical clustering activity [7]:
1) Pattern representation (optionally including feature extraction and/or selection),
2) Definition of a pattern proximity measure appropriate for the data domain,
3) Clustering,
4) Data abstraction (if required), and
5) Assessment of output (if needed).

Fig. 1 depicts the procedure of cluster analysis with four basic steps.

1) Feature Selection and Extraction: As pointed out by Jain et al. [6], [7] and Bishop [8], feature selection chooses distinguishing features from a set of candidates, while feature extraction uses transformations to generate useful new features from the original ones.
2) Clustering Algorithm Design or Selection: This step is usually combined with the selection of a proximity measure and the construction of a criterion function. To select or design an appropriate clustering algorithm, it is important to examine the characteristics of the problem at hand carefully.
3) Cluster Validation: Cluster validation is the procedure that evaluates the results of cluster analysis in a quantitative and objective way. In practice, clustering algorithms are often justified for a particular application area with the help of ad hoc methods.
4) Results Interpretation: The ultimate goal of clustering is to provide users with meaningful insights from the original data so that they can effectively solve the problems encountered. Experts in the relevant fields interpret the data partition. Further analyses, and even experiments, may be required to guarantee the reliability of the extracted knowledge.

Fig. 1: Clustering Procedure: A typical cluster analysis consists of four steps with a feedback pathway [1].
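
The first three steps of this procedure can be illustrated with a minimal sketch. It is not taken from the surveyed literature: min-max scaling as the feature transformation, k-means with Euclidean distance as the clustering algorithm, the comparison of mean intra-cluster and inter-cluster distances as the validation check, and the toy data are all assumptions chosen only for illustration. The function names (min_max_scale, k_means, validate) are hypothetical.

```python
import math
import random


def min_max_scale(points):
    """Feature extraction (assumed choice): rescale each feature to [0, 1]."""
    dims = list(zip(*points))
    lows = [min(d) for d in dims]
    highs = [max(d) for d in dims]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]


def euclidean(a, b):
    """Proximity measure (assumed choice): Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, iterations=50, seed=0):
    """Clustering step: plain k-means, stopping when centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in points]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def validate(points, labels):
    """Validation step: mean pairwise distance within and between clusters."""
    intra, inter = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = euclidean(points[i], points[j])
            (intra if labels[i] == labels[j] else inter).append(d)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(intra), mean(inter)


# Toy data (invented): two features on very different scales.
raw = [(1.0, 200.0), (1.2, 220.0), (0.8, 210.0),
       (5.0, 900.0), (5.2, 880.0), (4.9, 910.0)]

scaled = min_max_scale(raw)             # step 1: feature extraction
labels, centroids = k_means(scaled, 2)  # step 2: clustering
intra, inter = validate(scaled, labels) # step 3: validation
print("labels:", labels)
print("mean intra-cluster distance:", round(intra, 3))
print("mean inter-cluster distance:", round(inter, 3))
```

A mean intra-cluster distance that is much smaller than the mean inter-cluster distance corresponds to the internal homogeneity and external separation described in the Introduction. The fourth step, results interpretation, is left to a domain expert and is not modelled here.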