CLUSTER ANALYSIS – AN OVERVIEW

ANURADHA BHATIA¹ & GAURAV VASWANI²
¹ Faculty, Department of Computer, VES Polytechnic, Mumbai, Maharashtra, India
² Student, Department of Computer Technology, VESIT, Mumbai, Maharashtra, India

ABSTRACT

Clustering analysis, also called segmentation analysis or taxonomy analysis, aims to partition objects into a set of homogeneous groups, called clusters, according to given criteria. Clustering is a very important technique of knowledge discovery for human beings. It has a long history and can be traced back to the times of Aristotle. These days, cluster analysis is mainly conducted on computers to deal with very large-scale and complex datasets. With the development of computer-based techniques, clustering has been widely used in data mining, in fields ranging from web mining, image processing, machine learning, artificial intelligence, pattern recognition, social network analysis, bio-informatics, geography, geology, biology, psychology, sociology, customer behaviour analysis, and marketing to e-business and other fields.

KEYWORDS: Cluster Analysis, K-Means, Hierarchical, Genes, Microdata, Problems

INTRODUCTION

The clustering of large datasets in data mining is an iterative process involving humans. Thus, the user’s initial estimate of the number of clusters is important for choosing the parameters of clustering algorithms in the pre-processing stage of clustering. Also, the user’s clear understanding of the cluster distribution is helpful for assessing the quality of clustering results in the post-processing stage. All of these rely heavily on the user’s visual perception of the data distribution. Clearly, visualization is a crucial aspect of cluster exploration and verification in cluster analysis. Visual presentations can be very powerful in revealing trends, highlighting outliers, showing clusters, and exposing gaps in data.

Cluster analysis divides data into meaningful or useful groups (clusters).
If meaningful clusters are the goal, then the resulting clusters should capture the “natural” structure of the data. For example, cluster analysis has been used to group related documents for browsing, to find genes and proteins that have similar functionality, and to provide a grouping of spatial locations prone to earthquakes. However, in other cases, cluster analysis is only a useful starting point for other purposes, e.g., data compression or efficiently finding the nearest neighbours of points. Whether for understanding or utility, cluster analysis has long been used in a wide variety of fields: psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning, and data mining.

Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique computational requirements on relevant clustering algorithms. A variety of algorithms have recently emerged that meet these requirements and have been successfully applied to real-life data mining problems. They are the subject of this survey.

Cluster analysis, like factor analysis, makes no distinction between dependent and independent variables. The entire set of interdependent relationships is examined. Cluster analysis is the obverse of factor analysis.

International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR), ISSN 2249-6831, Vol. 3, Issue 4, Oct 2013, 143-150, © TJPRC Pvt. Ltd.

Whereas