International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 2321-8169, Volume: 2, Issue: 11, 3643 - 3646
IJRITCC | November 2014, Available @ http://www.ijritcc.org

A Review of Clustering Algorithms for Clustering Uncertain Data

Ajit B. Patil, ME Computer Student, Department of Computer Engineering, JSCOE PUNE, Pune, India, ajitbpatil99@gmail.com
Prof. M. D. Ingle, Assistant Professor, Department of Computer Engineering, JSCOE PUNE, Pune, India, ingle.madhav@gmail.com

Abstract: Clustering is an important task in data mining. Clustering uncertain data is challenging both in modeling the similarity between uncertain objects and in developing efficient computational methods. Most previous methods extend partitioning clustering methods and density based clustering methods, which rely on the geometric distance between two objects. Such methods cannot handle uncertain objects that are indistinguishable by their geometric properties, because the distribution of the object itself is not considered. The probability distribution, an important characteristic of an uncertain object, is thus ignored when measuring the similarity between two uncertain objects. The well-known Kullback-Leibler divergence can be used to measure the similarity between two uncertain objects. The goal of this paper is to provide a detailed review of different methods for clustering uncertain data and to show the effectiveness of each algorithm.

Keywords: Clustering, Uncertain data, Probability distribution.

I. INTRODUCTION
Clustering is the process of partitioning a set of objects into a set of meaningful subclasses called clusters.
A cluster is a subset of objects such that the distance between any two objects in the cluster is less than the distance between any object in the cluster and any object outside it. Clustering is also called "unsupervised classification", i.e. there are no predefined classes. A good clustering method produces high-quality clusters in which intra-cluster similarity is high and inter-cluster similarity is low. Clustering is used in a wide range of applications, such as pattern recognition, clustering web log data to find groups of similar access patterns, and creating thematic maps in GIS by clustering feature spaces.

In many modern application areas, e.g. the clustering of moving objects or sensor databases, only uncertain data is available. Clustering uncertain data is therefore recognized as an important issue. The problem has been studied and several solutions have been proposed. Most previous algorithms for clustering uncertain data are extensions of existing clustering algorithms that were originally designed for certain data. These extensions are limited, however, because they use geometric distance to measure similarity.

II. CLUSTERING UNCERTAIN DATA
Uncertainty in data brings new challenges to clustering. Most methods for clustering uncertain data are based on a measurement of similarity between two uncertain objects. Only a few methods use a divergence to measure the similarity between two objects. In this paper we provide a survey of different clustering algorithms for uncertain data. There are three main classes of methods for clustering uncertain data, all of which are based on a "measurement of similarity".
1. Partition based clustering methods: Construct various partitions and evaluate them by using some criteria.
2. Density based clustering methods: Clustering is based on density or radius (a local cluster criterion), such as density-connected points.
3. Possible world methods: Clustering is performed on a set of possible worlds sampled from the uncertain data set.

1) Partition Based Clustering
Partition based clustering constructs various partitions and evaluates them by using some criteria. Partition based clustering algorithms use geometric distance to measure the similarity between two uncertain objects.

a) UK-means: UK-means [101] is an extension of the traditional k-means algorithm that handles uncertain data objects. In this algorithm only the center of each object is taken into account. It extends k-means by using the expected distance (ED) to measure the similarity between two uncertain data objects. UK-means must compute the expected distance between each object and each cluster center, and obtaining it is very costly because evaluating the ED function involves the object's probability density function. These density functions differ from object to object and can be arbitrary. The evaluation of the expected distance is therefore the major computational cost of the UK-means algorithm.
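
The role of the expected distance in UK-means can be illustrated with a minimal Python sketch. It is not the implementation from [101]; it rests on several assumptions made only for illustration: each uncertain object is represented by a finite list of samples drawn from its probability density function, the distance is the squared Euclidean distance, the initial cluster centers are supplied by the caller, and the names expected_sq_distance, mean_point and uk_means are hypothetical.

```python
import random


def expected_sq_distance(samples, center):
    """Estimate ED(o, c) = E[||X - c||^2] by averaging over samples of object o."""
    total = 0.0
    for x in samples:
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return total / len(samples)


def mean_point(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def uk_means(objects, centers, iterations=10):
    """Sketch of UK-means: objects are sample lists, centers are initial guesses."""
    centers = list(centers)
    labels = [0] * len(objects)
    for _ in range(iterations):
        # Assignment step: each object joins the center with the smallest ED.
        labels = [
            min(range(len(centers)),
                key=lambda j: expected_sq_distance(obj, centers[j]))
            for obj in objects
        ]
        # Update step: each center moves to the mean of its members' expected positions.
        for j in range(len(centers)):
            members = [mean_point(obj) for obj, lab in zip(objects, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean_point(members)
    return labels, centers


def make_object(rng, cx, cy, spread=0.5, n=50):
    """Hypothetical uncertain object: n Gaussian samples around (cx, cy)."""
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread)) for _ in range(n)]


rng = random.Random(42)
objects = ([make_object(rng, 0.0, 0.0) for _ in range(5)]
           + [make_object(rng, 10.0, 10.0) for _ in range(5)])
# Assumption: start from the expected positions of the first and last object.
initial = [mean_point(objects[0]), mean_point(objects[-1])]
labels, centers = uk_means(objects, initial)
```

Every assignment step evaluates the expected distance once per object and per center, and each evaluation walks over all samples of the object, which is where the cost discussed above arises. For the squared Euclidean distance specifically, the sample average equals the squared distance from the object's mean to the center plus the object's variance, so only the object's center influences the assignment; other distance functions do not generally simplify in this way.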