International Journal of Computer Applications (0975 – 8887) Volume 5– No.2, August 2010

Comparative Analysis of FCM and HCM Algorithm on Iris Data Set

Pawan Kumar, Deptt. of IT, M.M. University, Mullana
Deepika Sirohi, Deptt. of IT, M.M. University, Mullana

ABSTRACT
Clustering is a primary data description method in data mining that groups the most similar data. Data clustering is an important problem in a wide variety of fields, including data mining, pattern recognition, and bioinformatics, and various algorithms are used to solve it. This paper presents a comparative performance analysis of the Fuzzy C Mean (FCM) clustering algorithm and the Hard C Mean (HCM) algorithm on the Iris flower data set. We measure the time complexity and space complexity of FCM and HCM on the Iris data set [1]. FCM clustering [2, 3] differs from Hard C Mean, which employs hard partitioning: FCM employs fuzzy partitioning, so that a point can belong to all groups with different membership grades between 0 and 1.

Keywords: Data Mining, Fuzzy C Mean, Hard C Mean

1. Introduction:
A clustering algorithm partitions an unlabeled set of data into groups according to their similarity. Compared with data classification, data clustering is an unsupervised learning process: it does not need a labeled data set as training data, but the performance of data clustering algorithms is often much poorer. Although data classification performs better, it needs a labeled data set as training data, and labeled data for classification is often very difficult and expensive to obtain. Many algorithms have therefore been proposed to improve clustering performance. In this paper, we implement the FCM and HCM clustering algorithms in MATLAB. We first implement these algorithms and then compare their time and space complexity.
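The contrast between hard and fuzzy partitioning drawn in the abstract can be made concrete for a single data point. The sketch below is written in Python rather than the MATLAB used in this paper, and it is only an illustration: the cluster centers and the data point are hypothetical, and the fuzzy grades use the standard FCM membership formula u_i = 1 / Σ_k (d_i / d_k)^(2/(m-1)) with an assumed fuzzifier m = 2.

```python
import math


def hard_membership(x, centers):
    """Crisp (HCM-style) grades: 1 for the nearest center, 0 for all others."""
    dists = [math.dist(x, v) for v in centers]
    nearest = dists.index(min(dists))  # ties go to the first nearest center
    return [1.0 if i == nearest else 0.0 for i in range(len(centers))]


def fuzzy_membership(x, centers, m=2.0):
    """Standard FCM grades in [0, 1] that sum to 1; m > 1 is the fuzzifier (assumed 2)."""
    dists = [math.dist(x, v) for v in centers]
    # A point lying exactly on a center belongs fully to that center.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]


centers = [(0.0, 0.0), (4.0, 0.0)]  # hypothetical cluster centers
x = (1.0, 0.0)                      # hypothetical data point

print(hard_membership(x, centers))   # [1.0, 0.0]
print(fuzzy_membership(x, centers))  # grades close to 0.9 and 0.1
```

Under hard partitioning the point is assigned entirely to the nearer center, whereas the fuzzy grades keep a small membership in the farther cluster while still summing to one.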
Clustering is used to combine observed objects into clusters (groups) that satisfy two main criteria:
- Each group or cluster should be homogeneous: objects that belong to the same group are similar to each other.
- Each group or cluster should be different from the other clusters: objects that belong to one cluster should be different from the objects of other clusters.

Clustering can be considered the most important unsupervised learning problem; like every other problem of this kind, it deals with finding a structure in a collection of unlabeled data. A loose definition of clustering is the process of organizing objects into groups whose members are similar in some way. A cluster is therefore a collection of objects that are "similar" to one another and "dissimilar" to the objects belonging to other clusters. There are many clustering methods [11] available, and each of them may give a different grouping of a dataset. The choice of a particular method depends on the type of output desired, the known performance of the method on particular types of data, the available hardware and software facilities, and the size of the dataset.

Apart from Section 1, this paper is organized as follows: Section 2 introduces the FCM and HCM clustering algorithms in detail, Section 3 illustrates some implementation results, and Section 4 concludes the paper.

2. Hard C Mean and Fuzzy C Mean algorithm:
In this section we describe the Hard C Mean and Fuzzy C Mean algorithms.

2.1 Hard C Mean clustering algorithm
In non-fuzzy or hard clustering, data is divided into crisp clusters, where each data point belongs to exactly one cluster. The main properties of HCM are:
- It is used to classify data into crisp sets.
- Each data point is assigned to only one cluster.
- Clusters are also known as partitions.
- The partition matrix U has c rows (clusters) and n columns (data points).
- The cardinality of the hard partition space gives the number of unique c-partitions of n data points.
- Partial membership is not allowed.
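The properties listed above can be checked mechanically. The following Python sketch (an illustration, not the paper's MATLAB code) tests whether a c × n matrix U is a valid hard partition and computes the cardinality of the hard c-partition space using the standard formula (1/c!) Σ_{i=1}^{c} C(c, i) (-1)^(c-i) i^n, which equals the Stirling number of the second kind S(n, c). The function names are hypothetical, and the requirement that no cluster be empty is an assumption taken from the usual definition of a hard c-partition rather than from this paper.

```python
from math import comb, factorial


def is_hard_partition(U):
    """Check that U (c rows = clusters, n columns = data points) is a hard partition."""
    c, n = len(U), len(U[0])
    # Crisp grades only: partial membership is not allowed.
    if any(u not in (0, 1) for row in U for u in row):
        return False
    # Each data point (column) belongs to exactly one cluster.
    if any(sum(U[i][k] for i in range(c)) != 1 for k in range(n)):
        return False
    # Assumed (usual definition): no cluster is left empty.
    return all(sum(row) > 0 for row in U)


def hard_partition_count(n, c):
    """Number of distinct hard c-partitions of n data points, i.e. S(n, c)."""
    total = sum(comb(c, i) * (-1) ** (c - i) * i ** n for i in range(1, c + 1))
    return total // factorial(c)  # total is always an exact multiple of c!


U = [[1, 0, 1],
     [0, 1, 0]]  # hypothetical: 2 clusters, 3 data points
print(is_hard_partition(U))        # True
print(hard_partition_count(3, 2))  # 3
```

For three points and two clusters the three partitions are {1,2}{3}, {1,3}{2} and {2,3}{1}; the count grows very quickly with n, which is why hard clustering algorithms search this space iteratively rather than exhaustively.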
HCM is used to classify data in a crisp sense: each data point is assigned to one and only one cluster at a time. In this sense, the clusters are also called partitions of the data. In other words, the sum of the membership grades of each data point over all clusters is equal to one, and in HCM the membership grade of a data point is one in a specific cluster and zero in all the remaining clusters. Also, the number of clusters c cannot be less than or equal to one, and it cannot be equal to or greater than the number of data elements n, that is, 2 ≤ c < n. If the number of clusters were equal to one, all data elements would lie in the same cluster; if it were equal to the number of data elements, each data element would lie in its own separate cluster, so that in this special case every cluster contains only one data point. The steps of the HCM algorithm are given below.