IJSRD - International Journal for Scientific Research & Development | Vol. 2, Issue 07, 2014 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com 463

A Novel Penalized and Compensated Constraints based Modified Fuzzy Possibilistic C-Means for Data Clustering

Duraisamy K.¹  Haridass K.²
¹Research Scholar, Department of Computer Science
²Assistant Professor & Head of Department, Department of Computer Application
¹,²NGM College, Pollachi, India

Abstract: A cluster is a group of objects that are similar to one another and dissimilar to the objects in other clusters. Similarity is typically computed from the distance between two objects or clusters. Two or more objects belong to the same cluster only if they are close to each other according to this distance. The main objective of clustering is to discover groups of comparable objects based on a similarity metric. Fuzzy Possibilistic C-Means (FPCM) is an effective algorithm for clustering unlabeled data, and it produces both membership and typicality values during the clustering process. In this work, the efficiency of FPCM clustering is enhanced by a Penalized and Compensated constraints based FPCM (PCFPCM). The proposed PCFPCM approach differs from conventional clustering techniques by imposing a possibilistic reasoning strategy on fuzzy clustering, with penalized and compensated constraints used to update the grades of membership and typicality. The performance of the proposed approach is evaluated on datasets from the University of California, Irvine (UCI) Machine Learning Repository, such as Iris, Wine, Lung Cancer and Lymphography. The evaluation measures are clustering accuracy, Mean Squared Error (MSE), execution time and convergence behavior.

Key words: Unsupervised Learning, Fuzzy C-Means, Fuzzy Possibilistic C-Means, Penalized and Compensated constraints based FPCM

I. INTRODUCTION
Clustering is the task of identifying a finite set of categories (or clusters) that describe the data, so that similar objects are grouped into the same cluster and dissimilar objects into different clusters. It is also known as unsupervised learning, because the data objects are assigned to clusters without predefined class labels. The resulting clusters can afterwards be interpreted as classes. Clustering groups data records into meaningful subclasses in a way that maximizes the similarity within clusters and minimizes the similarity between different clusters. Besides unsupervised learning, the machine learning term, clustering is also known as segmentation. Clustering is used to obtain an overview of a given data set, since a set of clusters is often enough to give insight into how the data are distributed. Another important use of clustering algorithms is as a preprocessing step for other data mining algorithms.

Fuzzy clustering methods allow objects to belong to several clusters simultaneously, with different degrees of membership. Fuzzy clustering is a powerful unsupervised method for data analysis and model construction, and in many situations it is more natural than hard clustering. Objects on the boundaries between several classes are not forced to belong fully to one class. Instead, they are assigned membership degrees between 0 and 1 that indicate their partial membership. The discrete nature of hard partitioning also causes difficulties for algorithms based on analytic functionals, since these functionals are not differentiable. The concept of a fuzzy partition is essential for cluster analysis, and consequently for identification techniques based on fuzzy clustering. Fuzzy and possibilistic partitions can be seen as generalizations of the hard partition, which is formulated in terms of classical subsets.

The remainder of this paper is organized as follows.
Section 2 summarizes the concepts and the literature survey, Section 3 discusses the proposed method, and Section 4 presents the experimental results and the accuracy achieved. Finally, Section 5 presents the conclusions of the work.

II. LITERATURE SURVEY
A.M. Fahim et al., (2006) proposed an enhanced method for assigning data points to suitable clusters. In each iteration of the original K-Means algorithm, the distance from each data element to every centroid is computed. The running time therefore depends on the number of data elements, the number of clusters and the number of iterations, which makes the algorithm computationally expensive.

Likas et al., (2003) proposed a global K-Means clustering algorithm. It is an incremental approach that adds one cluster center at a time through a deterministic global search procedure. This procedure consists of N executions of the K-Means algorithm from suitable initial positions, where N is the size of the data set.

Baolin Yi et al., (2010) proposed a new method for finding the initial centers that reduces the sensitivity of the K-Means algorithm to its initial centers.

Barakbah et al., (2009) proposed a new approach to optimizing the selection of initial centroids for K-Means clustering.

Celikyilmaz et al., (2008) proposed a new fuzzy system modeling approach based on improved fuzzy functions to model systems with a continuous output variable.

Chen Zhang et al., (2009) presented a K-Means-based clustering method that avoids the randomness of the initial centers. The approach improves the K-Means algorithm by reducing its dependence on the initial choice of the K centers.

Chunhui et al., (2008) presented a Similarity based Fuzzy and Possibilistic C-Means algorithm called SFPCM. It is derived from the original fuzzy and possibilistic C-Means (FPCM) algorithm, which was proposed by Bezdek.
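FPCM, as referred to above, alternates between updating memberships, typicalities and cluster centers. The sketch below illustrates the standard FPCM update rules, not the PCFPCM method proposed in this paper, whose penalized and compensated terms are not defined in this section. In these rules, d_ik is the distance from point x_k to center v_i:

- u_ik = 1 / Σ_j (d_ik/d_jk)^(2/(m-1)), summed over clusters j;
- t_ik = 1 / Σ_l (d_ik/d_il)^(2/(η-1)), summed over data points l;
- v_i = Σ_k (u_ik^m + t_ik^η) x_k / Σ_k (u_ik^m + t_ik^η).

The function name `fpcm`, the farthest-first initialization, the defaults m = η = 2 and the stopping tolerance are illustrative assumptions.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fpcm(data, c, m=2.0, eta=2.0, max_iter=100, tol=1e-6):
    """Minimal sketch of standard Fuzzy Possibilistic C-Means (not PCFPCM).

    data: list of points (tuples of numbers); c: number of clusters.
    Returns (centers, U, T): U[i][k] is the membership and T[i][k] the
    typicality of point k in cluster i.
    """
    n, dim = len(data), len(data[0])
    # Assumed initialization: farthest-first traversal starting at data[0].
    centers = [list(data[0])]
    while len(centers) < c:
        far = max(data, key=lambda p: min(sq_dist(p, q) for q in centers))
        centers.append(list(far))

    eps = 1e-12  # avoids division by zero when a point coincides with a center
    U, T = [], []
    for _ in range(max_iter):
        # D2[i][k]: squared distance between center i and point k, so
        # (D2 ratio) ** (1 / (m - 1)) equals (distance ratio) ** (2 / (m - 1)).
        D2 = [[sq_dist(data[k], centers[i]) + eps for k in range(n)]
              for i in range(c)]
        # Memberships: for each point k, the values over clusters sum to 1.
        U = [[1.0 / sum((D2[i][k] / D2[j][k]) ** (1.0 / (m - 1))
                        for j in range(c))
              for k in range(n)] for i in range(c)]
        # Typicalities: for each cluster i, the values over points sum to 1.
        T = [[1.0 / sum((D2[i][k] / D2[i][j]) ** (1.0 / (eta - 1))
                        for j in range(n))
              for k in range(n)] for i in range(c)]
        # Centers: weighted means with weights u^m + t^eta.
        new_centers = []
        for i in range(c):
            w = [U[i][k] ** m + T[i][k] ** eta for k in range(n)]
            total = sum(w)
            new_centers.append([sum(w[k] * data[k][d] for k in range(n)) / total
                                for d in range(dim)])
        # Stop once no center moves more than tol.
        shift = max(math.sqrt(sq_dist(a, b))
                    for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, U, T


# Usage on two hypothetical, well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centers, U, T = fpcm(points, c=2)
print([[round(x, 2) for x in v] for v in centers])
```

On this toy data the two centers should settle near the means of the two groups. Each column of U is normalized over the c clusters, but each row of T is normalized over all n points. As a result, typicality values in this formulation shrink as the data set grows.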
Fang Yuan et al., (2004) investigated the standard K-Means clustering algorithm in this work and give our