IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 17, Issue 1, Ver. V (Jan – Feb. 2015), PP 83-89 www.iosrjournals.org
DOI: 10.9790/0661-17158389

Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents

Sunita Sarkar 1, Arindam Roy 2, Bipul Syam Purkayastha 3
1,2,3 (Department of Computer Science, Assam University, India)

Abstract: This paper presents a method for document clustering that hybridizes the self organizing map (SOM), particle swarm optimization (PSO) and the k-means clustering algorithm. Document representation is an important step in clustering. The common way of representing a text is the bag of words approach. This approach is simple but has two drawbacks, viz. synonymy and polysemy, which arise from the ambiguity of words and the lack of information about the relations between them. To avoid these drawbacks, words are tagged with their WordNet senses in this paper. Sense tagging assigns each word its exact sense. Feature vectors are generated from the sense tagged documents and clustering is carried out using the proposed hybrid SOM+PSO+K-means algorithm. In the proposed algorithm, SOM is first applied to the feature vectors to produce prototypes, and the K-means clustering algorithm is then applied to cluster the prototypes. The particle swarm optimization algorithm is used to find the initial centroids for the K-means algorithm. Text documents in the Nepali language are used to test the hybrid SOM+PSO+K-means clustering algorithm.

Keywords: Computational Intelligence, Sense tagging, Self organizing map, Particle swarm optimization.

I. Introduction

Computational intelligence (CI) is a set of nature-inspired computational methodologies and approaches for addressing complex real-world problems. The paradigms that comprise CI techniques are neural networks, evolutionary computing, swarm intelligence and fuzzy systems.
Computational intelligence methods have been successfully applied to many fields such as diagnosis of diseases, speech recognition, data mining, composing music, image processing, forecasting, robot control, credit approval, classification, pattern recognition, planning game strategies, compression, combinatorial optimization, fault diagnosis, clustering, scheduling, time series approximation, control systems, gear transmission and braking systems in vehicles, controlling lifts, home appliances, controlling traffic signals, and many others [1]. This paper is concerned with the application of CI methods to document clustering.

Document clustering is the process of dividing a set of documents into subsets (called clusters) so that the documents within a cluster are similar to one another and dissimilar to the documents in other clusters. The vector space model with bag of words is the common approach for representing a text. This approach suffers from two drawbacks: several words can have the same meaning (synonymy), and the same word can have multiple meanings (polysemy). In this paper an attempt has been made to handle these issues by tagging the words with senses. Given a word and its possible senses, as defined in a knowledge base, sense tagging is the process of assigning the most appropriate sense to each word in the corpus within a given context. Here a sense can be defined as the semantic value (content) of a word when compared to other words, i.e. when it is part of a group or set of related words [2]. When words are sense tagged, the most appropriate senses are attached to the words.

In this work, feature vectors are generated from the sense tagged document corpus and clustering is done by a hybrid SOM+PSO+K-means clustering algorithm. The self organizing map [3] is an artificial neural network that has been successfully applied to document clustering. The SOM is an algorithm used to visualize and interpret large high-dimensional data sets.
It is an unsupervised learning algorithm. It produces a set of prototype vectors representing the data set and carries out a topology-preserving projection of the prototypes from the n-dimensional input space onto a low-dimensional grid.

PSO is a computational intelligence technique first introduced by Kennedy and Eberhart in 1995 [4]. It is a population-based stochastic search algorithm modeled after the social behavior of a bird flock. In the context of PSO, a swarm refers to a number of potential solutions to the optimization problem, where each potential solution is referred to as a particle. The aim of PSO is to find the particle position that results in the best evaluation of a given fitness (objective) function [5]. In the context of clustering, a single particle represents the $N_c$ cluster centroid vectors. That is, each particle $x_i$ is constructed as follows:

$$x_i = (o_{i1}, \ldots, o_{ij}, \ldots, o_{iN_c}) \qquad (1)$$
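
The particle encoding of Eq. (1) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`init_particle`, `assign`, `fitness`), the seeding of centroids from randomly chosen documents, the Euclidean distance, and the choice of fitness (mean distance of each document to its nearest centroid, lower is better) are all assumptions made for illustration.

```python
import math
import random


def init_particle(docs, n_clusters, rng):
    """Build one particle: a list of n_clusters centroid vectors, as in Eq. (1).

    Assumption: centroids are seeded from randomly chosen document vectors.
    """
    return [list(vec) for vec in rng.sample(docs, n_clusters)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(particle, docs):
    """Return, for each document, the index of its nearest centroid."""
    return [
        min(range(len(particle)), key=lambda j: euclidean(doc, particle[j]))
        for doc in docs
    ]


def fitness(particle, docs):
    """Assumed fitness: mean distance from each document to its nearest centroid."""
    total = sum(min(euclidean(doc, c) for c in particle) for doc in docs)
    return total / len(docs)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 2-dimensional "feature vectors"; real ones would come from sense tagged documents.
    docs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    # A swarm is a collection of particles, each a candidate set of N_c centroids.
    swarm = [init_particle(docs, n_clusters=2, rng=rng) for _ in range(5)]
    best = min(swarm, key=lambda p: fitness(p, docs))
    print(assign(best, docs), fitness(best, docs))
```

Under this encoding, the best particle found by the swarm directly supplies the $N_c$ centroid vectors that can serve as initial centroids for a subsequent K-means run.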