(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 3, No. 7, 2012
125 | Page
www.ijacsa.thesai.org

Comparative Study between the Proposed GA Based ISODATA Clustering and the Conventional Clustering Methods

Kohei Arai
Graduate School of Science and Engineering
Saga University
Saga City, Japan

Abstract- A method of Genetic Algorithm (GA) based ISODATA clustering is proposed. GA clustering is now widely available. One problem with GA clustering is poor clustering performance caused by the assumption that clusters are represented as convex functions. The well-known ISODATA clustering has threshold parameters for merge and split. These parameters have to be determined without any such assumption (convex functions), and GA is utilized to determine them. Through comparative studies of ISODATA with and without GA-based parameter estimation, using clustering performance evaluation on well-known UCI Repository data, it is found that the proposed method is superior to the original ISODATA and also to the other conventional clustering methods.

Keywords- GA; ISODATA; Optimization; Clustering.

I. INTRODUCTION

Clustering is a method of gathering objects that resemble one another: groups are formed based on the similarity and dissimilarity between individual objects, so that a heterogeneous set of objects is classified [1]. Each resulting group is called a cluster. The criteria that measure how alike objects are include the degree of similarity and the degree of dissimilarity [2]. With a degree of similarity, such as a correlation coefficient, a larger value means that the objects are more alike; with a degree of dissimilarity, conversely, a larger value means that the objects are less alike. Of the two, the degree of dissimilarity is the more widely used. The degree of dissimilarity is also called distance.
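The contrast between the two kinds of measure can be illustrated with a short sketch. It assumes the Pearson correlation coefficient as the similarity measure and the Euclidean distance as the dissimilarity measure; the function names and the sample vectors are hypothetical and are not taken from the paper.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient: a similarity (larger = more alike)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def euclidean(x, y):
    """Euclidean distance: a dissimilarity (larger = less alike)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical objects, each described by four feature values.
a = [1.0, 2.0, 3.0, 4.0]
b = [1.1, 2.1, 2.9, 4.2]  # nearly the same as a
c = [4.0, 3.0, 2.0, 1.0]  # opposite trend to a

# a and b are alike: high similarity, small distance.
print(correlation(a, b), euclidean(a, b))
# a and c are not alike: low similarity, large distance.
print(correlation(a, c), euclidean(a, c))
```

The two measures rank the pairs in opposite directions, which is why a clustering method has to state which kind of measure it uses.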
Several definitions of distance are currently used in clustering.

Clustering methods can be divided into hierarchical and non-hierarchical clustering methods [3]. Hierarchical clustering [4] is a clustering method that searches for a structure that can be expressed as a tree diagram, or dendrogram [5], and it developed from taxonomy in biology. Hierarchical methods include the shortest distance method, the longest distance method, the median method, the center-of-gravity method, the group mean method, the Ward method, etc. [6]. Hierarchical methods have drawbacks, such as the chain effect and large computational complexity.

A non-hierarchical method starts from an initial state and rearranges the members of the clusters little by little in search of a better clustering [7],[8],[9]. The goal is a classification that is as uniform as possible within each cluster and that differs as much as possible between clusters. Typical non-hierarchical methods are the K-means method and the ISODATA method [10].

A method of Genetic Algorithm (GA) [11] based ISODATA clustering is proposed. GA clustering is now widely available. One problem with GA clustering is poor clustering performance caused by the assumption that clusters are represented as convex functions. The well-known ISODATA clustering has threshold parameters for merge and split [12],[13]. These parameters have to be determined without any such assumption (convex functions), and GA is utilized to determine them. Through comparative studies of ISODATA with and without GA-based parameter estimation, using clustering performance evaluation on well-known UCI Repository data, it is found that the proposed method is superior to the original ISODATA. ISODATA-based clustering with GA was proposed in a previous paper [14].
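The non-hierarchical idea described above, rearranging cluster members little by little from an initial state, can be sketched with the K-means method. This is a minimal illustration only: it assumes Euclidean distance, a fixed number of clusters and hand-picked initial centers, and the names `nearest` and `kmeans` are hypothetical. It is not the proposed method. ISODATA additionally merges and splits clusters according to its thresholds, which this sketch omits.

```python
import math

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))

def kmeans(points, centers, max_iter=100):
    """Reassign points and update centers until the assignment stops changing."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:
            break  # no member moved, so the clustering is stable
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Hypothetical two-dimensional data forming two well-separated groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
labels, centers = kmeans(points, centers=[(0.0, 0.0), (1.0, 1.0)])
print(labels)
print(centers)
```

Each pass makes the clusters more uniform internally by moving every point to its nearest center, then recomputes the centers from the new members.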
In this paper, a comparative study of the proposed GA based ISODATA clustering method and the conventional clustering methods is described. In the next section, the theoretical background of the widely used conventional clustering methods and of Genetic Algorithm (GA)¹ is reviewed, followed by the proposed clustering method based on ISODATA with GA. Then an experimental result with simulation data having a concave-shaped distribution is shown to demonstrate the effectiveness of the proposed method, followed by experimental results with standard machine learning datasets from the UCI repository². In particular, the clustering performance of the proposed GA based ISODATA clustering method is compared with that of the other conventional clustering methods. Finally, conclusions and some discussion are given.

¹ http://www2.tku.edu.tw/~tkjse/8-2/8-2-4.pdf
² http://archive.ics.uci.edu/ml/support/Iris

Theoretical Background