(IJARAI) International Journal of Advanced Research in Artificial Intelligence, Vol. 2, No. 11, 2013
www.ijarai.thesai.org

Fisher Distance Based GA Clustering Taking Into Account Overlapped Space Among Probability Density Functions of Clusters in Feature Space

Kohei Arai
Graduate School of Science and Engineering
Saga University
Saga City, Japan

Abstract—A Fisher distance based Genetic Algorithm (GA) clustering method that takes into account the overlapped space among the probability density functions of clusters in feature space is proposed. Experiments with simulation data in 2D and 3D feature spaces, generated by a random number generator, show that clustering performance depends on the overlapped space among the probability density functions of the clusters. The experiments also reveal a relation between clustering performance and the GA parameters (crossover and mutation probability), as well as the number of features and the number of clusters.

Keywords—GA clustering; Fisher distance; crossover; mutation; overlapped space among probability density functions of clusters

I. INTRODUCTION

Genetic Algorithm (GA) clustering is widely used for image clustering. It achieves relatively good clustering performance with modest computing resources. In particular, Fisher distance based GA clustering, which uses the Fisher distance as the fitness function of the GA, is well known [1]. The characteristics of Fisher distance based GA clustering, however, are not clear. For instance, the relation between clustering performance and the overlapped space among the probability density functions of clusters is unclear. The relation between clustering performance and the GA parameters (crossover and mutation probability), as well as the number of features and the number of clusters, is also unclear [2].

This paper describes these characteristics through simulation studies, using simulation data derived from a random number generator with different parameters.
Also, the results from GA based clustering are compared with those from Simulated Annealing based clustering [3].

The following section describes the theoretical background of the Fisher distance based GA clustering method, followed by experimental results with simulation data. Finally, conclusions and remarks are given together with some discussion.

II. PROPOSED MODEL

A. Fisher distance based GA clustering

The Fisher distance between the probability density functions of two features is defined by equation (1) in terms of the means and variances of the two features. The most appropriate linear discrimination function for a multi-dimensional feature space is expressed by equation (2).

The discrimination function is illustrated in Fig. 1. The arrowed line (the linear discrimination border) drawn in the orthogonal coordinate system of Fig. 1 is the discrimination function between the two classes (two clusters). The probability density functions drawn along the slanted axis represent cross sections of the one-dimensional probability density functions of the two classes.

Fig. 1. Illustrative view of the discrimination function in a two-dimensional feature space for two clusters
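
As an illustration of the quantities referred to by equations (1) and (2), the following minimal sketch computes a Fisher distance and a linear discriminant direction for two clusters in a 2D feature space. It rests on assumptions that are not taken from this paper: the Fisher distance is taken in the commonly used form (squared difference of the means divided by the sum of the variances), and the discriminant direction is taken as w = (S1 + S2)^-1 (m1 - m2), where m and S are the cluster mean vectors and covariance matrices. The exact normalization of equations (1) and (2) may differ. All function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance of a list of numbers."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def fisher_distance(a, b):
    """Assumed 1D form: squared mean difference over the summed variances."""
    return (mean(a) - mean(b)) ** 2 / (variance(a) + variance(b))


def mean_and_cov_2d(points):
    """Mean vector and population covariance terms (sxx, sxy, syy) of 2D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points) / len(points)
    return (mx, my), (variance(xs), sxy, variance(ys))


def fisher_direction_2d(c1, c2):
    """Assumed discriminant direction: w = (S1 + S2)^-1 (m1 - m2)."""
    (m1, s1), (m2, s2) = mean_and_cov_2d(c1), mean_and_cov_2d(c2)
    a, b, d = s1[0] + s2[0], s1[1] + s2[1], s1[2] + s2[2]
    det = a * d - b * b  # zero when the summed covariance is singular
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # Closed-form inverse of the symmetric 2x2 matrix [[a, b], [b, d]]
    return ((d * dx - b * dy) / det, (a * dy - b * dx) / det)


def project(points, w):
    """Project 2D points onto the direction w (one scalar per point)."""
    return [x * w[0] + y * w[1] for x, y in points]


def gaussian_cluster(center, sigma, n, rng):
    """Simulated cluster: n points drawn from an isotropic 2D Gaussian."""
    return [(rng.gauss(center[0], sigma), rng.gauss(center[1], sigma))
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the simulation is repeatable
    fixed = gaussian_cluster((0.0, 0.0), 1.0, 500, rng)
    for gap in (4.0, 2.0, 1.0):
        other = gaussian_cluster((gap, gap), 1.0, 500, rng)
        w = fisher_direction_2d(fixed, other)
        print(gap, fisher_distance(project(fixed, w), project(other, w)))
```

In this sketch, moving the second cluster center toward the first increases the overlap between the two distributions, so the Fisher distance measured along the discriminant direction is expected to decrease. In a GA clustering setting, a value of this kind could serve as the fitness of a candidate cluster assignment.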