IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 39, NO. 2, FEBRUARY 2001 303

Pixel Classification Using Variable String Genetic Algorithms with Chromosome Differentiation

Sanghamitra Bandyopadhyay, Member, IEEE and Sankar K. Pal, Fellow, IEEE

Abstract—The concept of chromosome differentiation, commonly witnessed in nature as male and female sexes, is incorporated in genetic algorithms with variable length strings for designing a nonparametric classification methodology. Its significance in partitioning different landcover regions from satellite images, having complex/overlapping class boundaries, is demonstrated. The classifier is able to automatically and efficiently evolve the appropriate number of hyperplanes for modeling any kind of class boundaries optimally. Merits of the system over related ones are established through the use of several quantitative measures.

Index Terms—Genetic algorithms, hyperplane fitting, pattern recognition, quantitative indices, remote sensing images.

I. INTRODUCTION

CLASSIFICATION of pixels for partitioning different landcover regions is an important problem in the realm of satellite imagery. Satellite images usually have a large number of classes with overlapping and nonlinear class boundaries. Fig. 1 shows, as a typical example, the complexity in the scatter plot of 932 points belonging to seven classes taken from the Systeme Probatoire d’Observation de la Terre (SPOT) image of a part of the city of Calcutta. Therefore, for appropriate modeling of such nonlinear and overlapping class boundaries, the utility of an efficient search technique is evident. Moreover, it is desirable that the search technique does not need to assume any particular distribution of the data set and/or class a priori probabilities.

Genetic algorithms (GAs) [1] are randomized search and optimization techniques guided by the principles of evolution and natural genetics.
They are efficient, adaptive, and robust search processes that produce near-optimal solutions and have a large amount of implicit parallelism. The utility of GAs in solving problems that are large, multimodal, and highly complex has been demonstrated in several areas [2]. Since satellite images usually have highly nonlinear and overlapping class boundaries, application of GAs for searching for the appropriate ones, particularly under nonparametric conditions (i.e., without assuming class distributions and a priori probabilities), seems appropriate and natural.

Fig. 1. Scatter plot for a training set of SPOT image of Calcutta containing seven classes (1, ..., 7).

Manuscript received January 11, 2000; revised May 3, 2000. The authors are with the Machine Intelligence Unit, Indian Statistical Institute, Calcutta, India (e-mail: sanghami@www.isical.ac.in; sankar@www.isical.ac.in). Publisher Item Identifier S 0196-2892(01)01172-X.

In the present article, such an attempt is made by demonstrating the effectiveness of a GA-based classifier, called the variable string length GA with chromosome differentiation (VGACD)-classifier, in partitioning different landcover regions. In designing the VGACD-classifier, the concepts of variable length strings in GAs (VGAs) [3] and chromosome differentiation into two classes, male (M) and female (F), have been integrated for approximating the class boundaries of a given training data set nonparametrically by an optimum number of hyperplanes such that the number of misclassified points is minimized. Unlike in conventional GAs, the length of a string in VGACD is not fixed. Moreover, two classes of chromosomes exist in the population. The crossover operator, which implements a kind of restricted mating, and the mutation operator are defined accordingly. The fitness function rewards a string with a smaller number of misclassified samples, as well as a smaller number of hyperplanes.
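The two-part reward just described can be illustrated with a minimal sketch. The formula below is an assumption made for illustration, not a definition taken from this section: it counts correctly classified training points and subtracts a hyperplane penalty scaled to stay below 1, so that accuracy always takes priority. The function name `fitness` and the parameter `max_hyperplanes` are hypothetical.

```python
def fitness(n_points: int, n_misclassified: int,
            n_hyperplanes: int, max_hyperplanes: int) -> float:
    """Hypothetical fitness: reward fewer misclassifications first,
    then fewer hyperplanes (assumed form, for illustration only)."""
    if not 1 <= n_hyperplanes <= max_hyperplanes:
        raise ValueError("n_hyperplanes must lie in [1, max_hyperplanes]")
    if not 0 <= n_misclassified <= n_points:
        raise ValueError("n_misclassified must lie in [0, n_points]")

    # Primary term: number of correctly classified training points.
    correct = n_points - n_misclassified

    # Secondary term: penalty in (0, 1], so differences in hyperplane
    # count can never outweigh a single misclassified point.
    penalty = n_hyperplanes / max_hyperplanes

    return correct - penalty


# With equal error counts, the string using fewer hyperplanes scores higher.
few = fitness(932, 40, 5, 10)
many = fitness(932, 40, 8, 10)
print(few > many)
```

Keeping the penalty at or below 1 is one simple way to make the hyperplane count act only as a tie-breaker among strings with the same number of misclassified points; other weightings are possible.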
A comparison of the VGACD-classifier with the VGA-classifier (i.e., the one incorporating variable string lengths but without chromosome differentiation) and with classifiers based on the k-NN rule and the Bayes maximum likelihood (ML) ratio is provided for a SPOT image of a part of the city of Calcutta, in terms of several quantitative measures [4] and the visual quality of the classified images.

0196–2892/01$10.00 © 2001 IEEE

II. DESCRIPTION OF THE VGACD-CLASSIFIER

A. Principle of Hyperplane Fitting

As mentioned, the classifier attempts to place a number of hyperplanes in the feature space appropriately such that the number of misclassified training points is minimized. From elementary geometry, the equation of a hyperplane in $N$-dimensional space ($X_1 - X_2 - \cdots - X_N$) is given by

$$x_N \cos\alpha_{N-1} + \beta_{N-1} \sin\alpha_{N-1} = d \tag{1}$$

where $\beta_{N-1} = x_{N-1} \cos\alpha_{N-2} + \beta_{N-2} \sin\alpha_{N-2}$. Here, $\alpha_i$ is the angle that the projection of the unit normal in the ($X_1 - \cdots - X_{i+1}$) space makes with the $X_{i+1}$ axis. Since