Hajira Jabeen et al. / International Journal of Engineering Science and Technology, Vol. 2 (2), 2010, 94-103

Review of Classification Using Genetic Programming

HAJIRA JABEEN* AND ABDUL RAUF BAIG
National University of Computer and Emerging Sciences, Islamabad, Pakistan

ABSTRACT

Genetic programming (GP) is a powerful evolutionary algorithm introduced to evolve computer programs automatically. It is a domain-independent, stochastic method with the important ability to represent programs of arbitrary size and shape. This flexibility has attracted numerous researchers in the data mining community to use GP for classification. In this paper we review and analyze tree-based GP classification methods and propose a taxonomy of these methods. We also discuss the strengths and weaknesses of the technique and provide a framework to optimize the task of GP-based classification.

Keywords: Data Classification, Genetic Programming, Survey, Taxonomy

1. INTRODUCTION

Genetic Programming (GP), introduced by Koza in 1992, is an evolutionary algorithm designed for automatically constructing and evolving computer programs. This flexible technique has been applied to a wide range of problems.

Problems such as face recognition, speech recognition, fraud detection and knowledge extraction from databases can be modeled as classification tasks. Data classification can be defined as assigning a class label to a data instance based upon knowledge gained from previously seen, class-labeled data. Various classification algorithms have been proposed, and they are chosen for their simplicity, understandability or accuracy. Simple techniques such as decision trees are easy to understand but are applicable to small data sets only. Statistical techniques and Neural Networks, on the other hand, are not easily comprehensible. Evolutionary algorithms like Genetic Algorithms (GA) (1) have been found successful in solving classification problems.
GP emerged as an extension of GA, proposed by Cramer (2) and Schmidhuber (3). GP differs from GA in its ability to evolve variable-length solutions (computer programs). Later, Koza (4) used the term GP and popularized the technique as a new evolutionary algorithm rather than an extension of GA.

GP has emerged as a powerful tool for classifier evolution. To date, many variations of GP have been introduced to handle classification, including Linear GP, Grammar-based GP, Graph-based GP and Tree-based GP (5). These variations differ in how they represent solutions. GP works by evolving a population of randomly created initial programs using a fitness measure. It selects the fitter programs to take part in the evolution so that the search moves efficiently toward the desired solution. The basic GP algorithm is similar to other evolutionary algorithms and works as follows.

Algorithm: GP Evolution
Step 1. Begin
Step 2. Define pop-size as the desired population size
Step 3. Randomly initialize a population of pop-size programs
Step 4. While (ideal best not found and maximum number of generations not reached)
    o Evaluate fitness
    o While (number of children < pop-size)
        o Select parents
        o Apply evolutionary operators to create children
    o End While
Step 5. End While
Step 6. Return best solution
Step 7. End

ISSN: 0975-5462
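The GP Evolution steps can be illustrated with a minimal Python sketch of tree-based GP on a toy two-class problem. Everything specific in it is an assumption made for illustration and is not taken from the paper: the function set, the tuple encoding of trees, the rule that an output greater than zero means class 1, tournament selection, the mutation rate, the node cap `MAX_NODES`, and the helper names (`random_tree`, `evolve`, `make_data`, and so on). The sketch also replaces the population with the children at the end of each generation, a step the listing leaves implicit.

```python
import operator
import random

FUNCTIONS = [operator.add, operator.sub, operator.mul]  # assumed function set
N_FEATURES = 2   # toy problem with two numeric features
MAX_NODES = 40   # crude size cap to limit tree growth (an assumption)

def random_tree(rng, depth):
    """Return a terminal ("x", i) or ("c", value), or a node (func, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(N_FEATURES))
        return ("c", rng.uniform(-1.0, 1.0))
    return (rng.choice(FUNCTIONS), random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def is_terminal(tree):
    return tree[0] in ("x", "c")

def evaluate(tree, row):
    """Compute the numeric output of an expression tree for one data row."""
    if tree[0] == "x":
        return row[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return tree[0](evaluate(tree[1], row), evaluate(tree[2], row))

def fitness(tree, data):
    """Classification accuracy; an output > 0 is read as class 1, else class 0."""
    hits = sum((evaluate(tree, row) > 0) == bool(label) for row, label in data)
    return hits / len(data)

def paths(tree, path=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield path
    if not is_terminal(tree):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))

def subtree_at(tree, path):
    for index in path:
        tree = tree[index]
    return tree

def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    """Subtree crossover: graft a random subtree of b onto a random point of a."""
    donor = subtree_at(b, rng.choice(list(paths(b))))
    return replace_at(a, rng.choice(list(paths(a))), donor)

def mutate(rng, tree):
    """Subtree mutation: replace a random node with a fresh random subtree."""
    return replace_at(tree, rng.choice(list(paths(tree))), random_tree(rng, 2))

def tournament(rng, scored, k=3):
    """Pick k random (fitness, tree) pairs and return the fittest tree."""
    return max(rng.sample(scored, k), key=lambda pair: pair[0])[1]

def evolve(data, pop_size=50, max_generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]    # Step 3
    best, best_fit = None, -1.0
    for _ in range(max_generations):                               # Step 4
        scored = [(fitness(t, data), t) for t in population]       # evaluate fitness
        gen_fit, gen_best = max(scored, key=lambda pair: pair[0])
        if gen_fit > best_fit:
            best, best_fit = gen_best, gen_fit
        if best_fit == 1.0:                                        # ideal best found
            break
        children = []
        while len(children) < pop_size:
            mother = tournament(rng, scored)                       # select parents
            father = tournament(rng, scored)
            child = crossover(rng, mother, father)                 # apply operators
            if rng.random() < 0.2:
                child = mutate(rng, child)
            too_big = len(list(paths(child))) > MAX_NODES
            children.append(mother if too_big else child)
        population = children        # next generation (implicit in the listing)
    return best, best_fit                                          # Step 6

def make_data(n=40, seed=1):
    """Toy data: a row (x0, x1) belongs to class 1 when x0 > x1."""
    rng = random.Random(seed)
    rows = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [(row, int(row[0] > row[1])) for row in rows]
```

Calling `evolve(make_data())` returns the best tree found and its training accuracy. The fixed `seed` makes runs repeatable, and the node cap is only one simple way to keep evolved programs from growing without bound.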