Improving GP Classifier Generalization Using a Cluster Separation Metric
Ashley George, Malcolm I. Heywood
Dalhousie University, Faculty of Computer Science
6050 University Avenue, Halifax, Nova Scotia, Canada, B3H 1W5
ageorge@cs.dal.ca, mheywood@cs.dal.ca

ABSTRACT
Genetic Programming offers freedom in the definition of the cost function that is unparalleled among supervised learning algorithms. However, previous work has largely left this freedom unexploited. Here, we revisit the design of fitness functions for genetic programming by explicitly considering the contributions of the wrapper and the cost function. Within the context of supervised learning, as applied to classification problems, a clustering methodology is introduced using cost functions that encourage maximizing the separation between in-class and out-of-class exemplars. Through a series of empirical investigations of the nature of these functions, we demonstrate that classifier performance is much more dependable than was previously the case under the genetic programming paradigm.

Categories and Subject Descriptors
I.2.2 [Artificial Intelligence]: Automatic Programming

General Terms
Algorithms, Experimentation, Performance

Keywords
genetic programming, clustering, classification, evaluation

1. INTRODUCTION
One of the purported advantages of Genetic Programming (GP) relative to other supervised learning algorithms is that there is much more freedom in how the fitness (cost) function is expressed. For example, neural networks typically require a cost function that is smooth and therefore differentiable [1], whereas no such requirement exists for GP [3]. To date, however, GP fitness functions do not necessarily build on this freedom in a manner designed to encourage the identification of robust solutions [2]. In this work, the design of fitness functions for classification problems is revisited by explicitly considering the contributions made by the wrapper and the cost function.
Specifically, the GP wrapper transforms the 'raw' GP output (gpout), a value limited only by the numerical range of the computing platform, to an interval appropriate for distinguishing class (y). This work considers binary classification problems, so typical ranges would be [0, 1] or [-1, 1].

Copyright is held by the author/owner(s). GECCO'06, July 8–12, 2006, Seattle, Washington, USA. ACM 1-59593-186-4/06/0007.

Table 1: Wrapper-Distance Metrics
Label  | Wrapper                                           | Error Metric
Hits   | $y = 0$ if $gpout \le 0$; $y = 1$ otherwise       | $1 - (d_i \oplus y_i)$
Square | $y = 2 \times (1 + \exp(-gpout))^{-1} - 1$        | $(d_i - y_i)^2$
(Here $d_i$ denotes the desired class label and $y_i$ the wrapper output for training exemplar $i$.)

In the case of a switching wrapper, the ensuing fitness (cost) function then merely counts the number of correctly classified training exemplars (hits). The hypothesis of this work is that such an approach to designing a wrapper-cost function combination results in an inefficient search process, adversely affecting the generalization of the resulting classifier. Instead, we suggest 'bypassing' the wrapper (i.e., making the wrapper the identity function) and expressing the problem of GP classification as finding a mapping such that exemplars for each class are mapped to different clusters on the 'raw' GP output. The objective is now to maximize the inter-class separation whilst minimizing the intra-class variance. This corresponds to maximizing the cluster separation distance [4].

2. FITNESS FUNCTIONS AND WRAPPERS
Since Koza popularized Genetic Programming [3], the wrapper for classification problems has frequently taken the form of a switching function. Such a wrapper limits the fitness function to a count of the number of correctly classified exemplars, or hits (a binary distance metric). Conversely, a smooth (and monotonically increasing) activation function provides the basis for exemplar errors that increase as the transition point of the activation function is approached, as well as penalizing exemplars that are explicitly misclassified. Moreover, as each error distance is now real-valued, we are also free to build a fitness (cost) function that penalizes or weights errors in different ways.
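The two wrapper/error combinations of Table 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. Targets are assumed to be encoded as {0, 1} for the Hits combination and {-1, +1} for the Square combination, matching the [0, 1] and [-1, 1] ranges mentioned above. Fitness is assumed to be the plain sum of per-exemplar errors over the training set.

```python
import math
from typing import Callable, Sequence

# Hypothetical helpers illustrating Table 1; names are assumptions, not from the paper.


def hits_wrapper(gpout: float) -> int:
    """Switching wrapper: y = 0 if gpout <= 0, else 1."""
    return 0 if gpout <= 0 else 1


def hits_metric(d: int, y: int) -> int:
    """1 - (d XOR y): 1 when the wrapped label matches the target, 0 otherwise."""
    return 1 - (d ^ y)


def square_wrapper(gpout: float) -> float:
    """Scaled sigmoid y = 2 / (1 + exp(-gpout)) - 1, mapping gpout into (-1, 1).

    Evaluated as tanh(gpout / 2), which is algebraically the same function
    but does not overflow when gpout is a large negative number.
    """
    return math.tanh(gpout / 2.0)


def square_metric(d: float, y: float) -> float:
    """Squared error (d - y)^2."""
    return (d - y) ** 2


def fitness(
    gp_outputs: Sequence[float],
    targets: Sequence[float],
    wrapper: Callable[[float], float],
    metric: Callable[[float, float], float],
) -> float:
    """Sum the per-exemplar metric over the training set (assumed aggregation)."""
    if len(gp_outputs) != len(targets):
        raise ValueError("gp_outputs and targets must have the same length")
    return sum(metric(d, wrapper(g)) for g, d in zip(gp_outputs, targets))


if __name__ == "__main__":
    gp_out = [2.0, -1.5, 0.3, -0.2]  # made-up raw GP outputs for illustration
    labels_01 = [1, 0, 0, 0]         # targets for the Hits combination
    labels_pm1 = [1, -1, -1, -1]     # the same targets encoded as -1/+1
    print("hits:", fitness(gp_out, labels_01, hits_wrapper, hits_metric))
    print("square error:", fitness(gp_out, labels_pm1, square_wrapper, square_metric))
```

Under this encoding, the Hits sum counts correct classifications and is therefore maximized, whereas the Square sum is an error to be minimized. For the made-up outputs above, the switching wrapper labels 0.3 as class 1 and so scores 3 hits out of 4.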
In this workwewillconsiderfitnessfunctionsbasedonasquared errorpenaltyinadditiontotheswitchingtypewrapper.Ta- ble1summarizestheassociationbetweenwrapperanderror metric.Inallcasesthefitnessfunctionismerelythesumof errortakenacrossalltrainingexemplarsforagivenwrapper /errordistancemetriccombination. 2.1 A Fitness Function based on Cluster Separation Asindicatedabove,fora’robust’classifierorgoodgener- 939