JOURNAL OF CHEMOMETRICS, VOL. 8, 65-79 (1994)

APPLICATION OF A GENETIC ALGORITHM TO FEATURE SELECTION UNDER FULL VALIDATION CONDITIONS AND TO OUTLIER DETECTION

RICCARDO LEARDI
Istituto di Analisi e Tecnologie Farmaceutiche ed Alimentari, Via Brigata Salerno (Ponte), I-16147 Genova, Italy

Received 9 February 1993
Accepted 28 July 1993
0886-9383/94/010065-15$12.50
© 1994 by John Wiley & Sons, Ltd.

SUMMARY

Genetic algorithms have proved to be a very efficient method for the feature selection problem. However, as with every other method, if the validation of the results is performed in an incomplete way, erroneous conclusions can be drawn. In this paper a development of a previous genetic algorithm is presented so that a full validation of the results can be obtained. Furthermore, this algorithm has been shown to perform very well also as an outlier detector, allowing easy identification of the presence of outliers even in cases where the 'classical' techniques fail.

KEY WORDS Genetic algorithms; Full validation; Feature selection; Outlier detection; Multivariate analysis

1. INTRODUCTION

1.1. Genetic algorithms

Genetic algorithms (GAs) are being used more and more as optimization techniques in situations where classical techniques such as simplex or experimental design cannot be applied. They try to simulate the evolution of living beings under the basic assumption that the generation deriving from mating of the best individuals will be better fitted to the surrounding environment than the original generation.

In GAs this is made possible by the fact that the experimental conditions under which an experiment is performed (i.e. the settings of the variables) are considered to be the genetic material of a living being and the experimental response is considered to be a measure of the fitness to the environment. For the sake of simplicity the genetic material is assumed to be a single chromosome in which each gene corresponds to a variable of the process.

Very briefly, in a GA three basic steps are involved.

1. Creation of the initial population. A certain number of experimental conditions (chromosomes) are randomly selected by drawing the values of the single variables.

2. Cross-over. From the current population a new population is obtained by performing a certain number of 'matings' between chromosomes, randomly chosen but in such a way that the best individuals (i.e. those producing the best experimental responses) have a