A New Signal Classification Technique by Means of Genetic Algorithms and kNN
Daniel Rivero, Enrique Fernandez-Blanco, Julian Dorado, Alejandro Pazos
Department of Information and Communications Technologies
University of A Coruña
A Coruña, Spain
{daniel.rivero, efernandez, julian, apazos}@udc.es

Abstract—Signal classification is based on the extraction of several features that are used as inputs to a classifier. The selection of these features is one of the most crucial steps, because the features define the search space and therefore determine the difficulty of the classification. Usually, these features are selected using some prior knowledge about the signals, but there is no method to determine whether they are the most appropriate ones for the problem. This paper proposes a new signal classification technique in which a Genetic Algorithm automatically selects the best feature set, in combination with a k-Nearest Neighbor (kNN) classifier. The method was applied to a well-known problem, and its results improve on those already published in other works.

Keywords-signal classification; Genetic Algorithms; k-Nearest Neighbor; feature extraction; epileptic signal classification

I. INTRODUCTION

In almost all fields of engineering and science, time-dependent events are generated and recorded. The analysis and classification of these events has therefore led to the development of new signal classification tools [1]. These tools first analyze the signals in one or several domains and then extract features that are used as inputs to a classifier system. The extracted features must be selected very carefully, because they are what characterizes the different signals. If the features are the appropriate ones, the inputs to the classifier will form two (or more) easily separable regions in the search space.
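To make the link between feature choice, separability and classifier performance concrete, here is a minimal sketch of GA-driven feature selection with a kNN classifier as the judge. It is not the authors' implementation: it assumes a precomputed feature matrix (for instance, one power value per frequency band), encodes each GA individual as a binary mask over the feature columns, and uses leave-one-out kNN accuracy as the fitness. Tournament selection, one-point crossover, bit-flip mutation and elitism are generic operators chosen for illustration, and the function names (`knn_predict`, `loo_accuracy`, `ga_select`) and all parameter values are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def project(X, mask):
    """Keep only the feature columns whose bit in the (hypothetical) mask encoding is 1."""
    return [[v for v, bit in zip(row, mask) if bit] for row in X]


def loo_accuracy(X, y, mask, k=3):
    """Fitness of a mask: leave-one-out kNN accuracy using only the selected features."""
    if not any(mask):
        return 0.0  # an empty feature set cannot separate anything
    Xs = project(X, mask)
    hits = 0
    for i in range(len(Xs)):
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, Xs[i], k) == y[i]
    return hits / len(Xs)


def ga_select(X, y, pop_size=20, generations=30, p_mut=0.1, k=3, seed=0):
    """Evolve binary feature masks with a simple GA; return (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:  # avoid re-running leave-one-out for duplicate masks
            cache[key] = loo_accuracy(X, y, mask, k)
        return cache[key]

    def tournament():
        a, b = rng.sample(pop, 2)  # binary tournament on the current population
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        elite = max(pop, key=fitness)
        children = [elite[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation: each bit toggles with probability p_mut
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)
```

A mask that keeps only informative columns yields well-separated classes and a fitness near 1.0, whereas a mask built only from uninformative columns scores close to chance. That difference is the selection pressure that pushes the GA toward a good feature set without any prior knowledge of which features matter.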
If the extracted features are not the most suitable ones, the search space may contain regions in which patterns of different classes are mixed, which makes the classification much more difficult. The problem with feature extraction is that there is usually no knowledge about which features are the most appropriate for the specific problem to be solved. Instead, general knowledge about the nature of the signals is used to select the features, without knowing whether they are the most suitable. For this reason, the selected features may not be the best ones for the specific classification task.

This paper describes a novel signal classification approach that uses a Genetic Algorithm (GA) to automatically select the features for the classification. The features are based on a frequency analysis of the signals, i.e., each one represents a different frequency band. Since they are selected automatically rather than from prior knowledge, they are expected to produce a more easily separable search space and, therefore, better classification performance. These automatically selected features are used as inputs to a k-Nearest Neighbor (kNN) algorithm, which acts as the classifier system.

This classification system was tested on a well-known problem: the classification of electroencephalogram (EEG) signals from epileptic patients and healthy subjects. The system achieved a very high accuracy, and a comparison with other works shows that the accuracies obtained here are higher than those of previous works.

II. PROBLEM DESCRIPTION

The problem to be solved consists of the classification of electroencephalogram (EEG) signals from different people, related to the disease of epilepsy.
This is one of the most common neurological disorders and is characterized by the occurrence of seizures, which are reflected in the EEG signal [2]. This signal measures the electrical activity of the brain, and its analysis is one of the most important tools for diagnosing neurological disorders. Since EEG recordings produce large amounts of data, a complete visual analysis is impractical. For this reason, there have been many efforts to develop tools that process EEG signals automatically.

The database used in this article is publicly available [3] and consists of five sets, named A-E, each containing 100 single-channel EEG signals. Each segment is 4097 samples long, with a sampling frequency of 173.61 Hz, so each one covers 4097 / 173.61 ≈ 23.6 seconds. These segments were selected and cut out from continuous multichannel EEG recordings after visual inspection for artifacts, e.g., muscle activity or eye movements. In addition, the segments had to meet a stationarity criterion. Sets A and B consist of segments taken from surface EEG recordings of five healthy volunteers, using a standardized electrode placement scheme. The volunteers were relaxed in an awake state with eyes open (set A) and eyes closed (set B), respectively. Sets C, D, and E
978-1-4244-7835-4/11/$26.00 ©2011 IEEE