A New Signal Classification Technique by Means of
Genetic Algorithms and kNN
Daniel Rivero, Enrique Fernandez-Blanco, Julian Dorado, Alejandro Pazos
Department of Information and Communications Technologies
University of A Coruña
A Coruña, Spain
{daniel.rivero, efernandez, julian, apazos}@udc.es
Abstract—Signal classification is based on the extraction of
several features that will be used as inputs of a classifier. The
selection of these features is one of the most crucial steps, because
they define the search space and therefore determine
the difficulty of the classification. Usually, these features are
selected by using some prior knowledge about the signals, but
no method can determine whether they are the most
appropriate to solve the problem. This paper proposes a new
technique for signal classification in which a Genetic Algorithm is
used in order to automatically select the best feature set for signal
classification, in combination with a kNN as classifier system.
This method was applied to a well-known problem, and its results
improve on those already published in other works.
Keywords-signal classification; Genetic Algorithms; k-Nearest
Neighbor; feature extraction; epileptic signal classification
I. INTRODUCTION
In almost all fields of engineering and science, time-dependent
events are generated and recorded. The need to analyze
and classify those events has led to the
development of new signal classification tools [1].
These signal classification tools first analyze the signal
in one or several domains and extract some
features that are used as inputs to a classifier system. The
extracted features have to be selected very carefully, because
they characterize the different signals. If
these features are appropriate, the patterns presented to the
classifier system will fall into two (or more) easily separable regions of the
search space. If the extracted features are not suitable,
the search space may contain regions in which patterns
belonging to different classes are mixed, and the
classification becomes much more complicated.
The problem with feature extraction is that usually there
is no knowledge about which features are the most appropriate
for the specific problem to be solved. Instead, general
knowledge about the nature of the signals is used to select the
features, but without knowing if these features are the most
suitable. For this reason, the selected features may not be the
best ones for the specific classification task.
This paper describes a novel signal classification approach
that uses a Genetic Algorithm (GA) to automatically select the
features for the classification. The features are extracted from
a frequency analysis of the signals, i.e., they represent different
frequency bands. Since they are not based on prior knowledge but
are selected automatically, they are
expected to generate a more easily separable search space and
therefore better classification performance. These
automatically selected features are used as inputs to a k-Nearest
Neighbor (kNN) algorithm, which acts as the classifier
system.
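The pipeline described above (frequency-band features fed to a kNN classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sine-wave signals, the band limits, and the use of band energy as the feature are all assumptions made for illustration. The GA search is not shown; the bands are fixed by hand here, whereas in the proposed method a GA would choose them.

```python
import cmath
import math
from collections import Counter


def band_energies(signal, fs, bands):
    """Spectral energy of `signal` in each (low_hz, high_hz) band, via a naive DFT."""
    n = len(signal)
    # One-sided power spectrum; bin k corresponds to frequency k * fs / n.
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        power.append(abs(coeff) ** 2)
    # Sum the power of all bins whose frequency lies in [low, high).
    return [sum(p for k, p in enumerate(power) if low <= k * fs / n < high)
            for low, high in bands]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to `query` (Euclidean)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


FS = 64.0  # hypothetical sampling frequency of the toy signals, in Hz


def sine(freq_hz, n=64):
    """Toy signal: a pure sine wave sampled at FS."""
    return [math.sin(2 * math.pi * freq_hz * t / FS) for t in range(n)]


# Hypothetical, hand-picked bands; a GA would search for these instead.
bands = [(0, 8), (8, 16), (16, 32)]

train_signals = [sine(3), sine(5), sine(20), sine(24)]
train_labels = ["slow", "slow", "fast", "fast"]
train_x = [band_energies(s, FS, bands) for s in train_signals]

query_features = band_energies(sine(4), FS, bands)
predicted = knn_predict(train_x, train_labels, query_features, k=3)
print(predicted)
```

With well-chosen bands, signals of the same class produce nearly identical feature vectors, which is the "easily separable search space" the text refers to. Poorly chosen bands (for example a single band covering the whole spectrum) would make the two toy classes indistinguishable.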
This classification system was tested with a well-known
problem: classification of Electroencephalogram (EEG) signals
from epileptic and healthy patients. The results show that this
system performs well, with very high accuracy.
A comparison with other works shows that the accuracies obtained here
are higher than those previously published.
II. PROBLEM DESCRIPTION
The problem to be solved consists of the classification of
electroencephalogram (EEG) signals from different people
with respect to epilepsy. This is one of the most
common neurological disorders and is characterized by the
occurrence of seizures in the EEG signal [2]. This signal
measures the electrical activity of the brain, and its analysis is
one of the most important tools for the diagnosis of
neurological disorders. Because EEG recordings
produce a large amount of data, a complete visual analysis is
impractical. For this reason, there have been many
efforts to develop tools that process EEG signals
automatically.
The database used in this article is publicly available [3],
and consists of five sets, named A-E, each containing 100
single-channel EEG signals. Each segment is
4097 samples long, with a sampling frequency of 173.61 Hz.
Therefore, each signal spans about 23.6 seconds.
These segments were selected and cut out from continuous
multichannel EEG recordings after a visual inspection for
artifacts, e.g., muscle activity or eye movements. In addition,
the segments had to meet a criterion of stationarity.
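The segment duration quoted above follows directly from the sample count and the sampling frequency. The short calculation below reproduces it and also derives two quantities that the text does not state but that follow from the same figures: the Nyquist frequency and the DFT bin spacing for one full segment. The variable names are illustrative only.

```python
n_samples = 4097   # samples per segment (from the text)
fs_hz = 173.61     # sampling frequency in Hz (from the text)

duration_s = n_samples / fs_hz          # length of one segment in seconds
nyquist_hz = fs_hz / 2                  # highest frequency the samples can represent
freq_resolution_hz = fs_hz / n_samples  # spacing between DFT bins over a full segment
total_segments = 5 * 100                # five sets (A-E) of 100 segments each

print(f"duration: {duration_s:.1f} s")
print(f"Nyquist frequency: {nyquist_hz:.3f} Hz")
print(f"DFT bin spacing: {freq_resolution_hz:.4f} Hz")
print(f"segments in the database: {total_segments}")
```

Any frequency band used as a feature therefore has to lie below the Nyquist frequency, and bands narrower than the bin spacing cannot be resolved from a single segment.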
Sets A and B consisted of segments taken from surface
EEG recordings that were carried out on five healthy
volunteers using a standardized electrode placement scheme.
Volunteers were relaxed in an awake state with eyes open (set
A) and eyes closed (set B), respectively. Sets C, D, and E
581 978-1-4244-7835-4/11/$26.00 ©2011 IEEE