Dimension Reduction in EEG Data using Particle Swarm Optimization

Adham Atyabi, Martin Luerssen, Sean Fitzgibbon, David M. W. Powers
School of Computer Science, Engineering, and Mathematics, Flinders University, Adelaide, Australia
{Adham.Atyabi, Martin.Luerssen, Sean.Fitzgibbon, David.Powers}@flinders.edu.au

Abstract—EEG data is high-dimensional and requires considerable computational power to distinguish between different classes. Dimension reduction is commonly used to reduce the necessary training time of the classifiers, at the cost of some loss of accuracy. The dimension reduction is usually performed in either feature or electrode space. In this study, a new dimension reduction method that reduces the number of both electrodes and features using variations of Particle Swarm Optimization (PSO) is used. The variations concern parameter adjustment and the addition of a mutation operator to the PSO. The results are assessed based on the dimension reduction percentage, the potential of the selected electrodes, and the degree of performance loss. An Extreme Learning Machine (ELM) is used as the primary classifier to evaluate the sets of electrodes and features selected by PSO. Two alternative classifiers, Polynomial SVM and Perceptron, are used for further evaluation of the reduced-dimension data. The results indicate the potential of variations of PSO for reducing up to 99% of the data with minimal performance loss.

I. INTRODUCTION

Particle Swarm Optimization (PSO), introduced by Kennedy and Eberhart, is an evolutionary method inspired by animal behavior that models the social and cognitive interactions of population members, each of which represents a possible solution extracted or generated from the search space. Features such as code simplicity, fast convergence toward the optimum, a small number of parameters to adjust, and implementation flexibility are some of the main reasons for the use of PSO in a variety of problems, including EEG signal classification.
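As a rough illustration of the social and cognitive interactions described above, the following is a minimal sketch of a generic global-best PSO in Python. It is not the variant used in this study: it has no mutation operator and does not select electrodes or features. The function names (`pso_minimize`, `sphere`), the parameter values (`w`, `c1`, `c2`), the search bounds, and the toy objective are all assumptions chosen for illustration only.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Generic global-best PSO (illustrative sketch, not the paper's variant)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Each particle is a candidate solution with a position and a velocity.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Cognitive term: pull toward the particle's own best position.
                cognitive = c1 * r1 * (pbest[i][d] - pos[i][d])
                # Social term: pull toward the best position found by the swarm.
                social = c2 * r2 * (gbest[d] - pos[i][d])
                vel[i][d] = w * vel[i][d] + cognitive + social
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum at 0."""
    return sum(v * v for v in x)


best, best_val = pso_minimize(sphere, dim=3)
```

In a dimension reduction setting, the objective would instead score a candidate subset of electrodes and features, for example through the error of a classifier trained on that subset. The small set of parameters (`w`, `c1`, `c2`, swarm size) reflects the short parameter list mentioned above.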
EEG is the human brain signal recorded from the scalp that reflects the voltage variation between points over time. The signal is highly contaminated by a variety of noise sources, including equipment-related, environmental, and muscular sources, among others.

EEG based Brain Computer Interface (BCI) systems recognize the subject’s intention by matching EEG patterns to some predefined tasks. The high dimensionality of EEG recordings is a major problem for asynchronous EEG-based BCI systems due to the extensive time cost of classifier learning. This problem can be addressed by minimizing the dimension of the data to allow classifiers to learn the underlying pattern in a shorter time. Dimension reduction is a delicate process in which it is necessary to preserve the data points that maximize the classifier’s performance. In the EEG domain, this involves selecting a subset of electrodes (i.e. Electrode Reduction (ER)) that are positioned in locations that are optimal for task performance, or selecting feature points (i.e. Feature Reduction (FR)) that are important for identifying the underlying patterns in the classifier. This procedure is task and participant dependent. That is to say, the optimal set of electrodes that best captures different tasks can originate from different regions of the scalp due to the nature of the task, and these regions are not necessarily consistent among subjects.

Decomposition based methods such as Common Spatial Pattern (CSP), Principal Component Analysis (PCA), and Singular Value Decomposition (SVD) are common choices for feature and electrode reduction; however, they require the entire trial duration (all epochs), which is not available in asynchronous systems. In addition, in these methods, the reduction procedure does not take into account the suitability of the outcome for the classifier.
This means that even though the decomposed set represents a subset of the data that usually contains the highest portion of variance within the data points, there is no guarantee that this subset improves the classification performance, since the selection is not made with consideration of its interaction with the classifier’s learning rule.

Several EEG studies investigated the potential of Evolutionary Algorithm (EA) based FR and ER and reported improved classification performance (especially with ER) [17]-[28]; however, to the best of our knowledge, a study in which FR and ER are performed together does not exist. Sabeti et al. in [29] employed two separate Genetic Algorithm (GA) based methods for FR and ER in a visual potential stimuli EEG for identifying Schizophrenic subjects and reported up to 70% classification accuracy. One of the shortcomings of the study is that FR and ER are performed in two separate stages, which allows the possibility of selecting a subset of features that might not be optimal for the selected subset of electrodes. This problem can be addressed by combining both stages.

In addition to this shortcoming, the procedure implemented in most studies allows contamination between training and testing in the classifiers. This is because an EA based method assesses the potential of the members of its population based on the classification performance they achieve on the testing set, which directs the EA method toward generating a population of solutions that best represent the testing set. This problem can be addressed either by introducing a third set that is used only for a final evaluation or by having an extra internal cross-validation step in which two new sets of training

U.S. Government work not protected by U.S. copyright. WCCI 2012 IEEE World Congress on Computational Intelligence, June 10-15, 2012, Brisbane, Australia. IEEE CEC.