Adaptive Schemes Applied to Online SVM for BCI Data Classification

Mohammadreza Asghari Oskoei, John Q. Gan, and Huosheng Hu

Abstract— This paper evaluates supervised and unsupervised adaptive schemes applied to an online support vector machine (SVM) that classifies BCI data. The online SVM processes fresh samples as they come in and updates the existing support vectors without referring to previous samples. It is shown that the performance of the online SVM is similar to that of the standard SVM, and that both supervised and unsupervised schemes improve the classification hit rate.

I. INTRODUCTION

Electroencephalogram (EEG) is an electrical signal collected from the scalp that represents brain activities. A pattern-recognition-based brain-computer interface (BCI) discriminates EEG patterns and produces pre-defined commands corresponding to those patterns, enabling individuals to communicate with a computer. Due to changes in the subject's brain condition or the environment, EEG signals are non-stationary. This phenomenon necessitates adaptive schemes that modify the BCI classification parameters at run-time [1]. In this regard, various supervised and unsupervised adaptive schemes have been applied to BCI systems with LDA [2][3] or GMM [4] classifiers, including Kalman-filter-based methods for online adaptation [5][6].

A classifier is the core of a pattern-recognition-based BCI, and online training, which modifies the classification criteria to cope with changes in signal patterns, is one way to build an adaptive BCI. For certain classifiers, such as the support vector machine (SVM), online training involves two crucial issues: updating the online Training Data Set (TDS) with valid samples, and applying training during BCI operation. Updating the TDS inserts fresh samples using supervised or unsupervised methods, which demands repeating the training process at run-time. Using the whole TDS for run-time training is computationally expensive and cannot satisfy real-time constraints.
Hence, an online algorithm that uses only fresh samples in the training process while retaining the previously trained patterns is required.

This paper presents supervised and unsupervised adaptive schemes built around an online SVM for a BCI system, and compares them with a non-adaptive scheme using both synthetic data and real BCI data. The rest of the paper is organized as follows. Section II introduces the online SVM. The adaptive schemes are presented in Section III. Section IV describes the experiments conducted to examine the performance of the adaptive schemes. Finally, Section V contains the conclusion.

M. Asghari Oskoei, PhD Candidate, School of CS and EE, University of Essex, UK, URL: http://privatewww.essex.ac.uk/~masgha/ , Email: masgha@essex.ac.uk
J. Q. Gan, Reader, School of CS and EE, University of Essex, Colchester CO4 3SQ, UK, URL: http://dces.essex.ac.uk/staff/jqgan/ , Email: jqgan@essex.ac.uk
H. Hu, Professor, School of CS and EE, University of Essex, Colchester CO4 3SQ, UK, URL: http://cswww.essex.ac.uk/staff/hhu/ , Email: hhu@essex.ac.uk

II. ONLINE SVM

SVM is a kernel-based approach with a strong theoretical foundation that has become a popular tool for machine learning tasks involving classification and regression. It has been successfully applied to many applications, ranging from face identification and text categorization to bioinformatics and database mining. SVM was developed in three stages. First, it was introduced to construct a linear optimal hyperplane with the widest margin between two classes. Then, it was extended to an optimal hyperplane in a feature space induced by a kernel function, covering nonlinear boundaries between classes. Finally, it was equipped to handle noisy data by allowing some samples to violate the margin between classes [7].
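A common approximation of such online training (illustrative only; not necessarily the exact algorithm used in this paper) is to retrain using only the retained support vectors plus each fresh batch, so old patterns survive through the support vectors rather than the whole TDS. The toy data generator and parameters below are invented for the sketch; the comment marks where an unsupervised scheme (predicted labels) would differ from a supervised one (true labels):

```python
# Sketch of online-style SVM training: keep only the current support
# vectors and retrain on them plus the fresh batch, instead of the
# whole TDS. Illustrative only; not the paper's exact algorithm.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def batch(n):
    """Toy two-class data: class +1 centred at (2, 2), class -1 at (0, 0)."""
    X = rng.normal(size=(2 * n, 2))
    X[:n] += 2.0
    y = np.array([1] * n + [-1] * n)
    return X, y

X, y = batch(40)
svm = SVC(kernel="rbf", C=1.0).fit(X, y)

for _ in range(3):
    X_new, y_new = batch(10)          # supervised scheme: true labels
    # An unsupervised scheme would label the batch itself:
    #   y_new = svm.predict(X_new)
    sv = svm.support_                 # indices of retained support vectors
    X = np.vstack([X[sv], X_new])     # old patterns survive only via SVs
    y = np.concatenate([y[sv], y_new])
    svm = SVC(kernel="rbf", C=1.0).fit(X, y)

acc = svm.score(*batch(100))          # accuracy on a fresh test batch
```

Because each refit sees only the support vectors and the fresh batch, the training set stays small, which is the property the online setting needs.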
For a two-class data set (x_1, y_1), …, (x_n, y_n), with x_i ∈ R^d and y_i ∈ {±1}, separating hyperplanes between the two classes in a feature space mapped by φ(x) are defined as:

w · φ(x) + b = 0,  w ∈ R^d, b ∈ R  (1)

The unique hyperplane that yields the maximum margin of separation between the two classes, while tolerating misplaced samples at distance ξ_i ≥ 0, is constructed by solving the following quadratic programming (QP) problem:

min_{w,b} (1/2)‖w‖² + C ∑_{i=1}^{n} ξ_i
s.t. y_i (w · φ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  ∀i  (2)

The constant C ∈ [0, ∞) is an upper bound on the weight of samples that lie on the wrong side of the hyperplane. It acts as a controlling parameter that avoids overfitting by trading off the capacity of the classifier against the error on the TDS. Given the kernel K(x_i, x_j) = φ(x_i) · φ(x_j) and the weights w = ∑_i α_i φ(x_i), one way to solve (2) is via its Lagrangian dual, which has been simplified to finding the multipliers α_i:

max_α J(α) = ∑_i y_i α_i − (1/2) ∑_{i,j} α_i α_j K(x_i, x_j)
s.t. ∑_i α_i = 0,  A_i ≤ α_i ≤ B_i,
where A_i = min(0, C y_i), B_i = max(0, C y_i)  (3)

The objective function in (3) deviates slightly from the standard formulation in that the coefficients α_i are positive when y_i = +1 and negative when y_i = −1. Solving (3) helps to construct the optimal hyperplane (1) and build the classifier.
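As an illustrative numerical check (not from the paper), the dual (3) can be solved on a small linearly separable set; under the box constraints A_i ≤ α_i ≤ B_i and ∑_i α_i = 0, the optimal multipliers take the sign of their labels, as stated above. The toy points, the SLSQP solver choice, and the bias-recovery step are all assumptions of this sketch:

```python
# Numerical check (illustrative) of dual (3) with a linear kernel on a
# toy separable set: maximising J(alpha) under sum(alpha) = 0 and
# A_i <= alpha_i <= B_i should give alpha_i >= 0 where y_i = +1 and
# alpha_i <= 0 where y_i = -1, as noted in the text.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [2.5, 1.5], [3.0, 2.5],        # y = +1 cluster
              [-2.0, -2.0], [-1.5, -2.5], [-2.5, -1.0]])  # y = -1 cluster
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
C = 1.0
K = X @ X.T                                    # linear kernel matrix K_ij

def neg_J(a):                                  # minimise -J(alpha) == maximise J
    return -(y @ a - 0.5 * a @ K @ a)

bounds = [(min(0.0, C * yi), max(0.0, C * yi)) for yi in y]  # [A_i, B_i]
cons = {"type": "eq", "fun": lambda a: a.sum()}              # sum alpha_i = 0
res = minimize(neg_J, np.zeros(len(y)), method="SLSQP",
               bounds=bounds, constraints=[cons])
alpha = res.x

# Recover b from a margin support vector (largest |alpha_i|, assumed
# strictly inside its bounds here), using y_i * f(x_i) = 1.
sv = int(np.argmax(np.abs(alpha)))
b = y[sv] - K[sv] @ alpha
preds = np.sign(K @ alpha + b)                 # f(x_i) = sum_j alpha_j K_ij + b
```

In this signed formulation the decision function is f(x) = ∑_j α_j K(x_j, x) + b directly, with no explicit y_j factor, since each α_j already carries the sign of its label.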