Semi-Supervised Contrastive Learning for Generalizable Motor Imagery EEG Classification

Jinpei Han, Xiao Gu, and Benny Lo
Hamlyn Centre, Imperial College London
{j.han20, xiao.gu17, benny.lo}@imperial.ac.uk

Abstract—Electroencephalography (EEG) is one of the most widely used brain-activity recording methods in non-invasive brain-computer interfaces (BCIs). However, EEG data are highly nonlinear, and EEG datasets often suffer from issues such as data heterogeneity, label uncertainty, and data/label scarcity. To address these issues, we propose a domain-independent, end-to-end semi-supervised learning framework with contrastive learning and adversarial training strategies. Our method was evaluated through experiments with different amounts of labelled data and an ablation study on a motor imagery EEG dataset. The experiments demonstrate that the proposed framework, with two different backbone deep neural networks, shows improved performance over its supervised counterparts under the same conditions.

Index Terms—motor imagery, EEG, generalization, semi-supervised learning, contrastive learning

I. INTRODUCTION

Motor imagery (MI) based brain-computer interface (BCI) systems allow users to control external devices through the mental rehearsal of movements. Such systems play an important role in rehabilitation engineering, for example in interpreting the movement intentions of patients for assistance or therapeutic training. A successful MI BCI system requires a mechanism to record brain signals for interpretation. Among existing recording tools, electroencephalography (EEG), which captures the electrical activity of the brain, is the most commonly used due to its low cost, convenience, non-invasiveness, and high temporal resolution. However, as the evoked potentials of brain activity are very weak and easily corrupted by artifacts, EEG signals are inherently noisy, and it is very difficult to relate these noisy signals to mental tasks.
Research efforts have been devoted to the development of machine learning algorithms to enable automatic MI classification from EEG signals. Among existing algorithms, conventional ones usually involve two steps: feature extraction and classification. Common feature extraction methods, such as Fourier/wavelet transforms and Common Spatial Patterns (CSP) [1], are followed by classification methods such as Linear Discriminant Analysis (LDA) [2] and Support Vector Machines (SVMs) [3]. These methods are typically deterministic, relatively less complex than deep learning methods, and less prone to overfitting [4].

In recent years, deep learning (DL) models have shown reasonable results in subject-dependent classification of EEG signals. Compared to conventional methods, deep learning models are well suited to end-to-end learning, performing inference from the raw data without prior feature selection [5]. Moreover, DL methods scale well to large datasets and can simultaneously learn intricate high-dimensional features from raw signals. The most commonly used DL models in MI-EEG classification are CNN-based models, such as EEGNet [6] and DeepConvNet [5], which have demonstrated superior performance on many tasks compared to conventional machine learning methods [5].

Despite the success of the aforementioned DL methods, several issues remain in establishing a robust and accurate MI-based BCI system. It remains challenging to access large volumes of annotated, high-quality data for MI classification training [7], [8]. In fact, knowing what subjects are actually thinking or doing in cognitive neuroscience experiments can be difficult, which makes it hard to obtain accurate, high-quality annotations and labels for motor imagery EEG data [7]. Self-supervised learning (SSL) has opened the possibility of using self-generated pseudo labels to train on unlabelled data, with limited access to ground-truth labels [9].
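For context, the conventional two-step pipeline described earlier (spatial feature extraction followed by a linear classifier) can be sketched in a few lines. The following NumPy-only sketch is illustrative rather than the method of this paper: CSP filters are obtained from a generalized eigendecomposition of the two class covariance matrices, log-variance features are extracted, and a simple nearest-centroid rule stands in for LDA. The synthetic data, channel counts, and trial shapes are assumptions for demonstration only.

```python
import numpy as np

def csp_filters(trials_a, trials_b, n_components=1):
    """Common Spatial Patterns: spatial filters maximizing the variance
    ratio between two classes. Inputs: (n_trials, n_channels, n_samples)."""
    cov_a = np.mean([np.cov(t) for t in trials_a], axis=0)
    cov_b = np.mean([np.cov(t) for t in trials_b], axis=0)
    # Generalized eigenproblem: cov_a w = lambda (cov_a + cov_b) w
    evals, evecs = np.linalg.eig(np.linalg.solve(cov_a + cov_b, cov_a))
    order = np.argsort(evals.real)
    # Keep the filters with the smallest and largest eigenvalues
    pick = np.r_[order[:n_components], order[-n_components:]]
    return evecs[:, pick].real

def log_var_features(trials, W):
    """Project trials through the CSP filters; normalized log-variance features."""
    z = np.einsum('fc,ncs->nfs', W.T, trials)  # (n_trials, n_filters, n_samples)
    v = z.var(axis=2)
    return np.log(v / v.sum(axis=1, keepdims=True))

# Synthetic two-class data: each class has extra variance on one channel.
rng = np.random.default_rng(0)
def make_trials(n, strong_ch, n_ch=4, n_samp=256):
    scale = np.ones(n_ch)
    scale[strong_ch] = 3.0
    return rng.normal(size=(n, n_ch, n_samp)) * scale[:, None]

train_a, train_b = make_trials(30, 0), make_trials(30, 1)
W = csp_filters(train_a, train_b)
mu_a = log_var_features(train_a, W).mean(axis=0)
mu_b = log_var_features(train_b, W).mean(axis=0)

def predict(trials):  # nearest-centroid rule standing in for LDA
    X = log_var_features(trials, W)
    return (np.linalg.norm(X - mu_a, axis=1) > np.linalg.norm(X - mu_b, axis=1)).astype(int)

test = np.concatenate([make_trials(20, 0), make_trials(20, 1)])
labels = np.array([0] * 20 + [1] * 20)
acc = (predict(test) == labels).mean()
```

Because the two classes differ only in per-channel variance, the extreme eigenvectors isolate the discriminative channels, which is exactly the structure CSP is designed to exploit.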
SSL first trains on a pretext task that learns effective representations from the unlabelled data and pseudo labels, and is then followed by a downstream discrimination task. Example pretext tasks include relative positioning, temporal shuffling, and contrastive predictive coding for EEG-based sleep staging [7], as well as contrastive multi-segment coding and contrastive multi-lead coding for ECG-based arrhythmia detection [10]. However, these pretext tasks are not suitable for motor imagery datasets because: (1) MI trials are recorded in discrete, short windows rather than continuous recordings as in sleep monitoring or ECG; (2) researchers usually ask participants to perform the different motor imagery tasks in a purely random order; and (3) different MI tasks involve activations in different parts of the brain, so recordings from different electrodes at a given time might not share the same context. The temporal- and spatial-invariance assumptions that conventional SSL makes for biosignals therefore do not hold for MI EEG datasets.

In this work, we propose a semi-supervised learning framework that makes use of a large quantity of unlabelled data and a small number of labels in an end-to-end manner. Inspired by the success of SimCLR for SSL [11], we apply contrastive learning to learn representations from the unlabelled data. This involves applying different augmentations to the same unlabelled sample and contrasting it against all the other samples. This approach encourages the model to learn feature