2011 IEEE International Workshop on Machine Learning for Signal Processing
September 18-21, 2011, Beijing, China

978-1-4577-1623-2/11/$26.00 © 2011 IEEE

DETERMINING THE NUMBER OF SOURCES IN HIGH-DENSITY EEG RECORDINGS OF EVENT-RELATED POTENTIALS BY MODEL ORDER SELECTION

Fengyu Cong 1, Zhaoshui He 2,3, Jarmo Hämäläinen 4, Andrzej Cichocki 2, Tapani Ristaniemi 1

1. Department of Mathematical Information Technology, University of Jyväskylä, 40014, Finland
2. Lab for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Japan
3. Faculty of Automation, Guangdong University of Technology, Guangzhou, 510006
4. Department of Psychology, University of Jyväskylä, 40014, Finland

ABSTRACT

For high-density electroencephalography (EEG) recordings, determining the number of sources is very important for separating the signal subspace from the noise subspace. A commonly used criterion is that the selected principal components composing the signal subspace should explain over 90% of the variance of the raw data. Recently, a model order selection method named GAP has been proposed. We investigated the two methods by performing independent component analysis (ICA) on the estimated signal subspace, assuming that the number of selected principal components composing the signal subspace equals the number of sources of brain activity. In wavelet-filtered EEG recordings (128 electrodes) of ERPs, ICA guided by GAP reliably decomposed 14 selected principal components into 14 independent components, whereas the ICA decomposition guided by the variance-explained method was not reliable. This indicates that the number of sources, as well as the signal subspace, should be well estimated through GAP.

Index Terms: Event-related potential, independent/principal component analysis, model order selection, number of sources, reliability, wavelet filter

1. INTRODUCTION

Electroencephalography (EEG) recordings can be modelled as a linear transformation of latent variables at EEG frequencies [1]. They represent the summation of scaled versions of electrical brain activities and artifacts, such as eye blinks and muscle activity, produced by participants during the experiment [2-4]. Thus, it is always desirable to remove the artifacts and extract the electrical brain activities of interest from recordings at the scalp. EEG can reveal two types of electrical brain activity: spontaneous ongoing activity and event-related potentials (ERPs) [5]. To produce ERPs, EEG recordings of many single trials are usually collected and averaged over those trials [6]. However, the averaged EEG recordings of ERPs are still mixtures of electrical brain activities. To extract ERPs from EEG recordings, digital filters, wavelet filters, principal component analysis (PCA), independent component analysis (ICA), and other methods have been applied [7].

PCA and ICA are based on the linear transformation model of EEG recordings. In this model, the electrical activities in the brain are the sources, and the EEG recordings at the scalp are the mixtures [2-4]. For EEG collected with a high-density array, the number of sources is thought to be smaller than the number of electrodes under the assumption of the discrete source model [8]. In this case, dimension reduction is often performed before ICA [9, 10]. To achieve this, PCA is first applied to the data; the principal components corresponding to the k largest eigenvalues are then selected to separate the signal subspace from the noise subspace, assuming that there are k sources in the signal subspace [9-11]. Subsequently, ICA can be performed on the signal subspace to estimate the desired components of brain activity [9, 10]. Hence, the problem in this context is how many principal components should be chosen.
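The reduction step described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation: the helper names `pca_signal_subspace` and `make_toy_mixture`, the synthetic data, and the default 90% threshold are all assumptions made for the example. The sketch picks the smallest k whose leading eigenvalues explain at least a given fraction of the variance, then projects the data onto those k principal components.

```python
import numpy as np


def pca_signal_subspace(x, var_threshold=0.90):
    """Reduce x (channels x samples) to its leading principal components.

    Hypothetical helper: k is the smallest number of components whose
    eigenvalues explain at least `var_threshold` of the total variance.
    """
    x_centered = x - x.mean(axis=1, keepdims=True)
    cov = x_centered @ x_centered.T / (x.shape[1] - 1)
    # eigh returns eigenvalues in ascending order; reverse to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    explained = np.cumsum(eigvals) / np.sum(eigvals)
    # First index where the cumulative fraction reaches the threshold.
    k = int(np.searchsorted(explained, var_threshold) + 1)
    k = min(k, len(eigvals))
    components = eigvecs[:, :k]            # channels x k
    reduced = components.T @ x_centered    # k x samples
    return reduced, k, explained


def make_toy_mixture(n_sources=3, n_channels=16, n_samples=2000,
                     noise=0.01, seed=0):
    """Synthetic linear mixture (assumed data, not EEG)."""
    rng = np.random.default_rng(seed)
    sources = rng.laplace(size=(n_sources, n_samples))
    mixing = rng.normal(size=(n_channels, n_sources))
    sensor_noise = noise * rng.normal(size=(n_channels, n_samples))
    return mixing @ sources + sensor_noise


x = make_toy_mixture()
reduced, k, explained = pca_signal_subspace(x, var_threshold=0.90)
print(k, reduced.shape)
```

In practice, `x` would hold the averaged multichannel recordings, and an ICA algorithm would then be applied to `reduced`. Swapping the threshold rule for a model order selection criterion would change only how k is chosen; the projection step stays the same.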
Usually, the number of selected components is determined by prior knowledge, which mostly reflects the analyst’s experience. For example, the selected principal components are usually required to explain over 80% or 90% of the mixtures’ variance [11]. However, different people may have different experience, so the selection of principal components becomes uncertain when different percentages are used as the threshold. Meanwhile, ICA has become an important tool in the study of ERPs [12], and EEG recordings are collected with more and more electrodes to represent the electrical brain activities more completely [13, 14]. Consequently, reasonably determining the number of sources for separating the signal and noise subspaces through PCA in high-density EEG recordings has become very important. However, this problem has not yet been well addressed for ERP studies. There are two obstacles to resolving it. One is that we do not know the true number of sources in the brain; the other is that it is very difficult to explicitly validate the effectiveness and rationality of the estimated number. Recently, a simple yet efficient model order selection method to separate the signal and the noise subspace, known as the GAP method [15, 16], has been developed to estimate the