Learning Invariances with Stationary Subspace Analysis

Frank C. Meinecke 1, Paul von Bünau 1, Motoaki Kawanabe 2 and Klaus-R. Müller 1
meinecke@cs.tu-berlin.de, buenau@cs.tu-berlin.de, motoaki.kawanabe@first.fraunhofer.de, klaus-robert.mueller@tu-berlin.de
1 Machine Learning Group, Dept. Computer Science, TU Berlin, Franklinstr. 28/29, FR 6-9, 10587 Berlin, Germany
2 Intelligent Data Analysis Group, Fraunhofer FIRST.IDA, Kekuléstr. 7, 12489 Berlin, Germany

Abstract

Recently, a novel subspace decomposition method, termed 'Stationary Subspace Analysis' (SSA), has been proposed by von Bünau et al. [10]. SSA aims to find a linear projection to a lower-dimensional subspace such that the distribution of the projected data does not change over successive epochs or sub-datasets. We show that by modifying the loss function and the optimization procedure we can obtain an algorithm that is both faster and more accurate. We discuss the problem of indeterminacies and provide a lower bound on the number of epochs that is needed. Finally, we show in an experiment with simulated image patches that SSA can be used favourably in invariance learning.

1. Introduction

In many statistical modelling applications a key assumption is that the distribution of the observed data is stationary. Ordinary regression and classification models, for instance, rely on the fact that we can generalize from a sample to the population. Differences between training and test distributions can cause severe drops in performance [8], because the paradigm of minimizing the expected loss approximated by the training sample is no longer consistent. The same holds true for unsupervised approaches such as clustering or ICA. However, many real data generating processes are inherently non-stationary, calling these assumptions into question. Researchers have long tried to account for this.
In inferential statistics, the celebrated Heckman [3] bias correction model attempts to obtain unbiased parameter estimates under sample selection bias; cointegration methods [2] aim at discovering stable relationships between non-stationary time series. In order to improve predictive performance, several methods have been developed, ranging from explicit modelling of the non-stationarity over invariant feature extraction to covariate shift adaptation by reweighting [6, 1, 9]. However, many of these approaches make quite restrictive model assumptions or require specific prior knowledge about the application domain.

More recently, von Bünau et al. [10] proposed a novel subspace decomposition method that separates a multivariate data set into a stationary and a non-stationary part. The separation is achieved by a linear transformation in the data space yielding two orthogonal subspaces, one containing stationary and the other non-stationary data, hence the name Stationary Subspace Analysis (SSA). In the SSA algorithm as proposed in [10], the data set is split into N epochs and a subspace is called stationary if the empirical distribution (approximated by mean and covariance) in this subspace is identical over all epochs. To find this subspace, the proposed algorithm minimizes a sum of pairwise Kullback-Leibler divergences. SSA has already been applied successfully to brain-computer interfacing (BCI), where one aims to transmit information directly from the brain to a computer without the use of peripheral nerves or muscles, e.g. by measuring the cortical electroencephalogram (EEG). Extracting a control signal from this complex and non-stationary signal is known to be a challenging classification task, and it has been shown that restricting the classification to the stationary subspace significantly lowers the error rate.

In this workshop contribution, we propose a new formulation of SSA based on a simplified cost function.
This cost function still has the same optima, but it is easier and much faster to optimize. Most notably, the resulting algorithm scales only linearly in the number of epochs (the former one scaled quadratically). We further propose to optimize the projections to the respective subspaces by a whiten-

2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, 978-1-4244-4441-0/09/$25.00 ©2009 IEEE
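For concreteness, the original pairwise objective described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names are ours, and we only evaluate the loss for a fixed candidate projection (no optimization). It uses the closed-form KL divergence between the Gaussian approximations (mean and covariance) of the projected epochs; summing it over all ordered epoch pairs yields the quadratic-in-N cost that the proposed reformulation avoids.

```python
import numpy as np

def gaussian_kl(m1, S1, m2, S2):
    """Closed-form KL( N(m1,S1) || N(m2,S2) ) for d-dimensional Gaussians."""
    d = len(m1)
    S2_inv = np.linalg.inv(S2)
    diff = m2 - m1
    return 0.5 * (np.trace(S2_inv @ S1) + diff @ S2_inv @ diff - d
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

def pairwise_ssa_loss(P, epochs):
    """Sum of pairwise KL divergences between projected epoch distributions.

    P: (k, D) projection onto a candidate k-dimensional stationary subspace.
    epochs: list of (n_i, D) data arrays.
    The loss is zero iff all projected epochs share mean and covariance;
    note the double loop over epochs -- quadratic in their number N.
    """
    stats = []
    for X in epochs:
        Y = X @ P.T  # project each epoch onto the candidate subspace
        stats.append((np.atleast_1d(Y.mean(axis=0)),
                      np.atleast_2d(np.cov(Y, rowvar=False))))
    loss = 0.0
    for i in range(len(stats)):
        for j in range(len(stats)):
            if i != j:
                loss += gaussian_kl(*stats[i], *stats[j])
    return loss
```

As a sanity check, on data whose first coordinate is drawn identically across epochs while the second drifts in mean and variance, the loss for the projection onto the first coordinate is far smaller than for the projection onto the second.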