Learning Invariances with Stationary Subspace Analysis
Frank C. Meinecke¹, Paul von Bünau¹, Motoaki Kawanabe² and Klaus-R. Müller¹
meinecke@cs.tu-berlin.de, buenau@cs.tu-berlin.de,
motoaki.kawanabe@first.fraunhofer.de, klaus-robert.mueller@tu-berlin.de
¹ Machine Learning Group, Dept. Computer Science, TU Berlin, Franklinstr. 28/29, FR 6–9, 10587 Berlin, Germany
² Intelligent Data Analysis Group, Fraunhofer FIRST.IDA, Kekuléstr. 7, 12489 Berlin, Germany
Abstract
Recently, a novel subspace decomposition method, termed 'Stationary Subspace Analysis' (SSA), has been proposed by Bünau et al. [10]. SSA aims to find a linear projection to a lower-dimensional subspace such that the distribution of the projected data does not change over successive epochs or sub-datasets. We show that by modifying the loss function and the optimization procedure we can obtain an algorithm that is both faster and more accurate. We discuss the problem of indeterminacies and provide a lower bound on the number of epochs that is needed. Finally, we show in an experiment with simulated image patches that SSA can be used favourably in invariance learning.
1. Introduction
In many statistical modelling applications a key assump-
tion is that the distribution of the observed data is stationary.
Ordinary regression and classification models, for instance,
rely on the fact that we can generalize from a sample to
the population. Differences in training and test distribu-
tions can cause severe drops in performance [8], because
the paradigm of minimizing the expected loss approximated
by the training sample is no longer consistent. The same
holds true for unsupervised approaches such as clustering or ICA. However, many real data generating processes
are inherently non-stationary, calling these assumptions into
question.
Researchers have long tried to account for this. In inferential statistics, the celebrated Heckman [3] bias correction model attempts to obtain unbiased parameter estimates
under sample selection bias; cointegration methods [2] aim
at discovering stable relationships between non-stationary
time-series. In order to improve predictive performance
several methods have been developed, ranging from explicit modelling of the non-stationarity via invariant feature extraction to covariate shift adaptation by reweighting
[6, 1, 9]. However, many of these approaches make quite re-
strictive model assumptions or require specific prior knowl-
edge about the application domain.
More recently, Bünau et al. [10] proposed a novel subspace decomposition method that separates a multivariate data set into a stationary and a non-stationary part. The separation is achieved by a linear transformation in the data space yielding two orthogonal subspaces, one containing stationary and the other non-stationary data, hence the name Stationary Subspace Analysis (SSA). In the SSA
algorithm as proposed in [10], the data set is split into N
epochs of data and a subspace is called stationary if the em-
pirical distribution (approximated by mean and covariance)
in this subspace is identical over all epochs. To find this
subspace, the proposed algorithm minimizes a sum of pairwise Kullback-Leibler divergences. SSA has already been
applied successfully to brain-computer interfacing (BCI)
where one aims to transmit information directly from the
brain to a computer without the use of peripheral nerves or
muscles, e.g. by measuring the cortical electroencephalo-
gram (EEG). Extracting a control signal from this complex
and non-stationary signal is known to be a challenging clas-
sification task and it has been shown that a restriction of the
classification to the stationary subspace significantly lowers
the error rate.
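To make the objective described above concrete, the following is a minimal numerical sketch of a pairwise-KL cost of this kind (this is not the authors' implementation; the Gaussian approximation of each epoch by its mean and covariance follows the description of [10], while the function names and interface are purely illustrative):

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL divergence KL( N(mu0, cov0) || N(mu1, cov1) ) between two Gaussians."""
    d = len(mu0)
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def ssa_objective(P, epochs):
    """Sum of pairwise KL divergences between the projected epoch
    distributions (each approximated by mean and covariance).
    P is a (d x D) projection matrix, epochs a list of (n_i x D) arrays."""
    stats = []
    for X in epochs:
        Y = X @ P.T  # project this epoch into the candidate subspace
        stats.append((Y.mean(axis=0), np.atleast_2d(np.cov(Y, rowvar=False))))
    return sum(gaussian_kl(*stats[i], *stats[j])
               for i in range(len(stats))
               for j in range(len(stats)) if i != j)
```

A projection onto a truly stationary direction yields a near-zero objective, since all epoch distributions coincide up to sampling noise, whereas a direction with shifting mean or covariance produces a large value; minimizing this cost over P therefore recovers the stationary subspace.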
In this workshop contribution, we propose a new formu-
lation of SSA based on a simplified cost function. This cost
function still has the same optima, but it is easier and much
faster to optimize. Most noticeably, the resulting algorithm scales only linearly in the number of epochs (the former one scaled quadratically). We further propose to optimize
the projections to the respective subspaces by a whiten-
2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops
978-1-4244-4441-0/09/$25.00 ©2009 IEEE