Neural Networks 46 (2013) 154–164
Genuine cross-correlations: Which surrogate based measure
reproduces analytical results best?
Arlex Oscar Marín García a,1, Markus Franziskus Müller a,b,∗,1, Kaspar Schindler c, Christian Rummel d
a Facultad de Ciencias, Universidad Autónoma del Estado de Morelos, 62209 Cuernavaca, Morelos, Mexico
b Centro Internacional de Ciencias, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
c Department of Neurology, Inselspital, Bern University Hospital, University Bern, Switzerland
d Support Center for Advanced Neuroimaging (SCAN), Institute of Diagnostic and Interventional Neuroradiology, University Hospital, Inselspital, University of Bern, Switzerland
article info
Article history:
Received 14 June 2012
Received in revised form 26 March 2013
Accepted 13 May 2013
Keywords:
Genuine correlations
Random correlations
Multivariate analysis
EEG
Epilepsy
abstract
The analysis of short segments of noise-contaminated, multivariate real world data constitutes a
challenge. In this paper we compare several techniques of analysis, which are supposed to correctly
extract the amount of genuine cross-correlations from a multivariate data set. In order to test for the
quality of their performance we derive time series from a linear test model, which allows the analytical
derivation of genuine correlations. We compare the numerical estimates of four measures with the
analytical results for different correlation patterns. In the bivariate case all but one of the measures perform
similarly well. However, in the multivariate case measures based on the eigenvalues of the equal-time
cross-correlation matrix do not extract exclusively information about the amount of genuine correlations,
but they rather reflect the spatial organization of the correlation pattern. This may lead to failures
when interpreting the numerical results as illustrated by an application to three electroencephalographic
recordings of three patients suffering from pharmacoresistant epilepsy.
© 2013 Elsevier Ltd. All rights reserved.
1. Introduction
Often, the precise mathematical definition of measures used
for data analysis contains integrals over infinite ranges (e.g. the
Fourier transform, correlation coefficient, Hilbert transform)
or limits to zero and/or infinity (e.g. correlation dimension,
Lyapunov exponent) (see for example Kantz & Schreiber,
2004). In applications to real-world data, which are non-stationary
and recorded with a finite sampling rate, such requirements cannot
be met. This lack of mathematical precision influences the quality
of the numerical estimates. In the case of cross-correlations the
sampling theorem ensures that coarse graining of the data is not
relevant, provided that the highest frequency component of the
signal is smaller than the Nyquist frequency (a property which
is often not checked). However, replacing the integral over an
infinite range with a sum over a finite data segment may cause
a serious side effect called “random correlations” (Laloux, Cizeau,
Bouchaud, & Potters, 1999; Müller, Baier, Galka, Stephani, & Muhle,
2005; Müller, Baier, Rummel, & Schindler, 2008; Müller et al., 2006;
Plerou et al., 2002; Plerou, Gopikrishnan, Rosenow, Nunes Amaral,
& Stanley, 1999; Rummel, Müller, Baier, Amor, & Schindler, 2010).
∗ Corresponding author at: Facultad de Ciencias, Universidad Autónoma del Estado de Morelos, 62209 Cuernavaca, Morelos, Mexico. Tel.: +52 777 3297020; fax: +52 777 3297040.
E-mail address: muellerm@buzon.uaem.mx (M.F. Müller).
1 These authors have contributed equally to this work.
Due to the finite size of the data window, the estimate of
the cross-correlation of two completely independent time series
(e.g. independent Gaussian white noise) is generally non-zero. For
the same reason, it is also not clear a priori how closely the numerical
estimate approximates the correct value when genuine
correlations are present. Even worse, as the amount of random
correlations depends on the relation between the period of the slowest
dominant frequency component of the signal and the length of the
data segment (Rummel et al., 2010), it may change drastically over
time (Müller et al., 2011, 2008). This undesired effect calls into question
the suitability of the cross-correlation coefficient as a technique for the
analysis of real-world data.
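The finite-window effect described above can be made concrete with a small numerical sketch. It is not the procedure used in this paper: the function name, the trial counts, and the moving-average smoothing (a hypothetical stand-in for signals dominated by slow components) are assumptions chosen for illustration. For independent Gaussian white noise the spread of the estimated coefficient is expected to shrink roughly as 1/sqrt(T) with the window length T, and it should grow when slow components are added.

```python
import numpy as np


def random_correlation_spread(window_length, n_trials=2000, smooth=1, seed=0):
    """Standard deviation of the Pearson coefficient estimated between two
    independent noise series over a window of `window_length` samples.

    smooth > 1 applies a moving average of that many samples to each
    series, which introduces slow components (illustrative assumption).
    """
    rng = np.random.default_rng(seed)
    kernel = np.ones(smooth) / smooth
    coeffs = np.empty(n_trials)
    for k in range(n_trials):
        # Draw extra samples so the 'valid' convolution returns exactly
        # window_length points per series.
        x = rng.standard_normal(window_length + smooth - 1)
        y = rng.standard_normal(window_length + smooth - 1)
        x = np.convolve(x, kernel, mode="valid")
        y = np.convolve(y, kernel, mode="valid")
        # The true correlation is zero; any non-zero value is "random".
        coeffs[k] = np.corrcoef(x, y)[0, 1]
    return coeffs.std()


if __name__ == "__main__":
    for T in (50, 100, 400):
        white = random_correlation_spread(T)
        slow = random_correlation_spread(T, smooth=10)
        print(f"T={T:4d}  white noise: {white:.3f}  "
              f"smoothed: {slow:.3f}  1/sqrt(T): {1 / np.sqrt(T):.3f}")
```

In this sketch the two series are independent by construction, so every deviation of the estimate from zero is a random correlation in the sense used above. Comparing the white and smoothed cases at fixed T illustrates why the relation between the slowest period and the window length matters.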
At this point the question arises why one should not simply use another
bivariate measure instead of the cross-correlation coefficient,
which, in addition to the above-mentioned problem, detects only
linear interrelationships between two signals. Consequently,
any nonlinear interrelation, which might be expected between signals
measured in real-world complex systems, remains unobserved
by definition. However, for two reasons we concentrate on the
numerically robust and computationally cheap cross-correlation
http://dx.doi.org/10.1016/j.neunet.2013.05.009