Neural Networks 46 (2013) 154–164

Genuine cross-correlations: Which surrogate based measure reproduces analytical results best?

Arlex Oscar Marín García (a,1), Markus Franziskus Müller (a,b,*,1), Kaspar Schindler (c), Christian Rummel (d)

(a) Facultad de Ciencias, Universidad Autónoma del Estado de Morelos, 62209 Cuernavaca, Morelos, Mexico
(b) Centro Internacional de Ciencias, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
(c) Department of Neurology, Inselspital, Bern University Hospital, University Bern, Switzerland
(d) Support Center for Advanced Neuroimaging (SCAN), Institute of Diagnostic and Interventional Neuroradiology, University Hospital, Inselspital, University of Bern, Switzerland

Article history: Received 14 June 2012; received in revised form 26 March 2013; accepted 13 May 2013

Keywords: Genuine correlations; Random correlations; Multivariate analysis; EEG; Epilepsy

Abstract

The analysis of short segments of noise-contaminated, multivariate real-world data constitutes a challenge. In this paper we compare several techniques of analysis that are supposed to correctly extract the amount of genuine cross-correlations from a multivariate data set. To test the quality of their performance, we derive time series from a linear test model, which allows the analytical derivation of genuine correlations. We compare the numerical estimates of the four measures with the analytical results for different correlation patterns. In the bivariate case, all but one measure performs similarly well. However, in the multivariate case, measures based on the eigenvalues of the equal-time cross-correlation matrix do not exclusively extract information about the amount of genuine correlations; rather, they reflect the spatial organization of the correlation pattern.
This may lead to failures when interpreting the numerical results, as illustrated by an application to three electroencephalographic recordings from three patients suffering from pharmacoresistant epilepsy.

© 2013 Elsevier Ltd. All rights reserved.

* Corresponding author at: Facultad de Ciencias, Universidad Autónoma del Estado de Morelos, 62209 Cuernavaca, Morelos, Mexico. Tel.: +52 777 3297020; fax: +52 777 3297040. E-mail address: muellerm@buzon.uaem.mx (M.F. Müller).
1 These authors have contributed equally to this work.

1. Introduction

Often, the precise mathematical definition of measures used for data analysis contains integrals over infinite ranges (e.g. the Fourier transform, the correlation coefficient, the Hilbert transform) or limits to zero and/or infinity (e.g. the correlation dimension, the Lyapunov exponents) (see, for example, Kantz & Schreiber, 2004). In applications to real-world data, which are non-stationary and recorded with a finite sampling rate, such requirements cannot be met. This lack of mathematical precision influences the quality of the numerical estimates. In the case of cross-correlations, the sampling theorem guarantees that the coarse graining of the data is irrelevant, provided that the highest frequency component of the signal lies below the Nyquist frequency (a condition that is often not checked). However, replacing the integral over an infinite range with a sum over a finite data segment may cause a serious side effect called “random correlations” (Laloux, Cizeau, Bouchaud, & Potters, 1999; Müller, Baier, Galka, Stephani, & Muhle, 2005; Müller, Baier, Rummel, & Schindler, 2008; Müller et al., 2006; Plerou et al., 2002; Plerou, Gopikrishnan, Rosenow, Nunes Amaral, & Stanley, 1999; Rummel, Müller, Baier, Amor, & Schindler, 2010). Due to the finite size of the data window, the estimate of the cross-correlation of two completely independent time series (e.g. independent Gaussian white noise) is generally non-zero.
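To make the notion of random correlations concrete, the following is a minimal Python/NumPy sketch (an illustration, not the authors' code) that estimates the spread of the sample cross-correlation coefficient between two independent series over many finite windows. Since the true correlation is exactly zero, every non-zero estimate is a random correlation. The function name `random_correlation_spread`, the AR(1) generator and its parameter `phi` are illustrative assumptions: `phi = 0` yields independent Gaussian white noise, while `phi` close to 1 yields slowly varying signals, which anticipates the dependence on the slowest dominant frequency component discussed in the following paragraph.

```python
import numpy as np


def random_correlation_spread(n_samples, n_trials=1000, phi=0.0, seed=0):
    """Standard deviation of the sample Pearson correlation between two
    independent AR(1) series x_t = phi * x_{t-1} + e_t (hypothetical helper).

    phi = 0 gives independent Gaussian white noise; phi near 1 gives
    slowly varying signals. The true correlation is zero in both cases.
    """
    rng = np.random.default_rng(seed)
    burn_in = 200  # discard the start so the AR(1) process is near-stationary
    noise = rng.standard_normal((n_trials, 2, n_samples + burn_in))

    # Generate all trials and both channels at once; loop only over time.
    series = np.empty_like(noise)
    series[..., 0] = noise[..., 0]
    for t in range(1, noise.shape[-1]):
        series[..., t] = phi * series[..., t - 1] + noise[..., t]

    x = series[:, 0, burn_in:]
    y = series[:, 1, burn_in:]

    # Pearson correlation per trial, computed over the finite window.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    r = (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    return float(r.std())


if __name__ == "__main__":
    for n in (100, 1000):
        print(n,
              "white noise:", round(random_correlation_spread(n, phi=0.0), 3),
              "slow AR(1):", round(random_correlation_spread(n, phi=0.95), 3))
```

For independent Gaussian white noise the spread should come out close to 1/sqrt(N - 1), i.e. it shrinks only slowly with the window length N, and for the slowly varying AR(1) series it is considerably larger at the same N.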
For the same reason, it is not clear a priori how closely the numerical estimate approximates the true value when genuine correlations are present. Even worse, as the amount of random correlations depends on the ratio between the period of the slowest dominant frequency component of the signal and the length of the data segment (Rummel et al., 2010), it may change drastically over time (Müller et al., 2011, 2008). This undesired effect calls into question the suitability of the cross-correlation coefficient for the analysis of real-world data.

At this point the question arises why one should not simply use another bivariate measure instead of the cross-correlation coefficient, which, in addition to the problem mentioned above, only detects linear interrelationships between two signals. Consequently, any nonlinear interrelation, which might be expected between signals measured in real-world complex systems, remains undetected by definition. However, for two reasons we concentrate on the numerically robust and computationally cheap cross-correlation

ISSN 0893-6080. http://dx.doi.org/10.1016/j.neunet.2013.05.009