Localized short-range correlations in the spectrum of the equal-time correlation matrix

Markus Müller,* Yurytzy López Jiménez, Christian Rummel, and Gerold Baier
Facultad de Ciencias, Universidad Autónoma del Estado de Morelos, 62209 Cuernavaca, Morelos, México

Andreas Galka
Institut für Experimentelle und Angewandte Physik, Christian-Albrechts-Universität, 24098 Kiel, Germany and Institute of Statistical Mathematics (ISM), Minami-Azabu 4-6-7, Minato-ku, Tokyo 106-8569, Japan

Ulrich Stephani and Hiltrud Muhle
Klinik für Neuropädiatrie, Christian-Albrechts-Universität, 24105 Kiel, Germany

(Received 15 May 2006; published 24 October 2006)

We suggest a procedure to identify those parts of the spectrum of the equal-time correlation matrix C where relevant information about correlations of a multivariate time series is induced. Using an ensemble average over each of the distances between eigenvalues, all nearest-neighbor distributions can be calculated individually. We present numerical examples, where (a) information about cross correlations is found in the so-called "bulk" of eigenvalues (which generally is thought to contain only random correlations) and where (b) the information extracted from the lower edge of the spectrum of C is statistically more significant than that extracted from the upper edge. We apply the analysis to electroencephalographic recordings with epileptic events.

DOI: 10.1103/PhysRevE.74.041119    PACS number(s): 02.50.Sk, 05.45.Tp, 89.75.-k, 05.10.-a

I. INTRODUCTION

In recent years, the application of tools known from random matrix theory (RMT) to time series analysis has become more and more popular. The application of RMT techniques to a variety of multivariate data sets like financial data [1–6], electroencephalographic [7], magnetoencephalographic recordings [8], climate data [9], internet traffic [10], and others has been reported.
One of the main goals of the employment of RMT measures in time series analysis is to separate, in the spectrum of the equal-time correlation matrix C, genuine information about the correlation structure from random correlations generated by noise and by the finite size of the time window used to construct C. In general, the assumption was made that those eigenvalues λ and respective eigenvectors v of the correlation matrix that can be described by random matrix ensembles are dominated by random correlations and noise, whereas deviations from the RMT behavior indicate true information about the correlation structure. The natural null hypothesis for the comparison of empirical results is given by Wishart ensembles (WE), i.e., ensembles of correlation matrices constructed from signals of independent Gaussian white noise. WE are characterized by two parameters, the length T and the dimension M of the multivariate data set.

In Ref. [1], properties of the empirical correlation matrix constructed from financial data have been studied. A clear separation of a few large eigenvalues from the remaining ones was observed. It was shown that, apart from the few largest λ, the level density ρ(λ) of the empirical correlation matrix can be approximately fitted by the analytical formula for the WE, which is valid in the limit T, M → ∞ with T/M = const. ≥ 1. In Ref. [5] it was demonstrated (see Fig. 3 of Ref. [5]) that the observed deviations from the level density of the corresponding WE are not caused by the finite size of the empirical data set. It was argued that these deviations are due to the influence of the few well-separated large eigenvalues. The results of Ref. [1] have been confirmed by a series of subsequent papers, analyzing not only the level density but also applying more sophisticated RMT tools, which measure the correlation properties of the spectrum of eigenvalues, such as the nearest-neighbor distribution P(s) or the number variance Σ²(l) (see, e.g., Refs. [11,12]). In all those papers empirical results have been compared to the analytically known ones of the Gaussian orthogonal ensemble (GOE). Strictly speaking, an ensemble of empirical or random correlation matrices does not belong to the GOE, but as the differences of the statistical properties between WE and GOE decay rapidly as one goes away from zero in the spectrum of the correlation matrix [5], the GOE seems to be an appropriate choice for a null hypothesis for these measures.

Independently of the type of data considered, the application of RMT tools seemed to confirm the statement that the "bulk" of eigenvalues (i.e., all those below a few, well-separated ones) is strongly contaminated by noise and consequently does not contain any valuable information. Within the statistical errors, the nearest-neighbor statistics as well as the results for the number variance calculated from the empirical data coincide very well with the universal properties of the GOE. These results led many authors to the generally accepted conclusion that only the largest eigenvalues and their respective eigenvectors contain relevant information, while the remaining part of the spectrum, the bulk, is mainly dominated by noise. Such statements are in agreement with the philosophy of the principal component analysis (PCA), where only a certain number of the largest eigenstates of the covariance matrix are used to project the data onto its principal components [13].

*Electronic address: muellerm@buzon.uaem.mx

PHYSICAL REVIEW E 74, 041119 (2006)
1539-3755/2006/74(4)/041119(9) ©2006 The American Physical Society
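
As a minimal illustration of the Wishart-ensemble null hypothesis described in the Introduction, the following Python sketch builds equal-time correlation matrices from M independent Gaussian white-noise signals of length T, pools their eigenvalues over several realizations, and compares the resulting level density with a closed-form density. Everything in it is an assumption introduced for illustration rather than part of the original study: the function names, the parameter values (M = 20, T = 200, 50 realizations), the use of NumPy, and the identification of the "analytical formula for the WE" with the standard Marchenko-Pastur expression for unit-variance signals.

```python
import numpy as np


def wishart_correlation_matrix(M, T, rng):
    """Equal-time correlation matrix of M independent white-noise signals of length T."""
    X = rng.standard_normal((M, T))
    # Normalize every signal to zero mean and unit variance over the time window.
    X -= X.mean(axis=1, keepdims=True)
    X /= X.std(axis=1, keepdims=True)
    return X @ X.T / T


def marchenko_pastur(lam, M, T):
    """Marchenko-Pastur level density for unit-variance signals (assumes T/M >= 1)."""
    q = T / M
    lam_min = (1.0 - np.sqrt(1.0 / q)) ** 2
    lam_max = (1.0 + np.sqrt(1.0 / q)) ** 2
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    rho = np.zeros_like(lam)
    inside = (lam > lam_min) & (lam < lam_max)
    x = lam[inside]
    rho[inside] = q / (2.0 * np.pi) * np.sqrt((lam_max - x) * (x - lam_min)) / x
    return rho, lam_min, lam_max


# Hypothetical parameter choice, for illustration only.
rng = np.random.default_rng(seed=0)
M, T, n_realizations = 20, 200, 50

# Pool the eigenvalues of all realizations of the ensemble.
pooled = np.concatenate([
    np.linalg.eigvalsh(wishart_correlation_matrix(M, T, rng))
    for _ in range(n_realizations)
])

# Empirical level density versus the asymptotic formula at the bin centers.
hist, edges = np.histogram(pooled, bins=30, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
rho, lam_min, lam_max = marchenko_pastur(centers, M, T)

print("asymptotic support:", lam_min, lam_max)
print("observed eigenvalue range:", pooled.min(), pooled.max())
print("mean absolute deviation of the histogram:", np.mean(np.abs(hist - rho)))
```

Because the formula holds only in the limit T, M → ∞, a few eigenvalues of a finite matrix can fall slightly outside the asymptotic support. Such finite-size effects have to be kept apart from genuine deviations when empirical spectra are compared with the null hypothesis.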