Localized short-range correlations in the spectrum of the equal-time correlation matrix

Markus Müller,* Yurytzy López Jiménez, Christian Rummel, and Gerold Baier
Facultad de Ciencias, Universidad Autónoma del Estado de Morelos, 62209 Cuernavaca, Morelos, México

Andreas Galka
Institut für Experimentelle und Angewandte Physik, Christian-Albrechts-Universität, 24098 Kiel, Germany and Institute of Statistical Mathematics (ISM), Minami-Azabu 4-6-7, Minato-ku, Tokyo 106-8569, Japan

Ulrich Stephani and Hiltrud Muhle
Klinik für Neuropädiatrie, Christian-Albrechts-Universität, 24105 Kiel, Germany

(Received 15 May 2006; published 24 October 2006)

We suggest a procedure to identify those parts of the spectrum of the equal-time correlation matrix C where relevant information about correlations of a multivariate time series is induced. Using an ensemble average over each of the distances between eigenvalues, all nearest-neighbor distributions can be calculated individually. We present numerical examples, where (a) information about cross correlations is found in the so-called "bulk" of eigenvalues (which generally is thought to contain only random correlations) and where (b) the information extracted from the lower edge of the spectrum of C is statistically more significant than that extracted from the upper edge. We apply the analysis to electroencephalographic recordings with epileptic events.

DOI: 10.1103/PhysRevE.74.041119
PACS number(s): 02.50.Sk, 05.45.Tp, 89.75.-k, 05.10.-a

I. INTRODUCTION

In recent years, the application of tools known from random matrix theory (RMT) to time series analysis has become increasingly popular. The application of RMT techniques to a variety of multivariate data sets like financial data [1-6], electroencephalographic [7] and magnetoencephalographic recordings [8], climate data [9], internet traffic [10], and others has been reported.
One of the main goals of employing RMT measures in time series analysis is to separate, in the spectrum of the equal-time correlation matrix C, genuine information about the correlation structure from random correlations (generated by the finite size of the time window used to construct C) and noise. In general, the assumption was made that those eigenvalues $\lambda$ and respective eigenvectors $v$ of the correlation matrix that can be described by random matrix ensembles are dominated by random correlations and noise, whereas deviations from the RMT behavior indicate true information about the correlation structure. The natural null hypothesis for the comparison of empirical results is Wishart ensembles (WE), i.e., ensembles of correlation matrices constructed from signals of independent Gaussian white noise. WE are characterized by two parameters, the length T and the dimension M of the multivariate data set.

In Ref. [1], properties of the empirical correlation matrix constructed from financial data have been studied. A clear separation of a few large eigenvalues from the remaining ones was observed. It was shown that, apart from the few largest eigenvalues, the level density of the empirical correlation matrix can be approximately fitted by the analytical formula for the WE, which is valid in the limit $T \to \infty$, $M \to \infty$ with $T/M = \mathrm{const} \geq 1$. In Ref. [5] it was demonstrated (see Fig. 3 of Ref. [5]) that the observed deviations from the level density of the corresponding WE are not caused by the finite size of the empirical data set. It was argued that these deviations are due to the influence of the few well-separated large eigenvalues. The results of Ref. [1] have been confirmed by a series of subsequent papers, which analyze not only the level density but also apply more sophisticated RMT tools that measure the correlation properties of the spectrum of eigenvalues, such as the nearest-neighbor distribution $P(s)$ or the number variance $\Sigma^2(l)$ (see, e.g., Refs. [11,12]).
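As an illustration of the Wishart null hypothesis described above, the following minimal sketch builds equal-time correlation matrices from independent Gaussian white noise and checks how many eigenvalues fall between the edges of the asymptotic level density (the Marchenko-Pastur form for unit-variance signals, $\lambda_\pm = (1 \pm \sqrt{M/T})^2$). The function names, the values of M and T, and the ensemble size are assumptions chosen for illustration only; they are not taken from the paper.

```python
import numpy as np

def equal_time_correlation(data):
    """Equal-time correlation matrix of an (M, T) array of M signals of length T."""
    # Normalize each signal to zero mean and unit variance inside the window.
    z = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
    return z @ z.T / data.shape[1]

def asymptotic_edges(M, T):
    """Edges of the asymptotic WE level density for unit-variance signals."""
    q = M / T
    return (1.0 - np.sqrt(q)) ** 2, (1.0 + np.sqrt(q)) ** 2

def wishart_eigenvalues(M, T, n_members, seed=0):
    """Eigenvalues (ascending) for n_members white-noise correlation matrices."""
    rng = np.random.default_rng(seed)
    eigs = np.empty((n_members, M))
    for k in range(n_members):
        noise = rng.standard_normal((M, T))  # independent Gaussian white noise
        eigs[k] = np.linalg.eigvalsh(equal_time_correlation(noise))
    return eigs

# Illustrative (assumed) parameters: M signals, window length T.
M, T = 20, 200
eigs = wishart_eigenvalues(M, T, n_members=200)
lam_min, lam_max = asymptotic_edges(M, T)
inside = ((eigs >= lam_min) & (eigs <= lam_max)).mean()
print(f"asymptotic edges: [{lam_min:.3f}, {lam_max:.3f}]")
print(f"fraction of eigenvalues inside the edges: {inside:.3f}")
```

For finite M and T a small fraction of eigenvalues lies outside the asymptotic edges, which is why empirical spectra are usually compared with an ensemble of surrogate matrices of the same size rather than with the limiting formula alone.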
In all those papers empirical results have been compared to the analytically known ones of the Gaussian orthogonal ensemble (GOE). Strictly speaking, an ensemble of empirical (or random) correlation matrices does not belong to the GOE, but as the differences of the statistical properties between WE and GOE decay rapidly as one goes away from zero in the spectrum of the correlation matrix [5], the GOE seems to be an appropriate choice for a null hypothesis for these measures.

Independently of the type of data considered, the application of RMT tools seemed to confirm the statement that the "bulk" of eigenvalues (i.e., all those below a few, well-separated ones) is strongly contaminated by noise and consequently does not contain any valuable information. Within the statistical errors, the nearest-neighbor statistics as well as the results for the number variance calculated from the empirical data coincide very well with the universal properties of the GOE. These results led many authors to the generally accepted conclusion that only the largest eigenvalues (and their respective eigenvectors) contain relevant information, while the remaining part of the spectrum, the bulk, is mainly dominated by noise. Such statements are in agreement with the philosophy of principal component analysis (PCA), where only a certain number of the largest eigenstates of the covariance matrix are used to project the data onto its principal components [13].

*Electronic address: muellerm@buzon.uaem.mx

PHYSICAL REVIEW E 74, 041119 (2006)
1539-3755/2006/74(4)/041119(9) ©2006 The American Physical Society
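The comparison with GOE nearest-neighbor statistics mentioned in the Introduction can be sketched numerically as follows. The example draws real symmetric Gaussian matrices, normalizes each eigenvalue distance by its own ensemble mean (a simple unfolding chosen here for illustration, not necessarily the procedure of the cited works), and compares the resulting spacings with the Wigner surmise $P(s) = (\pi/2)\, s\, e^{-\pi s^2/4}$. The matrix size, ensemble size, and function names are assumptions.

```python
import numpy as np

def goe_matrix(n, rng):
    """Real symmetric matrix with Gaussian entries (a GOE-type sample)."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / 2.0

def central_spacings(n, n_members, seed=0):
    """Normalized nearest-neighbor spacings from the central half of the spectrum."""
    rng = np.random.default_rng(seed)
    eigs = np.array([np.linalg.eigvalsh(goe_matrix(n, rng)) for _ in range(n_members)])
    gaps = np.diff(eigs, axis=1)      # shape (n_members, n - 1)
    gaps = gaps / gaps.mean(axis=0)   # each distance divided by its ensemble mean
    middle = slice(n // 4, 3 * n // 4)
    return gaps[:, middle].ravel()

def wigner_surmise(s):
    """Wigner surmise for the GOE nearest-neighbor distribution."""
    return (np.pi / 2.0) * s * np.exp(-np.pi * s ** 2 / 4.0)

# Illustrative (assumed) sizes.
spacings = central_spacings(n=50, n_members=400)
hist, edges = np.histogram(spacings, bins=30, range=(0.0, 3.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_dev = np.abs(hist - wigner_surmise(centers)).max()
print(f"mean spacing: {spacings.mean():.3f}")
print(f"largest histogram deviation from the Wigner surmise: {max_dev:.3f}")
```

The small weight of the histogram near s = 0 reflects level repulsion; uncorrelated (Poissonian) levels would instead give a spacing density that is largest at s = 0.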