Localized short-range correlations in the spectrum of the equal-time correlation matrix
Markus Müller,* Yurytzy López Jiménez, Christian Rummel, and Gerold Baier
Facultad de Ciencias, Universidad Autónoma del Estado de Morelos, 62209 Cuernavaca, Morelos, México
Andreas Galka
Institut für Experimentelle und Angewandte Physik, Christian-Albrechts-Universität, 24098 Kiel, Germany
and Institute of Statistical Mathematics (ISM), Minami-Azabu 4-6-7, Minato-ku, Tokyo 106-8569, Japan
Ulrich Stephani and Hiltrud Muhle
Klinik für Neuropädiatrie, Christian-Albrechts-Universität, 24105 Kiel, Germany
(Received 15 May 2006; published 24 October 2006)
We suggest a procedure to identify those parts of the spectrum of the equal-time correlation matrix C where
relevant information about correlations of a multivariate time series is induced. Using an ensemble average
over each of the distances between eigenvalues, all nearest-neighbor distributions can be calculated
individually. We present numerical examples, where (a) information about cross correlations is found in the so-called
“bulk” of eigenvalues (which generally is thought to contain only random correlations) and where (b) the
information extracted from the lower edge of the spectrum of C is statistically more significant than that
extracted from the upper edge. We apply the analysis to electroencephalographic recordings with epileptic
events.
DOI: 10.1103/PhysRevE.74.041119 PACS numbers: 02.50.Sk, 05.45.Tp, 89.75.-k, 05.10.-a
I. INTRODUCTION
In recent years, the application of tools known from random
matrix theory (RMT) to time series analysis has become
increasingly popular. RMT techniques have been applied
to a variety of multivariate data sets, such as financial data [1–6],
electroencephalographic [7], magnetoencephalographic
recordings [8], climate data [9], internet traffic [10], and others.
One of the main goals of employing RMT measures in time
series analysis is to separate, in the spectrum of the
equal-time correlation matrix C, genuine information about
the correlation structure from random correlations generated
by noise and by the finite size of the time window used
to construct C. In general, it was assumed that
those eigenvalues λ and respective eigenvectors v
of the correlation matrix that can be described by random
matrix ensembles are dominated by random correlations and
noise, whereas deviations from the RMT behavior indicate
true information about the correlation structure. The natural
null hypothesis for the comparison of empirical results is
the Wishart ensemble (WE), i.e., an ensemble of correlation
matrices constructed from signals of independent Gaussian
white noise. The WE is characterized by two parameters, the
length T and the dimension M of the multivariate data set.
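As an illustration of this null hypothesis, the following is a minimal sketch (not the authors' code) of how a Wishart ensemble might be sampled numerically: M independent Gaussian white-noise signals of length T are normalized to zero mean and unit variance, and the eigenvalues of their equal-time correlation matrix are collected over many realizations. The function names, the parameter values, and the use of the Marchenko-Pastur edges (1 ± sqrt(M/T))² as a reference for the infinite-size limit are assumptions made for this example.

```python
import numpy as np

def equal_time_correlation(X):
    """Equal-time correlation matrix of X with shape (M, T):
    M signals (rows), each of length T (columns)."""
    # Normalize every signal to zero mean and unit variance.
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    return Z @ Z.T / X.shape[1]

def wishart_eigenvalues(M, T, n_real, seed=0):
    """Eigenvalues (ascending) of n_real correlation matrices built
    from independent Gaussian white noise. Hypothetical helper."""
    rng = np.random.default_rng(seed)
    eig = np.empty((n_real, M))
    for k in range(n_real):
        C = equal_time_correlation(rng.standard_normal((M, T)))
        eig[k] = np.linalg.eigvalsh(C)  # sorted in ascending order
    return eig

def marchenko_pastur_edges(M, T):
    """Spectral edges of the Wishart level density for T, M -> infinity."""
    q = M / T
    return (1.0 - np.sqrt(q)) ** 2, (1.0 + np.sqrt(q)) ** 2

M, T = 20, 200  # assumed example values
eig = wishart_eigenvalues(M, T, n_real=200)
lower_edge, upper_edge = marchenko_pastur_edges(M, T)
print("asymptotic edges:", lower_edge, upper_edge)
print("sampled range:   ", eig.min(), eig.max())
```

Because the diagonal of C equals one by construction, the eigenvalues of each realization sum to M. For finite M and T the sampled extremes can lie slightly outside the asymptotic edges.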
In Ref. [1], properties of the empirical correlation matrix
constructed from financial data were studied. A clear
separation of a few large eigenvalues from the remaining
ones was observed. It was shown that, apart from the few
largest eigenvalues, the level density of the empirical correlation
matrix can be approximately fitted by the analytical formula
for the WE, which is valid in the limit T, M → ∞ with T/M
= const. [1]. In Ref. [5] it was demonstrated (see Fig. 3 of Ref.
[5]) that the observed deviations from the level density of the
corresponding WE are not caused by the finite size of the
empirical data set. It was argued that these deviations are due
to the influence of the few well-separated large eigenvalues.
The results of Ref. [1] have been confirmed by a series of
subsequent papers, which analyzed not only the level density but
also applied more sophisticated RMT tools that measure
the correlation properties of the spectrum of eigenvalues,
such as the nearest-neighbor distribution P(s) or the
number variance Σ²(l) (see, e.g., Refs. [11,12]). In all those
papers, empirical results were compared to the analytically
known ones of the Gaussian orthogonal ensemble
(GOE). Strictly speaking, an ensemble of empirical or random
correlation matrices does not belong to the GOE, but as
the differences in statistical properties between WE and
GOE decay rapidly as one moves away from zero in the
spectrum of the correlation matrix [5], the GOE seems to be an
appropriate choice of null hypothesis for these measures.
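The comparison of a nearest-neighbor distribution with the GOE can be sketched numerically as follows. This is a minimal example under stated assumptions, not the procedure of the cited papers: each gap between adjacent eigenvalues is divided by its own ensemble average (one simple way of unfolding, chosen here for illustration), and the pooled spacings are compared with the Wigner surmise P(s) = (π/2) s exp(−π s²/4), the standard approximation to the GOE result. Function names and parameter values are hypothetical.

```python
import numpy as np

def wishart_spectra(M, T, n_real, seed=0):
    """Ascending eigenvalues of n_real correlation matrices of
    M independent Gaussian white-noise signals of length T."""
    rng = np.random.default_rng(seed)
    spectra = np.empty((n_real, M))
    for k in range(n_real):
        X = rng.standard_normal((M, T))
        spectra[k] = np.linalg.eigvalsh(np.corrcoef(X))
    return spectra

def normalized_spacings(spectra):
    """Divide each gap between adjacent eigenvalues by the ensemble
    average of that same gap (assumed unfolding for this sketch)."""
    gaps = np.diff(spectra, axis=1)   # shape (n_real, M - 1)
    return gaps / gaps.mean(axis=0)   # every column has mean 1

def wigner_surmise(s):
    """Wigner surmise approximating the GOE nearest-neighbor distribution."""
    return 0.5 * np.pi * s * np.exp(-0.25 * np.pi * s ** 2)

spectra = wishart_spectra(M=20, T=200, n_real=500)
s = normalized_spacings(spectra)

# Pooled histogram over all gaps, compared with the surmise at bin centers.
hist, edges = np.histogram(s.ravel(), bins=30, range=(0.0, 3.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_dev = np.max(np.abs(hist - wigner_surmise(centers)))
print("largest deviation from the Wigner surmise:", max_dev)
```

Column `s[:, i]` holds the normalized spacings of the i-th gap alone, so a distribution can also be histogrammed for each gap individually instead of pooling all of them.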
Independently of the type of data considered, the application
of RMT tools seemed to confirm the statement that the
“bulk” of eigenvalues (i.e., all those below a few
well-separated ones) is strongly contaminated by noise and
consequently does not contain any valuable information. Within
the statistical errors, the nearest-neighbor statistics as well as
the results for the number variance calculated from the
empirical data coincide very well with the universal properties
of the GOE. These results led many authors to the generally
accepted conclusion that only the largest eigenvalues and
their respective eigenvectors contain relevant information,
while the remaining part of the spectrum, the bulk, is mainly
dominated by noise. Such statements are in agreement with
the philosophy of principal component analysis (PCA),
where only a certain number of the largest eigenstates of the
covariance matrix are used to project the data onto its
principal components [13].
*Electronic address: muellerm@buzon.uaem.mx
PHYSICAL REVIEW E 74, 041119 (2006)
1539-3755/2006/74(4)/041119(9) ©2006 The American Physical Society