Computational Biology and Bioinformatics 2014; 2(4): 57-62
Published online September 30, 2014 (http://www.sciencepublishinggroup.com/j/cbb)
doi: 10.11648/j.cbb.20140204.12
ISSN: 2330-8265 (Print); ISSN: 2330-8281 (Online)

Application of hypercorrelated matrices in ecological research

Branko Karadžić, Snežana Jarić, Pavle Pavlović, Saša Marinković, Miroslava Mitrović

Institute for Biological Research, ‘Siniša Stanković’ of Belgrade University, Despota Stefana 142, 11000, Belgrade, Serbia

Email address: branko@ibiss.bg.ac.rs (B. Karadžić), nena2000@ibiss.bg.ac.rs (S. Jarić), ppavle@ibiss.bg.ac.rs (P. Pavlović), grifon@ibiss.bg.ac.rs (S. Marinković), mmit@ibiss.bg.ac.rs (M. Mitrović)

To cite this article: Branko Karadžić, Snežana Jarić, Pavle Pavlović, Saša Marinković, Miroslava Mitrović. Application of Hypercorrelated Matrices in Ecological Research. Computational Biology and Bioinformatics. Vol. 2, No. 4, 2014, pp. 57-62. doi: 10.11648/j.cbb.20140204.12

Abstract: Ecological data matrices often require some form of pre-processing so that undesirable effects (e.g. the variable size effect) may be removed from multivariate analyses. This paper describes hypercorrelation, a simple data transformation that improves ordination methods significantly. Hypercorrelated matrices efficiently eliminate the ‘arch’ (or Guttman) effect, a spurious polynomial relation between ordination axes. These matrices reduce the sensitivity of correspondence analysis to outliers. Canonical analyses (canonical correspondence analysis and redundancy analysis) of hypercorrelated matrices are resistant to the undesirable effects of missing data. Finally, hypercorrelation extends the applicability of “linear ordination methods” (principal components analysis and redundancy analysis) to sparse (high beta diversity) matrices.

Keywords: Arch Effect, Beta Diversity, (Canonical) Correspondence Analysis, Hypercorrelation, Missing Data, Outliers, Principal Components Analysis, Redundancy Analysis

1. Introduction

Correspondence analysis (CA) and principal components analysis (PCA), together with their canonical forms, are the most frequently used ordination methods in ecology [1-6]. A spurious polynomial relation between ordination axes (the arch effect or Guttman effect) is a well-known drawback of CA and PCA. Compared to CA, PCA is more sensitive to the arch effect, especially when beta diversity (species turnover) along spatial or environmental gradients is high. Therefore, principal components analysis and its canonical variant (redundancy analysis) are inappropriate for analyses of long environmental gradients.

Sensitivity to ‘outliers’ is another fault of CA [1-6]. Application of CA to matrices with sparse vectors often produces uninterpretable results. In such cases, CA highlights the importance of outliers (sparse vectors), obscuring the remaining data variability. Application of canonical correspondence analysis (CCA) and redundancy analysis (RDA) to matrices with missing data may produce quite distorted and ecologically uninterpretable results.

In this article, we propose a simple solution for these problems. The solution is based on hypercorrelated matrices. Comparative tests with simulated data revealed that hypercorrelated matrices significantly improve the performance of (canonical) correspondence analysis, PCA and RDA.

2. Decorrelated and Hypercorrelated Matrices

Suppose that X (n × m) is a matrix that describes the distribution of n species in m sites. The matrix specifies the position of m points in n-dimensional Euclidean space. The axes of the reference space are mutually orthogonal, but not necessarily independent. We may assess the statistical dependence between two variables using either the squared or the absolute value of the Pearson correlation coefficient. Both quantities may vary from 0 (if the two variables are linearly uncorrelated, as is the case when they are statistically independent) to 1 (if the two variables are linearly dependent and perfectly correlated).
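As an illustration of the dependence measures just described, the following minimal sketch computes the absolute and squared Pearson correlation between every pair of rows (species) of a small matrix. The function name `row_dependence` and the 3 × 5 abundance matrix are hypothetical and chosen only for this example; they are not taken from the paper, and the sketch assumes that no row is constant across sites (a constant row has zero variance and an undefined correlation).

```python
import numpy as np

def row_dependence(X):
    """Return (|r|, r^2) for the Pearson correlations between rows of X.

    Rows are treated as variables (species) and columns as
    observations (sites), matching the n x m layout of X in the text.
    """
    X = np.asarray(X, dtype=float)
    r = np.corrcoef(X)          # n x n matrix of row-by-row correlations
    return np.abs(r), r ** 2

# Hypothetical abundance matrix: 3 species (rows) in 5 sites (columns).
X = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],   # exact multiple of species 1
    [5.0, 1.0, 4.0, 2.0, 3.0],    # weakly related to species 1
])

abs_r, r2 = row_dependence(X)
print(np.round(abs_r, 3))
print(np.round(r2, 3))
```

In this example species 1 and 2 are linearly dependent, so both measures equal 1 for that pair, whereas species 1 and 3 have r = -0.3, giving |r| = 0.3 and r² = 0.09. Both measures discard the sign of the correlation, which is why they range from 0 to 1.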
We may either eliminate or increase linear dependence between rows of X. Decorrelation and hypercorrelation, two opposite processes that reduce and increase statistical dependence between variables, may be performed using