arXiv:1710.10124v1 [math.ST] 27 Oct 2017

Quantifying the Estimation Error of Principal Components

Raphael Hauser*, Jüri Lember†, Heinrich Matzinger‡, Raul Kangro§

July 19, 2018

* Mathematical Institute, Radcliffe Observatory Quarter, Woodstock Road, Oxford OX2 6GG, U.K.; Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, U.K.; Pembroke College, St Aldates, Oxford OX1 1DW, U.K.; hauser@maths.ox.ac.uk
† Institute of Mathematics and Statistics, University of Tartu, Tartu, Estonia
‡ School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA; matzi@math.gatech.edu
§ Institute of Mathematics and Statistics, University of Tartu, Tartu, Estonia

Abstract

Principal component analysis is an important pattern recognition and dimensionality reduction tool in many applications. Principal components are computed as eigenvectors of a maximum likelihood covariance $\hat{\Sigma}$ that approximates a population covariance $\Sigma$, and these eigenvectors are often used to extract structural information about the variables (or attributes) of the studied population. Since PCA is based on the eigendecomposition of the proxy covariance $\hat{\Sigma}$ rather than the ground-truth $\Sigma$, it is important to understand the approximation error in each individual eigenvector as a function of the number of available samples. The combination of recent results in [7] and [9] yields such bounds. In the present paper we sharpen these bounds and show that eigenvectors can often be reconstructed to a required accuracy from a sample of strictly smaller size order.

1 Introduction

Consider a random row vector $X = [X_1, X_2, \dots, X_p]$, defined over a probability space $(\Omega, P)$ and representing a data sample of $p$ different items of interest, such as the returns of $p$ different financial assets over a given investment period, the relative frequencies of $p$ different words in a randomly chosen text, or the expression rates of $p$ different genes in a cell line exposed to a randomly chosen chemical compound. In many applications the data are approximately Gaussian with some unknown ground-truth covariance matrix