Acta Numerica (2006), pp. 1–57
© Cambridge University Press, 2006
DOI: 10.1017/S0962492904
Printed in the United Kingdom

Numerical Linear Algebra in Data Mining

Lars Eldén
Department of Mathematics
Linköping University, SE-581 83 Linköping, Sweden
E-mail: laeld@math.liu.se

Ideas and algorithms from numerical linear algebra are important in several areas of data mining. We give an overview of linear algebra methods in text mining (information retrieval), pattern recognition (classification of hand-written digits), and Pagerank computations for web search engines. The emphasis is on rank reduction as a method of extracting information from a data matrix, low rank approximation of matrices using the singular value decomposition and clustering, and on eigenvalue methods for network analysis.

CONTENTS
1 Introduction
2 Vectors and Matrices in Data Mining
3 Data Compression: Low Rank Approximation
4 Text Mining
5 Classification and Pattern Recognition
6 Eigenvalue Methods in Data Mining
7 New Directions
References

1. Introduction

1.1. Data Mining

In modern society huge amounts of data are stored in databases with the purpose of extracting useful information. Often it is not known when the data are collected what information is going to be requested, and therefore the database is often not designed for the distillation of any particular information, but rather it is to a large extent unstructured. The science