2
Mathematical Foundations Behind Latent Semantic Analysis

Dian I. Martin
Small Bear Technical Consulting, LLC

Michael W. Berry
University of Tennessee

Latent semantic analysis (LSA) is based on the concept of vector space models, an approach that uses linear algebra for effective yet automated information retrieval. The vector space model (VSM) was developed to handle text retrieval from a large information database in which the text is heterogeneous and the vocabulary varies. One of the first systems to use a traditional VSM was the System for the Mechanical Analysis and Retrieval of Text (SMART; Buckley, Allan, & Salton, 1994; Salton & McGill, 1983). Among the notable characteristics of the VSM used by SMART is the premise that the meaning of documents can be derived from their components, or types. The underlying formal mathematical model of the VSM defines a unique vector for each type and each document, and queries are performed by comparing the query representation to the representation of each document in the vector space. Query-document similarities are then based on concepts or similar semantic content (Salton, Buckley, & Allan, 1992).
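
The query procedure just described can be illustrated with a minimal sketch. It is not the SMART implementation: the toy corpus, the whitespace tokenizer, the use of raw type counts as vector entries, and the choice of cosine similarity as the comparison measure are all assumptions made for illustration, and the names `documents`, `to_vector`, `cosine`, and `rank` are hypothetical. Each document and the query become vectors over the same set of types, and the query vector is compared with every document vector.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the texts are illustrative only.
documents = {
    "d1": "retrieval of text from a large database",
    "d2": "linear algebra for vector space models",
    "d3": "text retrieval with vector space models",
}


def tokenize(text):
    """Split text into lowercase types (assumption: whitespace tokenization)."""
    return text.lower().split()


# The sorted set of types fixes the axes of the vector space.
types = sorted({t for text in documents.values() for t in tokenize(text)})


def to_vector(text):
    """Represent a text as raw type counts (assumption: no term weighting)."""
    counts = Counter(tokenize(text))
    return [counts[t] for t in types]


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query):
    """Compare the query vector with every document vector, best match first."""
    q = to_vector(query)
    scores = {name: cosine(q, to_vector(text)) for name, text in documents.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank("text retrieval")
for name, score in ranking:
    print(name, round(score, 3))
```

With this corpus, the query "text retrieval" ranks d3 first, then d1, and gives d2 a score of zero because it shares no types with the query. This sketch matches only on literal shared types; it does not capture the concept-level similarity that motivates LSA.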