Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 266–274, 2009. © Springer-Verlag Berlin Heidelberg 2009

A Comparison of SVD, SVR, ADE and IRR for Latent Semantic Indexing

Wen Zhang 1, Xijin Tang 2, and Taketoshi Yoshida 1

1 School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Ashahidai, Tatsunokuchi, Ishikawa 923-1292, Japan
{zhangwen,yoshida}@jaist.ac.jp
2 Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100080, P.R. China
xjtang@amss.ac.cn

Abstract. Singular value decomposition (SVD) and its variants, namely singular value rescaling (SVR), approximation dimension equalization (ADE) and iterative residual rescaling (IRR), have recently been proposed for latent semantic indexing (LSI). Although all of them rest on the same linear algebraic method for term-document matrix computation, namely SVD, their underlying motivations concerning LSI differ. In this paper, a series of experiments is conducted to examine their effectiveness for LSI in practical text mining applications, including information retrieval, text categorization and similarity measure. The experimental results demonstrate that SVD and SVR perform better than the other LSI methods in these applications. ADE and IRR, in contrast, cannot achieve good performance in text mining applications using LSI, because their approximation matrices differ too much from the original term-document matrix in Frobenius norm.

Keywords: Latent Semantic Indexing, Singular Value Decomposition, Singular Value Rescaling, Approximation Dimension Equalization, Iterative Residual Rescaling.

1 Introduction

As computer networks become the backbones of science and economy, enormous quantities of machine-readable documents become available.
The fact that about 80 percent of business is conducted on unstructured information [1] creates a great demand for efficient and effective text mining techniques, which aim to discover high-quality knowledge from unstructured information. Unfortunately, the usual logic-based programming paradigm has great difficulty capturing the fuzzy and often ambiguous relations in text documents. For this reason, text mining, also known as knowledge discovery from texts, has been proposed to deal with the uncertainty and fuzziness of language and to disclose hidden patterns (knowledge) among documents. Typically, information is retrieved by literally matching terms in documents with terms of a query. However, lexical matching methods can be inaccurate when they are