A framework for a feedback process to analyze and personalize a document vector space in a feature extraction model

Kosuke Takano · Xing Chen · Keisuke Masuda

Published online: 26 July 2009
© Springer Science+Business Media, LLC 2009

Abstract In this paper, we present a framework for a feedback process to implement a highly accurate document retrieval system. In the system, a document vector space is created dynamically to implement retrieval processing. The retrieval accuracy of the system depends on this vector space: when the vector space is created based on a specific purpose and interest of a user, highly accurate retrieval results can be obtained. We present a method for analyzing and personalizing the vector space according to the purposes and interests of users. In order to optimize the document vector space, we defined and implemented functions for the operations of adding, deleting and weighting the terms that were used to create the vector space. By effectively and dynamically exploiting the classified-document information related to the queries, our methods allow users to retrieve documents relevant to their interests and purposes. Even if the search results of the initial retrieval space are not appropriate, applying the proposed feedback operations effectively improves the search results. We also implemented an experimental search system for semantic document retrieval. Several experimental results, including comparisons of our method with the traditional relevance feedback method, are presented to clarify how the feedback process improved retrieval accuracy and how accurately documents that satisfied the purposes and interests of users were extracted.

Keywords Information retrieval · Semantic search · Vector space model · Feedback method · Feature extraction

1 Introduction

In advanced computer, network, and database environments, increasingly large amounts of digital document data are being generated.
It is very important to be able to search and retrieve appropriate information from these huge information resources so as to satisfy, as nearly as possible, the intentions, purposes and situations of searchers.

In the research area of information retrieval, it is generally known that search methods in vector space models effectively allow users to access required information based on similarities between the query and the documents being searched [1, 25, 30]. However, most search methods in the vector space model rely on the set of the user's search keywords and their term-weighting values, such as term frequency (TF) and inverse document frequency (IDF), in the retrieval process. Retrieving the required documents based on their meaning and content thus remains a challenging task. Latent Semantic Indexing (LSI) [8] achieves semantic document retrieval by reducing the dimensionality of a retrieval vector space; however, LSI relies on calculating the singular value decomposition (SVD) to create the retrieval vector space, and SVD is known to be computationally expensive.

K. Takano (✉) · X. Chen
Department of Information and Computer Sciences, Faculty of Information Technology, Kanagawa Institute of Technology, 1030 Shimo-ogino, Atsugi, Kanagawa 243-0292, Japan
e-mail: takano@ic.kanagawa-it.ac.jp

X. Chen
e-mail: chen@ic.kanagawa-it.ac.jp

K. Masuda
Graduate School of Information and Computer Sciences, Kanagawa Institute of Technology, 1030 Shimo-ogino, Atsugi, Kanagawa 243-0292, Japan
e-mail: s065819@cce.kanagawa-it.ac.jp

Inf Technol Manag (2009) 10:151–176
DOI 10.1007/s10799-009-0055-4

We have proposed a feature extraction model (FEM) [5], which offers a light-weight method for creating a