Combination retrieval for creating knowledge from sparse document-collection

Naohiro Matsumura a,b,*, Yukio Ohsawa a,c,1, Mitsuru Ishizuka b,2

a PRESTO, Japan Science and Technology Corporation, Kawaguchi Center Building, 4-1-8 Honcho, Kawaguchi-Shi, Saitama 332-0012, Japan
b Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
c Graduate School of Business Science, University of Tsukuba, Tokyo, Japan

Received 13 January 2003; accepted 16 March 2005
Available online 31 May 2005

Abstract

Given the variety of human life, people are interested in various matters, each for their own unique reasons, and for such matters a machine may be a better counselor than a human. This paper proposes a method that helps users create novel knowledge by combining multiple existing documents, even if the document-collection is sparse, i.e., if a query in the domain has no corresponding answer in the collection. This novel knowledge provides an answer to a user's unique question that cannot be answered by any single recorded document. In the Combination Retriever implemented here, cost-based abduction is employed to select and combine appropriate documents into a readable and context-reflecting answer. Empirically, Combination Retriever obtained satisfactory answers to users' unique questions.
© 2005 Published by Elsevier B.V.

Keywords: Information retrieval; Cost-based abduction; Knowledge creation

1. Introduction

People are interested in personal and unique matters, e.g., a very rare health condition or friction with friends. They often hesitate to consult another person about such matters and instead worry alone. In such cases, entering these interests into a search engine and reading the returned documents is a convenient way to obtain potentially satisfactory information.
However, the document-collection of a search engine, even though it may seem to include a lot of documents, is too sparse for answering a unique question: it holds only past information, which is not sufficient for answering novel queries. To overcome this situation, a search engine should help users create knowledge from sparse documents. For this purpose, we propose a novel information retrieval method named combination retrieval. The basic idea is that an appropriate combination of existing documents may lead to the creation of novel knowledge, although each single document may fall short of answering the novel query. Based on the principle that combining ideas triggers the creation of new ideas [1], we present a system that obtains and presents an optimal combination of documents to the user, optimal in that the solution forms a document-set that is the most readable (understandable) and best reflects the user's context.

The remainder of this paper is organized as follows. In Section 2, the meaning of combination retrieval in this paper is clarified by comparison with previous information retrieval methods. The mechanism of the implemented system, Combination Retriever, is described in Section 3. Section 4 presents the experiments and results, showing the performance of Combination Retriever on medical counseling question-and-answer documents.

Knowledge-Based Systems 18 (2005) 327–333
www.elsevier.com/locate/knosys
0950-7051/$ - see front matter © 2005 Published by Elsevier B.V.
doi:10.1016/j.knosys.2005.03.003

* Corresponding author. Present address: Graduate School of Economics, Osaka University, 1–7 Machikaneyama, Toyonaka, Osaka, 560–0043, Japan.
E-mail addresses: matumura@econ.osaka-u.ac.jp (N. Matsumura), ohsawa@q.t.u-tokyo.ac.jp (Y. Ohsawa), ishizuka@miv.t.u-tokyo.ac.jp (M. Ishizuka).
1 Present address: Department of Creative Informatics, Graduate School of Information Science and Technology, The University of Tokyo.
2 Present address: Department of Quantum Engineering and System Science, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656.