IRQAS: Information Retrieval and Question Answering System Based on A Unified Logical-Linguistic Model

TENGKU M. T. SEMBOK*
HALIMAH BADIOZE ZAMAN
RABIAH ABDUL KADIR
*National Defence University of Malaysia
Kem Sungai Besi, Kuala Lumpur, Malaysia 57000
MALAYSIA

Abstract: - Many existing search engines lack an important capability: deducing an answer to a query from information that resides in various parts of documents. The levels-of-processing theory proposes that there are many ways to process and code information, and thus the knowledge representations used as surrogates for documents are qualitatively different. The capability of deduction depends heavily on the knowledge representation framework used. We propose a unified logical-linguistic model as a knowledge representation framework that serves as a basis both for indexing documents and for the deduction capability needed to answer queries. The approach applies semantic analysis to transform and normalise information from natural language texts into a declarative, knowledge-based representation in first-order predicate logic. Retrieval of relevant information can then be performed through plausible logical implication, and queries are answered using a theorem proving technique. This paper elaborates on the model and how it is used in information retrieval and question answering as one unified model.

Key-Words: - Search Engines, Information Retrieval, Question Answering System, Theorem Proving.

1 Introduction
Information Retrieval (IR) can be defined broadly as the study of how to determine and retrieve, from a corpus of stored information, the portions that are relevant to particular information needs. Let us assume that there is a store consisting of a large collection of information on some particular topic, or on a combination of various topics. The information may be stored in a highly structured form or in an unstructured form, depending upon its application.
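As a rough illustration of the deduction capability motivated in the abstract, the sketch below combines two facts that could come from different parts of a document to answer a question that neither fact answers alone. It is only a minimal sketch under stated assumptions: the facts, the single rule, the names `ali`, `badrul` and `chong`, and all function names are hypothetical, and naive forward chaining over Horn-style rules stands in for the theorem proving technique, which this section does not specify.

```python
from itertools import product

# Hypothetical facts, as if extracted from two different sentences of a document.
FACTS = {
    ("father", "ali", "badrul"),    # "Ali is the father of Badrul."
    ("father", "badrul", "chong"),  # "Badrul is the father of Chong."
}

# One hypothetical rule: father(X, Y) and father(Y, Z) -> grandfather(X, Z).
# By convention in this sketch, terms starting with an uppercase letter are variables.
RULES = [
    ([("father", "X", "Y"), ("father", "Y", "Z")], ("grandfather", "X", "Z")),
]


def is_var(term):
    """A term is a variable if it starts with an uppercase letter."""
    return term[:1].isupper()


def match(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact; return None on failure."""
    if len(pattern) != len(fact):
        return None
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_var(p):
            # Bind the variable, or check consistency with an earlier binding.
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def substitute(pattern, bindings):
    """Replace bound variables in pattern with their values."""
    return tuple(bindings.get(t, t) for t in pattern)


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Try every assignment of known facts to the rule's conditions.
            for combo in product(list(known), repeat=len(body)):
                bindings = {}
                for pattern, fact in zip(body, combo):
                    bindings = match(pattern, fact, bindings)
                    if bindings is None:
                        break
                if bindings is not None:
                    derived = substitute(head, bindings)
                    if derived not in known:
                        known.add(derived)
                        changed = True
    return known


def answer(query, facts, rules):
    """Return the variable bindings for every derivable fact matching the query."""
    results = []
    for fact in forward_chain(facts, rules):
        bindings = match(query, fact, {})
        if bindings is not None:
            results.append(bindings)
    return results


# "Who is the grandfather of Chong?"
print(answer(("grandfather", "Who", "chong"), FACTS, RULES))
```

A keyword match on "grandfather" would find nothing in either source sentence; the answer only appears once the two facts are joined through the rule, which is the kind of capability the abstract says many search engines lack.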
A user of the store at times seeks information that he does not yet know in order to solve a problem. He therefore has to express his information need as a request for information in one form or another. Thus IR is concerned with determining and retrieving the information that is relevant to his information need, as expressed by his request and translated into a query that conforms to the specific information retrieval system (IRS) used.

An IRS normally stores surrogates of the actual documents to represent the documents and the information stored in them [1] (Mizzaro 1998). The information content of the surrogates is one of the main factors that influence the effectiveness of an information retrieval system. We have used a logic-based representation to build the surrogates in order to incorporate semantic information about the document [2] (Sowa 2000).

After the user has obtained the portion of information that the system deems relevant, the user might want to investigate its content further by asking more specific questions and receiving specific answers, together with the sentences that support those answers. This kind of request for specific information brings us into the arena of question answering (QA) systems [3] (McGuinnes 2004). Question answering comprises reading comprehension tasks that demonstrate the system's understanding of the document and provide a means to show and build up the meaning representation, based on syntactic and semantic knowledge as well as world knowledge of the domain under investigation. For this purpose we have used the same knowledge representation paradigm as for the surrogates, which we term a Unified Logical Model.

In this paper we will describe how document surrogates are built and used to retrieve documents relevant to a query.

7th WSEAS Int. Conf. on ARTIFICIAL INTELLIGENCE, KNOWLEDGE ENGINEERING and DATA BASES (AIKED'08), University of Cambridge, UK, Feb 20-22, 2008 ISSN: 1790-5109 Page 460 ISBN: 978-960-6766-41-1

Next we will elaborate how the same surrogates can be used to