Information Processing & Management. Printed in Great Britain. 1990. 0306-4573/90 $3.00 + .00. Copyright © 1990 Pergamon Press plc

SILOL: A SIMPLE LOGICAL-LINGUISTIC DOCUMENT RETRIEVAL SYSTEM

TENGKU M. T. SEMBOK* and C. J. VAN RIJSBERGEN
Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, United Kingdom

(Received 26 April 1989; accepted in final form 6 June 1989)

Abstract - This paper introduces a logical-linguistic model of document retrieval systems and describes an implementation of a system called SILOL which is based on this model. SILOL uses a shallow semantic translation of natural language texts into a first order predicate representation in performing a document indexing and retrieval process. Some preliminary experiments have been carried out to test the retrieval effectiveness of this system. The results obtained show improvements in the level of retrieval effectiveness, which demonstrate that the approach of using a semantic theory of natural language and logic in document retrieval systems is a valid one.

1. INTRODUCTION

Until now, almost all of the work in information retrieval (IR) has been based on the assumption that a formal notion of meaning is not required to solve IR problems. The keywords approach, where absence or presence of keywords and their distributions are the only information being considered, has been typically assumed by many researchers to be sufficient. However, some have concluded that this assumption is wrong [1]. The keywords approach with statistical techniques has reached its theoretical limit and further attempts for improvement are considered a waste of time. Progress towards new models which incorporate the notion of meaning has been very slow. It has been suggested that some attempt should be made to develop a naive model which uses more than just keywords as the content of each document in the system.
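The limitation of a pure presence-or-absence keyword representation can be illustrated with a minimal sketch. It is not taken from SILOL: the function `keyword_index`, the small stop list, the two example sentences, and the hand-written predicate tuples are all assumptions introduced here purely for illustration.

```python
# Illustrative sketch only; this is not SILOL's indexing procedure.

def keyword_index(text):
    """Binary keyword representation: the set of content words present."""
    stop_words = {"the", "a", "an"}  # hypothetical, deliberately tiny stop list
    return {word for word in text.lower().split() if word not in stop_words}

# Two hypothetical texts with the same words but different meanings.
doc_a_text = "the user queries the system"
doc_b_text = "the system queries the user"

# Hand-written predicate-style forms, predicate(arg1, arg2), stored as tuples.
# A real system would derive these by semantic translation of the text.
doc_a_logic = ("queries", "user", "system")
doc_b_logic = ("queries", "system", "user")

# Keyword presence cannot tell the two texts apart ...
same_keywords = keyword_index(doc_a_text) == keyword_index(doc_b_text)
# ... whereas the predicate forms keep track of who does what to whom.
same_logic = doc_a_logic == doc_b_logic

print(same_keywords)
print(same_logic)
```

Under these assumptions the two texts receive identical keyword sets while their predicate forms differ, which is the kind of distinction a representation carrying more than keywords is meant to preserve.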
This paper is a first attempt to describe a document retrieval system which is based on a simple logical-linguistic framework. In this system the indexing of documents and queries is achieved through semantic translation of natural language into a first order predicate representation. A retrieval strategy is based on logical implication using Prolog matching and unification primitives coupled with meta-level constructs to handle uncertainty in evaluating similarity values between documents and queries.

2. DEFINITION

IR can be defined broadly as the study of how to determine and retrieve from a corpus of stored information the portions which are responsive to particular information needs. Let us assume that there is a store consisting of a large collection of information on some particular topic, or a combination of various topics. The information may be stored in a highly structured form or in an unstructured form, depending upon its application. A user of the store, at times, seeks certain information which he may not know. He, therefore, has to express his information need as a request for information in one form or another. Thus IR is concerned with the determining and retrieving of information that is relevant or likely to be relevant to his information need as expressed by his request. Some recent research in IR has demonstrated a wide range of topics encompassed by this definition, e.g., document retrieval systems, database management systems, office automation, question-answering systems, expert systems, etc. [2].

*T. M. T. Sembok is a lecturer in the Department of Computer Science, National University of Malaysia, Bangi, Malaysia, and currently on study leave at University of Glasgow supported by Malaysian Grant.