ARISTA: knowledge engineering with scientific texts

J Kontos

The paper presents results of experiments in knowledge engineering with scientific texts using the ARISTA method. ARISTA stands for Automatic Representation Independent Syllogistic Text Analysis. This method uses natural language text as a knowledge base, in contrast with the methods of the prevailing approach, which rely on translating texts into some knowledge representation formalism. The experiments demonstrate the feasibility of deductive question answering and explanation generation, mainly involving causal reasoning, directly from texts. Illustrative examples of the operation of a prototype based on the ARISTA method and implemented in Prolog are presented.

Keywords: knowledge engineering, knowledge representation, natural language processing

Reasoning is discourse in which, given some premises, something different from the given necessarily follows from the premises. (Aristotle, Topics, 4th century BC)

Traditionally, knowledge engineering involves the elicitation of knowledge from domain experts by a knowledge engineer and the 'manual', or rather 'mental', formalization of this knowledge using a knowledge representation formalism. The result of such an activity is a knowledge base that can be processed by an inference engine to generate answers and explanations in response to user questions.

Experts are, however, often unavailable, so an alternative is to use texts as a supplementary source of knowledge. The automatic processing of such texts may provide at least a partial solution to the knowledge acquisition bottleneck of expert systems. A new discipline is therefore emerging, which may be called 'Knowledge Engineering with Texts' (KET). The work reported in this paper addresses KET with scientific texts.
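To make the idea of using text directly as a knowledge base more concrete, the following Python sketch (not the author's Prolog prototype) keeps a list of text sentences as the knowledge base and answers a causal question by chaining sentences at question time, returning the sentences it used as the explanation. The single sentence form `<cause> causes <effect>`, the function name `explain_cause`, and the sample sentences are hypothetical assumptions; the paper's actual syntactic coverage and inference procedure are not reproduced here.

```python
import re
from collections import deque

# Hypothetical pattern for one causal sentence form: "<cause> causes <effect>."
# Sentences are matched only when a question is asked; nothing is translated in advance.
CAUSAL = re.compile(r"^(?P<cause>.+?) causes? (?P<effect>.+?)\.?$", re.IGNORECASE)


def explain_cause(text_sentences, cause, effect):
    """Return the chain of text sentences linking cause to effect, or None.

    Breadth-first chaining: each step applies the syllogism
    'A causes B' and 'B causes C', therefore 'A causes C'.
    """
    start = cause.lower()
    goal = effect.lower()
    queue = deque([(start, [])])  # (current phrase, sentences used so far)
    visited = {start}
    while queue:
        current, chain = queue.popleft()
        if current == goal:
            return chain  # the sentences themselves serve as the explanation
        for sentence in text_sentences:
            match = CAUSAL.match(sentence.strip())
            if match and match.group("cause").lower() == current:
                nxt = match.group("effect").lower()
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append((nxt, chain + [sentence]))
    return None  # no chain of sentences supports the question
```

For example, with the hypothetical text `["Heating causes expansion of the gas.", "Expansion of the gas causes a rise in pressure."]`, asking `explain_cause(text, "heating", "a rise in pressure")` returns both sentences, which double as the explanation. Matching causes and effects by exact string equality is a deliberate simplification; real scientific text would also require handling of paraphrase and anaphora.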
An important advantage of KET with scientific texts such as textbooks and journal articles is that these texts are widely available. In particular, the recent appearance of electronic editions of scientific texts further facilitates their automatic processing by computer.

Department of Informatics, Athens University of Economics and Business, 76 Pattission Street, Athens 10434, Greece

Some initial experiments toward the development of KET with scientific texts are presented here. They apply the Automatic Representation Independent Syllogistic Text Analysis (ARISTA) method, which differs from the methods of the prevailing approach.

KNOWLEDGE ACQUISITION FROM SCIENTIFIC TEXTS

The development of systems able to read scientific texts and assimilate, or acquire, the knowledge they contain may help to overcome the knowledge acquisition bottleneck in the creation of scientific expert systems.

The prevailing approach in the natural language processing literature uses methods that rely on the translation of texts into some knowledge representation formalism. As an illustrative example of this approach, GREKA, a system that uses attribute grammars to represent the content of texts [1, 2], will be briefly reviewed. GREKA has been applied to the acquisition of causal and other forms of knowledge from texts.
The method followed in GREKA involves:
• the analysis of scientific texts by considering the types of sentences that express causal and other relations between processes, as well as relations between parts of objects
• the location of the prerequisite background and domain knowledge necessary for answering questions
• the translation of the texts conveying the knowledge into an attribute grammar, whose syntactic part models the structural knowledge and whose 'semantic' part models the rest of the knowledge
• the generation, from the resulting grammar, of answers and their explanations in terms of the relations between the processes and entities involved, building on the author's early question-answering work [3].

The translation performed by the system is based on linguistic knowledge, which consists of the following parts:
• rules that recognize the syntactic structures encountered in the body of the scientific texts used
• semantic knowledge necessary for analysing these syntactic structures
• lexical knowledge

Vol 34 No 9 September 1992 0950-5849/92/090611-06 © 1992 Butterworth-Heinemann Ltd 611
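For contrast with ARISTA, the translation step of the prevailing approach can be pictured as converting each sentence into a relation fact before any question is asked. The Python below is a heavily simplified, hypothetical stand-in and does not implement GREKA's attribute grammars: a dictionary of verb phrases plays the role of the lexical knowledge, active and passive phrase matching plays the role of the syntactic rules, and the `(relation, argument, argument)` tuple plays the role of the semantic representation. The names `LEXICON`, `PASSIVE`, `translate` and `translate_text` are assumptions introduced for illustration.

```python
# Hypothetical lexical knowledge: active verb phrase -> relation name.
LEXICON = {"causes": "cause", "produces": "cause", "is part of": "part_of"}

# Hypothetical passive verb phrases; their arguments appear in reverse order.
PASSIVE = {"is caused by": "cause", "is produced by": "cause"}


def _by_length(table):
    """Try longer phrases first so that longer phrases take precedence over shorter ones."""
    return sorted(table.items(), key=lambda kv: len(kv[0]), reverse=True)


def translate(sentence):
    """Translate one sentence into a (relation, arg1, arg2) fact, or None."""
    text = sentence.strip().rstrip(".")
    # Passive rule: "<effect> is caused by <cause>" -> (cause, <cause>, <effect>).
    for phrase, relation in _by_length(PASSIVE):
        left, sep, right = text.partition(f" {phrase} ")
        if sep:
            return (relation, right.lower(), left.lower())
    # Active rule: "<cause> causes <effect>" -> (cause, <cause>, <effect>).
    for phrase, relation in _by_length(LEXICON):
        left, sep, right = text.partition(f" {phrase} ")
        if sep:
            return (relation, left.lower(), right.lower())
    return None  # no syntactic rule recognizes the sentence


def translate_text(sentences):
    """Build the formal knowledge base up front, discarding unrecognized sentences."""
    return [fact for fact in map(translate, sentences) if fact is not None]
```

In this sketch, "The heater produces heat." and "Heat is produced by the heater." both translate to `("cause", "the heater", "heat")`, and questions would then be answered against the list of facts rather than against the text. A sentence that no rule recognizes, such as "The pump is red.", is simply dropped, so the coverage of the translation rules bounds what such a system can later answer.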