ARISTA: knowledge engineering with
scientific texts
J Kontos
The paper presents results of experiments in knowledge engineering
with scientific texts using the ARISTA method. ARISTA stands for
Automatic Representation Independent Syllogistic Text Analysis. The
method uses natural language text directly as a knowledge base, in
contrast with the prevailing approach, which relies on translating
texts into some knowledge representation formalism. The experiments
demonstrate the feasibility of deductive question-answering and
explanation generation directly from texts, mainly involving causal
reasoning. Illustrative examples of the operation of a prototype
based on the ARISTA method and implemented in Prolog are presented.
Keywords: knowledge engineering, knowledge representation, natural
language processing
Reasoning is discourse in which given some premises
something different from the given necessarily follows
from the premises.
(Aristotle, Topics, 4th century BC)
Traditionally, knowledge engineering involves the elici-
tation of knowledge from domain experts by a know-
ledge engineer and the 'manual' or rather 'mental' for-
malization of this knowledge using a knowledge
representation formalism. The result of such an activity
is a knowledge base that can be processed by an inference
engine to generate answers and explanations as a res-
ponse to user questions. Experts are, however, often
unavailable, and therefore an alternative is to use texts as
a supplementary source of knowledge. The automatic
processing of such texts may provide at least a
partial solution to the knowledge acquisition bottleneck
problem of expert systems.
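
To make the traditional arrangement concrete, the following is a minimal
sketch in Python of a hand-formalized knowledge base processed by a small
backward-chaining inference engine that returns an explanation with its
answer. The rules, facts and the name prove are hypothetical and chosen
only for illustration; they do not come from any system discussed in this
paper, and the sketch assumes that the rules contain no cycles.

```python
# Hypothetical knowledge base, formalized by hand:
# each rule is (conclusion, [premises]).
RULES = [
    ("engine_overheats", ["coolant_low", "engine_running"]),
    ("coolant_low", ["radiator_leaks"]),
]
FACTS = {"radiator_leaks", "engine_running"}


def prove(goal, rules=RULES, facts=FACTS):
    """Backward chaining over acyclic rules.

    Returns a list of explanation steps if the goal can be derived,
    or None if it cannot.
    """
    if goal in facts:
        return [f"{goal} is a given fact"]
    for conclusion, premises in rules:
        if conclusion != goal:
            continue
        steps = []
        for premise in premises:
            sub_steps = prove(premise, rules, facts)
            if sub_steps is None:
                break  # this rule fails; try the next rule
            steps.extend(sub_steps)
        else:
            # every premise was proved, so the rule fires
            steps.append(f"{goal} follows from {' and '.join(premises)}")
            return steps
    return None


if __name__ == "__main__":
    for line in prove("engine_overheats") or ["no answer found"]:
        print(line)
```

Every rule and fact in such a system has to be written down by the
knowledge engineer, which is the effort that text-based acquisition aims
to reduce.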
A new discipline is therefore emerging,
which may be called 'Knowledge Engineering with Texts'
(KET). The work reported in this paper addresses KET
with scientific texts. An important advantage of KET
with scientific texts such as textbooks and journal articles
is that these texts are widely available. In particular, the
recent appearance of electronic editions of scientific texts
further facilitates their automatic processing by
computer.
Department of Informatics, Athens University of Economics and
Business, 76 Pattission Street, Athens 10434, Greece
Some initial experiments toward the development of
KET with scientific texts are presented here, applying the
Automatic Representation Independent Syllogistic Text
Analysis (ARISTA) method, which differs from the
methods followed by the prevailing approach.
KNOWLEDGE ACQUISITION FROM
SCIENTIFIC TEXTS
The development of systems able to read
scientific texts and assimilate or acquire the knowledge
contained in them may help to solve the knowledge
engineering problems posed by the knowledge
acquisition bottleneck in the creation of scientific expert
systems. The prevailing approach in the natural language
processing literature uses methods that rely on the
translation of texts into some knowledge representation
formalism.
As an illustrative example of the prevailing approach,
a system called GREKA, which uses attribute grammars
for the representation of the text content [1,2], will be briefly
reviewed. GREKA has been applied to the acquisition of
causal and other forms of knowledge from texts.
The method followed in GREKA involves:
• the analysis of scientific texts by considering types of
sentences that express causal and other relations
between processes as well as relations between parts of
objects
• the location of the prerequisite background and
domain knowledge necessary for answering questions
• the translation of the texts delivering the knowledge
into an attribute grammar, with its syntactic part
modelling the structural knowledge and its 'semantic'
part modelling the rest of the knowledge
• the generation of answers and their explanations from
the grammar generated in terms of the relations of
processes and entities involved, building on the early
question-answering work done by the author [3].
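
The general shape of these steps can be illustrated with a deliberately
simplified Python sketch. It is not GREKA: in place of an attribute
grammar it uses a single regular expression and a dictionary as the
intermediate representation, and it recognizes only one sentence type,
'X causes Y.'. The sample text and the names TEXT, CAUSAL, extract_causes
and explain are all hypothetical.

```python
import re

# Hypothetical toy text; not taken from any corpus used in the paper.
TEXT = (
    "Heating causes expansion. "
    "Expansion causes cracking. "
    "Cooling causes contraction."
)

# A single sentence type: "<process> causes <process>."
CAUSAL = re.compile(r"(\w+) causes (\w+)\.", re.IGNORECASE)


def extract_causes(text):
    """Translate the text into a mapping from each effect to its stated cause."""
    return {effect.lower(): cause.lower() for cause, effect in CAUSAL.findall(text)}


def explain(effect, causes):
    """Follow cause links backwards from an effect and return the chain found."""
    chain = []
    seen = set()  # guards against circular statements in the text
    current = effect.lower()
    while current in causes and current not in seen:
        seen.add(current)
        chain.append(f"{causes[current]} causes {current}")
        current = causes[current]
    return chain


if __name__ == "__main__":
    relations = extract_causes(TEXT)
    # Answers "why cracking?" with the chain of causal statements found.
    print(explain("cracking", relations))
```

Answers here are generated from the translated representation rather
than from the sentences themselves, which is the defining feature of the
prevailing approach.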
The translation performed by the system is based on lin-
guistic knowledge, which consists of the following parts:
• rules that recognize the syntactic structures encoun-
tered in the body of the scientific texts used
• semantic knowledge necessary for analysing these syn-
tactic structures
• lexical knowledge
Vol 34 No 9 September 1992 0950-5849/92/090611~)6 © 1992 Butterworth-Heinemann Ltd 611