International Journal of Medical Informatics 53 (1999) 1 – 28
Discourse structures in medical reports — Watch out!
The generation of referentially coherent and valid text
knowledge bases in the MEDSYNDIKATE system
Udo Hahn a,*, Martin Romacker a,b, Stefan Schulz a,b
a Freiburg University, Computational Linguistics Lab, Werthmannplatz 1, D-79085 Freiburg, Germany
b Department of Medical Informatics, Freiburg University Hospital, Stefan-Meier-Str. 26, D-79104 Freiburg, Germany
Received 15 February 1998; received in revised form 20 March 1998; accepted 25 March 1998
Abstract
The automatic analysis of medical narratives currently suffers from neglecting text structure phenomena such as
referential relations between discourse units. This has unwarranted effects on the descriptional adequacy of medical
knowledge bases automatically generated from texts. The resulting representation bias can be characterized in terms
of incomplete, artificially fragmented and referentially invalid knowledge structures. We focus here on four basic types
of textual reference relations, viz. pronominal and nominal anaphora, textual ellipsis, and metonymy, and show how
to deal with them in an adequate text parsing device. Since the types of reference relations we discuss show an
increasing dependence on conceptual background knowledge, we stress the need for formally grounded, expressive
conceptual representation systems for medical knowledge. Our suggestions are based on experience with MEDSYNDIKATE,
a medical text knowledge acquisition system designed to properly deal with various sorts of discourse
structure phenomena. © 1999 Elsevier Science Ireland Ltd. All rights reserved.
Keywords: Natural language processing: text understanding; Knowledge acquisition from texts; Knowledge representation: description logics; Ontology and terminology: pathology domain
1. Introduction
With the overall diffusion of electronic text processing technology in clinical offices and
at the physician’s workplace and, more recently, the unlimited access to text resources
on the Internet, a vast potential for medical information supply arises. The natural language
processing community, therefore, faces the challenge of meeting the requirements of
cursory as well as in-depth analysis of large
* Corresponding author. Tel.: +49 761 2033255; fax: +49
761 2033251; e-mail: hahn@coling.uni-freiburg.de
1386-5056/99/$ - see front matter © 1999 Elsevier Science Ireland Ltd. All rights reserved.
PII S1386-5056(98)00091-4