Exploiting the feature vector model for learning linguistic representations of relational concepts

Roberto Basili, Maria Teresa Pazienza, Fabio Massimo Zanzotto
University of Rome "Tor Vergata", Department of Computer Science, Systems and Production, 00133 Roma (Italy)
{basili, pazienza, zanzotto}@info.uniroma2.it

Abstract

In this paper we focus our attention on the construction of one-to-many mappings between coarse-grained relational concepts and their corresponding linguistic realisations, with an eye to the problem of selecting the catalogue of coarse-grained relational concepts. We explore the extent and nature of the general semantic knowledge required for the task and, consequently, the usability of general-purpose resources such as WordNet. We propose an original model, the verb semantic prints, for exploiting ambiguous semantic information within the feature vector model.

1 Introduction

Relational concepts and their linguistic realisations are very relevant parts of semantic dictionaries. These equivalence classes, often called semantic frames, may enable sophisticated natural language processing applications, as argued in [7] among others. For example, take the relational concept have-revenues(AGENT:X, AMOUNT:Y, TIME:Z) and two related "generalised" forms: X has a positive net income of Y in Z and X reports revenues of Y for Z. This mapping would help in finding answers to very specific factoid questions such as "Which company had a positive net income in the financial year 2001?" using text fragments such as "Acme Inc. reported revenues of $.9 million for the year ended in December 2001.".

Information Extraction (IE) is based on this notion: templates are relational concepts, and extraction patterns are linguistic realisations of templates or, possibly, of intermediate relational concepts, i.e. events. Regardless of the techniques used, IE is a semantic-oriented application.
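The mapping from surface forms to a relational concept can be sketched as follows. This is a toy illustration: the slot names, the have-revenues label, and the two surface forms come from the example above, while the regular expressions and the helper function are our own assumptions, not part of the original model.

```python
import re

# Two hypothetical surface patterns realising the same relational
# concept have-revenues(AGENT, AMOUNT, TIME).
PATTERNS = [
    re.compile(r"(?P<AGENT>[\w .]+?) has a positive net income of "
               r"(?P<AMOUNT>\$?[\d.,]+ ?\w*) in (?P<TIME>.+)"),
    re.compile(r"(?P<AGENT>[\w .]+?) reported revenues of "
               r"(?P<AMOUNT>\$?[\d.,]+ ?\w*) for (?P<TIME>.+)"),
]

def extract_have_revenues(sentence):
    """Return the slot fillers if any realisation pattern matches."""
    for pattern in PATTERNS:
        match = pattern.search(sentence)
        if match:
            return {"concept": "have-revenues", **match.groupdict()}
    return None

# Prints the filled AGENT, AMOUNT and TIME slots for the text fragment.
print(extract_have_revenues(
    "Acme Inc. reported revenues of $.9 million "
    "for the year ended in December 2001."))
```

A one-to-many mapping of this kind is exactly what lets a question about "positive net income" be answered by a text that says "reported revenues": both realisations fill the same relational concept.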
Generally, such applications rely on complete semantic models consisting of: (i) a catalogue of named entity classes (relevant concepts), such as Company, Currency, and TimePeriod; (ii) a catalogue of (generally) coarse-grained relational concepts with their semantic restrictions, e.g. have-revenues(AGENT:Company, AMOUNT:Currency, TIME:TimePeriod); (iii) a set of rules for detecting named entities realised in texts and assigning them to the correct class; and, finally, (iv) a catalogue of one-to-many mappings between the coarse-grained relational concepts and the corresponding linguistic realisations. These semantic models are often organised using logical formalisms (as in [6]). The results are very interesting artifacts, conceived to represent equivalences among linguistic forms in a systematic and principled manner.

Beyond the representational formalism, the actual content of semantic models is a crucial issue. Using semantic-oriented systems requires the definition of the relevant semantic classes and of their one-to-many mappings to linguistic realisations within the target knowledge domain. Even if repositories of general knowledge about the world exist both at the concept level (e.g. WordNet [11]) and at the relational concept level (e.g. FrameNet [2]), they can hardly be used straightforwardly. Specific domains and information needs, such as air travel in [16] or company mergers and acquisitions in [1], generally stress their limits. Good coverage of phenomena and, consequently, good performance of the final applications can be reached when the underlying semantic models are adapted to the target domains.

It is reasonable to hope that the cost of building domain-specific semantic resources can be dramatically reduced, as such knowledge already exists in "natural" repositories: the domain corpora. We are interested in investigating this problem from a "terminological" perspective [5].
It is our opinion that typical insights from terminology studies, such as admissible surface forms and domain relevance, help to focus attention on relevant and generalised text fragments when mining large text collections.
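The four components of a semantic model listed earlier (entity classes, relational concepts with restrictions, detection rules, and realisation mappings) can be sketched as a data structure. The class names, slot restrictions, and realisation strings are taken from the have-revenues example; the layout itself is our own assumption, not a formalism used in the paper.

```python
from dataclasses import dataclass, field

# (i) Catalogue of named entity classes (relevant concepts).
ENTITY_CLASSES = {"Company", "Currency", "TimePeriod"}

@dataclass
class RelationalConcept:
    """(ii) A coarse-grained relational concept with semantic
    restrictions, plus (iv) its one-to-many mapping to linguistic
    realisations. Component (iii), named entity detection rules,
    is left out of this sketch."""
    name: str
    restrictions: dict                      # slot name -> entity class
    realisations: list = field(default_factory=list)

have_revenues = RelationalConcept(
    name="have-revenues",
    restrictions={"AGENT": "Company",
                  "AMOUNT": "Currency",
                  "TIME": "TimePeriod"},
    realisations=["X has a positive net income of Y in Z",
                  "X reports revenues of Y for Z"],
)

# Every slot restriction must refer to a known entity class.
assert all(c in ENTITY_CLASSES
           for c in have_revenues.restrictions.values())
```

Adapting such a model to a new domain then amounts to populating these catalogues, which is precisely the costly step the paper proposes to reduce by mining domain corpora.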