Differentiae Specificae in EuroWordNet and SIMuLLDA

Maarten Janssen
UiL-OTS, Utrecht University
Trans 10, 3512 ED Utrecht, The Netherlands
m.janssen@let.uu.nl

Abstract

(Euro)WordNet, like all other semantic-network-based formalisms, does not contain differentiae specificae. In this article, I will argue that this lack of differentiae specificae leads to a number of insurmountable problems, not only from a monolingual point of view, but also in a multilingual setting. As an alternative, I will present the framework proposed in my thesis: SIMuLLDA. The SIMuLLDA set-up not only contains differentiae specificae (called definitional attributes); differentiae specificae form the building blocks of the system: the relations between meanings are derived from the application of Formal Concept Analysis to the set of definitional attributes.

1. Introduction

Given the many shortcomings of systems based on semantic primitives, WordNet, like many other lexical databases and knowledge bases, is based on semantic networks (see for instance Miller (1998)). In semantic networks, there is no need for anything like semantic markers or, as you would call them from a lexicographer’s point of view, differentiae specificae, since all information is formulated in terms of relations between (in the case of WordNet) synsets. In this article, I will argue that this lack of differentiae specificae leads to a number of insurmountable problems, not only from a monolingual point of view, but also in a multilingual setting.

As an alternative, I will present the framework proposed in my thesis (Janssen, 2002): SIMuLLDA, a Structured Interlingua MultiLingual Lexical Database Application.
The SIMuLLDA set-up not only contains differentiae specificae (which are called definitional attributes in the system); differentiae specificae form the building blocks of the system: the relations between meanings are derived from the application of a logical formalism called Formal Concept Analysis (FCA) to the set of definitional attributes.

After the presentation of the framework, I will indicate why definitional attributes do not give rise to these traditional problems by showing that the resulting framework should not be viewed as an ontological hierarchy, nor as a knowledge base, but as a modest lexical database.

In this article, the following notational conventions will be used: meaning-units, in the case of WordNet the synsets, will be typeset in SMALL-CAPS, word-forms are set in sans serif, and differentiae specificae, as well as the relations in WordNet, in bold-face.

2. The Need for Differentiae Specificae

One of the main aspects of the WordNet system is its ontological hierarchy, provided by the is a links. Although not de facto a separate system (the is a link is just a link like any other), the hierarchy is often presented that way, and many applications of the WordNet database only make use of this ontology. So for the moment I will consider the (ontological) hierarchy of WordNet as a system on its own.

The is a relation links a synset to its genus proximum (to use the lexicographer’s term), hence strongly characterising the meaning of the synset by indicating what kind of meaning it is. But on its own, the is a link does not fully characterise the meaning of the synset: it fails to distinguish the various hyponyms of the same synset. From the point of view of the hierarchy we also need differentiae specificae to keep the meanings/synsets within the same genus apart. In the WordNet approach, this differentiation is done by means of the other links.
As an example, one could define the synset ACTRESS by means of an is a relation to ACTOR, and a female relation the other way around (or alternatively an is relation to FEMALE). But although the other links in WordNet do provide additional information about the synset, they are not designed to provide differentiae specificae. This shows in two ways. Firstly, the other links are independent of the is a link and of the information it already provides, so they cannot structurally supplement the information lacking from the is a link.

Secondly, not all differentiating information can be modelled by means of these other links. Consider for instance the word millpond, which is an AREA OF WATER. But a millpond is not just any area of water: it is specifically one used for driving the wheel of a watermill (according to LDOCE). And there are no WordNet links for this type of differentiating information.

So differentiae specificae as such do not exist in WordNet, even though in some (or many) cases the differentiating information will be present or can be provided somehow. This absence of a structural modelling of differentiae specificae leads to serious problems. Let me illustrate this using three examples.

The first example is that, according to Vossen & Copestake (1993), (Euro)WordNet has problems dealing with verb nominalisations: SMOKER is a hyponym of PERSON, but so are RUNNER, SLEEPER, JOGGER, etc. The point here is not so much that distinguishing these nominalisations is impossible in WordNet: in principle, these can be distinguished by means of the involved agent relation. So we can express that the involved agent for SMOKE is SMOKER, and hence by means of backward search say that a smoker is a person who smokes.
The point is that for synsets with large numbers of hyponyms, there is no structural way of telling them apart: WordNet in many cases depends on the ontological hierarchy, so the less layered it is, the less informative it is.
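
The nominalisation example can be made concrete with a small toy network. The sketch below is an illustration under stated assumptions, not WordNet's actual data model or API: the dictionary layout and the function names (hyponyms, backward_search, characterise) are hypothetical, and the involved agent links for SLEEPER and JOGGER are left out on purpose to show what remains when only the is a link is available.

```python
# Toy semantic network. The link names follow the text; the data layout,
# the function names and the missing links are assumptions for illustration.

# is a links: synset -> genus proximum
IS_A = {
    "SMOKER": "PERSON",
    "RUNNER": "PERSON",
    "SLEEPER": "PERSON",
    "JOGGER": "PERSON",
}

# involved agent links run from the verb synset to the agent synset.
# SLEEPER and JOGGER are deliberately given no such link here.
INVOLVED_AGENT = {
    "SMOKE": "SMOKER",
    "RUN": "RUNNER",
}


def hyponyms(genus):
    """Return all synsets whose is a link points to the given genus."""
    return sorted(s for s, g in IS_A.items() if g == genus)


def backward_search(synset):
    """Return the verb synsets that name this synset as their involved agent."""
    return sorted(v for v, agent in INVOLVED_AGENT.items() if agent == synset)


def characterise(synset):
    """Combine the genus with a differentiating link, if one happens to exist."""
    genus = IS_A[synset]
    verbs = backward_search(synset)
    if verbs:
        return f"{genus} who is the involved agent of {verbs[0]}"
    return genus  # no differentia available: only the genus is known


if __name__ == "__main__":
    for synset in hyponyms("PERSON"):
        print(synset, "=", characterise(synset))
```

In this toy setting, SMOKER and RUNNER are told apart only because an independent link happens to be present; SLEEPER and JOGGER collapse onto the same characterisation, PERSON, which is the sense in which a flat hierarchy is uninformative.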