Automatic Acquisition of Domain Information for Lexical Concepts

Ernesto D’Avanzo, Alfio Gliozzo, Carlo Strapparava
ITC-irst, Istituto per la Ricerca Scientifica e Tecnologica, I-38050 Trento, ITALY
{davanzo,gliozzo,strappa}@itc.it

Abstract

In this paper we adopt Latent Semantic Kernels to perform a Term Categorization task, and we apply this technique to assign domain labels to monosemous words. Results show that the proposed technique is effective, achieving an accuracy of about 43% for all the monosemous terms in a corpus. We also report an error analysis showing that most misclassification errors are related to the fuzzy nature of domain distinctions. In particular, we identify a set of “families” among the WordNet Domains categories that make the classification task difficult.

1 Introduction

In this paper we illustrate how to exploit Term Categorization (TC) to automatically acquire domain information for WordNet synsets by simply restricting our attention to monosemous terms.

Monosemous terms are not ambiguous, so each is contained in exactly one WordNet synset. Assigning a domain to a monosemous term is therefore equivalent to associating that domain with the WordNet synset that contains it. This property allows us to automatically acquire domain labels for all WordNet synsets that contain at least one monosemous word.

An automatic technique for acquiring domain labels for WordNet synsets can be exploited by several applications. For example, the lexical resource can be tuned to a particular domain by pruning the senses that are irrelevant to it. Another useful application of TC is ontology population. In this case, a list of terms not yet contained in the lexical resource can be acquired from a corpus, automatically labeled by the TC algorithm, and included in the lexical resource in the correct “domain area”.
For example, named entities in texts, such as Kasparov, can be related to synsets belonging to a specific domain (in this case Chess).

Even though implementing the full process required for tuning the lexical resource is outside the scope of the present work, it is evident that correctly assigning domain labels to a small subset of synsets is a crucial step in the overall tuning process (Magnini et al., 2002a).

In this paper we introduce the use of latent semantic kernels for term categorization, demonstrating that they allow us to achieve reasonably high accuracy in assigning domain labels to monosemous words, and therefore to their corresponding synsets in WordNet. In future work, we plan to apply this technique to detect an initial set of seed synsets, in order to propagate the domain information through the whole structure of WordNet.

The paper is structured as follows. Section 2 introduces the concept of semantic domains. Section 3 describes the use of Latent Semantic Kernels (LSK) for TC. Section 4 illustrates the resources we used for our experiments, while Section 5 experimentally measures the viability of our approach for acquiring domain information for monosemous words. Finally, Section 6 contains some concluding remarks.

2 Semantic Domains

This section introduces the notion of semantic domains from the computational linguistics perspective, suggesting that semantic domains provide a useful component for modeling conceptual structures.

Semantic domains are groups of closely related concepts in the language, whose fundamental property is that they frequently co-occur in texts. Assigning semantic domains to concepts in a semantic network such as WordNet is thus useful for defining domain specific sub-