Learning properties of Noun Phrases: from data to functions

Valeria Quochi, Basilio Calderone

Istituto di Linguistica Computazionale CNR, Via Moruzzi 1, Pisa - Italy
Scuola Normale Superiore, Piazza dei Cavalieri 7, Pisa - Italy
valeria.quochi@ilc.cnr.it, b.calderone@sns.it

Abstract

The paper presents two experiments on the unsupervised classification of Italian noun phrases. The goal of the experiments is to identify the most prominent contextual properties that allow for a functional classification of noun phrases. For this purpose, a Self-Organizing Map is trained with syntactically annotated contexts containing noun phrases. The contexts are defined by means of a set of features representing morpho-syntactic properties of both nouns and their wider contexts. Two types of experiments have been run: one based on noun types and the other based on noun tokens. The results of the type simulation show that, when frequency is the most prominent classification factor, the network isolates idiomatic or fixed phrases. The results of the token simulation, instead, show that, of the 36 attributes represented in the original input matrix, only a few are prominent in the re-organization of the map. In particular, the key features in the emergent macro-classification are the type of determiner and the grammatical number of the noun. An additional, but no less interesting, result is an organization into semantic/pragmatic micro-classes. In conclusion, our results confirm the relative prominence of determiner type and grammatical number in the task of noun (phrase) categorization.

1. Introduction

We describe here an exploratory study on the acquisition of functional properties of nouns in language use. This work models contextual and morpho-syntactic information in order to discover fundamental properties of Noun Phrases (NPs henceforth) in Italian 1.
Context analysis is crucial in our investigation: we assume, in fact, that nouns per se have no semantic/functional property other than the default referential one. However, depending on the wider context in which they occur, nouns, or better noun phrases, may be used in different ways: to predicate, to refer to specific, individuated entities, or to refer more generally to types (Croft and Cruse, 2004). Our aim in this work is to see whether, given a large set of (psychologically plausible) morpho-syntactic contextual features and an unsupervised learning method, (functional) similarities of nouns emerge from language use.

We set up two simulation experiments using a Self-Organizing Map (SOM) learning protocol (Section 3.1). For the present purposes, we analyze the final organization of a SOM trained with morpho-syntactically defined contexts of noun phrases in order to investigate the prominence of the various morpho-syntactic properties, i.e. the relevant dimensions on the basis of which the map self-organizes, and their correlation with linguistic functional properties of noun phrases.

The present paper is organized as follows: first, we briefly mention some related work on the acquisition of deep lexical properties of nouns in languages other than Italian. Section 3 presents the methodology adopted: the learning system, the dataset, and the feature extraction and representation process. Section 4 describes the experiments based on noun types and noun tokens and briefly discusses the outcomes. Finally, a discussion of the results and of future work is given in Section 5.

1 The term Noun Phrase (NP) will be used here as a theory-independent general label for various kinds of nominal chunks (noun, determiner+noun, adjective+noun, ...).

1.1. Linguistic Background

The standard function of nouns is to name portions of reality, to label entities. A noun typically denotes the kind of thing that its referent belongs to.
Naming is therefore a kind of categorization. Assuming this, we will say that the primary cognitive function of nouns is to form a classification system of things in the world that we use in referring to them (Dryer, 2004, 50).

Nouns, however, are seldom used in isolation; noun phrases (or, more generally, nominal chunks) may have different contextual functions. Noun phrases signal the countability, the new vs. given status, and the generic or individuated character of the entity referred to, as well as its degree of referentiality (Croft and Cruse, 2004; Delfitto, 2002). In many languages, the type of determiner present in the NP and the number of the noun are the linguistic cues generally held responsible for signaling the function in context (countability, givenness and specificity in particular). However, there is considerable variation both among and within languages. In some theories, determiners are accorded great importance and are even considered the heads of noun phrases (e.g. Sugayama and Hudson, 2005). In Cognitive Linguistics, instead, they are assigned a fundamental property: they signal the “grounding” of a noun phrase, i.e. its contextual identification within the speech event (Langacker, 2004, 77-85).

Countability is considered responsible for the construal of an entity as an individuated unit. This difference corresponds to the bound/unbound structural schematization in Cognitive Linguistics (Langacker, 1987). Countability may also construe an entity as being of a specific type, e.g. chair vs. furniture (Croft and Cruse, 2004).

Assuming that naming is categorizing and that categories are not neat but have fuzzy boundaries, the meaning and function of nouns cannot be totally pre-established, but must be construed dynamically in context. Therefore, the structure of the noun phrase and its surrounding context should reveal the specific construal of the noun. Put in