Selection Restrictions Acquisition from Corpora

Pablo Gamallo ⋆, Alexandre Agustini ⋆⋆, and Gabriel P. Lopes

CENTRIA, Departamento de Informática
Universidade Nova de Lisboa, Portugal
{gamallo,aagustini,gpl}@di.fct.unl.pt

© Springer-Verlag

Abstract. This paper describes an automatic clustering strategy for acquiring selection restrictions. We use a knowledge-poor method based merely on word cooccurrence within basic syntactic constructions; hence, neither semantically tagged corpora nor man-made lexical resources are needed for generalising semantic restrictions. Our strategy relies on two basic linguistic assumptions. First, we assume that two syntactically related words impose semantic selectional restrictions on each other (co-specification). Second, we claim that two syntactic contexts impose the same selection restrictions if they cooccur with the same words (contextual hypothesis). To test our learning method, preliminary experiments have been performed on a Portuguese corpus.

1 Introduction

The general aim of this paper is to describe a particular corpus-based method for semantic information extraction. More precisely, we implement a knowledge-poor system that uses syntactic information to acquire selection restrictions and semantic preferences constraining word combination.

According to Gregory Grefenstette [9, 10], knowledge-poor approaches use no presupposed semantic knowledge for automatically extracting semantic information. They are characterised as follows: no domain-specific information is available, no semantic tagging is used, and no static sources such as machine-readable dictionaries or handcrafted thesauri are required. Hence, they differ from knowledge-rich approaches in the amount of linguistic knowledge they need to activate the semantic acquisition process.
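
As a rough illustration of the two assumptions stated in the abstract, the following minimal sketch counts word cooccurrences within syntactic contexts and compares two contexts by the words they share. The toy English dependency triples, the `(relation, head, dependent)` encoding, the function names, and the use of Jaccard overlap are all assumptions made for this example; they are not the data, representation, or similarity measure used in this paper.

```python
from collections import Counter, defaultdict

# Hypothetical toy dependencies: (relation, head, dependent).
DEPENDENCIES = [
    ("dobj", "approve", "law"),
    ("dobj", "approve", "decree"),
    ("dobj", "ratify", "law"),
    ("dobj", "ratify", "decree"),
    ("dobj", "eat", "apple"),
]

def context_counts(dependencies):
    """Count how often each word fills each syntactic context."""
    counts = defaultdict(Counter)
    for rel, head, dep in dependencies:
        # Co-specification: the head restricts the dependent and vice versa,
        # so every dependency yields two contexts ("_" marks the open slot).
        counts[(rel, head, "_")][dep] += 1   # slot filled by the dependent
        counts[(rel, "_", dep)][head] += 1   # slot filled by the head
    return counts

def overlap(counts, ctx_a, ctx_b):
    """Jaccard overlap between the word sets of two contexts (illustrative)."""
    words_a, words_b = set(counts[ctx_a]), set(counts[ctx_b])
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

counts = context_counts(DEPENDENCIES)
approve_obj = ("dobj", "approve", "_")
ratify_obj = ("dobj", "ratify", "_")
eat_obj = ("dobj", "eat", "_")

# Contextual hypothesis: contexts that cooccur with the same words
# are taken to impose the same selection restrictions.
print(overlap(counts, approve_obj, ratify_obj))  # 1.0
print(overlap(counts, approve_obj, eat_obj))     # 0.0
```

In this toy data, the object positions of "approve" and "ratify" take exactly the same words, so they would be candidates for sharing a selection restriction, whereas the object position of "eat" shares no word with either.
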
Whereas knowledge-rich approaches require previously encoded semantic information (semantically tagged corpora and/or man-made lexical resources [17, 6, 1]), knowledge-poor methods only need a coarse-grained notion of linguistic information: word cooccurrence. In particular, the main aim of knowledge-poor approaches is to calculate the frequency of word cooccurrences within either syntactic constructions or sequences of n-grams in order to extract semantic information such as selection restrictions [19, 11, 3] and word ontologies [12, 15, 9, 13].

⋆ Research supported by the PRAXIS XXI project, FCT/MCT, Portugal
⋆⋆ Research sponsored by CAPES and PUCRS - Brazil

Since these methods do not require previously defined