Identifying verbal collocations in Wikipedia articles ⋆

István Nagy T.¹ and Veronika Vincze²

¹ University of Szeged, Department of Informatics, 6720 Szeged, Árpád tér 2., Hungary
² MTA-SZTE Research Group on Artificial Intelligence, 6720 Szeged, Tisza Lajos krt. 103., Hungary
{nistvan,vinczev}@inf.u-szeged.hu

Abstract. In this paper, we focus on various methods for detecting verbal collocations, i.e. verb-particle constructions and light verb constructions, in Wikipedia articles. Our results suggest that for verb-particle constructions, POS-tagging combined with a restriction on the particle yields the best results, whereas the combination of POS-tagging, syntactic information and restrictions on the nominal and verbal components has the most beneficial effect on identifying light verb constructions. The identification of multiword semantic units can be successfully exploited in several applications in fields such as machine translation and information extraction.

Keywords: multiword expressions, verbal collocations, light verb constructions, verb-particle constructions, Wikipedia

1 Introduction

In natural language processing, the proper treatment of multiword expressions (MWEs) is essential for many higher-level applications (e.g. information extraction or machine translation). Multiword expressions are lexical items that can be decomposed into single words and display idiosyncratic features [11]. To put it differently, they are lexical items that contain spaces or ‘idiosyncratic interpretations that cross word boundaries’. They are frequent in language use, and because of their unique and idiosyncratic behavior, they often pose a problem to NLP systems. In this work, we focus on various methods for detecting verbal collocations, i.e. verb-particle constructions (VPCs) and light verb constructions (LVCs), in Wikipedia articles.
First, we offer a short description of the characteristic features of these two types of multiword expressions and then present related work. We then describe our methods and report the results achieved. The paper concludes with a discussion of the results and future work.

⋆ This work was supported in part by the National Innovation Office of the Hungarian government within the framework of the projects BELAMI and MASZEKER.