On the semantics of noun compounds

Roxana Girju a,*, Dan Moldovan b, Marta Tatu b, Daniel Antohe b

a Computer Science Department, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
b Human Language Technology Research Institute, University of Texas at Dallas, Richardson, TX 75080, USA

* Corresponding author. E-mail addresses: girju@cs.uiuc.edu (R. Girju), moldovan@utdallas.edu (D. Moldovan), marta@hlt.utdallas.edu (M. Tatu), dantohe@hlt.utdallas.edu (D. Antohe).

Computer Speech and Language 19 (2005) 479–496
www.elsevier.com/locate/csl
0885-2308/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.csl.2005.02.006

Received 5 June 2004; received in revised form 6 January 2005; accepted 15 February 2005
Available online 16 March 2005

Abstract

This paper provides new insights into the semantic characteristics of two- and three-noun compounds. The analysis uses two sets of semantic classification categories: a list of 8 prepositional paraphrases previously proposed by Lauer [Designing statistical language learners: experiments on noun compounds, Ph.D. Thesis, Macquarie University, Australia] and a new set of 35 semantic relations that we introduce. We report the distribution of these semantic categories on a corpus of noun compounds and present several models for the bracketing and the semantic classification of noun compounds. The results are compared against state-of-the-art models reported in the literature.

1. Introduction

The semantic interpretation of noun compounds (NCs) deals with the detection and semantic classification of the relations between noun constituents. The problem is complex and has been studied intensively for a long time in linguistics, psycholinguistics, philosophy, and computational linguistics. Several factors make this task difficult. (a) NCs have implicit