Classification of Semantic Relations between Nominals: Description of Task 4 in SemEval-1

Roxana Girju 1, Marti Hearst 2, Preslav Nakov 3, Vivi Nastase 4, Stan Szpakowicz 5, Peter Turney 6, Deniz Yuret 7

July 28, 2006

1. Department of Linguistics, University of Illinois at Urbana-Champaign, girju@cs.uiuc.edu
2. School of Information, University of California, Berkeley, hearst@sims.berkeley.edu
3. Department of Electrical Engineering and Computer Science, University of California, Berkeley, nakov@cs.berkeley.edu
4. School of Information Technology and Engineering, University of Ottawa, vnastase@site.uottawa.ca
5. School of Information Technology and Engineering, University of Ottawa, szpak@site.uottawa.ca
6. Institute for Information Technology, National Research Council of Canada, peter.turney@nrc-cnrc.gc.ca
7. Department of Computer Engineering, Koc University, dyuret@ku.edu.tr

1. Description of the Task

There is growing interest in the task of classifying semantic relations between pairs of words. However, many different classification schemes have been used, which makes it difficult to compare the various classification algorithms. We will create a benchmark dataset and evaluation task that will enable researchers to compare their algorithms.

Rosario and Hearst (2001) classify noun-compounds from the medical domain, using a set of 13 classes that describe the semantic relation between the head noun and the modifier in a given noun-compound. Rosario et al. (2002) classify noun-compounds using a multi-level hierarchy of semantic relations, with 15 classes at the top level. Nastase and Szpakowicz (2003) present a two-level hierarchy for classifying noun-modifier relations in general domain text, with 5 classes at the top and 30 classes at the bottom. Their class scheme and dataset have been used by other researchers (Turney and Littman, 2005; Turney, 2005; Nastase et al., 2006). Moldovan et al.
(2004) use a 35-class scheme to classify relations in noun phrases. The same scheme has been applied to noun compounds (Girju et al., 2005). Chklovski and Pantel (2004) use a 5-class scheme, designed specifically for characterizing verb-verb semantic relations. Stephens et al. (2001) use a 17-class scheme created for relations between genes. Lapata (2002) uses a 2-class scheme for classifying relations in nominalizations.

Algorithms for classifying semantic relations have potential applications in Information Retrieval, Information Extraction, Summarization, Machine Translation, Question Answering, Paraphrasing, Recognizing Textual Entailment, Thesaurus Construction, Semantic Network Construction, Word Sense Disambiguation, and Language Modeling. As the techniques for semantic relation classification mature, some of these applications are being tested. Tatu and Moldovan (2005) applied the 35-class scheme of Moldovan et al. (2004) to the PASCAL Recognizing Textual Entailment (RTE) challenge, obtaining significant improvement in a state-of-the-art algorithm.

There is no consensus on schemes for classifying semantic relations, and it seems unlikely that any single scheme could be useful for all applications. For example, the gene-gene relation scheme of Stephens et al. (2001) includes relations such as “X phosphorylates Y”, which are not very useful for general domain text. Even if we focus on general domain text, the verb-verb relations of Chklovski and Pantel (2004) are unlike the noun-modifier relations of Nastase and Szpakowicz (2003) or the noun phrase relations of Moldovan et al. (2004).