Experiments with ACE 2004 Semantic Relations
Fabio Celli
CLIC, University of Trento
fabio.celli@email.unitn.it
July 3, 2009

Abstract
This paper presents two semantic relation classification experiments: one with 7 and one with 3 semantic relation classes. A comparison of the results shows that the proposed 3 semantic relation classes (Role, Location and Social) are a good generalization. They yield an F-measure of 0.80 with support vector machines and 0.81 with decision trees.

1 Introduction and Related Works
In a previous experiment on semantic relation classification in Italian [1], generalizing from 5 to 3 semantic relation classes significantly improved classification performance on about 100 examples, using J48 decision trees (Weka's implementation of C4.5) in the Weka environment [4]. The present work attempts to reproduce that experiment in English using the ACE 2004 annotation.

ACE 2004 [3], which is available in English, Arabic and Chinese, is a corpus designed for experiments in automated content extraction. Its annotation includes 7 named entity types (Person, Organization, Location, Facility, Vehicle, Weapon and Geo-Political Entity), 7 semantic relation classes (Person-Social, Artifact, Physical, Employment-Organization, Person-Affiliation, GPE-Affiliation and Discourse) and coreference. Semantic relation classes are further divided into subclasses: for example, the Physical relation is divided into Part-Whole and Location (see the ACE 2004 documentation for details). The annotation was automatically extracted from the original format using ace2txt [2], an open-source Python program.

The next section reports two experiments: one with the original ACE relations and one with the generalized categories used in [1].

2 Experiments
About 6000 semantic relations were extracted from ACE 2004.
For each semantic relation, the tag contained (a) the relation class, (b) the relation subclass, (c) the lexical element expressing the relation and (d) the text or a portion of it. From this initial dataset, 200 semantic relation examples were randomly sampled; these are the examples used in the experiments.
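The record layout and the random sampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code. It assumes the relations have already been converted from the ACE format (for instance with ace2txt) into a Python list. The `RelationExample` record, the `sample_examples` and `class_distribution` helpers, and the fixed random seed are all hypothetical. The paper only states that the sample was drawn at random.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationExample:
    """One extracted ACE 2004 relation (hypothetical record layout)."""
    relation_class: str     # (a) relation class, e.g. "Physical"
    relation_subclass: str  # (b) relation subclass, e.g. "Part-Whole"
    lexical_element: str    # (c) words expressing the relation
    text: str               # (d) the text or a portion of it


def sample_examples(dataset, k=200, seed=0):
    """Draw k examples uniformly at random, without replacement.

    The seed is an assumption added here so the sample is reproducible.
    """
    if k > len(dataset):
        raise ValueError("sample size exceeds dataset size")
    rng = random.Random(seed)
    return rng.sample(list(dataset), k)


def class_distribution(examples):
    """Count how many examples fall into each relation class."""
    return Counter(ex.relation_class for ex in examples)
```

Applied to the roughly 6000 extracted relations, `sample_examples` would return 200 of them. `class_distribution` then shows how that sample is spread across the 7 ACE classes, which is worth checking before comparing the 7-class and 3-class settings. Sampling without replacement guarantees that no relation appears twice in the experimental set.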