Learning Subsumption Relations with CSR: A Classification-based Method for the Alignment of Ontologies¹

Vassilis Spiliopoulos¹,², Alexandros G. Valarakos¹, George A. Vouros¹, and Vangelis Karkaletsis²

¹ AI Lab, Information and Communication Systems Engineering Department, University of the Aegean, Samos, 83 200, Greece
{vspiliop, alexv, georgev}@aegean.gr
² Institution of Informatics and Telecommunications, NCSR "Demokritos", Greece
vangelis@iit.demokritos.gr

Abstract. In this paper we propose the "Classification-Based Learning of Subsumption Relations for the Alignment of Ontologies" (CSR) method. Given a pair of concepts from two ontologies, the objective of CSR is to identify patterns of concepts' features (here, properties) that provide evidence for the subsumption relation among these concepts. This is achieved by means of a classification task using decision trees. For the learning of the decision trees, the proposed method generates training datasets from the source ontologies, considering each ontology in isolation. The paper describes the method thoroughly, provides experimental results for computing subsumption relations over an extended version of the OAEI 2006 benchmarking series, and discusses the potential of the method.

Keywords: ontology alignment, subsumption, supervised learning, binary classification.

1 Introduction

Although many efforts [1] aim at the automatic discovery of equivalence relations between the elements of ontologies, in this paper we conjecture that this is not enough: to deal effectively with the ontology alignment problem, we also have to deal with the discovery of subsumption relations among ontology elements. This is particularly true when we deal with ontologies whose conceptualizations are at different "granularity levels": in these cases, elements (concepts and/or properties) of one ontology are more generic than the corresponding elements of another ontology.
Although subsumption relations between the elements of two ontologies may be deduced from the equivalence relations of other elements, this cannot be done in extreme cases where no equivalence relations exist. In any case, we conjecture that the discovery of subsumption relations between elements of different ontologies may further facilitate the discovery and filtering of equivalence relations, and vice versa, augmenting the effectiveness of our ontology alignment and merging methods [2]. This paper presents the "Classification-Based Learning of Subsumption Relations for the Alignment of Ontologies" (CSR) method. CSR computes subsumption relations between concept pairs of two distinct ontologies by means of a classification task, using decision trees, and by exploiting equivalences between properties. Given a pair of concepts, the supervised machine learning method "locates" a hypothesis concerning their relation in a space of hypotheses: the hypothesis that best fits the training examples [3] without being restricted to them, so that it generalizes beyond them. Concept pairs are represented as feature vectors of length equal to the number of distinct properties of the source and target ontologies: equivalent properties (i.e., properties with equivalent meaning) correspond to the same vector component. The training examples for the learning method are generated from the target and source ontologies.
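The feature-vector representation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the namespaced property names, and in particular the value stored in each component (0 if neither concept has the property, 1 if only the first has it, 2 if only the second, 3 if both) are assumptions made for illustration. The text above fixes only the vector length and the rule that equivalent properties share a component.

```python
from typing import Dict, List, Set, Tuple


def build_property_index(
    source_props: Set[str],
    target_props: Set[str],
    equivalences: Dict[str, str],
) -> Tuple[Dict[str, int], int]:
    """Assign a vector component to every property of both ontologies.

    `equivalences` maps a target property to its equivalent source property
    (hypothetical input; it would come from a property matcher).
    Equivalent properties are given the same component.
    """
    index: Dict[str, int] = {}
    for prop in sorted(source_props):
        index[prop] = len(index)
    next_component = len(index)
    for prop in sorted(target_props):
        if prop in index:
            continue  # same identifier already indexed
        match = equivalences.get(prop)
        if match in index:
            index[prop] = index[match]  # share the component
        else:
            index[prop] = next_component
            next_component += 1
    return index, next_component


def encode_pair(
    props_first: Set[str],
    props_second: Set[str],
    index: Dict[str, int],
    length: int,
) -> List[int]:
    """Encode a concept pair as a feature vector.

    Assumed encoding: 0 = property in neither concept, 1 = only in the
    first, 2 = only in the second, 3 = in both.
    """
    vector = [0] * length
    for prop in props_first:
        vector[index[prop]] |= 1
    for prop in props_second:
        vector[index[prop]] |= 2
    return vector


# Toy ontologies (hypothetical property names).
source_props = {"s:name", "s:author"}
target_props = {"t:title", "t:writer", "t:isbn"}
equivalences = {"t:writer": "s:author"}

index, length = build_property_index(source_props, target_props, equivalences)
pair_vector = encode_pair({"s:author", "s:name"}, {"t:writer", "t:isbn"}, index, length)
print(length, pair_vector)
```

In this toy case the five properties collapse to four components, because `t:writer` and `s:author` are treated as equivalent. Vectors of this kind, labeled with the relation that holds between the two concepts, are what a decision tree learner would be trained on.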
¹ This work is co-financed by E.U.-European Social Fund (75%) and the Greek Ministry of Development-GSRT (25%) (www.ontosum.org).

Although other features may be used, in this paper we study the importance of concepts' properties in assessing subsumption between concepts. This is an important first step towards assessing subsumption relations among concepts, since (a) it appeals to our intuition about the importance of properties as distinguishing characteristics of classes of entities, (b) it makes the least possible commitment to the precision of any method for the discovery of equivalence relations among ontology elements, and (c) it provides a basic method that can be further enhanced with other distinguishing features of concepts (e.g., concepts in a given vicinity), and can be further combined with other