Reasoning by Analogy in Description Logics through Instance-based Learning

Claudia d’Amato, Nicola Fanizzi, Floriana Esposito
Dipartimento di Informatica, Università degli Studi di Bari
Campus Universitario, Via Orabona 4, 70125 Bari, Italy
{claudia.damato | fanizzi | esposito}@di.uniba.it

Abstract— This work presents a method founded on instance-based learning for inductive (memory-based) reasoning on ABoxes. The method, which exploits a semantic dissimilarity measure between concepts and instances, can be employed both to answer class membership queries and to predict new assertions that may not be logically entailed by the knowledge base. In a preliminary experimentation, we show that the method is sound and that it is actually able to induce new assertions that might be acquired in the knowledge base.

I. INTRODUCTION

Most of the research on ontology reasoning has focused on methods based on deductive reasoning. However, important tasks that new-generation knowledge-based systems are expected to provide, such as ontology construction, revision, population and evolution, are likely to be supported also by inductive methods. This has led to an increasing interest in machine learning and knowledge discovery methods applied to ontological representations (see [1], [2] and, more recently, [3], [4], [5], [6]).

We propose an algorithm based on a notion of concept similarity for performing a form of lazy learning on typical ontological representations. Namely, by combining an instance-based (analogical) approach with a notion of semantic dissimilarity, this paper intends to demonstrate the applicability in this field of inductive reasoning, which can be considered another form of approximate reasoning (see the discussion in [7]). In particular, we have adapted the general instance-based learning approach of the k-Nearest Neighbor algorithm [8] to the specific multi-relational setting of ontology languages.
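As a point of reference for the adaptation just mentioned, the following is a minimal Python sketch of the classic, propositional k-NN scheme [8] with a pluggable dissimilarity function. It is not the authors' procedure; the function name `knn_classify` and its parameters are hypothetical. It illustrates the lazy-learning idea (no model is built; all work happens at query time) and also makes explicit the assumptions of the standard version: every stored example carries exactly one known label, i.e. classes are mutually disjoint and labels are fully known, as under a closed-world reading.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def knn_classify(
    query: T,
    training: Sequence[Tuple[T, Hashable]],
    dissimilarity: Callable[[T, T], float],
    k: int = 3,
) -> Hashable:
    """Classic k-NN with an unweighted majority vote (hypothetical helper).

    Assumes each training example has exactly one known label, i.e. the
    classes are disjoint and the labelling is complete.
    """
    if k < 1 or not training:
        raise ValueError("need k >= 1 and a non-empty training set")
    # Lazy learning: the stored examples are only ranked when a query arrives.
    neighbours = sorted(training, key=lambda ex: dissimilarity(query, ex[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    # most_common(1) returns [(label, count)] for the most frequent label.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [(1.0, "a"), (1.5, "a"), (5.0, "b"), (6.0, "b"), (2.0, "a")]
    print(knn_classify(1.2, train, lambda x, y: abs(x - y), k=3))
```

Because the dissimilarity is passed in as a function, the same loop can in principle operate on any objects for which a dissimilarity is defined, which is what makes the scheme a natural starting point for non-vectorial representations.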
Two technical problems had to be solved for this adaptation to ontology representations: 1) the Open World Assumption (OWA) that is made in this context; 2) in this multi-class setting, disjointness of the classes cannot be assumed by default.

The standard ontology languages are founded on Description Logics (henceforth DLs), from which they borrow the language constructors for expressing complex concept definitions. Instance-based methods depend on a similarity measure defined on the instance space. In this perspective, a variety of measures for concept representations have been proposed (see [9] for a survey). As pointed out in [10], most of these measures focus on the similarity of atomic concepts within hierarchies or simple ontologies, based on a few relations. Thus, it becomes necessary to investigate similarity in more complex languages. It has been observed that, when richer representations are adopted, the structural properties have less and less impact in assessing semantic similarity. Hence, a view of similarity based only on a structural (graph-based) approach, such as in [11], [12], may fall short.

We have proposed some dissimilarity measures for non-trivial DL languages, based on the semantics conveyed by the ABox assertions, which are suitable for use in instance-based methods [13], [14]. These measures elicit the underlying semantics by querying the knowledge base to assess the concept extensions, estimated through their retrieval [15], as also hinted in [16]. Besides, the overall similarity is also (partially) influenced by the concepts that are related through role restrictions. Moreover, in many other typical tasks (e.g. conceptual clustering or definition), it is necessary to assess the similarity between concepts as well as between individuals.
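To illustrate the idea of a semantics-based (rather than structural) dissimilarity computed from retrieved extensions, here is a minimal Python sketch under explicit assumptions. The toy ABox `RETRIEVAL`, the helper `retrieve`, and the function `extension_dissimilarity` are hypothetical: retrieval is simulated by a dictionary lookup, whereas a real system would ask a DL reasoner. A Jaccard-style overlap is swapped in as a stand-in; the measures in [13], [14] are more elaborate and also account for concepts related through role restrictions.

```python
from typing import FrozenSet, Mapping

# Hypothetical toy ABox: for each concept name, the individuals that a
# reasoner would retrieve as its instances (its estimated extension).
RETRIEVAL: Mapping[str, FrozenSet[str]] = {
    "Person": frozenset({"ann", "bob", "carl", "dora"}),
    "Parent": frozenset({"ann", "bob"}),
    "Mother": frozenset({"ann"}),
    "Car": frozenset({"fiat1"}),
}


def retrieve(concept: str) -> FrozenSet[str]:
    """Stand-in for instance retrieval (assumption: a real KB would query a reasoner)."""
    return RETRIEVAL.get(concept, frozenset())


def extension_dissimilarity(c: str, d: str) -> float:
    """Jaccard-style distance 1 - |R(c) & R(d)| / |R(c) | R(d)|, in [0, 1].

    0 means identical retrieved extensions; 1 means no shared instance.
    """
    rc, rd = retrieve(c), retrieve(d)
    union = rc | rd
    if not union:
        # Design choice (assumption): two empty extensions give no evidence
        # that the concepts differ, so they are treated as indistinguishable.
        return 0.0
    return 1.0 - len(rc & rd) / len(union)


if __name__ == "__main__":
    for pair in [("Parent", "Mother"), ("Person", "Parent"), ("Person", "Car")]:
        print(pair, extension_dissimilarity(*pair))
```

With this toy ABox, `Parent` and `Mother` share one of two retrieved instances, giving 0.5, while `Person` and `Car` share none and are at the maximal distance 1.0, independently of where the two concepts sit in any hierarchy.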
By resorting to the notion of the most specific concept of an individual with respect to an ABox [15], taken as the representative of the individual at the concept level, the measures for concepts can be extended to such cases.

An analogical reasoning procedure like this may be employed to answer class membership queries through analogical rather than deductive reasoning; such reasoning may be more robust with respect to noise and is likely to suggest new knowledge (which was not logically derivable). In particular, we also developed the method for an application to semantic web service discovery, where services are annotated in DLs. Another application might be supporting various tasks of the knowledge engineer, such as the acquisition of candidate assertions for enriching ontologies with partially populated ABoxes: the outcomes of the procedure can be used as recommendations. Indeed, as we show in the experimentation, the newly induced assertions are quite accurate (commission errors, i.e. predicting a concept erroneously, were rarely observed). In turn, the outcomes of the procedure may also trigger other related tasks, such as the induction (revision) of (faulty) knowledge bases.

The paper is organized as follows. In the next section, the representation language is briefly presented. Two concept dissimilarity measures are then recalled and exploited in a modified version of the k-NN classification procedure. The results of a preliminary experimental evaluation of the method using these two measures are shown and, finally, possible developments of the method are examined.
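One way an analogical answer to a class membership query might cope with the OWA and with non-disjoint classes is sketched below in Python. This is an illustrative assumption, not the procedure of this paper: each query concept is treated as a separate binary problem (so no disjointness between concepts is needed), and each neighbour contributes a three-valued label, +1 if the knowledge base entails membership, -1 if it entails non-membership, and 0 if neither is known, instead of silently counting unknowns as negatives. Votes are weighted by inverse dissimilarity. The names `membership_vote` and `known`, the weighting, and the toy one-dimensional dissimilarity between individuals are all hypothetical; in the setting described above, that dissimilarity would instead be derived from the individuals' most specific concepts.

```python
from typing import Callable, Mapping, Sequence, Tuple

Label = int  # +1: C(x) entailed; -1: not C(x) entailed; 0: unknown (OWA)


def membership_vote(
    query: str,
    concept: str,
    individuals: Sequence[str],
    known: Mapping[Tuple[str, str], Label],
    dissimilarity: Callable[[str, str], float],
    k: int = 3,
    eps: float = 1e-9,
) -> Label:
    """Guess whether `query` is an instance of `concept` from its k nearest neighbours.

    Pairs missing from `known` count as 0 (unknown), not as negative examples.
    """
    candidates = [x for x in individuals if x != query]
    neighbours = sorted(candidates, key=lambda x: dissimilarity(query, x))[:k]
    score = 0.0
    for x in neighbours:
        # Closer neighbours weigh more; eps avoids division by zero.
        weight = 1.0 / (dissimilarity(query, x) + eps)
        score += weight * known.get((x, concept), 0)
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0  # no evidence either way: the answer stays unknown


if __name__ == "__main__":
    pos = {"a": 0.0, "b": 0.1, "c": 0.2, "d": 5.0, "q": 0.05, "r": 4.9}
    known = {("a", "Parent"): 1, ("b", "Parent"): 1, ("d", "Parent"): -1}
    dist = lambda x, y: abs(pos[x] - pos[y])
    print(membership_vote("q", "Parent", list(pos), known, dist))
    print(membership_vote("r", "Parent", list(pos), known, dist))
```

Keeping 0 as a possible answer lets the procedure abstain when the neighbourhood carries no information about the concept, which matches the cautious behaviour one would want when the induced assertions are offered to the knowledge engineer only as recommendations.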