Learning with Semantic Kernels for Clausal Knowledge Bases

Nicola Fanizzi and Claudia d'Amato
Computer Science Department – University of Bari
{fanizzi|claudia.damato}@di.uniba.it

Abstract. Many application domains require complex multi-relational representations. We propose a family of kernels for relational representations to produce statistical classifiers that can be effectively employed in a variety of such tasks. The kernel functions are defined over the set of objects in a knowledge base and are parameterized on a notion of context, represented by a committee of concepts expressed through logic clauses. A preliminary feature-construction phase based on genetic programming allows for the selection of optimized contexts. An experimental session on the task of similarity search proves the practical effectiveness of the method.

1 Statistical Learning for Complex Representations

Many application domains, spanning from natural language processing to bio- and chemo-informatics, require complex (multi-)relational representations such as those offered by logic databases (e.g. deductive databases). Standard tasks involving these kinds of knowledge bases require complex forms of inference (e.g. based on a logic calculus) that hardly scale with their dimensions. In such settings, decisions made by exploiting an induced statistical model may represent a viable alternative for supporting related tasks such as (approximate) retrieval, query answering, etc.

Learning inductive classification models for complex knowledge bases can be performed through Statistical Relational Learning (SRL) methods. In this work, we intend to adapt efficient non-parametric methods based on kernel functions, originally devised for attribute-value representations, to the multi-relational case required by the applications mentioned above.
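To fix intuitions, a kernel of the kind outlined above can be sketched as follows: similarity between two individuals of a knowledge base is measured by the agreement of a committee of concepts on them. This is a minimal illustrative sketch, not the paper's actual definition; the individuals, concepts, and the `committee_kernel` function are all hypothetical, with Python predicates standing in for logic clauses.

```python
# Illustrative sketch (hypothetical names, not from the paper): a kernel
# over KB individuals induced by a committee of concepts. Two individuals
# are similar to the extent that the committee's concepts agree on them.

def committee_kernel(committee, a, b):
    """Fraction of concepts on which a and b agree
    (both satisfy the concept, or both fail it)."""
    agree = sum(1 for concept in committee if concept(a) == concept(b))
    return agree / len(committee)

# Toy knowledge base: individuals represented as dicts of ground facts.
alice = {"parent_of": {"bob"}, "employed": True}
carol = {"parent_of": {"dan"}, "employed": True}
erin  = {"parent_of": set(),   "employed": False}

# A "committee" of two clause-like concepts.
committee = [
    lambda x: len(x["parent_of"]) > 0,   # Parent(x)
    lambda x: x["employed"],             # Employed(x)
]

print(committee_kernel(committee, alice, carol))  # 1.0: agree on both concepts
print(committee_kernel(committee, alice, erin))   # 0.0: disagree on both
```

Note that an agreement count of this form is a valid (positive semi-definite) kernel, since it is an affine function of an inner product of {-1,+1}-valued feature vectors, one coordinate per concept in the committee.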
In particular, we will focus on similarity-based methods built on density functions that are ultimately grounded in the semantics of the instances of the knowledge bases. Following the rationale behind the KFOIL system [11], efficient learning methods such as kernel machines [17] may be adapted to work on multi-relational spaces, such as the clausal spaces investigated in ILP (and SRL). This requires the definition of suitable kernel functions that encode a notion of similarity over such spaces. Moreover, the kernel function itself can be a preliminary objective of learning, or measure induction and performance evaluation may be intertwined, as in KFOIL.

Most of the proposed similarity measures for concept descriptions focus on the similarity of atomic concepts within simple concept hierarchies, or are strongly based on the structure of the terms for specific FOL fragments. These approaches have been

M. Kryszkiewicz et al. (Eds.): ISMIS 2011, LNAI 6804, pp. 250–259, 2011.
© Springer-Verlag Berlin Heidelberg 2011