Learning Probabilistic Description Logics: A Framework and Algorithms

José Eduardo Ochoa Luna¹, Kate Revoredo², and Fabio Gagliardi Cozman¹

¹ Escola Politécnica, Universidade de São Paulo, Av. Prof. Mello Morais 2231, São Paulo - SP, Brazil
² Departamento de Informática Aplicada, Unirio, Av. Pasteur, 458, Rio de Janeiro, RJ, Brazil
eduardo.ol@gmail.com, katerevoredo@uniriotec.br, fgcozman@usp.br

Abstract. Description logics have become a prominent paradigm in knowledge representation (particularly for the Semantic Web), but they typically do not include explicit representation of uncertainty. In this paper, we propose a framework for automatically learning a Probabilistic Description Logic from data. We argue that one must learn both concept definitions and probabilistic assignments. We also propose algorithms that do so and evaluate these algorithms on real data.

1 Introduction

Description logics (DLs) [2] form a family of knowledge representation formalisms that model the application domain by defining the relevant concepts of the domain and then using these concepts to specify properties of objects and relations among concepts. Even though DLs are quite expressive, they have limitations, particularly when it comes to modeling uncertainty. Thus probabilistic extensions to DLs have been proposed, defining different Probabilistic Description Logics (PDLs). For instance, the PDL crALC [6, 22, 7] allows one to perform probabilistic reasoning by adding uncertainty capabilities to the DL ALC [2]. PDLs have been extensively investigated in the last few years [5, 8, 19].

To build a PDL terminology with large amounts of data, one must invest considerable resources. Thus, machine learning algorithms can be used to learn a PDL automatically. To the best of our knowledge, the only proposals for learning PDLs were described in [20] and [24].
Both focused on learning crALC, but the former addressed concept definitions and the latter probabilistic inclusions.

This is a preprint version of a paper published in: Lecture Notes in Artificial Intelligence, vol. 7094, Springer, 2011. When citing, please cite the final version published by Springer.

In this paper, we argue that to completely learn a PDL one must learn both its concept definitions and its probabilistic inclusions. We expect that learning algorithms can jointly accommodate background knowledge and both deterministic and probabilistic concepts, giving each component its due relevance. Therefore, we propose a framework for automatically learning a PDL from relational data. Focusing on crALC, we propose algorithms that do so and evaluate these algorithms on real data; compared to the existing work mentioned in the previous paragraph, we