Querying Factorized Probabilistic Triple Databases

Denis Krompaß¹, Maximilian Nickel², and Volker Tresp¹,³

¹ Ludwig Maximilian University, 80538 Munich, Germany
Denis.Krompass@campus.lmu.de
² Massachusetts Institute of Technology, Cambridge, MA and Istituto Italiano di Tecnologia, Genova, Italy
mnick@mit.edu
³ Siemens AG, Corporate Technology, Munich, Germany
Volker.Tresp@siemens.com

Abstract. An increasing amount of data is becoming available in the form of large triple stores, with the Semantic Web's linked open data cloud (LOD) as one of the most prominent examples. Data quality and completeness are key issues in many community-generated data stores, such as LOD, which motivates probabilistic and statistical approaches to data representation, reasoning and querying. In this paper we address the issue from the perspective of probabilistic databases, which account for uncertainty in the data via a probability distribution over all database instances. We obtain a highly compressed representation using the recently developed RESCAL approach and demonstrate experimentally that efficient querying can be achieved by exploiting inherent features of RESCAL via sub-query approximations of deterministic views.

Keywords: Probabilistic Databases, Tensor Factorization, RESCAL, Querying, Extensional Query Evaluation

1 Introduction

The rapidly growing Web of Data, e.g., as represented by the Semantic Web's linked open data cloud (LOD), is providing an increasing amount of data in the form of large triple databases, also known as triple stores. However, the LOD cloud includes many sources of varying reliability, and correctly accounting for data veracity remains a major challenge. To address this issue, reasoning with inconsistent and uncertain ontologies has recently emerged as a research field of its own [6, 31, 4, 9, 3, 15].
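To make concrete the abstract's description of a probabilistic database as a probability distribution over all database instances, the following toy sketch enumerates every possible world of a tiny triple store. It assumes a tuple-independent model, in which each triple is an independent event with its own marginal probability; this is one common special case and not necessarily the model used in this paper. The triples, their probabilities, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy triple store. ASSUMPTION: tuple independence, i.e. each
# triple is present independently with the given marginal probability.
triples = {
    ("Jack", "knows", "Jane"): 0.9,
    ("Jane", "livesIn", "Munich"): 0.6,
    ("Jack", "livesIn", "Munich"): 0.2,
}

def possible_worlds(db):
    """Yield (world, probability) for every subset of the uncertain triples."""
    facts = list(db)
    for included in product([False, True], repeat=len(facts)):
        world = {f for f, keep in zip(facts, included) if keep}
        prob = 1.0
        for f, keep in zip(facts, included):
            # Present triples contribute p, absent ones contribute 1 - p.
            prob *= db[f] if keep else 1.0 - db[f]
        yield world, prob

def query_probability(db, query):
    """P(query) = total probability of the worlds in which the query holds."""
    return sum(p for world, p in possible_worlds(db) if query(world))

def jack_knows_someone_in_munich(world):
    # Boolean query: exists x. (Jack, knows, x) and (x, livesIn, Munich)
    return any(
        (s, r) == ("Jack", "knows") and (o, "livesIn", "Munich") in world
        for (s, r, o) in world
    )

total = sum(p for _, p in possible_worlds(triples))
print(round(total, 10))  # prints 1.0
print(round(query_probability(triples, jack_knows_someone_in_munich), 10))  # prints 0.54
```

The world probabilities sum to one, and the query holds exactly in the worlds containing both (Jack, knows, Jane) and (Jane, livesIn, Munich), giving 0.9 × 0.6 = 0.54. Exhaustive enumeration is only feasible for toy inputs, since the number of worlds grows as 2^n in the number n of uncertain triples.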
In this paper we approach the veracity issue from the perspective of probabilistic databases (PDB), which consider multiple possible instances of a database via possible-worlds semantics and account for uncertainty in the data by assigning a probability distribution over all database instances [27]. As such, querying PDBs has a clear interpretation as a generalization of deterministic relational database queries. When applying PDBs to large triple stores, several key challenges must be addressed. First, consider storage requirements. A common assumption in PDBs