Multidimensional Integration of RDF Datasets

Jam Jahanzeb Khan Behan 1,2, Oscar Romero 1, and Esteban Zimányi 2

1 Universitat Politècnica de Catalunya, Calle Jordi Girona, 1-3, 08034 Barcelona
{behan,oromero}@essi.upc.edu
2 Université libre de Bruxelles, Avenue Franklin Roosevelt 50, 1050 Bruxelles
{jbehan,ezimanyi}@ulb.ac.be

Abstract. Data providers have been uploading RDF datasets on the web to aid researchers and analysts in finding insights. These datasets, made available by different data providers, contain common characteristics that enable their integration. However, since each provider has its own data dictionary, identifying common concepts is not trivial, and costly and complex entity resolution and transformation rules are required to perform such integration. In this paper, we propose a novel method that, given a set of independent RDF datasets, provides a multidimensional interpretation of these datasets and integrates them based on the common multidimensional space identified (if any). To do so, our method first identifies potential dimensional and factual data in the input datasets and performs entity resolution to merge common dimensional and factual concepts. As a result, we generate a common multidimensional space and identify each input dataset as a cuboid of the resulting lattice. With such output, we are able to exploit open data with OLAP operators in a richer fashion than by dealing with the datasets separately.

Keywords: Entity Resolution · Resource Description Framework (RDF) · Data Integration · On-Line Analytical Processing (OLAP) · Multidimensional Modeling.

1 Introduction

Data availability on the Web is ensured as users constantly upload data. Since multiple users can share the same entity, duplicated data and related but unconnected data have proliferated on the Web. As a consequence, integration of web sources became a necessity and the Web of Linked Data emerged.
Linked Open Data (LOD) enables the sharing of information, supports structured querying formats, and facilitates access to data by means of Uniform Resource Identifiers (URIs). Yet, due to the heterogeneity of the Web of Linked Data, it is still problematic to develop Linked Data (LD) applications. Nowadays, we cannot assume that all URI aliases have been explicitly stated as links, and therefore data integration is still an open issue. Nevertheless, the size of LOD has been increasing exponentially. A study released in April 2014 highlights that the LD cloud has grown from just 12 datasets cataloged in 2007 to more than 1000 datasets [15], with more than 500 million explicit links between them.

Behan, J.; Romero, O.; Zimányi, E. Multidimensional integration of RDF datasets. In: International Conference on Big Data Analytics and Knowledge Discovery. "Big Data Analytics and Knowledge Discovery, 21st International Conference, DaWaK 2019: Linz, Austria, August 26–29, 2019: proceedings". Berlin: Springer, 2019, p. 119-135. The final authenticated version is available online at https://doi.org/10.1007/978-3-030-27520-4_9