QB2OLAP: Enabling OLAP on Statistical Linked Open Data

Jovan Varga *¹, Lorena Etcheverry ², Alejandro A. Vaisman ³, Oscar Romero *⁴, Torben Bach Pedersen §⁵ and Christian Thomsen §⁶

* Universitat Politècnica de Catalunya, BarcelonaTech, Barcelona, Spain
Instituto de Computación, Facultad de Ingeniería, UdelaR, Montevideo, Uruguay
Instituto Tecnológico de Buenos Aires, Buenos Aires, Argentina
§ Aalborg Universitet, Aalborg, Denmark

¹ jvarga@essi.upc.edu, ² lorenae@fing.edu.uy, ³ avaisman@itba.edu.ar, ⁴ oromero@essi.upc.edu, ⁵ tbp@cs.aau.dk, ⁶ chr@cs.aau.dk

Abstract: Publication and sharing of multidimensional (MD) data on the Semantic Web (SW) opens new opportunities for the use of On-Line Analytical Processing (OLAP). However, the RDF Data Cube (QB) vocabulary, the current standard for statistical data publishing, lacks key MD concepts such as dimension hierarchies and aggregate functions. QB4OLAP was proposed to remedy this. Yet QB4OLAP requires extensive manual annotation, and users must still write queries in SPARQL, the standard query language for RDF, which typical OLAP users are not familiar with. In this demo, we present QB2OLAP, a tool for enabling OLAP on existing QB data. Without requiring any RDF, QB(4OLAP), or SPARQL skills, it allows semi-automatic transformation of a QB data set into a QB4OLAP one via enrichment with QB4OLAP semantics, exploration of the enriched schema, and querying with the high-level OLAP language QL, which exploits the QB4OLAP semantics and is automatically translated to SPARQL.

I. INTRODUCTION

OLAP analysis [1] is a well-established approach for decision making. Typically used in Data Warehousing (DW), OLAP relies on the MD model, which represents data in terms of facts and dimensions. In short, dimensions form the axes of an MD space in which a set of measures (associated with the fact) are represented.
Dimensions provide appropriate contextual meaning to facts and are organized as hierarchies, which provide different levels of data aggregation. By means of an MD algebra, MD data are aggregated and disaggregated (through roll-up and drill-down, respectively), and filtered (through slice and dice operations), among other operations.

Initiatives like Open Data¹ are pushing organizations to publish MD data using standards and non-proprietary formats. Two main approaches can be followed for OLAP analysis of SW data. The first one aims at extracting MD data from the Web and loading them into traditional DWs for OLAP analysis [2]. The second one (which we follow in our work) carries out OLAP-like analysis directly over MD data represented in RDF, following the notion of self-service BI [3].

Statistical data have traditionally been accessed and analyzed by means of OLAP [1]. In the SW, statistical data sets are usually published using the RDF Data Cube Vocabulary² (QB), a W3C recommendation since January 2014. However, QB does not support the dimension hierarchies and aggregate functions needed for OLAP analysis. To address this challenge, a new vocabulary called QB4OLAP has been proposed [4]. QB4OLAP allows reusing data already published in QB by defining an MD schema containing the hierarchical structure of the dimensions (and the corresponding instances that populate the dimension levels). Once a data cube is published using QB4OLAP, existing OLAP techniques can be applied, so that users can perform OLAP operations over the cube at a higher level of abstraction by using an OLAP algebra.

This research is funded by the European Commission through the Erasmus Mundus Joint Doctorate IT4BI-DC.
¹ http://okfn.org/opendata/
² http://www.w3.org/TR/vocab-data-cube/
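To make the roll-up and filtering operations mentioned above concrete, the following minimal Python sketch applies them to a toy in-memory cube. It is only an illustration under stated assumptions: the observations, the country-to-continent hierarchy, and the function names `roll_up` and `select_year` are all hypothetical, the figures are invented (they are not Eurostat data), and the code does not reflect how QB2OLAP, QB4OLAP, or QL are implemented.

```python
from collections import defaultdict

# Hypothetical toy observations: (citizenship, destination, year, applications).
# All values are invented for illustration only.
OBSERVATIONS = [
    ("Syria", "Germany", 2014, 40),
    ("Syria", "Sweden", 2014, 30),
    ("Nigeria", "Italy", 2014, 10),
    ("Eritrea", "Sweden", 2013, 5),
    ("Afghanistan", "Germany", 2013, 8),
]

# Assumed one-step hierarchy for the citizenship dimension: country -> continent.
# This is the kind of information a plain QB data set does not carry.
CONTINENT_OF = {
    "Syria": "Asia",
    "Afghanistan": "Asia",
    "Nigeria": "Africa",
    "Eritrea": "Africa",
}


def roll_up(observations, hierarchy):
    """Aggregate the measure (SUM) from the country level to the parent level."""
    totals = defaultdict(int)
    for citizenship, _destination, _year, applications in observations:
        totals[hierarchy[citizenship]] += applications
    return dict(totals)


def select_year(observations, year):
    """Filter the cube by fixing the year dimension to a single value."""
    return [obs for obs in observations if obs[2] == year]


# Roll-up over all observations, then over the 2014 observations only.
by_continent = roll_up(OBSERVATIONS, CONTINENT_OF)
by_continent_2014 = roll_up(select_year(OBSERVATIONS, 2014), CONTINENT_OF)
print(by_continent)
print(by_continent_2014)
```

The roll-up is only possible because the hierarchy is available as explicit data; without the `CONTINENT_OF` mapping, the observations alone could not be aggregated per continent.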
In this demo, we present QB2OLAP, a tool that semi-automatically transforms a QB data set into a QB4OLAP data set by enriching it with QB4OLAP semantics, supports exploration of the enriched schema (i.e., the dimensions' structures and instances), and allows querying the data set with a high-level OLAP language, denoted QL. QB2OLAP semi-automatically discovers dimension hierarchies to enrich the original data set, and automatically translates QL queries into SPARQL and executes them on an endpoint. Thus, QB2OLAP facilitates data analysis and encourages the use of MD data on the Web. To the best of our knowledge, it is the first tool enabling native OLAP analysis on Statistical Linked Open Data.

Demo Use Case: Mary is a journalist covering the European migration crisis. She wants to analyze historical migration data for the European Union (EU), and knows that these data³ are provided by the statistical office of the EU (Eurostat) and are also available as Linked Open Data in QB format⁴. Mary wants to compute some basic filters and summaries typical for OLAP, such as aggregating immigrants' nationality of origin by continent. However, due to the limited schema information, she soon realizes that it is not possible to perform OLAP operations. To do so, she would need to enrich the data set (e.g., with dimension hierarchies to roll up through). Moreover, both enrichment and analysis require the use of SPARQL, a language she does not know, although she is quite proficient in OLAP. Fortunately, she knows about QB2OLAP and decides to use it to overcome her lack of technical knowledge of RDF, QB, and SPARQL.

³ http://ec.europa.eu/eurostat/statistics-explained/index.php/Asylum_statistics
⁴ http://eurostat.linked-statistics.org/

This demo shows how QB2OLAP can be used to achieve OLAP-like analysis over existing QB data sets and enable even wider analysis, e.g., analyze migration data according to the kind