Leveraging Flexible Data Management with Graph Databases

Elena Vasilyeva¹, Maik Thiele², Christof Bornhövd³, Wolfgang Lehner²

¹ SAP AG, Dresden, Germany, elena.vasilyeva@sap.com
² Database Technology Group, Technische Universität Dresden, Germany, firstname.lastname@tu-dresden.de
³ SAP Labs, LLC, Palo Alto, CA 94304, USA, christof.bornhoevd@sap.com

ABSTRACT

Integrating up-to-date information from heterogeneous data sources into databases is still a time-consuming and mostly manual job that can only be accomplished by skilled experts. For this reason, enterprises often lack information regarding the current market situation, which prevents the holistic view needed to conduct sound data analysis and market predictions. Ironically, the Web offers a huge and growing amount of valuable information from diverse organizations and data providers, such as the Linked Open Data cloud, common knowledge sources like Freebase, and social networks. One desirable usage scenario for this kind of data is its integration into a single database in order to apply data analytics. However, today's business intelligence tools evidently lack support for so-called situational or ad-hoc data integration. What we need is a system which 1) provides flexible storage for heterogeneous information with different degrees of structure in an ad-hoc manner, and 2) supports mass data operations suited for data analytics.

In this paper, we present our vision of such a system and describe an extension of the well-studied property graph model that makes it possible to "integrate and analyze as you go" external data exposed in the RDF format in a seamless manner. The proposed integration approach extends the internal graph model with external data from the Linked Open Data cloud, which stores over 31 billion RDF triples (September 2011) from a variety of domains.
Categories and Subject Descriptors

H.2.1 [Logical Design]: Data models, Normal forms, Schema and subschema; H.2.5 [Heterogeneous Databases]: Data translation

General Terms

Algorithms, Design

Keywords

Graph Database, Linked Open Data, Property Graph

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Proceedings of the First International Workshop on Graph Data Management Experiences and Systems (GRADES 2013), June 23, 2013, New York, NY, USA.
Copyright 2013 ACM 978-1-4503-2188-4 ...$15.00.

1. INTRODUCTION

Data analytics and business intelligence have enjoyed immense popularity and success over the last ten years and now play a key role in corporate decision-making. However, due to the ubiquitous presence of data residing beyond corporate boundaries, the requirements placed on data analytics have shifted toward more agile and situational analytics. In contrast to conventional data warehouse approaches, where all datasets are known at design time, situational analytics demands data provisioning, integration, transformation, and consolidation in an ad-hoc fashion. One popular example of such an external and very valuable data source is the Linked Open Data (LOD) cloud, which offers billions of structured and irregularly structured pieces of information. To make productive use of the variety of LOD, two things are required: first, a powerful schema-flexible data store and, second, a way to integrate and analyze external data in order to bring it into a new context, to mix it with other data sources, and to gain knowledge and insights from it.
The call for a timely integration of new data sources, however, confronts today's data models and architectures with a serious problem. Such integration is typically hindered by heterogeneous data formats and by data of differing structure and meaning. A traditional approach requires a rigid global schema that accounts for all possible types and formats, which makes the method too inflexible and cost-inefficient. Therefore, we propose an ad-hoc data integration engine on top of SAP's Active Information Store (AIS) that is able to augment the AIS data store with data from LOD in a seamless manner. In this way, we want to bridge the gap between the analytical world and LOD and to show the value of LOD for business analytics in general.

The rest of this paper is organized as follows. We start by presenting a use case in Section 2 that motivates the need for a flexible and extensible data model suited for ad-hoc data analytics. In Section 3.1, we outline the core features of the SAP Active Information Store as well as the underlying data model, which allows data integration in a "pay as you go" manner. Additionally, we briefly review the principles of the RDF data model in Section 3.2 and compare both models in Section 3.3. This comparison forms the basis of our architecture, which is outlined in Section 4. Finally, we summarize our findings and point out directions for future work in Section 5.