AN APPROACH FOR THE INCREMENTAL EXPORT OF RELATIONAL DATABASES INTO RDF GRAPHS

Nikolaos Konstantinou, Dimitrios-Emmanuel Spanos, Dimitris Kouis
Hellenic Academic Libraries Link, National Technical University of Athens
Iroon Polytechniou 9, Zografou, 15780, Athens, Greece
{nkons, dspanos, dimitriskouis}@seab.gr

Nikolas Mitrou
School of Electrical and Computer Engineering, National Technical University of Athens
Iroon Polytechniou 9, Zografou, 15780, Athens, Greece
mitrou@cs.ntua.gr

Several approaches have been proposed in the literature for offering RDF views over databases. In addition to these, a variety of tools exist that allow exporting database contents into RDF graphs. The approaches in the latter category have often been shown to perform better than those in the former. However, when database contents are exported into RDF, it is not always optimal or even necessary to export (or dump, as this procedure is often called) the whole database contents every time. This paper investigates the problem of incremental generation and storage of the RDF graph that is the result of exporting relational database contents. In order to express mappings that associate tuples from the source database to triples in the resulting RDF graph, an implementation of the R2RML standard is put to the test. Next, a methodology is proposed and described that enables incremental generation and storage of the RDF graph that originates from the source relational database contents. The performance of this methodology is assessed through an extensive set of measurements. The paper concludes with a discussion regarding the authors’ most important findings.

Keywords: Linked Open Data; Incremental; RDF; Relational Databases; Mapping.

1. Introduction

The Linked Data movement has lately gained considerable traction, and during the last few years the research and Web user communities have invested serious effort to make it a reality.
Nowadays, RDF data on a variety of domains proliferates at increasing rates towards a Web of interconnected data. Government (data.gov.uk), financial (openspending.org), library (theeuropeanlibrary.org) and news (guardian.co.uk/data) data are only some of the example domains where publishing data as RDF increases its value.

Systems that collect, maintain and update RDF data do not always use triplestores at their backend. Data that result in triples are typically exported from other, primary sources into RDF graphs, often relying on systems that have a Relational Database Management System (RDBMS) at their core and are maintained by teams of professionals who trust it for mission-critical tasks. Moreover, experimenting with new technologies, as the Linked Open Data (LOD) world can be perceived by people and industries working in less frequently changing environments, is a task that requires caution, since it is often difficult to change established methodologies and systems, let alone replace them with newer ones. Consider, for instance, the library domain, where a whole living and breathing information ecosystem is buzzing around bibliographic records, authority records, digital object records, e-books, digital articles etc., and where maintenance and update tasks are unremitting. In these situations, changes in the way data is produced, quality-assured and updated affect people’s everyday working activities; operating newer technologies side by side with the existing ones for a period of time before migrating therefore seems the only applicable, and sensible, approach. Hence, in many cases, the only viable solution is to maintain triplestores as an alternative delivery channel, in addition to production systems, a task that becomes increasingly multifarious and performance-demanding, especially when the primary information is rapidly changing.
This way the operation of information systems remains intact, while at the same time they seamlessly expose their data as LOD.

Several mapping techniques between relational databases and RDF graphs have been introduced in the literature, including various tools, languages, and methodologies. Thus, in order to expose relational database contents as LOD, several policy choices have to be made, since the literature offers several alternatives and no one-size-fits-all approach [1]. When exporting database contents as RDF, one of the most important factors to be considered is whether RDF content generation should take place in real time or whether database contents should be dumped into RDF asynchronously [2]. In other words, the question to be answered is whether the RDF view over the relational database contents should be transient or persistent. Both constitute acceptable, viable approaches, each with its own characteristics, benefits and drawbacks.
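
To make the transient/persistent distinction concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a hypothetical one-table SQLite database, a hypothetical base IRI (http://example.org/), and a hand-written row-to-triple function standing in for a real R2RML mapping. The transient view regenerates triples from the live database on each request, while the persistent view materializes them once and serves the stored copy until the next export.

```python
import sqlite3

# Hypothetical base IRI and schema, chosen for illustration only.
BASE = "http://example.org/"
DCT_TITLE = "http://purl.org/dc/terms/title"


def row_to_triples(row):
    """Map one (id, title) tuple to N-Triples-style strings.

    Stands in for a real R2RML mapping; literal escaping is omitted for brevity.
    """
    book_id, title = row
    subject = f"<{BASE}book/{book_id}>"
    return [f'{subject} <{DCT_TITLE}> "{title}" .']


def transient_view(conn):
    """Transient view: triples are produced from the live database on every call."""
    for row in conn.execute("SELECT id, title FROM book ORDER BY id"):
        yield from row_to_triples(row)


def persistent_dump(conn):
    """Persistent view: triples are materialized once and kept until re-exported."""
    return list(transient_view(conn))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO book VALUES (1, 'Dune')")

dump = persistent_dump(conn)                         # asynchronous export taken now
conn.execute("INSERT INTO book VALUES (2, 'Emma')")  # the source changes afterwards
live = list(transient_view(conn))                    # real-time view sees the new row

print(len(dump), len(live))
```

In this sketch the stored dump no longer matches the source once the second row is inserted, whereas the transient view pays the generation cost on every request. Keeping a persistent graph up to date without regenerating all of it is the problem that incremental export addresses.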