RDF Based Architecture for Semantic Integration of Heterogeneous Information Sources

Richard Vdovjak, Geert-Jan Houben
Eindhoven University of Technology
Eindhoven, The Netherlands
{r.vdovjak, g.j.houben}@tue.nl

Abstract. The proposed integration architecture aims at exploiting data semantics in order to provide a coherent and meaningful (with respect to a given conceptual model) view of the integrated heterogeneous information sources. The architecture is split into five separate layers to ensure modularization, providing a description, requirements, and interfaces for each. It favors the lazy retrieval paradigm over the data warehousing approach. The novelty of the architecture lies in the combination of semantic and on-demand driven retrieval. This line of attack offers several advantages but also brings challenges, both of which we discuss with respect to RDF, the architecture's underlying model.

1 Introduction, Background, and Related Work

With the vast expansion of the World Wide Web during the last few years, the integration of heterogeneous information sources has become a hot topic. A solution to this integration problem allows for the design of applications that provide uniform access to data obtainable from different sources available through the Web. In this paper we address an architecture that combines issues regarding on-demand retrieval and semantic metadata.

1.1 On-demand Retrieval

In principle there are two paradigms for information integration: data warehousing and on-demand retrieval. In the data warehousing (eager) approach all necessary data is collected in a central repository before a user query is issued; this, however, brings consistency and scalability problems. The on-demand driven (lazy) approach collects the data from the integrated sources dynamically during query evaluation.
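The contrast between the two paradigms can be illustrated with a minimal Python sketch. The names used here (make_source, EagerWarehouse, LazyMediator) and the in-memory toy source are assumptions made purely for illustration; they are not part of the architecture described in this paper.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, str]
Predicate = Callable[[Record], bool]
Source = Callable[[], List[Record]]  # hypothetical wrapper around one source


def make_source(name: str, rows: List[Record], log: List[str]) -> Source:
    """Toy source: each call logs an access and returns a snapshot of `rows`."""
    def fetch() -> List[Record]:
        log.append(name)
        return list(rows)
    return fetch


class EagerWarehouse:
    """Data warehousing: copy all data into a central repository up front."""

    def __init__(self, sources: List[Source]) -> None:
        self.repository: List[Record] = [
            rec for fetch in sources for rec in fetch()
        ]

    def query(self, predicate: Predicate) -> List[Record]:
        # Answers come from the central copy, which may be stale.
        return [rec for rec in self.repository if predicate(rec)]


class LazyMediator:
    """On-demand retrieval: contact sources only while a query is evaluated."""

    def __init__(self, sources: List[Source]) -> None:
        self.sources = sources

    def query(self, predicate: Predicate) -> Iterator[Record]:
        for fetch in self.sources:
            for rec in fetch():
                if predicate(rec):
                    yield rec


def is_rdf(rec: Record) -> bool:
    return rec["title"] == "RDF"


access_log: List[str] = []
papers: List[Record] = [{"title": "MIX"}]
sources = [make_source("A", papers, access_log)]

warehouse = EagerWarehouse(sources)  # source A is read here, before any query
mediator = LazyMediator(sources)     # no source is read yet

papers.append({"title": "RDF"})      # source A changes after the warehouse load

print(warehouse.query(is_rdf))       # central copy does not contain the new record
print(list(mediator.query(is_rdf)))  # data is fetched during query evaluation
```

In this toy setting the warehouse answers from a copy taken before the source changed, which hints at the consistency problem mentioned above, whereas the mediator pays the cost of contacting the source on every query.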
The MIX project [1], for example, implements a (virtual) XML view integration architecture, with a lazy approach to the evaluation of queries in an XML query language specifically designed for this purpose.

1.2 Semantic Integration

XML (http://www.w3.org/TR/REC-xml), in general, has become an enormous success and is widely accepted as a standard means for serializing (semi)structured data. However, with the advent of the Semantic Web