Executing SPARQL Queries over the Web of Linked Data

Olaf Hartig 1, Christian Bizer 2, and Johann-Christoph Freytag 1

1 Humboldt-Universität zu Berlin
lastname@informatik.hu-berlin.de
2 Freie Universität Berlin
firstname.lastname@fu-berlin.de

Abstract. The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research on federated query processing. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved over the HTTP protocol into RDF data which is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the still existing challenges.

1 Introduction

An increasing amount of data is published on the Web according to the Linked Data principles [1,2]. Basically, these principles require the identification of entities with URI references that can be resolved over the HTTP protocol into RDF data that describes the identified entity. These descriptions can include RDF links pointing at other data sources.
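To make the idea from the abstract concrete, the following is a minimal sketch of query evaluation that discovers data by resolving URIs from the query and from partial results. It is not the iterator-based pipeline developed in this paper, and it ignores blocking entirely. The names FAKE_WEB, dereference, and evaluate, as well as all example URIs, are hypothetical; the dictionary stands in for HTTP lookups so that the sketch runs without network access.

```python
# Hypothetical stand-in for the Web: maps a URI to the RDF triples that
# dereferencing it would return. A real client would issue HTTP requests.
KNOWS = "http://xmlns.com/foaf/0.1/knows"
NAME = "http://xmlns.com/foaf/0.1/name"
FAKE_WEB = {
    "http://a.example/alice": [
        ("http://a.example/alice", KNOWS, "http://b.example/bob"),
    ],
    "http://b.example/bob": [
        ("http://b.example/bob", NAME, "Bob"),
    ],
    # Never reached by the example query, so never added to the dataset.
    "http://c.example/carol": [
        ("http://c.example/carol", NAME, "Carol"),
    ],
}


def is_var(term):
    return term.startswith("?")


def is_uri(term):
    return term.startswith("http://") or term.startswith("https://")


def dereference(uri, dataset, seen):
    """Look up a URI at most once and add the retrieved triples to the dataset."""
    if uri in seen:
        return
    seen.add(uri)
    dataset.update(FAKE_WEB.get(uri, []))


def match(pattern, triple, binding):
    """Return the binding extended by matching pattern against triple, or None."""
    extended = dict(binding)
    for p, value in zip(pattern, triple):
        if is_var(p):
            if extended.setdefault(p, value) != value:
                return None
        elif p != value:
            return None
    return extended


def evaluate(patterns):
    """Evaluate triple patterns in order, growing the dataset while doing so."""
    dataset, seen = set(), set()
    solutions = [{}]  # one empty partial result to start from
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            # Apply the partial result to the pattern, then resolve every
            # URI that now appears in it before matching.
            bound = tuple(binding.get(t, t) for t in pattern)
            for term in bound:
                if is_uri(term):
                    dereference(term, dataset, seen)
            for triple in list(dataset):
                extended = match(bound, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions, dataset


query = [("http://a.example/alice", KNOWS, "?x"), ("?x", NAME, "?n")]
solutions, dataset = evaluate(query)
print(solutions)
```

In this sketch the second pattern can only be answered because the URI bound to ?x in a partial result is resolved during execution; the document for the third, unlinked URI is never retrieved.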
RDF links take the form of RDF triples, where the subject of the triple is a URI reference in the namespace of one data source, while the object is a URI reference in the namespace of the other. The Web of Linked Data that is emerging by connecting data from different sources via RDF links can be understood as a single, globally distributed dataspace [3]. Querying this dataspace opens possibilities not conceivable before: Data from different data sources can be aggregated; fragmentary information from multiple sources can be integrated to achieve a more complete view. However, evaluating queries over the Web of Linked Data also poses new challenges that do not arise