Ephedra: efficiently combining RDF data and services using SPARQL federation

Andriy Nikolov, Peter Haase, Johannes Trame, Artem Kozlov
metaphacts GmbH, Walldorf, Germany
{an, ph, jt, ak}@metaphacts.com

Abstract. Knowledge graph management use cases often require addressing hybrid information needs that involve a multitude of data sources, a multitude of data modalities (e.g., structured, keyword, geospatial search), and the availability of computation services (e.g., machine learning and graph analytics algorithms). Although SPARQL queries provide a convenient way of expressing data requests over RDF knowledge graphs, the level of support for hybrid information needs is limited: existing query engines usually focus on retrieving RDF data and only support a set of hard-coded built-in services. In this paper we describe representative use cases of metaphacts in the cultural heritage and pharmacy domains and the hybrid information needs arising in them. To address these needs, we present Ephedra: a SPARQL federation engine aimed at processing hybrid queries. Ephedra provides a flexible declarative mechanism for including hybrid services in a SPARQL federation and implements a number of static and runtime query optimization techniques for improving the performance of hybrid SPARQL queries. We validate Ephedra in the use case scenarios and discuss practical implications of hybrid query processing.

1 Introduction

SPARQL has emerged as a standard formalism for expressing information requests in Semantic Web applications where the goal is to retrieve data stored as RDF. However, many practical knowledge graph management use cases need to address hybrid information needs. Such needs can be characterized by the following dimensions:

– Variety of data sources. There is often a need to integrate data stored in several physical repositories.
These repositories can include both native RDF triple stores and datasets in other formats presented as RDF (e.g., a relational database exposed using R2RML mappings).
– Variety of data modalities. Graph data in RDF often needs to be combined with other data modalities, e.g., textual, temporal or geospatial data. A SPARQL query then needs to support corresponding extensions for full-text, spatial, and other types of search.
– Variety of data processing techniques. Retrieved data often has to be further processed using dedicated domain-specific services: e.g., graph analytics (finding the shortest path or interconnected graph cliques), statistical analysis and machine learning (applying a machine learning classifier, finding similar entities using a vector space model), etc.
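
To make the notion of a hybrid information need concrete, the following minimal Python sketch joins the results of a keyword search service with graph patterns over RDF-like triples. It is only an illustration under stated assumptions and does not reflect how Ephedra is implemented: the data, the `ex:` identifiers, and the functions `keyword_service`, `match`, and `hybrid_query` are all hypothetical.

```python
# Toy in-memory stand-ins; every identifier and value below is hypothetical.
TRIPLES = {
    ("ex:nightwatch", "rdf:type", "ex:Painting"),
    ("ex:nightwatch", "ex:creator", "ex:rembrandt"),
    ("ex:sunflowers", "rdf:type", "ex:Painting"),
    ("ex:sunflowers", "ex:creator", "ex:vangogh"),
    ("ex:thinker", "rdf:type", "ex:Sculpture"),
    ("ex:thinker", "ex:creator", "ex:rodin"),
}

LABELS = {
    "ex:nightwatch": "The Night Watch",
    "ex:sunflowers": "Sunflowers",
    "ex:thinker": "The Thinker",
}


def keyword_service(text):
    """Stand-in for a full-text search service: yields bindings for ?item."""
    for iri, label in LABELS.items():
        if text.lower() in label.lower():
            yield {"item": iri}


def match(pattern, binding):
    """Extend one binding with every triple that matches a triple pattern."""
    for triple in TRIPLES:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Variable: bind it, or check consistency with an earlier binding.
                if new.setdefault(term[1:], value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def hybrid_query(text):
    """Join keyword search results with graph patterns over the triples."""
    results = []
    for b in keyword_service(text):
        for b1 in match(("?item", "rdf:type", "ex:Painting"), b):
            for b2 in match(("?item", "ex:creator", "?creator"), b1):
                results.append((b2["item"], b2["creator"]))
    return sorted(results)
```

Calling `hybrid_query("the")` in this toy setting keeps only items whose label matches the keyword and that are also typed as paintings in the graph, which is the kind of cross-modality join that a hybrid query has to express.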