Fostering Serendipity through Big Linked Data

Muhammad Saleem 1, Maulik R. Kamdar 2, Aftab Iqbal 2, Shanmukha Sampath 2, Helena F. Deus 3, and Axel-Cyrille Ngonga 1

1 Universität Leipzig, IFI/AKSW, PO 100920, D-04009 Leipzig
lastname@informatik.uni-leipzig.de
2 Digital Enterprise Research Institute, National University of Ireland, Galway
{firstname.lastname}@deri.org
3 Foundation Medicine Inc., One Kendall Square, Cambridge, MA
hdeus@foundationmedicine.com

Abstract. The amount of bio-medical data available over the Web grows exponentially with time. The large volume of the currently available data makes it difficult to explore, while the velocity at which this data changes and the variety of formats in which bio-medical data is published make it difficult to access in an integrated form. Moreover, the lack of an integrated vocabulary makes querying this data difficult. In this paper, we advocate the use of Linked Data to integrate, query and visualize big bio-medical data. As a proof of concept, we show how the constant flow of bio-medical publications can be integrated with the 7.36-billion-triple Linked Cancer Genome Atlas dataset (TCGA). Then, we show how we can harness the value hidden in that data by making it easy to explore within a browsing interface. We evaluate the scalability of our approach by comparing the query execution time of our system with that of FedX on Linked TCGA.

Keywords: TCGA, PubMed, RDF

1 Introduction

Over the last years, the amount of Linked Data published has grown significantly. In particular, the bio-medical data available as RDF is spread across datasets, some of which are very large; one of the newest additions to this family of datasets is Linked TCGA [2], a 7.6-billion-triples-strong dataset.
Making bio-medical data available as Linked Data has the obvious advantage of easing the integration of this data, which promises to support bio-medical experts in analyzing and exploring the data and in extracting novel knowledge from it. Yet the data management solutions for RDF data still need to be improved before scalable integrated solutions can deal with Big Linked Data, i.e., Linked Data that displays the three main characteristics of Big Data (volume, velocity and value). In this paper, we present a scalable approach that aims to support the serendipitous discovery of bio-medical hypotheses by providing an interface for