Improving access to LOD through Knowledge Patterns

Valentina Presutti
Consiglio Nazionale delle Ricerche
Semantic Technology Lab, ISTC
Rome, Italy
valentina.presutti@cnr.it

Lora Aroyo
Free University of Amsterdam
Intelligent Information Systems Web and Media
The Netherlands
l.m.aroyo@cs.vu.nl

Aldo Gangemi
Consiglio Nazionale delle Ricerche
Semantic Technology Lab, ISTC
Rome, Italy
aldo.gangemi@cnr.it

ABSTRACT
The cloud of Linked Open Data (LOD) appears, in recent research, to be an ideal basis for improving user experience when interacting with Web content across different applications and domains. Using LOD datasets, however, is not straightforward. They often introduce noisy results and do not follow a unified way of organizing their knowledge, so it is not known in advance how to query them. To deal with these problems we propose a knowledge pattern (KP) based approach to analyzing LOD datasets, and we show how recognizing KP in datasets can support querying them even if their vocabularies are previously unknown. Finally, we discuss results from experiments on three LOD datasets.

1. INTRODUCTION
The notion of Linked Data for “connecting data from diverse domains” [2] and making them available as shared knowledge on the Semantic Web indicates that this rich collection of datasets, describing various domains, can be exploited for improving search through related data. Current research [5], [11], [1], as well as large search initiatives like Google and Powerset, shows the benefits of using explicit semantics and linked data to refine search results. However, efficiently using the explicit knowledge of each LOD dataset is not straightforward. The datasets typically cover diverse domains, do not follow a unified way of organizing knowledge, and differ in size, quality and granularity of description, all of which makes them harder to query.
Although analyses of LOD such as [4], [6], [9] help us gain a better general understanding of the nature of the LOD cloud, they do not support efficient utilization in specific applications.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright ACM ...$10.00

Research such as [11] shows that linked data can be exploited for improving user interaction with Web content, i.e. by recommending relevant and related content for a specific information goal. However, LOD datasets are often noisy. Reasoning over them brings forward trivial facts and fails to surface serendipitous knowledge. One way of addressing this problem is to involve patterns in the usage of linked data [14]. Knowledge Patterns (KP) embed the most important relations for describing a relevant piece of knowledge in a certain domain. They are, for knowledge representation, the analogue of frames in linguistics and schemata in cognitive science [3].

In this context, KP research together with the work on analyzing linked data provides the basis for formulating our research question: how to achieve efficient and effective utilization of LOD sources in order to (1) improve interoperability within LOD datasets and detect incompatibility issues, (2) compare analysis data about different LOD datasets, and (3) improve user interaction when searching for relevant content.

We propose an approach to analyzing LOD datasets that combines top-down and bottom-up strategies for identifying emerging KP, general KP, and their alignments.
In this paper we discuss the bottom-up strategy, which aims at modeling, inspecting, and summarizing datasets. Central to this method is the construction of a semantic description, i.e. a dataset knowledge architecture, relying on the notions of paths and KP. We identify the central properties and types in a dataset and extract dataset KP based on the central types. In other words, we extract the dataset vocabulary and analyze the way the data is used in terms of patterns. We also associate a set of general-purpose measures with the knowledge architecture components of a dataset for performing empirical analysis. These measures could be further refined for specific use cases.

The overall contribution of this work is the validation of the usefulness of KP for enabling scoping, summarization and querying of LOD datasets without prior knowledge of their organization. We show how KP and