LOD4STAT: a scenario and requirements

Pavel Shvaiko¹, Michele Mostarda², Marco Amadori², and Claudio Giuliano²

¹ TasLab, Informatica Trentina S.p.A., Trento, Italy
² Fondazione Bruno Kessler - IRST, Trento, Italy

Abstract. In this short paper we present a scenario and requirements for ontology matching posed by a statistical eGovernment application, which aims at publishing its data (also) as linked open data.

Introduction. Our application domain is eGovernment. By eGovernment we mean the application of information technologies to modernize public administration by optimizing the work of various public institutions and by providing citizens and businesses with better and new services. More specifically, we focus on statistical applications for eGovernment. The driving idea is to capitalize on statistical information in order to increase knowledge of the Trentino region. Releasing statistical data (with disclosure control) as linked open data aims at simplifying access to resources in digital formats, at increasing the transparency and efficiency of eGovernment services, etc. The main challenge is the realization of a knowledge base that natively works with RDBMS tables. Although this approach has been tailored specifically to the statistical database domain, there is substantial room for generalization. In this view, a number of initiatives that released governmental data as linked open data should be taken into account: in GovWILD [1], links were established automatically with specifically developed similarity measures, while in [2] the alignment was done semi-automatically with Google Refine. Currently available matching techniques are well suited to automating this process [3].

Scenario. Figure 1 shows the key component of the LOD4STAT system-to-be, called the Statistical Knowledge Base (SKB).
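The automatic establishment of links through similarity measures over labels, mentioned in the introduction, can be illustrated with a minimal sketch. It is not the method used by GovWILD or by LOD4STAT: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure, and the labels, the function names, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lowercase a label and turn '_' and '-' into single spaces."""
    return " ".join(label.replace("_", " ").replace("-", " ").lower().split())


def similarity(a: str, b: str) -> float:
    """Return a string similarity in [0, 1] between two normalized labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_labels(source, target, threshold=0.8):
    """For each source label, keep its best-scoring target label
    if the score reaches the (assumed) threshold."""
    if not target:
        return []
    correspondences = []
    for s in source:
        best = max(target, key=lambda t: similarity(s, t))
        score = similarity(s, best)
        if score >= threshold:
            correspondences.append((s, best, round(score, 2)))
    return correspondences


# Hypothetical labels, invented for illustration only.
statistical_terms = ["Population_Age", "employment", "municipality"]
general_terms = ["population age", "Employment", "Municipality", "river"]

for s, t, score in match_labels(statistical_terms, general_terms):
    print(f"{s} = {t} (similarity {score})")
```

A purely string-based measure like this one only catches near-identical labels. Matching across sources with different vocabularies would presumably also need semantic or structural evidence, which is outside the scope of this sketch.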
The SKB aims at enabling its users to query statistical data, metadata and the relations across them without requiring specific knowledge of the underlying database. Users can issue queries such as: find all data related to population age and employment for the municipality of Trento. Specifically, the user query is analyzed in order to extract concepts out of labels. Then, these are matched at run time against the SKB. For the example query, the term population age is connected to Registry Office, while employment is connected to Social Security. The system returns a set of tables, metadata and entities from the Registry Office (with information about population and age) and from Social Security (with information about employment) containing data for the city of Trento, and suggests possible joins between columns.

The SKB is an interconnected aggregation of ontologies (interpreted in a loose sense), such as WordNet, DBpedia and ESMS (http://epp.eurostat.ec.europa.eu/portal/page/portal/statistics/metadata), which allows both multi-classification and multiple views on data. These ontologies have to be matched with one another to enable navigation across them through the respective correspondences. The SKB is also able to export query results in several formats, such as RDF Data Cube and JSON-Stat.

The SKB is represented by three (horizontal) layers. The upper layer is a collection of ontologies specific to the statistics domain, e.g., ESMS. The middle layer is composed