Towards Budget Comparative Analysis: The Need for Fiscal Code Lists as Linked Data

Panagiotis-Marios Filippidis
Open Knowledge Greece and Aristotle University of Thessaloniki, School of Journalism and Mass Communications, Thessaloniki, Greece
pafilipp@jour.auth.gr

Sotirios Karampatakis
Open Knowledge Greece and Aristotle University of Thessaloniki, School of Mathematics, Thessaloniki, Greece
sokaramp@auth.gr

Lazaros Ioannidis
Open Knowledge Greece and Aristotle University of Thessaloniki, School of Mathematics, Thessaloniki, Greece
larjohn@math.auth.gr

Jindřich Mynarz
Department of Information and Knowledge Engineering, University of Economics, W. Churchill Sq. 4, 130 67 Prague, Czech Republic
jindrich.mynarz@vse.cz

Vojtěch Svátek
Department of Information and Knowledge Engineering, University of Economics, W. Churchill Sq. 4, 130 67 Prague, Czech Republic
svatek@vse.cz

Charalampos Bratsas
Open Knowledge Greece and Aristotle University of Thessaloniki, School of Mathematics, Thessaloniki, Greece
cbratsas@math.auth.gr

ABSTRACT
Code lists are a key part of budget datasets, as they serve to encode the fiscal concepts within them. However, the great diversity of classifications across countries and concepts does not allow one to rely on their actual value as dimension properties. In this paper we discuss the need to publish the code lists for the classifications used in fiscal datasets as Linked Data, in three basic steps. First, code lists have to be extracted from fiscal datasets, especially when the budget description contains no metadata that would readily identify them. Next, code lists from different datasets or sources have to be represented in the same way, using the SKOS vocabulary, so that they can be linked with each other. Finally, linking similar code lists will also allow the datasets that contain them to be linked, increasing the possibilities for data analysis and knowledge extraction.
CCS Concepts
• Information systems → Resource Description Framework (RDF); Ontologies; Extraction, transformation and loading; Hierarchical data models; Information retrieval diversity;

Keywords
Linked Data, SKOS, Knowledge Extraction

© 2016 Copyright held by the author/owner(s).
SEMANTICS 2016: Posters and Demos Track, September 13-14, 2016, Leipzig, Germany

1. INTRODUCTION
Budget datasets contain detailed information about how public money is spent on the functions of government. They include fields that refer to fiscal and other budget-related concepts. Many of these fields have a specific range of values. To this end, statistical agencies across Europe have created appropriate code lists, which are prescribed controlled vocabularies that contain all the values a specific field can take.

Code lists are an essential part of a budget dataset, as they serve to encode concepts that could otherwise be written or described in many ways. The hierarchical structure of most code lists reflects the hierarchy of budget-related concepts, so they can support aggregated views over the data, for example over a particular expenditure category or a municipal administration office. Additionally, standardized code lists are a key device for making fiscal data comparable. The information they carry can be shared across several datasets and interlinked with external data, allowing comparisons between budgets of different years and of different organizations, because the budgets use the same codes for concepts that they would otherwise describe in different ways.

International authorities propose many generic code lists that have been fully or partially adapted to the national budget representation of European countries. The most common case of differentiation is when a national code list contains one or two additional levels of detail relative to the international classification, according to the respective needs of the country.
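The aggregated views mentioned above can be illustrated with a minimal sketch. It assumes a code list stored as a child-to-parent mapping (the role that skos:broader would play in a SKOS representation) in which a national list adds one extra level below an international one. All codes, amounts, and function names here are invented for illustration and are not taken from any actual classification or dataset.

```python
from collections import defaultdict

# Hypothetical code list: each code maps to its broader (parent) code,
# or to None at the top level. "E.1.a" and "E.1.b" stand for an extra
# national level of detail below a shared international code "E.1".
BROADER = {
    "E": None,
    "E.1": "E",
    "E.1.a": "E.1",
    "E.1.b": "E.1",
    "H": None,
    "H.2": "H",
}


def self_and_broader(code, broader):
    """Yield the code itself, then each broader code up to the top level."""
    while code is not None:
        yield code
        code = broader[code]


def roll_up(lines, broader):
    """Sum the amounts for every code and for all of its broader codes."""
    totals = defaultdict(float)
    for code, amount in lines:
        for c in self_and_broader(code, broader):
            totals[c] += amount
    return dict(totals)


# Invented budget lines as (code, amount) pairs.
budget_lines = [("E.1.a", 120.0), ("E.1.b", 80.0), ("H.2", 50.0)]
totals = roll_up(budget_lines, BROADER)
```

Under these assumptions, two budgets coded at different levels of detail could be compared at the deepest level they share (here "E.1"), since amounts recorded at the finer national level are carried up to every broader code.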
Furthermore, European countries have also established and use their own code lists. This leads to a variety of classifications for the same concepts, and countries may also apply code lists to different fields of the budget. However, the use of officially proposed code lists in budgets is rather limited so far. We noticed that in many cases the classifications used in budget datasets differ from both the internationally and the nationally proposed code lists, and no relevant metadata about them are available. Thus, the ambiguous use of code lists by countries and municipalities is a very com-