Curation module in action - preliminary findings on VLO metadata quality

Davor Ostojic, ACDH-OEAW, Vienna, Austria, davor.ostojic@oeaw.ac.at
Go Sugimoto, ACDH-OEAW, Vienna, Austria, go.sugimoto@oeaw.ac.at
Matej Ďurčo, ACDH-OEAW, Vienna, Austria, matej.durco@oeaw.ac.at

1 Background

Metadata quality is central to resource discovery: it determines how discoverable and accessible resources are for users, and metadata curation plays an essential role in controlling that quality. CLARIN is no exception. Its main metadata catalogue of language resources, the Virtual Language Observatory (VLO) i , suffers from the downside of the flexibility of the Component MetaData Infrastructure (CMDI) ii , the standardised metadata framework underlying the VLO. Metadata curation has been a long-standing issue in CLARIN, and the Metadata Curation Task Force was founded to tackle it. Most recently, we investigated the variability of metadata in the VLO (King et al., 2015), and the idea of a curation module was formalised as a way to assess the quality of the ingested metadata.

By now we know how many CLARIN centres are registered (Centre Registry iii ), some of which are data providers of the VLO; how many records are ingested into the VLO (its home page); how many collections we have received (CMDI harvester web view iv ); and how many metadata concepts (CLARIN Concept Registry v ) and profiles (Component Registry vi ) have been created to define and semantically bind different types of resource descriptions. In addition, extra efforts have yielded valuable information about the structure of the CMD profiles and the reuse of CMD components and concepts (SMC Browser vii ), and about what percentage of VLO facets are covered (Odijk, 2014; King et al., 2015). However, it was not possible to collect statistics about the quality of the CMDI metadata systematically and automatically.

In 2015, we presented the general functional concept of the curation module in the context of the overall VLO data ingestion workflow (King et al., 2015), in line with previous work (Trippel et al., 2014; Kemps-Snijders, 2014). This paper outlines the ongoing development of the module in the CLARIN-PLUS project viii and presents the first findings on metadata quality.

2 Curation Module

The curation module ix is a software tool developed as a component of the CLARIN metadata infrastructure for curation, normalisation and quality assessment / benchmarking of CMD records, collections and profiles. It is intended as technical support for human curators who monitor and improve metadata quality. The output of the tool is a report in XML format containing various statistics, quality assessment scores, and information about issues encountered during validation and curation, according to an array of quality criteria.

The curation module consists of two parts: a core application that works standalone or can be used as a library in other software, and a web application that provides a web-based interface as well as a RESTful API. The module can process resources on the web, addressed by URL or profile ID, as well as local CMD records and collections. In addition to the interface for assessing their own data, users can explore pre-processed assessments of public profiles (Figure 1) and of collections harvested by the CLARIN aggregator. The curation module depends heavily on the Component Registry, from which it fetches the XSD schema files of the CMD profiles.

This work is licenced under a Creative Commons Attribution 4.0 International Licence. Licence details: http://creativecommons.org/licenses/by/4.0/
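
To make the idea of an XML quality report from Section 2 more concrete, the following is a minimal sketch in Python. It is not the curation module's actual implementation or report format. The sample record, the single criterion used (the share of leaf elements that contain text), the function names `assess_record` and `build_report`, and all report element names are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; not an instance of any real CMD profile.
SAMPLE_RECORD = """<CMD>
  <Components>
    <Resource>
      <Title>Example corpus</Title>
      <Language>deu</Language>
      <Licence></Licence>
      <Description>   </Description>
    </Resource>
  </Components>
</CMD>"""


def assess_record(xml_text):
    """Count leaf elements and how many carry text (illustrative criterion only)."""
    root = ET.fromstring(xml_text)
    # A leaf is an element without child elements.
    leaves = [el for el in root.iter() if len(el) == 0]
    filled = [el for el in leaves if (el.text or "").strip()]
    empty = [el.tag for el in leaves if not (el.text or "").strip()]
    score = len(filled) / len(leaves) if leaves else 0.0
    return {"leaves": len(leaves), "filled": len(filled),
            "empty": empty, "score": score}


def build_report(stats):
    """Serialise the statistics as a small XML report (element names are made up)."""
    report = ET.Element("report")
    ET.SubElement(report, "leaf-elements").text = str(stats["leaves"])
    ET.SubElement(report, "filled-elements").text = str(stats["filled"])
    ET.SubElement(report, "score").text = f"{stats['score']:.2f}"
    issues = ET.SubElement(report, "issues")
    for tag in stats["empty"]:
        ET.SubElement(issues, "issue").text = f"empty element: {tag}"
    return ET.tostring(report, encoding="unicode")


stats = assess_record(SAMPLE_RECORD)
print(build_report(stats))
```

A real assessment would combine several criteria, for example schema validation against the profile's XSD, and would aggregate record-level results into collection-level statistics. The sketch only shows the general shape of statistics, a score, and a list of issues gathered in one XML document.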