DSToolkit: An architecture for flexible Dataspace Management ⋆

Cornelia Hedeler¹, Khalid Belhajjame¹, Lu Mao¹, Chenjuan Guo¹, Ian Arundale¹, Bernadette Farias Lóscio², Norman W. Paton¹, Alvaro A.A. Fernandes¹, and Suzanne M. Embury¹

¹ School of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
{chedeler, khalidb, maol, guoc, arundai7, norm, alvaro, embury}@cs.manchester.ac.uk
² Universidade Federal de Pernambuco, Centro de Informática, Cidade Universitária, 50740-540 Recife, PE, Brasil
bfl@cin.ufpe.br

Abstract. The vision of dataspaces is to provide many of the benefits of classical data integration, but with reduced up-front costs. Combining this with opportunities for incremental refinement enables a ‘pay-as-you-go’ approach to data integration, resulting in simplified integrated access to distributed data. It has been speculated that model management could provide the basis for Dataspace Management; however, this has not been investigated until now. Here, we present DSToolkit, the first dataspace management system that is based on model management. As a result, DSToolkit benefits from the flexibility that model management offers for managing schemas represented in heterogeneous models; it supports the complete dataspace lifecycle, which includes automatic initialisation, maintenance and improvement of a dataspace; and it allows the user to provide feedback by annotating the result tuples returned by queries the user has posed. The user feedback gathered is used to improve the dataspace by annotating, selecting and refining mappings. These techniques can also be applied to a new data source without the need for additional feedback on it, both to determine its perceived quality with respect to the feedback already gathered and to identify the best mappings over all sources, including the new one.
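The abstract describes feedback on result tuples being used to annotate and select mappings, and reused to assess the mappings of a new source without further feedback. The Python sketch below illustrates that idea under explicit assumptions: feedback is a collection of result tuples that the user has marked as expected or not expected, a mapping is represented only by the set of tuples it returns, and mapping quality is estimated as precision over the annotated tuples alone. All names (`Feedback`, `estimate_precision`, `select_mappings`, the mappings `m1`, `m2`, `m_new` and the 0.75 threshold) are hypothetical. This is not DSToolkit's implementation, and using precision as the only measure is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Feedback:
    """Hypothetical store of user annotations on result tuples.

    True means the user marked the tuple as expected, False as not expected.
    """
    labels: dict = field(default_factory=dict)

    def annotate(self, row, is_expected):
        self.labels[row] = is_expected


def estimate_precision(results, feedback):
    """Estimate a mapping's precision from feedback gathered so far.

    Only tuples that carry an annotation are counted; unannotated tuples
    are ignored. Returns None when none of the mapping's tuples is annotated.
    """
    tp = sum(1 for r in results if feedback.labels.get(r) is True)
    fp = sum(1 for r in results if feedback.labels.get(r) is False)
    if tp + fp == 0:
        return None
    return tp / (tp + fp)


def select_mappings(mapping_results, feedback, threshold):
    """Annotate every mapping with its estimate, then keep those >= threshold."""
    annotated = {name: estimate_precision(rows, feedback)
                 for name, rows in mapping_results.items()}
    selected = sorted(name for name, p in annotated.items()
                      if p is not None and p >= threshold)
    return annotated, selected


# Feedback gathered on earlier query results (hypothetical data).
fb = Feedback()
fb.annotate(("Manchester", "UK"), True)
fb.annotate(("Recife", "UK"), False)
fb.annotate(("Recife", "Brasil"), True)

# Tuples returned by each mapping; m_new stands for a mapping over a newly
# added source that received no feedback of its own.
mapping_results = {
    "m1": {("Manchester", "UK"), ("Recife", "UK")},
    "m2": {("Manchester", "UK"), ("Recife", "Brasil")},
    "m_new": {("Recife", "Brasil"), ("Oxford", "UK")},
}

annotated, selected = select_mappings(mapping_results, fb, threshold=0.75)
print(annotated)  # {'m1': 0.5, 'm2': 1.0, 'm_new': 1.0}
print(selected)   # ['m2', 'm_new']
```

Here `m_new` is scored using only feedback gathered before it existed, because one of its tuples happens to have been annotated already. Ignoring unannotated tuples avoids penalising a mapping for results the user has not yet judged, but it also means that estimates resting on very few annotated tuples can be unreliable.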
Keywords: Dataspace Management System, Dataspace lifecycle, Incremental improvement

⋆ The work reported in this paper was supported by a grant from the EPSRC.

1 Introduction

1.1 Motivation

Data integration in various forms has been the focus of ongoing research in the database community for over 20 years. The objective is to provide an integrated