An Approach to Heterogeneous Data Translation based on XML Conversion

Paolo Papotti and Riccardo Torlone

Dipartimento di Informatica e Automazione
Università Roma Tre
{papotti,torlone}@dia.uniroma3.it

Abstract. In this paper, we illustrate a preliminary approach to the translation of Web data between heterogeneous formats. This work fits into a larger project whose aim is the development of a tool for the management of data described according to a large variety of formats used on the Web and the (semi)automatic translation of schemes and instances from one model to another. Data translations operate over XML representations of instances and rely on a uniform representation of models that we call metamodel. The metamodel exposes the structural differences between models and dictates the transformations needed. Complex translations can be derived by combining a number of predefined basic functions performing XML transformations expressed in XQuery. Practical examples are provided to show the effectiveness of the approach.

1 Introduction

Very often, data cooperation and interchange between different organizations are made difficult by the fact that little or no advance standardization exists and data is stored in different formats in distinct heterogeneous sources [1]. Therefore the need arises for an integrated management of heterogeneous data descriptions that allows for easy and flexible data translation from one format to another [6]. This problem is related to, but different from, the problems of data integration [4] and schema matching [20]. Recently, various aspects of the data translation problem have been widely studied in the context of the relational model [9, 10] or in more general settings [16, 18, 19]. However, it is widely recognized that a general solution able to cope with the large diversity of the available formats is very difficult to achieve [5].
In this framework, the final goal of our research project is the development of a tool for the management of data available on the Web described according to a large variety of formats and models and the (semi)automatic translation of schemes and instances from one model to another. The tool can be seen as an implementation of the “ModelGen” operator proposed by Bernstein in the context of Model Management Systems [5].

In principle, the set of models managed by the tool should include the majority of the formats used to represent data in Web-based applications: semi-structured models, schema languages for XML, specific formats for e.g. scientific