XTaGe: A flexible generation system for complex XML collections

María Pérez, Ismael Sanz, and Rafael Berlanga
Universitat Jaume I, Spain
{mcatalan,isanz,berlanga}@uji.es

Abstract. We introduce XTaGe (XML Tester and Generator), a system for the synthesis of XML collections meant for testing and micro-benchmarking applications. In contrast with existing approaches, XTaGe focuses on complex collections, providing a highly extensible framework to introduce controlled variability in XML structures. In this paper we present the theoretical foundation, internal architecture and main features of our generator; we describe its implementation, which includes a GUI to facilitate the specification of collections; we discuss how XTaGe's features compare with those of other XML generation systems; finally, we illustrate its usage by presenting a use case in the Bioinformatics domain.

1 Introduction

Testing is an essential step in the development of XML-oriented applications and, in most practical settings, it requires the creation of synthetic data. Existing XML generators focus either on creating collections of a given size (for stress testing and workload characterization purposes) or on collections with a fixed schema and little variation. These systems do not suit the requirements of an emerging class of important applications in fields such as Bioinformatics and GIS, which have to deal with large collections that present complex structural features and specialized content such as protein sequences or vector map data. In this context, the main drawback of existing systems is their lack of extensibility: all of them support only a limited number of predefined generation primitives. Another limitation is their uneven support for introducing controlled variability in the generated structures, which is useful, for example, for micro-benchmarking purposes.
Finally, the specification of collections is generally done through the manual creation of a text-based specification file, which can be tedious and error-prone.

In this paper we introduce XTaGe (XML Tester and Generator), which focuses on the creation of collections with complex structural constraints and domain-specific characteristics. XTaGe contributes (i) a flexible component-based framework to create highly tailored generators, (ii) a ready-made set of components that model common patterns that arise in complex collections, (iii) easy adaptability to new use cases using a high-level language (XQuery itself)