Creation, Population and Preprocessing of Experimental Data Sets for Evaluation of Applications for the Semantic Web

György Frivolt, Ján Suchal, Richard Veselý, Peter Vojtek, Oto Vozár, and Mária Bieliková

Institute of Informatics and Software Engineering, Faculty of Informatics and Information Technologies, Slovak University of Technology
Ilkovičova 3, 842 16 Bratislava, Slovakia
Name.Surname@fiit.stuba.sk

Abstract. In this paper we describe the process of creating an experimental ontology data set. Such a semantically enhanced data set is needed for the experimental evaluation of applications for the Semantic Web. Our research focuses on several levels of the data set creation process: data acquisition using wrappers, data preprocessing at the ontology instance level, and adjustment of the ontology according to the nature of the evaluation step. A web application for clustering ontology instances is used in the experimental evaluation, serving both as an example application and as a visual presentation of the experimental data set to the user.

1 Introduction

The exponentially growing volume of information on the Web forces designers and developers to solve navigation and search problems with novel approaches. Faceted browsing, clustering and graph visualization are only a few of the possibilities that can be, or already are, used. Nevertheless, to evaluate how any of these approaches improves navigation or searching, experimental data sets are always needed. Such data sets can be created either by generating artificial data or, as in our case, by acquiring data from existing real data sources. While using data from real data sources seems to be an attractive solution, creating such an experimental data set comes with problems of its own.
We describe the process of creating an experimental evaluation ontology from existing data sources. The description generalizes the process that we applied in the domain of scientific publications¹. The process is influenced by the fact that the ontology serves as an experimental evaluation data set for clustering in an application for the Semantic Web. The major contribution of this work is the design and experimental evaluation of a framework dedicated to data acquisition, ontology creation, data preprocessing and clustering.

¹ Project MAPEKUS: http://mapekus.fiit.stuba.sk/