Improving the Usability of Hierarchical Representations for Interactively Labeling Large Image Data Sets

Julia Moehrmann 1, Stefan Bernstein 2, Thomas Schlegel 3, Günter Werner 2 and Gunther Heidemann 1

1 Intelligent Systems Group, University of Stuttgart, Universitätsstr. 38, 70569 Stuttgart, Germany
{moehrmann, heidemann}@vis.uni-stuttgart.de
2 University of Applied Sciences Mittweida, Technikumplatz 17, 09648 Mittweida, Germany
{sbernste,gwerner}@hs-mittweida.de
3 Softwareentwicklung ubiquitärer Systeme, TU Dresden, Nöthnitzer Straße 46, 01187 Dresden, Germany
thomas.schlegel@tu-dresden.de

Abstract. Image recognition systems require large image data sets for the training process. The annotation of such data sets by users requires a lot of time and effort, and is therefore the bottleneck in the development of recognition systems. In order to simplify the creation of image recognition systems, it is necessary to develop interaction concepts that optimize the usability of labeling systems. Semi-automatic approaches address the labeling task by clustering the image data in an unsupervised manner and presenting the resulting ordered set to a user for manual labeling. A labeling interface based on self-organizing maps (SOM) was developed, and its usability was investigated in an extensive user study with 24 participants. The evaluation showed that SOM-based visualizations are suitable for speeding up the labeling process and simplifying the task for users. Based on the results of the user study, further concepts were developed to improve the usability.

Keywords: Self-organizing map, SOM, user study, image labeling, ground truth data

1 Introduction

The importance of image recognition systems increases with the ubiquity of webcams and camera phones. However, the extensive development of image recognition systems for non-industrial purposes fails due to time and cost. One important factor is the labeling (or annotation) of the training images.
Labeling the image data is necessary for building a ground truth data set on which the classifier for a specific recognition system can be trained. Since correctness in the ground truth data is crucial