Vis Comput (2009) 25: 923–937
DOI 10.1007/s00371-009-0368-7

ORIGINAL ARTICLE

Visual analysis of image collections

Danilo M. Eler · Marcel Y. Nakazaki · Fernando V. Paulovich · Davi P. Santos · Gabriel F. Andery · Maria Cristina F. Oliveira · João Batista Neto · Rosane Minghim

Published online: 3 June 2009
© Springer-Verlag 2009

Electronic supplementary material: The online version of this article (http://dx.doi.org/10.1007/s00371-009-0368-7) contains supplementary material, which is available to authorized users.

D.M. Eler · M.Y. Nakazaki · F.V. Paulovich · D.P. Santos · G.F. Andery · M.C.F. Oliveira · J. Batista Neto · R. Minghim
Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, Av. Trabalhador São-carlense, 400 São Carlos, SP, Brazil
e-mail: rminghim@icmc.usp.br

Abstract Multidimensional Visualization techniques are invaluable tools for analysis of structured and unstructured data with variable dimensionality. This paper introduces PEx-Image (Projection Explorer for Images), a tool aimed at supporting analysis of image collections. The tool supports a methodology that employs interactive visualizations to aid user-driven feature detection and classification tasks, thus offering improved analysis and exploration capabilities. The visual mappings employ similarity-based multidimensional projections and point placement to lay out the data on a plane for visual exploration. In addition to its application to image databases, we also illustrate how the proposed approach can be successfully employed in simultaneous analysis of different data types, such as text and images, offering a common visual representation for data expressed in different modalities.

Keywords Visual data mining · Image analysis · Biomedical imaging and visualization

1 Introduction

Image analysis and image processing applications typically compute feature vectors from images, so that they can be compared according to content (dis)similarity.
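As a minimal illustration of comparing images by feature-vector dissimilarity, the sketch below builds a normalized gray-level histogram for each image and measures the Euclidean distance between histograms. The histogram feature, the distance choice, and the names gray_histogram and euclidean are assumptions made for this example only; they are not the feature extractors or the dissimilarity measures used by PEx-Image.

```python
import math


def gray_histogram(pixels, bins=4, max_value=255):
    """Return a normalized gray-level histogram (a toy feature vector).

    `pixels` is a flat list of integer intensities in [0, max_value].
    This feature is an assumption for illustration, not the paper's extractor.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    counts = [0] * bins
    for value in pixels:
        # Map an intensity in [0, max_value] to a bin index in [0, bins - 1].
        index = min(value * bins // (max_value + 1), bins - 1)
        counts[index] += 1
    total = len(pixels)
    return [count / total for count in counts]


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Three tiny hypothetical "images", given as flat lists of gray levels.
dark_a = [10, 20, 30, 40]
dark_b = [15, 25, 35, 70]
bright = [200, 220, 240, 250]

f_dark_a = gray_histogram(dark_a)
f_dark_b = gray_histogram(dark_b)
f_bright = gray_histogram(bright)

# The two dark images should be closer to each other than to the bright one.
d_similar = euclidean(f_dark_a, f_dark_b)
d_different = euclidean(f_dark_a, f_bright)
print(d_similar < d_different)
```

Any feature extractor that yields fixed-length vectors could replace the histogram here; the point is only that, once images are expressed as vectors, content (dis)similarity reduces to a distance computation.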
Pattern recognition algorithms typically inspect the m-dimensional space defined by the extracted features. Techniques such as neural networks, Support Vector Machines (SVM) and clustering are widely employed for image comparison and classification [13]. A common difficulty is the large number of features, which define a high-dimensional space that strongly affects performance of classification and clustering. Handling this problem typically involves feature selection and/or extraction techniques in order to reduce the number of features. Reduction impacts the behavior of classification algorithms, which must be tuned almost on an individual basis for optimal performance on specific data sets.

Multidimensional Projections are commonly applied to generate graphical representations of multidimensional data. They work by projecting data originally defined in an m-dimensional space into a p-dimensional space where p < m (typically p = 2, 3). Techniques vary in their approaches, but a common goal is that data representation in the projected space should preserve relevant data relationships defined in the original space.

In this work we employ projection techniques to generate unsupervised classifications of image data sets aimed at supporting interactive user-driven exploratory analysis. Such techniques have been previously employed in Projection Explorer (PEx) [4] to map document sets based on their content similarity, with interesting results [5, 6]. We adapt and extend this underlying framework to support exploration of image data and associated textual information, implementing a new tool called Projection Explorer for Images (PEx-Image). We show that integrating both input types, image and text, into a single exploratory environment enhances its ability to reveal interesting cases within the data. Integration is possible as long as both data sets can be expressed as vector spaces, or alternatively, if data instances