A general framework for subspace detection in unordered multidimensional data

Leandro A.F. Fernandes a,*, Manuel M. Oliveira b

a Instituto de Computação, Universidade Federal Fluminense (UFF), CEP 24210-240 Niterói, RJ, Brazil
b Instituto de Informática, Universidade Federal do Rio Grande do Sul (UFRGS), CP 15064 CEP 91501-970, Porto Alegre, RS, Brazil

Article info

Article history:
Received 14 August 2011
Received in revised form 30 December 2011
Accepted 19 February 2012
Available online 5 March 2012

Keywords: Hough transform; Geometric algebra; Parameter space; Subspace detection; Shape detection; Blade; Grassmannian; Coordinate chart; Line; Circle; Plane; Sphere; Conic section; Flat; Round; Quadric

Abstract

The analysis of large volumes of unordered multidimensional data is a problem confronted by scientists and data analysts every day. Often, it involves searching for data alignments that emerge as well-defined structures or geometric patterns in datasets. For example, straight lines, circles, and ellipses represent meaningful structures in data collected from electron backscatter diffraction, particle accelerators, and clonogenic assays. Also, customers with similar behavior describe linear correlations in e-commerce databases. We describe a general approach for detecting data alignments in large unordered noisy multidimensional datasets. In contrast to classical techniques such as Hough transforms, which are designed for detecting a specific type of alignment on a given type of input, our approach is independent of the geometric properties of the alignments to be detected, as well as of the type of input data. Thus, it allows concurrent detection of multiple kinds of data alignments in datasets containing multiple types of data. Given its general nature, optimizations developed for our technique immediately benefit all its applications, regardless of the type of input data.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Data analysis is a fundamental element in scientific discovery and data mining. In many scientific fields, visual inspection of experimental datasets is often performed in order to identify strong local coherence in the data. Such coherence results from data alignments (in some multidimensional space), and usually emerges as geometric shapes and patterns. For instance, straight lines and circles appear as well-defined structures in the analysis of electron backscatter diffraction (Fig. 1a) and clonogenic assays (Fig. 1c), respectively. However, when large volumes of data need to be analyzed, visual inspection becomes impractical. For this reason, automatic detectors for specific types of data alignments have been broadly applied by scientists in many different areas, such as particle physics [1,2], astronomy [3,4], microbiology [5,6], crystallography [7,8], and medicine [9,10]. Such detectors are also a central component of many computer vision and image processing applications [11–13]. The goal of automatic detectors is to identify certain kinds of alignments that best fit a given unordered dataset, even in the presence of noise and discontinuities.

We describe a general approach for detecting data alignments in unordered noisy multidimensional data. Our approach is based on the observation that a wide class of alignments, and also input data entries, can be represented as linear subspaces. Thus, instead of defining a different detector for each specific case and input data type, it is possible to design a unifying framework to detect the occurrences of emerging subspaces in multidimensional datasets. In our framework, these datasets may be heterogeneous and contain entries with different dimensionalities (Fig. 2).

Our approach has a broad range of applications as a pattern detection tool.
* Corresponding author. Tel.: +55 21 2629 5665; fax: +55 21 2629 5669.
E-mail addresses: laffernandes@ic.uff.br (L.A.F. Fernandes), oliveira@inf.ufrgs.br (M.M. Oliveira).
URLs: http://www.ic.uff.br/~laffernandes (L.A.F. Fernandes), http://www.inf.ufrgs.br/~oliveira (M.M. Oliveira).
Pattern Recognition 45 (2012) 3566–3579. doi:10.1016/j.patcog.2012.02.033

For instance, it can be applied, without any changes, to all kinds of data alignments that can be represented