Fast Spatial Pattern Discovery Integrating Boosting with Constellations of Contextual Descriptors

Jaume Amores, Universitat Autonoma de Barcelona, Spain
Nicu Sebe, University of Amsterdam, The Netherlands
Petia Radeva, Universitat Autonoma de Barcelona, Spain

Abstract

We present a novel approach for fast object class recognition that incorporates contextual information into boosting. The object is represented as a constellation of generalized correlograms that integrate information about both the local parts and their spatial relations. By incorporating the spatial relations into our constellation of descriptors, we show that an exhaustive search for the best matching can be avoided. By combining the contextual descriptors with boosting, the system simultaneously learns the information that characterizes each part of the object along with the characteristic mutual spatial relations of the parts. The proposed framework includes a matching step between homologous parts in the training set, followed by learning of the spatial pattern after matching. For the matching step, two approaches are provided: a supervised algorithm and an unsupervised one. Our results compare favorably with state-of-the-art results.

1 Introduction

Object class recognition has been a challenging area of pattern recognition and computer vision. Difficulties arise from the variability of object appearance, accidental conditions, and the existence of clutter in the images. All this variability demands efficient learning techniques able to summarize key properties of the object under different scenarios. There have been several recent approaches to object class recognition in cluttered scenes. Among them, characterizing the object as a collection of parts and their spatial arrangement has proved to be a promising direction [1, 15, 8, 10, 3, 12, 16].
In this work, we focus on the efficient discovery of spatial patterns of local parts; we therefore regard the object as a constellation of parts together with their mutual spatial relationships.

Recently, Agarwal et al. [1] proposed the use of a dictionary of parts and a Winnow algorithm for learning active features of the object. Schneiderman [15] proposes an efficient Bayesian network for learning the spatial arrangement. Fergus et al. [8] used a principled unsupervised statistical learning of a constellation of parts and spatial relations, and report results on several categories of objects with clutter. They use separate probabilistic models for the appearance of parts and for the spatial configuration, estimated by maximum likelihood with expectation-maximization. In their work, every possible match between parts in the model and parts in the images is tested, which leads to an exponential cost. They propose to offset this cost with fast search methods such as A*, and they report a maximum cost of 36 hours for the training stage. Parts are represented by local properties such as the local appearance. The spatial relations are simply described by the difference in spatial position.

Other authors [10] propose the use of Attribute Relational Graphs (ARGs) for object recognition and spatial pattern discovery. The ARG is a common representation for describing an object in terms of local properties of parts and spatial relations between them: parts of the object are represented by vertices of the graph, and relations between parts are represented by arcs between vertices. Vertices and arcs have associated feature vectors that describe local information and contextual (spatial) information, respectively. Matching between features (parts in an image) and parts in the model is performed by relaxation.
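The ARG representation described above can be illustrated with a minimal sketch. The class name, field names, and the choice of the difference in image position as the arc feature are assumptions made for illustration only; they are not taken from [10].

```python
from dataclasses import dataclass, field


@dataclass
class AttributeRelationalGraph:
    """Toy ARG (hypothetical layout): vertices hold local features, arcs hold spatial relations."""
    vertex_features: dict = field(default_factory=dict)  # part id -> local feature vector
    positions: dict = field(default_factory=dict)        # part id -> (x, y) image position
    arc_features: dict = field(default_factory=dict)     # (i, j) -> relation vector from i to j

    def add_part(self, part_id, feature, position):
        # Connect the new part to every existing part, in both directions.
        # Assumption: the arc feature is simply the difference in position.
        x, y = position
        for other, (ox, oy) in self.positions.items():
            self.arc_features[(other, part_id)] = (x - ox, y - oy)
            self.arc_features[(part_id, other)] = (ox - x, oy - y)
        self.vertex_features[part_id] = feature
        self.positions[part_id] = position


def build_example_arg():
    # Three made-up parts with made-up features and positions.
    graph = AttributeRelationalGraph()
    graph.add_part("a", [0.1, 0.9], (10, 20))
    graph.add_part("b", [0.5, 0.5], (40, 20))
    graph.add_part("c", [0.8, 0.2], (25, 60))
    return graph
```

In this sketch a graph with n parts stores n(n-1) directed arcs, which makes explicit that local and contextual information live in two separate sets of feature vectors.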
Relaxation has a cost of order $O(KN^2M^2)$, where $N$ and $M$ are the numbers of vertices in the sample ARG and the model ARG, respectively, and $K$ is the number of iterations until convergence. This cost is much lower than that of combinatorial matching, but it is still prohibitive when the number of vertices reaches two orders of magnitude, as normally happens in complex images. An important contribution of the latter method is the theoretical derivation of expectation-maximization for modelling ARGs. An important difference between this method and other recent methods (e.g. [8]) is that the former estimates the joint probability distribution of the spatial relations, while the latter use a separate PDF for each spatial relation.

We aim at building a feature space in which we can gather both the local information describing the parts and the spatial relations among every possible pair of parts. Classical contextual representations such as ARGs and constellations of parts deal separately with these two forms of information: local information is represented by feature vectors associated with each part, and contextual information is represented by a set of relative spatial vectors. Matching the parts of an object instance to the model is then done either by testing every possible matching (exponential cost) or by using the estimated parameters and spatial relations in a structural matching based on probabilistic relaxation, which also has a high cost. In this work, we propose a novel representation of the constellation-of-parts model, in which the feature vector associated with each part describes not only the local properties of the part but, at the same time, the context of the part.
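For a rough sense of the two matching costs discussed above, the following sketch counts candidate assignments for exhaustive matching and evaluates the K N^2 M^2 term for relaxation. It assumes that exhaustive matching assigns each model part to a distinct image feature (a one-to-one matching, which is our assumption) and ignores constant factors; the function names and the example sizes are hypothetical.

```python
from math import perm


def exhaustive_assignments(num_features, num_model_parts):
    # Number of ways to assign each model part to a distinct image feature:
    # N! / (N - P)!, which grows exponentially with the number of model parts P.
    return perm(num_features, num_model_parts)


def relaxation_operations(num_sample_vertices, num_model_vertices, num_iterations):
    # Order-of-magnitude operation count K * N^2 * M^2 (constant factors ignored).
    return num_iterations * num_sample_vertices**2 * num_model_vertices**2


if __name__ == "__main__":
    # Made-up sizes: 30 image features, 6 model parts, 10 relaxation iterations.
    print(exhaustive_assignments(30, 6))
    print(relaxation_operations(30, 6, 10))
    # With around a hundred vertices on both sides, relaxation is already costly.
    print(relaxation_operations(100, 100, 10))
```

The counts are only indicative, but they show why relaxation is far cheaper than exhaustive matching for small models and still expensive once both graphs reach around a hundred vertices.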