Approximate Gaussian Mixtures for Large Scale Vocabularies

Yannis Avrithis and Yannis Kalantidis
National Technical University of Athens
{iavr,ykalant}@image.ntua.gr

Abstract. We introduce a clustering method that combines the flexibility of Gaussian mixtures with the scaling properties needed to construct visual vocabularies for image retrieval. It is a variant of expectation-maximization that can converge rapidly while dynamically estimating the number of components. We employ approximate nearest neighbor search to speed up the E-step and exploit its iterative nature to make search incremental, boosting both speed and precision. We achieve superior performance in large scale retrieval, being as fast as the best known approximate k-means.

Keywords: Gaussian mixtures, expectation-maximization, visual vocabularies, large scale clustering, approximate nearest neighbor search

1 Introduction

The bag-of-words (BoW) model is ubiquitous in a number of problems of computer vision, including classification, detection, recognition, and retrieval. The k-means algorithm is one of the most popular in the construction of visual vocabularies, or codebooks. The investigation of alternative methods has evolved into an active research area for small to medium vocabularies of up to 10^4 visual words. For problems like image retrieval using local features and descriptors, finer vocabularies are needed, e.g. 10^6 visual words or more. Clustering options are more limited at this scale, with the most popular still being variants of k-means like approximate k-means (AKM) [1] and hierarchical k-means (HKM) [2].

The Gaussian mixture model (GMM), along with expectation-maximization (EM) [3] learning, is a generalization of k-means that has been applied to vocabulary construction for class-level recognition [4]. In addition to position, it models cluster population and shape, but assumes pairwise 'interaction' of all points with all clusters and is slower to converge. The complexity per iteration is O(NK), where N and K are the numbers of points and clusters, respectively, so it is not practical for large K. On the other hand, a point is assigned to the nearest cluster via approximate nearest neighbor (ANN) search in [1], bringing complexity down to O(N log K), but keeping only one neighbor per point.

Robust approximate k-means (RAKM) [5] is an extension of AKM in which the nearest neighbor found in one iteration is re-used in the next, with less effort spent on new neighbor search. This approach yields further speed-up, since cluster centers move only slightly from one iteration to the next as the algorithm converges, so the previous assignment is often already the nearest one; a sketch of this idea follows.
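To make the incremental E-step concrete, here is a minimal sketch in the spirit of RAKM, not the authors' implementation: each point keeps its previous assignment as a candidate and only a cheap approximate search is run per iteration, keeping whichever of the two centers is nearer. The names rakm_sketch, ann_query, and the parameter n_probe are hypothetical; ann_query scans a random subset of centers as a toy stand-in for a real ANN index such as the randomized k-d tree forest used by AKM.

```python
import numpy as np

def ann_query(x, centers, n_probe, rng):
    """Toy approximate NN search: scan only n_probe random centers.
    A stand-in for a real ANN index (e.g. a randomized k-d tree forest)."""
    cand = rng.choice(len(centers), size=min(n_probe, len(centers)),
                      replace=False)
    d = np.linalg.norm(centers[cand] - x, axis=1)
    best = int(np.argmin(d))
    return cand[best], d[best]

def rakm_sketch(points, k, n_iter=20, n_probe=32, seed=0):
    """RAKM-style k-means: hard assignment via approximate search,
    re-using each point's neighbor from the previous iteration."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    assign = np.full(len(points), -1)
    for _ in range(n_iter):
        # E-step: with a real ANN index this costs O(N log K), not O(NK).
        for i, x in enumerate(points):
            j, d = ann_query(x, centers, n_probe, rng)
            if assign[i] >= 0:
                # Re-use last iteration's neighbor; keep it if still closer.
                d_old = np.linalg.norm(centers[assign[i]] - x)
                if d_old < d:
                    j = assign[i]
            assign[i] = j
        # M-step: move each center to the mean of its assigned points.
        for j in range(k):
            members = points[assign == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, assign
```

As the centers stabilize across iterations, the re-used neighbor is increasingly often the true nearest center, so ever less effort needs to go into the new search; this is the source of the additional speed-up over plain AKM.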