Score Normalization for HMM-based Word Spotting Using a Universal Background Model

José A. Rodríguez*+, Florent Perronnin+
+ Xerox Research Centre Europe, 6, Chemin de Maupertuis, 38240 Meylan (France)
* Computer Vision Center (Universitat Autònoma de Barcelona), Edifici O, Campus Bellaterra, 08193 Bellaterra (Spain)
jrodriguez@cvc.uab.es, Florent.Perronnin@xrce.xerox.com

Abstract

Handwritten word spotting (HWS) is traditionally performed as an image matching task between one or multiple query images and a set of word images in a document. In this article, we address the word spotting problem as a hidden Markov model (HMM) word verification problem and demonstrate the importance of score normalization for improving detection performance. Our main contribution is the introduction of a novel score normalization technique in which the conventional HMM filler model is simplified by using a Gaussian mixture model (GMM). The accuracy of the proposed score normalization is on par with that of traditional HMM-based score normalization approaches, but it has a lower computational cost. We also identify an interesting special case, the semi-continuous HMM, where the proposed score normalization formalism fits very elegantly and comes at a negligible cost.

Keywords: word spotting, hidden Markov models, score normalization, Gaussian mixture models, handwriting recognition

1. Introduction

Word spotting is the pattern classification task which consists in detecting keywords in document images [10]. This can be formulated as a two-class decision problem: given a word image and a keyword hypothesis, a match is declared if the score of the word image on the keyword model exceeds an application-dependent threshold.

Handwritten word spotting (HWS) has traditionally been approached from a "query-by-example" perspective: a query image is provided to the system, and for each candidate image in the document a similarity score between them is computed [14].
Two main classes of approaches have been proposed: holistic techniques such as template matching [19], and local approaches such as DTW [14]. The main challenge in both cases is the definition of a score, i.e. a suitable measure of similarity between word images.

While query-based approaches can achieve acceptable performance in single-writer scenarios, the combination of multiple examples into a statistical model is expected to increase the retrieval accuracy. There has been much previous research on modelling handwritten words using statistical models, especially hidden Markov models (HMMs), leading to superior performance. Furthermore, this is a common choice for spotting other types of information, such as spoken words [15] or printed text [3]. Surprisingly, however, only a few recent works in HWS (such as [5, 4]) consider this option.

In this work, we adopt this statistical approach to perform word spotting. Each keyword to spot is represented by an HMM, which is used to determine how likely any word image is to correspond to this class. However, using the raw likelihood value p(X|w) output by the HMM, while effective under closed-world hypotheses (e.g. systems using lexicons), is insufficient for a verification task [2]. Instead, a more correct confidence measure is the posterior probability [9]:

p(w|X) = p(X|w) p(w) / p(X).    (1)

Considering that p(w) can be integrated into the decision threshold and therefore ignored, the posterior probability can be interpreted as a correction of the likelihood p(X|w) by the term p(X); the resulting score is thus said to be normalized. However, how to model p(X) is not trivial. One traditional approach for estimating p(X) is the use of filler models. The filler model approach consists in using a model identical to the keyword model but trained with all available samples instead of keyword-specific ones. In an HMM framework, filler models are therefore HMMs.
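The filler-model normalization above amounts to thresholding a log-likelihood ratio, log p(X|w) - log p(X). The following minimal sketch illustrates this with a log-domain forward algorithm over Gaussian (diagonal-covariance) emissions; the feature representation, model topologies, and all parameter values are placeholders for illustration, not those used in this paper:

```python
import numpy as np
from scipy.special import logsumexp

def log_gauss(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at one feature frame."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def hmm_loglik(X, log_pi, log_A, means, vars_):
    """Forward algorithm in the log domain: returns log p(X | HMM).

    X: (T, d) sequence of feature frames; log_pi: (S,) initial log-probs;
    log_A: (S, S) transition log-probs; means, vars_: (S, d) emission params.
    """
    S = log_pi.shape[0]
    # (T, S) table of per-frame, per-state emission log-probabilities
    emit = np.array([[log_gauss(x, means[s], vars_[s]) for s in range(S)]
                     for x in X])
    alpha = log_pi + emit[0]
    for t in range(1, X.shape[0]):
        # Recursion: marginalize over the previous state in the log domain
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + emit[t]
    return logsumexp(alpha)

def normalized_score(X, keyword_hmm, filler_hmm):
    """Log-likelihood ratio log p(X|w) - log p(X), with p(X) estimated by
    the filler model; a match is declared when it exceeds a threshold."""
    return hmm_loglik(X, *keyword_hmm) - hmm_loglik(X, *filler_hmm)
```

In this scheme the filler HMM is trained on all available word samples. Note that collapsing the filler to a single state makes the frame order irrelevant and turns it into a mixture-style model, which is the direction pursued by the GMM-based normalization proposed in this paper.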
The main contribution of this article is the introduction of a novel score normalization method that avoids the use of an HMM for modelling p(X) and employs a Gaussian mixture model (GMM) instead. A GMM is a simple particular case of an HMM with only one state, which in practice means that the ordering of the frames is not taken into consideration. Our experiments demonstrate that the increase in performance obtained by the GMM score normalization is on par with that of the filler model or, in other words, that considering the frame order has little impact. But thanks to this simplification, the computational cost is significantly reduced both at training and test time. Similar to other