A String Length Predictor to Control the Level Building of HMMs for Handwritten Numeral Recognition

Alceu de S. Britto Jr (a,b), Robert Sabourin (c,d), Flavio Bortolozzi (a), and Ching Y. Suen (d)

(a) Pontifícia Universidade Católica do Paraná (PUC-PR), Rua Imaculada Conceição, 1155 - Curitiba (PR) 80215-901 - Brazil
(b) Universidade Estadual de Ponta Grossa (UEPG), Pr. Santos Andrade S/N, Centro - Ponta Grossa (PR) 84100-000 - Brazil
(c) École de Technologie Supérieure (ETS), 1100 Rue Notre Dame Ouest - Montreal (QC) H3C 1K3 - Canada
(d) Centre for Pattern Recognition and Machine Intelligence (CENPARMI), 1455 de Maisonneuve Blvd. West, Suite GM 606 - Montreal (QC) H3G 1M8 - Canada

Abstract

In this paper, a two-stage HMM-based method for recognizing handwritten numeral strings is extended to handle strings of unknown length. We propose a Bayesian string length predictor (SLP) that estimates the number of digits in a string from its width in pixels. The top 3 decisions of the SLP module are used to control the maximum number of levels searched by the Level Building (LB) algorithm. On 12,802 handwritten numeral strings and 2,069 touching digit pairs, this strategy shows only a small loss (0.91%) in recognition performance compared with the results obtained when the string length is assumed to be known.

1. Introduction

The LB search algorithm has been used successfully in speech and text recognition to avoid segmenting words into characters before recognition [1,4,6]. We have used this algorithm to match individual numeral Hidden Markov Models (HMMs) against an unsegmented observation sequence, with the objective of obtaining the N best segmentation-recognition paths for handwritten numeral strings of known length [2]. In this paper, we focus on a strategy to control the maximum number of levels in the LB search.
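The Bayesian predictor described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each string length n has a prior P(n) and a Gaussian class-conditional width density p(width | n) estimated from training widths, and ranks lengths by the posterior P(n | width) ∝ p(width | n) P(n). The helper names (`train_slp`, `predict_lengths`, `max_levels`) are hypothetical, and so is the rule in `max_levels` (one LB level per digit, with the bound set by the largest of the top-3 predicted lengths); the paper states only that the top 3 SLP decisions control the maximum number of levels.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical parameters per string length n: (prior P(n), mean width, width variance).
Params = Dict[int, Tuple[float, float, float]]


def train_slp(widths_by_length: Dict[int, Sequence[float]]) -> Params:
    """Estimate P(n) and a Gaussian p(width | n) for each string length n."""
    total = sum(len(ws) for ws in widths_by_length.values())
    params: Params = {}
    for n, ws in widths_by_length.items():
        mean = sum(ws) / len(ws)
        var = sum((w - mean) ** 2 for w in ws) / len(ws)
        # Floor the variance so a length whose training widths are identical stays usable.
        params[n] = (len(ws) / total, mean, max(var, 1e-6))
    return params


def log_gaussian(x: float, mean: float, var: float) -> float:
    """Log of the univariate normal density N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def predict_lengths(width: float, params: Params, top_k: int = 3) -> List[Tuple[int, float]]:
    """Return the top_k string lengths ranked by posterior P(n | width)."""
    # Bayes rule in the log domain: log P(n) + log p(width | n).
    scores = {n: math.log(prior) + log_gaussian(width, mean, var)
              for n, (prior, mean, var) in params.items()}
    best = max(scores.values())
    # Normalize with the log-sum-exp trick to turn scores into posteriors.
    z = sum(math.exp(s - best) for s in scores.values())
    posteriors = {n: math.exp(s - best) / z for n, s in scores.items()}
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


def max_levels(width: float, params: Params, top_k: int = 3) -> int:
    """Assumed rule: one LB level per digit, bounded by the largest top-k length."""
    return max(n for n, _ in predict_lengths(width, params, top_k))


if __name__ == "__main__":
    # Toy training widths in pixels, invented for illustration only.
    training = {2: [60, 64, 70, 66], 3: [95, 100, 105, 98],
                4: [130, 140, 135, 138], 5: [170, 175, 168, 180]}
    params = train_slp(training)
    print(predict_lengths(100, params))  # ranked (length, posterior) pairs
    print(max_levels(100, params))       # level bound passed to the LB search
```

Keeping the top 3 lengths rather than only the most probable one gives the LB search some slack when the width of a string is ambiguous between neighbouring lengths.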
The objective is to remove the need for a priori knowledge of the string length (number of digits) during recognition. In this direction, Procter and Elms proposed an adaptive level building [6], in which the LB search is controlled by the probability of each level: the search is terminated 2 or 3 levels beyond the point where the probability of the best match at the current level falls below that of the previous level. However, this approach led to some loss in recognition rate. They obtained their best results with the number of levels (the L parameter) fixed at 22. This value of L was appropriate for them (approximately two levels per digit), since it was enough to model each digit and all inter-digit spaces in experiments using strings of 2, 3, 4, 5, 6 and 10 digits extracted from the NIST database. In our approach, the L parameter is defined from contextual information, namely the width of the string in pixels, which is used to estimate the number of digits in the string. With this strategy, the L parameter of the LB search is set dynamically without a significant loss in recognition performance.

2. System overview

A general overview of our method for numeral string recognition is presented in Fig. 1. In the SCB (String Contextual-based) stage, a given numeral string is first preprocessed to correct slant, smooth the string contour and compute the string bounding box. Subsequently, the FF (Foreground Feature) module scans the string image from left to right, computing a feature vector based on foreground information for each column of the string bounding box. Each vector is mapped to a discrete symbol from a previously constructed codebook. The output of the FF module is thus a sequence of discrete observations representing the entire numeral string.
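The column-wise scan and codebook mapping performed by the FF module can be illustrated with a small sketch. The paper does not list its foreground features here, so the two features below (foreground density and number of foreground runs per column) are hypothetical stand-ins, and the codebook is assumed to be given (for instance, built offline by a clustering method). Only the overall scheme follows the text: one feature vector per column, each mapped to the index of its nearest codeword, yielding one discrete symbol per column.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # binary image cropped to the bounding box, 1 = foreground


def column_features(image: Image) -> List[List[float]]:
    """Hypothetical per-column features: foreground density and the number of
    foreground runs (background-to-foreground transitions) in the column."""
    height, width = len(image), len(image[0])
    features = []
    for x in range(width):  # scan columns from left to right
        column = [image[y][x] for y in range(height)]
        density = sum(column) / height
        runs = sum(1 for y in range(height)
                   if column[y] == 1 and (y == 0 or column[y - 1] == 0))
        features.append([density, float(runs)])
    return features


def quantize(features: List[List[float]], codebook: List[List[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codeword
    (squared Euclidean distance), producing discrete observation symbols."""
    def dist2(a: List[float], b: List[float]) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: dist2(f, codebook[k]))
            for f in features]


if __name__ == "__main__":
    image = [[0, 1, 0],
             [0, 1, 1],
             [0, 1, 0],
             [0, 1, 1]]
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.5, 2.0]]  # toy codebook, illustration only
    observations = quantize(column_features(image), codebook)
    print(observations)  # one symbol per column of the bounding box
```

Because each column produces exactly one symbol, the observation sequence is as long as the bounding box is wide.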
The length of this sequence corresponds to the number of columns in the string bounding box. In the SR (Segmentation-Recognition) module, numeral HMMs trained on isolated digits ($\lambda_0^c, \lambda_1^c, \ldots, \lambda_9^c$), but considering