IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 21, NO. 1, JANUARY 1999

Lexicon-Driven Handwritten Word Recognition Using Optimal Linear Combinations of Order Statistics

Wen-Tsong Chen, Paul Gader, Member, IEEE Computer Society, and Hongchi Shi, Member, IEEE Computer Society

Abstract—In the standard segmentation-based approach to handwritten word recognition, individual character-class confidence scores are combined via averaging to estimate confidences in the hypothesized identities for a word. We describe a methodology for generating optimal Linear Combination of Order Statistics operators for combining character class confidence scores. Experimental results are provided on over 1,000 word images.

Index Terms—Lexicon-driven, handwritten word recognition, linear combination of order statistics, dynamic programming, normalized edit distance, fuzzy integrals.

1 INTRODUCTION

In this brief paper, we do not describe the handwriting recognition process or previous approaches except as needed to describe the contributions of this paper. Descriptions are easily found [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12].

In lexicon-driven, segmentation-based handwritten word recognition, a handwritten word image is matched to strings in a lexicon. The string that receives the best match score can be the recognition result or the scores can be used in further contextual processing. Individual character-class confidence scores are combined to compute the match score. The standard combination method is the mean. An alternative method based on optimizing linear combinations of order statistics (LOS) operators is described here. Experimental results suggesting that the alternative method can produce higher recognition rates are presented.

Choquet integrals are generalizations of standard integrals defined in terms of measures which may not be additive.
Previous experimental results demonstrated that a Choquet integral provided higher recognition rates than the mean [3]. Our previous work with the Choquet integral relied on a restricted class of measures. For this class, operators based on the Choquet integral are equivalent to a subset of the LOS class of operators. In this paper, we extend our previous work to find optimal LOS operators for combining character class confidence scores. We demonstrate results using several formulations of the optimization problem, involving different choices of desired outputs and methods for generating training lexicons. Experimental test results are provided on about 1,000 word images.

1.1 Linear Combination of Order Statistics

The mean, median, max, and min are simple examples of LOS operators. The robustness properties of LOS operators are useful in the present application and others [3], [13], [14].

Let $x = (x_1, x_2, \ldots, x_n)$ be a vector. The $i$th order statistic of $x$ is the $i$th smallest element of $x$. We denote the $i$th order statistic as $x_{(i)}$, where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. Let $w = (w_1, w_2, \ldots, w_n)$ be a vector of real numbers constrained so that

$$\sum_{i=1}^{n} w_i = 1 \quad \text{and} \quad 0 \le w_i \le 1 \ \text{ for } i = 1, 2, \ldots, n.$$

The LOS operator on $x = (x_1, x_2, \ldots, x_n)$ with weight vector $w$ is defined as

$$\mathrm{LOS}_w(x) = \sum_{i=1}^{n} w_i \, x_{(i)}.$$

1.2 Lexicon-Driven Handwritten Word Recognition

In this section, we briefly discuss our existing word recognition system, which we refer to as the baseline system. The details are in the literature [1], [2], [3], [4], [5], [15], [16]. The baseline system has the same structure as several handwritten word recognition algorithms [8], [10], [17], [18]. Therefore, the results described in this paper are broadly applicable in the field of handwriting recognition.

The inputs are a binary image of a handwritten word and a lexicon of candidate strings. The output of the algorithm is a sorted lexicon in which each string is assigned a confidence value.
The string with the highest confidence value is the recognition result. The following algorithm components are included in the baseline system:

Segmentation: Segment the image into a sequence of primitive segments. Each primitive segment should either be an image of a character or a subimage of a character. An example is shown in Fig. 1a.

Form Unions: Form all legal unions of primitive segments. A union of primitive segments is legal if it passes a series of tests measuring size, complexity, and spatial attributes.

Character Confidence Assignment: Assign confidence that each legal union represents each of the 52 character classes. This process is performed using fuzzy membership generating neural networks [6], [15].

Intercharacter Compatibilities: Assign confidence that each pair of adjacent segments is spatially compatible. The confidence is assigned using neural networks trained on pairs of characters [2].

Lexicon Match: For each lexicon string, use dynamic programming to find the best match between all possible sequences of unions of segments and the string. The match score is the output confidence value. It is a combination of character confidence values and intercharacter compatibilities.

For each lexicon string, the matching process finds a segmentation of the image as depicted in Fig. 1b and Fig. 1c. Each segment in a segmentation is assigned a score indicating how well the segment matches the associated character in the string. For a given string, the matcher finds the segmentation that produces the highest average match score. In the figure, scores are shown using the average and an optimal LOS operator with weights as shown. We investigated several methods for determining optimal weights as described in Sections 2 and 4.
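The LOS operator of Section 1.1 and the Lexicon Match step can be illustrated together with a minimal Python sketch. This is not the baseline system's implementation: the names `los`, `best_segmentation`, `char_conf`, `max_union`, and the toy confidence table are hypothetical, the legality tests on unions and the intercharacter compatibilities are omitted, and applying the LOS operator to the scores of the segmentation chosen by the average is an assumption made only for illustration.

```python
from typing import Callable, List, Sequence


def los(scores: Sequence[float], weights: Sequence[float]) -> float:
    """LOS_w(x) = sum_i w_i * x_(i), with x_(1) <= ... <= x_(n).

    Special cases: weights (1, 0, ..., 0) give the min, (0, ..., 0, 1) the max,
    and uniform weights 1/n the mean.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 or w > 1 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    return sum(w * x for w, x in zip(weights, sorted(scores)))


def best_segmentation(
    word: str,
    n_prims: int,
    char_conf: Callable[[int, int, str], float],
    max_union: int = 3,
) -> List[float]:
    """Per-character scores of the segmentation with the highest average score.

    char_conf(start, end, ch) is a hypothetical confidence that the union of
    primitive segments start..end-1 depicts character ch. Every segmentation
    of a fixed string has len(word) characters, so maximizing the total score
    also maximizes the average. Returns [] if no segmentation exists.
    """
    neg = float("-inf")
    n = len(word)
    # best[i][j]: highest total for matching the first i characters
    # to the first j primitive segments; back[i][j]: size of the last union.
    best = [[neg] * (n_prims + 1) for _ in range(n + 1)]
    back = [[0] * (n_prims + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(i, n_prims + 1):
            for k in range(1, min(max_union, j) + 1):
                prev = best[i - 1][j - k]
                if prev == neg:
                    continue
                total = prev + char_conf(j - k, j, word[i - 1])
                if total > best[i][j]:
                    best[i][j] = total
                    back[i][j] = k
    if best[n][n_prims] == neg:
        return []
    # Trace back from the last character to recover the chosen unions.
    scores: List[float] = []
    j = n_prims
    for i in range(n, 0, -1):
        k = back[i][j]
        scores.append(char_conf(j - k, j, word[i - 1]))
        j -= k
    scores.reverse()
    return scores


# Toy confidences (made up for illustration): three primitive segments.
TOY_CONF = {
    (0, 1, "a"): 0.9, (1, 3, "b"): 0.8,
    (0, 2, "a"): 0.6, (2, 3, "b"): 0.5,
}


def toy_conf(start: int, end: int, ch: str) -> float:
    return TOY_CONF.get((start, end, ch), 0.0)


seg_scores = best_segmentation("ab", 3, toy_conf)
mean_score = los(seg_scores, [0.5, 0.5])   # uniform weights: the mean
min_heavy = los(seg_scores, [0.7, 0.3])    # emphasizes the weakest character
```

Weight vectors that put more mass on the low order statistics penalize a string whose match contains one poorly recognized character, which the plain mean can mask; which weights are actually optimal is the subject of the optimization described later in the paper.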
0162-8828/99/$10.00 © 1999 IEEE

Paul Gader and H. Shi are with the Department of Computer Engineering and Computer Science, 201 Engineering Building West, University of Missouri–Columbia, 201 EBW, Columbia, MO 65211. E-mail: {gader; shi}@cecs.missouri.edu.
W.-T. Chen is with the Department of Electrical Engineering, University of Missouri–Columbia, 201 EBW, Columbia, MO 65211. E-mail: wchen@ece.missouri.edu.
Manuscript received 12 Dec. 1997; revised 2 Nov. 1998. Recommended for acceptance by R. Plamondon.
For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number 107603.

1.4 Normalized Edit Distance

We use Normalized Edit Distance (NED), described in detail in Marzal and Vidal [19], to set parameters of the optimization prob-