IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 21, NO. 1, JANUARY 1999 77
Lexicon-Driven Handwritten Word
Recognition Using Optimal Linear
Combinations of Order Statistics
Wen-Tsong Chen,
Paul Gader, Member, IEEE Computer Society,
and Hongchi Shi, Member, IEEE Computer Society
Abstract—In the standard segmentation-based approach to
handwritten word recognition, individual character-class confidence
scores are combined via averaging to estimate confidences in the
hypothesized identities for a word. We describe a methodology for
generating optimal Linear Combination of Order Statistics operators for
combining character class confidence scores. Experimental results are
provided on over 1,000 word images.
Index Terms—Lexicon-driven, handwritten word recognition, linear
combination of order statistics, dynamic programming, normalized edit
distance, fuzzy integrals.
1 INTRODUCTION
In this brief paper, we do not describe the handwriting recognition
process or previous approaches except as needed to describe the
contributions of this paper. Such descriptions are readily available in
the literature [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12].
In lexicon-driven, segmentation-based handwritten word
recognition, a handwritten word image is matched to strings in
a lexicon. The string that receives the best match score can be
taken as the recognition result, or the scores can be used in further
contextual processing. Individual character-class confidence scores
are combined to compute the match score. The standard com-
bination method is the mean. An alternative method based on
optimizing linear combinations of order statistics (LOS) op-
erators is described here. Experimental results suggesting that
the alternative method can produce higher recognition rates are
presented.
Choquet integrals are generalizations of standard integrals
defined in terms of measures which may not be additive. Pre-
vious experimental results demonstrated that a Choquet inte-
gral provided higher recognition rates than the mean [3]. Our
previous work with the Choquet integral relied on a restricted
class of measures. For this class, operators based on the Cho-
quet integral are equivalent to a subset of the LOS class of op-
erators. In this paper, we extend our previous work to find op-
timal LOS operators for combining character class confidence
scores. We demonstrate results using several formulations of
the optimization problem, involving different choices of de-
sired outputs and methods for generating training lexicons.
Experimental test results are provided on about 1,000 word
images.
1.1 Linear Combination of Order Statistics
The mean, median, max, and min are simple examples of LOS
operators. The robustness properties of LOS operators are useful in
the present application and others [3], [13], [14].
Let $x = (x_1, x_2, \ldots, x_n)$ be a vector. The $i$th order statistic of
$x$ is the $i$th smallest element of $x$. We denote the $i$th order
statistic as $x_{(i)}$, where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$.
Let $w = (w_1, w_2, \ldots, w_n)$ be a vector of real numbers constrained
so that:
$$\sum_{i=1}^{n} w_i = 1 \quad \text{and} \quad 0 \le w_i \le 1 \ \text{ for } i = 1, 2, \ldots, n.$$
The LOS operator on $x = (x_1, x_2, \ldots, x_n)$ with weight vector $w$ is
defined as
$$\mathrm{LOS}_w(x) = \sum_{i=1}^{n} w_i \, x_{(i)}.$$
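As an illustration of the definition above, the following minimal Python sketch evaluates an LOS operator and shows how the mean, min, max, and median arise from particular weight vectors. The function name `los`, the tolerance parameter `tol`, and the confidence values in `scores` are assumptions made for this example and do not come from the baseline system.

```python
from typing import Sequence


def los(x: Sequence[float], w: Sequence[float], tol: float = 1e-9) -> float:
    """Linear combination of order statistics: sum over i of w_i * x_(i)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    # Enforce the constraints on w: each weight in [0, 1], weights sum to 1.
    if any(wi < -tol or wi > 1.0 + tol for wi in w):
        raise ValueError("each weight must lie in [0, 1]")
    if abs(sum(w) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    # Sorting in ascending order places the i-th order statistic at index i - 1.
    return sum(wi * xi for wi, xi in zip(w, sorted(x)))


# Hypothetical character-class confidences for a five-character word.
scores = [0.9, 0.2, 0.6, 0.7, 0.5]
n = len(scores)

mean_w = [1.0 / n] * n               # equal weights give the mean
min_w = [1.0] + [0.0] * (n - 1)      # all weight on x_(1) gives the min
max_w = [0.0] * (n - 1) + [1.0]      # all weight on x_(n) gives the max
median_w = [0.0] * n
median_w[n // 2] = 1.0               # middle order statistic (n odd)

print(los(scores, mean_w))
print(los(scores, min_w))     # 0.2
print(los(scores, max_w))     # 0.9
print(los(scores, median_w))  # 0.6
```

Weight vectors that place more mass on the low order statistics penalize a word hypothesis for its weakest character matches, whereas the mean treats all characters equally.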
1.2 Lexicon-Driven Handwritten Word Recognition
In this section, we briefly discuss our existing word recognition
system, which we refer to as the baseline system. The details
are in the literature [1], [2], [3], [4], [5], [15], [16]. The baseline
system has the same structure as several handwritten word
recognition algorithms [8], [10], [17], [18]. Therefore, the results
described in this paper are broadly applicable in the field of
handwriting recognition.
The inputs are a binary image of a handwritten word and a
lexicon of candidate strings. The output of the algorithm is a
sorted lexicon in which each string is assigned a confidence value.
The string with the highest confidence value is the recognition
result. The following algorithm components are included in the
baseline system:
• Segmentation: Segment the image into a sequence of primi-
tive segments. Each primitive segment should either be an
image of a character or a subimage of a character. An exam-
ple is shown in Fig. 1a.
• Form Unions: Form all legal unions of primitive segments. A
union of primitive segments is legal if it passes a series of
tests measuring size, complexity, and spatial attributes.
• Character Confidence Assignment: Assign confidence that each
legal union represents each of the 52 character classes. This
process is performed using fuzzy membership generating
neural networks [6], [15].
• Intercharacter Compatibilities: Assign confidence that each
pair of adjacent segments is spatially compatible. The
confidence is assigned using neural networks trained on pairs
of characters [2].
• Lexicon Match: For each lexicon string, use dynamic program-
ming to find the best match between all possible sequences of
unions of segments and the string. The match score is the out-
put confidence value. It is a combination of character confi-
dence values and intercharacter compatibilities.
For each lexicon string, the matching process finds a segmenta-
tion of the image as depicted in Fig. 1b and Fig. 1c. Each segment
in a segmentation is assigned a score indicating how well the seg-
ment matches the associated character in the string. For a given
string, the matcher finds the segmentation that produces the high-
est average match score. In the figure, scores are shown using the
average and an optimal LOS operator with weights as shown. We
investigated several methods for determining optimal weights as
described in Sections 2 and 4.
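The matching step for a single lexicon string can be sketched with a small dynamic program. This is a simplified illustration under stated assumptions, not the baseline system's implementation: a union is taken to be a run of consecutive primitive segments, the legality tests are replaced by a hypothetical limit `max_union` on the number of primitives per union, intercharacter compatibilities are omitted, and `char_conf` is a hypothetical stand-in for the neural network confidences. Because the number of characters in the string is fixed, maximizing the total score is equivalent to maximizing the average score.

```python
from typing import Callable, List, Optional, Tuple


def match_string(
    num_primitives: int,
    string: str,
    char_conf: Callable[[int, int, str], float],
    max_union: int = 3,
) -> Optional[Tuple[float, List[Tuple[int, int]]]]:
    """Return (average score, unions) for the best segmentation, or None.

    Each union is a half-open range (i, j) of primitive segment indices.
    """
    n_chars = len(string)
    neg_inf = float("-inf")
    # best[k][j]: highest total confidence when the first k characters
    # account for exactly the first j primitive segments.
    best = [[neg_inf] * (num_primitives + 1) for _ in range(n_chars + 1)]
    back = [[-1] * (num_primitives + 1) for _ in range(n_chars + 1)]
    best[0][0] = 0.0
    for k in range(1, n_chars + 1):
        for j in range(1, num_primitives + 1):
            for i in range(max(0, j - max_union), j):
                if best[k - 1][i] == neg_inf:
                    continue  # no valid match ends at primitive i
                total = best[k - 1][i] + char_conf(i, j, string[k - 1])
                if total > best[k][j]:
                    best[k][j] = total
                    back[k][j] = i
    if n_chars == 0 or best[n_chars][num_primitives] == neg_inf:
        return None
    # Trace back the unions chosen for each character.
    unions: List[Tuple[int, int]] = []
    j = num_primitives
    for k in range(n_chars, 0, -1):
        i = back[k][j]
        unions.append((i, j))
        j = i
    unions.reverse()
    return best[n_chars][num_primitives] / n_chars, unions


# Toy confidences (hypothetical): three primitive segments, string "at".
toy = {(0, 1, "a"): 0.8, (1, 3, "t"): 0.9, (0, 2, "a"): 0.3, (2, 3, "t"): 0.4}


def conf(i: int, j: int, c: str) -> float:
    return toy.get((i, j, c), 0.1)


score, unions = match_string(3, "at", conf)
print(unions)  # [(0, 1), (1, 3)]
```

In this toy case, the first primitive is matched to "a" and the union of the remaining two primitives is matched to "t". The per-character scores along the recovered segmentation are the values that a combination operator such as the mean or an LOS operator would then aggregate.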
1.3 Normalized Edit Distance
We use Normalized Edit Distance (NED), described in detail in
Marzal and Vidal [19], to set parameters of the optimization prob-
0162-8828/99/$10.00 © 1999 IEEE
• Paul Gader and H. Shi are with the Department of Computer Engineering
and Computer Science, 201 Engineering Building West, University of Mis-
souri–Columbia, 201 EBW, Columbia, MO 65211.
E-mail: {gader; shi}@cecs.missouri.edu.
• W.-T. Chen is with the Department of Electrical Engineering, University of
Missouri–Columbia, 201 EBW, Columbia, MO 65211.
E-mail: wchen@ece.missouri.edu.
Manuscript received 12 Dec. 1997; revised 2 Nov. 1998. Recommended for accep-
tance by R. Plamondon.
For information on obtaining reprints of this article, please send e-mail to:
tpami@computer.org, and reference IEEECS Log Number 107603.