AN EFFICIENT ALGORITHM FOR MATCHING A LEXICON WITH A SEGMENTATION GRAPH

David Y. Chen (UC Berkeley), Jianchang Mao and K. Mohiuddin (IBM Almaden)
IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120-6099
Mao@almaden.ibm.com

Abstract

This paper presents an efficient algorithm for lexicon-driven handwritten word recognition. In this algorithm, a word image is represented by a segmentation graph and the lexicon by a trie. In contrast to the standard lexicon-driven matching approach, where dynamic programming is invoked independently to match each lexicon entry against the segmentation graph, the proposed algorithm matches the trie as a whole against the segmentation graph. Computation is saved by the efficient representation of the lexicon using the trie data structure. The performance of the proposed approach is compared with that of the standard dynamic programming algorithm. With a dynamic lexicon, the proposed approach saves about 48.4% of the computation time of the standard algorithm when the trie initialization cost is excluded, and about 15% when it is included. Better performance can be expected with a static lexicon, where the trie needs to be constructed only once.

1. Introduction

Handwritten word recognition is a challenging problem encountered in many real-world applications [MohMao99], such as postal mail sorting [Sri93, MaoSinMoh98], bank check recognition [SimBarGor94, PaqLec93], and automatic data entry from business forms [GopLorMaoMoh96]. A prevalent technique for off-line cursive word recognition is based on over-segmentation followed by dynamic programming [BozSri89, GadKelKriChiMoh97], which appears to outperform segmentation-free Hidden Markov Models (HMMs) that use a sliding window [MohGad96]. HMMs can also be built on top of over-segmentation [CheKun95]. We use the over-segmentation-followed-by-dynamic-programming approach for cursive word recognition [MaoSinMoh98, SinMao98].
In this approach, the word recognition problem is posed as one of finding the best path in a graph called the segmentation graph. A set of split points on the word strokes is chosen heuristically to divide the word into a sequence of graphemes (primitive structures of characters; see Figure 1). A character may consist of one, two, or three graphemes. Each internal node in the graph represents a split point in the word, and the leftmost and rightmost nodes mark the word boundaries. Each edge represents the segment between the two split points it connects. Since our over-segmentation module rarely produces more than three graphemes for a character, we remove all edges that cover more than three graphemes; the maximum edge length can easily be extended to four graphemes (see Figure 2). A character classifier is typically used to assign a cost to each edge in the segmentation graph. Dynamic programming is then used to find the best path from the leftmost node to the rightmost node, and a sequence of characters is obtained from the sequence of segments on that path. Note that this sequence of characters may not form a valid word in a dictionary.

If a lexicon of limited size is given, dynamic programming is often used to rank every word in the lexicon, and the highest-ranked word is chosen as the recognition hypothesis. The time complexity of this algorithm is Θ(G × T_TOTAL) for matching G graphemes against a lexicon of size L whose templates have total length T_TOTAL (the total number of characters in the lexicon). The complexity grows linearly with the lexicon size because of the flat representation of the lexicon. More efficient representations, such as tries and hash tables [AppJac88, Wel90, DenHocHon97], have been used to speed up dictionary (lexicon) search; however, this assumes that a recognition hypothesis for the word to be recognized is already available.
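The standard per-entry matching step can be sketched as follows. This is a minimal illustration, not the paper's implementation: `char_cost` is a hypothetical stand-in for the character classifier (here a toy cost that prefers one grapheme per character), and nodes are numbered split points 0..G, with edges spanning at most three graphemes.

```python
# Sketch of standard lexicon-driven matching: dynamic programming aligns one
# lexicon word against a segmentation graph with G graphemes (nodes 0..G).
import math

def char_cost(ch, start, end):
    # Hypothetical classifier cost for reading graphemes [start, end) as ch.
    # Toy assumption: cost grows with the deviation from one grapheme.
    return abs((end - start) - 1) + 0.1

def match_word(word, num_graphemes, max_span=3):
    """Minimum cost of aligning `word` with a graph of num_graphemes graphemes."""
    G, T = num_graphemes, len(word)
    INF = math.inf
    # cost[i][j]: best cost of matching the first j characters of the word
    # to the first i graphemes of the image.
    cost = [[INF] * (T + 1) for _ in range(G + 1)]
    cost[0][0] = 0.0
    for i in range(1, G + 1):
        for j in range(1, T + 1):
            # The j-th character may consume 1..max_span graphemes (one edge).
            for span in range(1, min(max_span, i) + 1):
                prev = cost[i - span][j - 1]
                if prev < INF:
                    cost[i][j] = min(cost[i][j],
                                     prev + char_cost(word[j - 1], i - span, i))
    return cost[G][T]
```

Ranking a flat lexicon means calling `match_word` once per entry, which is what makes the total cost Θ(G × T_TOTAL) and motivates the shared-prefix trie matching proposed in this paper.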
Such methods follow the paradigm of OCR followed by contextual post-processing, which suffers from the drawback of making unrecoverable, premature OCR decisions. Lexicon-driven matching avoids this drawback by bringing context (here, the lexicon) into segmentation and recognition earlier. Since the segmentation graph contains all the information of the word image (i.e., the word image can be fully reconstructed from the segmentation graph), no premature decision on character segmentation and recognition is made until the lexicon is applied. This requires that no true segmentation point be missing from the set of over-segmentation points. The hashing technique cannot be used for matching a lexicon
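The trie representation of the lexicon mentioned above can be sketched as follows; the class and function names are illustrative assumptions, not the paper's implementation. The point is that words sharing a prefix share trie nodes, so matching work on the common prefix is done only once.

```python
# Sketch of a lexicon represented as a trie: common prefixes are stored
# (and can be matched against the segmentation graph) only once.
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> child TrieNode
        self.is_word = False  # True if a lexicon entry ends at this node

def build_trie(lexicon):
    root = TrieNode()
    for word in lexicon:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root

def count_nodes(node):
    # Size of the trie, counting the root.
    return 1 + sum(count_nodes(c) for c in node.children.values())
```

For example, the lexicon {"car", "card", "care", "cat"} contains 14 characters in a flat representation, but the trie stores only 6 character nodes (c, a, r, d, e, t) plus the root, since the prefix "ca" is shared by all four entries.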