Segmented Handwritten Text Recognition with Recurrent Neural Network Classifiers

Bolan Su 1, Xi Zhang 2, Shijian Lu 1 and Chew Lim Tan 2

1. Institute for Infocomm Research, A*Star, 1 Fusionopolis Way, 21-01 Connexis (south tower), Singapore 138632. Email: {subl, slu}@i2r.a-star.edu.sg
2. School of Computing, National University of Singapore, Computing 1, 13 Computing Drive, Singapore 117417. Email: {zhangxi, tancl}@comp.nus.edu.sg

Abstract—Recognition of handwritten text is a useful technique with applications such as signature recognition and bank check recognition. However, off-line handwritten text recognition in unconstrained settings remains very challenging due to the high complexity of text strokes and image backgrounds. This paper presents a novel segmented handwritten text recognition technique that ensembles recurrent neural network (RNN) classifiers. Two RNN models are first trained, one on the widely used geometrical features and one on Histogram of Oriented Gradient (HOG) features. Given a handwritten word image, the optimal recognition result is then obtained by integrating the two trained RNN models together with a lexicon. Experiments on public datasets show the superior performance of our proposed technique.

I. INTRODUCTION

With the rapid development of computing devices, sensors, and storage facilities, more and more valuable documents, including handwritten documents, are digitized and stored in databases for public access. However, recognition of unconstrained handwritten documents remains a challenging task, and poor recognition results may lead to unreliable retrieval output and accordingly reduce the accessibility of the valuable information in those documents. At the same time, a huge number of important documents such as bank checks require handwritten inputs such as payee names, monetary amounts, and payer signatures.
Accurate and robust handwriting recognition would greatly save manpower and improve productivity in handling these various types of documents with handwritten text.

Similar to a speech signal, handwritten text can often be viewed as a sequence of continuous signals. A number of techniques that succeeded in speech processing tasks have therefore been applied in the handwritten text recognition domain. Hidden Markov Models (HMMs), one of the most popular techniques in speech recognition, have been successfully applied to handwritten text recognition. In particular, Wilfong et al. proposed using one HMM to represent each isolated handwritten word [1], though this approach cannot be used for words that do not appear in the training data. Moreover, the approach does not scale to large vocabularies, because a considerable amount of training data is required for each word and every distinct word needs its own HMM. To recognize arbitrary words, HMMs are used to represent character models instead of whole words, and a word or text line is represented by a sequence of linearly connected HMMs [2]. The most likely character sequence for a given text line can then be obtained by applying the Viterbi algorithm to the trained HMMs.

However, HMM-based methods have a number of limitations. In particular, the probability of every observation depends only on the current state, which makes it difficult to incorporate context information. More importantly, HMMs, as generative models, may not provide better performance than discriminative models, because handwritten document recognition is essentially a discriminative task.

Combining HMMs and neural networks has been proposed as a hybrid approach for handwriting recognition [3], [4], [5], [6], [7].
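
To make the Viterbi decoding step mentioned above for character-level HMMs concrete, the following is a minimal sketch of the algorithm on a toy discrete HMM. It is not the implementation used in [1] or [2]: the two states ("a", "b"), the quantized observation symbols ("x", "y"), and all probabilities are hypothetical values chosen only for illustration, and a real recognizer would typically use continuous emission densities over feature vectors.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence and its log-probability."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


def to_log(table):
    """Convert a (possibly nested) dict of non-zero probabilities to logs."""
    return {
        k: to_log(v) if isinstance(v, dict) else math.log(v)
        for k, v in table.items()
    }


# Hypothetical toy model: two character states and two observation symbols.
STATES = ["a", "b"]
LOG_START = to_log({"a": 0.6, "b": 0.4})
LOG_TRANS = to_log({"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.4, "b": 0.6}})
LOG_EMIT = to_log({"a": {"x": 0.9, "y": 0.1}, "b": {"x": 0.2, "y": 0.8}})

path, log_prob = viterbi(["x", "y", "y"], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(path, math.exp(log_prob))
```

Working in log-probabilities avoids numerical underflow on long observation sequences, which matters for text lines with many frames. The sketch assumes all probabilities are non-zero; a model with zero entries would need those handled before taking logarithms.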
Different types of neural network architectures have been proposed, such as Multilayer Perceptrons (MLPs) [4], [5], time delay neural networks [6], [3], and Recurrent Neural Networks (RNNs) [7], but most still suffer from the limitations of HMMs even though they can capture context information.

In recent work, RNNs with a Connectionist Temporal Classification (CTC) output layer have been applied to unconstrained handwritten document recognition [8], [9]. Compared with traditional RNNs, RNNs with a CTC output layer require no pre-segmented input data; that is, the whole unsegmented input sequence can be mapped to the output labels directly. Combined with a dictionary, the RNN plus CTC approach outperforms HMMs for both online and offline recognition tasks [8].

In this paper, we extend the RNN approach by introducing a new set of Histogram of Oriented Gradient (HOG) features [10]. Two RNNs are trained on the HOG features and the traditional geometrical features separately. The classification outputs of the two models are then combined for optimal recognition results. The main contributions of our proposed technique can be summarized as follows:

• First, we use the HOG column feature for better handwritten text recognition accuracy.
• Second, we propose an approach to ensemble the recognition results of different networks for better performance.
• Third, we develop a handwritten word recognition system based on RNNs and achieve superior performance