International Journal of Computer Applications (0975 – 8887), Volume 46, No. 10, May 2012

HMM based Offline Handwritten Writer Independent English Character Recognition using Global and Local Feature Extraction

Rajib Lochan Das
Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, INDIA

Binod Kumar Prasad
Department of Electronics and Communication Engineering, Bengal College of Engineering and Technology, Durgapur, INDIA

Goutam Sanyal
Department of Computer Science and Engineering, National Institute of Technology, Durgapur, INDIA

ABSTRACT
The recognition rate of handwritten characters is still limited to around 90 percent because of the large variation in shape, scale and format of handwritten characters. A sophisticated handwritten character recognition system demands a better feature extraction technique that can cope with such variation in handwriting. In this paper, we propose a recognition model based on multiple Hidden Markov Models (HMMs) followed by a few novel feature extraction techniques for a single character to tackle its different writing formats. We also propose a post-processing block at the final stage to enhance the recognition rate further. We have created a database of 13000 samples collected from 100 writers, each of whom wrote every character five times. 2600 samples have been used to train the HMMs and the rest are used to test the recognition model. Using our proposed recognition system, we have achieved an average recognition rate of 98.26 percent.

General Terms
Handwritten character recognition, Viterbi algorithm, Baum-Welch algorithm.

Keywords
Hidden Markov Model, Sobel masks, gradient features, curvature features and projected histogram.

1. INTRODUCTION
Off-line handwriting recognition (OHR) remains an active area of research into newer techniques that would help improve recognition accuracy.
This is because several applications, including mail sorting, bank processing, document reading and postal address recognition, require off-line handwriting recognition systems. Character recognition is the machine simulation of human reading [1], [2]. It is also known as Optical Character Recognition. It contributes immensely to the advancement of automation and can improve the interface between man and machine in numerous applications. Several research works have focused on new techniques and methods that would reduce processing time while providing higher recognition accuracy.

Studies reveal that character recognition methods have developed in stages [3], [4]. The recognition of isolated handwritten characters was investigated first [5], and whole words [6] were addressed later. Most of the systems reported in the literature to date consider constrained recognition problems based on vocabularies from a specific domain, e.g., the recognition of handwritten check amounts [7] or postal addresses [8]. Free handwriting recognition, without domain-specific constraints and with large vocabularies, has been addressed only recently in a few papers [9], [10]. The recognition rate of such systems is still low and there is a need for improvement [11].

It is now well established that the direction of character strokes carries important information for character recognition. If the strokes in certain directions that occur at certain positions could be precisely described in the character image, the character could be easily categorized. Many statistical features used in character recognition are designed according to this idea [12]. Previous researchers [13], [14] demonstrated that, among direction features, gradient features [15] outperform various other directional features. For this reason, we give due emphasis to gradient features, computing them both globally and locally for a character image.
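As a rough illustration of what computing gradient features "globally and locally" can look like, the sketch below applies the standard 3x3 Sobel masks to a small grayscale or binary image and builds a magnitude-weighted histogram of gradient directions, first over the whole image and then over each cell of a grid. The function names, the 8-bin quantization, the 2x2 grid and the magnitude weighting are assumptions made for this example only; this introduction does not specify them, and the feature definitions actually used are those given in the feature extraction section.

```python
import math

# Standard 3x3 Sobel masks for horizontal (x) and vertical (y) intensity change.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_gradients(img):
    """Return (gx, gy) for a 2-D list of intensities; border pixels stay 0."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            sx = sy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    p = img[r + dr][c + dc]
                    sx += SOBEL_X[dr + 1][dc + 1] * p
                    sy += SOBEL_Y[dr + 1][dc + 1] * p
            gx[r][c], gy[r][c] = sx, sy
    return gx, gy


def direction_histogram(gx, gy, rows, cols, bins=8):
    """Magnitude-weighted, normalized histogram of gradient directions
    over the pixels in rows x cols (bin count is an assumed parameter)."""
    hist = [0.0] * bins
    for r in rows:
        for c in cols:
            mag = math.hypot(gx[r][c], gy[r][c])
            if mag == 0.0:
                continue  # flat region: no direction information
            angle = math.atan2(gy[r][c], gx[r][c]) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += mag
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def gradient_features(img, grid=2, bins=8):
    """Global direction histogram followed by one local histogram
    per cell of a grid x grid partition (hypothetical zoning)."""
    h, w = len(img), len(img[0])
    gx, gy = sobel_gradients(img)
    feats = direction_histogram(gx, gy, range(h), range(w), bins)
    for i in range(grid):
        for j in range(grid):
            rows = range(i * h // grid, (i + 1) * h // grid)
            cols = range(j * w // grid, (j + 1) * w // grid)
            feats += direction_histogram(gx, gy, rows, cols, bins)
    return feats
```

With the default settings, the resulting vector has 8 global values followed by 4 x 8 local values. The global part summarizes the dominant stroke directions of the whole character, while the local parts record where in the image those directions occur.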
Gradient features are further supported by projection features and curvature features. Projection features consist of the mean, variance and entropy of the projected histograms on both the X- and Y-axes.

An HMM is chosen as the tool to train the system with the obtained feature vectors because OHR systems based on HMMs have been shown to outperform segmentation-based approaches [16]-[19]. When HMMs are used for pattern or character recognition, a properly trained HMM retains information about a character, and the trained model can be used to recognize an unknown character. The advantage of HMM-based systems is that they are segmentation-free, that is, no pre-segmentation of word or line images into small units such as sub-words or characters is required [20]. On the other hand, HMM-based approaches have also been found to have some limitations. These limitations arise from two sources: (a) the assumption of conditional independence of the observations given the state sequence and (b) the restriction on feature extraction imposed by frame-based observations [21].

The rest of the paper is arranged as follows. Section 2 presents the proposed model, Section 3 details pre-processing, Section 4 deals with feature extraction methods, Section 5 describes the classifier, and Section 6 describes post-processing. Section 7 covers the experiments and results. Conclusions are drawn in Section 8 and, finally, Section 9 shows a single set of collected data.
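The projection features mentioned above (mean, variance and entropy of the projected histograms on the X- and Y-axes) can be sketched as follows. This is a minimal illustration under stated assumptions: the image is taken to be a binary 2-D list with 1 for stroke pixels, the mean and variance are taken over the histogram bin counts, and the entropy is the Shannon entropy in bits of the normalized histogram. The function names are hypothetical, and the exact definitions used in the paper are those of the feature extraction section.

```python
import math


def projection_histograms(img):
    """Count foreground pixels per column (X projection) and per row (Y projection)."""
    x_proj = [sum(col) for col in zip(*img)]
    y_proj = [sum(row) for row in img]
    return x_proj, y_proj


def histogram_stats(hist):
    """Mean, variance and Shannon entropy (bits) of one projected histogram.
    Assumption: mean and variance are computed over the bin counts."""
    n = len(hist)
    mean = sum(hist) / n
    variance = sum((v - mean) ** 2 for v in hist) / n
    total = sum(hist)
    entropy = 0.0
    if total > 0:
        for v in hist:
            if v > 0:
                p = v / total  # treat the histogram as a probability distribution
                entropy -= p * math.log2(p)
    return mean, variance, entropy


def projection_features(img):
    """Six values: (mean, variance, entropy) for X, then the same for Y."""
    x_proj, y_proj = projection_histograms(img)
    return list(histogram_stats(x_proj)) + list(histogram_stats(y_proj))
```

For a thin vertical stroke, for example, the X projection concentrates in one bin (high variance, zero entropy) while the Y projection is flat (zero variance, maximal entropy), which is the kind of contrast these features are meant to capture.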