Journal of Theoretical and Applied Information Technology, 15th July 2018. Vol. 96. No 13. © 2005 – ongoing JATIT & LLS. ISSN: 1992-8645 www.jatit.org E-ISSN: 1817-3195

A NOVEL HYBRID APPROACH FOR FEATURE EXTRACTION IN MALAYALAM HANDWRITTEN CHARACTER RECOGNITION

1 AJAY JAMES, 2 SUJALA K, 3 CHANDRAN SARAVANAN

1 Assistant Professor, Department of Computer Science and Engineering, Govt. Engineering College Thrissur, India
2 MTech Student, Department of Computer Science and Engineering, Govt. Engineering College Thrissur, India
3 Associate Professor, Department of Computer Science and Engineering, National Institute of Technology West Bengal, India

E-mail: 1 ajayjames80@gmail.com, 2 sujala58@gmail.com, 3 dr.cs1973@gmail.com

ABSTRACT

Optical Character Recognition (OCR) is the process of extracting text from a scanned document. A digitally empowered society requires information to be available in digital form, which creates a need for OCR software that can digitize documents in different languages. Many researchers are working on document digitization, particularly on developing effective and error-free character recognition models. The accuracy of Malayalam handwritten character recognition is still limited to around 90% because of the challenges posed by the Malayalam character set. The coexistence of two scripts (old and new), the large character set, and the prevalence of similarly shaped characters make Malayalam handwritten character recognition more difficult. Feature extraction may vary from language to language, depending on the characteristics of each language. By observing the shape patterns in each language, different novel methods have been developed to extract features and to recognize them.
In this research, a novel hybrid approach is proposed that uses a combination of statistical and structural features (SSF). Statistical features are derived from the statistical distribution of pixels. Structural features are based on the topological and geometrical properties of the character. This study shows that combining statistical and structural features yields higher accuracy in Malayalam character recognition.

Keywords: Optical Character Recognition, Binarization, Feature Extraction, Classification, Machine Recognition, Decision Tree.

1. INTRODUCTION

Optical character recognition has many applications in day-to-day life. Digitization is now a high priority, and machine recognition of characters is part of it. Research in this area helps to develop a digitally empowered society. Many old documents in written or printed form can be converted to editable text by means of digitization. OCR is a technology that converts different types of documents, such as scanned paper documents, into editable and searchable data. For example, a written or printed document can be digitized as an image or another file format and sent to a partner via email or another electronic medium. OCR makes it possible to scan such a document and store it in an editable form so that its content can be modified. To extract and repurpose data from scanned documents, OCR software is required that singles out the letters in the image, assembles them into words, and then assembles the words into sentences, thus enabling access to and editing of the content of the original document. Handwritten character extraction and recognition can improve human-computer interaction and better integrate computers into human society. The handwritten text recognition