ORIGINAL RESEARCH

Int. j. inf. tecnol. https://doi.org/10.1007/s41870-019-00399-3

Degraded offline handwritten Gurmukhi character recognition: study of various features and classifiers

Anupam Garg 1 · Manish Kumar Jindal 2 · Amanpreet Singh 3

Received: 20 June 2019 / Accepted: 16 November 2019
© Bharati Vidyapeeth's Institute of Computer Applications and Management 2019

Corresponding author: Anupam Garg, er.anugarg@gmail.com
1 Research Scholar, Department of Computer Science and Engineering, I. K. G. Punjab Technical University, Jalandhar, Punjab, India
2 Department of Computer Science and Applications, Panjab University Regional Centre, Muktsar, Punjab, India
3 Department of Electronics and Communication Engineering, I. K. G. Punjab Technical University, Jalandhar, Punjab, India

Abstract
Recognition of degraded offline handwritten characters of the Gurmukhi script is a very challenging task because of the script's complex structural properties, which are not found in most other scripts. This paper presents a study based on combinations of various feature extraction techniques for character recognition. Statistical features are extracted in hierarchical order from pre-segmented degraded offline handwritten Gurmukhi characters, and their potential for recognition is analyzed. Four types of feature extraction techniques have been considered in the present study, namely zoning, diagonal, peak extent based features (horizontal and vertical) and shadow features. For classification, three classifiers, specifically k-NN, decision tree and random forest, are employed to demonstrate their effect on the problem of degraded offline handwritten Gurmukhi character recognition. The authors have collected 8960 samples, which are partitioned using both a partitioning strategy and a fivefold cross-validation technique. In the partitioning strategy, 80% of the data is taken as the training dataset and the remaining 20% as the testing dataset. Several performance measures, such as recognition accuracy, false rejection rate (FRR), area under the curve (AUC) and root mean square error (RMSE), are also used to analyze the performance of the features and classifiers.

Keywords Degraded character recognition · Handwriting recognition · Feature extraction · Classification

1 Introduction
Ever since the era of computing technology began, there has been a pressing need to digitize handwritten documents. Earlier, owing to the lack of recognition systems, documents were converted into digital format by retyping the text. Nowadays this has become easier with the evolution of artificial intelligence and pattern recognition. Document analysis and recognition (DAR) plays a crucial part in the discipline of pattern recognition, analyzing and recognizing the text in printed or handwritten documents. To understand optical character recognition (OCR), one must also understand the basics of the language under consideration and the script used for that language. In the present era, with technological advancement and many cycles of algorithmic refinement, OCR software has come of age and has shown greater precision and accuracy. Its goal is to emulate the human ability to read, at a much faster rate, by associating symbolic identities with images of characters. A document image contains various objects, such as symbols, letters, pictures or numerals, and character recognition associates each of them with a meaning. OCR systems can be classified by their input into those for printed documents and those for handwritten documents. In the case of handwritten characters, there are further different