© Springer International Publishing Switzerland 2015
S.C. Satapathy et al. (eds.), Proc. of the 3rd Int. Conf. on Front. of Intell. Comput. (FICTA) 2014
– Vol. 1, Advances in Intelligent Systems and Computing 327, DOI: 10.1007/978-3-319-11933-5_62
Word-Level Script Identification from Handwritten
Multi-script Documents
Pawan Kumar Singh*, Arafat Mondal, Showmik Bhowmik,
Ram Sarkar, and Mita Nasipuri
Department of Computer Science and Engineering,
Jadavpur University, Kolkata, India
pawansingh.ju@gmail.com
Abstract. This paper proposes a robust word-level handwritten script
identification technique. A combination of shape-based and texture-based
features is used to identify the script of handwritten word images written in
any of five scripts, namely Bangla, Devnagari, Malayalam, Telugu and Roman.
An 87-element feature set is designed to evaluate the present script recognition
technique. The technique has been tested on 3000 handwritten words, with
each script contributing about 600 words. Based on the identification accuracies
of multiple classifiers, the Multi Layer Perceptron (MLP) has been chosen as the
best classifier for the present work. With 5-fold cross validation and an epoch
size of 500, the MLP classifier achieves its best recognition accuracy of 91.79%,
which is encouraging considering the shape variations of the said scripts.
Keywords: Script identification, Handwritten Indic scripts, Texture based
feature, Shape based feature, Multiple Classifiers.
1 Introduction
India is a multilingual country where people living in different regions use different
languages and scripts. Each script has its own characteristics, which differ markedly
from those of other scripts. Therefore, in this multilingual environment, developing a
successful Optical Character Recognition (OCR) system for any script requires
separating or identifying the different scripts beforehand, because it is perhaps
impossible to design a single recognizer that can identify a variety of scripts/languages.
Script identification facilitates many important applications, such as sorting
document images, selecting the appropriate script-specific OCR system, and searching
digitized archives of document images for a particular script.
Resemblances among the character sets of different scripts are more likely in
handwritten documents than in printed ones. Cultural differences, individual
differences, and even differences in the way people write at different times
enlarge the inventory of possible word shapes seen in handwritten documents. Also,
* Corresponding author.