ISSN: 2229-6956 (ONLINE)
DOI: 10.21917/ijsc.2012.0041
ICTACT JOURNAL ON SOFT COMPUTING, JANUARY 2012, VOLUME: 02, ISSUE: 02

COMPARISON OF SVM AND FUZZY CLASSIFIER FOR AN INDIAN SCRIPT

M. J. Baheti 1 and K. V. Kale 2
1 Department of Computer Science and Engineering, Shri Neminath Jain Brahmacharyashram’s Late Sau. Kantabai Bhavarlalji Jain College of Engineering, Maharashtra, India
E-mail: mamtaji_61079@rediffmail.com
2 Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Maharashtra, India
E-mail: kvkale91@gmail.com

Abstract
With the advent of the technological era, the conversion of scanned documents (handwritten or printed) into machine-editable format has attracted many researchers. This paper deals with the problem of recognizing Gujarati handwritten numerals. Gujarati numeral recognition requires some specific preprocessing steps. Preprocessing consists of digitization, segmentation, normalization and thinning, under the assumption that the image has almost no noise. An affine invariant moments based model is then used for feature extraction, and finally Support Vector Machine (SVM) and Fuzzy classifiers are used for numeral classification. The SVM and Fuzzy classifiers are compared, and the SVM produced better results than the Fuzzy classifier.

Keywords: Support Vector Machine, Fuzzy Classifier, Gujarati Handwritten Numerals

1. INTRODUCTION
Across the globe, more than 50 million people speak Gujarati, a language of the Indo-Aryan family. Gujarati is mainly spoken, and used as the official language, in Gujarat, a state in India. Despite its wide popularity and use, little has been documented on Gujarati recognition. It is derived from Devanagari and resembles Devanagari, Sanskrit, Marathi, etc. in appearance. There is a wide variety of numerals in Indian languages. Fig.1 shows the numerals of the Gujarati language.

Fig.1.
Gujarati Numerals 0 to 9

As mentioned earlier, Gujarati numerals share some of their appearance with the numerals of Devanagari. Numerals 0, 2, 3, 4, 7 and 8 are the same as in Devanagari, but numeral 1 has a slight tilt in the centre in Devanagari, whereas it is a straight line in Gujarati. Within the Gujarati numerals, 0, 3 and 7 are easily confused with one another, as are 2 and 4. Confusion also arises between 1 and 6. Numerals 1 and 5 may be confused because numeral 1 has a closed loop while the loop is open in numeral 5. Similarly, numerals 8 and 9 are confusable in shape.

This paper deals with the problem of recognizing Gujarati handwritten numerals. Gujarati numeral recognition requires some specific preprocessing steps. Preprocessing consists of digitization, segmentation, normalization and thinning, under the assumption that the image has almost no noise. An affine invariant moments based model is then used for feature extraction, and finally Support Vector Machine (SVM) and Fuzzy classifiers are used for numeral classification.

This paper is organized in the following sections. Section 2 gives a brief literature survey on the recognition of Indian languages. Section 3 details the preprocessing steps. Section 4 describes the algorithm we used for implementation. Section 5 elaborates on the feature extraction. Section 6 describes SVM and Fuzzy based numeral recognition. Section 7 presents the conclusion of the work.

2. LITERATURE SURVEY
With the advent of the technological era, the conversion of scanned documents (handwritten or printed) into machine-editable format has attracted many researchers. Much work has been contributed through continued efforts on the recognition of scripts in India. However, little of the surveyed work addresses the recognition of the Gujarati language.
Although recognition of handwritten numerals is a well-researched topic, not much work has been reported on Gujarati handwritten numerals in recent times. Work on Gujarati character recognition started with the early effort of Antani and Agnihotri [1] in 1999 for printed characters. The authors used Euclidean and Hamming distance classifiers to classify various printed Gujarati characters. Dholakia [2] contributed to Gujarati character recognition with a combined approach of wavelet feature extraction and a neural net architecture to classify printed Gujarati characters. Desai [3] reported recognition of Gujarati handwritten numerals employing skew correction, normalization and then direction profiles as feature vectors, using a neural net architecture to classify the numerals.

The earliest work on Devanagari was reported in 1979 by Sinha and Mahabala [4], who described the structural characteristics of the Devanagari script. Satish [5] studied Zernike moments and used them for Devanagari handwritten character recognition. Veena [6] presented a method to describe the shapes of Devanagari characters and use them for recognition.

Bhoumik et al. [7] proposed an HMM based recognition scheme for handwritten Oriya numerals. Roy et al. [8] used a chain code histogram of the contour points of the segmented numeral and applied a neural network and a quadratic classifier. Rao et al. [9] adopted a feature based approach for isolated Telugu characters. Lakshmi et al. [10] addressed the recognition of printed basic symbols of the Telugu language. They used seven moments for feature extraction and KNN as the classifier. Kurian et al. [11] reported their work on isolated Malayalam digit recognition using SVM. Jagadeesh Kannan [12] fused HMM and SVM and used a neural network to predict the correct character from Tamil script. Mahmud et al. [13] used the Freeman chain code for scaled characters and classified them using a feed forward neural network based recognition scheme.
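Several of the surveyed methods ([8], [13]) rely on chain-code features. As a rough illustration only, and not a reproduction of either method, the sketch below computes a Freeman 8-direction chain code and its normalized histogram from an ordered list of contour points. The direction numbering, the y-upward coordinate convention, the function names and the square example are all assumptions made for this illustration; contour tracing itself is not shown.

```python
from collections import Counter

# Freeman 8-direction codes, counter-clockwise starting from east.
# Assumed convention: steps are (dx, dy) with y growing upward.
FREEMAN_CODES = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def freeman_chain_code(points):
    """Return the Freeman chain code of an ordered, 8-connected point list."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in FREEMAN_CODES:
            raise ValueError(
                f"points {(x0, y0)} and {(x1, y1)} are not 8-neighbours"
            )
        codes.append(FREEMAN_CODES[step])
    return codes


def chain_code_histogram(codes):
    """Return an 8-bin histogram of direction codes, normalized to sum to 1."""
    total = len(codes)
    if total == 0:
        return [0.0] * 8
    counts = Counter(codes)
    return [counts.get(direction, 0) / total for direction in range(8)]


# Hypothetical example: a closed unit square traced counter-clockwise.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
codes = freeman_chain_code(square)
histogram = chain_code_histogram(codes)
```

For the square above, the contour moves east, north, west and south once each, so only the even direction bins are populated. A histogram of this kind has a fixed length regardless of contour length, which is what makes it usable as an input vector for a classifier.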