International Journal of Computer Applications (0975 888), Volume 47, No. 18, June 2012

Combination of Different Feature Sets and SVM Classifier for Handwritten Gurumukhi Numeral Recognition

Anita Rani, Rajneesh Rani, Renu Dhir
Department of Computer Science and Engineering
Dr B.R. Ambedkar National Institute of Technology
Jalandhar-144011, Punjab (India)

ABSTRACT
A great deal of research has been done on recognizing handwritten characters in many languages, such as Chinese, Arabic, Devnagari, Urdu and English. This paper focuses on the recognition of isolated handwritten numerals in Gurumukhi script. We use three feature extraction techniques: projection histograms, background directional distribution (BDD) and zone-based diagonal features. Projection histograms count the number of foreground pixels along the horizontal, vertical, left-diagonal and right-diagonal directions, giving 190 features. Background directional distribution (BDD) features describe the distribution of background pixels neighbouring the foreground pixels in 8 different directions, giving a total of 128 features. To compute the diagonal features, the image is divided into 64 equal zones of 4×4 pixels each, and features are extracted from the pixels of each zone by moving along its diagonal, giving 64 features in total. Different combinations of these features are used to form different feature vectors. The feature vectors are classified using an SVM classifier with an RBF (radial basis function) kernel and 5-fold cross-validation. The highest accuracy achieved is 99.4% on the whole database, using the combination of background directional distribution and diagonal features with the SVM classifier.

General Terms
Pattern Recognition, OCR, Handwritten Character Recognition, Feature Extraction.
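The projection histogram count quoted in the abstract can be checked with a minimal sketch. It assumes a 32×32 binary image (inferred from the 64 zones of 4×4 pixels), in which case rows, columns and the two diagonal directions contribute 32 + 32 + 63 + 63 = 190 bins. The function name `projection_histograms`, the list-of-lists image representation and the labelling of which diagonal is "left" or "right" are illustrative assumptions, not details taken from the paper.

```python
def projection_histograms(image):
    """Count foreground pixels along rows, columns and both diagonal directions.

    `image` is a square list of lists holding 0 (background) or 1 (foreground).
    The bin layout is an assumption made for illustration.
    """
    n = len(image)
    horizontal = [0] * n                 # one bin per row
    vertical = [0] * n                   # one bin per column
    left_diagonal = [0] * (2 * n - 1)    # bins indexed by (row - col), shifted to be >= 0
    right_diagonal = [0] * (2 * n - 1)   # bins indexed by (row + col)
    for r in range(n):
        for c in range(n):
            if image[r][c]:
                horizontal[r] += 1
                vertical[c] += 1
                left_diagonal[r - c + n - 1] += 1
                right_diagonal[r + c] += 1
    # Concatenate the four histograms into a single feature vector.
    return horizontal + vertical + left_diagonal + right_diagonal


if __name__ == "__main__":
    size = 32  # assumed normalized image size
    # Toy image: a single vertical stroke in column 10.
    image = [[1 if c == 10 else 0 for c in range(size)] for r in range(size)]
    features = projection_histograms(image)
    print(len(features))  # 190
```

Each foreground pixel falls into exactly one bin per direction, so the four histograms together describe the stroke distribution of a numeral from four viewpoints.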
Keywords
Handwritten Gurumukhi Numeral Recognition, Feature Extraction, Projection Histograms, Background Directional Distribution (BDD) Features, Diagonal Features, SVM classifier, RBF kernel.

1. INTRODUCTION
Optical character recognition (OCR) is the process of converting images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text or a computer-processable format, such as ASCII code. Applications of OCR include postal code recognition, automatic data entry into large administrative systems, banking, 3D object recognition, digital libraries, invoice and receipt processing, reading devices for the blind and personal digital assistants. The three main features that characterize a good OCR system are accuracy, flexibility and speed. The basic process of an OCR system consists of the following phases: image acquisition, preprocessing, segmentation, feature extraction, classification and recognition, and post-processing, as shown in Figure 1.

Figure 1. Basic process of an OCR system

1. Image Acquisition: In image acquisition, the recognition system acquires a scanned image as its input. The image should be in a specific format such as JPEG or BMP. It is acquired through a scanner, digital camera or any other suitable digital input device.
2. Preprocessing: The acquired image may carry some unwanted noise. The preprocessing stage takes in the raw image, reduces noise and distortion, removes skew and skeletonizes the image. The result is a cleaned image, which is used in the segmentation phase.
3. Segmentation: The segmentation stage takes in the preprocessed image and separates its logical parts, such as the lines of a paragraph, the words of a line and the characters of a word.
4. Feature Extraction: After segmentation, a set of features is required for each character. In the feature extraction stage, every character is assigned a feature vector to identify it.
This vector is used to distinguish the character from other characters. Various feature extraction methods have been designed, such as zoning, PCA, central moments, structural features, Gabor filters and directional distance distribution. Feature extraction is the process of selecting the type and the set of features, and it is the most important factor in character recognition.
5. Classification: Classification is the main decision-making stage of an OCR system. It uses the features extracted in the