Handwritten Character Recognition Using PCA of State Space Point Distribution

Lajish VL
TCS Innovation Lab - Mumbai, TCS Yantra Park, Thane, Mumbai, India
Lajish.VL@TCS.Com

Sita G
Department of Electrical Engineering, Indian Institute of Science, Bangalore
Sita.G@TCS.Com

Sunil Kumar Kopparapu
TCS Innovation Lab - Mumbai, TCS Yantra Park, Thane, Mumbai, India
SunilKumar.Kopparapu@TCS.Com

Abstract: In this paper, we investigate the utility of a feature set, derived by applying the principles of non-linear dynamics to gray-scale images, for handwritten character recognition. A state-space map (SSM) is generated from the gray-scale image, and the State-Space Point Distribution (SSPD) parameters are then extracted from the SSM. We apply principal component analysis (PCA) to the SSPD features in order to decorrelate them and to reduce their dimension. We study the recognition results obtained with a Nearest Neighbor (NN) classifier using two different distance measures, namely the Euclidean and Mahalanobis distances, and with a Bayesian classifier. Experimental results obtained using different numbers of principal components demonstrate that the novel feature set holds promise and is effective for handwritten character recognition.

I. INTRODUCTION

Handwritten Character Recognition (HCR) systems can improve human-computer interaction, and HCR has been an active research area. Promising results have been reported for languages such as English, Chinese, Korean, Japanese, and Arabic, and, among Indian languages, for Devanagari, Bangla, Tamil, Telugu, and Kannada. HCR research in Malayalam is still in its infancy. This work experiments with Malayalam character recognition, though the framework is general and can be used for any language.
We describe a novel approach for feature extraction from gray-scale images of handwritten characters and then use it to recognize isolated handwritten characters.

A handwritten character image, denoted by a two-dimensional function f(x, y), is treated as a non-linear dynamical process whose original scalar measurements are the pixel intensity value f and the spatial co-ordinates (x, y). The method of feature extraction exploits the topographic structure in a gray-scale image. A state-space activity map is constructed from the pixel intensity distribution of both the foreground and the background of the gray-scale image. Parameters obtained from the state-space point distribution (SSPD) are used as the feature set to represent the character. We investigate the performance of these features on Malayalam handwritten character recognition using a Nearest Neighbor (NN) classifier with Euclidean and Mahalanobis distances and a Bayes classifier. Principal Component Analysis (PCA) is used to reduce the dimensionality of the SSPD feature set. These reduced-dimensional PCA features are used in all classifiers.

The paper is organized as follows. Section II describes the theory behind the reconstructed state-space and its application to handwritten character images. Section III explains the feature extraction and the classifiers used. Section IV presents the experimental results along with a discussion of them. Section V presents the conclusions.

II. THE RECONSTRUCTED STATE-SPACE FOR HANDWRITTEN CHARACTER IMAGES

A number of methods for offline handwritten character recognition based on gray-scale image features exist in the literature [1], [2], [3]. We look at this problem from a non-linear dynamical system point of view and investigate the recognition performance of features obtained by applying the principles of non-linear dynamics to gray-scale images.
In a purely deterministic system, once its state is fixed, the states at all future times can be determined as well. It is therefore useful to establish a vector space, called the state-space, such that specifying a point in this space specifies the state of the system. This lets us study the dynamics of the system by studying the dynamics of the corresponding state-space points. The concept of the state of a system is powerful for non-deterministic systems also. Takens' theorem [4] states that, under certain assumptions, the state-space of a dynamical system can be reconstructed through the use of time-delayed (space-varying) versions of the original scalar measurements. This new state-space is commonly referred to in the literature as a reconstructed state-space. A reconstructed state-space can be treated as a powerful signal-processing domain, especially when the dynamical system of interest is non-linear or even chaotic [5], [6].

A reconstructed state-space for a dynamical system can be produced from a measured state variable $i_n$, where $n = 1, 2, \cdots, N$, via the method of delays by creating vectors given by

$$I_n = [\, i_n,\ i_{n+\tau},\ i_{n+2\tau},\ \cdots,\ i_{n+(d-1)\tau} \,]$$

where $d$ is the embedding dimension and $\tau$ is the chosen time or space delay value. The row vector $I_n$ defines the position of a single point in the reconstructed state-space. The row vectors
NCC 2009, January 16-18, IIT Guwahati
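As an illustration of the method of delays described above, the following minimal Python sketch builds the delay vectors I_n from a one-dimensional sequence of intensity values. The function name `delay_embed`, the toy sequence, and the use of NumPy are assumptions made for illustration and are not taken from the paper. Indices are 0-based in the code, and a row is produced only for those n at which all d delayed samples exist.

```python
import numpy as np


def delay_embed(signal, d, tau):
    """Stack delay vectors [i_n, i_{n+tau}, ..., i_{n+(d-1)tau}] as rows.

    signal : 1-D sequence of scalar measurements (hypothetical input)
    d      : embedding dimension
    tau    : delay, in samples
    """
    signal = np.asarray(signal, dtype=float)
    if signal.ndim != 1:
        raise ValueError("signal must be one-dimensional")
    if d < 1 or tau < 1:
        raise ValueError("d and tau must be positive integers")

    # Number of start positions n for which every delayed sample exists.
    n_rows = signal.size - (d - 1) * tau
    if n_rows < 1:
        raise ValueError("signal is too short for this d and tau")

    # Row n picks indices n, n + tau, ..., n + (d - 1) * tau.
    idx = np.arange(n_rows)[:, None] + tau * np.arange(d)[None, :]
    return signal[idx]


# Toy intensity sequence (illustrative values only).
pixels = np.array([10, 20, 30, 40, 50, 60])

# Each row is one point in a 2-dimensional reconstructed state-space.
points = delay_embed(pixels, d=2, tau=1)
print(points)
```

Each row of `points` is one point of the reconstructed state-space, so plotting the first column against the second gives a two-dimensional state-space plot of the toy sequence.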