S. Chaudhury et al. (Eds.): PReMI 2009, LNCS 5909, pp. 406–413, 2009.
© Springer-Verlag Berlin Heidelberg 2009

Resolving Ambiguities in Confused Online Tamil Characters with Post Processing Algorithms

A.G. Ramakrishnan and Suresh Sundaram

Medical Intelligence and Language Engineering Laboratory
Indian Institute of Science, Bangalore, India
{ramkiag,suresh}@ee.iisc.ernet.in

Abstract. This paper addresses the problem of resolving ambiguities in frequently confused online Tamil character pairs by employing script specific algorithms as a post classification step. Robust structural cues and temporal information of the preprocessed character are extensively utilized in the design of these algorithms. The methods are quite robust in automatically extracting the discriminative sub-strokes of confused characters for further analysis. Experimental validation on the IWFHR Database indicates error rates of less than 3% for the confused characters. Thus, these post processing steps have a good potential to improve the performance of online Tamil handwritten character recognition.

Keywords: Confusion Pairs, Sub stroke Extraction and analysis, Fourier Descriptors, Online handwritten character recognition.

1 Introduction

Tamil is a popular classical language spoken by a significant population in South East Asian countries. There are 156 distinct symbols in Tamil [1]. As far as earlier work on recognition of online Tamil characters is concerned, Deepu et al. [2] generate class specific subspaces using principal component analysis, while Niranjan et al. [1] have employed dynamic time warping for matching unequal length feature sequences. Hidden Markov models for recognition have also been reported in [3] [4]. In a recent work, we have studied the performance of the 2DPCA Algorithm [5], which was originally proposed for face recognition. Each of the above schemes is found to give nearly similar generalization performances on a given test data.
Most of the misclassifications on the given data can be attributed to the fact that Tamil has many visually similar symbols. A classifier that works on features at a global level fails to capture the finer nuances that make these symbols distinct. One way to circumvent this drawback is to incorporate a post processing step that employs local features to reduce the degree of confusion between frequently confused characters, and thereby improves the overall recognition performance. Specifically, this paper proposes algorithms for disambiguating frequently confused symbols. The approaches are developed taking into account the popular writing/lexemic styles of modern Tamil script. They can be applied irrespective of the nature of the classifier used for the recognition.
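
The classifier-independent nature of such a post processing step can be illustrated with a minimal sketch. It is not the implementation used in this paper: the names `post_process`, `toy_rule` and `RULES`, the labels "A", "B" and "C", and the representation of a character as a list of pen points are all assumptions made for illustration, and the local cue in `toy_rule` is a placeholder rather than one of the proposed algorithms.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Trace = List[Point]  # assumed representation: pen points in temporal order
Rule = Callable[[Trace], str]  # hypothetical: local-feature rule for one confusion pair


def post_process(label: str, trace: Trace, rules: Dict[str, Rule]) -> str:
    """Keep the classifier's label unless it belongs to a known confusion pair."""
    rule = rules.get(label)
    return rule(trace) if rule is not None else label


def toy_rule(trace: Trace) -> str:
    # Placeholder local cue (an assumption, not from the paper):
    # compare the y-coordinates of the first and last pen points.
    return "A" if trace[-1][1] >= trace[0][1] else "B"


# Both members of the hypothetical pair ("A", "B") map to the same rule,
# so the rule is consulted whichever of the two the classifier outputs.
RULES: Dict[str, Rule] = {"A": toy_rule, "B": toy_rule}
```

Because `post_process` receives only the predicted label and the preprocessed trace, it may follow any classifier; labels outside the registered confusion pairs pass through unchanged.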